Fix #1360: Neo4j vector search returns empty results in shared-database multi-tenant mode #1974
Open
Memtensor-AI wants to merge 1 commit into
_prepare_node_metadata already JSON-encodes metadata['sources']; the
write paths (add_node / add_nodes_batch in neo4j.py and add_node in
neo4j_community.py) repeated the same json.dumps step, producing escaped
JSON strings that _parse_node could no longer decode (its [0]=='{' guard
saw a leading '"' and skipped json.loads), so callers received encoded
strings instead of dicts.
Removes the redundant serialization, leaving the single canonical encode
inside _prepare_node_metadata. Adds three regression tests covering
add_node, add_nodes_batch and the add_node->_parse_node round-trip.
Refs #1360
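The failure mode described in the commit message can be reproduced with the standard library alone. The snippet below is a minimal sketch, not the project's code: `prepare_sources` and `parse_source` are hypothetical stand-ins for the encode step in `_prepare_node_metadata` and the `[0] == "{"` guard in `_parse_node`, and the sample source dict is invented for illustration.

```python
import json


def prepare_sources(sources):
    """Hypothetical stand-in for the single canonical encode step."""
    return [json.dumps(s) if isinstance(s, dict) else s for s in sources]


def parse_source(value):
    """Hypothetical stand-in for the read-side guard: decode only
    values that look like a JSON object."""
    if isinstance(value, str) and value and value[0] == "{":
        return json.loads(value)
    return value


source = {"type": "chat", "content": "hello"}  # illustrative data only

once = prepare_sources([source])[0]  # encoded once: starts with '{'
twice = json.dumps(once)             # redundant second encode: starts with '"'

decoded_once = parse_source(once)    # guard passes, dict comes back
decoded_twice = parse_source(twice)  # guard skips json.loads, string comes back
```

With a single encode the value round-trips to the original dict. With the redundant second encode the stored value begins with a double quote, so the guard never calls `json.loads` and the caller gets the escaped string back unchanged.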
Description
Fixes issue #1360 by removing a leftover regression in the Neo4j sources serialization path. PR #1359 had already addressed the four root causes listed in the issue (post-filter rewrite, sources KeyError guard, _parse_node bracket check, missing-embedding warn-instead-of-raise), but a close re-read showed sources were still being double-encoded: `_prepare_node_metadata` JSON-dumps each element once, and then `add_node` / `add_nodes_batch` in `neo4j.py` plus `add_node` in `neo4j_community.py` repeated the same `json.dumps` step. The resulting values came back from Neo4j with a leading `"` instead of `{`, so `_parse_node`'s `[0] == "{"` guard skipped `json.loads` and callers received escaped JSON strings instead of dicts. This is the exact deserialization-skip symptom listed as bug #3 of the original issue, just one layer upstream.

The change removes the redundant serialization at all three sites, leaving the single canonical encode inside `_prepare_node_metadata` (matching the already-correct pattern in `import_graph` and `neo4j_community.add_nodes_batch`). Existing rows written with the double-encoded value remain readable as escaped strings (no new exception), and new writes round-trip correctly.

Tests: added `TestSourcesDoubleSerializationRegression` to `tests/graph_dbs/test_neo4j_vector_search.py` with three cases: the `add_node` single-encode invariant, the `add_nodes_batch` single-encode invariant, and a full `add_node → _parse_node` round-trip. Results: `python3 -m pytest tests/graph_dbs/` → 32 passed / 3 skipped (skipped cases are live-Neo4j 5.18+ integration tests). `ruff check` and `ruff format --check` are clean on all three touched files.

Related Issue (Required): Fixes #1360
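For reviewers, the single-encode invariant that the regression tests are described as checking might look roughly like the sketch below. It is an assumption-laden illustration rather than the contents of `TestSourcesDoubleSerializationRegression`: `is_single_encoded` is a hypothetical helper, and the sample source dict is invented.

```python
import json


def is_single_encoded(value):
    """Hypothetical helper: True when exactly one json.loads call
    turns the stored value back into a dict."""
    try:
        return isinstance(json.loads(value), dict)
    except (TypeError, ValueError):
        # Not a string, or not valid JSON at all.
        return False


src = {"type": "doc", "doc_path": "a.md"}  # illustrative data only

stored_ok = json.dumps(src)         # what a fixed write path would persist
stored_bad = json.dumps(stored_ok)  # what the double encode used to persist

round_tripped = json.loads(stored_ok)
```

A check of this shape fails on the double-encoded value because the first `json.loads` returns a string rather than a dict, which is the regression the PR removes.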
Type of change
How Has This Been Tested?
Automated tests are pending.
Checklist
@MatthewZhuang, @CarltonXiang, @syzsunshine219, @World-controller please review this PR.
Reviewer Checklist