Description
When using Neo4j in shared-database multi-tenant mode (`use_multi_db=False`), `search_by_embedding` returns empty results for specific users even though their data exists in the database.
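The symptom can be reproduced without Neo4j. The plain-Python simulation below is a sketch: `post_filter_search` and `pre_filter_search` are hypothetical helpers. The first ranks every node in the shared database and filters by user only after keeping the top-k. The second filters by user first and ranks afterwards. The toy embeddings are made up so that one user's nodes crowd out another's.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity of two non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def post_filter_search(nodes: list[dict], query: list[float], user: str, k: int) -> list[dict]:
    # Rank the whole database, keep the global top-k, then apply the user filter.
    ranked = sorted(nodes, key=lambda n: cosine(n["embedding"], query), reverse=True)
    return [n for n in ranked[:k] if n["user_name"] == user]


def pre_filter_search(nodes: list[dict], query: list[float], user: str, k: int) -> list[dict]:
    # Restrict to the user's nodes first, then rank and keep the top-k.
    candidates = [n for n in nodes if n["user_name"] == user]
    ranked = sorted(candidates, key=lambda n: cosine(n["embedding"], query), reverse=True)
    return ranked[:k]


# Ten "alice" nodes sit very close to the query; one "bob" node is further away.
nodes = [{"id": f"a{i}", "user_name": "alice", "embedding": [1.0, 0.01 * i]} for i in range(10)]
nodes.append({"id": "b0", "user_name": "bob", "embedding": [0.5, 1.0]})
query = [1.0, 0.0]

print(len(post_filter_search(nodes, query, "bob", k=5)))              # 0
print([n["id"] for n in pre_filter_search(nodes, query, "bob", k=5)])  # ['b0']
```

Bob's node exists, but it never makes it into the global top 5, so filtering after ranking leaves nothing.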
Root Cause
- Post-filter problem: `db.index.vector.queryNodes(k, embedding)` returns the global top-k most similar nodes, and only then do the `WHERE` clauses (`user_name`, `memory_type`, `status`) filter them. In a shared database with many users, the target user's nodes often fall outside the global top-k, so nothing survives the filter and the search returns 0 results.
- `sources` KeyError: `metadata["sources"]` is accessed without `.get()` in several places in `neo4j.py` and `neo4j_community.py`. When metadata is serialized with `model_dump(exclude_none=True)` and `sources` is `None`, the key is omitted entirely, so the lookup raises a `KeyError`.
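- `_parse_node` deserialization bug: the check for JSON-encoded sources tests the first character, `node["sources"][idx][0] == "}"`, when it should test the last one, `node["sources"][idx][-1] == "}"`. A serialized JSON object starts with `{`, so the condition is never true and `json.loads` never executes.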
- `add_node` in `neo4j_community.py` crashes on a missing embedding: it raises `ValueError` when `embedding` is `None`, but some memory types (e.g., `UserMemory`) may not have embeddings.
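The corrected `_parse_node` check can be sketched as follows. `parse_sources` is a hypothetical stand-alone helper, not the function in `neo4j.py`. In addition to the last-character test the fix calls for, it also tests that the first character is `{`; that extra guard is an assumption. Entries that do not look like JSON objects pass through unchanged.

```python
import json


def parse_sources(sources: list) -> list:
    # Hypothetical helper mirroring the index-based loop in _parse_node.
    parsed = list(sources)
    for idx, item in enumerate(parsed):
        # Decode only strings shaped like a JSON object: "{" first, "}" last.
        # The buggy version compared item[0] to "}", which is never true.
        if isinstance(item, str) and item and item[0] == "{" and item[-1] == "}":
            parsed[idx] = json.loads(item)
    return parsed


raw = ['{"type": "chat", "content": "hi"}', "plain text source"]
print(parse_sources(raw))
# [{'type': 'chat', 'content': 'hi'}, 'plain text source']
```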
Expected Behavior
- Vector search should return the correct results for any user in a shared database, regardless of how much data other users have stored.
- Adding nodes without `sources` or `embedding` should not crash.
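Both crash paths can be sketched in one place. This assumes node properties are assembled into a dict before the `CREATE`/`MERGE` query runs. `dump_exclude_none` is a hypothetical stand-in for pydantic's `model_dump(exclude_none=True)`. `build_node_properties` is a hypothetical helper, not the real `add_node`. Defaulting `sources` to an empty list is also an assumption about what downstream code expects.

```python
from typing import Any, Optional


def dump_exclude_none(fields: dict[str, Any]) -> dict[str, Any]:
    # Stand-in for model_dump(exclude_none=True): None-valued keys are dropped.
    return {k: v for k, v in fields.items() if v is not None}


def build_node_properties(
    memory_id: str,
    memory: str,
    metadata: dict[str, Any],
    embedding: Optional[list[float]] = None,
) -> dict[str, Any]:
    # Hypothetical helper: assemble the properties for a CREATE/MERGE query.
    props: dict[str, Any] = {"id": memory_id, "memory": memory, **metadata}
    # .get() instead of metadata["sources"]: the key is absent when sources was None.
    props["sources"] = metadata.get("sources") or []
    if embedding is not None:
        # Attach the vector only when one exists, instead of raising ValueError.
        props["embedding"] = embedding
    return props


metadata = dump_exclude_none({"memory_type": "UserMemory", "sources": None})
props = build_node_properties("m1", "prefers green tea", metadata)
print(props)
# {'id': 'm1', 'memory': 'prefers green tea', 'memory_type': 'UserMemory', 'sources': []}
```

A node stored without an `embedding` property is not picked up by the vector index. It will not show up in vector search, but other queries can still read it.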
Environment
- Neo4j >= 5.18 (required for `vector.similarity.cosine()` pre-filtering)
- Shared-database multi-tenant mode (`use_multi_db=False`)
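With Neo4j 5.18+, the pre-filtering approach can be expressed in Cypher. The idea is to restrict candidates in `MATCH`/`WHERE` first, then score them with `vector.similarity.cosine()`, instead of calling `db.index.vector.queryNodes` first. The builder below is a sketch. `build_prefiltered_search`, the `Memory` label, and the `id` return column are assumptions, not the project's actual query.

```python
from typing import Any, Optional


def build_prefiltered_search(
    embedding: list[float],
    user_name: str,
    top_k: int,
    memory_type: Optional[str] = None,
    status: Optional[str] = None,
) -> tuple[str, dict[str, Any]]:
    # Filters run before scoring, so other tenants' nodes cannot push this
    # user's nodes out of the top-k.
    conditions = ["n.user_name = $user_name", "n.embedding IS NOT NULL"]
    params: dict[str, Any] = {"embedding": embedding, "user_name": user_name, "top_k": top_k}
    if memory_type is not None:
        conditions.append("n.memory_type = $memory_type")
        params["memory_type"] = memory_type
    if status is not None:
        conditions.append("n.status = $status")
        params["status"] = status
    query = (
        "MATCH (n:Memory)\n"  # assumed label
        f"WHERE {' AND '.join(conditions)}\n"
        "WITH n, vector.similarity.cosine(n.embedding, $embedding) AS score\n"
        "ORDER BY score DESC\n"
        "LIMIT $top_k\n"
        "RETURN n.id AS id, score"
    )
    return query, params


query, params = build_prefiltered_search([0.1, 0.2], "bob", 5, memory_type="UserMemory")
print(query)
```

The resulting pair could be passed to the official driver, e.g. `session.run(query, params)`. This query scores every one of the user's nodes rather than using the vector index, so its cost grows with the size of a single tenant rather than the whole database.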