Summary
HybridCacheWorkingMemory.GenerateEmbeddingAsync repeatedly fails with System.ClientModel.ClientResultException: HTTP 400 (invalid_request_error: ) when embedding stash/subagent-* working-memory entries. The embedding vector is never stored for those keys, so they fall back to BM25-only retrieval (degraded semantic recall). The call is best-effort: the exception is caught and logged as a warning, so it's non-fatal, but it's frequent and noisy.
This is pre-existing and unrelated to the wisp MCP-readiness fix in #458/#459; it surfaced while tailing agent logs after that deploy.
Evidence (agent 0.12.26, single pod since boot)
```text
warn: RockBot.Host.HybridCacheWorkingMemory[0]
      Failed to generate working memory embedding for key stash/subagent-9c2baa39135d/call_TsEYnvQMDu0tWkQK4jimoXMp
      System.ClientModel.ClientResultException: HTTP 400 (invalid_request_error: )
         at RockBot.Host.HybridCacheWorkingMemory.GenerateEmbeddingAsync(...) in /src/src/RockBot.Host/HybridCacheWorkingMemory.cs:line 110
```
- 43 such warnings in one pod's lifetime, 100% of them on stash/subagent-* keys (subagent tool-call results, typically large, dense JSON).
- The EmbeddingTextPreparer truncation path did fire 12 times (Truncating embedding input ...), yet 400s still occur, so the char-based cap is not reliably keeping inputs under the endpoint's limit.
- A related perf smell in the same path: one embedding call logged Embedding generated in 84921ms (~85 s), consistent with very large inputs reaching the embedder.
Mechanism / hypotheses
GenerateEmbeddingAsync (HybridCacheWorkingMemory.cs:105-121) builds the document text with BuildDocumentText(key, value, category, tags) (:123), runs it through EmbeddingTextPreparer.Prepare (:109-110), then calls _embeddingGenerator.GenerateAsync(docText).
EmbeddingTextPreparer truncates by character count (MaxInputChars / MaxStructuredInputChars) as a proxy for the model's token limit. The IsStructured density heuristic estimates tokens per character, but for some subagent payloads it apparently underestimates, so the truncated text still exceeds the embedding model's maximum token window and the endpoint returns 400 invalid_request_error. The exception detail is empty (invalid_request_error: ), so the precise reason isn't captured in the logs.
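The char-cap failure mode can be stated as one inequality. For an input truncated to exactly C characters, let T be the model's token window, ρ̂ the characters-per-token ratio the cap implicitly assumes (the reciprocal of the heuristic's tokens-per-char estimate), and ρ the ratio the payload actually tokenizes at. These symbols are introduced here for illustration, assuming the cap is sized as C = ρ̂T:

```math
n_{\text{tokens}} = \frac{C}{\rho}, \qquad C = \hat{\rho}\,T
\quad\Longrightarrow\quad
n_{\text{tokens}} = \frac{\hat{\rho}}{\rho}\,T > T \iff \rho < \hat{\rho}.
```

With illustrative (not measured) numbers: if the cap assumes 4 characters per token but a dense JSON payload tokenizes at 2.5, the truncated input is 4/2.5 = 1.6 times the window. A character cap is only safe when sized from the densest payload expected (C ≤ ρ_min·T).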
Other (less likely) possibilities worth ruling out:
- Empty or whitespace-only docText after processing (Prepare early-returns on empty input, but BuildDocumentText could in theory yield near-empty content).
- Endpoint/model-specific rejection of certain byte sequences.
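Both less-likely hypotheses can be checked cheaply by logging a few properties of docText alongside the warning. A minimal sketch, assuming nothing about the existing code: EmbeddingInputDiagnostics and its Summary fields are hypothetical names, not RockBot types.

```csharp
using System;
using System.Text;

// Hypothetical helper (not an existing RockBot type): summarizes an embedding
// input so a failed call can be logged with enough detail to rule out the
// "empty input" and "unusual characters" hypotheses.
public static class EmbeddingInputDiagnostics
{
    public readonly record struct Summary(
        int Length, bool IsBlank, int ControlChars, int LoneSurrogates, int Utf8Bytes);

    public static Summary Describe(string? text)
    {
        text ??= string.Empty;
        int control = 0, lone = 0;
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsSurrogatePair(text, i)) { i++; continue; } // valid pair: skip both halves
            if (char.IsSurrogate(c)) lone++;                       // unpaired half: invalid UTF-16
            else if (char.IsControl(c) && c is not ('\n' or '\r' or '\t')) control++;
        }
        // Encoding.UTF8 substitutes U+FFFD for unpaired surrogates instead of throwing.
        return new Summary(text.Length, string.IsNullOrWhiteSpace(text), control, lone,
            Encoding.UTF8.GetByteCount(text));
    }
}
```

In the catch block, including Describe(docText) in the warning's structured properties would show whether failing inputs are blank, unusually large, or contain odd characters.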
Impact
- stash/subagent-* entries get no embedding, so semantic (vector) recall misses them and only BM25 can match them. The user's live instance runs hybrid search, so this is a real recall gap for subagent results.
- 85 s embedding calls waste an embedding-tier slot and could stack up under load.
- Log noise obscures genuine warnings.
Suggested fixes (not mutually exclusive)
- Capture the real 400 reason: surface the response body in the log (the invalid_request_error: detail is currently empty). Without it we're guessing between a token-limit overflow and something else. This is the cheapest first step.
- Token-aware truncation: replace or augment the char-cap proxy in EmbeddingTextPreparer with an actual token count (using a tokenizer for the configured embedding model), or tighten MaxStructuredInputChars for the dense-JSON case so subagent payloads land safely under the window.
- Decide whether subagent stash entries should be embedded at all: GenerateEmbeddingAsync already skips IsEphemeralKey(key) entries (wisp-scoped, exact-key retrieval). Subagent stash/* tool-call results may be similarly ephemeral, retrieved only by exact key, and not worth embedding; if so, extend the skip to cover them and avoid the cost entirely.
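For the first fix, an OpenAI-style error body has the shape {"error": {"message", "type", "param", "code"}}, and the human-readable reason is normally in message. A minimal sketch of a formatter for the warning log; EmbeddingErrorFormatter is hypothetical, and it assumes the raw body can be read at the catch site. System.ClientModel's ClientResultException exposes GetRawResponse(), whose Content holds the body, but whether that content is still readable there should be confirmed against the SDK version in use.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: turns an OpenAI-style error body
// ({"error":{"message":...,"type":...,"param":...,"code":...}}) into a short,
// log-friendly string. Falls back to a truncated raw body for anything else.
public static class EmbeddingErrorFormatter
{
    public static string Describe(string? body, int maxRawChars = 500)
    {
        if (string.IsNullOrWhiteSpace(body)) return "<empty response body>";
        try
        {
            using JsonDocument doc = JsonDocument.Parse(body);
            if (doc.RootElement.ValueKind == JsonValueKind.Object &&
                doc.RootElement.TryGetProperty("error", out JsonElement error) &&
                error.ValueKind == JsonValueKind.Object)
            {
                string message = GetString(error, "message") ?? "<no message>";
                string type = GetString(error, "type") ?? "?";
                string code = GetString(error, "code") ?? "?";
                string param = GetString(error, "param") ?? "?";
                return $"{type}/{code} (param={param}): {message}";
            }
        }
        catch (JsonException)
        {
            // Not JSON; fall through to the raw-body fallback.
        }
        return body.Length <= maxRawChars ? body : body[..maxRawChars] + "...";
    }

    // String values are returned as-is; JSON null or a missing property yields
    // null; any other kind (e.g. a numeric code) is rendered as raw JSON text.
    private static string? GetString(JsonElement obj, string name) =>
        obj.TryGetProperty(name, out JsonElement v)
            ? v.ValueKind switch
            {
                JsonValueKind.String => v.GetString(),
                JsonValueKind.Null => null,
                _ => v.GetRawText(),
            }
            : null;
}
```

At the catch site this would be fed something like ex.GetRawResponse()?.Content.ToString() and logged as a structured property next to the key, so a token-limit rejection becomes distinguishable from other causes.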
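For the token-aware truncation fix, one option is to binary-search for the longest prefix that fits a token budget, given a counter for the configured model's tokenizer. A minimal sketch: TokenBudgetTruncator and its parameters are hypothetical, and the counter is passed in as a delegate so the sketch compiles without committing to a tokenizer package. For OpenAI models, Microsoft.ML.Tokenizers' TiktokenTokenizer is one candidate; its docs may also offer a built-in index-by-token-count helper that makes the search unnecessary.

```csharp
using System;

// Hypothetical sketch: truncates text to the longest prefix whose token count
// fits the budget, using an injected counter for the embedding model's tokenizer.
public static class TokenBudgetTruncator
{
    public static string TruncateToTokens(string text, int maxTokens, Func<string, int> countTokens)
    {
        ArgumentNullException.ThrowIfNull(text);
        ArgumentNullException.ThrowIfNull(countTokens);
        if (maxTokens <= 0) return string.Empty;
        if (countTokens(text) <= maxTokens) return text; // common case: one count, no search

        // Binary search on prefix length. lo only ever moves to a length that was
        // checked to fit, so the result fits even if counts are not perfectly
        // monotonic (that would only make the result shorter than optimal).
        int lo = 0, hi = text.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo + 1) / 2;
            if (countTokens(text[..mid]) <= maxTokens) lo = mid;
            else hi = mid - 1;
        }

        // Don't end on the first half of a surrogate pair (invalid UTF-16).
        if (lo > 0 && char.IsHighSurrogate(text[lo - 1])) lo--;
        return text[..lo];
    }
}
```

The search costs O(log n) counter calls, each over a prefix, so it only runs once the full-text count exceeds the budget. Passing a maxTokens slightly below the model's documented window leaves headroom for any tokens the SDK or service adds.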
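If the third option is chosen, the skip can be extended with a prefix check next to the existing IsEphemeralKey test. A minimal sketch: EmbeddingKeyPolicy is hypothetical, the real IsEphemeralKey is passed in as a delegate because its implementation isn't shown here, and the prefix mirrors the stash/subagent-* keys seen in the logs. Whether all of stash/* should be skipped is still the open question.

```csharp
using System;

// Hypothetical sketch of extending the existing IsEphemeralKey skip so
// subagent tool-call stash entries are never sent to the embedder.
public static class EmbeddingKeyPolicy
{
    // Prefixes whose entries are assumed to be retrieved by exact key only.
    private static readonly string[] SkippedPrefixes = { "stash/subagent-" };

    public static bool ShouldEmbed(string key, Func<string, bool> isEphemeralKey) =>
        !isEphemeralKey(key) && !HasSkippedPrefix(key);

    private static bool HasSkippedPrefix(string key)
    {
        foreach (string prefix in SkippedPrefixes)
        {
            // Ordinal: keys are identifiers, not culture-sensitive text.
            if (key.StartsWith(prefix, StringComparison.Ordinal)) return true;
        }
        return false;
    }
}
```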
Verification
Reproduce by driving subagent activity that writes large stash/subagent-* results, then grep agent logs:
```shell
kubectl -n rockbot logs <agent-pod> -c agent | grep -c "Failed to generate working memory embedding"
```
After a fix, that count should drop to ~0 (or the entries should be intentionally skipped), and any remaining failures should log a specific, actionable reason.
Code references
src/RockBot.Host/HybridCacheWorkingMemory.cs:105 (GenerateEmbeddingAsync, catch at :117-120)
src/RockBot.Host/HybridCacheWorkingMemory.cs:123 (BuildDocumentText)
src/RockBot.Host/HybridCacheWorkingMemory.cs:97 (IsEphemeralKey skip)
src/RockBot.Host/EmbeddingTextPreparer.cs (char-based truncation / IsStructured heuristic)