
HybridCacheWorkingMemory: embedding generation fails with HTTP 400 (invalid_request_error) on stash/subagent-* entries #460

@rockfordlhotka


Summary

HybridCacheWorkingMemory.GenerateEmbeddingAsync repeatedly fails with System.ClientModel.ClientResultException: HTTP 400 (invalid_request_error: ) when embedding stash/subagent-* working-memory entries. No embedding vector is ever stored for those keys, so they fall back to BM25-only retrieval and semantic recall degrades. The call is best-effort and the exception is swallowed (logged as a warning), so the failure is non-fatal, but it is frequent and noisy.

This is pre-existing and unrelated to the wisp MCP-readiness fix in #458/#459; it surfaced while tailing agent logs after that deploy.

Evidence (agent 0.12.26, single pod since boot)

warn: RockBot.Host.HybridCacheWorkingMemory[0]
      Failed to generate working memory embedding for key stash/subagent-9c2baa39135d/call_TsEYnvQMDu0tWkQK4jimoXMp
      System.ClientModel.ClientResultException: HTTP 400 (invalid_request_error: )
         at RockBot.Host.HybridCacheWorkingMemory.GenerateEmbeddingAsync(...) in /src/src/RockBot.Host/HybridCacheWorkingMemory.cs:line 110
  • 43 such warnings in one pod's lifetime, all on stash/subagent-* keys (subagent tool-call results, which are typically large, dense JSON).
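  • The EmbeddingTextPreparer truncation path fired 12 times (Truncating embedding input ...), yet 400s still occur, so the char-based cap is not reliably keeping inputs under the endpoint's limit. Assuming that log line fires once per truncated call, at least 31 of the 43 failing inputs were never truncated, i.e. they were under the char cap and were still rejected.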
  • A related perf smell in the same path: one embedding call logged Embedding generated in 84921ms (~85 s), which is consistent with very large inputs reaching the embedder.

Mechanism / hypotheses

GenerateEmbeddingAsync (HybridCacheWorkingMemory.cs:105-121) builds the document text with BuildDocumentText(key, value, category, tags) (:123), runs it through EmbeddingTextPreparer.Prepare (:109-110), and then calls _embeddingGenerator.GenerateAsync(docText).

EmbeddingTextPreparer truncates by character count (MaxInputChars / MaxStructuredInputChars) as a proxy for the model's token limit. Its IsStructured density heuristic estimates tokens per character, but for some subagent payloads it evidently underestimates, so the truncated text still exceeds the embedding model's maximum token window and the endpoint returns 400 invalid_request_error. The exception message carries no detail (invalid_request_error: ), so the precise reason is not captured in the logs.
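A minimal sketch, in C#, of capping by tokens instead of characters (the idea behind fix 2 below). It assumes a token-counting delegate backed by a tokenizer that matches the configured embedding model; this issue does not name the model, its tokenizer, or its window size, so all three are parameters here, and TokenBudgetTruncator is a hypothetical name. It binary-searches for the longest prefix that fits, costing O(log n) calls to the counter on prefixes of the input.

```csharp
using System;

// Hypothetical helper: cap text by a token budget rather than a char budget.
public static class TokenBudgetTruncator
{
    // countTokens is an ASSUMED delegate wrapping a tokenizer for the configured
    // embedding model; this sketch deliberately does not pick one.
    public static string TruncateToTokens(string text, int maxTokens, Func<string, int> countTokens)
    {
        if (maxTokens <= 0) return string.Empty;
        if (countTokens(text) <= maxTokens) return text; // common case: a single count

        // Binary search for the longest prefix whose token count fits the budget.
        // Assumes counts grow (roughly) monotonically with prefix length.
        int lo = 0, hi = text.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo + 1) / 2; // round up so "lo = mid" always makes progress
            if (countTokens(text[..mid]) <= maxTokens) lo = mid;
            else hi = mid - 1;
        }

        // Do not leave a dangling high surrogate at the cut point.
        if (lo > 0 && char.IsHighSurrogate(text[lo - 1])) lo--;
        return text[..lo];
    }
}
```

In practice the budget would sit some margin below the model's documented limit, since a client-side counter may not match the server-side tokenizer exactly.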

Other (less likely) possibilities worth ruling out:

  • Empty/whitespace-only docText after processing (Prepare early-returns on empty, but BuildDocumentText could in theory yield near-empty content).
  • Endpoint/model-specific rejection of certain byte sequences.
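Telling these apart from a token-limit rejection needs the response body that the current warning drops. A hedged sketch, relying on System.ClientModel's ClientResultException.Status and GetRawResponse() (the exception type already seen in the log); EmbeddingErrorDetail and its body-length cap are hypothetical choices for illustration.

```csharp
using System;
using System.ClientModel;
using System.ClientModel.Primitives;

// Hypothetical helper: render a ClientResultException with its HTTP status and
// raw response body so "invalid_request_error: " becomes something actionable.
public static class EmbeddingErrorDetail
{
    // Defensive cap so an unexpectedly large error payload cannot flood the log.
    private const int MaxBodyChars = 2000;

    public static string Describe(ClientResultException ex)
    {
        PipelineResponse? response = ex.GetRawResponse();
        string body;
        if (response is null)
        {
            body = "<no response>";
        }
        else
        {
            try
            {
                // Content is a BinaryData; ToString() decodes it as UTF-8.
                body = response.Content.ToString();
            }
            catch (InvalidOperationException)
            {
                // Content throws if the response body was not buffered.
                body = "<response body not buffered>";
            }
        }

        if (body.Length > MaxBodyChars)
        {
            body = body[..MaxBodyChars] + "...";
        }

        return $"status={ex.Status} body={body}";
    }
}
```

Inside the existing catch (:117-120), the returned string could be passed to the warning as an extra structured-logging argument alongside the key.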

Impact

  • stash/subagent-* entries get no embedding, so semantic (vector) recall misses them and only BM25 can match them. The user's live instance runs hybrid search, so this is a real recall gap for subagent results.
  • Embedding calls that take ~85 s occupy an embedding-tier slot for their whole duration and could pile up under load.
  • Log noise obscures genuine warnings.

Suggested fixes (not mutually exclusive)

  1. Capture the real 400 reason: surface the response body in the log (the invalid_request_error: detail is currently empty). Without it we are guessing between a token-limit rejection and something else. This is the cheapest first step.
  2. Token-aware truncation: replace or augment the char-cap proxy in EmbeddingTextPreparer with an actual token count (a tokenizer for the configured embedding model), or tighten MaxStructuredInputChars for the dense-JSON case so subagent payloads land safely under the window.
  3. Decide whether subagent stash entries should be embedded at all. GenerateEmbeddingAsync already skips IsEphemeralKey(key) entries (wisp-scoped, exact-key retrieval). Subagent stash/* tool-call results may be similarly ephemeral and retrieved only by exact key, and therefore not worth embedding; if so, extend the skip to cover them and avoid the cost entirely.
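Fix 3, plus a cheap guard that rules out the empty-text hypothesis, might look like the predicate below. It is only a sketch: the "stash/subagent-" prefix is taken from the failing keys in the logs, the isEphemeralKey delegate stands in for the existing IsEphemeralKey check (whose real signature is not shown in this issue), and EmbeddingEligibility.ShouldEmbed is a hypothetical name.

```csharp
using System;

// Hypothetical predicate: decide up front whether an entry is worth embedding.
public static class EmbeddingEligibility
{
    // Prefix taken from the failing keys in this issue's logs.
    private const string SubagentStashPrefix = "stash/subagent-";

    public static bool ShouldEmbed(string key, string preparedText, Func<string, bool> isEphemeralKey)
    {
        // Existing behaviour: wisp-scoped keys are retrieved by exact key only.
        if (isEphemeralKey(key)) return false;

        // Proposed: treat subagent tool-call results as exact-key entries too (ASSUMPTION).
        if (key.StartsWith(SubagentStashPrefix, StringComparison.Ordinal)) return false;

        // Never send empty or whitespace-only text to the embedder.
        if (string.IsNullOrWhiteSpace(preparedText)) return false;

        return true;
    }
}
```

If subagent results turn out to benefit from semantic recall, dropping the prefix check and relying on token-aware truncation (fix 2) would keep them searchable instead.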

Verification

Reproduce by driving subagent activity that writes large stash/subagent-* results, then grep agent logs:

kubectl -n rockbot logs <agent-pod> -c agent | grep "Failed to generate working memory embedding"

After a fix, that count should drop to ~0 (or the entries should be intentionally skipped), and any remaining failures should log a specific, actionable reason.

Code references

  • src/RockBot.Host/HybridCacheWorkingMemory.cs:105 (GenerateEmbeddingAsync, catch at :117-120)
  • src/RockBot.Host/HybridCacheWorkingMemory.cs:123 (BuildDocumentText)
  • src/RockBot.Host/HybridCacheWorkingMemory.cs:97 (IsEphemeralKey skip)
  • src/RockBot.Host/EmbeddingTextPreparer.cs (char-based truncation / IsStructured heuristic)
