HybridCacheWorkingMemory: embedding generation fails with HTTP 400 (invalid_request_error) on stash/subagent-* entries #460

## Summary

`HybridCacheWorkingMemory.GenerateEmbeddingAsync` repeatedly fails with `System.ClientModel.ClientResultException: HTTP 400 (invalid_request_error: )` when embedding `stash/subagent-*` working-memory entries. The embedding vector is never stored for those keys, so they fall back to BM25-only retrieval (degraded semantic recall). The call is best-effort and swallowed (logged as a warning), so it's non-fatal — but it's frequent and noisy.

This is **pre-existing** and unrelated to the wisp MCP-readiness fix in #458/#459 — surfaced while tailing agent logs after that deploy.

## Evidence (agent `0.12.26`, single pod since boot)

```
warn: RockBot.Host.HybridCacheWorkingMemory[0]
      Failed to generate working memory embedding for key stash/subagent-9c2baa39135d/call_TsEYnvQMDu0tWkQK4jimoXMp
      System.ClientModel.ClientResultException: HTTP 400 (invalid_request_error: )
         at RockBot.Host.HybridCacheWorkingMemory.GenerateEmbeddingAsync(...) in /src/src/RockBot.Host/HybridCacheWorkingMemory.cs:line 110
```

- **43** such warnings in one pod's lifetime, **100% on `stash/subagent-*` keys** (subagent tool-call results — typically large, dense JSON).
- The `EmbeddingTextPreparer` truncation path **did** fire 12 times (`Truncating embedding input ...`), yet 400s still occur. If those 400s are token-limit rejections (the leading hypothesis below), the char-based cap is **not** reliably keeping inputs under the endpoint's limit.
- A related perf smell in the same path: one embedding call logged `Embedding generated in 84921ms` (~85 s) — consistent with very large inputs reaching the embedder.

## Mechanism / hypotheses

`GenerateEmbeddingAsync` (`HybridCacheWorkingMemory.cs:105-121`) builds the document text with `BuildDocumentText(key, value, category, tags)` (`:123`), runs it through `EmbeddingTextPreparer.Prepare` (`:109-110`), then calls `_embeddingGenerator.GenerateAsync(docText)`.

`EmbeddingTextPreparer` truncates by **character** count (`MaxInputChars` / `MaxStructuredInputChars`) as a *proxy* for the model's **token** limit. The `IsStructured` density heuristic estimates tokens-per-char, but for some subagent payloads it evidently under-estimates, so the post-truncation text still exceeds the embedding model's max-token window → endpoint returns `400 invalid_request_error`. The exception detail is empty (`invalid_request_error: `), so the precise reason isn't captured in logs.

Other (less likely) possibilities worth ruling out:
- Empty/whitespace-only `docText` after processing (`Prepare` early-returns on empty, but `BuildDocumentText` could in theory yield near-empty content).
- Endpoint/model-specific rejection of certain byte sequences.
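
To help tell these hypotheses apart, the warning could log a few cheap statistics about the rejected input. The following is a minimal sketch only: `DescribeEmbeddingInput` and `HasLoneSurrogate` are hypothetical helpers, not existing RockBot code, and the symbol ratio is a rough density indicator rather than a token count. It is written as top-level statements so it can run in a scratch console project.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical diagnostic helper (an assumption, not part of RockBot):
// summarizes a prepared embedding input for inclusion in a warning log.
static string DescribeEmbeddingInput(string docText)
{
    int chars = docText.Length;
    int utf8Bytes = Encoding.UTF8.GetByteCount(docText);
    bool blank = string.IsNullOrWhiteSpace(docText);

    // Share of punctuation/symbol characters. Dense JSON scores high here,
    // and such text usually costs more tokens per character than prose.
    int symbols = docText.Count(c => !char.IsLetterOrDigit(c) && !char.IsWhiteSpace(c));
    double symbolRatio = chars == 0 ? 0 : (double)symbols / chars;

    // Control characters other than tab/CR/LF, plus unpaired surrogates,
    // are candidates for the "rejected byte sequence" hypothesis.
    int controls = docText.Count(c => char.IsControl(c) && c != '\n' && c != '\r' && c != '\t');
    bool loneSurrogate = HasLoneSurrogate(docText);

    return FormattableString.Invariant(
        $"chars={chars} utf8Bytes={utf8Bytes} blank={blank} symbolRatio={symbolRatio:F2} controlChars={controls} loneSurrogate={loneSurrogate}");
}

// True if the string contains a high or low surrogate without its partner.
static bool HasLoneSurrogate(string s)
{
    for (int i = 0; i < s.Length; i++)
    {
        if (char.IsHighSurrogate(s[i]))
        {
            if (i + 1 < s.Length && char.IsLowSurrogate(s[i + 1])) { i++; continue; }
            return true;
        }
        if (char.IsLowSurrogate(s[i])) return true;
    }
    return false;
}

// Example: describe a tiny JSON payload.
Console.WriteLine(DescribeEmbeddingInput("{\"a\":1}"));
```

A high `symbolRatio` combined with a `chars` value near the cap would point toward the token-limit explanation; `blank=True` or a non-zero `controlChars` would point toward one of the alternatives.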

## Impact

- `stash/subagent-*` entries get **no embedding** → semantic (vector) recall misses them; only BM25 matches. The user's live instance runs hybrid search, so this is a real recall gap for subagent results.
- Embedding calls that run for ~85 s (one observed so far) tie up an embedding-tier slot and could stack up under load.
- Log noise obscures genuine warnings.

## Suggested fixes (not mutually exclusive)

1. **Capture the real 400 reason** — surface the response body in the log (the `invalid_request_error: ` detail is currently empty). Without it we're guessing token-limit vs. something else. Cheapest first step.
2. **Token-aware truncation** — replace/augment the char-cap proxy in `EmbeddingTextPreparer` with an actual token count (tokenizer for the configured embedding model), or tighten `MaxStructuredInputChars` for the dense-JSON case so subagent payloads land safely under the window.
3. **Decide whether subagent stash entries should be embedded at all** — `GenerateEmbeddingAsync` already skips `IsEphemeralKey(key)` entries (wisp-scoped, exact-key retrieval). Subagent `stash/*` tool-call results may be similarly ephemeral/exact-key and not worth embedding; if so, extend the skip to cover them and avoid the cost entirely.
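
As an illustration of fix 2, the sketch below truncates against a token budget instead of a character cap. It is a sketch under stated assumptions, not the actual `EmbeddingTextPreparer` code: `TruncateToTokenBudget` and `SafePrefix` are hypothetical names, the token counter is injected as a delegate so a real tokenizer for the configured model can be plugged in, and the budget of 8000 and the two-characters-per-token stand-in counter are placeholders rather than properties of any particular model.

```csharp
using System;

// Hypothetical helper: returns a prefix of `text` whose token count, as
// reported by `countTokens`, is at most `maxTokens`. The returned prefix is
// always one that was actually counted, so it fits the budget even if the
// counter is not perfectly monotonic (it may then not be the longest one).
static string TruncateToTokenBudget(string text, int maxTokens, Func<string, int> countTokens)
{
    if (maxTokens <= 0 || string.IsNullOrEmpty(text)) return string.Empty;
    if (countTokens(text) <= maxTokens) return text;

    // Binary search on prefix length: length `lo` fits, length `hi` does not.
    int lo = 0, hi = text.Length;
    while (hi - lo > 1)
    {
        int mid = lo + (hi - lo) / 2;
        if (countTokens(SafePrefix(text, mid)) <= maxTokens) lo = mid;
        else hi = mid;
    }
    return SafePrefix(text, lo);
}

// Takes the first `len` chars, backing off by one if that would split a
// surrogate pair and leave a dangling high surrogate.
static string SafePrefix(string s, int len)
{
    if (len > 0 && len < s.Length && char.IsHighSurrogate(s[len - 1])) len--;
    return s.Substring(0, len);
}

// Stand-in counter for the demo: assumes roughly 2 chars per token, a
// deliberately pessimistic guess for dense JSON. Replace with a real tokenizer.
Func<string, int> pessimisticCounter = s => (s.Length + 1) / 2;

string payload = new string('x', 100_000);
string prepared = TruncateToTokenBudget(payload, 8000, pessimisticCounter);
Console.WriteLine(prepared.Length); // prints 16000 with this stand-in counter
```

Each probe costs one tokenizer call, so the search needs about log2(length) calls per oversized input. If that is too expensive on the hot path, the existing char cap could serve as a cheap first pass, with the token check applied only to inputs that pass it.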

## Verification

Reproduce by driving subagent activity that writes large `stash/subagent-*` results, then grep agent logs:
```
kubectl -n rockbot logs <agent-pod> -c agent | grep "Failed to generate working memory embedding"
```
After a fix, that count should drop to ~0 (or the entries should be intentionally skipped), and any remaining failures should log a specific, actionable reason.
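
To confirm that failures are still confined to `stash/subagent-*` keys (or have gone away), the matching log lines can be grouped by key prefix. This is a small sketch, assuming the warning text stays exactly as in the excerpt above; `CountEmbeddingFailuresByPrefix` and `PrefixOf` are hypothetical helpers, and the sample lines are illustrative apart from the first, which is copied from the log excerpt.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: counts embedding-failure warnings per key prefix.
static Dictionary<string, int> CountEmbeddingFailuresByPrefix(IEnumerable<string> logLines)
{
    var rx = new Regex(@"Failed to generate working memory embedding for key (\S+)");
    return logLines
        .Select(line => rx.Match(line))
        .Where(m => m.Success)
        .Select(m => m.Groups[1].Value)
        .GroupBy(key => PrefixOf(key))
        .ToDictionary(g => g.Key, g => g.Count());
}

// Reduces a key to its first two path segments, replacing an id suffix after
// the first '-' with '*': "stash/subagent-9c2b.../call_x" -> "stash/subagent-*".
static string PrefixOf(string key)
{
    var parts = key.Split('/');
    if (parts.Length < 2) return key;
    int dash = parts[1].IndexOf('-');
    string second = dash >= 0 ? parts[1].Substring(0, dash) + "-*" : parts[1];
    return parts[0] + "/" + second;
}

// Example with one line from the excerpt plus illustrative lines.
var sample = new[]
{
    "      Failed to generate working memory embedding for key stash/subagent-9c2baa39135d/call_TsEYnvQMDu0tWkQK4jimoXMp",
    "info: unrelated line",
    "      Failed to generate working memory embedding for key stash/subagent-aaaa/call_2",
};
foreach (var pair in CountEmbeddingFailuresByPrefix(sample))
    Console.WriteLine($"{pair.Key}: {pair.Value}");
```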

## Code references
- `src/RockBot.Host/HybridCacheWorkingMemory.cs:105` (`GenerateEmbeddingAsync`, catch at `:117-120`)
- `src/RockBot.Host/HybridCacheWorkingMemory.cs:123` (`BuildDocumentText`)
- `src/RockBot.Host/HybridCacheWorkingMemory.cs:97` (`IsEphemeralKey` skip)
- `src/RockBot.Host/EmbeddingTextPreparer.cs` (char-based truncation / `IsStructured` heuristic)
