fix(anthropic): surface cache_read/write input tokens in metadata chunk#2302

Open
jehoon-shin-mathflat wants to merge 1 commit into
strands-agents:mainfrom
jehoon-shin-mathflat:feat/anthropic-cache-token-usage

Description of changes

When prompt caching is in use, Anthropic's input_tokens counts only the non-cached portion of the prompt. The metadata chunk formatter in models/anthropic.py reads input_tokens / output_tokens from the API response but drops the cache_read_input_tokens and cache_creation_input_tokens fields. As a result, anything that derives cost from Agent.metrics.accumulated_usage under-reports real usage, sometimes by an order of magnitude on image+text workloads where the cached prefix carries most of the prompt.

Repro

A vision classifier I run sends [{"text": ...}, {"cachePoint": {}}, {"image": ...}] so that the text+image prefix is cached. After warmup, the Anthropic API returns roughly:

```json
{
  "input_tokens": 3,
  "cache_read_input_tokens": 1500,
  "cache_creation_input_tokens": 0,
  "output_tokens": 120
}
```

Strands currently surfaces only inputTokens=3, so accumulated cost looks ~10× lower than what Anthropic actually bills. The Usage TypedDict in types/event_loop.py already declares the cache fields as optional members — they just weren't being populated by the Anthropic adapter.
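
To make the gap concrete, here is a minimal sketch of the arithmetic on the repro numbers. The helper name `total_input_tokens` is hypothetical and not part of Strands or the Anthropic SDK; it only sums the three input-side counters from the response above.

```python
# Usage figures copied from the repro response above.
usage = {
    "input_tokens": 3,
    "cache_read_input_tokens": 1500,
    "cache_creation_input_tokens": 0,
    "output_tokens": 120,
}


def total_input_tokens(usage: dict) -> int:
    """Hypothetical helper: uncached + cache-read + cache-write input tokens."""
    return (
        usage.get("input_tokens", 0)
        + usage.get("cache_read_input_tokens", 0)
        + usage.get("cache_creation_input_tokens", 0)
    )


reported = usage["input_tokens"]    # what the metadata chunk exposes today
actual = total_input_tokens(usage)  # what the request actually consumed
print(reported, actual)  # 3 1503
```

The size of the resulting cost gap depends on the per-token rates applied to cache reads and writes, which this sketch deliberately does not model.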

Change

models/anthropic.py — in the metadata case of format_chunk:

  • Pull cache_read_input_tokens and cache_creation_input_tokens from the usage dict (defaulting to 0 when absent).
  • Emit them as cacheReadInputTokens / cacheWriteInputTokens on the usage chunk, matching the existing field names already defined in types.event_loop.Usage.
  • Recompute totalTokens as uncached_input + cache_read + cache_write + output_tokens so it reflects the full billed input.
  • Omit the cache fields when both are zero so the chunk shape for non-cached responses is unchanged (no consumer needs to update).
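
The four bullets above can be sketched roughly as follows. This is an illustration under assumptions, not the actual diff: `format_usage` is a hypothetical standalone function (the real logic lives in the metadata case of format_chunk), and treating a `None` counter as 0 is my assumption.

```python
from typing import Any


def format_usage(usage: dict[str, Any]) -> dict[str, int]:
    """Hypothetical sketch: map Anthropic usage counters to Strands field names."""
    # Assumption: a missing or None counter is treated as 0.
    uncached_input = usage.get("input_tokens") or 0
    output_tokens = usage.get("output_tokens") or 0
    cache_read = usage.get("cache_read_input_tokens") or 0
    cache_write = usage.get("cache_creation_input_tokens") or 0

    formatted = {
        "inputTokens": uncached_input,
        "outputTokens": output_tokens,
        # Total covers every input token category plus the output.
        "totalTokens": uncached_input + cache_read + cache_write + output_tokens,
    }

    # Only add the cache fields when caching was actually involved, so the
    # chunk shape for non-cached responses stays exactly as before.
    if cache_read or cache_write:
        formatted["cacheReadInputTokens"] = cache_read
        formatted["cacheWriteInputTokens"] = cache_write

    return formatted
```

With the repro response this yields `totalTokens` of 1623 and both cache fields present; with a response that carries only `input_tokens` and `output_tokens`, the result has the three legacy keys and nothing else.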

Tests

tests/strands/models/test_anthropic.py:

  • test_format_chunk_metadata_with_cache_tokens — both cache fields present.
  • test_format_chunk_metadata_omits_zero_cache_tokens — fields absent/zero, legacy shape preserved.

All 59 tests in tests/strands/models/test_anthropic.py pass.

Related Issues

n/a — surfaced while debugging cost accounting in a downstream POC.

Documentation PR

n/a — no public API surface change, only adds optional fields that Usage already declares.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Testing

```bash
pytest tests/strands/models/test_anthropic.py
# 59 passed
```

Checklist

  • I have read the CONTRIBUTING document
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have added/updated necessary documentation (if applicable)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

When prompt caching is in use, Anthropic returns `input_tokens` as the
NON-CACHED portion only. `cache_read_input_tokens` and `cache_creation_input_tokens`
were dropped during metadata chunk formatting, so downstream consumers
(`Agent.metrics.accumulated_usage` and anything that derives cost from it)
saw only the uncached delta and under-reported real usage / cost — sometimes
by an order of magnitude on image+text workloads where the image dominates
the cached prefix.

This change:
- Maps `cache_read_input_tokens` → `cacheReadInputTokens` and
  `cache_creation_input_tokens` → `cacheWriteInputTokens` on the metadata
  chunk, both already defined as optional members of `types.event_loop.Usage`.
- Recomputes `totalTokens` as `uncached_input + cache_read + cache_write +
  output_tokens` so it reflects the actual billed input.
- Omits the cache fields when both are zero/absent, preserving the existing
  chunk shape for non-cached responses (no consumer change required).

Added tests covering the cached and non-cached metadata shapes.