perf: reduce Codex cost refresh metadata work #1430

Merged: steipete merged 1 commit into main from perf/cost-usage-file-stat on Jun 11, 2026

steipete (Owner) commented Jun 11, 2026

Summary

  • replace separate Foundation metadata/resource-identifier reads with one portable fstatat pass
  • skip repeated existence/root validation for Codex session paths already found during the current enumeration
  • accept the previous 0.33 cache producer key because parsing semantics did not change, then rewrite with the current key
  • add regressions for cache compatibility plus append, truncation, replacement, and symlink-target metadata

Closes #1392.
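
The compatible-key bullet above can be read as a three-way decision made when a cache file is loaded. The following is a minimal sketch, assuming the producer key is a plain string; `ProducerKeyPolicy`, `CacheDecision`, and the placeholder key values are hypothetical names for illustration and are not taken from the PR.

```swift
/// Hypothetical outcome of comparing a stored producer key with the current one.
enum CacheDecision: Equatable {
    case reuse                             // keys match, nothing to migrate
    case reuseThenRewrite(withKey: String) // older but compatible: reuse, then persist the current key
    case rebuild                           // unknown or missing key: cold rebuild
}

/// Hypothetical policy object; the real cache type may look quite different.
struct ProducerKeyPolicy {
    let currentKey: String
    let compatibleKeys: Set<String>

    func decision(forStoredKey stored: String?) -> CacheDecision {
        guard let stored = stored else { return .rebuild }
        if stored == currentKey { return .reuse }
        if compatibleKeys.contains(stored) {
            return .reuseThenRewrite(withKey: currentKey)
        }
        return .rebuild
    }
}

// Usage with placeholder keys (assumed values, not the real generated hashes).
let examplePolicy = ProducerKeyPolicy(
    currentKey: "current-key-placeholder",
    compatibleKeys: ["v0.33-key-placeholder"])
print(examplePolicy.decision(forStoredKey: "v0.33-key-placeholder"))
```

Keeping the compatible set explicit means dropping the old key later is a one-line change, which corresponds to the "force a cold cache" option discussed in the review.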

Root cause

Profiling current main against a real 365-day archive showed the proposed SQLite priority memo was aimed at a minority cost. The archive contains 60,607 JSONL files (~43 GB); logs_2.sqlite is ~713 MB with ~430k rows. In the expired refresh trace, priority SQLite work was about 4% while per-file Foundation metadata/resource-identifier reads and repeated cached-path validation dominated.

A direct 60,607-file benchmark measured:

| Metadata path | Time |
|---|---|
| Foundation attributes + resource identifier | 14.16s |
| One POSIX file-stat pass | 0.23s |

The replacement keeps current refresh policy and cache correctness. It adds no cursor, persisted memo, overlap state, or new unbounded collection.
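
For readers unfamiliar with the call, a single `fstatat` can return size, modification time, and inode together. The sketch below shows one way to wrap it in Swift. `FileStamp` and `fileStamp(atPath:followSymlinks:)` are hypothetical names, and the PR's actual helper may differ in shape and in the fields it keeps.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

/// Hypothetical value type holding the three fields read in one pass.
struct FileStamp: Equatable {
    var size: Int64
    var mtimeNanoseconds: Int64
    var inode: UInt64
}

/// Reads size, mtime, and inode with one fstatat call.
/// Flags of 0 follow symlinks, so the stamp describes the link target.
func fileStamp(atPath path: String, followSymlinks: Bool = true) -> FileStamp? {
    var st = stat()
    let flags: Int32 = followSymlinks ? 0 : AT_SYMLINK_NOFOLLOW
    guard fstatat(AT_FDCWD, path, &st, flags) == 0 else { return nil }

    // The timestamp field has a different name on each platform,
    // so it is selected at compile time.
    #if canImport(Darwin)
    let mtime = st.st_mtimespec
    #else
    let mtime = st.st_mtim
    #endif

    return FileStamp(
        size: Int64(st.st_size),
        mtimeNanoseconds: Int64(mtime.tv_sec) * 1_000_000_000 + Int64(mtime.tv_nsec),
        inode: UInt64(st.st_ino))
}

// Usage: stamp the current working directory, which exists for any running process.
if let stamp = fileStamp(atPath: ".") {
    print(stamp)
}
```

Returning `nil` on failure also covers the existence check, so a separate "does this path exist" call is not needed in this sketch.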

Exact candidate proof

All runs used the exact branch build and the same real archive. Candidate cache output was normalized by removing updatedAt; expired and immediate-refresh JSON were byte-identical.

| Exact head / state | Changed lines | Wall time |
|---|---|---|
| Current main, expired cache | baseline | 28.79s |
| Current main, forced refresh | baseline | 31.46-37.57s |
| #1404 exact head, expired | +952/-48 | 37.43s |
| #1421 exact head, persisted relaunch | +623/-26 | 20.22s |
| #1422 exact head, persisted relaunch | +681/-45 | 24.80s |
| #1423 exact head, expired | +546/-26 | 37.03s |
| #1430, compatible-cache migration refresh | +124/-24 | 12.00s |
| #1430, warm expired refresh | +124/-24 | 6.74s |
| #1430, immediate relaunch/cache hit | +124/-24 | 1.45s |
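
The normalization step mentioned above (drop `updatedAt`, then compare bytes) could be reproduced along these lines. This is only a sketch: it assumes the cache is a JSON object or array, strips `updatedAt` at every nesting level (an assumption about where the field lives), and re-serializes with sorted keys so that two equal documents yield equal bytes. `strippingUpdatedAt` and `normalizedCacheJSON` are hypothetical helper names.

```swift
import Foundation

/// Recursively removes every "updatedAt" key from a parsed JSON value.
func strippingUpdatedAt(_ value: Any) -> Any {
    if let object = value as? [String: Any] {
        var result: [String: Any] = [:]
        for (key, child) in object where key != "updatedAt" {
            result[key] = strippingUpdatedAt(child)
        }
        return result
    }
    if let array = value as? [Any] {
        return array.map(strippingUpdatedAt)
    }
    return value
}

/// Parses cache JSON, drops "updatedAt", and re-encodes with sorted keys
/// so the result can be compared byte for byte.
func normalizedCacheJSON(_ data: Data) throws -> Data {
    let parsed = try JSONSerialization.jsonObject(with: data)
    return try JSONSerialization.data(
        withJSONObject: strippingUpdatedAt(parsed),
        options: [.sortedKeys])
}
```

Two cache files can then be compared with `try normalizedCacheJSON(a) == normalizedCacheJSON(b)`. Because both sides pass through the same re-encoding, this checks equality of the normalized form rather than of the raw files.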

The candidate is roughly 4-5.5x faster on the recurring expired-refresh path than current main or the stacked alternatives, with no extra persisted artifact. Peak footprint remained ~736-737 MB; the remaining dominant work is the existing ~39 MB JSON cache decode/encode.
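
As a rough check of the 4-5.5x figure, assuming it compares the 6.74s warm expired refresh against the 28.79s expired baseline on current main and against the slowest listed alternative (#1404 at 37.43s):

```latex
\frac{28.79\,\mathrm{s}}{6.74\,\mathrm{s}} \approx 4.27
\qquad
\frac{37.43\,\mathrm{s}}{6.74\,\mathrm{s}} \approx 5.55
```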

Correctness contracts

  • append: size/mtime changes still select incremental parsing
  • truncation or replacement: size/mtime/inode changes still force the existing fallback path
  • symlinks: fstatat(..., 0) follows the target, matching previous Foundation behavior
  • window changes, late completion/model attribution, pricing changes, parser invalidation, and forced refreshes: existing scanner behavior unchanged
  • overlapping refreshes and bounded memory: no new mutable memo or retained state
  • Linux CLI: Darwin/Glibc timestamp fields are selected at compile time; required x64 and arm64 CI must pass before merge
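
The append and truncation/replacement contracts above can be pictured as a comparison between the cached stamp and a freshly read one. The PR does not spell out the exact decision rules, so the following is a sketch under assumptions; `FileStamp`, `RefreshAction`, and `refreshAction(cached:current:)` are hypothetical names.

```swift
/// Hypothetical stamp recorded in the cache for each session file.
struct FileStamp: Equatable {
    var size: Int64
    var mtimeNanoseconds: Int64
    var inode: UInt64
}

/// Hypothetical outcomes for one file during a refresh.
enum RefreshAction: Equatable {
    case reuseCachedResult
    case parseAppendedBytes(fromOffset: Int64)
    case fallbackFullParse
}

func refreshAction(cached: FileStamp, current: FileStamp) -> RefreshAction {
    // Nothing changed: the cached parse result is still valid.
    if cached == current { return .reuseCachedResult }

    // A different inode means the file was replaced, not appended to.
    guard cached.inode == current.inode else { return .fallbackFullParse }

    // A smaller file means truncation, so previously parsed offsets are stale.
    if current.size < cached.size { return .fallbackFullParse }

    // Same inode and a larger size: treat as an append and parse only the tail.
    if current.size > cached.size {
        return .parseAppendedBytes(fromOffset: cached.size)
    }

    // Same size but a different mtime: assumed here to be an in-place rewrite,
    // so take the conservative path.
    return .fallbackFullParse
}
```

The last branch is a deliberately conservative guess; a real scanner could handle an mtime-only change differently.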

Validation

  • swift test --filter 'CostUsageCacheTests|CostUsageScannerTests' - 21/21 pass
  • focused priority/cache/metadata regressions - 16/16 pass
  • make check - SwiftFormat, SwiftLint, and generated parser hash clean
  • full swift test run twice - only existing parallel app-server timeout flakes; each affected suite passes alone
  • branch autoreview (gpt-5.5) - clean, no accepted/actionable findings, confidence 0.82
  • git diff --check - clean

This supersedes the larger stateful stack while retaining contributor credit from #1404, #1421, #1422, and #1423.

clawsweeper (Bot) commented Jun 11, 2026

Codex review: needs maintainer review before merge. Reviewed June 11, 2026, 6:32 AM ET / 10:32 UTC.

Summary
The PR replaces Codex cost-cache Foundation metadata/root checks with a POSIX stat-based path, accepts one compatible prior Codex cache producer key, updates the generated parser hash and changelog, and adds focused cache/metadata tests.

Reproducibility: not applicable as a bug reproduction path for this PR review. The PR body reports profiling current main and the exact branch build on a real 60,607-file archive, but this read-only review did not rerun that benchmark.

Review metrics: 3 noteworthy metrics.

  • Diff Scope: 7 files, +124/-24. The performance change is bounded to cost-cache/scanner code, generated hash, tests, and one changelog entry.
  • Focused Tests Added: 2 @test methods added. The new tests cover the compatibility key and the metadata cases most likely to regress cache correctness.
  • Compatibility Key: 1 prior producer key accepted. Existing Codex cache reuse after upgrade is the main semantic choice maintainers should notice before merge.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🦞 diamond lobster
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • [P1] This read-only review did not execute Swift tests or the large-archive benchmark; CI should confirm the reported focused tests, full tests where stable, make check, and parser-hash check before merge.
  • [P1] The compatible v0.33 Codex producer key is an intentional upgrade decision while the generated parser hash changes; maintainers should be comfortable that the parser semantics did not change before relying on the old cache.

Maintainer options:

  1. Accept the compatible-cache migration (recommended)
    Keep the prior producer-key compatibility once CI passes, provided maintainers agree the parser hash changed only because of metadata/cache helper code, not parser semantics.
  2. Force a cold cache on doubt
    Remove the compatible producer key before merge if maintainers want every existing Codex cache rebuilt whenever the generated parser hash changes.

Next step before merge

  • [P2] No narrow automated repair target is present; maintainers should review CI and decide whether to accept the compatible producer-key migration before merge.

Security
Cleared: No concrete security or supply-chain concern was found; the diff does not touch workflows, dependencies, package resolution, install/build scripts, secrets, or publishing automation.

Review details

Best possible solution:

Land the focused stat-based metadata/cache-validation patch after CI confirms the added regressions and maintainers accept the compatible-cache migration semantics.

Do we have a high-confidence way to reproduce the issue?

Not applicable as a bug reproduction path for this PR review. The PR body reports profiling current main and the exact branch build on a real 60,607-file archive, but this read-only review did not rerun that benchmark.

Is this the best way to solve the issue?

Yes. Based on source inspection, this is a narrow, maintainable solution: it targets the measured metadata/root-validation cost without adding a persisted cursor, memo, or new stateful artifact. The main remaining decision is whether to accept the prior producer key as parser-compatible.

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against 159d03ceb318.

Label changes

Label changes:

  • add merge-risk: 🚨 compatibility: The PR deliberately reuses a previous Codex cache producer key despite a generated parser-hash change, so upgrade cache semantics need maintainer acceptance.
  • add rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • remove rating: 🐚 platinum hermit: Current PR rating is rating: 🦞 diamond lobster, so this older rating label is no longer current.

Label justifications:

  • P2: This is a normal-priority performance/cache correctness improvement for large Codex session archives with limited product blast radius.
  • merge-risk: 🚨 compatibility: The PR deliberately reuses a previous Codex cache producer key despite a generated parser-hash change, so upgrade cache semantics need maintainer acceptance.
  • rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Not applicable: The external-contributor proof gate does not apply to this OWNER-authored PR; the PR body still reports exact-branch real-archive benchmark results and focused validation commands.
Evidence reviewed

What I checked:

Likely related people:

  • steipete: Blame and log history show Peter Steinberger on the current cached-session helpers, metadata helper, Codex cost scanner performance work, generated producer-key cache invalidation, and this PR head. (role: recent area contributor; confidence: high; commits: 6cf422512061, 69de57b85de7, 5aded294c234; files: Sources/CodexBarCore/Vendored/CostUsage/CostUsageScanner.swift, Sources/CodexBarCore/Vendored/CostUsage/CostUsageScanner+CacheHelpers.swift, Sources/CodexBarCore/Vendored/CostUsage/CostUsageCache.swift)
  • ratulsarna: ratulsarna authored the earlier Codex cost scanner overcounting/cross-day fix that touched CostUsageCache.swift and CostUsageScanner.swift, which are central to this patch's cache correctness surface. (role: adjacent cache correctness contributor; confidence: medium; commits: fd19445056a1; files: Sources/CodexBarCore/Vendored/CostUsage/CostUsageCache.swift, Sources/CodexBarCore/Vendored/CostUsage/CostUsageScanner.swift, Tests/CodexBarTests/CostUsageCacheTests.swift)
  • hhh2210: hhh2210 co-authored the generated parser-hash producer-key cache invalidation work that this PR's compatible-producer-key migration adjusts. (role: adjacent cache invalidation coauthor; confidence: medium; commits: 5aded294c234; files: Scripts/regenerate-codex-parser-hash.sh, Sources/CodexBarCore/Generated/CodexParserHash.generated.swift, Sources/CodexBarCore/Vendored/CostUsage/CostUsageCache.swift)
  • ProspectOre: The PR head has a Co-authored-by trailer mapping to ProspectOre, and the PR body/changelog credit ProspectOre for the related Codex cost-scan performance context. (role: PR coauthor and adjacent performance contributor; confidence: medium; commits: e3b0597e02c5; files: CHANGELOG.md, Sources/CodexBarCore/Vendored/CostUsage/CostUsageScanner+CacheHelpers.swift, Sources/CodexBarCore/Vendored/CostUsage/CostUsageScanner.swift)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

clawsweeper (Bot) added the labels rating: 🐚 platinum hermit, status: 👀 ready for maintainer look, and P2 on Jun 11, 2026
Co-authored-by: pickaxe <54486432+ProspectOre@users.noreply.github.com>
steipete force-pushed the perf/cost-usage-file-stat branch from 7dceb10 to e3b0597 on June 11, 2026 10:25
clawsweeper (Bot) added the labels rating: 🦞 diamond lobster and merge-risk: 🚨 compatibility, and removed the label rating: 🐚 platinum hermit, on Jun 11, 2026
steipete merged commit dd8cf8b into main on Jun 11, 2026
7 checks passed
steipete deleted the perf/cost-usage-file-stat branch on June 11, 2026 10:36

Labels

  • merge-risk: 🚨 compatibility: 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades.
  • P2: Normal priority bug or improvement with limited blast radius.
  • rating: 🦞 diamond lobster: Very strong PR readiness with only minor maintainer review expected.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR.

Development

Successfully merging this pull request may close these issues.

Optimize Codex cost refresh policy for large history windows
