Add performance gates for incremental Codex cost scans by steipete · Pull Request #1434 · steipete/CodexBar

steipete · 2026-06-11T11:24:38Z

Summary

add a regression gate proving unchanged Codex session corpora reuse the warm cache
add a regression gate proving priority-turn refreshes scan appended SQLite rows only
make both gates deterministic by asserting cache and cursor behavior instead of wall-clock ratios
keep the test-only change separate from production scanner behavior

Supersedes #1423 because its fork branch does not allow maintainer edits.

Proof

swift test --filter CostUsagePerformanceGateTests twice: 2 tests passed on both runs
make check: clean
git diff --check: clean
structured autoreview: clean, no accepted/actionable findings (0.86 confidence)

clawsweeper · 2026-06-11T11:25:59Z

Codex review: needs maintainer review before merge. Reviewed June 11, 2026, 6:18 PM ET / 22:18 UTC.

Summary
Review failed before ClawSweeper could summarize the requested change.

Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path.

Review metrics: none identified.

Merge readiness
Overall: 🌊 off-meta tidepool
Proof: 🌊 off-meta tidepool
Patch quality: 🌊 off-meta tidepool
Result: rating does not apply to this item.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

[P1] No close action taken because the review did not complete.

Maintainer options:

Decide the mitigation before merge
Retry the Codex review after fixing the execution failure.
Pause or close
Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

Review did not complete, so no work-lane recommendation was made.

Review details

Best possible solution:

Retry the Codex review after fixing the execution failure.

Do we have a high-confidence way to reproduce the issue?

Unclear. The review failed before ClawSweeper could establish a reproduction path.

Is this the best way to solve the issue?

Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction.

AGENTS.md: unclear because the file could not be read completely.

Codex review notes: model internal, reasoning high; reviewed against 1912f75f4962.

Label changes

Label changes:

remove P3: Current review triage priority is none.
remove merge-risk: 🚨 automation: Current PR review selected no merge-risk labels.

Label justifications:

rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.

Evidence reviewed

What I checked:

failure reason: retryable codex transport failure.
codex failure detail: Codex review failed for this PR with exit 1.
codex stderr: record.\n- The suite is .serialized so the two timed gates don't contend with each other.\n\n## Runtime Proof (local run, Apple Silicon)\n\ntext\nPERF-GATE codex-session-corpus: cold=3004ms warm=17ms ratio=173x\n􁁛 Test \"warm codex refresh over an unchanged session corpus must not re-parse it\" passed after 3.196 seconds.\nPERF-GATE priority-turns: full=25ms incremental=0ms ratio=95x\n􁁛 Test \"priority turns refresh must scan only appended trace rows\" passed after 0.201 seconds.\n􁁛 Test run with 2 tests in 1 suite passed after 3.398 seconds.\n\n\nNegative test — the gate fires when the protection is removed. Forcing state = nil per refresh (the pre-memo full-scan behavior) and rerunning:\n\ntext\nPERF-GATE priority-turns: full=25ms incremental=24ms ratio=1x\nTest \"priority turns refresh must scan only appended trace rows\" recorded an issue:\nExpectation failed: (refreshDuration * 5 → 0.1189...) < (fullDuration → 0.0254...)\nSuite CostUsagePerformanceGateTests failed after 0.186 seconds with 1 issue.\n\n\nReverting the sabotage restores ratio=95x and the gate passes.\n\n## Validation\n\n- swift test --filter CostUsagePerformanceGate — 2/2 pass.\n- make check — 0 violations.\n\nNo CHANGELOG edit per current review guidance (release-owned).\n".
codex stdout: No stdout captured.

Likely related people:

unknown: Codex failed before it could trace repository history. (role: review did not complete; confidence: low)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

steipete · 2026-06-11T14:21:00Z

Rebased onto current main, fixed the timing gate exposed by local validation, and revalidated exact head a9acafaf.

The original 200k-row fixture produced only a 4x cold/incremental ratio on this machine. The fixture now uses 500k rows while retaining the 5x requirement; two consecutive runs passed with roughly 70 ms full scans and sub-millisecond incremental scans.
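For reference, the 5x requirement reduces to a single comparison of the form `incremental * 5 < full`. The sketch below is only an illustration of that arithmetic: `passesRatioGate` is a hypothetical helper name, not a function from the PR, and the durations are round numbers chosen to resemble the figures quoted above.

```swift
import Foundation

/// Hypothetical helper (not from the PR): true when the incremental scan is
/// more than `factor` times faster than the full scan.
func passesRatioGate(full: TimeInterval, incremental: TimeInterval, factor: Double = 5) -> Bool {
    incremental * factor < full
}

// Roughly 70 ms full scan against a 0.5 ms incremental scan: passes easily.
let largeFixture = passesRatioGate(full: 0.070, incremental: 0.0005)

// A 4x ratio (40 ms against 10 ms) does not satisfy the 5x requirement.
let smallFixture = passesRatioGate(full: 0.040, incremental: 0.010)

print(largeFixture, smallFixture) // true false
```

Because both operands are wall-clock measurements, the outcome depends on the machine and on fixture size, which is why a small fixture can fall below the threshold.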

Proof:

swift test --filter CostUsagePerformanceGateTests — 2 tests passed twice
make check — clean
autoreview --mode branch --base origin/main — clean, no actionable findings

steipete · 2026-06-11T14:59:45Z

Updated the PR head to 16ae06071814e1cdbce7a90b31e7732dc68c732c.

rebased onto current main
swift test --filter CostUsagePerformanceGateTests: 2 passed
measured unchanged-session cold/warm ratio: 433x
measured priority-turn full/incremental ratio: 116x
make check: clean
structured autoreview: clean, no actionable findings (0.86 confidence)

Fresh exact-head CI is now running.

steipete · 2026-06-11T15:56:43Z

Rebased on current main (88c43eeb) and pushed head da0c4562.

Autoreview found a CI-flakiness blocker, which I fixed: the gates no longer assert wall-clock ratios. They now prove cache reuse by changing old content while preserving cache metadata, and prove row-cursor behavior by changing an old SQLite row before appending a new one.
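A minimal sketch of the cache-reuse idea, assuming the cache is keyed on file size and modification date. `FakeFile`, `FileStamp`, and `CachingScanner` are hypothetical names used only for illustration; the actual scanner and test fixtures are not shown in this thread.

```swift
import Foundation

/// Hypothetical in-memory stand-in for a session file on disk.
struct FakeFile {
    var contents: String
    var modifiedAt: Date
    var size: Int { contents.utf8.count }
}

/// Hypothetical cache key built from metadata only, never from contents.
struct FileStamp: Equatable {
    let size: Int
    let modifiedAt: Date
}

/// Hypothetical scanner that re-parses a file only when its stamp changes.
final class CachingScanner {
    private var cache: [String: (stamp: FileStamp, lineCount: Int)] = [:]
    private(set) var parseCount = 0

    func lineCount(path: String, file: FakeFile) -> Int {
        let current = FileStamp(size: file.size, modifiedAt: file.modifiedAt)
        if let hit = cache[path], hit.stamp == current {
            return hit.lineCount // warm path: contents are never read
        }
        parseCount += 1
        let count = file.contents.split(separator: "\n").count
        cache[path] = (stamp: current, lineCount: count)
        return count
    }
}

let scanner = CachingScanner()
let fixedDate = Date(timeIntervalSince1970: 1_700_000_000)
var session = FakeFile(contents: "a\nb\nc\n", modifiedAt: fixedDate)
let cold = scanner.lineCount(path: "session.jsonl", file: session)

// Rewrite the old content with the same byte count and the same timestamp.
session.contents = "a\nbbb\n"
let warm = scanner.lineCount(path: "session.jsonl", file: session)

print(cold, warm, scanner.parseCount) // 3 3 1
```

If the warm refresh had re-parsed the file, it would report 2 lines and a parse count of 2, so the assertion fails deterministically without measuring time.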

Proof:

swift test --filter CostUsagePerformanceGateTests: 2 tests passed
make check: passed, 0 lint violations
final autoreview: clean, confidence 0.84
diff check: clean

steipete · 2026-06-11T16:43:34Z

Rebased onto current main and replaced timing-ratio assertions with deterministic behavior gates.

Proof on 69b28189:

swift test --filter CostUsagePerformanceGateTests twice: 2 tests passed on both runs
make check: clean
git diff --check: clean
structured autoreview: no actionable findings (0.86 confidence)

Exact-head CI is now running.

Co-authored-by: pickaxe <54486432+ProspectOre@users.noreply.github.com>

steipete · 2026-06-12T00:59:29Z

Validated exact head c70bb0d367c45e09198c34c889cdf7488dc199a2.
Landed as ff6c42d47967ca0d4057fe5bbbea33ff0e26e5fc.

swift test --filter CostUsagePerformanceGateTests twice (2 tests each run)
make check (SwiftFormat and SwiftLint clean)
git diff --check origin/main...HEAD
released/fictitious model-name gate clean
structured autoreview clean with no actionable findings (0.94 confidence)
exact-head macOS, Linux x64, Linux arm64, and GitGuardian checks green
current-main merge-tree clean

The new deterministic gates prove that unchanged Codex session files retain cached parse results and that priority-turn refreshes process appended SQLite rows without replaying mutated historical rows.
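The row-cursor half of that claim can be sketched as follows, assuming the scanner remembers the highest row ID it has processed. `TraceRow` and `IncrementalScanner` are hypothetical names, and an in-memory array stands in for the SQLite table, whose real schema is not shown here.

```swift
/// Hypothetical row shape standing in for a SQLite trace row.
struct TraceRow {
    let rowID: Int
    var tokens: Int
}

/// Hypothetical incremental scanner that keeps a cursor at the last seen row ID.
struct IncrementalScanner {
    private(set) var lastRowID = 0
    private(set) var totalTokens = 0
    private(set) var rowsVisited = 0

    /// Processes only rows whose ID is greater than the stored cursor.
    mutating func refresh(_ table: [TraceRow]) {
        for row in table where row.rowID > lastRowID {
            totalTokens += row.tokens
            rowsVisited += 1
            lastRowID = row.rowID
        }
    }
}

var table = [TraceRow(rowID: 1, tokens: 10), TraceRow(rowID: 2, tokens: 20)]
var cursorScanner = IncrementalScanner()
cursorScanner.refresh(table) // first refresh visits rows 1 and 2

table[0].tokens = 9_999 // mutate a historical row
table.append(TraceRow(rowID: 3, tokens: 5))
cursorScanner.refresh(table) // second refresh visits row 3 only

print(cursorScanner.totalTokens, cursorScanner.rowsVisited) // 35 3
```

A scanner that replayed the full table would pick up the mutated value, so the total acts as a deterministic signal that only appended rows were read.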

steipete force-pushed the test/codex-cost-performance-gates branch from 82f5852 to a9acafa on June 11, 2026 14:20

steipete force-pushed the test/codex-cost-performance-gates branch from a9acafa to 16ae060 on June 11, 2026 14:59

steipete force-pushed the test/codex-cost-performance-gates branch from 16ae060 to da0c456 on June 11, 2026 15:56

steipete force-pushed the test/codex-cost-performance-gates branch from da0c456 to 69b2818 on June 11, 2026 16:43

steipete force-pushed the test/codex-cost-performance-gates branch from 69b2818 to 3df376e on June 11, 2026 20:58

test: gate incremental Codex cost scan performance

c70bb0d

Co-authored-by: pickaxe <54486432+ProspectOre@users.noreply.github.com>

steipete force-pushed the test/codex-cost-performance-gates branch from 3df376e to c70bb0d on June 11, 2026 22:15

steipete merged commit ff6c42d into main on Jun 12, 2026
4 of 7 checks passed

steipete merged 1 commit into main from test/codex-cost-performance-gates