Vision (CRO)
The loop should build, over time, a skill tree that encodes how to do tasks on different aspects of the repo. The maestro and agents keep evolving it — adding and refining leaves (concrete procedures) and internal nodes (generalized patterns) — so the loop gets faster, cheaper, and more accurate with every merged ticket instead of starting cold each time.
Problem it solves
Every worker starts cold today: on every ticket it re-derives repo conventions, file maps, test recipes, and 'the right way' to do X. That burns tokens (one recent ticket: ~7.4M cached tokens / 57 turns / $8.50, much of it re-derivation), produces inconsistent choices, and repeats known mistakes. The biggest lever for faster, cheaper, and more accurate is the same in all three cases: retrieve learned procedure instead of re-deriving it.
What it is
Learned procedural memory — a curated, evolving corpus of structured 'how to do X here' cards with provenance and verification. Precedent that already works: the Claude Code memory system (index + one-fact cards + verified: dates + routing eval + health dashboard + curator agent). Adopt the same shape pointed at procedural repo knowledge; reuse the existing Lumen semantic index for retrieval.
Skill card (leaf) schema
- area: node path (e.g. node/http-route, ledger/endpoint, ui/page, db/migration) = the tree
- trigger: when it applies (retrievability)
- procedure: file map + template + test recipe + known pitfalls
- provenance: the merged, critic-clean PR/SHA that proves it
- verified: commit (and date) it was last validated against
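A minimal sketch of what a card could look like in Python, to make the schema concrete. The field names area, trigger, procedure, provenance, and verified come from the list above. The Procedure sub-structure, the verified_on date (borrowed from the 'verified date' requirement in the design traps), and the build_tree helper are assumptions for illustration, not a settled design.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Procedure:
    """The 'how' of a card. The four parts mirror the schema's procedure bullet."""
    file_map: dict[str, str]   # path -> role that file plays in the task
    template: str              # skeleton to start from
    test_recipe: list[str]     # steps that should end green
    pitfalls: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class SkillCard:
    area: str          # node path, e.g. "node/http-route"
    trigger: str       # when the card applies; what retrieval matches on
    procedure: Procedure
    provenance: str    # merged, critic-clean PR or SHA
    verified: str      # commit the card was last validated against
    verified_on: date  # assumption: date of that validation, used for expiry

    def area_path(self) -> tuple[str, ...]:
        """Split the area into tree segments: 'db/migration' -> ('db', 'migration')."""
        return tuple(self.area.split("/"))


def build_tree(cards: list[SkillCard]) -> dict:
    """Nest leaf cards under their area path.

    Cards are stored under the reserved key "_cards" (a hypothetical
    convention; it assumes no area segment is literally named "_cards").
    """
    root: dict = {}
    for card in cards:
        node = root
        for segment in card.area_path():
            node = node.setdefault(segment, {})
        node.setdefault("_cards", []).append(card)
    return root
```

With this shape the area field alone determines where a leaf sits, so internal nodes can be added later without rewriting existing cards.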
Lifecycle (the 'evolving' part)
- Harvest — after a critic-clean merge, a librarian sub-agent distills/updates a card, stamped with the SHA.
- Retrieve — at dispatch, maestro injects top-K relevant cards (semantic + area-tag) into the brief.
- Curate/evolve — periodic curator dedupes, promotes recurring leaves into general internal nodes, expires stale cards.
- Falsify — a card earns trust only if re-application reproduces green; demote on block/revert.
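The Falsify step can be read as a small state machine. The sketch below is one hedged interpretation: the three trust levels, the Outcome names, and the rule that a demoted card stays demoted until it is re-harvested are all assumptions, since the ticket only says that trust is earned by a green re-application and lost on block or revert.

```python
from dataclasses import dataclass
from enum import Enum


class Trust(Enum):
    CANDIDATE = "candidate"  # harvested, not yet re-applied
    TRUSTED = "trusted"      # at least one re-application reproduced green
    DEMOTED = "demoted"      # a re-application was blocked or reverted


class Outcome(Enum):
    GREEN = "green"
    BLOCKED = "blocked"
    REVERTED = "reverted"


@dataclass
class CardState:
    card_id: str
    provenance_sha: str
    trust: Trust = Trust.CANDIDATE
    green_runs: int = 0


def record_outcome(state: CardState, outcome: Outcome) -> CardState:
    """Update trust after a ticket that used this card finishes."""
    if state.trust is Trust.DEMOTED:
        # Assumption: only a fresh harvest can bring a demoted card back.
        return state
    if outcome is Outcome.GREEN:
        state.green_runs += 1
        state.trust = Trust.TRUSTED
    else:
        state.trust = Trust.DEMOTED
    return state


def eligible_for_brief(state: CardState) -> bool:
    """Demoted cards are never injected into a worker brief."""
    return state.trust is not Trust.DEMOTED
```

Keeping demotion sticky is the conservative choice here: it favors an empty brief over a card that has already failed once.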
Design traps to engineer around (acceptance bar)
- Stale is worse than empty: a confidently wrong card is worse than none → provenance SHA + verified date + auto-expiry are mandatory.
- Retrieval is the hard part: card quality + tagging + semantic relevance, not raw storage. Falsifiable: a seeded query returns the right card and excludes irrelevant ones.
- Curation/contradiction sprawl: one curator role, check-before-write, dedupe.
Every sub-ticket must be falsifiable with an adversarial test (removing the new behavior turns a test red), must not be happy-path-only, and must be provenance-backed.
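To illustrate the first two traps together, here is a rough sketch of a retriever that drops expired cards before ranking and applies a relevance floor so that irrelevant cards are excluded rather than merely ranked low. Token overlap stands in for the Lumen semantic index, whose API is not described in this ticket; IndexedCard, the 0.5 area bonus, the 90-day expiry, and the 0.2 floor are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class IndexedCard:
    card_id: str
    area: str
    trigger: str
    verified_on: date


def _tokens(text: str) -> set[str]:
    """Lowercase word set; '/' and '-' are treated as separators."""
    return set(text.lower().replace("/", " ").replace("-", " ").split())


def score(query: str, query_area: str, card: IndexedCard) -> float:
    """Jaccard overlap on the trigger text plus a fixed bonus for an exact area match."""
    q, c = _tokens(query), _tokens(card.trigger)
    union = q | c
    overlap = len(q & c) / len(union) if union else 0.0
    area_bonus = 0.5 if card.area == query_area else 0.0
    return overlap + area_bonus


def retrieve(
    query: str,
    query_area: str,
    cards: list[IndexedCard],
    today: date,
    k: int = 3,
    max_age_days: int = 90,
    min_score: float = 0.2,
) -> list[IndexedCard]:
    """Return up to k fresh cards scoring at least min_score, best first."""
    cutoff = today - timedelta(days=max_age_days)
    fresh = [c for c in cards if c.verified_on >= cutoff]  # auto-expiry
    scored = [(score(query, query_area, c), c) for c in fresh]
    kept = [(s, c) for s, c in scored if s >= min_score]   # relevance floor
    kept.sort(key=lambda sc: (-sc[0], sc[1].card_id))      # deterministic ties
    return [c for _, c in kept[:k]]
```

An adversarial test in the sense described above would seed one relevant card, one irrelevant card, and one stale copy, then check that removing the expiry filter or the relevance floor changes the result.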
Seed slices (sub-issues)
- S1 skill-card schema + store + retrieval index
- S2 harvest: post-merge librarian distills a provenance-stamped card
- S3 retrieve: inject top-K cards into the worker brief; measure turns/tokens delta
- S4 curate/evolve: dedupe + promote-to-internal-node + expire-stale (later)
- S5 falsify/eval: re-application reproduces green or the card is demoted (later)
Adjacent efficiency levers (separate tickets, same theme): cache critic verdicts keyed on diff SHA (don't re-review unchanged); per-stage model tiering; area-aware parallelism of disjoint tickets.
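The first lever, caching critic verdicts keyed on the diff SHA, might look roughly like the sketch below. It assumes the key is a SHA-256 of the diff text and that a critic can be modeled as a function from diff text to a verdict string; VerdictCache and both of those choices are hypothetical, and a real version would need persistence and invalidation when the critic itself changes.

```python
import hashlib
from typing import Callable


def diff_sha(diff_text: str) -> str:
    """Content hash of a diff; identical diffs always map to the same key."""
    return hashlib.sha256(diff_text.encode("utf-8")).hexdigest()


class VerdictCache:
    """In-memory cache so an unchanged diff is not re-reviewed."""

    def __init__(self) -> None:
        self._verdicts: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def review(self, diff_text: str, critic: Callable[[str], str]) -> str:
        key = diff_sha(diff_text)
        if key in self._verdicts:
            self.hits += 1
            return self._verdicts[key]
        # Cache miss: run the (expensive) critic once and remember the verdict.
        self.misses += 1
        verdict = critic(diff_text)
        self._verdicts[key] = verdict
        return verdict
```

Because the key depends only on diff content, any edit to the diff, however small, forces a fresh review.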