
EPIC: learned skill-tree — procedural memory the maestro retrieves and agents evolve #458

Description

@hadamrd

Vision (CRO)

Over time, the loop should build a skill tree that encodes how to carry out tasks in each area of the repo. The maestro and agents keep evolving it, adding and refining leaves (concrete procedures) and internal nodes (generalized patterns), so the loop gets faster, cheaper, and more accurate with every merged ticket instead of starting cold each time.

Problem it solves

Today every worker starts cold: on every ticket it re-derives repo conventions, file maps, test recipes, and 'the right way' to do X. That burns tokens (one recent ticket: ~7.4M cached tokens / 57 turns / $8.50, much of it re-derivation), produces inconsistent choices, and repeats known mistakes. Speed, cost, and accuracy all point to the same lever: retrieve learned procedures instead of re-deriving them.

What it is

Learned procedural memory: a curated, evolving corpus of structured 'how to do X here' cards, each carrying provenance and verification. There is a working precedent: the Claude Code memory system (an index, one-fact cards, 'verified:' dates, a routing eval, a health dashboard, and a curator agent). Adopt the same shape, pointed at procedural repo knowledge, and reuse the existing Lumen semantic index for retrieval.

Skill card (leaf) schema

  • area: node path in the tree (e.g. node/http-route, ledger/endpoint, ui/page, db/migration); the area paths together form the tree
  • trigger: when the card applies (what makes it retrievable)
  • procedure: file map + template + test recipe + known pitfalls
  • provenance: the merged, critic-clean PR/SHA that proves it
  • verified: the commit it was last validated against
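As a sketch only, the card schema above could be modelled as a frozen Python dataclass. The field names come from the list; the string types, the non-empty check, and the `ancestors` helper (which derives the internal nodes implied by an area path) are assumptions for illustration, not the project's actual store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillCard:
    area: str         # node path in the tree, e.g. "db/migration"
    trigger: str      # when the card applies; the text retrieval matches against
    procedure: str    # file map + template + test recipe + known pitfalls
    provenance: str   # SHA of the merged, critic-clean PR that proves the card
    verified: str     # commit SHA the card was last validated against

    def __post_init__(self) -> None:
        # Hypothetical rule: every field, including provenance and verified,
        # is mandatory, so an unproven card cannot even be constructed.
        for name in ("area", "trigger", "procedure", "provenance", "verified"):
            if not getattr(self, name).strip():
                raise ValueError(f"skill card field {name!r} must be non-empty")

    def ancestors(self) -> list[str]:
        """Internal nodes above this leaf: 'db/migration/pg' -> ['db', 'db/migration']."""
        parts = self.area.split("/")
        return ["/".join(parts[:i]) for i in range(1, len(parts))]
```

Deriving internal nodes from the area path keeps the tree implicit: promoting a recurring pattern to an internal node only means writing a card whose area is a shorter prefix.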

Lifecycle (the 'evolving' part)

  1. Harvest: after a critic-clean merge, a librarian sub-agent distills a new card or updates an existing one, stamped with the merge SHA.
  2. Retrieve: at dispatch, the maestro injects the top-K relevant cards (by semantic similarity and area tag) into the brief.
  3. Curate/evolve: a periodic curator dedupes cards, promotes recurring leaves into generalized internal nodes, and expires stale cards.
  4. Falsify: a card earns trust only if re-applying it reproduces a green run; it is demoted on a block or revert.
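A minimal sketch of the Retrieve and Falsify steps, assuming cards are ranked by a semantic score plus a fixed bonus when the card's area is the ticket's area or one of its ancestors. The bag-of-words cosine here is a stand-in for the Lumen semantic index, whose API is not described in this issue; `Card`, `retrieve`, `record_reapplication`, the `trust` counter, and the `area_bonus`/`min_score` values are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Card:
    area: str        # tree path, e.g. "db/migration"
    trigger: str     # when the card applies; the text retrieval matches against
    trust: int = 0   # hypothetical: +1 per green re-application, -1 means demoted


def _bag(text: str) -> Counter:
    # Bag-of-words stand-in for a real embedding from the semantic index.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(cards: list[Card], query: str, area: str, k: int = 3,
             area_bonus: float = 0.5, min_score: float = 0.1) -> list[Card]:
    """Retrieve: semantic score plus an area-tag bonus when the card sits on the
    ticket's path in the tree (same node or an ancestor). Demoted cards and
    cards scoring below min_score are excluded rather than padded into top-K."""
    q = _bag(query)
    scored = []
    for card in cards:
        if card.trust < 0:
            continue  # demoted by the Falsify step
        on_path = area == card.area or area.startswith(card.area + "/")
        score = _cosine(q, _bag(card.trigger)) + (area_bonus if on_path else 0.0)
        if score >= min_score:
            scored.append((score, card))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [card for _, card in scored[:k]]


def record_reapplication(card: Card, green: bool) -> None:
    """Falsify: trust grows only when re-applying the card reproduces a green
    run; a block or revert demotes it out of retrieval."""
    card.trust = card.trust + 1 if green else -1
```

Returning fewer than K cards when nothing clears `min_score` is a deliberate choice in this sketch: an irrelevant card in the brief costs tokens and can steer the worker wrong.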

Design traps to engineer around (acceptance bar)

  1. Stale > empty: a confidently wrong card is worse than none, so a provenance SHA, a verified date, and auto-expiry are mandatory.
  2. Retrieval is the hard part: what matters is card quality, tagging, and semantic relevance, not raw storage. Falsifiable: a seeded query returns the right card and excludes irrelevant ones.
  3. Curation/contradiction sprawl: a single curator role, check-before-write, and dedupe.
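The stale-card and sprawl traps could be enforced at the store boundary. In this hypothetical sketch, a single curator-owned store refuses cards without a provenance SHA, treats a card with the same area and normalized trigger as an update rather than a new entry (check-before-write), and drops cards whose last verification is older than a maximum age. `CardStore`, the date-based `verified_on` field, and the 30-day default are assumptions; the schema records `verified` as a commit, so a real expiry rule might instead count commits since that SHA.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Card:
    area: str
    trigger: str
    procedure: str
    provenance: str     # SHA of the critic-clean PR that proves the card
    verified_on: date   # hypothetical: date of the last successful validation


class CardStore:
    """Hypothetical single-writer store owned by the curator role."""

    def __init__(self, max_age: timedelta = timedelta(days=30)):
        self.max_age = max_age  # assumed expiry window
        self._cards: dict[tuple[str, str], Card] = {}

    @staticmethod
    def _key(card: Card) -> tuple[str, str]:
        # Normalize case and whitespace so trivially different phrasings collide.
        return (card.area, " ".join(card.trigger.lower().split()))

    def write(self, card: Card) -> bool:
        """Check-before-write: refuse unproven cards, and replace an existing
        card for the same (area, trigger) instead of adding a duplicate.
        Returns True only if a new card was created."""
        if not card.provenance:
            raise ValueError("card has no provenance SHA")
        key = self._key(card)
        created = key not in self._cards
        self._cards[key] = card
        return created

    def live(self, today: date) -> list[Card]:
        """Auto-expiry: cards verified longer ago than max_age are removed, not served."""
        self._cards = {
            k: c for k, c in self._cards.items() if today - c.verified_on <= self.max_age
        }
        return list(self._cards.values())
```

Exact-key dedupe only catches near-identical triggers; catching semantically duplicate cards would need the same similarity scoring used for retrieval.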

Every sub-ticket must be falsifiable with an adversarial test (removing the new behavior turns a test red), must not rely on happy-path-only tests, and must be provenance-backed.

Seed slices (sub-issues)

  • S1 skill-card schema + store + retrieval index
  • S2 harvest: post-merge librarian distills a provenance-stamped card
  • S3 retrieve: inject top-K cards into the worker brief; measure turns/tokens delta
  • S4 curate/evolve: dedupe + promote-to-internal-node + expire-stale (later)
  • S5 falsify/eval: re-application reproduces green or the card is demoted (later)
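For S3, one way to measure the turns/tokens delta, assuming the same tickets can be run with and without injected cards, is to pair runs by ticket and take the median relative change. `Run` and `brief_delta` are hypothetical names; the median is chosen so that a single outlier ticket (such as the ~7.4M-token one above) does not dominate the result.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Run:
    ticket: str
    turns: int
    tokens: int


def brief_delta(baseline: list[Run], with_cards: list[Run]) -> dict[str, float]:
    """Median relative change in turns and tokens over tickets present in both
    run sets; negative values mean the injected cards saved work."""
    base = {r.ticket: r for r in baseline}
    pairs = [(base[r.ticket], r) for r in with_cards if r.ticket in base]
    if not pairs:
        raise ValueError("no ticket appears in both run sets")
    return {
        "turns": median((w.turns - b.turns) / b.turns for b, w in pairs),
        "tokens": median((w.tokens - b.tokens) / b.tokens for b, w in pairs),
    }
```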

Adjacent efficiency levers (separate tickets, same theme): cache critic verdicts keyed on diff SHA (don't re-review an unchanged diff); per-stage model tiering; area-aware parallelism of disjoint tickets.
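Caching critic verdicts keyed on the diff amounts to memoization. A minimal sketch, assuming the key is the SHA-256 of the diff text and the critic is any callable returning a verdict string; `CriticCache` is a hypothetical name, not an existing component. `git patch-id` is an alternative key that ignores line numbers and, by default, whitespace.

```python
import hashlib
from typing import Callable


class CriticCache:
    """Hypothetical memo of critic verdicts keyed by a hash of the diff text,
    so an unchanged diff is reviewed only once."""

    def __init__(self, review: Callable[[str], str]):
        self._review = review            # the (expensive) critic call
        self._verdicts: dict[str, str] = {}

    @staticmethod
    def diff_sha(diff: str) -> str:
        return hashlib.sha256(diff.encode("utf-8")).hexdigest()

    def verdict(self, diff: str) -> str:
        key = self.diff_sha(diff)
        if key not in self._verdicts:
            self._verdicts[key] = self._review(diff)  # cache miss: run the critic
        return self._verdicts[key]
```

If the critic's prompt or model changes, the key would also need a critic version so that old verdicts are not reused.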

Metadata


Assignees: No one assigned
Labels: epic (Multi-PR umbrella tracking a major theme)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
