
feat(campaign): harness×model as an eval axis, collapsed onto AgentProfileCell #297

Merged
drewstone merged 2 commits into main from feat/harness-model-axis on Jul 1, 2026


@drewstone

Contributor

Why

The harness×model matrix was hand-rolled (and drifted) across tax/legal/gtm because agent-eval offered no axis: runProfileMatrix ate bare AgentProfile[] and grouped by profileId only, so harness identity got smuggled through metadata differently in each app (that drift silently broke legal's harness pivot).

What

  • expandProfileAxes({base, harnesses?, models?}) → AgentProfile[]: the ONE generator for the harness×model sweep. Omit the lists → CODING_HARNESSES × the base model (the 'turn it on for everything' switch). Incompatible (harness, model) pairs are dropped via harnessSupportsModel. CODING_HARNESSES is the single canonical list; consumers import it instead of re-declaring it.
  • Collapsed identity onto the EXISTING AgentProfileCell: runProfileMatrix now builds a (profileId, harness, model) cell per column and stamps it on every RunRecord, so results group via the existing groupRunsByAgentProfileCell (harness/model aware) — not a new parallel carrier.
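A minimal sketch of the generator's sweep semantics, assuming simplified shapes. Everything here is illustrative, not agent-eval's actual code: the `AgentProfile` fields, the placeholder harness names in `CODING_HARNESSES`, the compatibility rule inside `harnessSupportsModel`, and the `id` format are all hypothetical. For brevity the sketch also puts `harness` directly on the profile, whereas the PR keeps the real harness on `AgentSpec`.

```typescript
// Hypothetical, simplified shape; the real agent-eval AgentProfile differs.
interface AgentProfile {
  id: string;
  model: string;
  harness?: string; // in this sketch, set only by the axis generator
}

// Placeholder list: NOT the real CODING_HARNESSES contents.
const CODING_HARNESSES: readonly string[] = ["harness-a", "harness-b", "harness-c"];

// Placeholder rule (assumption): "harness-c" only supports "model-x*" models.
function harnessSupportsModel(harness: string, model: string): boolean {
  return harness !== "harness-c" || model.startsWith("model-x");
}

interface ExpandOptions {
  base: AgentProfile;
  harnesses?: readonly string[];
  models?: readonly string[];
}

function expandProfileAxes({ base, harnesses, models }: ExpandOptions): AgentProfile[] {
  // Omitted lists default to every coding harness × the base model.
  const hs = harnesses ?? CODING_HARNESSES;
  const ms = models ?? [base.model];
  const out: AgentProfile[] = [];
  for (const harness of hs) {
    for (const model of ms) {
      // Drop incompatible (harness, model) pairs instead of failing at run time.
      if (!harnessSupportsModel(harness, model)) continue;
      out.push({ ...base, id: `${base.id}:${harness}:${model}`, harness, model });
    }
  }
  return out;
}

// Usage: no lists given, so every harness is paired with the base model;
// harness-c is dropped because it does not support model-y.
const profiles = expandProfileAxes({ base: { id: "baseline", model: "model-y" } });
console.log(profiles.map((p) => p.id));
```

The design point is that the sweep is one cross product filtered by a single predicate, so no consumer needs its own copy of the harness list or the pairing logic.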

Harness stays at the run layer (AgentSpec); this only carries it into the eval cell. harnessAxisOf reads the generator's stamp; nothing hand-recomputes a key.
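The collapse onto the cell can be sketched the same way, again under stated assumptions: the `AgentProfileCell`, `RunRecord`, and profile shapes below are simplified stand-ins, and `runMatrixSketch`, `cellKey`, and `groupByCell` are hypothetical helpers standing in for `runProfileMatrix` and `groupRunsByAgentProfileCell`. What it illustrates is the flow described above: the cell is built once per column, stamped on every record, and then read back (for grouping and in `harnessAxisOf`) rather than recomputed.

```typescript
// Hypothetical, simplified shapes; the real agent-eval types differ.
interface AgentProfile { id: string; model: string; harness?: string }
interface AgentProfileCell { profileId: string; harness?: string; model?: string }
interface RunRecord { scenarioId: string; score: number; agentProfile?: AgentProfileCell }

// Build the (profileId, harness, model) cell once per matrix column.
function cellOf(profile: AgentProfile): AgentProfileCell {
  return profile.harness === undefined
    ? { profileId: profile.id } // non-axis profile: harness-free cell
    : { profileId: profile.id, harness: profile.harness, model: profile.model };
}

// Stand-in for runProfileMatrix: stamp the column's cell on every run.
function runMatrixSketch(
  profiles: AgentProfile[],
  scenarioIds: string[],
  run: (p: AgentProfile, scenarioId: string) => number,
): RunRecord[] {
  const records: RunRecord[] = [];
  for (const profile of profiles) {
    const cell = cellOf(profile);
    for (const scenarioId of scenarioIds) {
      records.push({ scenarioId, score: run(profile, scenarioId), agentProfile: cell });
    }
  }
  return records;
}

// Stand-in for groupRunsByAgentProfileCell: key on the stamped cell only.
function cellKey(cell: AgentProfileCell | undefined): string {
  if (!cell) return "(unstamped)";
  return [cell.profileId, cell.harness ?? "", cell.model ?? ""].join("|");
}

function groupByCell(records: RunRecord[]): Map<string, RunRecord[]> {
  const groups = new Map<string, RunRecord[]>();
  for (const r of records) {
    const key = cellKey(r.agentProfile);
    const bucket = groups.get(key);
    if (bucket) bucket.push(r);
    else groups.set(key, [r]);
  }
  return groups;
}

// Stand-in for harnessAxisOf: read the stamp, never re-derive it.
function harnessAxisOf(record: RunRecord): string | undefined {
  return record.agentProfile?.harness;
}

// Usage: same profileId, different harness still lands in separate cells.
const matrixProfiles: AgentProfile[] = [
  { id: "sweep", model: "model-y", harness: "harness-a" },
  { id: "sweep", model: "model-y", harness: "harness-b" },
  { id: "legacy", model: "model-y" }, // non-axis profile
];
const records = runMatrixSketch(matrixProfiles, ["s1", "s2"], () => 1);
const groups = groupByCell(records);
console.log([...groups.keys()]);
```

The `legacy` profile carries no harness, so it gets a harness-free cell and its runs group exactly as they would by `profileId` alone.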

Safety

Purely additive (new exports; RunRecord.agentProfile was already optional) → no forced fleet bump; consumers adopt opt-in. A non-axis profile yields a harness-free cell — unchanged grouping.

Proof

Full suite green (2650 tests), typecheck clean. New tests cover expandProfileAxes (14) plus the harness→cell→groupRunsByAgentProfileCell round-trip through the matrix.

drewstone added 2 commits July 1, 2026 10:45
…ofileCell

The harness×model matrix was hand-rolled (and drifted) across products because
agent-eval offered no axis: runProfileMatrix ate bare AgentProfile[] and grouped
by profileId only, so harness identity got smuggled through metadata differently
in each app.

- expandProfileAxes({base, harnesses?, models?}) → AgentProfile[]: the ONE
  generator for the harness×model sweep. Omit the lists → CODING_HARNESSES ×
  the base model (the 'turn it on for everything' switch). Incompatible
  (harness, model) pairs dropped via harnessSupportsModel. CODING_HARNESSES is
  the single canonical list; consumers import it instead of re-declaring.
- Collapsed identity onto the EXISTING AgentProfileCell: runProfileMatrix now
  builds a (profileId, harness, model) cell per column and stamps it on every
  RunRecord, so results group by the existing groupRunsByAgentProfileCell
  (harness/model aware) — not a new parallel carrier. Harness lives at the run
  layer (AgentSpec) as before; this only carries it into the eval cell.

Additive: RunRecord.agentProfile was already optional; a non-axis profile yields
a harness-free cell (unchanged grouping). Full suite green (2650), typecheck
clean; new tests cover expand + the harness→cell→group round-trip.

tangletools left a comment

Contributor


✅ Auto-approved drewstone PR — e94c1525

This PR was opened by the trusted drewstone account.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: drewstone_author · 2026-07-01T17:08:50Z

drewstone merged commit 2dec35d into main on Jul 1, 2026
1 check passed

Labels

None yet

Projects

None yet


2 participants