feat(campaign): harness×model as an eval axis, collapsed onto AgentProfileCell by drewstone · Pull Request #297 · tangle-network/agent-eval

drewstone · 2026-07-01T17:08:42Z

Why

The harness×model matrix was hand-rolled separately in tax, legal, and gtm, and the copies drifted, because agent-eval offered no such axis. runProfileMatrix took a bare AgentProfile[] and grouped by profileId only, so each app smuggled harness identity through metadata in its own way. That drift silently broke legal's harness pivot.

What

- expandProfileAxes({base, harnesses?, models?}) → AgentProfile[]: the ONE generator for the harness×model sweep. Omit the lists and it defaults to CODING_HARNESSES × the base model (the 'turn it on for everything' switch). Incompatible (harness, model) pairs are dropped via harnessSupportsModel. CODING_HARNESSES is the single canonical list; consumers import it instead of re-declaring it.
- Identity is collapsed onto the EXISTING AgentProfileCell: runProfileMatrix now builds a (profileId, harness, model) cell per column and stamps it on every RunRecord, so results group via the existing groupRunsByAgentProfileCell (which is harness/model aware) rather than through a new parallel carrier.
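
The generator described above can be sketched roughly as follows. This is a minimal illustration, not the agent-eval source: the AgentProfile shape, the contents of CODING_HARNESSES ("harness-a", "harness-b"), the compatibility rule inside harnessSupportsModel, and the choice to keep profileId unchanged across columns are all assumptions made for the example.

```typescript
// Illustrative stand-ins only: these shapes and values are assumptions,
// not the actual agent-eval definitions.
type Harness = string;

interface AgentProfile {
  profileId: string;
  model: string;
  harness?: Harness; // absent on a non-axis profile
}

// Hypothetical list; the real CODING_HARNESSES contents are not shown in this PR.
const CODING_HARNESSES: readonly Harness[] = ["harness-a", "harness-b"];

// Hypothetical compatibility rule: "harness-b" only runs models prefixed "b-".
function harnessSupportsModel(harness: Harness, model: string): boolean {
  return harness !== "harness-b" || model.startsWith("b-");
}

interface ExpandArgs {
  base: AgentProfile;
  harnesses?: readonly Harness[];
  models?: readonly string[];
}

// Cross product of harnesses and models, with incompatible pairs dropped.
function expandProfileAxes({ base, harnesses, models }: ExpandArgs): AgentProfile[] {
  const hs = harnesses ?? CODING_HARNESSES; // default: every known harness
  const ms = models ?? [base.model];        // default: the base model only
  const out: AgentProfile[] = [];
  for (const harness of hs) {
    for (const model of ms) {
      if (!harnessSupportsModel(harness, model)) continue;
      out.push({ ...base, harness, model });
    }
  }
  return out;
}

const base: AgentProfile = { profileId: "coder", model: "a-1" };
const defaults = expandProfileAxes({ base });                       // lists omitted
const sweep = expandProfileAxes({ base, models: ["a-1", "b-1"] });  // explicit models
```

Under these assumed rules, `defaults` contains only the ("harness-a", "a-1") column because "harness-b" does not support "a-1", and `sweep` contains three of the four possible pairs.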

Harness stays at the run layer (AgentSpec); this only carries it into the eval cell. harnessAxisOf reads the generator's stamp; nothing hand-recomputes a key.
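
To illustrate why stamping the cell on each record removes the need to recompute keys, here is a small sketch of grouping by a (profileId, harness, model) cell. The AgentProfileCell and RunRecord shapes, the `cellKey` helper, and `groupRunsByCell` are hypothetical stand-ins written for this example; the real groupRunsByAgentProfileCell may key and group differently.

```typescript
// Illustrative stand-ins; not the actual agent-eval types.
interface AgentProfileCell {
  profileId: string;
  harness?: string; // omitted for a non-axis profile (assumed shape)
  model?: string;
}

interface RunRecord {
  runId: string;
  score: number;
  agentProfile?: AgentProfileCell; // optional, as the PR notes
}

// Hypothetical key derived only from the stamped cell, never from app metadata.
function cellKey(cell: AgentProfileCell): string {
  return JSON.stringify([cell.profileId, cell.harness ?? null, cell.model ?? null]);
}

function groupRunsByCell(runs: readonly RunRecord[]): Map<string, RunRecord[]> {
  const groups = new Map<string, RunRecord[]>();
  for (const run of runs) {
    if (!run.agentProfile) continue; // unstamped records are skipped in this sketch
    const key = cellKey(run.agentProfile);
    const bucket = groups.get(key);
    if (bucket) bucket.push(run);
    else groups.set(key, [run]);
  }
  return groups;
}

const runs: RunRecord[] = [
  { runId: "r1", score: 0.9, agentProfile: { profileId: "coder", harness: "harness-a", model: "a-1" } },
  { runId: "r2", score: 0.7, agentProfile: { profileId: "coder", harness: "harness-b", model: "a-1" } },
  { runId: "r3", score: 0.8, agentProfile: { profileId: "coder", harness: "harness-a", model: "a-1" } },
  { runId: "r4", score: 0.5, agentProfile: { profileId: "plain" } }, // harness-free cell
];
const groups = groupRunsByCell(runs);
```

In this sketch the two "harness-a" runs share a group, the "harness-b" run gets its own, and the harness-free "plain" profile still groups by profileId alone.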

Safety

Purely additive (new exports; RunRecord.agentProfile was already optional), so there is no forced fleet bump and consumers opt in. A non-axis profile yields a harness-free cell, so its grouping is unchanged.

Proof

Full suite green (2650), typecheck clean. New tests cover expandProfileAxes (14) and the harness→cell→groupRunsByAgentProfileCell round-trip through the matrix.

…ofileCell

The harness×model matrix was hand-rolled (and drifted) across products because agent-eval offered no axis: runProfileMatrix ate bare AgentProfile[] and grouped by profileId only, so harness identity got smuggled through metadata differently in each app.

- expandProfileAxes({base, harnesses?, models?}) → AgentProfile[]: the ONE generator for the harness×model sweep. Omit the lists → CODING_HARNESSES × the base model (the 'turn it on for everything' switch). Incompatible (harness, model) pairs dropped via harnessSupportsModel. CODING_HARNESSES is the single canonical list; consumers import it instead of re-declaring.
- Collapsed identity onto the EXISTING AgentProfileCell: runProfileMatrix now builds a (profileId, harness, model) cell per column and stamps it on every RunRecord, so results group by the existing groupRunsByAgentProfileCell (harness/model aware), not a new parallel carrier.

Harness lives at the run layer (AgentSpec) as before; this only carries it into the eval cell.

Additive: RunRecord.agentProfile was already optional; a non-axis profile yields a harness-free cell (unchanged grouping).

Full suite green (2650), typecheck clean; new tests cover expand + the harness→cell→group round-trip.

tangletools

✅ Auto-approved drewstone PR — `e94c1525`

This PR was opened by the trusted drewstone account.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: drewstone_author · 2026-07-01T17:08:50Z}

drewstone added 2 commits July 1, 2026 10:45

chore(release): agent-eval 0.101.0

e94c152

tangletools approved these changes Jul 1, 2026

drewstone merged commit 2dec35d into main Jul 1, 2026
1 check passed

drewstone merged 2 commits into main from feat/harness-model-axis
