feat(campaign): harness×model as an eval axis, collapsed onto AgentProfileCell #297
Merged
The harness×model matrix was hand-rolled (and drifted) across products because
agent-eval offered no axis: runProfileMatrix ate bare AgentProfile[] and grouped
by profileId only, so harness identity got smuggled through metadata differently
in each app.
- expandProfileAxes({base, harnesses?, models?}) → AgentProfile[]: the ONE
generator for the harness×model sweep. Omit the lists → CODING_HARNESSES ×
the base model (the 'turn it on for everything' switch). Incompatible
(harness, model) pairs dropped via harnessSupportsModel. CODING_HARNESSES is
the single canonical list; consumers import it instead of re-declaring.
- Collapsed identity onto the EXISTING AgentProfileCell: runProfileMatrix now
builds a (profileId, harness, model) cell per column and stamps it on every
RunRecord, so results group by the existing groupRunsByAgentProfileCell
(harness/model aware) — not a new parallel carrier. Harness lives at the run
layer (AgentSpec) as before; this only carries it into the eval cell.
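The expandProfileAxes bullet describes a filtered Cartesian product with defaults. Below is a minimal TypeScript sketch of that idea. It is not agent-eval's code: the AgentProfile shape, the contents of CODING_HARNESSES, the compatibility rule inside harnessSupportsModel, and the derived profileId format are all hypothetical stand-ins chosen for illustration.

```typescript
// Hypothetical, simplified shape; the real AgentProfile likely has more fields.
interface AgentProfile {
  profileId: string;
  model: string;
  harness?: string;
}

// Hypothetical stand-in for the canonical list (real contents not given in the PR).
const CODING_HARNESSES: readonly string[] = ["harness-a", "harness-b", "harness-c"];

// Hypothetical compatibility rule standing in for harnessSupportsModel:
// "harness-c" only accepts models whose name starts with "model-x".
function harnessSupportsModel(harness: string, model: string): boolean {
  return harness !== "harness-c" || model.startsWith("model-x");
}

interface ExpandOptions {
  base: AgentProfile;
  harnesses?: readonly string[];
  models?: readonly string[];
}

// harnesses × models, skipping incompatible pairs.
function expandProfileAxes({ base, harnesses, models }: ExpandOptions): AgentProfile[] {
  const hs = harnesses ?? CODING_HARNESSES; // default: every coding harness
  const ms = models ?? [base.model]; // default: only the base model
  const out: AgentProfile[] = [];
  for (const harness of hs) {
    for (const model of ms) {
      if (!harnessSupportsModel(harness, model)) continue;
      out.push({
        ...base,
        // Hypothetical id scheme so each matrix column is distinguishable.
        profileId: `${base.profileId}:${harness}:${model}`,
        harness,
        model,
      });
    }
  }
  return out;
}

// Omitting both lists is the "turn it on for everything" switch.
const base: AgentProfile = { profileId: "baseline", model: "model-y" };
const all = expandProfileAxes({ base });
console.log(all.map((p) => p.profileId));
// prints [ 'baseline:harness-a:model-y', 'baseline:harness-b:model-y' ]
```

Filtering inside the generator, rather than in each consumer, is what keeps every app's sweep identical.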
Additive: RunRecord.agentProfile was already optional; a non-axis profile yields
a harness-free cell (unchanged grouping). Full suite green (2650), typecheck
clean; new tests cover expand + the harness→cell→group round-trip.
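The stamping and grouping steps, including the harness-free cell for a non-axis profile, can be sketched as follows. This is an illustration under assumptions. cellFor, stampRuns and groupByCell are hypothetical stand-ins for what runProfileMatrix and groupRunsByAgentProfileCell do internally, and the field layouts of AgentProfileCell and RunRecord are guessed from the names in this PR.

```typescript
// Hypothetical shapes modeled on the PR's names; real agent-eval types may differ.
interface AgentProfile {
  profileId: string;
  model: string;
  harness?: string; // set only on profiles produced by the axis generator
}

interface AgentProfileCell {
  profileId: string;
  harness?: string;
  model: string;
}

interface RunRecord {
  taskId: string;
  score: number;
  agentProfile?: AgentProfileCell; // optional, so older records remain valid
}

// One cell per matrix column. A non-axis profile has no harness, so its
// cell is harness-free and groups exactly as before.
function cellFor(profile: AgentProfile): AgentProfileCell {
  return profile.harness === undefined
    ? { profileId: profile.profileId, model: profile.model }
    : { profileId: profile.profileId, harness: profile.harness, model: profile.model };
}

// Hypothetical stamping step: every run in a column carries the same cell.
function stampRuns(profile: AgentProfile, runs: Omit<RunRecord, "agentProfile">[]): RunRecord[] {
  const cell = cellFor(profile);
  return runs.map((r) => ({ ...r, agentProfile: cell }));
}

// Hypothetical grouping keyed on (profileId, harness, model), not profileId alone.
function groupByCell(runs: RunRecord[]): Map<string, RunRecord[]> {
  const groups = new Map<string, RunRecord[]>();
  for (const run of runs) {
    const c = run.agentProfile;
    const key = c ? JSON.stringify([c.profileId, c.harness ?? null, c.model]) : "(no cell)";
    const bucket = groups.get(key);
    if (bucket) bucket.push(run);
    else groups.set(key, [run]);
  }
  return groups;
}

const runs: RunRecord[] = [
  ...stampRuns({ profileId: "p", harness: "harness-a", model: "m" }, [{ taskId: "t1", score: 1 }]),
  ...stampRuns({ profileId: "p", harness: "harness-b", model: "m" }, [{ taskId: "t1", score: 0 }]),
  ...stampRuns({ profileId: "legacy", model: "m" }, [{ taskId: "t1", score: 1 }]),
];
// Three groups: p/harness-a, p/harness-b, and the harness-free legacy cell.
console.log([...groupByCell(runs).keys()]);
```

Grouping by profileId alone would merge the two "p" columns. Keying on the whole cell keeps them apart without adding a second carrier next to the cell.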
tangletools approved these changes on Jul 1, 2026.
✅ Auto-approved drewstone PR — e94c1525
This PR was opened by the trusted drewstone account.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: drewstone_author · 2026-07-01T17:08:50Z
Why
The harness×model matrix was hand-rolled (and drifted) across tax/legal/gtm because agent-eval offered no axis: runProfileMatrix ate bare AgentProfile[] and grouped by profileId only, so harness identity got smuggled through metadata differently in each app (that drift silently broke legal's harness pivot).
What
- expandProfileAxes({base, harnesses?, models?}) → AgentProfile[]: the ONE generator for the harness×model sweep. Omit the lists → CODING_HARNESSES × the base model (the 'turn it on for everything' switch). Incompatible (harness, model) pairs are dropped via harnessSupportsModel. CODING_HARNESSES is the single canonical list; consumers import it instead of re-declaring.
- Collapsed identity onto the existing AgentProfileCell: runProfileMatrix now builds a (profileId, harness, model) cell per column and stamps it on every RunRecord, so results group via the existing groupRunsByAgentProfileCell (harness/model aware), not a new parallel carrier. Harness stays at the run layer (AgentSpec); this only carries it into the eval cell. harnessAxisOf reads the generator's stamp; nothing hand-recomputes a key.
Safety
Purely additive (new exports; RunRecord.agentProfile was already optional) → no forced fleet bump; consumers adopt opt-in. A non-axis profile yields a harness-free cell, so grouping is unchanged.
Proof
Full suite green (2650), typecheck clean; new tests cover expandProfileAxes (14) plus the harness→cell→groupRunsByAgentProfileCell round-trip through the matrix.
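A round-trip check of the kind described under Proof might look like the sketch below. It assumes the generator's stamp is a hypothetical axis field on the profile and that harnessAxisOf simply reads it back. expand, cellOf and check are illustrative helpers, not agent-eval exports.

```typescript
// Hypothetical stamp the generator leaves on each profile it produces.
interface AxisStamp { harness: string; model: string }
interface AgentProfile { profileId: string; model: string; axis?: AxisStamp }
interface Cell { profileId: string; harness?: string; model: string }

// Compact generator: records the (harness, model) it produced on the profile.
function expand(base: AgentProfile, harnesses: string[], models: string[]): AgentProfile[] {
  return harnesses.flatMap((harness) =>
    models.map((model) => ({
      profileId: `${base.profileId}/${harness}/${model}`,
      model,
      axis: { harness, model },
    })),
  );
}

// Read the harness from the stamp instead of re-deriving it from profileId.
function harnessAxisOf(profile: AgentProfile): string | undefined {
  return profile.axis?.harness;
}

function cellOf(profile: AgentProfile): Cell {
  const harness = harnessAxisOf(profile);
  return harness === undefined
    ? { profileId: profile.profileId, model: profile.model }
    : { profileId: profile.profileId, harness, model: profile.model };
}

function check(cond: boolean, msg: string): void {
  if (!cond) throw new Error(`round-trip failed: ${msg}`);
}

// Round trip: each generated column maps to exactly one cell key,
// and that cell carries the harness the generator stamped.
const profiles = expand({ profileId: "base", model: "m0" }, ["h1", "h2"], ["m0", "m1"]);
const groups = new Map<string, Cell>();
for (const p of profiles) {
  const cell = cellOf(p);
  groups.set(JSON.stringify([cell.profileId, cell.harness ?? null, cell.model]), cell);
}
check(groups.size === profiles.length, "one group per generated column");
for (const p of profiles) {
  check(cellOf(p).harness === p.axis?.harness, "cell harness matches generator stamp");
}
check(harnessAxisOf({ profileId: "plain", model: "m0" }) === undefined, "non-axis profile has no harness");
```

Reading the stamp means a change to the profileId format cannot silently break the harness pivot, which is the drift the Why section describes.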