Problem
mdredd plants an empty .git/ in each sandbox to keep host branch name, status, recent commits, real git history, and per-project auto-memory from leaking into variants. This is the right default for fairness — but it makes A/B tests of branch-review-style prompts (e.g. "review my recent changes", "what changed in this branch", "are these commits ready to ship?") effectively unevaluable: every variant sees an empty history regardless of its own quality.
Proposal
An opt-in mode that computes git context once on the host and surfaces the same snapshot to every variant and to the judge, without ever exposing the real .git/:
- Compute on the host before any variant spawns:
  - git status --porcelain=v1
  - git diff --stat <base>...HEAD
  - git log <base>..HEAD (recent commit subjects/bodies)
  - Optionally per-file git diff excerpts (capped, same shape as judge transcript caps)
- Inject the snapshot into the variant via --append-system-prompt or a planted file inside the sandbox.
- Pass the same snapshot into buildHarnessConstraints (src/server/judge.ts) so the judge knows exactly what the variant could see.
This stays additive to the existing isolation contract: no real .git/ exposure, no host-config bleed, no auto-memory leakage. The snapshot is the single source of truth and is identical across variants.
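A minimal sketch of the host-side snapshot step, to make the shape concrete. Everything named here (GitSnapshot, snapshotCommands, capText, computeGitSnapshot, renderSnapshot, the 4000-character default cap, and the --format string passed to git log) is hypothetical and not taken from the mdredd codebase; only the three git commands come from the list above.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical snapshot shape; not an existing mdredd type.
interface GitSnapshot {
  base: string;
  status: string;
  diffStat: string;
  log: string;
}

// The three read-only commands from the proposal, as argv arrays.
// The --format string is an assumption: short hash, subject, then body.
function snapshotCommands(base: string): { status: string[]; diffStat: string[]; log: string[] } {
  return {
    status: ["status", "--porcelain=v1"],
    diffStat: ["diff", "--stat", `${base}...HEAD`],
    log: ["log", "--format=%h %s%n%b", `${base}..HEAD`],
  };
}

// Truncate to a character budget and mark the cut so readers know text was dropped.
function capText(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  return text.slice(0, maxChars) + "\n[truncated]";
}

// Run once on the host, before any variant spawns. Throws if git fails
// (for example, when <base> does not exist).
function computeGitSnapshot(repoDir: string, base: string, maxChars = 4000): GitSnapshot {
  const run = (args: string[]): string =>
    capText(execFileSync("git", args, { cwd: repoDir, encoding: "utf8" }), maxChars);
  const cmds = snapshotCommands(base);
  return { base, status: run(cmds.status), diffStat: run(cmds.diffStat), log: run(cmds.log) };
}

// Render one text block that can be handed unchanged to every variant and to the judge.
function renderSnapshot(s: GitSnapshot): string {
  return [
    `Git context snapshot (base: ${s.base})`,
    "git status --porcelain=v1:",
    s.status || "(clean)",
    `git diff --stat ${s.base}...HEAD:`,
    s.diffStat || "(no changes)",
    `git log ${s.base}..HEAD:`,
    s.log || "(no commits)",
  ].join("\n");
}
```

Because the rendered string is computed once and reused, the variants and the judge cannot drift apart: whatever goes into --append-system-prompt (or a planted file) is byte-for-byte what the judge is told the variant could see.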
Open design questions
- Session-level toggle, similar shape to userScopeEnabled. Default off.
- What counts as <base>: main/master autodetect, or a manual override per session?
- Snapshot freshness: re-compute per run (lets staged changes evolve mid-session) or once per session?
- Token budgeting for the diff excerpts.
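One possible answer to the <base> question, sketched under assumptions: try a manual override first, then fall back to main, then master. The names resolveBase, gitRefExists, and RefExists are hypothetical; the only git call used is git rev-parse --verify --quiet, which exits non-zero when the ref does not resolve.

```typescript
import { execFileSync } from "node:child_process";

// Predicate that reports whether a ref resolves to a commit.
type RefExists = (ref: string) => boolean;

// Real implementation backed by git; kept separate so the resolution
// logic below can be exercised without a repository.
function gitRefExists(repoDir: string): RefExists {
  return (ref: string): boolean => {
    try {
      execFileSync("git", ["rev-parse", "--verify", "--quiet", `${ref}^{commit}`], {
        cwd: repoDir,
        stdio: "ignore",
      });
      return true;
    } catch {
      return false; // non-zero exit: the ref does not exist
    }
  };
}

// A per-session override wins if it resolves; otherwise autodetect.
// Returns undefined when nothing resolves, so the caller can decide
// whether to skip the snapshot or fail the session.
function resolveBase(
  refExists: RefExists,
  override?: string,
  candidates: string[] = ["main", "master"],
): string | undefined {
  if (override !== undefined) {
    return refExists(override) ? override : undefined;
  }
  return candidates.find((c) => refExists(c));
}
```

In this sketch a bad override returns undefined instead of silently falling back to main, on the theory that a silently different base would make results hard to interpret. That is a design choice, not a requirement of the proposal.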
Why open now (but not do now)
The empty .git/ is intentional, so this is a feature ask, not a sandbox bug. Filing now to capture the design context. Worth picking up when the first concrete branch-review evaluation use case lands.
Where the code lives
- Sandbox: src/server/sandbox.ts (planted .git/ and the empty-history rationale)
- Spawn args: src/server/runner.ts buildArgs() (where --append-system-prompt is already used for write mode)
- Judge harness constraints: src/server/judge.ts buildHarnessConstraints() lines 224-260