Trusted git-context mode for branch-review evaluations #40

@slaFFik

Problem

mdredd plants an empty .git/ in each sandbox to keep the host's branch name, status, recent commits, real git history, and per-project auto-memory from leaking into variants. This is the right default for fairness, but it makes A/B tests of branch-review-style prompts (e.g. "review my recent changes", "what changed in this branch", "are these commits ready to ship?") effectively impossible to evaluate: every variant sees the same empty history, so the results cannot distinguish a good prompt from a bad one.

Proposal

An opt-in mode that computes git context once on the host and surfaces the same snapshot to every variant and to the judge, without ever exposing the real .git/:

  • Compute on the host before any variant spawns:
    • git status --porcelain=v1
    • git diff --stat <base>...HEAD
    • git log <base>..HEAD (recent commit subjects/bodies)
    • Optionally per-file git diff excerpts (capped, same shape as judge transcript caps)
  • Inject the snapshot into the variant via --append-system-prompt or a planted file inside the sandbox.
  • Pass the same snapshot into buildHarnessConstraints (src/server/judge.ts) so the judge knows exactly what the variant could see.

This stays additive to the existing isolation contract: no real .git/ exposure, no host-config bleed, no auto-memory leakage. The snapshot is the single source of truth and is identical across variants.
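
The host-side step above could be sketched roughly as follows. This is a minimal illustration, not mdredd code: `GitContextSnapshot`, `GitRunner`, `hostGitRunner`, `computeGitContext`, and `renderGitContext` are hypothetical names, and the `--format` string passed to `git log` is just one way to get subjects and bodies. The git runner is injected so the snapshot logic can be exercised without a real repository.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical snapshot shape; not an existing mdredd type.
interface GitContextSnapshot {
  base: string;
  status: string;
  diffStat: string;
  log: string;
}

// Runs one git subcommand and returns its stdout. Injectable for testing.
type GitRunner = (args: string[]) => string;

// Runner bound to the real host repository (assumed to live at repoDir).
function hostGitRunner(repoDir: string): GitRunner {
  return (args) =>
    execFileSync("git", args, {
      cwd: repoDir,
      encoding: "utf8",
      maxBuffer: 16 * 1024 * 1024,
    });
}

// Computes the snapshot once; the same object is meant to be reused for
// every variant and for the judge.
function computeGitContext(run: GitRunner, base: string): GitContextSnapshot {
  return Object.freeze({
    base,
    // trimEnd (not trim): leading spaces are significant in porcelain output.
    status: run(["status", "--porcelain=v1"]).trimEnd(),
    diffStat: run(["diff", "--stat", `${base}...HEAD`]).trimEnd(),
    log: run(["log", "--format=%h %s%n%b", `${base}..HEAD`]).trimEnd(),
  });
}

// Renders the snapshot as plain text suitable for a system prompt or a
// planted file inside the sandbox.
function renderGitContext(s: GitContextSnapshot): string {
  const section = (title: string, body: string) =>
    `## ${title}\n${body || "(empty)"}`;
  return [
    `# Git context (host snapshot, base: ${s.base})`,
    section("git status --porcelain=v1", s.status),
    section(`git diff --stat ${s.base}...HEAD`, s.diffStat),
    section(`git log ${s.base}..HEAD`, s.log),
  ].join("\n\n");
}
```

Rendering to a single string keeps the "identical across variants" property easy to check: the same text goes to each variant and to the judge.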

Open design questions

  • Session-level toggle, similar shape to userScopeEnabled. Default off.
  • What counts as <base>: main/master autodetect, or a manual override per session?
  • Snapshot freshness: re-compute per run (lets staged changes evolve mid-session) or once per session?
  • Token budgeting for the diff excerpts.
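
To make two of these questions concrete, here is one possible shape for base resolution and excerpt budgeting. Everything in it is an assumption for discussion: `resolveBase`, `capExcerpts`, and `DiffExcerpt` are hypothetical names, the even per-file split is only one budgeting policy, and character counts stand in for tokens. The `refExists` callback could, for instance, wrap `git rev-parse --verify --quiet <ref>` on the host.

```typescript
// Picks the base ref: a manual override wins, otherwise the first
// candidate that exists on the host. Returns undefined if none is found.
function resolveBase(
  refExists: (ref: string) => boolean,
  override?: string,
  candidates: string[] = ["main", "master"],
): string | undefined {
  if (override) return override;
  return candidates.find(refExists);
}

// Hypothetical per-file excerpt shape.
interface DiffExcerpt {
  path: string;
  diff: string;
}

// Caps excerpts by splitting a total character budget evenly across files.
// Characters are a rough proxy for tokens here. The truncation marker is
// added on top of the per-file budget.
function capExcerpts(excerpts: DiffExcerpt[], totalChars: number): DiffExcerpt[] {
  if (excerpts.length === 0) return [];
  const perFile = Math.floor(totalChars / excerpts.length);
  return excerpts.map(({ path, diff }) => {
    if (diff.length <= perFile) return { path, diff };
    const omitted = diff.length - perFile;
    return {
      path,
      diff: `${diff.slice(0, perFile)}\n[... ${omitted} chars truncated]`,
    };
  });
}
```

An even split is simple but wastes budget when some files have tiny diffs; redistributing unused budget to larger files would be a natural refinement if this is picked up.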

Why open now (but not do now)

The empty .git/ is intentional, so this is a feature ask, not a sandbox bug. Filing now to capture the design context. Worth picking up when the first concrete branch-review evaluation use case lands.

Where the code lives

  • Sandbox: src/server/sandbox.ts (planted .git/ and the empty-history rationale)
  • Spawn args: src/server/runner.ts buildArgs() (where --append-system-prompt is already used for write mode)
  • Judge harness constraints: src/server/judge.ts buildHarnessConstraints() lines 224-260
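
As a rough sketch of how the snapshot might be threaded through those two places: the helpers below are hypothetical and do not reflect the actual signatures of buildArgs() or buildHarnessConstraints(), which are not shown here. The sketch assumes the spawn args are a flat string array and that --append-system-prompt should appear only once, so an existing value (e.g. from write mode) is extended rather than duplicated.

```typescript
const APPEND_FLAG = "--append-system-prompt";

// Returns a copy of args with `extra` added to the appended system prompt.
// If the flag is already present with a value, the value is extended;
// otherwise the flag and value are pushed at the end.
function appendSystemPrompt(args: readonly string[], extra: string): string[] {
  const out = [...args];
  const i = out.indexOf(APPEND_FLAG);
  if (i >= 0 && i + 1 < out.length) {
    out[i + 1] = `${out[i + 1]}\n\n${extra}`;
  } else {
    out.push(APPEND_FLAG, extra);
  }
  return out;
}

// Builds the text a judge-side constraint could carry so the judge knows
// exactly which git information the variant had.
function gitContextConstraint(snapshotText: string): string {
  return [
    "The variant had no access to the real .git/ directory.",
    "Its only git information was the following host-computed snapshot:",
    snapshotText,
  ].join("\n");
}
```

Passing the same rendered snapshot string to both helpers is what keeps the variant's view and the judge's view in sync.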

Labels: enhancement