buchecha

Stop watching a single agent drift through a long spec. bcc splits the job across four roles that plan, brief, execute, and review under audit.

[Screenshot: bcc TUI running a spec]

Status: early development, not stable.

The problem

Driving a single agent through a long Markdown spec with claude -p or codex breaks in predictable ways:

  • Focus drifts halfway through. The agent forgets phase boundaries and acceptance criteria.
  • It declares done early. One success message ends the session before the spec is actually covered.
  • Scope creep. A small fix sprawls into adjacent files the spec never mentioned.
  • Retries redo approved work. There is no granularity below "rerun the whole thing".
  • No audit trail. When something goes wrong, you have a transcript, not a record of decisions.

How bcc solves it

bcc replaces the single-loop pattern with four cognitive roles, each with its own context, coordinated through an in-process MCP server. The Planner produces a typed DAG of phases and tasks. The Briefer picks the next sub-DAG. The Executor edits the working tree for that scope only. The Reviewer audits per-task outcomes and decides what advances, retries, or escalates. Every role call is logged.

flowchart LR
    S[Markdown spec] --> P[Planner]
    P -->|plan_emit| M[(MCP server)]
    M --> B[Briefer]
    B -->|briefing_emit| M
    M --> E[Executor]
    E -->|task_progress<br/>iteration_finished| M
    M --> R[Reviewer]
    R -->|task_approved<br/>task_needs_fix| M
    M -.audit log.-> J[".bcc/sessions/&lt;id&gt;/mcp-log"]
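
The hand-off from Reviewer verdicts to the next briefing can be pictured with a small sketch. It is an illustration only: the `Task` type, the `Status` values other than `needs_fix`, and the `nextRunnable` helper are hypothetical names, not bcc's actual types. It assumes a task becomes eligible once all of its dependencies are approved.

```go
package main

import (
	"fmt"
	"sort"
)

// Status is a hypothetical per-task state. Only "needs_fix" is a name
// that appears in the README; the others are made up for this sketch.
type Status string

const (
	Pending  Status = "pending"
	Approved Status = "approved"
	NeedsFix Status = "needs_fix"
)

// Task is an illustrative DAG node, not bcc's real data model.
type Task struct {
	ID        string
	DependsOn []string
	Status    Status
}

// nextRunnable returns the IDs of tasks that are not yet approved and whose
// dependencies are all approved. Approved tasks are never selected again,
// so a retry touches only pending or needs_fix work.
func nextRunnable(tasks map[string]Task) []string {
	var ready []string
	for id, t := range tasks {
		if t.Status == Approved {
			continue
		}
		blocked := false
		for _, dep := range t.DependsOn {
			// A missing dependency yields the zero Task, which is not approved.
			if tasks[dep].Status != Approved {
				blocked = true
				break
			}
		}
		if !blocked {
			ready = append(ready, id)
		}
	}
	sort.Strings(ready) // map iteration order is random; sort for stable output
	return ready
}

func main() {
	tasks := map[string]Task{
		"route":   {ID: "route", Status: Approved},
		"handler": {ID: "handler", DependsOn: []string{"route"}, Status: NeedsFix},
		"test":    {ID: "test", DependsOn: []string{"handler"}, Status: Pending},
	}
	fmt.Println(nextRunnable(tasks)) // prints [handler]
}
```

In this example `route` is skipped because it is already approved, and `test` stays blocked until `handler` is approved.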

Demo: Web UI

[Screenshot: bcc Web UI live timeline]

Quickstart in 60 seconds

Prerequisites: the bcc binary on $PATH (see Install) and an agent CLI (claude, codex, or gemini) reachable on $PATH.

# 1. scaffold .bcc.toml in the current project
# (optional; only needed to override the default settings)
bcc init

# 2. write a minimal spec
cat > spec.md <<'EOF'
# Add a /healthz endpoint

The HTTP server must expose `GET /healthz` returning `{"ok": true}`.

## Acceptance
- New route registered in the router.
- A test covers the 200 response with the expected body.
EOF

# 3. run it (live TUI)
bcc run spec.md

# 4. inspect what happened
bcc sessions list
bcc sessions show <id>

Use cases

  • Long refactors guided by a spec. Break a multi-file migration into phases the Planner sequences, and retry only the tasks the Reviewer rejects.
  • Retroactive test coverage. Hand bcc a spec describing the surface to cover, and let the Executor add cases per task while the Reviewer enforces the contract.
  • Multi-phase architectural changes. Schema migrations, port-and-adapter extractions, transport swaps. Phases that depend on each other land in order with verifiable boundaries.
  • Large feature prototyping. Describe the target state once, and let the loop converge instead of hand-holding a single agent through 20 prompts.

Features

  • Four-role pipeline (Planner, Briefer, Executor, Reviewer) coordinated through an in-process MCP server with per-connection authorization.
  • Flexible input: a Markdown spec, a free-form --prompt/-p directive, or both composed together. The spec argument is optional when --prompt is set.
  • Typed, executable DAG with cycle validation; retries target only needs_fix tasks and never redo approved work.
  • Vendor-agnostic: adapters for claude, codex, and gemini; --provider pins all roles to one vendor, --planner pins only the Planner.
  • Multiple observability surfaces in a single run: live TUI (default), embedded Web UI (--webui), HTTP API + OpenAPI (--api), headless text or JSON (--output).
  • Persistent, resumable sessions under .bcc/sessions/<id>/ with manifest, plan, DAG, briefings, MCP log, and event stream; --resume[=<id>] resumes a run.
  • Target-project agnostic: embedded prompts stay neutral, the Executor picks the toolchain (build, lint, test) from the project under work.
  • Interactive escalation when the Reviewer rejects or the retry budget runs out: resume_with_hint, force_approve, skip, or abort. Every choice lands in the MCP log.
  • Full journal and audit: every MCP method call (role, input, result) is recorded for regression and debugging.

Why not just run claude -p directly?

| Challenge | Single-agent shell wrapper | buchecha |
|---|---|---|
| Focus across a long spec | One context window, drifts past phase 2 | Per-role contexts, Planner DAG enforces phase boundaries |
| Premature done | Agent decides when it is finished | Reviewer is the gate, verdict per task |
| Retry granularity | Rerun the whole spec | Retry only tasks the Reviewer marked needs_fix |
| Auditability | Terminal transcript | Per-call MCP log, persisted session, event stream |
| Observability | stdout | TUI, Web UI, HTTP API, JSON output, all in one run |
| Vendor lock | Tied to one CLI | claude, codex, and gemini adapters, pinned per role |

Output and observability

# Live TUI dashboard (default)
bcc run spec.md

# Free-form directive instead of a spec file
bcc run -p "add a /healthz endpoint returning {\"ok\": true} with a test"

# Compose spec and prompt: the directive layers on top of the spec
bcc run -p "focus only on phase 2 for now" spec.md

# Headless text: progress to stderr, result to stdout
bcc run --output text spec.md

# Structured result, scriptable
bcc run --output json spec.md

# Embedded Web UI: SSE timeline, DAG view, MCP log
bcc run --webui spec.md      # serve only
bcc run -W spec.md           # serve and open the browser

# HTTP API + OpenAPI document at /api/v1/openapi.json
bcc run --api spec.md

--webui and --api share one listener. The Web UI implies the API. The dashboard is read-only in V1; mutations go through the TUI or the MCP log.
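
As a rough sketch of consuming the API from a script, the program below fetches the OpenAPI document and lists the routes it declares. The only detail taken from this README is the `/api/v1/openapi.json` path; the listener address is not documented here, so the base URL is passed as an argument, and the helper names `openAPIURL` and `listPaths` are hypothetical. It relies on `paths` being a standard top-level OpenAPI field.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"sort"
	"strings"
	"time"
)

// openAPIURL appends the documented OpenAPI path to the listener's base URL.
func openAPIURL(base string) string {
	return strings.TrimRight(base, "/") + "/api/v1/openapi.json"
}

// listPaths fetches the OpenAPI document and returns the route paths it
// declares, sorted alphabetically.
func listPaths(client *http.Client, base string) ([]string, error) {
	u := openAPIURL(base)
	resp, err := client.Get(u)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s: unexpected status %s", u, resp.Status)
	}

	// Decode only the "paths" object; the rest of the document is ignored.
	var doc struct {
		Paths map[string]json.RawMessage `json:"paths"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&doc); err != nil {
		return nil, err
	}

	paths := make([]string, 0, len(doc.Paths))
	for p := range doc.Paths {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	return paths, nil
}

func main() {
	if len(os.Args) < 2 {
		// The base URL is an assumption: pass whatever address the
		// bcc listener is actually bound to.
		fmt.Println("usage: openapi-paths <base-url>")
		return
	}
	client := &http.Client{Timeout: 5 * time.Second}
	paths, err := listPaths(client, os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

Run it while a `bcc run --api` session is active, passing the listener's base URL as the single argument.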

Configuration

bcc init writes a .bcc.toml. Key fields:

| Field | Purpose |
|---|---|
| project.language | Localized vocabulary (en, pt-BR). The wire protocol stays canonical English. |
| providers.<name> | Vendor adapter config (binary path, flags, sandbox mode). |
| roles.planner / roles.briefer / roles.executor / roles.reviewer | Per-role capability menus and provider selection. |
| journal.store | Where to keep session artifacts (defaults to .bcc/sessions/). |
| loop.max_iterations | Per-task retry budget before the loop escalates. |
| git.branch_prefix | Prefix applied when the loop creates a working branch. |

Pin a vendor at the CLI level with --provider claude (all roles) or --planner codex (one role). See docs/guides/director.md (pt-BR) for the full reference: sessions, MCP method surface, escalation flows, troubleshooting.
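
The effect of loop.max_iterations can be sketched as a small decision function. This is an assumption-laden illustration, not bcc's implementation: the `decide` function and the `Action` type are hypothetical, the verdict strings reuse the `task_approved` and `task_needs_fix` method names from the flowchart, and the sketch assumes the budget counts attempts inclusively.

```go
package main

import "fmt"

// Verdict mirrors the two Reviewer outcomes named in the flowchart.
type Verdict string

const (
	TaskApproved Verdict = "task_approved"
	TaskNeedsFix Verdict = "task_needs_fix"
)

// Action is a hypothetical name for what the loop does next.
type Action string

const (
	Advance  Action = "advance"
	Retry    Action = "retry"
	Escalate Action = "escalate"
)

// decide maps a Reviewer verdict and the number of attempts already spent
// on a task to the next loop action. maxIterations stands in for
// loop.max_iterations; treating it as an inclusive cap is an assumption.
func decide(v Verdict, attempts, maxIterations int) Action {
	switch {
	case v == TaskApproved:
		return Advance
	case attempts < maxIterations:
		return Retry
	default:
		return Escalate
	}
}

func main() {
	const maxIterations = 3
	// A task that keeps failing review: two retries, then escalation.
	for attempt := 1; attempt <= maxIterations; attempt++ {
		fmt.Println(attempt, decide(TaskNeedsFix, attempt, maxIterations))
	}
	// prints:
	// 1 retry
	// 2 retry
	// 3 escalate
}
```

On escalation, the operator would choose among the options listed under Features: resume_with_hint, force_approve, skip, or abort.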

Install

# macOS or Linux (Homebrew)
brew install fgmacedo/tap/buchecha

# Linux: detect OS/arch, verify checksum, install to /usr/local/bin
curl -sSL https://raw.githubusercontent.com/fgmacedo/buchecha/main/scripts/install.sh | sh

# Debian/Ubuntu: download the .deb from the latest release and install
curl -LO https://github.com/fgmacedo/buchecha/releases/latest/download/bcc_<version>_linux_amd64.deb
sudo apt install ./bcc_<version>_linux_amd64.deb

# Any OS with a Go toolchain
go install github.com/fgmacedo/buchecha/cmd/bcc@latest

When installed via tar.gz or install.sh on macOS, Gatekeeper may quarantine the binary on first run. Either install via brew (which avoids quarantine), or clear the attribute manually: xattr -d com.apple.quarantine $(command -v bcc).

Deeper documentation

Contributing

Prerequisites are pinned in .mise.toml. Install mise and let it manage them, or match versions yourself.

git clone https://github.com/fgmacedo/buchecha.git
cd buchecha
mise install            # Go 1.25.x and Node 22.x at pinned versions (reads .mise.toml in the repo)
go mod download
make build              # produces ./bcc with the embedded SPA bundle
./bcc --help

Inner loop:

make test               # unit suite
make test-race          # race detector; required before commits touching concurrent code
make vet                # go vet ./...
make lint               # golangci-lint
make fmt                # gofmt -w .

Pre-commit hooks run the same gates locally. Install once:

make precommit-install

See CLAUDE.md and AGENTS.md for layer rules, naming, error conventions, and the testing model. Read those before editing any package.

Roadmap

The current architecture is captured in docs/specs/director/index.md. Open enhancements live as GitHub issues tagged enhancement.

License

MIT. See LICENSE.

About

Turns a spec file into an autonomous, DAG-driven coding loop, with a Director that plans, briefs, and reviews each task.
