docs(examples): DX pass — fix the audit's worst offenders by drewstone · Pull Request #428 · tangle-network/agent-runtime

drewstone · 2026-07-01T02:28:54Z

From a 24-example DX audit: parallel read-only reviewers scored docs / code-simplicity / DX, flagged issues, and gave a recommendation each. Result: 4 exemplary, 17 good, 3 needs-work, 0 broken, 0 redundant, so the example set is healthy. This PR lands the highest-leverage, lowest-risk fixes: documentation truth.

- self-improving-coder: adds the missing README (it was the only one of the 28 example dirs without one, so it was absent from the catalog). Documents the honest-null behavior (the bundled task is saturated, so the gate correctly returns no promotion), how to see a real lift (swap in the harder env / SWE-bench), and the $0 CALIBRATE=1 check.
- researcher-loop: fixes two false claims. The README said agent-knowledge "ships in node_modules already" (it does not; it is an optional peer you install, and the example is excluded from CI typecheck), and it wrongly called itself "the primary, smallest runLoop example" (that is driver-loop).
- examples/README catalog: adds the 3 missing entries: self-improving-coder (17b), intelligence-webcode (20b), webcode-matrix (9c).

Backlog (code simplifications — next, if you want them)

| Example | Fix | Effort |
| --- | --- | --- |
| intelligence-drop-in | prove the OFF `$0`-clamp by reading the span back (headline currently asserted, not shown) | small |
| supervisor-loop | collapse ~80%-duplicate `run-bridge`/`run-sandbox` into one `BACKEND` knob (matches its own thesis) | medium |
| strategy-suite | authored `doubleCheck` duplicates built-in `refine`; make it show a policy the built-ins can't | small |
| self-improving-loop | the n=3 "small-n mirage" warning is repeated 4× | small |
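The intelligence-drop-in item asks for the `$0` claim to be shown by reading the exported span back rather than asserted. A minimal sketch of that read-back check follows. Everything here is an assumption for illustration: the `ExportedSpan` shape, the `InMemoryCollector` stand-in, and the `tier` attribute are hypothetical; only the `intelligence_usd` attribute name comes from this thread, and the example's real exporter and collector are not shown in this PR.

```typescript
// Hypothetical span shape; the real exporter's schema is not shown in this PR.
interface ExportedSpan {
  name: string;
  attributes: Record<string, string | number | boolean>;
}

// Hypothetical stand-in for a collector that keeps whatever was flushed to it.
class InMemoryCollector {
  private readonly spans: ExportedSpan[] = [];

  receive(span: ExportedSpan): void {
    this.spans.push(span);
  }

  all(): readonly ExportedSpan[] {
    return this.spans;
  }
}

// Read every span back and throw unless each one reports zero intelligence spend.
// An empty collector also throws: no spans means nothing was proven.
function assertZeroIntelligenceSpend(collector: InMemoryCollector): number {
  const spans = collector.all();
  if (spans.length === 0) {
    throw new Error("no spans exported; nothing was proven");
  }
  for (const span of spans) {
    const usd = span.attributes["intelligence_usd"];
    if (typeof usd !== "number" || usd !== 0) {
      throw new Error(`span "${span.name}" reports intelligence_usd=${String(usd)}`);
    }
  }
  return spans.length;
}

// Usage: record one span at the OFF tier, then read it back.
const collector = new InMemoryCollector();
collector.receive({ name: "inference", attributes: { tier: "OFF", intelligence_usd: 0 } });
const checked = assertZeroIntelligenceSpend(collector);
```

Throwing on failure, instead of logging, is what turns the headline from a statement in the README into something the run itself enforces.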

Docs-only; no code touched.

…ADME, false claims, catalog gaps)

From a 24-example DX audit (4 exemplary, 17 good, 3 needs-work, 0 broken/redundant). This lands the highest-leverage, lowest-risk fixes (documentation truth):

- self-improving-coder: ADD the missing README (the only one of 28 example dirs without one, so it was invisible in the catalog). Documents the honest-null behavior (bundled task is saturated → gate correctly returns no-promotion) + how to see a real lift (swap the harder env / SWE-bench) + the $0 CALIBRATE mode.
- researcher-loop: FIX two false claims: the README said agent-knowledge 'ships in node_modules already' (it does NOT; it's an optional peer you install, and the example is excluded from CI typecheck), and it wrongly called itself 'the primary, smallest runLoop example' (that's driver-loop). Both corrected.
- examples/README catalog: add the 3 missing entries: self-improving-coder (17b), intelligence-webcode (20b), webcode-matrix (9c).

Backlog (code simplifications, next): intelligence-drop-in (prove the OFF $0-clamp by reading the span back), supervisor-loop (collapse the ~80%-duplicate run-bridge/run-sandbox into one BACKEND knob), strategy-suite (authored strategy duplicates built-in refine), self-improving-loop (n=3 warning ×4).

tangletools

✅ Auto-approved drewstone PR — `e365da29`

This PR was opened by the trusted drewstone account.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: drewstone_author · 2026-07-01T02:29:01Z}

…oven, simplified (#430)

Clears the remaining backlog from the examples DX audit (#428). Each fix verified (tc 0, lint clean; the offline-runnable ones run end-to-end $0).

- intelligence-drop-in: the headline (OFF tier = $0 intelligence spend) was ASSERTED, never shown. Now the example records inference-only at OFF, flushes, and reads the exported span BACK off the collector to PROVE intelligence_usd=0 (throws otherwise). Dropped the confusing double-client ping ceremony; README rewritten to match (glosses Mode 0 / inference-vs-intelligence, adds a 'Going live' section). Runs $0.
- strategy-suite: the authored doubleCheck was a near-clone of built-in refine. Rewrote it as a policy the built-ins DON'T have: require TWO consecutive passes before done (a flake/luck guard), which is the real point of defineStrategy. Fixed the README (it listed 4 built-ins but the run compares 2). Runs $0.
- agents-of-all-shapes: shipToTangleOtlp hand-rolled the OTLP resourceSpans envelope, the exact wire createOtelExporter produces (the primitive the example itself teaches). Routed it through createOtelExporter (~45 lines → ~25). Aligned npx→pnpm. Runs $0.
- self-improving-loop: the n=3 'small-n mirage' warning was restated 4×; trimmed to one authoritative block (the ⚠️ at the gate) + brief pointers.
- agentic-data-creation: softened an unverifiable future-dated citation (arXiv 2606.25996 / 'Meta FAIR' / 'the paper's Table 1' with specific figures) stated as empirical fact → the real technique name (self-instruct, Wang et al. 2022) + 'illustrative target' framing for the by-construction numbers.
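The strategy-suite policy described in that commit message (require two consecutive passes before declaring done) can be sketched as a small stateful check. This is an illustrative assumption, not the example's code: `requireConsecutivePasses` is a hypothetical helper, and it deliberately does not use `defineStrategy`, whose signature is not shown in this thread.

```typescript
// Hypothetical helper, not the library's defineStrategy API.
// Returns a check that reports "done" only after `required` passes in a row;
// any failure resets the streak, so a single lucky or flaky pass is not enough.
function requireConsecutivePasses(required: number): (passed: boolean) => boolean {
  let streak = 0;
  return (passed: boolean): boolean => {
    streak = passed ? streak + 1 : 0;
    return streak >= required;
  };
}

// Usage: pass, fail, pass, pass. Only the final attempt completes the streak.
const isDone = requireConsecutivePasses(2);
const outcomes = [true, false, true, true];
const doneAt = outcomes.map((passed) => isDone(passed));
// doneAt is [false, false, false, true]
```

Resetting the streak on any failure is the part a single-pass check cannot express, which is why it works as a flake guard.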

tangletools approved these changes Jul 1, 2026

drewstone merged commit 1f6e1e7 into main Jul 1, 2026
1 check passed

This was referenced Jul 1, 2026

refactor(examples): collapse supervisor-loop's duplicate runners into one WORKER_BACKEND knob #429

Merged

docs(examples): finish the DX-audit backlog — 5 examples polished, proven, simplified #430

Merged

drewstone merged 1 commit into main from chore/examples-dx-pass
