docs(examples): DX pass — fix the audit's worst offenders #428

Merged · drewstone merged 1 commit into main from chore/examples-dx-pass · Jul 1, 2026

@drewstone

From a 24-example DX audit (parallel read-only reviewers scored docs / code-simplicity / DX, flagged issues + a recommendation each). Result: 4 exemplary, 17 good, 3 needs-work, 0 broken, 0 redundant — the example set is healthy. This PR lands the highest-leverage, lowest-risk fixes: documentation truth.

  • self-improving-coder: adds the missing README (the only one of 28 example dirs without one, so it was absent from the catalog). Documents the honest-null behavior (the bundled task is saturated → the gate correctly returns no promotion), how to see a real lift (swap in the harder env / SWE-bench), and the $0 CALIBRATE=1 check.
  • researcher-loop: fixes two false claims. The README said agent-knowledge "ships in node_modules already" (it does not; it is an optional peer you install, and the example is excluded from CI typecheck), and it wrongly called itself "the primary, smallest runLoop example" (that is driver-loop).
  • examples/README catalog — adds the 3 missing entries: self-improving-coder (17b), intelligence-webcode (20b), webcode-matrix (9c).

Backlog (code simplifications — next, if you want them)

| Example | Fix | Effort |
|---|---|---|
| intelligence-drop-in | prove the OFF $0-clamp by reading the span back (headline currently asserted, not shown) | small |
| supervisor-loop | collapse ~80%-duplicate run-bridge/run-sandbox into one BACKEND knob (matches its own thesis) | medium |
| strategy-suite | authored doubleCheck duplicates built-in refine; make it show a policy the built-ins can't | small |
| self-improving-loop | the n=3 "small-n mirage" warning is repeated 4× | small |
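
To make the supervisor-loop item concrete, here is a minimal sketch of what "one BACKEND knob" could look like: a single entry point that picks a runner from one setting instead of keeping two near-identical scripts. The `Runner` interface, the `runners` table, `parseBackend`, `selectRunner`, and the default of `"bridge"` are all hypothetical and not taken from the repository. A real script would presumably read the value from an environment variable.

```typescript
// Hypothetical sketch: backend names mirror run-bridge / run-sandbox,
// but the Runner shape and helper names are assumptions for illustration.
type Backend = "bridge" | "sandbox";

interface Runner {
  run(task: string): Promise<string>;
}

// The only backend-specific code lives in this table.
const runners: Record<Backend, Runner> = {
  bridge: { run: async (task) => `bridge ran: ${task}` },
  sandbox: { run: async (task) => `sandbox ran: ${task}` },
};

// Validate the knob once, at the edge. Defaulting to "bridge" is an assumption.
function parseBackend(raw: string | undefined): Backend {
  const value = (raw ?? "bridge").toLowerCase();
  if (value === "bridge" || value === "sandbox") return value;
  throw new Error(`BACKEND must be "bridge" or "sandbox", got "${String(raw)}"`);
}

function selectRunner(raw: string | undefined): Runner {
  return runners[parseBackend(raw)];
}

// Shared flow: everything after selection is identical for both backends.
selectRunner("sandbox")
  .run("demo task")
  .then((out) => console.log(out)); // prints "sandbox ran: demo task"
```

Keeping the shared flow in one place means the two backends can only diverge inside the `runners` table.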

Docs-only; no code touched.

…ADME, false claims, catalog gaps)

From a 24-example DX audit (4 exemplary, 17 good, 3 needs-work, 0 broken/redundant). This lands the
highest-leverage, lowest-risk fixes — documentation truth:

- self-improving-coder: ADD the missing README (the only one of 28 example dirs without one, so it was
  invisible in the catalog). Documents the honest-null behavior (bundled task is saturated → gate correctly
  returns no-promotion) + how to see a real lift (swap the harder env / SWE-bench) + the $0 CALIBRATE mode.
- researcher-loop: FIX two false claims — the README said agent-knowledge 'ships in node_modules already'
  (it does NOT — it's an optional peer you install; the example is excluded from CI typecheck), and it
  wrongly called itself 'the primary, smallest runLoop example' (that's driver-loop). Both corrected.
- examples/README catalog: add the 3 missing entries — self-improving-coder (17b), intelligence-webcode
  (20b), webcode-matrix (9c).

Backlog (code simplifications, next): intelligence-drop-in (prove the OFF $0-clamp by reading the span
back), supervisor-loop (collapse the ~80%-duplicate run-bridge/run-sandbox into one BACKEND knob),
strategy-suite (authored strategy duplicates built-in refine), self-improving-loop (n=3 warning ×4).
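
As a rough illustration of the intelligence-drop-in item, the sketch below shows the difference between asserting a $0 spend and proving it: record a span, then read it back from the collector and throw if the spend is not zero. The in-memory collector, the span shape, and the helper names are hypothetical stand-ins. Only the `intelligence_usd` attribute name comes from the follow-up commit message, and the dollar amounts are made-up inputs.

```typescript
// Hypothetical sketch: InMemoryCollector, RecordedSpan, and both helpers are
// illustrative stand-ins, not the example's real API.
interface RecordedSpan {
  name: string;
  attributes: Record<string, number | string>;
}

// Minimal stand-in for a telemetry collector that spans are exported to.
class InMemoryCollector {
  private readonly spans: RecordedSpan[] = [];

  receive(span: RecordedSpan): void {
    this.spans.push(span);
  }

  find(name: string): RecordedSpan | undefined {
    return this.spans.find((span) => span.name === name);
  }
}

// At the OFF tier the intelligence spend is clamped to 0 before export.
function recordInference(
  collector: InMemoryCollector,
  tier: "off" | "on",
  inferenceUsd: number,
  intelligenceUsd: number,
): void {
  collector.receive({
    name: "inference",
    attributes: {
      tier,
      inference_usd: inferenceUsd,
      intelligence_usd: tier === "off" ? 0 : intelligenceUsd,
    },
  });
}

// Read the span back and fail loudly unless the recorded spend is exactly 0.
function assertZeroIntelligenceSpend(
  collector: InMemoryCollector,
  spanName: string,
): RecordedSpan {
  const span = collector.find(spanName);
  if (span === undefined) {
    throw new Error(`span "${spanName}" was never exported`);
  }
  const spend = span.attributes["intelligence_usd"];
  if (spend !== 0) {
    throw new Error(`expected intelligence_usd=0, got ${String(spend)}`);
  }
  return span;
}

const collector = new InMemoryCollector();
recordInference(collector, "off", 0.002, 0.05); // made-up amounts
const proven = assertZeroIntelligenceSpend(collector, "inference");
console.log(proven.attributes);
```

The check runs against what was actually recorded, so a regression in the clamp fails the example instead of leaving a stale claim in the README.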

@tangletools tangletools left a comment

✅ Auto-approved drewstone PR — e365da29

This PR was opened by the trusted drewstone account.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: drewstone_author · 2026-07-01T02:29:01Z

@drewstone drewstone merged commit 1f6e1e7 into main Jul 1, 2026
1 check passed
drewstone added a commit that referenced this pull request Jul 1, 2026
…oven, simplified (#430)

Clears the remaining backlog from the examples DX audit (#428). Each fix verified (tc 0, lint clean;
the offline-runnable ones run end-to-end $0).

- intelligence-drop-in: the headline (OFF tier = $0 intelligence spend) was ASSERTED, never shown. Now
  the example records inference-only at OFF, flushes, and reads the exported span BACK off the collector to
  PROVE intelligence_usd=0 (throws otherwise). Dropped the confusing double-client ping ceremony; README
  rewritten to match (glosses Mode 0 / inference-vs-intelligence, adds a 'Going live' section). Runs $0.
- strategy-suite: the authored doubleCheck was a near-clone of built-in refine. Rewrote it as a policy the
  built-ins DON'T have — require TWO consecutive passes before done (a flake/luck guard) — the real point
  of defineStrategy. Fixed the README (it listed 4 built-ins but the run compares 2). Runs $0.
- agents-of-all-shapes: shipToTangleOtlp hand-rolled the OTLP resourceSpans envelope — the exact wire
  createOtelExporter produces (the primitive the example itself teaches). Routed it through
  createOtelExporter (~45 lines → ~25). Aligned npx→pnpm. Runs $0.
- self-improving-loop: the n=3 'small-n mirage' warning was restated 4×; trimmed to one authoritative
  block (the ⚠️ at the gate) + brief pointers.
- agentic-data-creation: softened an unverifiable future-dated citation (arXiv 2606.25996 / 'Meta FAIR' /
  'the paper's Table 1' with specific figures) stated as empirical fact → the real technique name
  (self-instruct, Wang et al. 2022) + 'illustrative target' framing for the by-construction numbers.
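
The "two consecutive passes" policy from the strategy-suite item can be sketched independently of the library. The code below does not use `defineStrategy` or any real API from the repository: `StopPolicy`, `requireConsecutivePasses`, and `stoppingAttempt` are hypothetical names chosen for illustration. It only shows the core idea that a single pass is not trusted until it is repeated.

```typescript
// Hypothetical sketch: these names are illustrative and not part of the
// library. The policy reports "done" only after N passes in a row.
type Verdict = "pass" | "fail";

interface StopPolicy {
  /** Feed one attempt's verdict; returns true once the loop may stop. */
  observe(verdict: Verdict): boolean;
}

function requireConsecutivePasses(needed: number): StopPolicy {
  let streak = 0;
  return {
    observe(verdict: Verdict): boolean {
      // Any failure resets the streak, so one lucky pass cannot end the run.
      streak = verdict === "pass" ? streak + 1 : 0;
      return streak >= needed;
    },
  };
}

// Returns the 1-based attempt at which the policy stops, or null if it never does.
function stoppingAttempt(verdicts: Verdict[], policy: StopPolicy): number | null {
  let attempt = 0;
  for (const verdict of verdicts) {
    attempt += 1;
    if (policy.observe(verdict)) return attempt;
  }
  return null;
}

// A flaky run: the first pass is followed by a failure, so it does not count.
const flaky: Verdict[] = ["pass", "fail", "pass", "pass"];
console.log(stoppingAttempt(flaky, requireConsecutivePasses(2))); // prints 4
```

A stop-on-first-pass policy would stop this run at attempt 1. Requiring two in a row defers the stop to attempt 4, which is the flake guard the commit message describes.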