chore(release): agent-eval 0.100.3#295

Merged
drewstone merged 3 commits into main from feat/product-benchmark-contract
Jul 1, 2026
@drewstone

Summary

  • bump agent-eval and agent-eval-rpc to 0.100.3
  • publish the already-merged product-benchmark contract in the next tag/release
  • add changelog entry for @tangle-network/agent-eval/product-benchmark

Checks

  • pnpm typecheck
  • pnpm test (256 files, 2633 passed, 2 skipped)
  • python -m pip install -e .[dev] && pytest -q (18 passed)
  • NODE_OPTIONS=--max-old-space-size=8192 pnpm build
  • pnpm run verify:package
  • git merge-tree --write-tree origin/main HEAD
  • npm view @tangle-network/agent-eval@0.100.3 version returned 404 before publish

tangletools previously approved these changes Jul 1, 2026

@tangletools tangletools left a comment

✅ Auto-approved drewstone PR — 4b2fd5a9

This PR was opened by the trusted drewstone account.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: drewstone_author · 2026-07-01T15:09:51Z

@tangletools tangletools left a comment

🔴 Value Audit — redundant-or-flawed

Verdict: redundant-or-flawed
Concerns: 2 (2 strong-concern)
Heuristic: 0.0s
Duplication: 0.0s
Interrogation: 244.0s (2 bridge agents)
Total: 244.0s

💰 Value — redundant-or-flawed

Bumps package versions to 0.100.3 consistently, but mishandles CHANGELOG.md by replacing the [Unreleased] section instead of moving its entries, causing a merge conflict with origin/main and dropping release notes for changes that will ship in the same release.

  • What it does: Bumps @tangle-network/agent-eval and agent-eval-rpc from 0.100.2 to 0.100.3 in package.json:3, clients/python/pyproject.toml:7, and the PackageNotFoundError fallback in clients/python/src/agent_eval_rpc/__init__.py:61; adds a [0.100.3] entry to CHANGELOG.md:7 for the product-benchmark subpath.
  • Goals it achieves: Publishes the already-merged product-benchmark subpath (package.json:117-120, tsup.config.ts:26) under a new tag and keeps the npm and PyPI package versions locked, correcting the 0.100.2 release where __init__.py was left behind and needed a follow-up fix (1aaa5f6).
  • Assessment: The version bumps are correct and complete (all three version locations are now in sync). However, the CHANGELOG.md edit replaces the [Unreleased] section that existed on origin/main rather than moving those entries into [0.100.3]. As a result, release notes for the eval-fixture/campaign changes (47d2a4e) are lost even though that code is on main and will ship with 0.100.3. Additionally,
  • Better / existing approach: Preserve the existing [Unreleased] entries by moving them under the new [0.100.3] section alongside the product-benchmark note, then resolve the merge with origin/main. This follows the Keep a Changelog convention already used in the file and ensures the release notes cover everything in the tag.
  • Model: opencode/kimi-for-coding/k2p7
  • Bridge attempts: 1
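The three version locations named above (package.json, clients/python/pyproject.toml, and the fallback literal in `__init__.py`) could be checked for drift before tagging. The following is only a sketch under stated assumptions: the helper names are hypothetical, and the fallback is assumed to be written as `__version__ = "x.y.z"`, which the PR does not confirm.

```python
from __future__ import annotations

import json
import re


def extract_versions(package_json: str, pyproject_toml: str, init_py: str) -> dict[str, str | None]:
    """Pull the version literal out of each file's text (None if not found)."""
    npm_version = json.loads(package_json).get("version")

    # Assumes a top-level `version = "..."` line in pyproject.toml.
    py_match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_toml, re.MULTILINE)

    # ASSUMPTION: the fallback is a quoted literal assigned to __version__.
    init_match = re.search(r'__version__\s*=\s*"(\d+\.\d+\.\d+)"', init_py)

    return {
        "package.json": npm_version,
        "pyproject.toml": py_match.group(1) if py_match else None,
        "__init__.py": init_match.group(1) if init_match else None,
    }


def out_of_sync(versions: dict[str, str | None], expected: str) -> list[str]:
    """Return the files whose version differs from the expected release."""
    return sorted(name for name, found in versions.items() if found != expected)
```

A release script could call `out_of_sync(...)` with the intended version and refuse to tag if the returned list is non-empty, which would catch a left-behind fallback like the one described for 0.100.2.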

🎯 Usefulness — sound

A clean release-mechanics bump to 0.100.3 that publishes the already-merged product-benchmark subpath; versions are in sync across all three files and the new module is correctly wired into exports, build, and the verify script.

  • Integration: Fully reachable. The product-benchmark subpath being published is wired into package.json exports (line 117), tsup build config (tsup.config.ts:26), and the verify-package-exports guard script (lines 37, 112). All three version literals (package.json:3, pyproject.toml:7, agent_eval_rpc/__init__.py:61) are synchronized at 0.100.3. Imminent caller per CHANGELOG: product agents consuming @tangle-netw
  • Fit with existing patterns: Follows the established release pattern exactly — identical shape to the 0.100.2 (3ba4252) and 0.100.1 (223b7f8) releases, and the subpath-export + re-export-from-src/index.ts + verify-package-exports triplet is the same pattern used by ./perf, ./multishot, ./campaign, and every other substrate subpath. No competing or duplicate surface.
  • Real-world viability: Version-literal bump plus CHANGELOG entry; no runtime logic changed. The __init__.py fallback literal at line 61 only fires when package metadata is absent (dev/egg install), which is the intended robustness behavior established by fix #294. No concurrency, error-path, or edge-input surface to evaluate.
  • Model: opencode/zai-coding-plan/glm-5.2
  • Bridge attempts: 1
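For readers unfamiliar with the `PackageNotFoundError` fallback mentioned above, the general pattern looks roughly like this. It is a minimal sketch using the standard-library `importlib.metadata` API, not the actual contents of `agent_eval_rpc/__init__.py`; `resolve_version` and `FALLBACK_VERSION` are hypothetical names.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical constant: the literal that must be bumped with each release.
FALLBACK_VERSION = "0.100.3"


def resolve_version(dist_name: str, fallback: str = FALLBACK_VERSION) -> str:
    """Prefer installed package metadata; use the literal only when it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Metadata is missing, e.g. when running from a source checkout
        # that was never installed.
        return fallback
```

Because the literal is only reached when metadata is missing, a stale value goes unnoticed in normal installs, which is why it is easy to leave behind during a bump.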

💰 Value Audit

🔴 Branch has a CHANGELOG.md merge conflict with origin/main [against-grain]

git merge-tree --write-tree origin/main HEAD exits 1 with CONFLICT (content): Merge conflict in CHANGELOG.md. The release commit cannot land cleanly as-is. Evidence: merge-tree output shows three CHANGELOG.md versions (base, main, HEAD) and auto-merge fails.
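The exit-code check described above could be wrapped in a small pre-merge gate. This is a sketch, assuming Git 2.38 or newer (where `git merge-tree --write-tree` exits 0 for a clean merge and 1 for conflicts); `merge_tree_status` is a hypothetical helper, and the `run` parameter exists only so the function can be exercised without a real repository.

```python
import subprocess


def merge_tree_status(target: str = "origin/main", branch: str = "HEAD", run=subprocess.run) -> str:
    """Classify a trial merge of `branch` into `target` without touching the worktree."""
    result = run(
        ["git", "merge-tree", "--write-tree", target, branch],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return "clean"
    if result.returncode == 1:
        # Conflict details (e.g. "CONFLICT (content): ...") are on stdout.
        return "conflict"
    # Any other status means the merge could not be attempted at all.
    raise RuntimeError(f"git merge-tree failed: {result.stderr.strip()}")
```

Treating "conflict" as a failed check would have flagged the CHANGELOG.md conflict before the release commit was approved.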

🔴 CHANGELOG replaces [Unreleased] instead of moving its entries into [0.100.3] [against-grain]

origin/main has an [Unreleased] section documenting eval-fixture UX, planCampaignRun, dispatchRef, and a manifestHash resumability fix (47d2a4e, CHANGELOG.md:7-22 on main). The release commit removes all of those notes and replaces them with a single product-benchmark line (CHANGELOG.md:7-12 on HEAD). Since the code for those changes is on main and will be in the 0.100.3 tag, the release notes are now incomplete.
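The "move, don't replace" step the audit asks for can be sketched as a text transform. This assumes Keep a Changelog style headings of the form `## [Unreleased]` and `## [x.y.z] - date`; the exact heading format in this repository's CHANGELOG.md is not shown in the PR, and `promote_unreleased` is a hypothetical helper.

```python
import re

# Matches the [Unreleased] heading and everything up to the next release heading.
_UNRELEASED = re.compile(
    r"^## \[Unreleased\][ \t]*\n(?P<body>.*?)(?=^## \[|\Z)",
    re.MULTILINE | re.DOTALL,
)


def promote_unreleased(changelog: str, version: str, date: str, extra_notes: str = "") -> str:
    """Move existing [Unreleased] entries under a new release heading.

    The old entries are kept and `extra_notes` is appended after them, so
    nothing already documented is dropped from the release notes.
    """
    match = _UNRELEASED.search(changelog)
    if match is None:
        raise ValueError("no [Unreleased] section found")

    carried_over = match.group("body").strip("\n")
    parts = [part for part in (carried_over, extra_notes.strip("\n")) if part]

    release = (
        "## [Unreleased]\n\n"
        f"## [{version}] - {date}\n\n"
        + "\n".join(parts)
        + "\n\n"
    )
    return changelog[: match.start()] + release + changelog[match.end():]
```

Applied to the changelog on main, a transform like this would keep the eval-fixture, planCampaignRun, dispatchRef, and manifestHash notes and add the product-benchmark line beside them, leaving an empty [Unreleased] section for future work.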


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

Pass | What it asks
Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only)
Duplication | Do added function/class names already exist elsewhere in the repo?
Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists?
Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used?

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260701T151654Z

tangletools previously approved these changes Jul 1, 2026

@tangletools tangletools left a comment

✅ Auto-approved drewstone PR — 63d7e3e9

This PR was opened by the trusted drewstone account.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: drewstone_author · 2026-07-01T15:17:22Z

@tangletools tangletools left a comment

✅ Auto-approved drewstone PR — 95a61cb5

This PR was opened by the trusted drewstone account.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: drewstone_author · 2026-07-01T15:21:20Z

@drewstone drewstone merged commit 20c7609 into main Jul 1, 2026
1 check passed