feat(runtime): reasoning-aware routerChatWithUsage #434

Merged
drewstone merged 1 commit into main from feat/reasoning-aware-router-client on Jul 2, 2026

@drewstone

What

Makes routerChatWithUsage safe for thinking models across providers:

  • opts.reasoningEffort?: 'none'|'low'|'medium'|'high' → forwarded as reasoning_effort. 'none' is the load-bearing value: binary/single-token decisions (routing, gating) on thinking models otherwise spend the entire token budget inside the think block — on CPU-local backends that turns into a client timeout, not just waste.
  • RouterChatResult.reasoning?: string + always-clean content: normalizes the two provider shapes (OpenRouter/DeepSeek-style separate reasoning/reasoning_content field; Groq/local-style inline <think> block). An unclosed think block (budget exhausted mid-thought) yields empty content — honest, since no final answer was emitted.
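As a minimal sketch of the first bullet, the opt-in forwarding could look like the following. The names `ChatOpts` and `buildChatBody` are hypothetical and only illustrate the "omitted unless explicitly set" behavior described in this PR; the runtime's real request builder may be shaped differently.

```typescript
// Hypothetical types and helper for illustration; not the runtime's actual exports.
type ReasoningEffort = 'none' | 'low' | 'medium' | 'high';

interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface ChatOpts {
  model: string;
  messages: ChatMessage[];
  temperature?: number;
  maxTokens?: number;
  reasoningEffort?: ReasoningEffort;
}

function buildChatBody(opts: ChatOpts): Record<string, unknown> {
  const body: Record<string, unknown> = {
    model: opts.model,
    messages: opts.messages,
  };
  if (opts.temperature !== undefined) body.temperature = opts.temperature;
  if (opts.maxTokens !== undefined) body.max_tokens = opts.maxTokens;
  // Add the key only when the caller set it, so requests that never
  // mention reasoning serialize exactly as they did before.
  if (opts.reasoningEffort !== undefined) {
    body.reasoning_effort = opts.reasoningEffort;
  }
  return body;
}
```

A routing or gating call would pass `reasoningEffort: 'none'`; every other call site stays untouched and sends no `reasoning_effort` key at all.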

Why (twice-hit, per the lab's lift rule)

Two real experiment casualties in agent-lab:

  • R230 (local routing): 20/20 audit cells died as "fetch failed": the model thought past the client timeout because the thinking-off flag had no path through the request body.
  • R231 (provider invariance): the same qwen3-32b weights scored "10/20 on Groq, 20/20 on OpenRouter" — a pure serialization artifact; single-token parsers were reading Groq's inlined reasoning prose. Raw-content autopsy confirmed the model's reasoning was correct on both providers.
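The R231 artifact can be reproduced in miniature. The parser and option tokens below are hypothetical (the actual agent-lab parsers are not shown in this PR); the sketch only assumes a single-token parser that takes the first option token it finds in `content`.

```typescript
// Hypothetical single-token parser: returns whichever option token
// appears earliest in the text, or undefined if none appears.
function naiveDecision(content: string, options: string[]): string | undefined {
  let best: string | undefined;
  let bestIdx = Infinity;
  for (const opt of options) {
    const idx = content.indexOf(opt);
    if (idx !== -1 && idx < bestIdx) {
      best = opt;
      bestIdx = idx;
    }
  }
  return best;
}

const options = ['ROUTE_A', 'ROUTE_B'];

// Separate-field shape: content holds only the final answer.
const separateShape = 'ROUTE_B';

// Inline shape: the reasoning prose quotes both tokens before the answer.
const inlineShape =
  '<think>ROUTE_A looks cheaper, but the task needs tools, so ROUTE_B.</think>\nROUTE_B';

const fromSeparate = naiveDecision(separateShape, options); // 'ROUTE_B'
const fromInline = naiveDecision(inlineShape, options); // 'ROUTE_A' (misread)
```

The model's final answer is identical in both strings; only the serialization differs, which is why stripping the think block before parsing removes the discrepancy.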

Compatibility

Additive only. No call-site changes; non-thinking responses byte-identical; reasoning_effort omitted from the body unless explicitly set. 9 new tests; runtime suite green; typecheck green.

🤖 Generated with Claude Code

Two gaps, each hit twice by real experiment runs in agent-lab (R230 local
routing, R231 provider-invariance grid):

1. reasoning controls were dropped: the request body forwarded only
   model/messages/temperature/max_tokens, so a thinking model on a binary
   decision burned its whole budget inside the think block; on a slow
   (CPU-local) backend that becomes a client timeout, and 20/20 audit
   cells died as "fetch failed". opts.reasoningEffort now forwards as
   reasoning_effort ('none' is the load-bearing value for routing/gating).

2. reasoning and content were conflated: OpenRouter returns reasoning in
   a separate field with clean content; Groq inlines a <think> block into
   content. Downstream single-token parsers read the reasoning prose
   (which quotes both option tokens) and misread decisions, making the
   SAME weights look broken on one provider and fine on another (R231:
   groq qwen3-32b "10/20" vs openrouter "20/20", both actually correct).
   parseChatResult now splits both shapes into RouterChatResult.reasoning
   and always-clean content; an unclosed think block (budget exhausted
   mid-thought) yields empty content, which is honest: no answer was
   emitted.
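The normalization in item 2 can be sketched as below. This is an illustration under stated assumptions, not the actual `parseChatResult`: the function name `splitReasoning`, the `RawMessage` shape, the trimming of the extracted pieces, and the preference for a separate reasoning field over inline text are all hypothetical choices.

```typescript
// Hypothetical shapes; field names follow the provider fields named in this PR.
interface RawMessage {
  content?: string | null;
  reasoning?: string | null;          // OpenRouter-style
  reasoning_content?: string | null;  // DeepSeek/Kimi-style
}

interface SplitResult {
  content: string;
  reasoning?: string;
}

const OPEN = '<think>';
const CLOSE = '</think>';

function splitReasoning(msg: RawMessage): SplitResult {
  const raw = msg.content ?? '';
  const separate = msg.reasoning ?? msg.reasoning_content ?? undefined;

  const open = raw.indexOf(OPEN);
  if (open === -1) {
    // No inline block: content passes through untouched.
    return separate ? { content: raw, reasoning: separate } : { content: raw };
  }

  const bodyStart = open + OPEN.length;
  const close = raw.indexOf(CLOSE, bodyStart);
  if (close === -1) {
    // Unclosed block: the budget ran out mid-thought, so no answer exists.
    return { content: '', reasoning: separate ?? raw.slice(bodyStart).trim() };
  }

  const inline = raw.slice(bodyStart, close).trim();
  const content = (raw.slice(0, open) + raw.slice(close + CLOSE.length)).trim();
  return { content, reasoning: separate ?? inline };
}
```

Returning empty `content` for the unclosed case lets a downstream parser fail loudly on a missing answer instead of picking a token out of unfinished reasoning.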

Additive: no call-site changes required; non-thinking responses are
byte-identical. 9 new tests cover effort forwarding, both provider
shapes, reasoning_content (DeepSeek/Kimi), unclosed think, and the
unchanged non-thinking path.

@tangletools tangletools left a comment

✅ Auto-approved drewstone PR — a178dde1

This PR was opened by the trusted drewstone account.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: drewstone_author · 2026-07-02T02:38:26Z

@drewstone drewstone merged commit a08877a into main Jul 2, 2026
1 check failed
@drewstone drewstone deleted the feat/reasoning-aware-router-client branch July 2, 2026 02:50