feat(runtime): reasoning-aware routerChatWithUsage #434
Merged
Two gaps, each hit twice by real experiment runs in agent-lab (R230 local
routing, R231 provider-invariance grid):
1. reasoning controls were droppable: the request body forwarded only
model/messages/temperature/max_tokens, so a thinking model on a binary
decision burned its whole budget inside the think block; on a slow
(CPU-local) backend that becomes a client timeout, and 20/20 audit
cells died as "fetch failed". opts.reasoningEffort now forwards as
reasoning_effort ('none' is the load-bearing value for routing/gating).
2. reasoning and content were conflated: OpenRouter returns reasoning in
a separate field with clean content; Groq inlines a <think> block into
content. Downstream single-token parsers read the reasoning prose
(which quotes both option tokens) and misread decisions, making the
SAME weights look broken on one provider and fine on another (R231:
groq qwen3-32b "10/20" vs openrouter "20/20", both actually correct).
parseChatResult now splits both shapes into RouterChatResult.reasoning
and always-clean content; an unclosed think block (budget exhausted
mid-thought) yields empty content, which is honest: no answer was
emitted.
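
The split described in item 2 can be sketched roughly as follows. This is a minimal illustration, not the merged parseChatResult: the names `ChatMessage`, `SplitResult`, and `splitReasoning` are hypothetical, and only the two response shapes named above (a separate `reasoning`/`reasoning_content` field, or an inline `<think>` block) are assumed.

```typescript
// Hypothetical shapes for illustration; the real RouterChatResult may differ.
interface ChatMessage {
  content?: string | null;
  reasoning?: string | null;         // separate-field shape (OpenRouter-style)
  reasoning_content?: string | null; // separate-field shape (DeepSeek/Kimi-style)
}

interface SplitResult {
  content: string;    // always free of think-block text
  reasoning?: string; // present only when the model emitted reasoning
}

const OPEN = "<think>";
const CLOSE = "</think>";

function splitReasoning(msg: ChatMessage): SplitResult {
  const raw = msg.content ?? "";
  const separate = msg.reasoning ?? msg.reasoning_content ?? undefined;

  const start = raw.indexOf(OPEN);
  if (start === -1) {
    // No inline block: content passes through untouched.
    return separate ? { content: raw, reasoning: separate } : { content: raw };
  }

  const end = raw.indexOf(CLOSE, start + OPEN.length);
  if (end === -1) {
    // Unclosed block: the budget ran out mid-thought, so no answer exists.
    const partial = raw.slice(start + OPEN.length).trim();
    return { content: "", reasoning: separate ?? partial };
  }

  // Closed inline block: lift the thought out and keep the rest as content.
  const inline = raw.slice(start + OPEN.length, end).trim();
  const content = (raw.slice(0, start) + raw.slice(end + CLOSE.length)).trim();
  return { content, reasoning: separate ?? inline };
}
```

With this shape, a single-token parser that reads only `content` sees the same answer whichever provider served the model.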
Additive: no call-site changes required; non-thinking responses are
byte-identical. 9 new tests cover effort forwarding, both provider
shapes, reasoning_content (DeepSeek/Kimi), unclosed think, and the
unchanged non-thinking path.
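
As a rough sketch of the effort forwarding from item 1, assuming an OpenAI-compatible request body: `buildBody`, `ChatOpts`, and the `temperature`/`maxTokens` option names are hypothetical, and only `reasoningEffort` → `reasoning_effort` comes from this change. The field is added only when the caller sets it, so existing requests serialize exactly as before.

```typescript
type ReasoningEffort = "none" | "low" | "medium" | "high";

// Hypothetical option and body shapes, for illustration only.
interface ChatOpts {
  temperature?: number;
  maxTokens?: number;
  reasoningEffort?: ReasoningEffort;
}

interface ChatRequestBody {
  model: string;
  messages: { role: string; content: string }[];
  temperature?: number;
  max_tokens?: number;
  reasoning_effort?: ReasoningEffort;
}

function buildBody(
  model: string,
  messages: { role: string; content: string }[],
  opts: ChatOpts = {},
): ChatRequestBody {
  const body: ChatRequestBody = { model, messages };
  if (opts.temperature !== undefined) body.temperature = opts.temperature;
  if (opts.maxTokens !== undefined) body.max_tokens = opts.maxTokens;
  // Forward the reasoning control only when explicitly requested.
  if (opts.reasoningEffort !== undefined) body.reasoning_effort = opts.reasoningEffort;
  return body;
}

// A routing or gating call would ask for no thinking at all:
const gateBody = buildBody(
  "qwen3-32b",
  [{ role: "user", content: "Answer A or B." }],
  { reasoningEffort: "none", maxTokens: 4 },
);
```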
tangletools
approved these changes
Jul 2, 2026
tangletools
left a comment
Contributor
✅ Auto-approved drewstone PR — a178dde1
This PR was opened by the trusted drewstone account.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: drewstone_author · 2026-07-02T02:38:26Z
What

Makes `routerChatWithUsage` safe for thinking models across providers:

- `opts.reasoningEffort?: 'none'|'low'|'medium'|'high'` → forwarded as `reasoning_effort`. `'none'` is the load-bearing value: binary/single-token decisions (routing, gating) on thinking models otherwise spend the entire token budget inside the think block; on CPU-local backends that turns into a client timeout, not just waste.
- `RouterChatResult.reasoning?: string` + always-clean `content`: normalizes the two provider shapes (OpenRouter/DeepSeek-style separate `reasoning`/`reasoning_content` field; Groq/local-style inline `<think>` block). An unclosed think block (budget exhausted mid-thought) yields empty content, which is honest, since no final answer was emitted.

Why (twice-hit, per the lab's lift rule)

Two real experiment casualties in agent-lab:

- `fetch failed`: the model thought past the client timeout because the thinking-off flag had no path through the body.

Compatibility

Additive only. No call-site changes; non-thinking responses byte-identical; `reasoning_effort` omitted from the body unless explicitly set. 9 new tests; runtime suite green; typecheck green.

🤖 Generated with Claude Code