fix(gastown): distinguish null causes in PR status polling (#3149)#3154
Conversation
Replace PRStatusResult | null return type with discriminated PRStatusOutcome union in checkPRStatus. Each null cause (no token, HTTP error, invalid response, unrecognized URL, host mismatch) now surfaces a structured PRStatusError with actionable failure messages. - resolveGitHubToken returns GitHubTokenResolution with resolution chain - no_token and non-transient HTTP errors (401/403/404) fail immediately - invalid_response/unrecognized_url/host_mismatch fail after 3 strikes - Transient HTTP errors (5xx/429) keep existing 10-strike behavior - poll_null_count resets to 0 on successful poll at both call sites - failureKind persisted to bead metadata for analytics - AE event pr.poll_failed emitted on terminal failure - Unit tests for checkPRStatus, resolveGitHubToken, failureMessageFor, and threshold logic - Integration test for no_token immediate-fail path
| switch (error.kind) { | ||
| case 'no_token': | ||
| return ( | ||
| `No GitHub token resolved for this town. Tried (in order): ` + |
There was a problem hiding this comment.
WARNING: The no_token failure message hardcodes "GitHub token" but this function is also called for GitLab no_token errors (when error.provider === 'gitlab'). A user polling a GitLab MR with no gitlab_token configured would see a misleading "No GitHub token resolved for this town" message.
| `No GitHub token resolved for this town. Tried (in order): ` + | |
| `No ${error.provider === 'gitlab' ? 'GitLab' : 'GitHub'} token resolved for this town. Tried (in order): ` + |
| err | ||
| ); | ||
| } | ||
| } else if (!integrationId) { |
There was a problem hiding this comment.
WARNING: Missing tried entry when integrationId is set but GIT_TOKEN_SERVICE binding is absent. The condition else if (!integrationId) only fires when no integration ID is configured at all. When an integrationId exists but ctx.env.GIT_TOKEN_SERVICE is falsy, sourceLabel is never added to tried, so the resolution chain shown in the no_token error message silently omits a source that was nominally available. Add a third branch:
} else if (integrationId && !ctx.env.GIT_TOKEN_SERVICE) {
tried.push(`${sourceLabel} (GIT_TOKEN_SERVICE binding missing)`);
} else if (!integrationId) {
tried.push('platform integration (none configured)');
}
| ), | ||
| ]; | ||
| const nullCount = Number(rows[0]?.null_count ?? 0); | ||
| if (nullCount >= PR_POLL_NON_TRANSIENT_THRESHOLD) { |
There was a problem hiding this comment.
WARNING: Both the transient (10-strike) path and this non-transient (3-strike) path share the same $.poll_null_count counter. This means 2 transient errors followed by 1 non-transient error will produce a poll_null_count of 3, which trips this threshold even though only 1 non-transient error has occurred. The word "consecutive" in the log message (line 1383) is also inaccurate — the counter is never reset between error-kind switches.
Consider using a separate counter key (e.g. $.poll_non_transient_count) for the non-transient path so the two thresholds are independently tracked, or document this shared-counter behavior explicitly if it is intentional.
Code Review SummaryStatus: 3 Issues Found | Recommendation: Address before merge Overview
Issue Details (click to expand)WARNING
Other Observations (not in diff)No additional issues in unchanged code. Files Reviewed (4 files)
Fix these issues in Kilo Cloud Reviewed by claude-4.6-sonnet-20260217 · 1,483,161 tokens |
|
Superseded by #3160. Closing as duplicate from convoy retry loop. |
Summary
Replaces the
PRStatusResult | nullreturn type fromcheckPRStatuswith a discriminatedPRStatusOutcomeunion, so each null cause surfaces a structuredPRStatusErrorwith an actionable failure message. Fixes #3149.Key changes:
resolveGitHubTokennow returnsGitHubTokenResolution({ ok: true, token, source }|{ ok: false, tried }) instead ofstring | null, capturing the full resolution chain for the no-token error message.checkPRStatusnow returnsPRStatusOutcome({ ok: true, result }|{ ok: false, error }) instead ofPRStatusResult | null, with five discriminated error kinds:no_token,http_error,invalid_response,unrecognized_url,host_mismatch.no_tokenand non-transient HTTP errors (401/403/404) fail immediately (1 strike);invalid_response/unrecognized_url/host_mismatchfail after 3 consecutive strikes; transient HTTP errors (5xx/429) keep the existing 10-strike behavior.poll_null_countresets to 0 on successful poll at both call sites.failureKindpersisted to bead metadata for analytics; AE eventpr.poll_failedemitted on terminal failure.ApplyActionContext.checkPRStatustype,Town.do.tswrapper,refresh-git-token.handler.ts, and the three otherresolveGitHubTokencallers intown-scm.ts.Verification
checkPRStatuscover all five error kinds, transient/non-transient HTTP status discrimination,sampleKeyscapture, and host mismatch.resolveGitHubTokencover the resolution chain with and without tokens.failureMessageFor,shouldFailImmediately,shouldCountAsTransientcover all error kinds and threshold decisions.no_tokenimmediate-fail path through the DO alarm cycle.pnpm --filter cloudflare-gastown typecheckpasses cleanly.Visual Changes
N/A
Reviewer Notes
The biggest change is in
actions.ts— thepoll_prhandler's error branch was rewritten from a single null-counting block to three discriminated paths (immediate-fail, transient threshold, non-transient threshold). ThewriteEventcalls usectx.emitEventinstead ofwriteEventdirectly becauseApplyActionContextdoesn't exposeenv. The rig_id for the AE event is looked up from the bead's SQL row sincepoll_practions don't carry it.