Skip to content

fix(gastown): distinguish null causes in PR status polling (#3149)#3156

Closed
jrf0110 wants to merge 1 commit into
gastown-stagingfrom
convoy/gastown-observability-3149-fixes-re-slin/acf7c452/head
Closed

fix(gastown): distinguish null causes in PR status polling (#3149)#3156
jrf0110 wants to merge 1 commit into
gastown-stagingfrom
convoy/gastown-observability-3149-fixes-re-slin/acf7c452/head

Conversation

@jrf0110
Copy link
Copy Markdown
Contributor

@jrf0110 jrf0110 commented May 9, 2026

Summary

Replaces the null return from checkPRStatus with a discriminated PRStatusOutcome union ({ ok: true, result } | { ok: false, error }), and similarly updates resolveGitHubToken to return GitHubTokenResolution that captures the resolution chain. This ensures each distinct failure mode (no token, HTTP errors, invalid response, unrecognized URL, host mismatch) surfaces a specific, actionable failure message instead of a generic "null" after 10 polls.

Key changes:

  • checkPRStatus returns PRStatusOutcome with structured PRStatusError variants (no_token, http_error, invalid_response, unrecognized_url, host_mismatch)
  • resolveGitHubToken returns GitHubTokenResolution tracking which sources were tried
  • Three-bucket failure strategy: no_token/non-transient HTTP fail immediately (1 strike), invalid_response/unrecognized_url/host_mismatch fail after 3 strikes, transient 5xx/429 keep existing 10-strike behavior
  • poll_null_count resets to 0 on successful poll
  • AE telemetry: pr.poll_failed event emitted on terminal failure
  • Updated all callers including refresh-git-token.handler.ts and Town.do.ts

Fixes #3149.

Verification

  • Unit tests cover checkPRStatus error discrimination (no_token, 401/403/404, 5xx/429 transient, invalid_response, unrecognized_url, host_mismatch)
  • Unit tests cover failureMessageFor producing actionable messages for each error kind
  • Unit tests cover shouldFailImmediately and shouldCountAsTransient threshold logic
  • Integration test covers end-to-end no_token failure through DO alarm

Visual Changes

N/A

Reviewer Notes

The platformIntegrationId is not passed in the Town.do.ts checkPRStatus context — this matches the pre-existing behavior. The refresh-git-token.handler.ts does pass it, which is the intended pattern for rig-level token resolution.

Replace PRStatusResult | null return type with discriminated PRStatusOutcome
union in checkPRStatus. Each null cause (no token, HTTP error, invalid
response, unrecognized URL, host mismatch) now surfaces a structured
PRStatusError with actionable failure messages.

- resolveGitHubToken returns GitHubTokenResolution with resolution chain
- no_token and non-transient HTTP errors (401/403/404) fail immediately
- invalid_response/unrecognized_url/host_mismatch fail after 3 strikes
- Transient HTTP errors (5xx/429) keep existing 10-strike behavior
- poll_null_count resets to 0 on successful poll at both call sites
- failureKind persisted to bead metadata for analytics
- AE event pr.poll_failed emitted on terminal failure
- Unit tests for checkPRStatus, resolveGitHubToken, failureMessageFor,
  and threshold logic
- Integration test for no_token immediate-fail path
err
);
}
} else if (!integrationId) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Missing tried entry when integrationId is set but GIT_TOKEN_SERVICE binding is absent

The condition on line 62 is else if (!integrationId), so when integrationId is non-empty but ctx.env.GIT_TOKEN_SERVICE is falsy (e.g. the binding isn't wired in a test/staging env), neither branch executes and the source label is silently dropped from tried. The no_token failure message shown to the user will omit that integration step, making the resolution chain incomplete and potentially misleading.

Consider:

  } else if (integrationId) {
    // GIT_TOKEN_SERVICE binding unavailable — record that we couldn't try it
    tried.push(`${sourceLabel} (service unavailable)`);
  } else {
    tried.push('platform integration (none configured)');
  }

const PR_POLL_NULL_THRESHOLD = 10;

/** Fail MR bead after this many consecutive non-transient errors (invalid_response, unrecognized_url, host_mismatch). */
const PR_POLL_NON_TRANSIENT_THRESHOLD = 3;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: poll_null_count counter is shared between the two error-kind buckets (transient 10-strike vs non-transient 3-strike)

Both the shouldCountAsTransient branch (lines ~1291-1349) and the else branch (lines ~1350-1409) increment and read the same $.poll_null_count key. If error kinds alternate between poll cycles — e.g. a transient http_error (5xx) followed by an invalid_response — the counter from the first bucket bleeds into the second bucket's threshold check. An invalid_response that arrives after 2 transient errors will immediately fail on the first non-transient occurrence (counter is already 2, threshold is 3, so it fails after just one more), while also not correctly applying the 10-strike rule for the transient errors.

Consider using separate counter keys ($.poll_transient_count and $.poll_non_transient_count) so each bucket's threshold is independent and the PR bead doesn't fail prematurely due to cross-bucket counter accumulation.

@kilo-code-bot
Copy link
Copy Markdown
Contributor

kilo-code-bot Bot commented May 9, 2026

Code Review Summary

Status: 2 Issues Found | Recommendation: Address before merge

Overview

Severity Count
CRITICAL 0
WARNING 2
SUGGESTION 0
Issue Details (click to expand)

WARNING

File Line Issue
services/gastown/src/dos/town/town-scm.ts 62 tried list silently drops integration source when GIT_TOKEN_SERVICE binding is absent but integrationId is set
services/gastown/src/dos/town/actions.ts 334 Shared poll_null_count counter across transient (10-strike) and non-transient (3-strike) error buckets causes cross-bucket contamination
Other Observations (not in diff)

host_mismatch misleading expected field (services/gastown/src/dos/town/town-scm.ts:168):
When no gitlab_instance_url is configured and the PR URL is neither gitlab.com nor the configured host, configuredHost is null, so expected falls back to 'gitlab.com'. The failure message will read configured: \gitlab.com`even though gitlab.com was not explicitly configured — just implicitly allowed. This is cosmetic but could confuse users debugging why their custom GitLab instance URL was rejected. Consider using a clearer fallback like'gitlab.com (default)'` or exposing the actual constraint ("only gitlab.com is permitted; no custom instance URL configured").

poll_null_count not reset when error kind changes (services/gastown/src/dos/town/actions.ts — related to the counter-sharing issue above):
If a bead accumulates 2 transient errors and then switches to an invalid_response error, the non-transient branch reads a counter of 2 and will fail on the very next invalid_response (counter becomes 3, hitting the 3-strike threshold). This is the same root cause as the shared counter issue noted inline.

Files Reviewed (6 files)
  • services/gastown/src/dos/town/actions.ts — 1 issue
  • services/gastown/src/dos/town/town-scm.ts — 1 issue (+ 1 observation)
  • services/gastown/src/handlers/refresh-git-token.handler.ts — no issues
  • services/gastown/test/integration/pr-poll-errors.test.ts — no issues
  • services/gastown/test/unit/pr-poll-errors.test.ts — no issues
  • services/gastown/test/unit/pr-poll-thresholds.test.ts — no issues

Fix these issues in Kilo Cloud


Reviewed by claude-4.6-sonnet-20260217 · 651,027 tokens

@jrf0110
Copy link
Copy Markdown
Contributor Author

jrf0110 commented May 10, 2026

Superseded by #3160. Closing as duplicate from convoy retry loop.

@jrf0110 jrf0110 closed this May 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant