Skip to content

fix(gastown): distinguish null causes in PR status polling (#3149)#3160

Merged
jrf0110 merged 6 commits into
gastown-stagingfrom
convoy/gastown-observability-3149-fixes-re-slin/acf7c452/gt/toast/c412265a
May 11, 2026
Merged

fix(gastown): distinguish null causes in PR status polling (#3149)#3160
jrf0110 merged 6 commits into
gastown-stagingfrom
convoy/gastown-observability-3149-fixes-re-slin/acf7c452/gt/toast/c412265a

Conversation

@jrf0110
Copy link
Copy Markdown
Contributor

@jrf0110 jrf0110 commented May 9, 2026

Summary

Replace PRStatusResult | null return type with discriminated PRStatusOutcome union in checkPRStatus. Each null cause (no token, HTTP error, invalid response, unrecognized URL, host mismatch) now surfaces a structured PRStatusError with actionable failure messages.

Key changes:

  • resolveGitHubToken returns GitHubTokenResolution with resolution chain tracking which sources were tried
  • no_token and non-transient HTTP errors (401/403/404) fail the bead immediately (1 strike)
  • invalid_response/unrecognized_url/host_mismatch fail after 3 strikes
  • Transient HTTP errors (5xx/429) keep existing 10-strike behavior
  • poll_null_count resets to 0 on successful poll at both call sites
  • failureKind persisted to bead metadata for analytics
  • AE event pr.poll_failed emitted on terminal failure

Also includes an unrelated fix for boot hydration contention on /agents/start (commit 2ffcef28f).

Fixes #3149.

Verification

  • pnpm --filter cloudflare-gastown typecheck passes
  • Unit tests pass: test/unit/pr-poll-errors.test.ts (checkPRStatus, resolveGitHubToken), test/unit/pr-poll-thresholds.test.ts (failureMessageFor, shouldFailImmediately, shouldCountAsTransient)
  • Integration test for no_token immediate-fail path: test/integration/pr-poll-errors.test.ts
  • HTTP error scenarios covered by unit tests (mocking fetch is not practical in Cloudflare Workers integration test runtime)

Visual Changes

N/A

Reviewer Notes

  • The refresh-git-token.handler.ts change is a caller update for the new GitHubTokenResolution return type (was string | null)
  • The unrelated boot hydration commit (2ffcef28f) is included on this branch from a prior bead attempt — it fixes a separate timeout regression
  • The wrangler.jsonc max_instances change (800→500) is from the same unrelated commit

Comment thread services/gastown/src/dos/town/town-scm.ts
Comment thread services/gastown/src/dos/town/actions.ts
Comment thread services/gastown/src/dos/town/actions.ts
@kilo-code-bot
Copy link
Copy Markdown
Contributor

kilo-code-bot Bot commented May 9, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Previously Flagged Issues — All Resolved ✓

File Issue Status
services/gastown/src/dos/town/town-scm.ts Missing tried entry when integrationId set but GIT_TOKEN_SERVICE absent ✅ Fixed
services/gastown/src/dos/town/actions.ts Shared poll_null_count counter causing cross-contamination between thresholds ✅ Fixed — split into poll_transient_count / poll_non_transient_count
services/gastown/src/dos/town/actions.ts unrecognized_url / host_mismatch routed through 3-strike retry rather than immediate fail ✅ Fixed

Latest Commit — No New Issues

The new commit adds:

  • Token priority reorder in resolveGitHubToken: github_cli_pat → platform integration (fresh) → stored github_token. Correct fix for the stale installation-token bug; well-documented and covered by new tests in test/unit/town-scm.test.ts.
  • resolveGitHubTokenString helper — thin wrapper, correctly typed string | null.
  • buildContainerConfig now requires townId — prevents stale-token foot-gun at all call sites; all 4 callers updated.
  • startAgentInContainer / startMergeInContainer — both resolve fresh tokens via resolveGitHubTokenString before container dispatch.
  • gastown.worker.ts logger tagging — regex-based middleware replaced with route-pattern app.use() handlers. No logic change, cleaner param extraction.
  • Three new debug endpoints (/debug/towns/:townId/rigs, /sling-convoy, /convoys) — all correctly guarded by ENVIRONMENT !== 'development'.
  • E2E testing doc — new Test C section covering review-then-land convoy mode.
Files Reviewed
  • services/gastown/src/dos/town/town-scm.ts — no issues
  • services/gastown/src/dos/town/actions.ts — no issues
  • services/gastown/src/dos/town/config.ts — no issues
  • services/gastown/src/dos/town/container-dispatch.ts — no issues
  • services/gastown/src/dos/Town.do.ts — no issues
  • services/gastown/src/gastown.worker.ts — no issues
  • services/gastown/src/handlers/refresh-git-token.handler.ts — no issues
  • services/gastown/test/unit/town-scm.test.ts — no issues
  • services/gastown/test/unit/pr-poll-errors.test.ts — no issues
  • services/gastown/test/unit/pr-poll-thresholds.test.ts — no issues
  • services/gastown/test/integration/pr-poll-errors.test.ts — no issues
  • services/gastown/docs/e2e-pr-feedback-testing.md — no issues

Reviewed by claude-4.6-sonnet-20260217 · 1,297,610 tokens

John Fawcett added 6 commits May 11, 2026 15:39
Replace PRStatusResult | null return type with discriminated PRStatusOutcome
union in checkPRStatus. Each null cause (no token, HTTP error, invalid
response, unrecognized URL, host mismatch) now surfaces a structured
PRStatusError with actionable failure messages.

- resolveGitHubToken returns GitHubTokenResolution with resolution chain
- no_token and non-transient HTTP errors (401/403/404) fail immediately
- invalid_response/unrecognized_url/host_mismatch fail after 3 strikes
- Transient HTTP errors (5xx/429) keep existing 10-strike behavior
- poll_null_count resets to 0 on successful poll at both call sites
- failureKind persisted to bead metadata for analytics
- AE event pr.poll_failed emitted on terminal failure
- Unit tests for checkPRStatus, resolveGitHubToken, failureMessageFor,
  and threshold logic
- Integration test for no_token immediate-fail path
… (review on town-scm.ts:66)

When integrationId is set but GIT_TOKEN_SERVICE binding is missing,
the configured integration source was silently omitted from the tried
array. Add an else branch that pushes the source label with a
'(GIT_TOKEN_SERVICE not bound)' annotation so the no_token error
message lists all attempted sources.
… (review on actions.ts:374)

Both are deterministic configuration errors that cannot self-resolve
on retry. Move them from the 3-strike bucket to the fail-immediately
bucket alongside no_token and non-transient http_error. Only
invalid_response remains in the 3-strike category.
…ll errors (review on actions.ts:1350)

Replace the shared poll_null_count with poll_transient_count and
poll_non_transient_count. Each error category increments only its own
counter and resets the other, preventing cross-contamination where 9
transient errors followed by 1 non-transient error would incorrectly
fail the bead.

Legacy poll_null_count is migrated on first read: the transient branch
falls back to poll_null_count when poll_transient_count is absent.
This ensures in-flight beads at deploy time retain their existing
counter value. The non-transient branch does not read the legacy field
since these counters reset on every success anyway — at worst an
in-flight bead gets one extra retry for invalid_response.
…aging priority with PR #3160 structured return type

- resolveGitHubToken now uses staging's priority: cli_pat → integration → stored token
- Returns GitHubTokenResolution discriminated union (from PR #3160)
- Includes unbound-service else branch (GIT_TOKEN_SERVICE not bound)
- Adds resolveGitHubTokenString helper for non-error-aware callers
- Updates Town.do.ts, container-dispatch.ts, config.ts to use helper
- Updates town-scm.test.ts for GitHubTokenResolution return shape
- Updates pr-poll-errors.test.ts for new priority order
@kilo-code-bot kilo-code-bot Bot force-pushed the convoy/gastown-observability-3149-fixes-re-slin/acf7c452/gt/toast/c412265a branch from 3d397ce to 6dbed96 Compare May 11, 2026 15:50
@jrf0110 jrf0110 merged commit 63873e4 into gastown-staging May 11, 2026
2 checks passed
@jrf0110 jrf0110 deleted the convoy/gastown-observability-3149-fixes-re-slin/acf7c452/gt/toast/c412265a branch May 11, 2026 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant