Skip to content

fix: Install Consul without Windows service to avoid dev agent conflict#77

Merged
keelerm84 merged 5 commits into
mainfrom
devin/1774373588-fix-consul-windows-service
Jun 8, 2026
Merged

fix: Install Consul without Windows service to avoid dev agent conflict#77
keelerm84 merged 5 commits into
mainfrom
devin/1774373588-fix-consul-windows-service

Conversation

@devin-ai-integration

@devin-ai-integration devin-ai-integration Bot commented Mar 24, 2026

Copy link
Copy Markdown
Contributor

Summary

The choco install consul command registers Consul as a Windows service, which races with the subsequent Start-Process consul agent -dev for port binding. When the service wins the race and binds to the configured port first, the dev agent silently fails to start, causing all Consul integration tests in downstream repos to fail with 500 No known Consul servers.

This was observed as a flaky failure in python-server-sdk#411 — only windows (3.14) lost the race on that run, but the issue is non-deterministic and affects all Windows matrix entries.

This PR:

  1. Installs Consul with the chocolatey /noservice package parameter so the Windows service is never registered, eliminating the process that raced with the dev agent for the port.
  2. Adds a readiness poll against /v1/status/leader (up to 30s) so the step only completes once Consul has elected a leader and is actually ready to serve requests.

Review & Testing Checklist for Human

  • Verify the chocolatey /noservice parameter is honored by the consul 1.20.1 package so no Windows service is registered (and therefore nothing competes for the port).
  • Test end-to-end by pointing a consumer repo (e.g. python-server-sdk#412) at this commit SHA and confirming all Windows Consul tests pass across all Python versions.
  • Confirm no unintended impact on Linux/macOS steps — the change is scoped to the runner.os == 'Windows' conditional, but worth a glance.

Notes

  • The -ErrorAction SilentlyContinue on Stop-Service / Set-Service is intentional: if a future chocolatey package version stops registering a Windows service, these calls will no-op gracefully instead of failing.
  • This action has no CI of its own — changes can only be validated by running a downstream consumer's CI against the commit SHA.

Link to Devin session: https://app.devin.ai/sessions/218fb7de91864819adbc8746029f7331
Requested by: @keelerm84

The chocolatey package registers Consul as a Windows service which
races with the dev agent for port binding. Stop and disable the
service before starting the dev agent, and add a readiness check
that polls the leader endpoint to confirm the agent is ready.

Co-Authored-By: mkeeler@launchdarkly.com <keelerm84@gmail.com>
@devin-ai-integration

Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

devin-ai-integration Bot added a commit to launchdarkly/python-server-sdk that referenced this pull request Mar 24, 2026
Point the Windows persistent-stores action at the commit that stops
the Consul Windows service before starting the dev agent, replacing
the local workaround.

See: launchdarkly/gh-actions#77
Co-Authored-By: mkeeler@launchdarkly.com <keelerm84@gmail.com>
@keelerm84 keelerm84 marked this pull request as ready for review June 8, 2026 13:46
@keelerm84 keelerm84 requested a review from a team as a code owner June 8, 2026 13:46
Comment thread actions/persistent-stores/action.yml Outdated
@@ -82,8 +82,34 @@ runs:
# 1.20.1 until they fix the issue
run: |
choco install consul --version=1.20.1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should allow you to get rid of the next Stop-Service command.

Suggested change
choco install consul --version=1.20.1
choco install consul --version=1.20.1 --params="'/noservice'"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call -- done in 3352941. Switched to choco install consul --version=1.20.1 --params="'/noservice'" and removed the Stop-Service/Set-Service block, so the service is never registered rather than registered-then-disabled. Kept the /v1/status/leader readiness poll.

@devin-ai-integration

Copy link
Copy Markdown
Contributor Author

❌ Cannot revive Devin session - the session is too old. Please start a new session instead.

Use the chocolatey /noservice package parameter so Consul is never
registered as a Windows service, instead of registering it and then
stopping and disabling it. This removes the service that raced with the
dev agent for the port, so the Stop-Service and Set-Service steps are no
longer needed.
@keelerm84 keelerm84 requested a review from jsonbailey June 8, 2026 14:20
The dev agent elects itself leader effectively instantly, so the poll is
not waiting on slow Raft election. Reword the comment to reflect its
actual purpose: Start-Process returns immediately, so we wait until the
HTTP API is up and serving before tests run.
Comment thread actions/persistent-stores/action.yml Outdated
Comment on lines +91 to +110
# Start-Process returns immediately, so wait until the dev agent's HTTP
# API is up and serving (leader elected) before tests run.
$timeout = 30
$elapsed = 0
while ($elapsed -lt $timeout) {
try {
$response = Invoke-RestMethod -Uri 'http://127.0.0.1:${{ inputs.consul_port }}/v1/status/leader' -TimeoutSec 2
if ($response -and $response -ne '""') {
Write-Host "Consul is ready (leader: $response)"
break
}
} catch {}
Write-Host "Waiting for Consul to elect a leader... ($elapsed/$timeout seconds)"
Start-Sleep -Seconds 1
$elapsed++
}
if ($elapsed -eq $timeout) {
Write-Error "Consul did not become ready within $timeout seconds"
exit 1
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The readiness check can be simplified. In dev mode Consul immediately self-elects as leader, so the real intent here is just "is the agent's HTTP API up" — no leader polling needed. consul members (already on PATH after install) exits non-zero until the agent responds, making it a self-describing check without any HTTP request construction or JSON parsing.

Suggested change
# Start-Process returns immediately, so wait until the dev agent's HTTP
# API is up and serving (leader elected) before tests run.
$timeout = 30
$elapsed = 0
while ($elapsed -lt $timeout) {
try {
$response = Invoke-RestMethod -Uri 'http://127.0.0.1:${{ inputs.consul_port }}/v1/status/leader' -TimeoutSec 2
if ($response -and $response -ne '""') {
Write-Host "Consul is ready (leader: $response)"
break
}
} catch {}
Write-Host "Waiting for Consul to elect a leader... ($elapsed/$timeout seconds)"
Start-Sleep -Seconds 1
$elapsed++
}
if ($elapsed -eq $timeout) {
Write-Error "Consul did not become ready within $timeout seconds"
exit 1
}
# consul members exits non-zero until the agent's HTTP API responds.
# In dev mode the leader is self-elected immediately on startup, so
# this is purely a process-readiness check.
$deadline = (Get-Date).AddSeconds(30)
$ready = $false
while ((Get-Date) -lt $deadline) {
consul members -http-addr="http://127.0.0.1:${{ inputs.consul_port }}" 2>&1 | Out-Null
if ($LASTEXITCODE -eq 0) { $ready = $true; break }
Start-Sleep -Seconds 1
}
if (-not $ready) {
Write-Error "Consul did not become ready within 30 seconds"
exit 1
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, that's cleaner -- applied in dcd7d2f. Swapped the /v1/status/leader HTTP poll for the consul members exit-code check, so no HTTP/JSON handling and it reads as a plain process-readiness gate.

@jsonbailey jsonbailey left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment on an alternative check that doesn't need the try catch but won't block on it as I haven't tested it.

Replace the /v1/status/leader HTTP poll with consul members, which exits
non-zero until the agent's HTTP API responds. This drops the HTTP request
construction and JSON parsing in favor of a self-describing check using a
binary already on PATH after install.
@keelerm84 keelerm84 changed the title fix: stop Consul Windows service before starting dev agent fix: Install Consul without Windows service to avoid dev agent conflict Jun 8, 2026
@keelerm84 keelerm84 merged commit a14500d into main Jun 8, 2026
2 checks passed
@keelerm84 keelerm84 deleted the devin/1774373588-fix-consul-windows-service branch June 8, 2026 14:45
keelerm84 added a commit that referenced this pull request Jun 8, 2026
The consul members readiness check from #77 fails the step on its first
iteration. While Consul is still starting, consul members writes a
connection-refused message to stderr; the 2>&1 merge under the Actions
PowerShell wrapper ($ErrorActionPreference=Stop) surfaces that as a
NativeCommandError and terminates the step before the retry loop can run.

Restore the previous /v1/status/leader HTTP poll, where Invoke-RestMethod
raises a catchable exception that the try/catch swallows so the loop
retries until the agent is ready. The /noservice install change from #77
is unaffected.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants