Skip to content

[superseded] Route only apt-packages/container legs to GitHub-hosted #96

Closed
ChrisRackauckas-Claude wants to merge 1 commit into SciML:master from ChrisRackauckas-Claude:default-runner-github-hosted



ChrisRackauckas-Claude commented Jun 16, 2026


Scoped: route only apt-packages/container legs to GitHub-hosted

Note: please ignore until reviewed by @ChrisRackauckas.

Supersedes #96 (which proposed pinning the default runner to ubuntu-24.04). That was wrong — it would have moved the whole SciML fleet's CI onto GitHub-hosted runners, which lack the capacity for SciML's throughput. #96 was closed and its branch force-pushed (so it can't be reopened); this PR replaces it on the same branch.

Runner facts (unchanged)

The SciML self-hosted pool (demeter*/arctic*, ephemeral *-cxnps-*) is registered with the custom label ubuntu-latest — the same label GitHub-hosted runners answer. Its full label set is {self-hosted, Linux, X64, gpu, high-memory, ubuntu-latest}; it does not carry the pinned ubuntu-24.04 label. So:

  • ubuntu-latest → self-hosted-capable (kept as the default for throughput).
  • ubuntu-24.04 → GitHub-hosted only (has passwordless sudo + docker).
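
The routing consequence of these labels can be sketched as a set-inclusion check. The Python below is only an illustration: it assumes a job is eligible for a runner when the runner carries every label in runs-on, the GitHub-hosted label set is a simplified assumption (real hosted runners carry more labels), and `POOLS` and `eligible_pools` are hypothetical names that do not exist in the SciML workflows.

```python
# Labels reported in this PR for the SciML self-hosted pool.
SELF_HOSTED_LABELS = frozenset(
    {"self-hosted", "Linux", "X64", "gpu", "high-memory", "ubuntu-latest"}
)
# Assumption: a simplified label set for a GitHub-hosted Ubuntu runner.
GITHUB_HOSTED_LABELS = frozenset({"ubuntu-latest", "ubuntu-24.04"})

POOLS = {
    "self-hosted": SELF_HOSTED_LABELS,
    "github-hosted": GITHUB_HOSTED_LABELS,
}


def eligible_pools(runs_on):
    """Return the sorted names of pools whose labels include every requested label."""
    requested = {runs_on} if isinstance(runs_on, str) else set(runs_on)
    return sorted(name for name, labels in POOLS.items() if requested <= labels)


print(eligible_pools("ubuntu-latest"))  # ['github-hosted', 'self-hosted']
print(eligible_pools("ubuntu-24.04"))   # ['github-hosted']
print(eligible_pools(["self-hosted", "Linux", "X64", "gpu"]))  # ['self-hosted']
```

Under these assumptions, ubuntu-latest is ambiguous between the two pools, while ubuntu-24.04 and an explicit self-hosted label list each select exactly one.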

What this changes

Keep ubuntu-latest as the default for normal test/downgrade legs, and force GitHub-hosted only for the legs that genuinely need passwordless sudo or docker, i.e. exactly the legs where the caller passes apt-packages or a container. The persistent demeter*/arctic* runners lack passwordless sudo, so apt provisioning (sudo apt-get) intermittently fails with "sudo: a terminal is required to read the password" whenever such a leg lands there (ChrisRackauckas/InternalJunk#52); containers likewise need a Docker host.
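
For diagnosing which kind of runner a leg landed on, a probe for passwordless sudo could look like the sketch below. This is not something this PR adds: `has_passwordless_sudo` is a hypothetical helper, and it relies only on `sudo -n`, which makes sudo fail instead of prompting when a password would be required.

```python
import subprocess


def has_passwordless_sudo(timeout_s=5.0):
    """Return True if `sudo -n true` succeeds, i.e. sudo works without a prompt."""
    try:
        result = subprocess.run(
            ["sudo", "-n", "true"],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # sudo is missing, not executable, or hung: treat as "no passwordless sudo".
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("passwordless sudo available:", has_passwordless_sudo())
```

A leg that needs apt provisioning could log this result before calling sudo apt-get, so a failure on a persistent runner is attributed to the runner rather than to the package list.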

Conditional added at the reusable job's runs-on:

runs-on: ${{ (inputs.apt-packages != '' || inputs.container != '') && fromJSON('["ubuntu-24.04"]') || <existing default> }}

fromJSON('["ubuntu-24.04"]') is a non-empty (truthy) array, so the GitHub Actions &&/|| ternary does not fall through; the default branch returns the existing value (an array via fromJson(inputs.runner), or a string 'self-hosted' / inputs.os / 'ubuntu-latest'). Both branches are valid runs-on forms (array or string) — the same idiom tests.yml already used for the runner/os selection. actionlint validates it (exit 0).
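
The truthiness argument above can be checked with a small emulation. This is a sketch under stated assumptions, not the code in test/runtests.jl: `gha_truthy`, `gha_and`, `gha_or`, and `resolve_runs_on` are hypothetical names, and the falsy set (false, 0, the empty string, null) follows the GitHub Actions expression documentation, with every other value, including arrays, treated as truthy.

```python
def gha_truthy(value):
    """Emulate GitHub Actions truthiness: false, 0, '' and null are falsy."""
    if value is None or value is False:
        return False
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value != 0
    if isinstance(value, str):
        return value != ""
    return True  # arrays and objects count as truthy


def gha_and(a, b):
    """`a && b` yields b when a is truthy, otherwise a."""
    return b if gha_truthy(a) else a


def gha_or(a, b):
    """`a || b` yields a when a is truthy, otherwise b."""
    return a if gha_truthy(a) else b


def resolve_runs_on(apt_packages, container, default):
    """Emulate: (apt != '' || container != '') && ["ubuntu-24.04"] || default."""
    needs_hosted = gha_or(apt_packages != "", container != "")
    return gha_or(gha_and(needs_hosted, ["ubuntu-24.04"]), default)


print(resolve_runs_on("python3-scipy", "", "ubuntu-latest"))  # ['ubuntu-24.04']
print(resolve_runs_on("", "", "ubuntu-latest"))               # ubuntu-latest
# The pitfall the array avoids: a falsy middle operand falls through to the default.
print(gha_or(gha_and(True, ""), "ubuntu-latest"))             # ubuntu-latest
```

The last line shows why the middle operand has to be truthy: if it were an empty string, the condition would be met and the expression would still return the default.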

Where it landed

The reusables that accept apt-packages/container and have a direct job-level runs-on:

| Reusable | Job | Default branch |
| --- | --- | --- |
| tests.yml | tests (leaf) | fromJson(inputs.runner) / self-hosted / inputs.os (ubuntu-latest) |
| downgrade.yml | downgrade | self-hosted / inputs.os (ubuntu-latest) |
| sublibrary-downgrade.yml | test | ubuntu-latest |

grouped-tests.yml and sublibrary-project-tests.yml route their matrices through tests.yml (passing apt-packages/container through), so the single conditional in tests.yml covers them.

Reverted from #96 / left unchanged on purpose

  • scripts/compute_affected_sublibraries.jl default runner and the os defaults in tests.yml/downgrade.yml are back to ubuntu-latest.
  • detect/discover helper jobs (grouped-tests.yml, sublibrary-project-tests.yml, sublibrary-downgrade.yml discover) stay ubuntu-latest.
  • sublibrary-project-tests.yml exposes no apt-packages/container input and passes none through, so its matrix legs never need the override.
  • Groups with an explicit self-hosted runner (GPU) or an OS-axis override set no apt-packages/container, so they stay self-hosted / on their OS.

Affected repos

The only repos passing apt-packages/container today:

  • SciPyDiffEq.jl (apt-packages: "python3-scipy")
  • deSolveDiffEq.jl (apt-packages: "r-base-dev r-cran-desolve")
  • FEniCS.jl (container: "cmhyett/julia-fenics:latest")

Tests

test/runtests.jl re-asserts the default matrix runner is ubuntu-latest, and a new runs-on conditional testset confirms the real expression is present in each reusable and (emulating GitHub Actions truthiness) that apt-packages/container resolve to ubuntu-24.04 while the default (incl. GPU self-hosted override) is preserved. Passing on Julia 1.10 and 1.12; actionlint clean. Live routing itself can only be proven by a retagged run.

Deploy

Needs a v1 retag to take effect fleet-wide.

🤖 Generated with Claude Code

The SciML self-hosted pool (demeter*/arctic*, ephemeral *-cxnps-*) is
registered with the custom label `ubuntu-latest`, the same label GitHub-hosted
runners answer. So every default test leg that requests `ubuntu-latest` is a
coin-flip between GitHub-hosted and self-hosted scheduling. The persistent
demeter*/arctic* runners lack passwordless sudo, so the apt-packages
provisioning step (`sudo apt-get`) intermittently fails with
"sudo: a terminal is required to read the password" whenever a leg lands on
one of them (ChrisRackauckas/InternalJunk#52).

Evidence: across 205 jobs in 4 recent OrdinaryDiffEq.jl runs (CI / Sublibrary
CI / Downgrade / Downgrade Sublibraries), `ubuntu-latest` was answered by BOTH
github-hosted runners (runner group "GitHub Actions") AND self-hosted runners
(demeter*/arctic*/*-cxnps-*, group "default"). The self-hosted runners' only
assigned labels are {self-hosted, Linux, X64, gpu, high-memory, ubuntu-latest}
-- none carry the pinned `ubuntu-24.04`/`ubuntu-22.04` labels. A github-hosted
job's setup log shows `ubuntu-latest` currently resolves to image
`ubuntu-24.04`, so pinning to `ubuntu-24.04` keeps the identical environment
while removing the self-hosted pool from the candidate set.

Change: set the DEFAULT test-leg runner to the pinned `ubuntu-24.04` label,
which only GitHub-hosted runners answer, forcing default legs onto
GitHub-hosted (where `sudo apt-get` works via passwordless sudo). Applied to
the per-group `runner` default in compute_affected_sublibraries.jl (the source
for grouped-tests.yml / sublibrary-project-tests.yml matrices), the `os`
default in tests.yml and downgrade.yml, and the hardcoded `runs-on` in
sublibrary-downgrade.yml plus the detect/discover helper jobs.

GPU / self-hosted groups are intentionally PRESERVED: any group that sets an
explicit `runner` (e.g. ["self-hosted","Linux","X64","gpu"]) in its
test_groups.toml overrides the default and is left untouched; the OS-axis
overrides (os = ["ubuntu-latest", ...]) are likewise unchanged.

Needs a v1 retag to take effect fleet-wide.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ChrisRackauckas-Claude changed the title from "Default test legs to GitHub-hosted (pin ubuntu-24.04, not ubuntu-latest)" to "[superseded] Route only apt-packages/container legs to GitHub-hosted" on Jun 16, 2026
@ChrisRackauckas-Claude

Author

This PR's original approach (pinning the default runner to ubuntu-24.04, moving the whole fleet to GitHub-hosted) was wrong. It has been re-scoped to force GitHub-hosted only for apt-packages/container legs, with ubuntu-latest (self-hosted-capable) kept as the default. Because this PR was closed and its branch force-pushed, GitHub will not allow it to be reopened; the live successor on the same branch is #97. Please review there.
