# CI migration — GitHub-hosted → self-hosted runners (JIT plan)

**Status:** planned 2026-06-03 (user decision). Execute after M4-D4b is CI-green,
before D4c. Motivation: GitHub Actions credits running out; hosted Windows runner
is slow (2-core, cold caches — every spt-daemon test exe links the full iroh
stack since D4a) and hung runs aren't debuggable. Self-hosted minutes are free,
caches persist across runs, and a hung test on our own box can be attached to.

## Target topology

> **2026-06-07 update:** kitsubito (Ubuntu 22.04.5, 16-core, tailscale
> `100.98.197.12`) replaced gravity-linux as the Linux runner — same glibc
> 2.35 / rustc 1.96 baseline, labels `self-hosted, Linux, kitsubito`.

| Runner | Machine | Labels | Covers |
|---|---|---|---|
| Windows | HFENDULEAM (this box) | `self-hosted, Windows, hfenduleam` | windows test leg |
| Linux | ~~gravity-linux~~ → kitsubito (Ubuntu 22.04) | `self-hosted, Linux, kitsubito` | linux test leg (the real clippy `-D warnings` gate) + traceability job |

## Steps

1. **Registration tokens** (repo admin). A token expires after one hour, so
   mint it right before running `config`:
   `gh api -X POST repos/SaberMage/spt-core/actions/runners/registration-token --jq .token`
2. **HFENDULEAM runner**: download actions-runner (win-x64), `config.cmd --url
   https://github.com/SaberMage/spt-core --token <T> --labels hfenduleam
   --unattended --runasservice`. On Windows the service is registered by
   `config.cmd` itself (`svc.sh` ships only with the Linux/macOS packages);
   verify it with `Get-Service actions.runner.*`.
   Work dir on the fast disk, NOT inside the dev checkout (its own clone+target).
3. **Linux runner** (kitsubito; originally planned for gravity-linux): same via
   ssh (linux-x64 tarball, `./config.sh` with `--labels kitsubito`,
   `sudo ./svc.sh install && sudo ./svc.sh start`). Ensure rust toolchain +
   clippy match CI (rustup default = repo toolchain).
4. **Workflow edits** (`.github/workflows/*`):
   - matrix `runs-on`: `ubuntu-latest` → `[self-hosted, Linux]`;
     `windows-latest` → `[self-hosted, Windows]`.
   - traceability job → `[self-hosted, Linux]`.
   - DROP `actions/cache` steps (persistent local target/cargo make them
     pointless and they upload gigabytes for nothing).
   - ADD `timeout-minutes: 20` on test jobs — today's handoff.rs hang burned 22
     min before a manual cancel; on our own boxes a wedged job blocks the
     single runner slot, so the timeout matters MORE self-hosted.
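   - ADD `concurrency: { group: "ci-${{ github.ref }}", cancel-in-progress: true }`
     so a re-push cancels the stale run instead of queuing behind it. The
     group is quoted because a bare `${{ }}` expression is not a valid plain
     scalar inside a YAML flow mapping.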
5. **Keep the workflow surface identical** — push → `gh run watch` → green
   stays the loop; PR checks/traceability gate unchanged.
6. **Decommission**: nothing to delete; hosted runners simply stop being
   referenced. (Optional `workflow_dispatch` input to force hosted runners as
   a fallback matrix — skip unless needed.)
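
The workflow edits in step 4 could look roughly like the sketch below. It is a minimal illustration, not the repo's actual workflow: the workflow name, job names, matrix key `os`, and every `run` command are placeholders. The `clean: false` input is also an assumption added here: `actions/checkout` by default runs `git clean -ffdx`, which would delete an in-workspace `target/` on each run and undo the warm-cache benefit. Pointing `CARGO_TARGET_DIR` outside the workspace would be an alternative.

```yaml
# Sketch only: names and commands are placeholders, not the real workflow.
name: ci

on:
  push:
  pull_request:

# A re-push cancels the stale run for the same ref instead of queuing behind it.
concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test:
    strategy:
      fail-fast: false
      matrix:
        os: [Linux, Windows]   # matches the runners' built-in OS labels
    runs-on:
      - self-hosted
      - ${{ matrix.os }}
    timeout-minutes: 20        # a wedged job must not hold the single runner slot
    steps:
      - uses: actions/checkout@v4
        with:
          clean: false         # assumption: keep target/ between runs
      # No actions/cache step: the runner's local target/ and cargo dirs persist.
      - name: clippy gate (linux leg only)
        if: matrix.os == 'Linux'
        run: cargo clippy --all-targets -- -D warnings   # placeholder command
      - name: tests
        run: cargo test                                  # placeholder command

  traceability:
    runs-on: [self-hosted, Linux]
    steps:
      - uses: actions/checkout@v4
        with:
          clean: false
      - run: echo "traceability check goes here"         # placeholder
```

With `clean: false`, stale untracked files can survive between runs, so an occasional manual wipe of the runner's work dir may be needed if a build starts behaving oddly.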

## Hygiene / risks

- Repo must stay **private** while self-hosted runners are attached. On a
  public repo, fork PRs could run arbitrary code on our machines; that risk
  does not apply to a private repo, so never flip this repo public with
  runners attached.
- One job per runner at a time (default) — serializes pushes; acceptable, and
  `cancel-in-progress` keeps the queue short.
- Runner clone is separate from the dev checkout: first build per machine is
  cold (~full iroh build), every later one is warm — the whole point.
- Linux runner (kitsubito, formerly gravity-linux) availability becomes a CI
  dependency: if it's down, the linux leg queues. Acceptable for now; revisit
  if it bites.
- Windows runner shares the box with dev work: cargo builds compete for cores
  during a push. Acceptable; if it annoys, cap build parallelism (e.g.
  `CARGO_BUILD_JOBS`) or lower the runner service's process priority.

## Open items
- [ ] Confirm Linux runner (kitsubito, formerly gravity-linux) ssh
      reachability + disk headroom (~20 GB for runner + target).
- [ ] Decide whether the netstream/netbroker UDP-binding tests need a firewall
      allowance on the Linux runner. Expected answer: no, they are
      loopback-only since `BindScope::Loopback`.
