---
name: ci-kitsubito-speedup-plan
description: Planned CI speedup for the slow Linux runner kitsubito — mold linker + cargo-nextest first; cross-build-ship rejected
metadata: 
  node_type: memory
  type: project
  originSessionId: 66b6c4b8-4c00-481a-aba8-6c00d3c21a5d
---

Operator 2026-06-15: kitsubito (Linux CI runner) has a slow CPU → its `test` + `n1-gate` jobs are the wall-clock tail of every CI run (Windows/hfenduleam finishes its job pair well first; each self-hosted runner is a SINGLE slot so the two Linux jobs serialize on kitsubito). Operator asked to plan a speedup.

**Cross-build-on-hfenduleam-ship-to-kitsubito: REJECTED.** Possible (`cargo test --no-run --target x86_64-unknown-linux-musl` → scp binaries → run via `CARGO_TARGET_<T>_RUNNER` ssh-exec) but: needs a full musl cross toolchain on Windows for the native-C deps (ring/blake3/libsqlite3-sys) = Docker/WSL, brittle; COUPLES the runners (hfenduleam down ⇒ no Linux CI, defeats the deliberate isolation from [[ci-selfhost-migration]]); loses native-build fidelity (Linux-specific build bugs hide — cf the v0.4.1 Linux respawn bug [[v041-linux-respawn-blocker]]); and only speeds the BUILD half — much of kitsubito's time is TEST EXECUTION (PTY/net int tests twohost+attach have real sleeps/retries/timeouts), which prebuilt binaries don't help.

**EXECUTING 2026-06-15.** Operator: don't wait 50 min for one release; speed up ALL future releases now. The slow v0.7.0 CI run was cancelled, and the speedup CI run gates the tag instead.

DONE: kitsubito `apt install mold` (2.30.0) + prebuilt nextest 0.9.137 in `~/.cargo/bin`; hfenduleam nextest 0.9.137.

ROOT CAUSE (refined): kitsubito is 16-core/15GB, NOT weak. The slowness is `cargo test` running test BINARIES serially (parallel threads only within a binary), so the many int-test binaries (with fixed wall-clock sleeps) queue behind each other while 15 cores idle. nextest = ONE global parallel pool across all binaries = the real win.

ci.yml `test` job edit: Linux-only step setting `RUSTFLAGS=-C link-arg=-fuse-ld=mold`; replace `cargo build --all-targets` + `cargo test` with `cargo nextest run --workspace` + `cargo test --workspace --doc` (nextest skips doctests; 6 `//!` fences exist); DROP the standalone build (clippy `--all-targets` still compiles examples/benches).

NEXT: validate the full `cargo nextest run --workspace` locally on hfenduleam BEFORE push. The risk is int-test parallel-safety; the tests use unique socket names, so they should be safe. Then push → that run goes green → tag v0.7.0.
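The root-cause reasoning above (binaries run one after another vs. one global pool) can be illustrated with a toy wall-clock model. This is a sketch only: the function names are made up for illustration, the durations are HYPOTHETICAL (8 int-test binaries, 2 sleep-bound tests of 30 s each), and the model ignores build time, CPU cost, and per-test process overhead.

```python
import heapq


def pool_makespan(durations, slots):
    """Greedy list scheduling: each test starts on whichever slot frees up first."""
    if not durations:
        return 0
    free_at = [0] * min(slots, len(durations))
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)       # earliest-free slot
        heapq.heappush(free_at, start + d)   # slot is busy until the test ends
    return max(free_at)


def cargo_test_wall_clock(binaries, threads):
    """Binaries run serially; tests inside one binary share `threads` slots."""
    return sum(pool_makespan(tests, threads) for tests in binaries)


def nextest_wall_clock(binaries, slots):
    """All tests from all binaries share one global pool of `slots` slots."""
    all_tests = [t for tests in binaries for t in tests]
    return pool_makespan(all_tests, slots)


# HYPOTHETICAL workload: 8 binaries, each with 2 tests that sleep 30 s.
binaries = [[30, 30] for _ in range(8)]

print(cargo_test_wall_clock(binaries, 16))  # 240
print(nextest_wall_clock(binaries, 16))     # 30
```

In this toy case the serial-binary model spends 8 × 30 s while most slots sit idle, and the pooled model finishes in one 30 s wave. Real numbers on kitsubito will differ; measure on the runner.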

**PLANNED (priority order), all kitsubito-local, no runner coupling:**
1. **mold linker** (biggest single win on a weak CPU: linking is the slow tail). Install mold on kitsubito; in `~/.cargo/config.toml`, under `[target.x86_64-unknown-linux-gnu]`, set `linker = "clang"` and `rustflags = ["-C", "link-arg=-fuse-ld=mold"]` (or pass the same flag via `RUSTFLAGS`). Linux-only: do NOT touch the Windows runner config. Measure clean-build time before and after.
2. **cargo-nextest** — faster/more-parallel test runner than `cargo test`; swap the CI test invocation (Linux at least). Watch: nextest doesn't run doctests (keep a `cargo test --doc` leg if any doctests exist — check first).
3. **Persistent target cache, NOT clean** — the disk-full RED ([[hfenduleam-disk-full-ci]], [[hfenduleam-target-bloat-ci-red]]) means the lever is `sccache` with a SIZE CAP, not nuking target. Consider sccache local-disk backend on both runners with a bounded cache dir (avoids the 396GB bloat while keeping warm artifacts).
4. (low priority) second runner slot on kitsubito so test+n1-gate overlap — but doubles CPU contention on the slow box, likely net-neutral; skip unless 1-3 insufficient.
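For item 1, the kitsubito-local config could look roughly like the fragment below. A sketch, assuming clang is installed on kitsubito (not confirmed above) and `mold` is on `PATH`.

```toml
# ~/.cargo/config.toml on kitsubito ONLY (never on the Windows runner)
[target.x86_64-unknown-linux-gnu]
# Assumption: clang is available; it forwards -fuse-ld=mold to the linker step.
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```

Cargo takes rustflags from the first source that is set, and the `RUSTFLAGS` environment variable ranks above `target.<triple>.rustflags`. If the CI step exports `RUSTFLAGS`, the `rustflags` line in this file is ignored, so pick one mechanism rather than both.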

Sequence as originally planned: land 1 (mold) first, measure; then 2 (nextest); then 3 (sccache cap) if disk pressure recurs. This was to happen AFTER the v0.7.0 release landed; superseded by EXECUTING 2026-06-15 above, where 1 and 2 land together and the speedup CI run gates the v0.7.0 tag. CI workflow = `.github/workflows/ci.yml` (matrix `[kitsubito Linux, hfenduleam Windows]`, fail-fast:false). Rig access: [[kitsubito-linux-rig]] (ssh reavus@kitsubito, key auth, ~/.cargo/bin).
