---
name: seedmap-test-collides-live-daemon
description: spt-daemon seedmap::request_stop_barrier_holds_until_no_listener 240s timeout on a LOADED shared box = STARVATION flake (NOT a prod-daemon socket collision — that theory was REFUTED). Rerunnable. with_home fully isolates.
metadata: 
  node_type: memory
  type: project
  originSessionId: 16842a97-6889-4f10-b3ac-07f857482143
---

**FINDING (2026-06-24, v0.15.0 W5-strip gate). CORRECTED: the first framing (live-daemon socket collision) was WRONG/refuted.** A FULL `cargo nextest run -p spt-daemon` run hits the 240s TIMEOUT on `seedmap::request_stop_barrier_holds_until_no_listener` (seedmap.rs:366) **when the shared box is loaded** (live agents + prod daemon 149588 up). It passes when the daemon is DOWN or the box is idle (W3/W4/W5 gates green; CI kitsubito green). Unrelated to any code change.

**NOT A COLLISION (refuted, fully grounded):** the test uses `crate::test_home::with_home` (test_home.rs:24-28), which sets `SPT_HOME` to a FRESH `tempfile::tempdir()` per test; `spt_home()` is PURE: it reads `SPT_HOME` live, with no cache (perch.rs:34-35; the "spt_home is pure" unit test pins it). `seed_socket_name()` = `spt-daemon-seed-{hash(spt_home())}.sock` (endpoint.rs:32), so inside the test the name is hash(unique tempdir) = a UNIQUE socket. Prod 149588 runs the DEFAULT home, hence a DIFFERENT socket name. So the test's `start()`/`request_stop` only ever reach the test's OWN in-process listener and CANNOT touch prod. No shared socket, no request_stop-prod hazard. (todlando's "fixed/global name colliding with prod" theory, which doyle wrongly echoed as 'confirmed', is refuted.)
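
Sketch of the naming argument above, NOT the real endpoint.rs code. Assumptions: `socket_name_for` is a hypothetical stand-in for `seed_socket_name()`, std's `DefaultHasher` stands in for whatever hash the daemon actually uses, and the paths in the usage note are made up.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::Path;

/// Hypothetical stand-in for `seed_socket_name()`: the name is a pure
/// function of the home directory, so different homes give different names.
/// ASSUMPTION: the real hash function and hex width are not known from this note.
fn socket_name_for(home: &Path) -> String {
    let mut hasher = DefaultHasher::new();
    home.hash(&mut hasher);
    format!("spt-daemon-seed-{:016x}.sock", hasher.finish())
}
```

With this shape, `socket_name_for(Path::new("/home/user/.spt"))` (a default-home placeholder) and `socket_name_for(Path::new("/tmp/.tmpA1b2C3"))` (a per-test tempdir placeholder) differ, which is the whole of the no-collision argument: the name depends only on the home, and `with_home` gives each test a fresh one.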

**REAL CAUSE = STARVATION/CONTENTION:** the in-process serve-thread + the `request_stop` barrier (+ the `start()` 400×5ms ping-wait) are timing-sensitive; when the shared self-hosted box is LOADED (live agents + prod daemon), thread scheduling starves the barrier → the 240s nextest slow-timeout SIGKILL backstop fires. Idle box → fast → passes. This is exactly the "cross-run scheduler starvation on the shared self-hosted boxes" the ci.yml env headroom (SPT_ATTACH_GATE_WATCHDOG_MS/IPC_DEADLINE_MS) documents — load noise, not a wedge. The daemon-up/down correlation is LOAD correlation, not socket correlation.

**RELEASE/GATE PROTOCOL:** it's a RERUNNABLE flake: a less-contended slot passes. When gating locally on a loaded box, EXCLUDE it (`-E 'not test(request_stop_barrier_holds_until_no_listener)'`) and lean on kitsubito + idle-box passes; on CI, rerun the failed attempt if it goes red on hfenduleam. NEVER kill 149588 to get a clean run ([[no-machinewide-killon-shared-runner]], [[spt-daemon-is-live-infra]]).

**FOLLOW-UP (lower severity than first logged — flake not hazard):** harden the test for a loaded shared box — generous/load-tolerant barrier+ping timeout (mirror the SPT_ATTACH_*_MS env-headroom precedent), or mark it heavy-serial. NOT a v0.15.0 blocker.
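
One possible shape for the load-tolerant timeout, as a sketch only. Assumptions: `budget_from` and `wait_until` are hypothetical helpers, not existing spt-daemon functions; the 2000 ms default in the usage note is just the current 400×5ms ping-wait expressed as a budget; the env-var name there is invented to mirror the SPT_ATTACH_*_MS precedent.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Parse a millisecond budget from an optional env-var value.
/// Missing, unparsable, or zero values fall back to `default_ms`.
fn budget_from(raw: Option<&str>, default_ms: u64) -> Duration {
    let ms = raw
        .and_then(|s| s.trim().parse::<u64>().ok())
        .filter(|&ms| ms > 0)
        .unwrap_or(default_ms);
    Duration::from_millis(ms)
}

/// Poll `ready` every `interval` until it returns true or `budget` elapses.
/// Returns true on success, false on timeout.
fn wait_until(budget: Duration, interval: Duration, mut ready: impl FnMut() -> bool) -> bool {
    let deadline = Instant::now() + budget;
    loop {
        if ready() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(interval);
    }
}
```

Usage would look like `wait_until(budget_from(std::env::var("SPT_SEED_PING_WAIT_MS").ok().as_deref(), 2_000), Duration::from_millis(5), || ping_ok())`, where `SPT_SEED_PING_WAIT_MS` and `ping_ok` are placeholders. A loaded runner could then raise the budget from ci.yml without touching the idle-box default.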

**LESSON (doyle):** verify the FULL chain before confirming a diagnosis. doyle grounded the home_tag mechanism (necessary) but echoed "collision confirmed" to todlando + wrote a wrong memory BEFORE checking the two load-bearing facts — `spt_home()` purity + what `with_home` actually sets. A partial mechanism check is not a confirmation. Caught it before the release was mis-handled (held deployah, then reversed), but only after a wrong "confirmed". Kin [[gate-against-documented-design]], [[which process is the listener]].
