# M8 acceptance — rig steps (HFENDULEAM + kitsubito, isolated homes)

Prepared 2026-06-07. **Non-destructive**: everything runs against throwaway
`SPT_HOME`s — your live doyle listener, the real nodes, and BIGNET's
`09ef831e` fixture are untouched.

## Prepared environment

| Host | binary (D4 + notif-epoch fix) | throwaway home |
|---|---|---|
| **HFENDULEAM** | `C:\Users\decid\Documents\projects\spt-core\target\release\spt.exe` | `C:\Users\decid\spt-accept\home` |
| **kitsubito** | `~/spt-core-deploy/target/release/spt` | `~/spt-accept-home` |

Shorthands below:
- HF (run in an **elevated PowerShell** on HFENDULEAM):
  ```powershell
  $env:SPT_HOME="C:\Users\decid\spt-accept\home"
  $spt="C:\Users\decid\Documents\projects\spt-core\target\release\spt.exe"
  ```
- KIT (I can drive these over SSH for you; or run them yourself):
  ```sh
  KH=/home/reavus/spt-accept-home; KB=~/spt-core-deploy/target/release/spt
  run() { sudo env SPT_HOME="$KH" HOME=/home/reavus PATH=/usr/local/bin:/usr/bin:/bin "$KB" "$@"; }
  ```

> Legend: **[KIT-SSH]** I can run it. **[HF-elev]** you run, elevated.
> **[reboot]** disruptive to the CI runner — your call on timing.

---

## ✅ M8 ACCEPTANCE COMPLETE (2026-06-07)

All criteria exercised on real hardware (HFENDULEAM ↔ kitsubito, orchestrated
doyle+todlando):

| # | Criterion | Verdict |
|---|---|---|
| 1/3 | election + de-elevation (both OS) | ✅ (Win SPT_HOME-respawn bug found+fixed `f99dc25`) |
| 2 | Windows install | ✅ |
| 4 | old verbs gone / hot path | ✅ |
| 5 | attach/detach + prune | prune ✅ · **detach-flip DEFERRED** (subnet-scoped probe; fixed at int stage 2026-06-08, hardware re-verify pending) |
| 6 | convergence in seconds | ✅ |
| 7 | re-pair auto-evict + hazard 4.11 | ✅ |
| 8 | status honesty | ✅ |
| 9 | skewed-clock pairing | ✅ |
| 10 | pump-death render (+ healthy) | ✅ |

**Fixes shipped this run** (dev-freeform): SPT_HOME survives Win de-elevation
respawn (`f99dc25`), clean `--nodes` labels (`b24dd61`), bounded liveness
probe + notice (`acbfbf4`).
**Deferred** (DEFERRED.md): subnet-scoped liveness probe (detach false-online;
since fixed at int stage 2026-06-08, see Criterion 5), endpoint-less node label.
**Investigated & closed, NOT a bug:** a suspected Linux SPT_HOME-propagation
issue (kitsubito's ACCEPT landed in the default home, not the doc's `$KH`).
Verdict: setup artifact.
- Provenance: pid 293151 was launched as plain `spt daemon run`, SPT_HOME was
  never set, and the shell history is all bare `spt`. The `$KH` wrapper was never
  used on kitsubito.
- Source: unix `spawn_detached` inherits and forwards the parent env (no
  `env_clear`; only `SUDO_*` removed), and `setuid`/`setgid`/`setsid` never strip
  envp, so SPT_HOME would survive even if set.
- Windows' `CreateEnvironmentBlock` is the only from-scratch env rebuild (the
  sole reason for `f99dc25`); Linux has no analog, hence no analog bug.

## Already passed (no rig action needed)
- **2** Windows install (enlyzeam: firewall + at-logon task, v0.1.0→D4) ✓
- **9** skewed-clock pairing (enlyzeam +206s; NTP-on JOINED, NTP-off NO_SEED_HOLDER) ✓
- **4** old verbs gone / hot path intact ✓ · **8** status honesty ✓
- **debug-converge** tooling (exit 0/1/2) ✓ · full **twohost** cross-node ladder ✓
- **10-healthy** `peer pump: live` render ✓

---

## ✅ Criterion 1/3 (2026-06-07, kitsubito) — bug found + FIXED + verified

**Acceptance found a real REQ-INSTALL-6 bug; fixed and re-verified on hardware.**
- **De-elevation (criterion 3, KH-5.7): PASS** — `sudo spt` lands daemon+state
  under the elected user, never root (`strace` confirms `setgid`/`setuid`).
- **Election (criterion 1, decision 8): PASS after fix.** Root cause:
  `spt/src/main.rs` de-elevated early via `sudo_invoker()` (drop-only) for
  *every* command, before clap — shadowing BOTH daemon-side election guards
  (`Daemon::run` + `spawn_detached`), which then always saw `euid != 0` and
  no-op'd. So `/etc/spt-core/default-user` was never written by any path
  (a D3 integration gap; the election was unit-tested in isolation but never
  wired into the path that runs as root). **Fix:** main.rs now uses
  `daemon_target_user()` (elect+record+drop) — the one site still running as
  root. Verified: first `sudo spt` →
  `ELECTED_DEFAULT_USER:reavus (recorded in /etc/spt-core/default-user)`,
  record = `reavus`, state under reavus, never root; 2nd run idempotent
  (elected-record early-return, no re-elect).
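The decide-once behaviour verified above (first run elects and records, second run takes the early return) can be illustrated in shell. This is not the Rust `daemon_target_user`: `elect_default_user`, the record-path argument, and using `SUDO_USER` as the only election input are simplifying assumptions (the real election ladder is richer and unit-covered separately).

```sh
#!/bin/sh
# Illustrative sketch of "decide once" (not the real daemon_target_user):
# an existing record wins; otherwise elect the sudo invoker and persist it.
elect_default_user() {
    record=$1   # production path would be /etc/spt-core/default-user
    if [ -s "$record" ]; then
        # Early return: a previous run already decided; never re-elect.
        cat "$record"
        return 0
    fi
    user=${SUDO_USER:-}
    if [ -z "$user" ]; then
        echo "no sudo invoker to elect" >&2
        return 1
    fi
    mkdir -p "$(dirname "$record")"
    printf '%s\n' "$user" > "$record"
    # Mirrors the shape of the log line quoted above, on stderr.
    echo "ELECTED_DEFAULT_USER:$user (recorded in $record)" >&2
    printf '%s\n' "$user"
}
```

The second call ignores a different `SUDO_USER` because the record already exists, which is the idempotence the hardware run confirmed.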

## ✅ Criterion 1/3 (2026-06-07, HFENDULEAM) — Windows de-elevation bug found + FIXED + verified

**Windows leg found a second bug, this one under REQ-HAZARD-ELEVATED-DAEMON-SPAWN; fixed and re-verified on hardware.**
- **Respawn (criterion 3): PASS** — elevated `daemon run` respawns de-elevated.
- **`daemon stop` ANOMALY → FIXED.** First run: `daemon stop` printed
  `DAEMON_NOT_RUNNING` while `Get-Process` showed two live unelevated `spt`.
  Root cause: the de-elevation respawn (`deelevate::create_with_token`)
  rebuilds the child env from the desktop-user token via
  `CreateEnvironmentBlock` (correct, to fix `%LOCALAPPDATA%`) but **dropped the
  invoker's `SPT_HOME`** — so the respawned daemon bound the seed-control
  socket under the *default* home, and `daemon stop` (still carrying
  `SPT_HOME`) pinged the accept-home socket → not-running, while the orphan
  served the wrong home. `SPT_HOME` is the documented universe-relocation knob
  (`perch.rs`), so it must survive de-elevation. **Fix:** `apply_env_overrides`
  overlays `SPT_HOME` onto the rebuilt block (production = no `SPT_HOME` →
  unchanged). Verified: `daemon run` → respawn under accept-home → `daemon
  stop` → `DAEMON_STOPPED`, processes gone. Unit: `env_overlay_keeps_explicit_spt_home_alive`.
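A shell analog of the bug and the fix, using `env -i` as a stand-in for `CreateEnvironmentBlock` (both start the child from a freshly built environment instead of inheriting). `rebuilt_env_spt_home` is hypothetical; the overlay branch mirrors what `apply_env_overrides` is described as doing: re-add `SPT_HOME` only when the invoker set it, so a production launch (no `SPT_HOME`) is unchanged.

```sh
#!/bin/sh
# Print what a child started from a from-scratch environment sees for SPT_HOME.
# $1 = "overlay" to re-add the invoker's SPT_HOME (the fix); anything else = the bug.
rebuilt_env_spt_home() {
    mode=$1
    if [ "$mode" = overlay ] && [ -n "${SPT_HOME+x}" ]; then
        # Fix: overlay the explicit relocation knob onto the rebuilt environment.
        env -i SPT_HOME="$SPT_HOME" /bin/sh -c 'printf "%s\n" "${SPT_HOME-<unset>}"'
    else
        # Bug: the rebuilt environment silently drops SPT_HOME.
        env -i /bin/sh -c 'printf "%s\n" "${SPT_HOME-<unset>}"'
    fi
}
```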

## ⚠️ Criterion 1/3 finding (2026-06-07, kitsubito) [SUPERSEDED: fixed above; original finding kept for the trail]

Driven over SSH against `~/spt-accept-home`:
- **De-elevation (criterion 3, KH-5.7 safety): PASS.** Every clean
  `sudo … spt daemon run` lands the daemon + state under **reavus**
  (`identity/node.key` owned `reavus:reavus`), **never root**;
  `/root/.spt-core` is never created. Verified across file/pipe/tty captures.
- **Election record (criterion 1 "decide once"): ANOMALY — needs local debug.**
  `/etc/spt-core/default-user` is never written and the daemon prints no
  `DEELEVATED`/`ELECTED` line, in *any* capture mode (incl. forced tty), even
  though de-elevation provably happens and env (`SUDO_USER`/`SUDO_UID`) +
  `/etc` writability are confirmed fine. **Not correctness-blocking:**
  `daemon_target_user` falls back to the sudo invoker each run, so the daemon
  always lands under the right account whether or not the record persists —
  the record is only the determinism/decide-once optimization (and the
  election ladder resolution is unit-covered). Flagged for a focused local
  debug pass (the remote SSH harness can't observe the daemon's startup
  output, which is blocking diagnosis).
  - **strace evidence (kitsubito, deploy binary):** `sudo … spt daemon run`
    runs `setgid(1000)`+`setuid(1000)` (de-elevation confirmed at the syscall
    level) but performs **no `openat` of `/etc/spt-core/default-user`**
    (`write_election` never reached) and **no `write(2,…)` of
    `DEELEVATED`/`ELECTED`** — even though `SUDO_USER=reavus` is present in the
    env and `getent passwd reavus`→`1000` resolves (so neither the
    `default_name.is_empty()` nor the `user_by_name()==None` branch should be
    taken). The confirmed-on-a-real-terminal "no output + no record" rules out
    SSH suppression. **Next step: instrument `deelevate::daemon_target_user`
    branch-by-branch (eprintln per arm), rebuild, run under sudo** — the
    binary's actual branch is not what the source predicts; remote tracing
    can't see the daemon's own stdout. Non-blocking: de-elevation lands state
    under the invoker every run regardless of the record.

## Criterion 1 + 3 — fresh-install election + de-elevation

**Linux election + de-elevation [KIT-SSH]:**
```sh
sudo rm -f /etc/spt-core/default-user            # clear election (system-wide)
sudo env SPT_HOME=/home/reavus/spt-accept-home ~/spt-core-deploy/target/release/spt daemon run
#   watch stderr for:  ELECTED_DEFAULT_USER:reavus ...   and   DEELEVATED: daemon dropped to uid <n>
#   Ctrl-C once both print
sudo cat /etc/spt-core/default-user              # → reavus
ls -ld /home/reavus/spt-accept-home/identity     # owned by reavus (state under the user)
sudo ls /root/.spt-core 2>/dev/null || echo "no root home — de-elevation held"
```
PASS: election recorded `reavus`, daemon de-elevated, state under the user, `/root` never created.
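The three PASS conditions can be asserted in one call. Sketch assuming GNU `stat`; `check_election_pass` is a hypothetical helper. Run it as root: `/root` is unreadable to other users, so a non-root `[ -e /root/.spt-core ]` is always false and the last check would pass vacuously.

```sh
#!/bin/sh
# Hypothetical PASS check for criterion 1+3 on Linux (GNU stat assumed).
# Args: record file, expected user, identity dir, root-home path that must not exist.
check_election_pass() {
    record=$1 want=$2 identity=$3 root_home=$4
    got=$(cat "$record" 2>/dev/null)
    if [ "$got" != "$want" ]; then
        echo "FAIL: election record is '$got', want '$want'"; return 1
    fi
    owner=$(stat -c %U "$identity" 2>/dev/null)
    if [ "$owner" != "$want" ]; then
        echo "FAIL: $identity owned by '$owner', want '$want'"; return 1
    fi
    if [ -e "$root_home" ]; then
        echo "FAIL: $root_home exists (state landed under root)"; return 1
    fi
    echo PASS
}
```

Usage on kitsubito (as root): `check_election_pass /etc/spt-core/default-user reavus /home/reavus/spt-accept-home/identity /root/.spt-core`.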

**Windows de-elevation [HF-elev]:**
```powershell
& $spt daemon run
#   expect:  DEELEVATED_RESPAWN: unelevated daemon pid <n>; elevated copy exits
#   (the elevated invocation respawns the daemon under your normal token via the UAC linked token)
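Get-Process spt | Select-Object Id,StartTime   # exactly one survivor: the respawned daemon
#   Get-Process has no elevation field; confirm "Elevated: No" for that pid in
#   Task Manager → Details (right-click a column header → Select columns → Elevated)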
& $spt daemon stop
```
PASS: elevated `daemon run` respawns de-elevated (KH 5.7) instead of serving with the admin token.

**Reboot-reachable (criterion 1 final) [reboot]:** structurally proven already
(linger enabled + systemd user unit enabled on kitsubito earlier). To observe
literally: the systemd unit targets the *default* home, so either reboot
kitsubito and confirm `systemctl --user is-active spt-daemon` + a reachable
daemon before any manual `spt` (uses the real node, disruptive to CI), or skip
— the mechanics are covered.

---

## Setup the test pair (needed for 5/6/7/10)

A fresh subnet `ACCEPT` between the two throwaway homes.

1. **[KIT-SSH]** seed-holder + code:
   ```sh
   run subnet create ACCEPT          # I capture the 6-digit code from show-code
   run subnet show-code ACCEPT       # prints current code
   ```
2. **[HF-elev]** join (I'll relay the current code):
   ```powershell
   & $spt subnet join ACCEPT --code <CODE>     # → JOINED SUBNET: ACCEPT
   ```
3. Confirm both online:
   ```sh
   run subnet status ACCEPT --nodes            # → KITSUBITO + HFENDULEAM, both online
   ```

---

## ✅ Criterion 5 — DETACH-LIVENESS BUG FIXED (2026-06-08, REQ-SUBNET-5 int stage)

`subnet detach ACCEPT` on HFENDULEAM correctly stops it serving/advertising
ACCEPT (rows withdraw — its friendly name vanishes from kitsubito's
`status --nodes`), **but the row stayed `online`** indefinitely. Root cause:
liveness fell to `Probe` → `probe_node` did a raw transport dial by
node-pubkey (`addr_for_node_hex` → `net_dial`), which succeeded because
HFENDULEAM's daemon endpoint stays bound for its own home. One ALPN
(`spt-core/net/0`) serves every subnet, so the dial was subnet-blind — it
proved "the box is up", not "still serving ACCEPT".

**FIXED 2026-06-08:** the probe is now subnet-scoped. A new `ServeProbeRecord`
(spt-net `serveprobe`) + daemon `serveprobe` handler/requester ask the reached
peer "serving subnet X?", answered from its own `AttachmentStore::is_detached`
∧ membership (the same signal the pump/responder already gate on); the
dispatcher classifies the stream by its `serve_probe` first-line field;
`wansend::probe_node_serving` dials (proves reachable) THEN asks — so a
detached-but-alive peer reads offline within a cadence. int evidence:
`dispatcher_serves_a_subnet_serve_probe` (loopback E2E, spt-daemon `dispatch`
suite). Real-hardware re-verify (the steps below) rides the next acceptance /
signed-update deploy pass.
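The fix reduces to a small decision table. Sketch with yes/no flags standing in for the real transport dial and the `serve_probe` round-trip; all function names here are hypothetical, not the `wansend`/`serveprobe` API.

```sh
#!/bin/sh
# Peer side: "serving subnet X?" = member of X AND not detached from X.
serving_answer() {  # $1 = member yes/no, $2 = detached yes/no
    if [ "$1" = yes ] && [ "$2" = no ]; then echo yes; else echo no; fi
}

# Old probe: a successful transport dial alone meant "online".
probe_subnet_blind() {  # $1 = dial succeeded yes/no
    if [ "$1" = yes ]; then echo online; else echo offline; fi
}

# Fixed probe: dial first (reachability), THEN ask the serve question.
probe_subnet_scoped() {  # $1 = dial yes/no, $2 = member yes/no, $3 = detached yes/no
    if [ "$1" != yes ]; then echo offline; return; fi
    if [ "$(serving_answer "$2" "$3")" = yes ]; then echo online; else echo offline; fi
}
```

The detached-but-alive HFENDULEAM case is `dial=yes member=yes detached=yes`: `probe_subnet_blind yes` reports online (the bug), while `probe_subnet_scoped yes yes yes` reports offline (the fix).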

## Criterion 5 — attach/detach flip + prune

**attach/detach [HF-elev] + observe [KIT-SSH]:**
```powershell
& $spt subnet detach ACCEPT        # HFENDULEAM stops serving ACCEPT
```
```sh
run subnet status ACCEPT --nodes   # HFENDULEAM drops to offline for ACCEPT within a cadence
```
```powershell
& $spt subnet attach ACCEPT        # back
```
```sh
run subnet status ACCEPT --nodes   # HFENDULEAM online again
```

**prune (synthetic dead identity, since BIGNET is intentionally untouched):**
```powershell
& $spt daemon stop                 # HFENDULEAM's ACCEPT node goes silent (trust rows linger on kitsubito)
```
```sh
run subnet status ACCEPT --nodes   # note HFENDULEAM's node-id prefix
run subnet prune <hf-prefix>       # elevation-gated; drops the dead identity's trust rows
run subnet status ACCEPT --nodes   # the pruned identity is gone; no more dead dials
```
PASS: detach/attach flips advertising live; prune removes the dead identity's trust rows.
(The real BIGNET `09ef831e` prune is identical against your live home when you want it:
`sudo spt subnet prune 09ef831e` — left for you since it mutates real trust.)

> ✅ **prune VERIFIED 2026-06-07** (HF↔kitsubito): `LEFT:ACCEPT (1 trust row dropped)`,
> pruned identity gone from kitsubito's view.
> ⚠️ **ORDERING GOTCHA (hit live):** prune is a LOCAL trust mutation — it removed HF
> from *kitsubito's* trust only; HF wasn't notified and still held the seed. So criteria
> 6/7 (which need the pair) could not run until a re-pair. **Run prune LAST, or re-pair
> between 5 and 6.** We re-paired (criterion 7's flow) to recover.

---

## Criterion 6 — convergence in seconds

> ✅ **VERIFIED 2026-06-07** (HF↔kitsubito re-pair): after `JOINED SUBNET: ACCEPT`,
> HF saw kitsubito online within seconds; kitsubito saw HF (`43a51d9a`) online `[1/1]`
> at the very first poll (19:34:20), stable across 6 polls @3s. Seconds, not the 60s
> cadence — both directions.

**post-restart resync [KIT-SSH restart] + [HF watch]:**
```sh
run daemon stop && run daemon run &     # restart kitsubito's ACCEPT daemon
```
```powershell
# immediately poll; kitsubito should reappear within seconds (addr-seed reload, not 60s)
while ($true) { & $spt subnet status ACCEPT --nodes; Start-Sleep 2 }
```
**event-driven advertisement:** start a ready endpoint on one side →
the peer's `status --nodes` shows it online within seconds (not a cadence wait):
```powershell
& $spt ready hf-probe --subnet ACCEPT   # leave running
```
```sh
run subnet status ACCEPT --nodes        # hf-probe shows [1/1] within seconds
```
PASS: both convergence paths land in seconds. (Evidence can also ride
`debug-converge` if you stage a debug rollout, but timing is the criterion.)
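Since timing is the criterion, a small poller can put a number on "within seconds" from the kitsubito side. Sketch only; `wait_converged` is a hypothetical helper that polls any command until it succeeds and reports the elapsed wall-clock seconds (1 s resolution).

```sh
#!/bin/sh
# Hypothetical helper: poll a command until it succeeds, report elapsed seconds.
# $1 = timeout (s), $2 = poll interval (s), remaining args = command to poll.
wait_converged() {
    timeout=$1 interval=$2
    shift 2
    start=$(date +%s)
    while :; do
        if "$@" >/dev/null 2>&1; then
            echo "converged after $(( $(date +%s) - start ))s"
            return 0
        fi
        if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
            echo "not converged within ${timeout}s"
            return 1
        fi
        sleep "$interval"
    done
}
```

Usage, assuming the `--nodes` row puts the label and `online` on one line (an unverified guess at the format): `hf_online() { run subnet status ACCEPT --nodes | grep -q 'HFENDULEAM.*online'; }; wait_converged 60 1 hf_online`. A result well under the 60s cadence is the PASS signal.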

---

## Criterion 7 — re-pair proof (auto-evict + epoch sub-check)

> ✅ **VERIFIED 2026-06-07** (HF↔kitsubito): HF regenerated identity
> (`43a51d9a` → `96802fb2`, machine_id `fd8aed2a` unchanged) and re-joined. On
> kitsubito: at t0 old `43a51d9a` rendered **offline** [1/1] (superseded at the
> ceremony) beside new `96802fb2` **online** [1/1]; +33s (one cadence) the old row
> **evicted** — leaving exactly **one** HFENDULEAM row (`96802fb2`), no manual prune.
> Sub-check (hazard 4.11): the fresh rows rendered online immediately, never stale-dropped.

On HFENDULEAM, regenerate the node identity and re-join with the same label:
```powershell
& $spt daemon stop
Remove-Item "$env:SPT_HOME\identity\node.key"      # new identity on next start
& $spt daemon run        # (or let the join auto-start it) — note the NEW node id
& $spt subnet join ACCEPT --code <fresh CODE from kitsubito>
```
```sh
run subnet status ACCEPT --nodes
```
PASS: ACCEPT shows **one** trust row + **one** registry identity for HFENDULEAM
(the old identity auto-evicted at the ceremony — no manual prune). Sub-check
(hazard 4.11): the re-paired node's fresh rows are NOT dropped as stale — they
appear online, proving the peer-side epoch memory died with the evicted row.
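The one-row PASS condition can be asserted mechanically. Sketch only: it assumes each node in `subnet status ACCEPT --nodes` renders as one line containing its label, which matches the per-row renders quoted above but is not a documented format; `count_node_rows` is hypothetical.

```sh
#!/bin/sh
# Count status rows mentioning a node label; the status text arrives on stdin.
count_node_rows() {  # $1 = node label, e.g. HFENDULEAM
    # grep -c prints 0 but exits 1 on no match; "|| true" keeps the 0 usable.
    grep -c -- "$1" || true
}
```

Check after one cadence (the old row took about 33s to evict): `[ "$(run subnet status ACCEPT --nodes | count_node_rows HFENDULEAM)" -eq 1 ] && echo PASS`. At t0 a count of 2 (old row offline beside the new one) is expected, not a failure.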

---

## Criterion 10 — pump-death honesty (render)

> ✅ **VERIFIED 2026-06-07** (kitsubito isolated `$KH`, plain reavus — zero risk to
> SPT_DEV/default daemon): synthetic stall (`peer_pump:false` + 10-min-stale heartbeat).
> `daemon status` → `peer pump: STALLED (last tick 605s ago — …restart the daemon)`
> with `daemon: running` (dispatcher alive); `subnet status` → `warning: peer pump
> STALLED (605s …)`; `daemon stop` → `DAEMON_STOPPED`. Half-dead daemon renders
> honestly, never implied-healthy. (Bonus: kitsubito's real pump threw a transient 98s
> stall mid-run and **self-recovered via the supervisor** — live evidence of the
> supervised-restart path, complementing the unit test.)

Induce a stalled-pump render without a fault hook (write a stale heartbeat,
run the daemon with the pump off so it won't refresh it):
```sh
mkdir -p "$KH/identity"
echo '{"full_auto_update":false,"peer_pump":false}' > "$KH/daemon.json"   # pump off
printf '%s' $(( $(date +%s%3N) - 600000 )) > "$KH/identity/pump-heartbeat.json"  # 10-min-stale (GNU date: epoch ms)
run daemon run &        # daemon up, dispatcher alive, pump NOT writing the heartbeat
run daemon status       # → peer pump: STALLED (last tick 600s ago — ...)
run subnet status       # → warning: peer pump STALLED ...
run daemon stop
```
PASS: a half-dead daemon (pump stalled, dispatcher alive) renders the stall
honestly, never implied-healthy. (Supervised panic→restart with capped backoff
is unit-covered — `peerloop::tests::supervisor_restarts_a_panicking_pump_until_stop`;
live panic injection needs a fault hook, out of scope.)
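To cross-check the rendered age, compute it from the staged file the same way it was written: a bare epoch-milliseconds number. Sketch assuming GNU `date` (`%3N`); `heartbeat_age_s`, `pump_state`, and the threshold argument are hypothetical, since the daemon's real stall threshold is not stated here.

```sh
#!/bin/sh
# Age in whole seconds of an epoch-ms heartbeat file.
# $1 = heartbeat file, $2 = "now" in epoch ms (defaults to the current time via GNU date).
heartbeat_age_s() {
    beat=$(cat "$1")
    now=${2:-$(date +%s%3N)}
    echo $(( (now - beat) / 1000 ))
}

# Classify against a caller-chosen threshold in seconds (an assumption, not the daemon's value).
pump_state() {  # $1 = heartbeat file, $2 = threshold s, $3 = optional now in ms
    age=$(heartbeat_age_s "$1" "$3")
    if [ "$age" -gt "$2" ]; then echo "STALLED (${age}s)"; else echo "live (${age}s)"; fi
}
```

Right after staging, `pump_state "$KH/identity/pump-heartbeat.json" 60` should report STALLED at roughly 600s, growing while the pump stays off (the hardware run rendered 605s).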

---

## Teardown (after acceptance)
```sh
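run subnet leave ACCEPT 2>/dev/null; run daemon stop 2>/dev/null; rm -rf /home/reavus/spt-accept-home
sudo rm -f /etc/spt-core/default-user   # system-wide: clears the election record, not just the accept home
```
```powershell
& $spt subnet leave ACCEPT 2>$null; & $spt daemon stop 2>$null; Remove-Item -Recurse -Force "$env:SPT_HOME"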
```
Real homes, the live listener, BIGNET `09ef831e`, and both nodes' real
identities were never touched.
