# M8 — CLI nounification & install hardening

> **Status: RATIFIED (2026-06-07).** Core decisions were user-ratified in the
> 2026-06-06 M7-closeout grills (nounification shape, subnet attachment verbs,
> Linux elevation model); the scope-inclusion decisions (former §Open
> decisions A–G, plus H/I from the 2026-06-07 pump diagnosis) were ratified
> in the 2026-06-07 grill — see decisions 16–24. Requirements minted before
> work begins (TRACEABILITY rule 3).

Goal: finish the CLI's breaking reorg **while the adapter window is still
open** (no downstream adapter depends on the surface yet — spt-claude-code
scoping starts after M8), and make the *installed* binary a first-class
citizen on both OSes: elevation that works on Linux, inbound that works on
Windows, a daemon that survives reboot. M8 is the last cheap moment for
surface breaks; after this, spt-claude-code freezes the verbs it consumes.

## Why now (sequencing)

1. **Adapter window.** DEFERRED row "CLI nounification": *"Second
   surface-breaking reorg mid-acceptance; adapter window still open
   immediately after M7 closes."* Every milestone after M8 makes renames
   costlier.
2. **The M7 acceptance run was the field test of the install path** — it
   surfaced the Linux `sudo: spt: command not found` dead-end, the
   HOME-universe split, and the Windows silent-inbound-drop (NO_SEED_HOLDER).
   Both root causes are designed (DEFERRED rows), neither is shipped.
3. **The installer's "next touch" is already owed** three legs at once
   (Linux elevation, firewall registration, OS-service) — one coherent
   install/elevation deliverable instead of three drive-bys.

## Root causes this milestone fixes

1. **Verb sprawl at top level.** fork/suspend/wake/shutdown/rename/stop/
   digest/access/resources sit flat beside the agent hot path; endpoint
   operations have no noun home; `spt daemon` is a hidden verb.
2. **A user-scope install cannot be elevated on Linux.** The binary lives in
   `~/.local/bin` (not on sudo's `secure_path`), and `sudo spt` — when it
   resolves at all — runs the daemon in root's HOME universe. (CLI guidance
   band-aid shipped post-M7 @ `1359649`; the *model* — symlink + default
   user-account election — is this milestone.)
3. **A fresh Windows install is silently unreachable inbound.** The detached
   daemon never triggers the first-bind firewall prompt; mDNS + QUIC meet /
   pairing UDP drops with no rendered cause.
4. **A node is unreachable after reboot** until something invokes `spt`
   (REQ-INSTALL-1's deferred third leg — no OS-service registration).
5. **Subnet participation is all-or-nothing** (attached/detached *term*
   shipped M7; the per-subnet serve-state machinery did not), and dead
   identities' trust rows cost dials every pump tick with no prune verb.
6. **Convergence is slow at both entry points** (~1 min observed each): the
   pairing ceremony holds a live authenticated connection, then the pumps
   re-find each other from scratch via id-only discovery (post-join); and a
   restarted daemon takes 60s+ to sync and pick up other subnet nodes
   (post-bringup — the same id-only cold start).
7. **`spt subnet status` misrenders the standalone case.** The zero-subnet
   text ("local messaging works without [a subnet]…") prints regardless of
   daemon state — it reads as "working" even when the daemon is down — and
   the hint footer noises up a status view meant for reading state.
8. **Re-pairing a known machine strands its old identity.** A reinstall /
   key regeneration pairs as a brand-new node; the old identity's trust +
   registry rows linger (dead dials, status clutter) with only the manual
   prune verb to clean them.

## Ratified decisions (carried from the 2026-06-06 grills — not re-litigated)

| # | Decision |
|---|----------|
| 1 | **`spt endpoint` namespace** absorbs `fork`/`suspend`/`wake`/`shutdown`/`rename`/`stop`/`digest` + `access` (per-endpoint store — NOT subnet-scoped) + `description` (ex-`resources` blurb; bare = show, `set` = author). |
| 2 | `spt endpoint list [--local\|--subnet <name>]` = merged listing; default all-subnets **grouped by subnet**, SELF endpoint pinned distinctly at top. Bare `spt endpoint` = this list. |
| 3 | **Agent HOT PATH stays flat**: `send` / `ring` / `ready` / `whoami` / `how-to`. |
| 4 | `notify` → `spt subnet notify`; **`notif` stays top-level** (user-recall, cross-subnet). |
| 5 | **`spt daemon stop\|status`** noun; hidden `daemon` verb → `daemon run`. Agent-endpoint `shutdown` keeps its name (under `endpoint`). |
| 6 | **Subnet attachment verbs** (user spec 2026-06-06; CONTEXT §subnet attachment): `spt subnet detach <NAME> [--save]` / `attach <NAME> [--save]` — daemon keeps running, stops/starts advertising + connecting for that subnet; `--save` persists the startup default (renamed from `--auto`, user 2026-06-07 — clearer semantics). `spt subnet leave <NAME>` elevation-gated. |
| 7 | **Trust-row prune**: `spt subnet prune <node>` (or ratified final shape) — removes a dead identity's TRUST rows; elevation-gated (trust mutation = security surface). BIGNET's two 09ef831e rows are the live test case. |
| 8 | **Linux elevation model** (user-ratified direction 2026-06-06): (1) install symlinks the binary into a sudo-reachable path (e.g. `/usr/local/bin`) so `sudo spt` resolves; (2) first `sudo spt` detects elevation and prompts once for the DEFAULT USER ACCOUNT — thereafter any `sudo spt` daemon launch runs the daemon (and state) under that account, never root. Windows has no equivalent problem (same user profile). |
| 9 | No deprecation shims, same as M7 decision 1 — zero external CLI consumers until spt-claude-code exists. Breaking renames land clean. |
| 10 | New requirements minted **before work starts** (TRACEABILITY rule 3): `REQ-CLI-*` family (endpoint noun · daemon noun · hot-path preservation), `REQ-SUBNET-*` additions (attach/detach serve-state · prune · re-pair overwrite · status render), `REQ-INSTALL-*` additions (Linux elevation install leg · firewall registration · OS-service), polish reqs per scope ratification. Exact ids assigned at execution; inactive until their deliverable starts (rule 5). |
| 11 | **`subnet status` is daemon-aware** (user 2026-06-07): zero-subnet text becomes "No subnets registered — this node is standalone." followed by a daemon-status-dependent blurb (daemon running vs not). Never implies messaging works while the daemon is down. |
| 12 | **Hint footer only on bare `spt subnet`** (user 2026-06-07): `spt subnet status` drops its "hint:" lines; the flagless `spt subnet` view keeps them (amends M7 decision 5's "always printed"). |
| 13 | **Per-machine re-pair trust overwrite** (user 2026-06-07): registry rows carry a stable per-machine identifier (NOT the MAC — it changes between wifi/ethernet; source per open decision G). A **successful pairing ceremony** presenting the same node label AND the same machine identifier as an existing trusted row is treated as a re-pair of that machine: the old identity's trust + registry rows are deleted automatically. Overwrite fires only inside a completed ceremony (TOTP + SPAKE2 + the join-side elevation gate already paid) — a gossiped claim alone never evicts trust. Manual `prune` (decision 7) stays for every other case. |
| 14 | **Fast convergence covers daemon bringup too** (user 2026-06-07): D4's address seeding targets both cold starts — post-join (seed from the ceremony connection) and post-restart (persist + reload the pump's last-known peer addresses), so a restarted daemon reconnects in seconds, not 60s+. |
| 15 | **Fast-track advertisement on node state change** (user 2026-06-07): an endpoint coming online / going offline (ready-listener start/stop, rest-state transition, perch death) triggers an immediate `advertise_local` + peer push instead of waiting out the pump cadence. The cadence stays the steady-state floor; the event path is the fast lane. Field motivation: a re-listened endpoint stayed rendered `[0/1]` for up to a full cadence after coming online (2026-06-07 diagnosis tail). |

## Ratified decisions (2026-06-07 grill — former open decisions A–G + new H/I)

| # | Was | Decision |
|---|-----|----------|
| 16 | A | **`spt subnet disband` DEFERRED.** `leave` covers the field need; adding `disband` later is non-breaking, so the adapter window doesn't force it. |
| 17 | B | **OS-service shape: Linux systemd *user* service + `loginctl enable-linger`** (linger enable rides the elevated install leg; daemon starts at boot pre-login, user-universe per KH 5.7, managed via `systemctl --user`). **Windows: scheduled task at-logon** (interactive session, no stored credentials, no session-0 isolation breaking terminal hosting). At-startup S4U variant = later opt-in if a headless Windows node ever exists. |
| 18 | C | **NTP IN SCOPE — trigger fired**: enlyzeam's system clock is known >1 min off (exceeds the ±1 TOTP window). Shape (C2): **TOTP offset-correction only** — query NTP at ceremony time (both sides), apply the offset to the TOTP calculation in-process; **system-clock fallback** when NTP unreachable (offline LAN pairing unaffected — NTP failure never blocks a pairing that succeeds today); never sets the OS clock; no background sync loop. Epoch/lease ordering stays monotonic-clock — untouched. |
| 19 | D | **`xtask debug-converge` IN SCOPE** (builds in D4, activates REQ-UPD-6 int, `docs/DEBUG-CONVERGE-PLAN.md` is the spec). The 2026-06-07 pump diagnosis was the hand-rolled version of this table — the tool pays for itself; acceptance 6 gains instrumented evidence. |
| 20 | E | **Address seeding = narrow patch, concretized**: there is NO existing addr cache (production resolution is the id-only `PeerResolver`, `peerloop.rs`). New minimal `peer-addrs.json` in the identity dir (sibling of `pump-ops.json`): pubkey-hex → last-known iroh NodeAddr. Resolver consults it FIRST, falls back to id-only discovery on miss **or dial failure** (a stale addr never strands a peer). Two writers: pairing ceremony (both sides, from the live connection) + the pump on successful connect. Serves both decision-14 cold starts with one file. No generalized cache seam (no third consumer). |
| 21 | F | **`endpoint access` ports 1:1**: `spt endpoint access allow\|revoke\|open\|list`, same args/semantics (ADR-0009 polarity naming preserved). M8's break is location-only; any verb reshape belongs to the consent framework's touch. |
| 22 | G | **Machine identifier = OS machine id** (`/etc/machine-id` / Windows `MachineGuid`), **domain-separated SHA-256 before gossip** (`sha256("spt-machine-id:" + raw)` — never raw on the wire). spt-minted persisted UUID as fallback where the OS id is unreadable. (Minted-UUID-as-primary fails the exact scenario decision 13 targets: a state-dir wipe destroys it. Cloned-VM id collision is acceptable — overwrite only fires inside a completed ceremony, so the worst case is visible and recoverable by re-pairing.) |
| 23 | — | **Pump liveness (NEW — from the 2026-06-07 hfenduleam half-death)**: (1) the pump writes a heartbeat (last-tick timestamp); `spt daemon status` renders it and `subnet status` honesty accounts for a stalled pump ("pump stalled since <time>" — never implied-healthy while half-dead); (2) the daemon supervises the pump task — a panic is caught, logged, and the pump restarts with capped backoff (≤5 min), so a 5.9-class death self-heals visibly instead of 24 h of silence. Fail-fast rejected: the inbound dispatcher was usefully alive; the at-logon task shape doesn't auto-restart anyway. |
| 24 | — | **Epoch-reset hazard (NEW): document + mint inactive.** KNOWN-HAZARDS entry + `REQ-HAZARD-*` with `required_stages = []` (rule 5): a node whose advertisement-epoch counter resets re-advertises low; peers' higher last-seen epoch drops the fresh rows as stale. The common case (full reinstall) is already mitigated by decision 13's re-pair eviction (epoch memory dies with the deleted row) — acceptance 7 verifies that explicitly. The residual narrow slice (epoch file lost, identity kept) waits for a field hit before a guard is designed. |
| 25 | — | **D1 surface details (user 2026-06-07):** (a) `spt subnet notify [BODY] [--target <subnet>]` — without `--target`, auto-selects the HOME SUBNET of the calling endpoint (or of the endpoint the calling shell is attached to); no resolvable home subnet + no `--target` = deny (a home default is a defined rule, not a guess). (b) Bare `spt daemon` = the NODE STATUS view: daemon state + subnets list + local endpoints list. (c) The ex-`resources list` yellow-pages projection becomes **`spt endpoint list --detail`** (blurbs ride the merged endpoint list as opt-in detail); `endpoint description` keeps bare=show + `set` only. |

## Deliverables

**D1 — `endpoint` + `daemon` nouns (breaking reorg, no new machinery):**
namespace skeletons; move fork/suspend/wake/shutdown/rename/stop/digest +
access (1:1 per decision 21) + description under `endpoint` (decisions 1–3);
merged `endpoint list` with SELF pinned + subnet grouping; `notify` → `subnet
notify`; `daemon run|stop|status` — **status renders the pump heartbeat**
(decision 23 leg 1; the heartbeat writer itself lands in D4); help-group
placement (`help_groups_cover_every_command` keeps passing); `how-to` topics
updated; `cli/reference.md` regen (drift gate).

**D2 — subnet serve-state, trust lifecycle + status UX:**
per-subnet attach/detach machinery (selective advertise/connect in the peer
pump + responder; persisted `--auto` startup flags in daemon config), the
all-attached banner's per-subnet states, `subnet leave` (elevation),
`subnet prune <node>` (elevation; kills the dead-identity dials; BIGNET's
09ef831e rows are the acceptance fixture). **Re-pair trust overwrite**
(decisions 13/22): machine identifier = hashed OS machine id (minted-UUID
fallback) riding the registry rows (additive serde-default, old rows parse
clean — the M7 label pattern), ceremony-time same-label+same-id match
deletes the superseded identity's trust + registry rows on the seed-holder
and replicates the eviction. **Status render fixes** (decisions 11/12):
daemon-aware standalone text, hint lines moved to bare `spt subnet` only.
Disband deferred (decision 16).

**D3 — install & elevation hardening (the installer's owed touch):**
Linux: install.sh symlink leg into `/usr/local/bin` (graceful when
unelevated: print the one-liner instead) + first-`sudo spt` default-account
election + KH 5.7 interplay verified (daemon + state under the elected
account, never root). Windows: elevated install leg registers the firewall
rule (`New-NetFirewallRule`, inbound UDP) **and** the daemon self-detects
blocked inbound → rendered as the "no connection" state in `subnet status` +
coming-online banner (covers user-scope installs that skip the elevated leg).
Both OSes: OS-service registration per decision 17 — Linux systemd user
service + `enable-linger` (linger rides the elevated install leg); Windows
scheduled task at-logon (REQ-INSTALL-1 third leg — reboot-survivable
daemon). Installer-script E2E rungs extended (`SPT_INSTALL_E2E` stays
hermetic — service/firewall legs env-gated off in CI, proven in acceptance
instead).

**D4 — convergence polish + pump resilience:**
address seeding per decisions 20/14, both cold starts: post-join (ceremony
writes both sides' addrs into the new `peer-addrs.json`) and post-bringup
(pump reloads it on start so reconnect skips id-only discovery; id-only
fallback on miss or dial failure). **Event-driven advertisement**
(decision 15): endpoint online/offline transitions nudge the pump's
registry leg immediately (a wake of the existing loop — no second
advertisement path, so the epoch lease and visibility gates ride
unchanged); the cadence remains the catch-all floor. Observed-fix target
for each: first `status --nodes` convergence in seconds, not ~1 min.
**NTP TOTP-offset** (decision 18): ceremony-time NTP query → in-process
TOTP offset, system-clock fallback, never sets the OS clock. **Pump
supervision + heartbeat** (decision 23): panic caught → capped-backoff
restart; last-tick heartbeat written for D1's `daemon status` render and
`subnet status` honesty. **`xtask debug-converge` watcher** (decision 19,
`docs/DEBUG-CONVERGE-PLAN.md`, activates REQ-UPD-6 int) — built here,
used by acceptance 6.

**D5 — docs + CI + acceptance:**
docs-site sweep for every renamed verb (no page references `spt fork`, bare
`spt notify`, hidden `daemon`…), quickstart/how-to deltas, README status
table, llms.txt, reference regen; **KNOWN-HAZARDS epoch-reset entry +
inactive `REQ-HAZARD-*`** (decision 24); `quickstart_e2e` + twohost ladder
updated where verbs moved; `traceable-reqs check` + `xtask check` green.

Order: **D1 → D2 ∥ D3 → D4 → D5** (docs last so captured outputs are real).
D1 first — everything downstream renders the new verbs.

## Acceptance (real hardware, like M7)

1. **Fresh Linux install on kitsubito** (clean user, real one-liner): `sudo
   spt subnet ...` resolves and prompts the account election exactly once;
   daemon + state land under the elected account (never root); the node
   survives a reboot and is reachable before any manual `spt` invocation.
2. **Fresh Windows install** (elevated leg): inbound works with zero
   accumulated dev rules — a join succeeds where M7's NO_SEED_HOLDER failed;
   the unelevated-install path renders the blocked-inbound state honestly in
   `subnet status`.
3. **Real elevated-spawn verification on both OSes** rides this run (the
   DEFERRED row-24 follow-up: KH 5.7 de-elevation seam observed live).
4. Old verb shapes are gone (`spt fork` etc. → unknown subcommand); hot path
   unchanged; e2e suites green; no doc page references a moved verb's old
   home.
5. `subnet detach`/`attach` flips advertising live (peer observes
   offline/online); `subnet prune` removes BIGNET's 09ef831e trust rows and
   the dead dials stop.
6. Post-join first sync AND post-restart resync both converge in seconds on
   the rig (daemon restart on one node; the peer re-appears in
   `status --nodes` without the 60s id-only rediscovery). An endpoint
   flipping online/offline is visible in the peer's `status --nodes` within
   seconds (event-driven advertisement, not cadence wait). Evidence via the
   `xtask debug-converge` table (decision 19).
7. Re-pair proof: regenerate a rig node's identity, re-join with the same
   label — the subnet ends with ONE trust row + ONE registry identity for
   that machine (old rows auto-evicted at ceremony), no manual prune needed.
   Sub-check (decision 24): the evicted identity's peer-side epoch memory is
   gone with the row — the re-paired node's fresh epochs are not dropped as
   stale.
8. `spt subnet status` with the daemon stopped says so (no "messaging works"
   implication); hint lines appear on bare `spt subnet` only.
9. Skewed-clock pairing (decision 18): a node with its clock ≥1 min off
   (enlyzeam, or a rig node with the clock deliberately skewed) completes
   the pairing ceremony via the NTP TOTP offset; with NTP blocked, behavior
   matches today's (system clock, ±1 window).
10. Pump-death honesty (decision 23): with the pump task killed/stalled,
    `spt daemon status` and `subnet status` render the stall (never
    implied-healthy); a panicking pump restarts with capped backoff
    (observed via the supervised-restart log line) instead of dying silent.

## Non-goals

- spt-claude-code (separate repo; scoping starts after M8 closes).
- PresenceChannel, instantiate-anywhere, remote-exec, sandboxing, sidecar
  adapters, manifest composition (DEFERRED — triggers unmet).
- Tier-2 agent docs (MCP server / `--help --json`).
- Elevated spt-hosted endpoints (consented elevation satellite — own
  deliverable, rides REQ-CONSENT-2 when triggered).
- `spt subnet disband` (decision 16 — deferred, non-breaking to add later).
- Epoch-reset *guard* (decision 24 — documented hazard only; guard waits
  for a field hit of the narrow slice).
- Windows at-startup S4U task variant (decision 17 — opt-in if a headless
  Windows node ever exists).
- Daemon-wide NTP time correction / setting the OS clock (decision 18 —
  TOTP offset only).
