# Debug rollout runbook

<!-- [doc->REQ-UPD-6] -->

Debug rollout is the maintainer-only fast path for testing a local spt-core
build across a trusted lab subnet. It uses the normal signed self-update
substrate: no raw peer file-copy, no production `spt` CLI surface, and no
embedded debug trust anchor.

Use this when a debugging session needs a local build on several lab nodes quickly.
Do not use it for public releases; use `docs/RELEASE-RUNBOOK.md` for that.

## Mental model

- A rollout is a signed `SignedUpdateSet`: one signed metadata record, with one
  artifact digest per Rust target triple.
- Each recipient verifies the set under its node-local `release-keys.json`,
  selects only its own platform artifact, stages it, and then follows the normal
  consent/apply policy.
- Debug and stable are separate channels. A debug-pinned node accepts only
  `channel = "debug"` offers; a stable-pinned node accepts only stable offers.
- Debug versions are monotonic within the debug channel. To recover from a bad
  debug build, publish a higher debug version, even if the bytes are a previous
  known-good binary.
- A broker-touching candidate may be staged, but apply refuses it while any
  broker-held runtime is live. Shut down or suspend hosted endpoints first; the
  first flow does not auto-cycle them.
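
The acceptance rules above can be sketched in Rust. This is an illustration only, not the spt-core implementation: `Offer`, `NodeState`, `Reject`, and `select_artifact` are hypothetical names, signature verification against `release-keys.json` is assumed to have already succeeded, and the order of the checks is an assumption.

```rust
use std::collections::BTreeMap;

/// Update channel that a node is pinned to, or that an offer is published on.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Channel {
    Stable,
    Debug,
}

/// Hypothetical, simplified view of an update set whose signature
/// has already been verified.
pub struct Offer {
    pub channel: Channel,
    pub version: u64,
    /// Rust target triple -> artifact digest (hex).
    pub artifacts: BTreeMap<String, String>,
}

/// Hypothetical view of the node-local state that matters for acceptance.
pub struct NodeState {
    pub pinned: Channel,
    /// Current version within the pinned channel.
    pub current_version: u64,
    /// This node's own Rust target triple.
    pub target: String,
}

/// Refusal reasons, named after the outcomes used in this runbook.
#[derive(Debug, PartialEq, Eq)]
pub enum Reject {
    WrongChannel,
    Rollback,
    NoArtifactForPlatform,
}

/// Returns the digest of the artifact this node would stage,
/// or the reason the offer is refused.
pub fn select_artifact<'a>(node: &NodeState, offer: &'a Offer) -> Result<&'a str, Reject> {
    // A node accepts offers only from the channel it is pinned to.
    if offer.channel != node.pinned {
        return Err(Reject::WrongChannel);
    }
    // Versions are monotonic: the offer must be strictly newer.
    if offer.version <= node.current_version {
        return Err(Reject::Rollback);
    }
    // Select only the artifact for this node's own platform.
    offer
        .artifacts
        .get(&node.target)
        .map(String::as_str)
        .ok_or(Reject::NoArtifactForPlatform)
}
```

Under this model, republishing known-good bytes at the same or a lower version is refused as `Rollback`, which is why recovery needs a higher debug version.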

## One-time lab setup

Generate a debug key on the coordinator:

```powershell
cargo run -p xtask -- debug-keygen dev-debug-2026
```

Set the printed seed as a long-lived coordinator-local environment variable:

```powershell
# Note: a $env: assignment lasts only for the current PowerShell session.
$env:SPT_DEBUG_RELEASE_SEED = "<seed_hex>"
```

Pin each lab node to the debug channel with the printed public key:

```powershell
cargo run -p xtask -- debug-pin --key-id dev-debug-2026 --public-key <public_hex>
```

This writes `$SPT_HOME/identity/release-keys.json` with the debug public key and
`"channel": "debug"`. Removing the key or setting the channel back to
`"stable"` removes the node from debug rollout eligibility.

## Stage a rollout

From the fast coordinator, stage an update set into the local release cache:

```powershell
cargo run -p xtask -- debug-rollout --build-current --artifact "x86_64-unknown-linux-gnu=<path-to-linux-spt>"
```

Common flags:

- `--build-current` builds the artifact for the coordinator's current platform
  and includes it in the set.
- `--artifact <target=path>` adds an already-built artifact for another target,
  such as `x86_64-pc-windows-msvc` or `x86_64-unknown-linux-gnu`.
- `--version <u64>` overrides the local debug sequence. Use a higher value when
  recovering from lost local state.
- `--stage-dir <path>` stages somewhere other than `$SPT_HOME/releases`.
- `--state <path>` stores the local debug sequence somewhere other than
  `target/debug-rollout-state.json`.
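
As an illustration of the `<target=path>` form, one way such a value could be split is shown below. This is a sketch, not the actual `xtask` parser: `parse_artifact_arg` is a hypothetical helper, and splitting at the first `=` is an assumption that keeps any later `=` characters inside the path.

```rust
use std::path::PathBuf;

/// Splits one `--artifact` value of the form `<target>=<path>`.
/// Hypothetical helper; the real xtask parsing may differ.
pub fn parse_artifact_arg(arg: &str) -> Result<(String, PathBuf), String> {
    // Split at the first `=` so that a path containing `=` stays intact.
    let (target, path) = arg
        .split_once('=')
        .ok_or_else(|| format!("expected <target>=<path>, got `{arg}`"))?;
    // Reject values where either side of the `=` is missing.
    if target.is_empty() || path.is_empty() {
        return Err(format!("empty target or path in `{arg}`"));
    }
    Ok((target.to_string(), PathBuf::from(path)))
}
```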

The staged update then propagates through the normal pull-based update pump.
There is intentionally no bespoke push path.

## Apply and observe

On a default-gated node, apply is still explicit:

```powershell
spt update apply
```

Nodes configured for full-auto update may apply after staging without a prompt.
For quick debugging, run or poke the daemon/update pump on lab nodes so they
query peers promptly.

**Convergence watcher (M8-D4, decision 19).** Instead of hand-walking nodes,
run the maintainer watcher from the coordinator:

```powershell
cargo run -p xtask -- debug-converge --version <N> [--subnet <name>] [--timeout 120] [--poll 3]
```

It polls every expected node over the status-only update query. The expected
set defaults to the coordinator's trust rows; `--nodes <hex,…>` overrides it.
The watcher prints a per-node table showing one of `Applied`,
`StagedAwaitingConsent`, `NotPinned`, `Rejected{reason}`, or `Offline` for each
node. It exits `0` only when every node has applied the target version, `1`
otherwise, and `2` on a usage error. Spec + state model:
`docs/DEBUG-CONVERGE-PLAN.md`. The manual walk below remains the fallback.
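
The exit-code rule can be modeled as a small Rust sketch. The `NodeStatus` variants mirror the table states named above, but the type itself and `exit_code` are hypothetical, and treating an empty node list as not converged is an assumption. The authoritative state model is in `docs/DEBUG-CONVERGE-PLAN.md`.

```rust
/// Per-node states, named after the watcher's table columns.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NodeStatus {
    Applied,
    StagedAwaitingConsent,
    NotPinned,
    Rejected { reason: String },
    Offline,
}

/// Returns 0 only when every expected node has applied the target, 1 otherwise.
/// Exit code 2 (usage) would be decided during argument parsing and is not
/// modeled here.
pub fn exit_code(statuses: &[NodeStatus]) -> i32 {
    // Assumption: an empty node list counts as "not converged".
    let converged =
        !statuses.is_empty() && statuses.iter().all(|s| *s == NodeStatus::Applied);
    if converged {
        0
    } else {
        1
    }
}
```

A single `Offline` or `StagedAwaitingConsent` node is enough to keep the result at `1`, so a script can gate on the exit code alone.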

Useful outcomes:

- `NoArtifactForPlatform` means the signed set did not include the recipient's
  Rust target triple.
- `WrongChannel` means the node is not pinned to `debug`.
- `Rollback` means the offered debug version is not greater than the node's
  current debug-channel version.
- `RefusedClass(BrokerBreaking)` means the candidate touches broker-held
  resources; quiesce hosted endpoints first.

## Agent checklist

1. Read `CONTEXT.md` "debug rollout" and ADR-0016 before changing the flow.
2. Keep debug tooling in `xtask` or other maintainer tooling, not the public
   `spt` CLI/help surface.
3. Keep verification on the production update substrate: signed metadata,
   per-platform artifact digest, channel pinning, monotonic version, and
   apply-time re-verification.
4. Add or update traceability evidence for `REQ-UPD-6`.
5. Run `cargo fmt`, focused update tests, and `traceable-reqs check`.
