# Subnet full-mesh via symmetric seed-proof membership + roster-only relay

## Status

accepted (2026-06-08) — amends ADR-0005 (pairing/trust) and ADR-0006 (multi-subnet membership); **preserves** KNOWN-HAZARDS 4.10 and 7.5 (does **not** supersede them — see Consequences). Design ratified in the 2026-06-08 grill-with-docs session (`SUBNET-MESH-PLAN.md` was the opening sketch; this ADR is the decided architecture).

## Context

A subnet was, until now, the **pairing graph, not a mesh**: a node saw only the peers it had *directly* paired with. Repro (user, 2026-06-08): A creates subnet → B joins A → A offline → C joins B → B offline → A online → **A and C can never see or reach each other**. Two v1-by-design gaps caused it:

1. **Authorization was pairwise TOFU pinning** (`spt-store::trust`): the inbound gate was `is_trusted(subnet, origin_node)`. A pinned B and C pinned B, but A never pinned C, so A rejected C's origin.
2. **Gossip was own-rows-only with no roster and no relay** (`registryhost::advertise_local`, `peerloop`): a node pushed only its own registry rows to its directly-paired peers, and the pairing seed transfer (`pairing::wire::Frame::Seed`) carried only `{seed, epoch}` — no member list. So A never even *learned* C exists.

The plan's first instinct was **per-node-signed transitive row relay**: relay third-party registry rows, signed by their authors. That works but **breaks KNOWN-HAZARDS 4.10's safety assumption** ("v1 has no transitive gossip, so any future update for a node comes from that node itself, alive — no lagging third-party replay to mis-order against") and KH 7.5's per-author honesty, forcing a re-derivation of the eviction + epoch lease against lagging relayed rows. That re-derivation was the plan's self-described "subtle core."

The grill found a design that **dissolves that subtlety instead of solving it**, anchored on one fact: under ADR-0005 **every member node already holds the subnet seed**.

## Decision

**A subnet member is any node that can prove knowledge of the subnet's current-epoch seed.** Authorization moves from a stored pairwise pin to a **live, per-connection symmetric membership proof ("seed-proof")**; visibility is restored by relaying only a **member roster** (a discovery directory), never registry rows. Concretely:

1. **Membership proof (seed-proof) replaces `is_trusted` at every inbound gate** (registry apply, WAN message receive, sync, notif, connection accept). Derive a per-(subnet, seed-epoch) membership key `MK = HKDF(seed, domain ‖ subnet_id ‖ seed_epoch)`; on each QUIC connection, a **mutual challenge–response** exchanges fresh nonces and `MAC(MK, transcript)`, where the transcript binds **both handshake-proven node pubkeys, both nonces, subnet_id, seed_epoch, and role**. Channel-binding (both pubkeys in the MAC'd transcript) is load-bearing: a member cannot MITM-relay another member's proof onto a different connection. The proof is verified **once per connection** and cached on the broker `ConnEntry`; connections are kept warm (QUIC keep-alive) so re-proof is rare (restart/partition/rotation only) and never a per-message tax. The node's per-node Ed25519 identity is unchanged (the QUIC handshake still proves it; KH 7.5). An **exact-epoch match** is required (the re-seed path below is the sole N−1 exception).

2. **Member roster — relayed; registry rows — not.** The roster is a node-level union-merge grow-set (per member: pubkey, label, machine_id, last-known address, last-seen — **not** the seed), **seeded in full at pairing** (the seed-holder hands the joiner the whole current roster, so a fresh node knows even offline members) and merged on every member connection (each node authors its own entry, ordered by the per-node monotonic **lease epoch**, strictly-greater wins). Registry rows stay **own-authored**; the only change to gossip is that the push target widens from *directly-paired peers* to **all roster members** — a wider *direct* fan-out, never a third-party relay. So every row and message still arrives from its author over a handshake.

3. **Removal needs a tombstone.** A grow-set can't delete by omission (an un-tombstoned revokee re-inserts itself on its next seed-proofed connection). `spt subnet revoke <node>...` writes a per-pubkey **tombstone** that propagates over member connections, dominates the roster entry, and augments the gate to **seed-proof ∧ ¬tombstoned**. A completed re-pair ceremony for a tombstoned pubkey clears its tombstone.

4. **Revocation = seed rotation, timeboxed.** The tombstone schedules **one** seed rotation (re-mint seed, bump `seed_epoch`, push the new seed **confidentially over member-authenticated TLS connections**, never in roster/registry gossip) at the close of a **coalescing window (default 1 h)**; further revokes in the window join the same rotation, so a single epoch bump covers any number of revoked nodes. `--force-rotate-seed` rotates immediately (compromised-node path). Elevation-gated, consistent with ADR-0006 membership mutations.

5. **Re-seed grace (auto-heal).** A benign member offline during a rotation returns on `seed_epoch − 1` and would fail exact-epoch proof. A node proving the **immediately-prior** epoch **and still on the roster** is granted a **re-seed-only** restricted connection that hands it the new seed and nothing else. The revoked node is off-roster → denied (not a revocation hole); a node ≥2 rotations stale gets no grace → re-pair. Batch/timeboxed revoke keeps multi-removal to one epoch bump so benign offliners stay inside this one-deep window.

6. **Warn-on-change demoted, hard cutover.** Pairwise authorization is retired; `peers.json` is deleted with **no migration** (the fleet is an expendable test ground until spt-core stabilizes — user decision 2026-06-08; the fleet re-pairs fresh under the new model). Warn-on-change survives only as an **awareness notice** (not a gate) anchored on **machine_id** (stable across reinstall; hostnames collide): "machine M, last seen as K1, now presents K2." Same event drives the REQ-SUBNET-7 re-pair overwrite.
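The seed-proof in item 1 can be sketched in a few lines of Python. This is a minimal, non-authoritative sketch that assumes HMAC-SHA256 as the MAC, HKDF-SHA256 (RFC 5869) as the KDF, length-prefixed field encoding, and `initiator`/`responder` role labels; `DOMAIN`, `Conn`, `membership_key`, `prove`, and `verify` are hypothetical names, not the spt-core API. It shows why the transcript binds both pubkeys: a tag computed for one connection does not verify on a connection with a different peer pair, and a tag cannot be reflected back under the other role.

```python
import hashlib
import hmac
import struct
from dataclasses import dataclass

# Hypothetical domain-separation label; not the real constant.
DOMAIN = b"spt/subnet-membership/v1"
ROLE_INITIATOR = b"initiator"
ROLE_RESPONDER = b"responder"


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869): extract a PRK from the seed, then expand it with `info`."""
    prk = hmac.new(salt or bytes(32), ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def _lp(*fields: bytes) -> bytes:
    """Length-prefix every field so distinct field tuples never encode alike."""
    return b"".join(struct.pack(">H", len(f)) + f for f in fields)


def membership_key(seed: bytes, subnet_id: bytes, seed_epoch: int) -> bytes:
    """MK = HKDF(seed, domain || subnet_id || seed_epoch), one key per (subnet, epoch)."""
    return hkdf_sha256(seed, _lp(DOMAIN, subnet_id, struct.pack(">Q", seed_epoch)))


@dataclass(frozen=True)
class Conn:
    """Values both ends agree on after the QUIC handshake and nonce exchange."""
    initiator_pk: bytes     # dialer's handshake-proven Ed25519 pubkey
    responder_pk: bytes     # listener's handshake-proven Ed25519 pubkey
    initiator_nonce: bytes  # fresh per connection
    responder_nonce: bytes  # fresh per connection


def _transcript(conn: Conn, subnet_id: bytes, seed_epoch: int, role: bytes) -> bytes:
    # Both pubkeys are MAC'd: this is the channel binding that stops a member
    # from replaying a proof it received onto a connection with other peers.
    return _lp(conn.initiator_pk, conn.responder_pk, conn.initiator_nonce,
               conn.responder_nonce, subnet_id, struct.pack(">Q", seed_epoch), role)


def prove(mk: bytes, conn: Conn, subnet_id: bytes, seed_epoch: int, role: bytes) -> bytes:
    """The tag one side sends: MAC(MK, transcript) for its own role."""
    return hmac.new(mk, _transcript(conn, subnet_id, seed_epoch, role), hashlib.sha256).digest()


def verify(mk: bytes, conn: Conn, subnet_id: bytes, seed_epoch: int,
           role: bytes, tag: bytes) -> bool:
    """Recompute the peer's expected tag with our own MK; compare in constant time."""
    return hmac.compare_digest(prove(mk, conn, subnet_id, seed_epoch, role), tag)
```

On each connection the dialer sends `prove(..., ROLE_INITIATOR)` and the listener sends `prove(..., ROLE_RESPONDER)`; each side checks the other's tag with `verify` under the other's role. Because each side derives `MK` from its own seed at its own current `seed_epoch`, a peer on any other epoch simply fails `verify`, so the exact-epoch rule falls out of the key derivation rather than needing a separate check.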
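The roster behaviour in items 2 and 3 amounts to a grow-only map with a per-key "strictly-greater `lease_epoch` wins" rule, plus a tombstone set that dominates it. The sketch below is illustrative only: it assumes string pubkeys and integer timestamps, and `RosterEntry`, `Roster`, and `admit` are hypothetical names. Clearing a tombstone after a re-pair is deliberately left out, because this sketch does not model how that clear would propagate.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class RosterEntry:
    pubkey: str        # member's node pubkey (a string here for readability)
    label: str
    machine_id: str
    addr: str          # last-known address
    last_seen: int     # e.g. unix seconds
    lease_epoch: int   # per-node monotonic counter, bumped only by the entry's author


@dataclass
class Roster:
    entries: Dict[str, RosterEntry] = field(default_factory=dict)
    tombstones: Set[str] = field(default_factory=set)

    def upsert(self, entry: RosterEntry) -> None:
        """Apply one entry: tombstones dominate, otherwise strictly-greater lease_epoch wins."""
        if entry.pubkey in self.tombstones:
            return
        current = self.entries.get(entry.pubkey)
        if current is None or entry.lease_epoch > current.lease_epoch:
            self.entries[entry.pubkey] = entry

    def tombstone(self, pubkey: str) -> None:
        """Record a revoke; the tombstone also hides any existing entry."""
        self.tombstones.add(pubkey)
        self.entries.pop(pubkey, None)

    def merge(self, other: "Roster") -> None:
        """Union-merge a peer's roster, applying its tombstones before its entries."""
        for pubkey in other.tombstones:
            self.tombstone(pubkey)
        for entry in other.entries.values():
            self.upsert(entry)


def admit(roster: Roster, pubkey: str, seed_proof_ok: bool) -> bool:
    """The inbound gate from item 3: seed-proof AND not tombstoned."""
    return seed_proof_ok and pubkey not in roster.tombstones
```

Since every node bumps its own `lease_epoch` monotonically, two copies of one pubkey's entry never tie with different contents, so merges converge regardless of the order peers exchange rosters. A revokee that reconnects and re-sends its own entry stays out, which is exactly the plain grow-set failure item 3 describes.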
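The coalescing window of item 4 could be implemented as a small scheduler. This is a sketch under the assumption that the window opens at the first revoke and closes `WINDOW_SECS` later; `RotationScheduler`, `revoke`, and `tick` are hypothetical names, and pushing the new seed over member-authenticated connections is out of scope. Every revoke inside an open window joins the pending batch, so the epoch advances once per window, and `force=True` stands in for `--force-rotate-seed`.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional, Set

WINDOW_SECS = 3600  # default coalescing window: 1 h


@dataclass
class RotationScheduler:
    seed: bytes
    seed_epoch: int
    pending: Set[str] = field(default_factory=set)  # pubkeys revoked in the open window
    deadline: Optional[float] = None                 # when the open window closes

    def revoke(self, pubkey: str, now: float, force: bool = False) -> Optional[int]:
        """Join the pending rotation, opening a window if none is open."""
        self.pending.add(pubkey)
        if self.deadline is None:
            self.deadline = now + WINDOW_SECS
        if force:  # stands in for --force-rotate-seed
            self.deadline = now
        return self.tick(now)

    def tick(self, now: float) -> Optional[int]:
        """Rotate once if the window has closed; return the new epoch, else None."""
        if self.deadline is None or now < self.deadline:
            return None
        self.seed = secrets.token_bytes(32)  # re-mint
        self.seed_epoch += 1                 # one bump for the whole batch
        self.pending.clear()
        self.deadline = None
        return self.seed_epoch
```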
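Items 1, 3, and 5 together define one admission decision, which can be written as a pure function. This sketch assumes the verifier tries the current-epoch `MK` first, then the previous one, and reports which epoch's tag verified (`None` if neither); `Admission` and `classify_connection` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Admission(Enum):
    FULL = "full"                # normal member connection
    RESEED_ONLY = "reseed-only"  # may receive the new seed and nothing else
    DENY = "deny"


def classify_connection(proved_epoch: Optional[int], current_epoch: int,
                        on_roster: bool, tombstoned: bool) -> Admission:
    """proved_epoch: the epoch whose MK verified the peer's tag, or None if none did."""
    if proved_epoch is None or tombstoned:
        # A revokee still holds the current seed until the rotation lands,
        # so the tombstone must be checked even for a current-epoch proof.
        return Admission.DENY
    if proved_epoch == current_epoch:
        return Admission.FULL
    if proved_epoch == current_epoch - 1 and on_roster:
        # One-deep grace for a benign member that missed exactly one rotation.
        return Admission.RESEED_ONLY
    # Anything else (off-roster, or not exactly one epoch behind) is denied;
    # a node that is too stale must re-pair.
    return Admission.DENY
```

Ordering matters here: checking the tombstone first closes the gap during the coalescing window, when a revoked node can still produce a valid current-epoch proof.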
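The demoted warn-on-change of item 6 reduces to a lookup keyed on `machine_id` that yields a notice, never a verdict. `key_change_notice` and the in-memory map are hypothetical; a real implementation might instead read the last-seen key from the roster entry's `machine_id` and `pubkey` fields.

```python
from typing import Dict, Optional


def key_change_notice(last_key_by_machine: Dict[str, str],
                      machine_id: str, pubkey: str) -> Optional[str]:
    """Return an awareness notice if this machine now presents a new key.

    Informational only: admission still depends solely on seed-proof and
    tombstones. The stored key is overwritten either way, mirroring the
    re-pair overwrite.
    """
    last = last_key_by_machine.get(machine_id)
    last_key_by_machine[machine_id] = pubkey
    if last is not None and last != pubkey:
        return f"machine {machine_id}, last seen as {last}, now presents {pubkey}"
    return None
```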

Rejected alternative: **per-node-signed transitive row relay** (the plan's opening sketch). It preserves per-author honesty but requires relaying third-party rows, which re-opens the KH 4.10 eviction lease against lagging replays and adds per-row signature machinery. It buys insider-honesty (one compromised member can't forge *another's* rows) that the v1 same-user threat model does not require: every member is a node owned by the same user, and a compromised member already holds the seed (full compromise, rotation-mitigated). Roster-only relay plus direct own-authored row gossip achieves the same mesh visibility while **preserving** both hazards verbatim.

## Consequences

- **KNOWN-HAZARDS 4.10 and 7.5 are preserved, not superseded.** No registry row is ever relayed, so 7.5 ("origin is the handshake-proven node, never payload") holds verbatim and 4.10's eviction lease is untouched ("any future update for a node comes from that node itself, alive"). 4.10's wording is clarified: "no transitive **row** gossip" — the roster relays, rows do not. This is the central payoff of choosing roster-only relay over signed row relay.
- **Topology is O(N²) direct connections** (every member meshes every member). Trivial at same-user personal-fleet scale; iroh relay handles NAT. Only bites at large N, which same-user subnets are not.
- **A compromised member can impersonate membership and inject roster entries.** Both are inert beyond the existing model: it already holds the seed (ADR-0005 treats this as full subnet compromise, mitigated by rotation), and a forged roster entry names a pubkey that still cannot seed-proof. Seed-proof adds **no** new compromise surface.
- **Revocation requires the user to re-pair benign members stranded across multiple separate revoke events** (≥2 rotations stale, outside the one-deep re-seed grace). Batch/timeboxed revoke makes the common "remove several at once" case a single epoch bump.
- **New security invariants to test:** seed-proof must be channel-bound (no cross-connection replay) and mutual; the re-seed grace must be roster-gated (no revoked re-admit); the seed must never enter roster/registry gossip. Captured under `REQ-MESH-*`.
- **Amends** ADR-0005 (the trust model becomes membership-proof, not pairwise pin; warn-on-change demoted) and ADR-0006 (the cross-user seam now layers a roster-authority chain *on top of* seed-proof — symmetric membership generalizes to per-(subnet,user) membership later without reworking the data path).
- **Folds in** the deferred "pairing-time hostname capture" and "post-join address seeding" (DEFERRED.md) — both are roster fields delivered at the ceremony.
- **CONTEXT.md** updated inline (subnet member / membership proof, member roster, seed rotation / revoke, re-seed, warn-on-change reframe; trust store marked retired).
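A worked count behind the O(N²) topology bullet, assuming one connection per unordered pair of members (a QUIC connection is bidirectional, so A and C need only one between them):

```math
\text{connections}(N) = \binom{N}{2} = \frac{N(N-1)}{2}, \qquad \text{connections per node} = N - 1
```

```math
N = 6:\ \tfrac{6 \cdot 5}{2} = 15 \text{ connections (5 per node)}, \qquad N = 12:\ \tfrac{12 \cdot 11}{2} = 66 \text{ connections (11 per node)}
```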
