# M4-D3 — subnet registry + resolution policy (JIT plan)

**Status:** not started. D2 complete (REQ-PAIR-1..6 `[impl,unit]`, CI-green at `a82446f`).
D3 builds the per-subnet **eventually-consistent registry** `endpoint_id → [instances]`
and the bare-id **resolution policy**, plus the monotonic-epoch lease that red-team **#8**
demands so partitions/skew never route to the wrong instance.

## Goal

A node holds, per subnet, a registry of `endpoint_id → [instance...]` where an instance is
`{ node, status: Active|Dormant|Offline, epoch }`. Resolving a bare id picks the right live
instance (local → most-recently-active → explicit `id@node`); ambiguity across visible
subnets **refuses and forces qualification** (no guessing, ADR-0006 §2). Cross-node
*replication* of this registry rides the transport (D4) — D3 builds the **local model +
merge + resolution**, hermetically testable now, exactly as D2 built ceremony logic before
the D4 wire wiring.

## Decisions (locked)

- **Where the registry lives:** `crates/spt-net/src/net/registry.rs`. spt-net already owns
  "the eventually-consistent per-subnet registry (ADR-0006)" and the serde wire/registry
  types (its Cargo comment). The pure data model + merge is `net`-feature-gated (default-on),
  so unit tests run; cross-node replication wires in at D4.
- **Epoch source:** `crates/spt-store/src/epoch.rs` — a persisted **per-node monotonic
  counter** (`EpochSource::next() -> u64`, strictly increasing, NEVER wall-clock). Lives in
  spt-store (no net dep) because the SAME source serves D3b registry leases **and** D6 sync
  precedence (M4-PLAN: "one per-node monotonic epoch source serves registry leases *and*
  sync precedence"). Persisted as a tiny `identity/epoch` file, atomic-write.
- **Merge = version vector (#8 / #7 unified):** a registry update carries the author's
  `(node, epoch)`. On receipt for a given `(endpoint, node)`: incoming epoch **strictly
  greater** ⇒ accept (newer state, incl. a Dormant/Offline transition); **≤** ⇒ drop (stale —
  a lagging "Active" can NOT overwrite a newer "Offline"); **different node** ⇒ a distinct
  instance row (kept; ambiguity handled at resolution, not merge). Wall-clock is at most a
  human tiebreaker hint, never the ordering authority.
- **Resolution refuses on ambiguity** (REQ-INST-10): two distinct endpoints sharing a bare
  name across visible subnets ⇒ error that forces `subnet:id` / `id@node`. Reuse the
  charset/delimiter invariant (REQ-HAZARD-ID-CHARSET) — `:`/`@` are reserved.
- **Local-now, wire-later:** no QUIC/replication in D3; the registry is a per-node view fed
  locally (and, at D4, by inbound wire updates through the same `merge` seam). Integration
  tests land at D9.
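
The merge rule above can be sketched as follows. This is a minimal, std-only illustration, not the planned implementation. The serde derives are omitted so the block stands alone, and `MergeOutcome` is a hypothetical return type added here only to make the three cases observable.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Active,
    Dormant,
    Offline,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Instance {
    pub node: String,
    pub status: Status,
    pub epoch: u64,
}

/// Hypothetical result type: not named in the plan, used here for visibility in tests.
#[derive(Debug, PartialEq, Eq)]
pub enum MergeOutcome {
    Inserted,     // first row seen for this (endpoint, node)
    Advanced,     // strictly greater epoch replaced the existing row
    DroppedStale, // lower or equal epoch, existing row kept
}

#[derive(Debug, Default)]
pub struct SubnetRegistry {
    by_id: BTreeMap<String, Vec<Instance>>,
}

impl SubnetRegistry {
    /// All known instance rows for an endpoint id (empty if unknown).
    pub fn instances(&self, endpoint: &str) -> &[Instance] {
        self.by_id
            .get(endpoint)
            .map(|rows| rows.as_slice())
            .unwrap_or_default()
    }

    /// Applies the per-(endpoint, node) epoch rule: strictly greater wins,
    /// lower or equal is dropped, and a different node becomes its own row.
    pub fn merge_instance(&mut self, endpoint: &str, incoming: Instance) -> MergeOutcome {
        let rows = self.by_id.entry(endpoint.to_string()).or_default();
        match rows.iter_mut().find(|row| row.node == incoming.node) {
            None => {
                rows.push(incoming);
                MergeOutcome::Inserted
            }
            Some(row) if incoming.epoch > row.epoch => {
                *row = incoming;
                MergeOutcome::Advanced
            }
            Some(_) => MergeOutcome::DroppedStale,
        }
    }
}
```

Because the comparison never consults a clock, a delayed `Active` at epoch 4 arriving after an `Offline` at epoch 5 from the same node is simply dropped, which is the behavior the hazard test is meant to pin down.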

## Pieces (build order)

1. **D3a — registry model** (`REQ-INST-7`). `Status {Active,Dormant,Offline}`, `Instance
   {node: String, status, epoch: u64}`, `SubnetRegistry { by_id: BTreeMap<String,
   Vec<Instance>> }`, serde-roundtrippable. Constructors + `instances(id)` accessor.
2. **D3b — epoch lease + merge** (`REQ-HAZARD-REGISTRY-EPOCH-LEASE`, **new hazard, add to
   toml first**). `spt-store::epoch::EpochSource` (persisted monotonic `next()`); registry
   `merge_instance(endpoint, Instance)` applying the version-vector rule above. The hazard
   test: a stale lower-epoch `Active` must NOT win over a newer higher-epoch `Offline`; an
   equal-epoch replay is a no-op; an advance is accepted.
3. **D3c — resolution policy** (`REQ-INST-10`). `Address::parse("[subnet:]id[@node]")`;
   `resolve(&registry, &Address, local_node) -> Resolution { One(Instance) | Ambiguous(..)
   | None }`. Order: exact `@node` → local node → most-recently-active (highest epoch among
   Active) → refuse on a cross-instance tie.
4. **D3d — multi-subnet membership** (`REQ-INST-9`). SubnetStore already holds N seeds; add
   the **join-time bare-id collision check** (a join is refused when the endpoint id is already
   present in the target subnet). Same-user only; cross-user seam left shaped.
5. **D3e — visibility** (`REQ-INST-12,13`). Per-`(endpoint,subnet)` excluded = not advertised
   AND not routable; `hidden = S.hide_new_endpoints OR E.default_hide` unless per-`(E,S)`
   override; **visibility gates sync**. Resolution/advertise honor excluded.
6. **D3f — rename ripple** (`REQ-INST-11`). `spt rename <id> <new_id>`: registry rows, every
   perch dir, and the `a-<id>` context branch all take the new id; collision-checked per
   target subnet; 6.5-reconciled.
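
As an illustration of the D3c step, here is a hedged sketch of address parsing and the resolution order within a single subnet. It makes several assumptions that the plan does not state: `resolve` takes the instance rows for one endpoint directly instead of `&registry`, an explicit `@node` returns the matching row whatever its status, only `Active` rows are candidates for the bare-id path, and parse errors are plain `String`s. Cross-subnet ambiguity is out of scope for this block.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status { Active, Dormant, Offline }

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Instance { pub node: String, pub status: Status, pub epoch: u64 }

/// Parsed form of "[subnet:]id[@node]".
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Address { pub subnet: Option<String>, pub id: String, pub node: Option<String> }

impl Address {
    /// ':' and '@' are reserved delimiters, so no component may contain them or be empty.
    pub fn parse(s: &str) -> Result<Address, String> {
        let (rest, node) = match s.split_once('@') {
            Some((rest, node)) => (rest, Some(node)),
            None => (s, None),
        };
        let (subnet, id) = match rest.split_once(':') {
            Some((subnet, id)) => (Some(subnet), id),
            None => (None, rest),
        };
        for part in [subnet, Some(id), node].iter().flatten() {
            if part.is_empty() || part.contains(':') || part.contains('@') {
                return Err(format!("malformed address: {s:?}"));
            }
        }
        Ok(Address {
            subnet: subnet.map(String::from),
            id: id.to_string(),
            node: node.map(String::from),
        })
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Resolution {
    One(Instance),
    Ambiguous(Vec<Instance>),
    None,
}

/// `rows` are the registry rows for `addr.id` in one subnet (simplified signature).
pub fn resolve(rows: &[Instance], addr: &Address, local_node: &str) -> Resolution {
    // 1. An explicit `id@node` selects that node's row directly.
    if let Some(node) = &addr.node {
        return match rows.iter().find(|r| &r.node == node) {
            Some(r) => Resolution::One(r.clone()),
            None => Resolution::None,
        };
    }
    let active: Vec<&Instance> = rows.iter().filter(|r| r.status == Status::Active).collect();
    // 2. Prefer the instance on the local node.
    if let Some(r) = active.iter().copied().find(|r| r.node == local_node) {
        return Resolution::One(r.clone());
    }
    // 3. Otherwise take the highest epoch among Active rows; refuse on a tie.
    let Some(top) = active.iter().map(|r| r.epoch).max() else {
        return Resolution::None;
    };
    let mut winners: Vec<Instance> =
        active.into_iter().filter(|r| r.epoch == top).cloned().collect();
    if winners.len() == 1 {
        Resolution::One(winners.remove(0))
    } else {
        Resolution::Ambiguous(winners)
    }
}
```

Returning `Ambiguous` with the tied rows, instead of picking one, keeps the "no guessing" rule visible in the type: the caller has to surface the candidates and ask for `id@node`.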

## THIS slice — D3a + D3b (registry model + epoch lease + hazard)

The hermetic substrate everything else stands on. Ship green, then D3c.

- `spt-store::epoch` — `EpochSource { load/load_from, next() }`, persisted monotonic; absent
  ⇒ start at 1; corrupt ⇒ degrade safe (re-seed, never panic — match SubnetStore ethos).
- `spt-net::net::registry` — the model + `merge_instance`.
- Activate `REQ-INST-7` + `REQ-HAZARD-REGISTRY-EPOCH-LEASE` `[impl,unit]`. Add the hazard to
  `traceable-reqs.toml` AND a `KNOWN-HAZARDS.md` §4.8 section first (rule 3/4).
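
One possible shape for the epoch source, sketched under assumptions: the file holds the last issued value as decimal text, a corrupt file re-seeds at 0 (the plan says "re-seed" but not with what value), and `next()` returns `io::Result<u64>` instead of the bare `u64` in the plan so that persistence failures are visible. The `.tmp` sibling file name is also hypothetical.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

pub struct EpochSource {
    path: PathBuf,
    last: u64, // last value handed out; 0 means "nothing issued yet"
}

impl EpochSource {
    /// Loads the last issued epoch. An absent or unparseable file degrades to 0,
    /// so the first `next()` returns 1 and loading never panics.
    pub fn load_from(path: &Path) -> EpochSource {
        let last = fs::read_to_string(path)
            .ok()
            .and_then(|text| text.trim().parse::<u64>().ok())
            .unwrap_or(0);
        EpochSource { path: path.to_path_buf(), last }
    }

    /// Issues the next epoch. The new value is persisted before it is returned,
    /// so a crash after the call can never cause the same value to be re-issued.
    pub fn next(&mut self) -> io::Result<u64> {
        let candidate = self
            .last
            .checked_add(1)
            .ok_or_else(|| io::Error::new(io::ErrorKind::Other, "epoch counter exhausted"))?;

        // Atomic write: fill a temp file, flush it to disk, then rename over the target.
        let tmp = self.path.with_extension("tmp");
        let mut file = fs::File::create(&tmp)?;
        file.write_all(candidate.to_string().as_bytes())?;
        file.sync_all()?;
        fs::rename(&tmp, &self.path)?;

        self.last = candidate;
        Ok(candidate)
    }
}
```

The counter is derived only from the persisted value plus one, so no wall-clock reading can leak into it. Whether a re-seed after corruption should restart low or jump ahead of previously issued values is a decision this sketch does not settle.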

### Tests (this slice)
- epoch: `next()` strictly increases; survives reload (persisted); corrupt/absent file →
  safe restart; never returns a wall-clock-derived value (monotone across two quick calls).
- registry: serde roundtrip; `merge` accepts strictly-newer, drops stale (lower/equal),
  keeps distinct-node instances as separate rows.
- **hazard REQ-HAZARD-REGISTRY-EPOCH-LEASE:** lower-epoch `Active` does NOT overwrite a
  higher-epoch `Offline` for the same `(endpoint,node)`; the resolved status stays `Offline`.

## NOT in D3 (this milestone)
- Cross-node registry **replication** over QUIC (D4) — int/two-host at D9.
- Concurrent-write conflict **surfacing UX** (D6, confirm with user at D6 start).
- BranchStore-backed persistence (REQ-STORE-1 deferred) — D3 uses a plain atomic file.

## Conventions (carried)
- NO `cargo fmt`. Tag `[impl->REQ-…]` / `[unit->REQ-…]` on real evidence.
- `traceable-reqs check` must pass before done. Local clippy can't see `#[cfg]` arms — the
  Linux CI clippy is the real gate (D2g lesson). Commit trailer:
  `Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>`.
