# Mesh-D4 — roster propagation (on-connect exchange/merge)

> JIT plan for `SUBNET-MESH-PLAN.md` §Build phases · phase 4. **Spreads the discovery directory.** D3 seeded the roster at the pairing ceremony only; D4 makes it propagate in steady state: every seed-proof'd **member** connection exchanges and merges rosters, so a node learns members it never paired with (the transitive growth `A learns C via B`). Addresses are captured from live connections and reconciled with `peer-addrs.json`. **Enriches REQ-MESH-2 impl; lands REQ-MESH-2 int** (the on-connect propagation is exactly its int evidence). No new requirement. **Still no mesh _reach_** — the gate swap (D5) and the fan-out widen (D6) are untouched; D4 only makes the directory converge so D6 has somewhere to dial.

## The seam (grounded in the code)

A "member connection" is a QUIC conn that passed the **mutual seed-proof** (Mesh-D2). That proof runs on **one dedicated bidi control stream**, opened by the dialer (`open_bi`) / accepted by the acceptor (`accept_bi`) in `crates/spt-daemon/src/seedproofx.rs`, **before** `register_conn` wires the general app-stream acceptor (`nethost.rs:514,609`). A four-leg ping-pong (`Hello`×2, `Proof`×N each way) yields `proven: HashSet<String>` — the subnets **both** sides proved current-epoch seed-knowledge of.

**D4 appends a fifth/sixth leg to that same stream**, after the proofs verify:

```
dialer:   send Proofs → read+verify acceptor Proofs → SEND Roster → READ acceptor Roster → finish
acceptor: read+verify dialer Proofs → send Proofs → READ dialer Roster → SEND acceptor Roster → finish
```

Dialer sends its roster first (buffers on the stream while the acceptor reads), acceptor then sends its own — the **same send-then-read / read-then-send discipline** the proofs already use, so the single bidi stream never deadlocks.
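
The leg ordering above can be illustrated with a minimal, std-only simulation. This is a sketch under stated assumptions, not the `seedproofx.rs` code: the bidi stream is modelled as a pair of `std::sync::mpsc` channels, a roster is reduced to a `Vec<String>`, and `Half`, `bidi`, `dialer_roster_leg`, `acceptor_roster_leg`, and `run_exchange` are hypothetical names.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// One side's view of a simulated bidi stream (hypothetical stand-in for the
/// QUIC send/recv halves): a sender to the peer and a receiver from it.
struct Half {
    tx: Sender<Vec<String>>,
    rx: Receiver<Vec<String>>,
}

/// Build the dialer half and the acceptor half of one simulated stream.
fn bidi() -> (Half, Half) {
    let (d_tx, a_rx) = channel();
    let (a_tx, d_rx) = channel();
    (Half { tx: d_tx, rx: d_rx }, Half { tx: a_tx, rx: a_rx })
}

/// Dialer order: SEND own roster first, then READ the acceptor's.
fn dialer_roster_leg(s: &Half, mine: Vec<String>) -> Option<Vec<String>> {
    s.tx.send(mine).ok()?;
    s.rx.recv().ok()
}

/// Acceptor order: READ the dialer's roster first, then SEND own.
fn acceptor_roster_leg(s: &Half, mine: Vec<String>) -> Option<Vec<String>> {
    let theirs = s.rx.recv().ok()?;
    s.tx.send(mine).ok()?;
    Some(theirs)
}

/// Run both sides on their own threads.
/// Returns (what the dialer received, what the acceptor received).
fn run_exchange(dialer: Vec<String>, acceptor: Vec<String>) -> (Vec<String>, Vec<String>) {
    let (d, a) = bidi();
    let acc = thread::spawn(move || acceptor_roster_leg(&a, acceptor).expect("acceptor leg"));
    let dial = thread::spawn(move || dialer_roster_leg(&d, dialer).expect("dialer leg"));
    (dial.join().unwrap(), acc.join().unwrap())
}
```

Because exactly one side writes before it reads, neither side ever waits on a read that the other side has not already been scheduled to satisfy. If both sides read first, the exchange would hang.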

**Why ride the proof stream instead of a second stream or `Frame::Seed`:**
- `Frame::Seed` is the *pairing* ALPN (`SPT_PAIR_ALPN`), a one-shot ceremony — not a steady-state member conn. Wrong channel.
- A *second* control stream would race the app-stream acceptor that `register_conn` wires post-proof. The proof stream is already open, already mutual, and already finishes cleanly once its last leg completes, so appending one leg is the smallest safe change.
- **Security, by construction:** roster bytes ride a leg that only executes *after* both proofs verify, so the REQ-MESH-2 invariant "exchanged only over seed-proof'd member connections" is **structural**, not a convention a future edit can break. And the provider sends only the **proven** subnets' slices (§Part B) — a peer that proved subnet X never receives subnet Y's roster (the same cross-subnet-leak defense D3 baked into `Frame::Seed`'s implicit-subnet adopt).

## Part A — the roster-exchange frame (`spt-net`, shared codec)

Add `SeedProofFrame::Roster { entries: Vec<RosterEntry>, tombstones: Vec<Tombstone> }` to `crates/spt-net/src/net/mesh/seedproof.rs`.

**Subnet is EXPLICIT on this wire** (unlike `Frame::Seed`, where it's implicit because the ceremony fixes one subnet). A member conn carries multiple proven subnets at once, so each `RosterEntry`/`Tombstone` keeps its real `subnet` field — the sink merges them as-is, no stamping.

**DRY the codec:** the per-entry field encoding already exists in `pairing/wire.rs` (D3's `enc`/`dec_roster` for `Frame::Seed`). Extract it into a shared helper — `crates/spt-net/src/net/mesh/rostercodec.rs` (`enc_entries`/`dec_entries`/`enc_tombstones`/`dec_tombstones`) — and have **both** `Frame::Seed` (wire.rs) and `SeedProofFrame::Roster` (seedproof.rs) call it. The only delta vs D3's encoding: the shared form writes the `subnet` field per entry (Seed's variant zero-lengths it / stamps on adopt). Keep the `u32 count`-then-length-prefixed-fields shape; counts are never pre-allocated from the wire (a hostile count hits "truncated" on the first short field — D3's posture). `MAX_FRAME` in seedproofx is already `64 * 1024` (D3 bumped it for exactly this payload).
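
A minimal sketch of the `u32 count`-then-length-prefixed-fields shape, with the explicit `subnet` field and no pre-allocation from the wire count. The entry here is a reduced, hypothetical three-field `RosterEntry`; the big-endian integers, the zero-length encoding for an absent `address`, and the helpers `put_field`, `take_u32`, and `take_string` are assumptions for illustration, not D3's actual byte layout.

```rust
/// Reduced, hypothetical entry shape; the real `RosterEntry` carries more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct RosterEntry {
    pub subnet: String,
    pub pubkey: String,
    pub address: Option<String>,
}

/// Write one length-prefixed field (u32 length, big-endian by assumption).
fn put_field(out: &mut Vec<u8>, bytes: &[u8]) {
    out.extend_from_slice(&(bytes.len() as u32).to_be_bytes());
    out.extend_from_slice(bytes);
}

pub fn enc_entries(entries: &[RosterEntry]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(entries.len() as u32).to_be_bytes());
    for e in entries {
        put_field(&mut out, e.subnet.as_bytes()); // subnet is explicit per entry
        put_field(&mut out, e.pubkey.as_bytes());
        // Assumption: an absent address is written as a zero-length field.
        put_field(&mut out, e.address.as_deref().unwrap_or("").as_bytes());
    }
    out
}

/// Read a u32 and advance the cursor; `None` if fewer than 4 bytes remain.
fn take_u32(buf: &mut &[u8]) -> Option<u32> {
    if buf.len() < 4 {
        return None;
    }
    let (head, rest) = buf.split_at(4);
    *buf = rest;
    Some(u32::from_be_bytes([head[0], head[1], head[2], head[3]]))
}

/// Read one length-prefixed UTF-8 field; `None` on a short or non-UTF-8 field.
fn take_string(buf: &mut &[u8]) -> Option<String> {
    let len = take_u32(buf)? as usize;
    if buf.len() < len {
        return None; // truncated: checked before any allocation
    }
    let (head, rest) = buf.split_at(len);
    *buf = rest;
    String::from_utf8(head.to_vec()).ok()
}

/// Decode entries from a cursor, leaving it positioned after the last entry.
pub fn dec_entries(buf: &mut &[u8]) -> Option<Vec<RosterEntry>> {
    let count = take_u32(buf)?;
    // Never `with_capacity(count)`: the count is untrusted wire data.
    let mut entries = Vec::new();
    for _ in 0..count {
        let subnet = take_string(buf)?;
        let pubkey = take_string(buf)?;
        let address = Some(take_string(buf)?).filter(|a| !a.is_empty());
        entries.push(RosterEntry { subnet, pubkey, address });
    }
    Some(entries)
}
```

Taking a cursor (`&mut &[u8]`) rather than a whole buffer is one way to let a tombstone decoder continue from where the entries end; a hostile count fails on the first field that cannot be read, without ever sizing a buffer from it.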

`SeedProofFrame::Roster` encode/decode mirrors the existing variants. Decode returns `None` on malformed input, which drops the conn. The proofs have already passed at that point, so a malformed roster leg indicates a protocol bug rather than an attacker who matters; fail closed anyway.

## Part B — wiring the exchange into `prove_membership` (`spt-daemon/seedproofx.rs`)

`prove_membership` gains two injected closures (decouples crypto-proof from the spt-store directory — the `MembershipSource` posture):

```rust
pub type RosterProvider = Arc<dyn Fn(&HashSet<String>) -> (Vec<RosterEntry>, Vec<Tombstone>) + Send + Sync>;
pub type RosterSink     = Arc<dyn Fn(Vec<RosterEntry>, Vec<Tombstone>) + Send + Sync>;
```

- **provider(proven)** → flatten `roster_for(s)` over every `s ∈ proven` into one `(entries, tombstones)` list, subnets intact. Sends **only proven subnets** (the leak defense).
- **sink(entries, tombstones)** → merge into the durable store + reconcile peeraddrs (§Part C). Runs on the net runtime; does its own `load`/`save` (rare, per-connect — the store's posture).

Both default to `None`-equivalent no-ops when `membership: None` (mechanics-only tests). They are threaded through `NetConfig` alongside `membership` (`nethost.rs:75`), cloned into both the accept loop and `dial`, and passed to `prove_membership`. `prove_as_dialer`/`prove_as_acceptor` gain the roster legs at the positions diagrammed above, gated on `!proven.is_empty()` (a conn with no shared subnet has already been dropped before this point).
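
The provider's "proven subnets only" rule can be sketched as follows. This is an illustration under assumptions, not the production provider: `Directory` is a hypothetical in-memory stand-in for the durable store (its per-subnet lookup plays the role of `roster_for`), `provider_over` is a hypothetical constructor, and `RosterEntry`/`Tombstone` are reduced to two fields each.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Arc;

/// Reduced, hypothetical shapes; the real types carry more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct RosterEntry {
    pub subnet: String,
    pub pubkey: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Tombstone {
    pub subnet: String,
    pub pubkey: String,
}

pub type RosterProvider =
    Arc<dyn Fn(&HashSet<String>) -> (Vec<RosterEntry>, Vec<Tombstone>) + Send + Sync>;

/// Hypothetical in-memory stand-in for the durable store, keyed by subnet.
pub type Directory = HashMap<String, (Vec<RosterEntry>, Vec<Tombstone>)>;

/// Build a provider that flattens the slices of the proven subnets only.
/// Subnets the store holds but the peer did not prove are never touched.
pub fn provider_over(dir: Arc<Directory>) -> RosterProvider {
    Arc::new(move |proven: &HashSet<String>| {
        let mut entries = Vec::new();
        let mut tombstones = Vec::new();
        for subnet in proven {
            if let Some((e, t)) = dir.get(subnet) {
                // Entries keep their own `subnet` field; nothing is stamped.
                entries.extend(e.iter().cloned());
                tombstones.extend(t.iter().cloned());
            }
        }
        (entries, tombstones)
    })
}
```

Iterating over `proven` rather than over the store is the design choice that makes the leak defense hold: an unproven subnet can only appear in the output if it is in the input set.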

## Part C — the sink: address capture + peeraddrs reconcile (the seam decision)

**Decision: roster FEEDS peeraddrs (gap-fill only); the live endpoint FEEDS the roster self-entry. `peer-addrs.json` is RETAINED as the dial cache — not absorbed, not retired** (REQ-CONV-1 is load-bearing in `peerloop::dial_seeded`; D6 reach dials *through* it).

Two address flows, kept distinct:

1. **My own entry advertises MY address.** Before the provider builds its slice, refresh this node's self-entry for every held subnet: `upsert_self(subnet, self_pk, label, machine_id, Some(endpoint.addr() as JSON), fresh_lease)`. Source = `NetHost::addr()` (`nethost.rs:568`, the endpoint's id+paths), label/machine_id = the same `os_hostname()`/`machine_id_hash()` D3 used, lease = `EpochSource::load().current()` **peek** (no consume — propagation must not disturb registry ordering, D3's rule). Generalize D3's `refresh_self_roster` helper. Without this, my entry (seeded at pairing with `address: None`) never advertises a dialable address and peers can't transitively reach me — this is the line that makes D6 work.

2. **A learned member's advertised address gap-fills peeraddrs.** In the sink, after merging: for each incoming entry with `address: Some(addr)` and `pubkey ≠ self`, if `PeerAddrStore` has **no** entry for that pubkey, `record(peeraddrs_file, pubkey, addr)`. **Gap-fill only** — a locally *observed* peeraddr (written by `peerloop` after a real dial, `conn_remote_addr`) is strictly fresher/more-reachable than a third-party-advertised one, so never clobber it. This is what lets A, having learned C's entry from B's roster, later dial C directly (the D6 reach precondition).

`conn_remote_addr` (`endpoint.rs:370`) still feeds peeraddrs for the **directly** connected peer exactly as today (`peerloop` write-back, untouched) — D4 adds only the *transitive* gap-fill from merged entries.
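
The two address flows can be sketched side by side. This is a simplified model, not the store API: `PeerAddrs` is a hypothetical in-memory stand-in for `PeerAddrStore`, addresses are plain strings rather than JSON, and `refresh_self_address` covers only the address part of `upsert_self` (it does not insert a missing entry or touch the lease).

```rust
use std::collections::HashMap;

/// Reduced, hypothetical entry shape.
#[derive(Debug, Clone, PartialEq)]
pub struct RosterEntry {
    pub subnet: String,
    pub pubkey: String,
    pub address: Option<String>,
}

/// Hypothetical in-memory stand-in for `PeerAddrStore`: pubkey -> dial address.
pub type PeerAddrs = HashMap<String, String>;

/// Flow 1: stamp the live endpoint address onto this node's own entries,
/// so the slice the provider sends advertises a dialable address.
pub fn refresh_self_address(roster: &mut [RosterEntry], self_pk: &str, live_addr: &str) {
    for e in roster.iter_mut().filter(|e| e.pubkey == self_pk) {
        e.address = Some(live_addr.to_string());
    }
}

/// Flow 2: gap-fill peeraddrs from merged entries.
/// Skips self, skips entries without an address, and never overwrites an
/// existing (locally observed) address. Returns how many were newly recorded.
pub fn gap_fill(peeraddrs: &mut PeerAddrs, self_pk: &str, merged: &[RosterEntry]) -> usize {
    let mut filled = 0;
    for e in merged {
        let addr = match &e.address {
            Some(a) => a,
            None => continue,
        };
        if e.pubkey == self_pk || peeraddrs.contains_key(&e.pubkey) {
            continue;
        }
        peeraddrs.insert(e.pubkey.clone(), addr.clone());
        filled += 1;
    }
    filled
}
```

In this model the only write path into `PeerAddrs` from a roster is the "absent key" branch, which is what keeps a third-party-advertised address from displacing one observed on a real dial.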

## Part D — the daemon production wiring

`production_membership()` has a sibling `production_roster_provider()` / `production_roster_sink()` (in `seedproofx.rs` or a small `rosterx.rs`), wired where `NetConfig` is built for the real daemon (find the `production_membership()` call-site). The sink closure captures the peeraddrs path + roster path and does the load→merge→save + gap-fill. Hermetic/mechanics tests pass `None` (no roster exchange, as today).

## Explicitly NOT D4 (carry-forward)

- **D5** — swap the five `is_trusted` gates to `roster.is_member` / the `ConnEntry.proven_subnets` set; hard cutover (delete `peers.json` + the `TrustStore` auth path). **Pre-D5 open item #3** (confirm self-update is node-local per ADR-0016, not riding member conns) is checked *before* D5, not here.
- **D6** — fan-out widen (push target → all roster members in `peerloop`/`advertise_local`) → **MESH ON**, D0 harness flips green. D4 leaves the push target at directly-paired peers; it only makes the *directory* converge.
- **D7** — revoke CLI, timeboxed rotation, confidential seed push, re-seed grace, cross-node un-tombstone propagation.

## Tests

**unit — `spt-net/src/net/mesh/seedproof.rs`** (`// [unit->REQ-MESH-2]`):
- `SeedProofFrame::Roster` round-trips: multi-subnet entries (one with an `address` JSON, one without) + tombstones; empty entries+tombstones; malformed trailing (bad count / short field / non-UTF-8) ⇒ `None`.

**unit — `spt-net/src/net/mesh/rostercodec.rs`** (`// [unit->REQ-MESH-2]`):
- shared `enc_entries`/`dec_entries` round-trip with the explicit `subnet` field; parity sanity that `Frame::Seed`'s use (subnet stamped on adopt) still round-trips through the shared helper (regression guard on the D3 ceremony codec after the extraction).

**unit — `spt-daemon` (seedproofx / nethost)** (`// [unit->REQ-MESH-2]`):
- **the propagation merge**: two hermetic `NetHost`s (loopback, relay off) both wired with membership + roster on a shared subnet; B's store pre-seeded with an entry for an **offline** C; A dials B; assert **A's roster now contains C** (transitive growth) and **A's peeraddrs got C's advertised addr** (gap-fill) — the store-level proof of "A learns + can reach C via B."
- **gap-fill discipline**: a pre-existing locally-observed peeraddr for a pubkey is **not** clobbered when a roster entry for that pubkey is merged; an absent one **is** filled.
- **self-entry advertises addr**: after a member connect, this node's own roster entry carries `Some(address)` from `endpoint.addr()` (was `None` post-pairing).
- **leak defense**: provider over `proven = {X}` never emits subnet Y's entries even when the store holds Y.

**int — `spt-daemon`** (`// [int->REQ-MESH-2]`, **activates REQ-MESH-2 int**):
- the plan's acceptance for this phase: B knows `{A, C}` (C offline / never directly paired to A); A connects B; after the connect settles, **A's durable roster contains C and A's peeraddrs is seeded for C** — A has both *learned* and is *able to reach* C, with B never relaying a row. (The full 3-node staggered **reach** convergence stays the D0 harness → REQ-MESH-3 int at D6; this int proves only the directory propagation REQ-MESH-2 owns.)

## Done when

- The roster-exchange leg rides the proof control stream; member connects exchange + merge rosters both ways; self-entry advertises `endpoint.addr()`; merged entries gap-fill peeraddrs.
- Shared `rostercodec` extracted; `Frame::Seed` (D3) and `SeedProofFrame::Roster` both use it; the D3 ceremony tests still pass (extraction is behavior-preserving).
- `cargo test -p spt-net mesh::seedproof`, `cargo test -p spt-net mesh::rostercodec`, `cargo test -p spt-net pairing::wire` (D3 regression), `cargo test -p spt-daemon` (propagation unit + int) green; `--no-default-features` + workspace build + clippy clean.
- `traceable-reqs check` clean with **REQ-MESH-2 `required_stages = ["impl","unit","int"]`** (D4 adds `int`). Evidence on the frame, the codec helper, the provider/sink, the self-entry refresh, the peeraddrs gap-fill, and the tests.
- The five `is_trusted` gates and the `peerloop` push target remain **untouched** (D5/D6 own those). No mesh reach yet — only directory convergence.
- Commit: `feat(mesh): on-connect roster propagation + peeraddrs reconcile (REQ-MESH-2) — Mesh-D4`.
