# Mesh-D2 — seed-proof at connect + ConnEntry membership set

> JIT plan for `SUBNET-MESH-PLAN.md` §Build phases · phase 2. **Wires the D1 codec onto live connections.** Run the mutual channel-bound seed-proof on every dial + accept in `nethost` *before* the connection is usable; cache the proven membership on `ConnEntry`; fail → drop. Verify/enable QUIC keep-alive so conns stay warm. **Old `is_trusted` gates keep running in parallel** (belt + suspenders — no regression; the gate swap is D5). **Enriches REQ-MESH-1 (impl)**; activates REQ-MESH-1 **int**.

## The one design decision (and why it's forced, not chosen)

D1 imagined one `subnet_id` per connection. The codebase says otherwise:

- `dial(addr)` (`broker.rs:593/605`) takes only an address — **connections are not subnet-scoped**. iroh dedups to one QUIC conn per peer-pair on the shared `SPT_NET_ALPN`, so a single conn carries traffic for **every subnet the two nodes share**.
- The five inbound gates D5 will swap are all **per-`(subnet, origin)`, evaluated per item** (`registryhost.rs:81/228/291`, `notifsync.rs:60/106`, `sync.rs:102/107`) — each registry row / notif / sync item carries its own `subnet`, and the gate asks `is_trusted(item.subnet, origin)`.

Therefore the cached verdict D5 consumes must answer *"has this peer proven subnet S?"* for **any** S an item names → **a set of proven subnets per `ConnEntry`, not a bool.** Single-subnet-per-conn would require threading a subnet through `dial`/accept (more plumbing) and still couldn't answer the multi-subnet gate. The set is both more correct and less invasive.

**Decision:** at connect, the two nodes prove the **full set of subnets they share**; `ConnEntry` caches `proven_subnets: HashSet<String>` (current-epoch). D5's gate becomes `conn.proven_subnets.contains(&item.subnet)`.
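A minimal sketch of how a per-item gate could read the cached set. The types are simplified stand-ins: `ConnEntry` here holds only the set, and `Item` and `gate` are hypothetical names for illustration, not the real D5 code.

```rust
use std::collections::HashSet;

/// Stand-in for the real `ConnEntry`; only the cached set is modelled here.
struct ConnEntry {
    proven_subnets: HashSet<String>,
}

/// Hypothetical inbound item: each registry row / notif / sync item names its own subnet.
struct Item {
    subnet: String,
}

/// Hypothetical D5-style gate: the peer may deliver `item` only if it proved
/// that item's subnet at connect time.
fn gate(conn: &ConnEntry, item: &Item) -> bool {
    conn.proven_subnets.contains(&item.subnet)
}
```

Because the lookup is keyed by the item's own subnet, one connection can pass items for `home` and reject items for a subnet the peer never proved, which a single per-connection bool could not express.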

## The connect-time exchange (one dedicated control stream, before the conn is usable)

The seed-proof rides the **first bidi stream**, opened by the dialer and consumed *before* `register_conn` wires the general `accept_bi` app pump — so it is never confused with an app stream.

```
dial:    connect → open_bi(ctrl)                   → run_seedproof(Dialer)   → ok? register_conn(conn, proven) : drop
accept:  connecting.await → conn → accept_bi(ctrl) → run_seedproof(Acceptor) → ok? register_conn(conn, proven) : drop
         (pairing-ALPN branch is untouched — pre-trust, separate)
```

`run_seedproof` (new `spt-daemon` module `seedproofx.rs`, using `spt_net::net::mesh::seedproof` primitives — orchestration lives in the daemon; the crypto stays in `spt-net`):

1. **Dialer → `Hello { nonce_d, subnets: [names] }`** — its local subnet names + a fresh 32-byte OS-CSPRNG nonce.
2. **Acceptor → `Hello { nonce_a, subnets: shared }`** — replies with **only the intersection** of the dialer's list and its own (minimizes name disclosure to an unauthenticated dialer; see Security notes). `shared` is sorted (deterministic order both sides agree on).
3. Empty `shared` ⇒ both **drop** (a legit member always shares ≥1 subnet; no shared subnet = nothing to mesh).
4. For each `S` in `shared`, each side derives `MK_S = MembershipKey::derive(seed_S, S, epoch_S)` **from its own local subnet record**, builds the shared `SeedProofTranscript { S, epoch_S, dialer_pub, acceptor_pub, nonce_d, nonce_a }`, sends `Proof { tag_S(own_role) }`, and `verify`s the peer's `Proof` for the opposite role. Each side uses *its own* `epoch_S`, so an epoch skew makes the tags diverge and the proof is rejected (exact-epoch reject, which is correct).
5. `proven = { S | peer's tag verified }`. **Any agreed-shared subnet that fails to prove ⇒ drop the whole connection** (an impostor naming a subnet it can't prove, or an epoch mismatch — neither is a usable member conn). On full success `proven == shared`; cache it.

Nonces are exchanged **once per connection** and reused across the per-subnet proofs (the transcript already binds `subnet_id` + `seed_epoch`, so per-subnet tags differ; the nonces only need per-connection freshness). Pubkeys come from the QUIC handshake (`conn.remote_id()` + local identity) — the channel binding.
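The non-cryptographic parts of steps 2, 3 and 5 can be sketched with the standard library alone. This is an illustration under stated assumptions: `shared_subnets` and `proven_or_drop` are hypothetical helper names, and `verify_peer_tag` is a placeholder callback standing in for the real `verify` of the peer's `Proof`.

```rust
use std::collections::{BTreeSet, HashSet};

/// Steps 2-3: the acceptor's reply list. Only names the dialer already sent can
/// appear; a BTreeSet yields a sorted, de-duplicated order.
/// Returns `None` when nothing is shared (both sides drop).
fn shared_subnets(dialer_names: &[String], local_names: &HashSet<String>) -> Option<Vec<String>> {
    let shared: BTreeSet<&String> = dialer_names
        .iter()
        .filter(|name| local_names.contains(*name))
        .collect();
    if shared.is_empty() {
        None
    } else {
        Some(shared.into_iter().cloned().collect())
    }
}

/// Step 5: all-or-nothing. `verify_peer_tag` is a hypothetical callback that
/// stands in for the real MAC check of the peer's proof for one subnet.
fn proven_or_drop<F>(shared: &[String], mut verify_peer_tag: F) -> Option<HashSet<String>>
where
    F: FnMut(&str) -> bool,
{
    let mut proven = HashSet::new();
    for subnet in shared {
        if !verify_peer_tag(subnet.as_str()) {
            return None; // one failed subnet drops the whole connection
        }
        proven.insert(subnet.clone());
    }
    Some(proven)
}
```

Returning `Option` in both helpers keeps the drop path explicit: the caller either gets the full proven set or has nothing to register.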

### Wire: extend the D1 frame set (additive)

`SeedProofFrame` (D1: `Challenge{nonce}` / `Proof{tag}`) gains:

```rust
Hello { nonce: [u8; NONCE_LEN], subnets: Vec<String> },   // tag byte 2; u32 count, then per name u32 len + bytes
```

`Challenge` is effectively subsumed by `Hello` (nonce + subnet list in one frame) — keep `Challenge` defined (D1 tests pin it) but the D2 driver uses `Hello`/`Proof`. `decode` stays total (unknown tag / bad length / bad UTF-8 ⇒ `None`). Length-prefix the name vector (u32 count, then per name u32 len + bytes) — same framing discipline as `pairing::wire`.

Subnet **names** go on the wire (not seeds, not epochs). Names are human labels, not secrets; the seed never leaves the node (ADR-0017 invariant). See Security notes.
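A minimal sketch of the `Hello` framing described above, written against the standard library only. Assumptions are labeled: big-endian integers, the helper names `encode_hello`, `decode_hello`, `take` and `take_u32`, and rejecting trailing bytes are choices of this sketch, not confirmed details of the D1 codec or `pairing::wire`.

```rust
const NONCE_LEN: usize = 32; // 32-byte nonce, per the exchange above
const TAG_HELLO: u8 = 2; // tag byte 2, per the frame definition

/// Encode `Hello`: tag, nonce, u32 count, then per name u32 len + UTF-8 bytes.
/// Big-endian is an assumption of this sketch.
fn encode_hello(nonce: &[u8; NONCE_LEN], subnets: &[String]) -> Vec<u8> {
    let mut out = vec![TAG_HELLO];
    out.extend_from_slice(nonce);
    out.extend_from_slice(&(subnets.len() as u32).to_be_bytes());
    for name in subnets {
        out.extend_from_slice(&(name.len() as u32).to_be_bytes());
        out.extend_from_slice(name.as_bytes());
    }
    out
}

/// Take `n` bytes off the front of `buf`, or `None` if it is too short.
fn take<'a>(buf: &mut &'a [u8], n: usize) -> Option<&'a [u8]> {
    let whole: &'a [u8] = *buf;
    if whole.len() < n {
        return None;
    }
    let (head, tail) = whole.split_at(n);
    *buf = tail;
    Some(head)
}

/// Read one big-endian u32 length or count.
fn take_u32(buf: &mut &[u8]) -> Option<usize> {
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(take(buf, 4)?);
    Some(u32::from_be_bytes(bytes) as usize)
}

/// Total decode: any malformed input yields `None`, never a panic.
fn decode_hello(mut buf: &[u8]) -> Option<([u8; NONCE_LEN], Vec<String>)> {
    if take(&mut buf, 1)?[0] != TAG_HELLO {
        return None;
    }
    let mut nonce = [0u8; NONCE_LEN];
    nonce.copy_from_slice(take(&mut buf, NONCE_LEN)?);
    let count = take_u32(&mut buf)?;
    let mut subnets = Vec::new(); // no pre-allocation from an untrusted count
    for _ in 0..count {
        let len = take_u32(&mut buf)?;
        let name = std::str::from_utf8(take(&mut buf, len)?).ok()?;
        subnets.push(name.to_owned());
    }
    if buf.is_empty() {
        Some((nonce, subnets))
    } else {
        None // trailing bytes are treated as malformed in this sketch
    }
}
```

The vector is grown per decoded name instead of being pre-sized from `count`, so a hostile count on a short buffer fails at the first missing name without a large allocation.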

## ConnEntry + register_conn changes (`nethost.rs`)

```rust
struct ConnEntry {
    conn: Connection,
    remote_id_hex: String,
    proven_subnets: HashSet<String>,   // NEW — current-epoch membership proven at connect
}
```

- `register_conn(shared, conn)` → `register_conn(shared, conn, proven_subnets)` — stores the set; everything else (presence append, closed-watcher, app `accept_bi` loop) unchanged. The app `accept_bi` loop now starts on the conn **after** the ctrl stream is consumed, so stream #1 (ctrl) is never registered as an app stream.
- New accessor `conn_proven_subnets(conn_id) -> HashSet<String>` (the seam D5 reads; unused by any gate at D2 — belt + suspenders).
- `dial` and the accept-loop trusted branch each run `run_seedproof` and drop on failure (don't register). Drop = simply return without `register_conn` (dialer) / let the conn fall out of scope (acceptor) → QUIC closes it.
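The register-or-drop behaviour and the accessor in the list above could look roughly like this. It is a sketch with stand-in types: `Connection` is an empty placeholder for the QUIC handle, and `ConnTable`, `register_if_proven` and the `u64` conn id are hypothetical, since the real `register_conn` also handles presence, the closed-watcher and the app `accept_bi` loop.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for the real QUIC `Connection` handle.
struct Connection;

#[allow(dead_code)]
struct ConnEntry {
    conn: Connection,
    remote_id_hex: String,
    proven_subnets: HashSet<String>,
}

/// Hypothetical connection table keyed by a numeric conn id.
#[derive(Default)]
struct ConnTable {
    next_id: u64,
    conns: HashMap<u64, ConnEntry>,
}

impl ConnTable {
    /// Register only after a successful seed-proof. `None` means "drop":
    /// the conn is never stored and closes when it goes out of scope.
    fn register_if_proven(
        &mut self,
        conn: Connection,
        remote_id_hex: String,
        proof: Option<HashSet<String>>,
    ) -> Option<u64> {
        let proven_subnets = proof?;
        let id = self.next_id;
        self.next_id += 1;
        self.conns.insert(id, ConnEntry { conn, remote_id_hex, proven_subnets });
        Some(id)
    }

    /// The seam D5 would read; an unknown conn id yields the empty set.
    fn conn_proven_subnets(&self, conn_id: u64) -> HashSet<String> {
        self.conns
            .get(&conn_id)
            .map(|entry| entry.proven_subnets.clone())
            .unwrap_or_default()
    }
}
```

Returning an empty set for an unknown id keeps the accessor fail-closed: a gate that asks about a connection that was never registered sees no proven subnets.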

### Where the subnet material comes from

`NetHost` has no `SubnetStore` access today. For D2, `run_seedproof` calls `SubnetStore::load()` at connect time and passes the loaded `Vec<(name, seed_bytes, epoch)>` into the exchange. Loading per connect stays current across a future rotation and is cheap because connects are rare, so D2 needs no new long-lived handle and has no snapshot staleness. (D7's confidential re-seed push may want a live handle; out of scope here.) `nethost` already sits in `spt-daemon` and may depend on `spt-store`.

## Keep-alive (open item #1 — verify, then set)

Connections must stay warm so re-proof is **restart/partition/rotation-only**, never a per-message tax. `endpoint.rs` binds via `Endpoint::builder(presets::N0)` with **no explicit transport config**, so the conn lives under iroh/quinn defaults.

**Execution step (verify-then-set, don't guess):**
1. Determine iroh 0.98.2 `presets::N0` defaults for `max_idle_timeout` and `keep_alive_interval` (quinn's `keep_alive_interval` default is `None` = no keep-alive → conns idle out). Confirm the builder exposes a transport-config seam.
2. Set an explicit transport config with **`keep_alive_interval` < `max_idle_timeout`** so an otherwise-idle member conn is held open by PING frames. Pick conservative values: idle on the order of tens of seconds, keep-alive around idle/3 so that several PINGs fit inside one idle window and a single lost PING does not time the conn out. "Multi-week" means *survives indefinite idleness while both ends are up*, not a single conn object literally pinned for weeks; a real restart/partition drops it and the next connect re-proves.
3. Apply on the production scope; the change is transport-only and orthogonal to `BindScope` (loopback hermetic tests inherit it harmlessly).

Tag the transport-config change `// [impl->REQ-MESH-1]` (keep-warm is part of the REQ-MESH-1 "kept warm via QUIC keep-alive" clause).
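One way to keep the `keep_alive_interval` < `max_idle_timeout` invariant checkable at the config-construction seam is to derive both values in one place. The sketch below is an assumption-laden illustration: `KeepWarm` and `from_idle` are hypothetical names, and the idle/3 ratio is the example ratio from step 2, not a verified iroh default.

```rust
use std::time::Duration;

/// Hypothetical pair of transport timings chosen at the config-construction seam.
#[derive(Debug, Clone, Copy, PartialEq)]
struct KeepWarm {
    max_idle_timeout: Duration,
    keep_alive_interval: Duration,
}

impl KeepWarm {
    /// Derive keep-alive as idle/3 so several PINGs fit inside one idle window.
    /// Returns `None` for a degenerate idle timeout whose third rounds to zero.
    fn from_idle(max_idle_timeout: Duration) -> Option<Self> {
        let keep_alive_interval = max_idle_timeout / 3;
        if keep_alive_interval.is_zero() || keep_alive_interval >= max_idle_timeout {
            return None;
        }
        Some(Self { max_idle_timeout, keep_alive_interval })
    }
}
```

In quinn these two values would typically be applied through `TransportConfig::keep_alive_interval` and `TransportConfig::max_idle_timeout`; whether and how iroh 0.98.2 exposes that seam is what step 1 has to confirm.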

## Security notes (the invariants this phase must not break)

- **Channel-bound + mutual** — inherited from the D1 transcript (both handshake pubkeys + both nonces in the MAC; both sides prove). D2 must feed the *real* handshake pubkeys (`conn.remote_id()`, local identity) and *fresh* per-connect nonces — not fixed test arrays.
- **Drop-closed, not gate-open** — a failed/empty proof drops the connection; it does **not** fall through to an unauthenticated usable conn. Until D5 the `is_trusted` gates also still run, so even a (hypothetical) proven non-member conn carries no authority.
- **No-regression** — a legit directly-paired peer shares the subnet seed at the same epoch (epoch only bumps at D7's rotation; pre-D7 it's stable), so it always proves → never dropped. Note the one edge: a pre-existing epoch skew would now drop a conn `is_trusted` would have allowed; impossible before D7, acceptable to surface then.
- **Name disclosure** — the acceptor replies with the **intersection only**, so an unauthenticated dialer learns only which of the names *it already supplied* are shared (a weak name-enumeration oracle, not a seed/membership oracle). Acceptable under the same-user threat model (ADR-0017); the seed-proof, not name secrecy, is the boundary.

## Tests

**unit (`spt-net`, `mesh::seedproof`)** — extend D1's codec tests for the new frame, `// [unit->REQ-MESH-1]`:
- `Hello` round-trips (nonce + multi-name list incl. empty list and unicode name); malformed (bad count, short name, non-UTF-8, unknown tag) ⇒ `None`.

**int (`spt-daemon`, `nethost` tests)** — the connect-level integration, `// [int->REQ-MESH-1]`:
1. **both members → conn + proof** — two in-process `NetHost`s sharing a subnet (same seed+epoch) dial; assert the conn registers on both sides AND `conn_proven_subnets` contains the shared subnet on both.
2. **multi-subnet set** — two hosts sharing {`home`,`work`} but not `solo`; assert `proven_subnets == {home, work}` on both (the set, not a bool; `solo` absent).
3. **non-prover dropped** — a host with the wrong seed (or no shared subnet) dials a member; assert the connection does **not** register (dropped) on the acceptor and the dialer sees a dropped/closed conn. (Models "a non-prover is dropped.")
4. **keep-alive** *(light)* — assert the transport config carries a non-`None` `keep_alive_interval` strictly less than `max_idle_timeout` (config-level assertion; a real multi-week idle isn't testable in CI). If iroh hides these post-build, assert at the config-construction seam instead.

Hermetic throughout (`BindScope::Loopback`, `RelayPolicy::Disabled`, `LocalDiscovery::Off`) — same shape as the existing `two_hosts_dial_over_loopback`.

## Done when

- Seed-proof runs on every trusted-ALPN dial + accept before the conn is usable; `ConnEntry.proven_subnets` populated; failure/empty drops the conn.
- `cargo test -p spt-net mesh::seedproof` + `cargo test -p spt-daemon nethost` green; the four int cases pass.
- `cargo build -p spt-daemon` + `-p spt-net` (and `spt-net --no-default-features`) clean; mesh module stays `net`-gated.
- `keep_alive_interval` < `max_idle_timeout` confirmed set on the production endpoint.
- The existing `two_hosts_dial_over_loopback` still passes (now with a shared subnet in the hermetic config so the proof succeeds) — **no-regression** proven.
- `traceable-reqs check` clean with **REQ-MESH-1 `required_stages = ["impl","unit","int"]`** (D2 activates `int`; enriches `impl`). Evidence tags on the connect wiring, the `ConnEntry` set, the transport config, and the int tests.
- The `is_trusted` gates are **untouched** (D5 owns the swap) — grep confirms the five call sites unchanged.
- Commit: `feat(mesh): seed-proof at connect + ConnEntry membership set (REQ-MESH-1) — Mesh-D2`.

## Carry-forward (not D2)

- **Pre-D5 open item #3** — confirm the self-update path is node-local (ADR-0016), not riding member conns, before the cutover roll. Not D2.
- **Open item #2**: is the REQ-SUBNET-5 probe loop serial? Deferred to D8.
- D3 consumes nothing from D2's wire beyond the established conns; the roster (`Frame::Seed` growth) is independent.
