# Spike #2 — Iroh smoke test

> Date: 2026-05-31. Status: **PASS — incl. native cross-NAT hole-punch between two separate networks.** Networking premise (ADR-0002) strongly validated; the M4 real-proof gate was demonstrated early.
> Throwaway code: `../spt-spikes/spike-02-iroh-smoke` (outside the spt-core repo).
> Drives ROADMAP Stage B Spike #2 + validates ADR-0002 (bake Iroh into the core).

## Question

ADR-0002 makes WAN networking first-class day-one on Iroh. Before committing the whole `spt-net` crate to it: **does Iroh's current API actually give us a clean QUIC connect + bidirectional message exchange, and what footprint does it add?**

## Method

iroh `0.98.2`, tokio. One process, two endpoints (`server`, `client`), each built with the N0 preset + a custom ALPN (`spt-spike/iroh-smoke/0`). Server accepts a connection, accepts a bidi stream, echoes the request. Client connects by `EndpointAddr`, opens a bidi stream, sends `ping`, reads the response. Assert `echo:ping`.

API shape that worked (0.98 — churns every release, pin this):
- `Endpoint::builder(iroh::endpoint::presets::N0).alpns(vec![ALPN.to_vec()]).bind().await`
- own identity: `ep.id()` → `EndpointId`, `ep.addr()` → `EndpointAddr` (direct values, not watchers)
- accept: `incoming.accept()?.await?` → `Connection`; `conn.accept_bi().await?` → `(SendStream, RecvStream)` **directly** (not `Option`)
- connect: `ep.connect(addr, ALPN).await?`; `conn.open_bi().await?`
- streams: `send.write_all(&b).await?; send.finish()?;` · `recv.read_to_end(max).await?`
- `Endpoint::conn_type()` does **not** exist in 0.98 — path classification needs another route.

## Result

```
response       : "echo:ping"
connect time   : 14 ms
total setup    : 99 ms   (two endpoint binds + N0 discovery init + handshake)
binary size    : debug 26.68 MB | release 13.57 MB (optimized, unstripped)
round-trip echo over Iroh QUIC : PASS  →  OVERALL PASS ✅
```

The API is workable and terse; handshake on a single host is sub-15ms. Footprint: **13.57 MB release** vs the research brief's ~8MB estimate — the gap is LTO/`strip`/`opt-level="z"`/`panic="abort"`, none applied here. ~8MB is plausible for the shipped binary with size profile tuning; budget for it.

### Cross-machine run (2026-05-31) — PASS

Ran `server` on a second host (`enlyzeam`, Win10) and `client` on this host (`hfenduleam`, Win11), both on the same Tailscale tailnet, over the serialized-`EndpointAddr` ticket (JSON; iroh has no Display/FromStr for it). Binary pushed via `scp` over OpenSSH+key.

```
target id 02f316cd...  connect time 14 ms   response "echo:ping"   PASS ✅
server-advertised addrs: relay(n0) + 76.80.65.5 (WAN) + 100.125.242.88 (Tailscale) + 192.168.1.12 (LAN) + IPv6
```

Proves a **real two-host** QUIC connect + ALPN + bidi exchange across machines and that the ticket survives serialize→wire→deserialize. (The two hosts each sit behind their own home NAT — the shared `192.168.1.0/24` is coincidental default-router addressing, not one LAN.)

### Path attribution (instrumented via `Connection::paths()`)

Added a selected-path readout (`is_relay`/`is_ip`/`remote_addr`/`rtt`) and a `--relay-only` flag. Three configurations, same server, observed which path iroh actually selected:

| Config | Initial connect | Selected path | RTT | What it shows |
|---|---|---|---|---|
| Both Tailscale up, auto | 13 ms | `direct-ip 100.125.242.88` (Tailscale) | 11.9 ms | iroh rides the Tailscale overlay as a "direct" addr — Tailscale did the traversal |
| Both up, `--relay-only` | 185 ms | upgraded to `direct-ip 100.x` | 12.5 ms | relay leg works **and** iroh auto-upgrades relay→direct (N0 discovery republishes addrs, so client-side filtering can't suppress a reachable direct path) |
| **Client Tailscale DOWN, auto** | **164 ms** | **`direct-ip 76.80.65.5` (public WAN)** | **12.8 ms** | **iroh's OWN hole-punch across two real NATs — no VPN.** relay path stayed available as fallback (94 ms) |

The third row is the headline: with no overlay, iroh started on the n0 relay, then **hole-punched a direct UDP path to the peer's public IP** and promoted it. Native peer-to-peer NAT traversal works, and relay fallback is graceful. Direct beats relay ~7× on latency (12.8 ms vs 94 ms).

## What this does NOT prove (the real risks, deferred to their milestones)

1. ~~Separate-NAT traversal + direct-vs-relay~~ — **DONE** (table above): native hole-punch to a public IP + relay fallback, across two home NATs, no VPN.
2. **mDNS LAN discovery** — not exercised. The N0 preset wires n0-relay/DNS discovery; local mDNS is a separate path to smoke between two nodes on the *same* LAN.
3. **Live QUIC stream survival across a brain restart** (Spike #1 open-gap #1, codex FATAL #2) — needs the ADR-0004 broker owning the Iroh endpoint while the brain cycles. Integration spike, not smoke. **Now the single highest remaining networking risk.**
4. **Sustained throughput / many-stream / long-lived** behavior — only a 4-byte echo was sent.

## Verdict

Iroh `0.98` is a sound foundation for `spt-net`: clean connect/echo, fast handshake, acceptable footprint, and — proven here — **native NAT traversal between two separate networks with graceful relay fallback**, which was the load-bearing unknown behind ADR-0002. ADR-0002 stands, strongly. The remaining networking work is integration, not feasibility: broker-owned Iroh stream survival across a brain restart (M3), and mDNS LAN discovery (M4).
