# Papers Index (13 files: 6 papers in HTML + PDF, plus 1 HTML-only reprint)

Source: `docs/research/inspiration/papers/`. HTML files for arxiv-* are abstract pages only; PDFs hold full text. `bford-p2pnat.html` is the full HTML reprint of the Ford/Srisuresh/Kegel 2005 paper.

## Quick lookup

| Question | Paper |
|---|---|
| What's the canonical UDP/TCP hole-punch success rate (2005)? | arxiv-cs-0603074 / bford-p2pnat |
| How does QUIC hole punching compare to TCP hole punching? | arxiv-2408-01791 |
| What's the modern (2025) measured DCUtR hole-punch success rate? | arxiv-2510-27500 |
| Is UDP really better than TCP for NAT traversal? | arxiv-2510-27500 (refutes this; ~70% for both) |
| Why do Kademlia DHTs degrade at scale / with NAT'd peers? | arxiv-2402-09993 |
| How to securely sample peers in adversarial / Sybil P2P networks? | arxiv-2402-16201 (Honeybee) |
| How to reduce propagation delay in random-graph P2P overlays? | arxiv-2406-16661 (Close-Weaver) |
| How does NAT type (full-cone vs symmetric vs port-restricted) affect HP? | arxiv-cs-0603074 / bford-p2pnat |
| Should SPT operate its own relay infrastructure? | arxiv-2510-27500 (centralization risk) |
| Can hole-punched connections survive NAT timeout / network change? | arxiv-2408-01791 (QUIC migration saves 2-3 RTTs) |

## Per-paper entries

### arxiv-cs-0603074 — Peer-to-Peer Communication Across Network Address Translators
- **Authors:** Bryan Ford (MIT), Pyda Srisuresh (Caymas Systems), Dan Kegel
- **Venue/Year:** USENIX Annual Technical Conference 2005 (arXiv posted 2006-03-18)
- **Files:** `arxiv-cs-0603074.html` (arXiv abstract page), `arxiv-cs-0603074.pdf` (full paper), `bford-p2pnat.html` (full HTML reprint of the same paper from Ford's site)
- **Abstract (paraphrased):** Documents and measures UDP and TCP hole punching as a NAT-traversal technique, the simplest reliable approach for direct P2P across NAT. Shows that TCP hole punching (often dismissed) is achievable using socket/port-reuse tricks alongside an out-of-band rendezvous server. Empirically tested across a wide range of consumer NAT devices to quantify support.
- **Key results:** 82% of tested NATs support UDP hole punching; 64% support TCP hole punching. Defines and classifies NAT behaviour (cone, restricted-cone, port-restricted, symmetric) and isolates which behaviour breaks HP (symmetric NAT). UDP idle timeouts force keepalives every ~60 s.
- **Relevance to SPT:** Foundational. Justifies SPT's planned hole-punch path (iroh / DCUtR / quinn). The 82%/64% figures are per-NAT-device support rates, so they set a rough expectation rather than a per-connection guarantee: on the order of 18% of WAN pairs need TURN-style relay fallback, which is why SPT's design keeps an encrypted-relay backstop (n0.computer relays under iroh, or self-hosted). Also informs whether to expose TCP fallback for restricted environments.
- **Worth reading:** Section 2 (NAT terminology + relay/connection-reversal alternatives), Section 3.4 "Peers Behind Different NATs" (the core HP algorithm), Section 4 (TCP-specific tricks: port reuse, simultaneous-open), Section 6 "Evaluation" (test methodology + per-NAT table) — see bford-p2pnat.html sections 2-6 or PDF pages ~3-14.
- **Tags:** nat-traversal, hole-punching, benchmark
- **Brief footnote refs:** [^35][^36]
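
The exchange described in Section 3.4 can be sketched on loopback. No NAT is involved here, so this only illustrates the "both peers send first, then listen" pattern and not real traversal. The `punch` and `demo` names are hypothetical, and reading the endpoints directly from the sockets stands in for the rendezvous server; none of this is code from the paper.

```python
import socket
import threading


def punch(sock, peer_addr, attempts=5, timeout=0.2):
    """Send probes to peer_addr until a datagram arrives from it."""
    sock.settimeout(timeout)
    for _ in range(attempts):
        # Behind a real NAT, this outbound packet is what creates the mapping.
        sock.sendto(b"punch", peer_addr)
        try:
            _, addr = sock.recvfrom(64)
        except socket.timeout:
            continue  # nothing received yet; send another probe
        if addr == peer_addr:
            return True
    return False


def demo():
    a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    a.bind(("127.0.0.1", 0))
    b.bind(("127.0.0.1", 0))
    # Assumption: a rendezvous server would normally tell each peer the
    # other's public endpoint; on loopback we read them directly.
    a_ep, b_ep = a.getsockname(), b.getsockname()
    results = {}

    def run(name, sock, peer):
        results[name] = punch(sock, peer)

    threads = [
        threading.Thread(target=run, args=("a", a, b_ep)),
        threading.Thread(target=run, args=("b", b, a_ep)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    a.close()
    b.close()
    return results
```

Calling `demo()` runs both peers concurrently and reports whether each side heard from the other. With real NATs, the first probe in one direction may be dropped, which is why `punch` retries.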

### arxiv-2408-01791 — Implementing NAT Hole Punching with QUIC
- **Authors:** Jinyu Liang, Wei Xu, Taotao Wang, Qing Yang, Shengli Zhang
- **Venue/Year:** VTC2024-Fall (IEEE Vehicular Technology Conference), arXiv 2024-08-03
- **Files:** `arxiv-2408-01791.html` (abstract), `arxiv-2408-01791.pdf` (full)
- **Abstract (paraphrased):** Designs and implements a QUIC-based hole-punching scheme to address the latency and security drawbacks of TCP-based NAT traversal. Benchmarks punching time of QUIC vs TCP and shows QUIC is faster, especially on lossy / weak networks. Evaluates two recovery strategies after a punched connection breaks (NAT timeout, network handoff): QUIC connection migration vs re-punching.
- **Key results:** QUIC HP reduces punching time vs TCP, with the gap widening under packet loss. QUIC connection migration restores broken sessions, saving 2 RTTs vs QUIC re-punch and 3 RTTs vs TCP re-punch. Migration also removes most of the CPU/key-exchange cost of re-establishment.
- **Relevance to SPT:** Directly supports SPT's choice of QUIC (quinn) over raw TCP/UDP. QUIC's 0-RTT + connection-migration features mean SPT sessions survive laptop sleep/Wi-Fi-to-LTE handoffs without re-doing the HP ceremony — important for the "agent perch goes mobile" case. Also evidence that QUIC HP is production-viable in iroh.
- **Worth reading:** Section III (proposed QUIC HP design + sequence diagram), Section IV (experimental setup with simulated NAT + impaired links), Section V (results tables comparing TCP vs QUIC punching times and recovery RTTs). PDF pages ~2-5.
- **Tags:** nat-traversal, hole-punching, quic, benchmark
- **Brief footnote refs:** [^43]
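
To put the RTT savings in wall-clock terms, the arithmetic can be sketched as below. The 2 and 3 RTT figures are the ones quoted in this entry; the function name and the example RTT are illustrative assumptions.

```python
# RTTs saved by QUIC connection migration, as summarized in this entry.
RTTS_SAVED = {"quic_repunch": 2, "tcp_repunch": 3}


def migration_saving_ms(rtt_ms, baseline):
    """Milliseconds saved by migrating instead of re-punching via `baseline`."""
    return RTTS_SAVED[baseline] * rtt_ms
```

On a hypothetical 80 ms path, migration would save 160 ms relative to a QUIC re-punch and 240 ms relative to a TCP re-punch.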

### arxiv-2510-27500 — Challenging Tribal Knowledge: Large-Scale Measurement Campaign on Decentralized NAT Traversal
- **Authors:** Dennis Trautwein, Cornelius Ihle, Moritz Schubotz, Bela Gipp
- **Venue/Year:** arXiv 2025-10-31 (preprint; cited as "2025" measurement study)
- **Files:** `arxiv-2510-27500.html` (abstract), `arxiv-2510-27500.pdf` (full)
- **Abstract (paraphrased):** First large-scale longitudinal measurement of DCUtR (Direct Connection Upgrade through Relay) on the production libp2p/IPFS network — 4.4M traversal attempts across 85k networks in 167 countries. Establishes a contemporary baseline hole-punch success rate and tests whether UDP truly outperforms TCP for traversal. Argues NAT-traversal infrastructure (relays, signalling) is where decentralized systems quietly re-introduce centralization.
- **Key results:** Modern hole-punch success rate = 70% ± 7.1% (lower than Ford 2005's 82% because today's NATs include more CGNAT/symmetric). TCP and QUIC traversal success rates are statistically indistinguishable (~70%) — refutes long-held assumption of UDP superiority. 97.6% of successful connections succeed on first attempt. Success is independent of relay characteristics → DCUtR works in fully permissionless relay graphs.
- **Relevance to SPT:** Strongest evidence for SPT's architecture choices: (a) plan for ~30% of WAN pairs needing relay fallback, not 18%; (b) don't over-invest in UDP-only paths — TCP hole punching is equally viable, useful for QUIC-blocked networks; (c) relay diversity matters more than relay quality, supporting the "users self-host their own relays" plan from the Brief; (d) sets a realistic SLA target. Also a warning: SPT must not become the next centralization point via default relays.
- **Worth reading:** Section 3 (methodology — how DCUtR is instrumented in libp2p), Section 4 (results: success-rate breakdown by NAT type, transport, country), Section 5 (centralization analysis of relay operators), Section 6 (protocol enhancement roadmap). PDF pages ~3-12.
- **Tags:** nat-traversal, hole-punching, dht-perf, benchmark, privacy
- **Brief footnote refs:** [^82]
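
The relay-planning numbers in point (a) follow from simple arithmetic, sketched here. It assumes every failed hole punch falls back to a relay and that the paper's aggregate success rate carries over to SPT's peer population; both are assumptions, and the function names are hypothetical.

```python
def relay_share(success_rate):
    """Fraction of pairs that need a relay if every failed punch falls back."""
    return 1.0 - success_rate


def relay_share_band(success_rate, margin):
    """(low, high) relay share when success is success_rate +/- margin."""
    return (1.0 - (success_rate + margin), 1.0 - (success_rate - margin))


def expected_relayed(pairs, success_rate):
    """Expected number of relayed pairs out of `pairs` connection attempts."""
    return pairs * relay_share(success_rate)
```

Under these assumptions, 1,000 WAN pairs yield roughly 300 relayed pairs at 70% success versus roughly 180 at 82%, and the ± 7.1% margin widens the relay share to roughly 23% to 37%.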

### arxiv-2402-09993 — Scalability Limitations of Kademlia DHTs When Enabling Data Availability Sampling in Ethereum
- **Authors:** Mikel Cortes-Goicoechea, Csaba Kiraly, Dmitriy Ryajov, Jose Luis Muñoz-Tapia, Leonardo Bautista-Gomez
- **Venue/Year:** arXiv 2024-02-15 (preprint, BCRG / Codex / UPC)
- **Files:** `arxiv-2402-09993.html` (abstract), `arxiv-2402-09993.pdf` (full)
- **Abstract (paraphrased):** Investigates whether Kademlia DHTs can support Ethereum's Data Availability Sampling (DAS) at scale. Builds a DAS-DHT simulator and cross-validates against measurements on the live IPFS DHT. Identifies which DAS properties existing Kademlia implementations can deliver and which they fundamentally cannot, then discusses alternative overlays.
- **Key results:** Kademlia routing-table churn balloons when many peers are undialable (NAT'd / transient), degrading lookup latency and recall. Provider-record TTLs and republish intervals are incompatible with DAS's sub-slot sampling deadlines. IPFS-style DHT cannot meet the sample-recall budget at Ethereum's target network size; alternative structured overlays or pull-based gossip are needed.
- **Relevance to SPT:** Cautionary. If SPT ever uses a public Kademlia (Mainline / IPFS) for peer rendezvous (per Brief Rank 5), expect significant churn-induced lookup failures from NAT'd peers and TTL-bounded record loss (~30 min republish, as the Brief notes). Tells SPT to keep DHT use to "yellow pages" (announce + lookup) rather than relying on it for low-latency message routing.
- **Worth reading:** Section 2 (Kademlia + DAS background), Section 3 (simulator design), Section 4 (results: lookup recall vs network size, churn impact, NAT impact), Section 5 (limitations & alternatives). PDF pages ~2-8.
- **Tags:** dht, dht-perf, overlay, benchmark
- **Brief footnote refs:** [^79]
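
The "yellow pages" usage and its TTL-bounded record loss can be sketched with an in-memory stand-in for the DHT. `AnnounceTable` is a hypothetical class, not an API of any DHT library, and the 1800-second TTL in the usage note only echoes the ~30 min figure above for illustration.

```python
class AnnounceTable:
    """In-memory stand-in for a DHT used only for announce + lookup."""

    def __init__(self, ttl_s):
        self.ttl_s = ttl_s
        self._records = {}  # peer_id -> (address, expiry time in seconds)

    def announce(self, peer_id, address, now):
        """Store or refresh a record; it expires ttl_s seconds after `now`."""
        self._records[peer_id] = (address, now + self.ttl_s)

    def lookup(self, peer_id, now):
        """Return the address if the record is still live, else None."""
        entry = self._records.get(peer_id)
        if entry is None:
            return None
        address, expires_at = entry
        if now >= expires_at:
            # Expired: the peer did not republish in time.
            del self._records[peer_id]
            return None
        return address
```

With `AnnounceTable(ttl_s=1800)`, a peer that announces at `now=0` and never republishes is found at `now=1799` but not at `now=1800`. A peer that sleeps through a republish window therefore becomes undiscoverable until it announces again.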

### arxiv-2402-16201 — Honeybee: Byzantine Tolerant Decentralized Peer Sampling with Verifiable Random Walks
- **Authors:** Yunqi Zhang, Shaileshh Bojja Venkatakrishnan (Ohio State)
- **Venue/Year:** arXiv 2024-02-25 (preprint)
- **Files:** `arxiv-2402-16201.html` (abstract), `arxiv-2402-16201.pdf` (full)
- **Abstract (paraphrased):** Defines decentralized uniform random peer sampling as the foundational primitive for sharded / DAS / L2 blockchains, and shows that today's mechanisms (Kademlia address lookups, GossipSub neighbour shares) are insecure under Sybil attack. Proposes Honeybee: verifiable random walks plus table consistency checks that resist a majority adversary. Evaluates sampling quality against Kademlia and GossipSub baselines.
- **Key results:** Honeybee maintains near-uniform sampling even when ≥50% of nodes are Byzantine; baselines collapse. Quality improvement over GossipSub / Kademlia ranges from 4% to 63% depending on adversary strength. Walk verification costs are modest enough for both full and light nodes.
- **Relevance to SPT:** Sets the bar for any future "find a random peer" capability in SPT (e.g., relay selection, gossip neighbour selection). If SPT inherits libp2p's GossipSub or Kademlia for discovery, it inherits their Sybil weakness — Honeybee-style verifiable walks could harden the relay-selection step that arxiv-2510-27500 flagged as the centralization risk.
- **Worth reading:** Section 3 (threat model + verifiable random walk construction), Section 4 (Honeybee algorithm + table consistency check), Section 5 (security analysis), Section 6 (experimental evaluation vs Kademlia/GossipSub). PDF pages ~4-14.
- **Tags:** dht, overlay, privacy, agent-protocol, benchmark
- **Brief footnote refs:** [^80]
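
For orientation, a plain random-walk sampler looks like the sketch below. This is not Honeybee: it omits the walk verification and table consistency checks that give the paper its Byzantine tolerance, so a malicious hop could steer the walk freely. The function name and the adjacency-dict representation are assumptions for illustration.

```python
import random


def random_walk_sample(neighbours, start, steps, rng):
    """Return the node reached after `steps` uniform random hops from `start`.

    `neighbours` maps each node to a non-empty list of adjacent nodes.
    """
    node = start
    for _ in range(steps):
        node = rng.choice(neighbours[node])
    return node
```

A caller would pass something like `random.Random()` as `rng` and a walk length long enough for the walk to mix. On graphs where nodes have unequal degree, a simple walk like this favours high-degree nodes, so it only approximates uniform sampling on regular graphs.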

### arxiv-2406-16661 — Towards Communication-Efficient Peer-to-Peer Networks
- **Authors:** Khalid Hourani, William K. Moses Jr., Gopal Pandurangan
- **Venue/Year:** arXiv 2024-06-24 (preprint)
- **Files:** `arxiv-2406-16661.html` (abstract), `arxiv-2406-16661.pdf` (full)
- **Abstract (paraphrased):** Random-graph P2P overlays give nice theoretical properties (expansion, low diameter, robustness) but ignore the underlying Internet topology, causing inflated propagation delay. Presents Close-Weaver, a decentralized protocol that rewires a random graph embedded in Euclidean space into a topology that also respects the metric. Provides routing and broadcast protocols with near-optimal performance bounds against the underlying space.
- **Key results:** Close-Weaver provably transforms random topology → metric-aware topology in O(polylog n) rounds while preserving expansion. Resulting point-to-point routing and broadcast latency scale with the metric (geographic) diameter rather than the random-graph diameter. The specific bounds are stated in the paper's main theorems.
- **Relevance to SPT:** Most relevant if SPT moves from direct-dial to a gossip mesh (Brief mentions GossipSub via libp2p). Argues that latency-aware neighbour selection matters more than uniform-random neighbours for messaging workloads — directly applicable to which relays SPT pins for "fastest path" delivery. Less critical for the v1 direct-dial design.
- **Worth reading:** Section 1 (problem statement + Close-Weaver overview), Section 3 (Close-Weaver algorithm), Section 4 (routing & broadcast protocols), Section 5 (analysis/theorems). PDF pages ~2-10.
- **Tags:** overlay, mesh, dht-perf
- **Brief footnote refs:** [^81]
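
The idea of latency-aware neighbour selection can be illustrated with a simple heuristic that mixes nearby and random links. This is not the Close-Weaver protocol; `pick_neighbours` is a hypothetical helper, and the coordinates are assumed to come from some latency embedding that SPT would have to supply.

```python
import math
import random


def pick_neighbours(coords, node, k_close, k_random, rng):
    """Pick the k_close nearest nodes plus up to k_random random others.

    `coords` maps node id -> (x, y) position in the assumed metric space.
    Returns (close, far) as two lists of node ids.
    """
    others = [n for n in coords if n != node]
    # Sort by Euclidean distance from `node`.
    others.sort(key=lambda n: math.dist(coords[node], coords[n]))
    close = others[:k_close]
    rest = others[k_close:]
    # Random long-range links keep some of the random graph's connectivity.
    far = rng.sample(rest, min(k_random, len(rest)))
    return close, far
```

Keeping a few random links alongside the nearest ones is a common way to retain low diameter while shortening typical hops; the paper's own construction and guarantees are in its Sections 3 to 5.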

### bford-p2pnat (HTML-only sibling of arxiv-cs-0603074)
- **Authors:** Bryan Ford, Pyda Srisuresh, Dan Kegel
- **Venue/Year:** USENIX ATC 2005 — HTML reprint hosted at bford.info
- **Files:** `bford-p2pnat.html` (only). Same paper as arxiv-cs-0603074; see that entry for content. Prefer this HTML for quick section browsing because the full body text is present (the arXiv `.html` is abstract-only).
- **Section map (from H2/H3 in the HTML):**
  - 2.1 NAT Terminology · 2.2 Relaying · 2.3 Connection Reversal
  - 3.1 Rendezvous Server · 3.2 Establishing Sessions · 3.3 Peers Behind a Common NAT · 3.4 Peers Behind Different NATs · 3.6 UDP Idle Timeouts
  - 4.1 Sockets and TCP Port Reuse · 4.3 Application-Observed Behaviour
  - 5.3 Leaving Payloads Alone
  - 6.1 Test Method (6.1.1 UDP, 6.1.2 TCP) · 6.2 Test Results · 6.3 Testing Limitations · 6.4 Corroboration
- **Tags:** nat-traversal, hole-punching, benchmark
- **Brief footnote refs:** [^35][^36]

## Themes

- **NAT traversal evidence trail (chronological):** arxiv-cs-0603074 / bford-p2pnat (2005, 82% UDP / 64% TCP, lab) → arxiv-2408-01791 (2024, QUIC HP faster + migration restores sessions) → arxiv-2510-27500 (2025, 70% in production, TCP ≈ QUIC, relay centralization risk). Together these set SPT's realistic SLA: plan for ~30% of WAN pairs requiring relay; QUIC is the right default but TCP is a valid fallback for QUIC-blocked networks.
- **DHT / overlay design constraints:** arxiv-2402-09993 (Kademlia churn + NAT'd peers + TTL) shows why a public DHT can only be a yellow-pages layer for SPT. arxiv-2402-16201 (Honeybee) shows the Sybil-resistance bar for any random-peer-sampling step. arxiv-2406-16661 (Close-Weaver) shows why latency-aware neighbour selection matters if SPT ever runs GossipSub.
- **Centralization watchpoints:** arxiv-2510-27500's headline finding is that NAT-traversal infrastructure (relays + signalling) is the silent point of re-centralization in nominally decentralized P2P. This directly motivates SPT's "users can self-host relays" requirement (Brief Rank 1, iroh) and the multi-operator architecture goal.
- **What SPT v1 needs vs v2:** Direct-dial + iroh relay fallback (v1) is fully covered by the NAT-traversal trail. The DHT / overlay / sampling papers (2402-09993, 2402-16201, 2406-16661) become load-bearing only if SPT adds gossip-mesh or DHT-based rendezvous (Brief Rank 2 libp2p path or Rank 5 Mainline DHT path).
