# ADR 0004: Room Hot-Reload — fs.watch + Ed25519 Signed Manifests + Atomic Writes

[doc->REQ-SRV-13]

**Date:** 2026-05-07
**Phase:** 04 (during execution of Wave 4 / Plan 04-12b)

## Status

**Accepted.** Locked at start of Phase 4 execution (D-09..D-13). Re-evaluation
gates are listed in **Forcing Functions for Re-Open** below; absent any of
those triggers, the contract here is binding through Phase 7 PAR-03 (full room
set materialization, which reuses this exact contract with no protocol changes).

Supersedes: nothing. Superseded by: nothing.

## Context

Server-side rooms must support edit-and-reload at runtime without restarting
the server. Per
[`.planning/phases/04-server-rebuild-mvp/04-CONTEXT.md`](../../.planning/phases/04-server-rebuild-mvp/04-CONTEXT.md)
§D-09..D-13, the layout subsystem implements:

- Content-addressed JSON layout files at
  `apps/server/rooms/<room_id>/<rev>.json`.
- Ed25519-signed companion manifests `<rev>.sig`, where the signed payload is
  `Buffer.concat([room_id, rev, sha256(layout_bytes)])` so the signature
  commits to all three identity components — a layout swap across rev numbers
  / room_ids cannot reuse a sig from a different file.
- `fs.watch` recursive on `apps/server/rooms/`, debounced 200ms per
  `room_id`, to coalesce the double-fire window of an atomic-rename pair on
  Windows / Linux / macOS.
- Per-room player state preserved across mid-session layout swaps (no client
  drop on hot-reload).
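
As a rough illustration of why the signed payload binds all three identity components, the following sketch signs one layout and then tries to reuse the signature elsewhere. It assumes `room_id` and `rev` are UTF-8 encoded (the contract above does not state the byte encoding), and `manifestPayload`, `demo-room`, and the placeholder layout bytes are hypothetical names chosen for this example, not taken from `room-key.ts`.

```typescript
import { createHash, generateKeyPairSync, sign, verify } from "node:crypto";

// Builds the signed payload: room_id || rev || sha256(layout_bytes).
// Assumption: room_id and rev are encoded as UTF-8; the ADR does not say.
function manifestPayload(roomId: string, rev: string, layoutBytes: Uint8Array): Buffer {
  const digest = createHash("sha256").update(layoutBytes).digest();
  return Buffer.concat([Buffer.from(roomId, "utf8"), Buffer.from(rev, "utf8"), digest]);
}

// Throwaway key pair for illustration only; the real private key lives under ./keys/.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// Placeholder layout bytes; real layouts follow layoutSchema.
const layoutBytes = Buffer.from('{"placeholder":true}', "utf8");
const sig = sign(null, manifestPayload("mvp-lobby", "000", layoutBytes), privateKey);

// The signature verifies only for the exact (room_id, rev, layout) triple it was made for.
const sameFile = verify(null, manifestPayload("mvp-lobby", "000", layoutBytes), publicKey, sig);
const otherRev = verify(null, manifestPayload("mvp-lobby", "001", layoutBytes), publicKey, sig);
const otherRoom = verify(null, manifestPayload("demo-room", "000", layoutBytes), publicKey, sig);
const editedLayout = verify(null, manifestPayload("mvp-lobby", "000", Buffer.from("{}")), publicKey, sig);

console.log({ sameFile, otherRev, otherRoom, editedLayout });
```

Only `sameFile` comes back `true`; changing the rev, the room, or the layout bytes changes the payload, so the copied signature no longer verifies.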

The runtime half landed in
[Plan 04-12](../../.planning/phases/04-server-rebuild-mvp/04-12-SUMMARY.md)
(Wave 3) — `RoomRegistry` + `room-key.ts` + the canonical `mvp-lobby/000`
seed + 10 tests across unit + integration.

The build-time half (this ADR's pair) lands in Plan 04-12b (Wave 4): the
`tools/room-converter` standalone CLI for authoring + signing layouts, the
`lint:room-layout` structural drift guard, and this ADR.

## Decision

The wire frame `s2c.room_layout` is the canonical room transport. It carries
four fields:

- `room_id`
- `layout_rev`
- `layout_bytes`: msgpackr-encoded canonical JSON
- `manifest_sig`: Ed25519 over `room_id || rev || sha256(layout_bytes)`

The server holds the private key under `./keys/`. The key is gitignored via
both `keys/*.ed25519` and `**/keys/*.ed25519`, the latter to catch the
`apps/server/keys/` sibling that `pnpm --filter @rebno/server dev` produces
from CWD-relative resolution. It is mode `0o600` on POSIX and ACL-controlled
on Windows. The public-key sibling
(`keys/rebno-room-signing.ed25519.pub.pem`) is committed for CI verification.
Clients pin the public key (Phase 6 ships it via
`import.meta.env.VITE_ROOM_SIGNING_PUBKEY`).

`tools/room-converter` is the build-time authoring path with three
subcommands:

- `convert` — extracted GM5 room dir → canonical layout JSON → atomic-rename
  paired write of `<rev>.json` + `<rev>.sig`.
- `edit` — round-trip latest rev through `$EDITOR` (Windows fallback:
  `notepad`; POSIX fallback: `vi`) → top-level shape validate → write
  `rev+1` paired files.
- `verify` — walk a room dir → assert every `<rev>.json` has a sibling sig
  AND that sig verifies against the public key.
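
The `verify` walk described above might look roughly like the following. This is a sketch under stated assumptions, not the CLI's actual code: it assumes the room directory name is the `room_id`, that `<rev>.sig` holds the raw signature bytes, and that `room_id` and `rev` are UTF-8 encoded in the payload. `signedPayload`, `verifyRoomDir`, and `demo-room` are hypothetical names.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";
import { existsSync, mkdirSync, mkdtempSync, readdirSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { basename, join } from "node:path";

// Assumption: UTF-8 room_id and rev, and a .sig file holding the raw signature bytes.
function signedPayload(roomId: string, rev: string, layoutBytes: Uint8Array): Buffer {
  const digest = createHash("sha256").update(layoutBytes).digest();
  return Buffer.concat([Buffer.from(roomId, "utf8"), Buffer.from(rev, "utf8"), digest]);
}

// Returns one problem string per failing rev; an empty array means the room dir is clean.
function verifyRoomDir(roomDir: string, publicKey: KeyObject): string[] {
  const roomId = basename(roomDir);
  const problems: string[] = [];
  for (const file of readdirSync(roomDir).sort()) {
    if (!file.endsWith(".json")) continue; // skips .sig and leftover .tmp files
    const rev = file.slice(0, -".json".length);
    const sigPath = join(roomDir, `${rev}.sig`);
    if (!existsSync(sigPath)) {
      problems.push(`${rev}: missing sig`);
      continue;
    }
    const payload = signedPayload(roomId, rev, readFileSync(join(roomDir, file)));
    if (!verify(null, payload, publicKey, readFileSync(sigPath))) {
      problems.push(`${rev}: signature invalid`);
    }
  }
  return problems;
}

// Demo in a temp dir with a throwaway key pair.
const demoKeys = generateKeyPairSync("ed25519");
const roomDir = join(mkdtempSync(join(tmpdir(), "rooms-")), "demo-room");
mkdirSync(roomDir);
const bytes = Buffer.from('{"placeholder":true}', "utf8");
const sig000 = sign(null, signedPayload("demo-room", "000", bytes), demoKeys.privateKey);

writeFileSync(join(roomDir, "000.json"), bytes); // valid pair
writeFileSync(join(roomDir, "000.sig"), sig000);
writeFileSync(join(roomDir, "001.json"), bytes); // no sibling sig
writeFileSync(join(roomDir, "002.json"), bytes); // sig copied from rev 000
writeFileSync(join(roomDir, "002.sig"), sig000);

const problems = verifyRoomDir(roomDir, demoKeys.publicKey);
console.log(problems);
```

The demo reports rev `001` as missing its sig and rev `002` as invalid, since a signature made for rev `000` does not cover a different rev even when the layout bytes are identical.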

Runtime hot-reload triggers via `fs.watch` of the rooms directory with 200ms
per-`room_id` debounce. Layout schema is locked by `layoutSchema` in
[`packages/protocol/src/intents.ts`](../../packages/protocol/src/intents.ts)
and runtime-validated on every load (zod `.strict()` — unknown keys are
rejected).
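
The per-`room_id` debounce can be sketched as a small timer map keyed by room. The version below is an illustration only: `createRoomDebouncer`, `roomIdFromPath`, and the injectable `TimerApi` are hypothetical names, and the timer injection exists purely so the behavior can be shown without waiting on a real clock.

```typescript
// Injectable timer API so the sketch can be exercised without real waiting.
type TimerApi = {
  set: (fn: () => void, ms: number) => unknown;
  clear: (handle: unknown) => void;
};

const realTimers: TimerApi = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// "mvp-lobby/001.json" (or "mvp-lobby\\001.json" on Windows) -> "mvp-lobby".
// Paths without a room directory segment are ignored.
function roomIdFromPath(relPath: string): string | null {
  const parts = relPath.split(/[\\/]/);
  const first = parts[0];
  return parts.length >= 2 && first ? first : null;
}

// Returns an event handler that calls onReload(roomId) once per quiet period.
function createRoomDebouncer(
  onReload: (roomId: string) => void,
  delayMs = 200,
  timers: TimerApi = realTimers,
): (relPath: string | null) => void {
  const pendingByRoom = new Map<string, unknown>();
  return (relPath) => {
    const roomId = relPath === null ? null : roomIdFromPath(relPath);
    if (roomId === null) return;
    // A newer event for the same room restarts that room's timer.
    if (pendingByRoom.has(roomId)) timers.clear(pendingByRoom.get(roomId));
    const handle = timers.set(() => {
      pendingByRoom.delete(roomId);
      onReload(roomId);
    }, delayMs);
    pendingByRoom.set(roomId, handle);
  };
}

// Demo with fake timers: two rename events for one room, one event for another.
const fakeQueue: { fn: () => void; cancelled: boolean }[] = [];
const fakeTimers: TimerApi = {
  set: (fn) => {
    const entry = { fn, cancelled: false };
    fakeQueue.push(entry);
    return entry;
  },
  clear: (handle) => {
    (handle as { cancelled: boolean }).cancelled = true;
  },
};

const reloadedRooms: string[] = [];
const onWatchEvent = createRoomDebouncer((id) => { reloadedRooms.push(id); }, 200, fakeTimers);
onWatchEvent("mvp-lobby/001.sig");
onWatchEvent("mvp-lobby/001.json");
onWatchEvent("demo-room/000.json");
for (const entry of fakeQueue) if (!entry.cancelled) entry.fn(); // simulate the 200ms elapsing
console.log(reloadedRooms);
```

In real use the returned handler would presumably receive the `filename` argument from an `fs.watch(roomsDir, { recursive: true }, listener)` callback, with the default timers. The two `mvp-lobby` events collapse into one reload, while `demo-room` gets its own.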

Atomic writes (`<rev>.json.tmp` → rename → `<rev>.json`, paired with
`<rev>.sig.tmp` → `<rev>.sig`) prevent the `RoomRegistry` from observing a
half-written layout. The `fs.watch` debounce window absorbs the
two-rename-event burst.
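
A minimal sketch of the paired atomic write follows. It relies on a rename within one filesystem replacing the target name in a single step, which is why the `.tmp` files sit next to their final names. The rename order (sig first, then JSON) is an assumption of this sketch, since the ADR does not specify it, and `writeLayoutPair` is a hypothetical name. Durability concerns such as `fsync` are left out.

```typescript
import { mkdtempSync, readdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Writes <rev>.json and <rev>.sig so that neither final name ever holds partial bytes.
function writeLayoutPair(roomDir: string, rev: string, layoutBytes: Uint8Array, sigBytes: Uint8Array): void {
  const jsonPath = join(roomDir, `${rev}.json`);
  const sigPath = join(roomDir, `${rev}.sig`);
  // Stage both files fully under .tmp names first.
  writeFileSync(`${jsonPath}.tmp`, layoutBytes);
  writeFileSync(`${sigPath}.tmp`, sigBytes);
  // Assumption of this sketch: publish the sig first, so <rev>.json never
  // appears without its sibling. The ADR does not specify the rename order.
  renameSync(`${sigPath}.tmp`, sigPath);
  renameSync(`${jsonPath}.tmp`, jsonPath);
}

// Demo in a temp dir with placeholder layout bytes and a dummy 64-byte sig.
const demoDir = mkdtempSync(join(tmpdir(), "room-atomic-"));
writeLayoutPair(demoDir, "000", Buffer.from('{"placeholder":true}'), Buffer.alloc(64));
const filesAfter = readdirSync(demoDir).sort();
const jsonAfter = readFileSync(join(demoDir, "000.json"), "utf8");
console.log(filesAfter, jsonAfter);
```

After the call only `000.json` and `000.sig` remain in the directory; no `.tmp` file is left behind for the watcher to trip over.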

Structural drift between this ADR's contract and the on-disk layouts is
guarded by `tools/scripts/lint-room-layout.mjs` (`pnpm lint:room-layout`),
which walks `apps/server/rooms/`, validates every `<rev>.json` against the
canonical top-level key set
(`tile_grid`, `collision_polys`, `spawn_points`, `platform_defs`,
`scripted_triggers`, `room_size`, `tile_atlas_ref`, `bg_atlas_ref`), and
verifies every `<rev>.sig` against the committed public key. Phaser 4 has
no bearing on this contract — the wire format and signature scheme are
engine-agnostic; the engine ADR is
[ADR 0001](0001-client-engine.md).
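
The top-level key check can be illustrated with a small pure function. This sketch assumes every canonical key is required and no extra keys are allowed, mirroring the `.strict()` behavior described above; the real lint script may differ. `checkTopLevelKeys` and the extra key `music_ref` are hypothetical, and the field values are placeholders.

```typescript
// Canonical top-level key set, as listed in this ADR.
const CANONICAL_KEYS = [
  "tile_grid",
  "collision_polys",
  "spawn_points",
  "platform_defs",
  "scripted_triggers",
  "room_size",
  "tile_atlas_ref",
  "bg_atlas_ref",
] as const;

// Assumption: all canonical keys are required and nothing else is permitted.
function checkTopLevelKeys(layout: unknown): { missing: string[]; extra: string[] } {
  if (typeof layout !== "object" || layout === null || Array.isArray(layout)) {
    return { missing: [...CANONICAL_KEYS], extra: [] };
  }
  const present = Object.keys(layout);
  const canonical: readonly string[] = CANONICAL_KEYS;
  return {
    missing: canonical.filter((k) => present.indexOf(k) === -1),
    extra: present.filter((k) => canonical.indexOf(k) === -1),
  };
}

// Placeholder values only; real field shapes are defined by layoutSchema.
const completeLayout = {
  tile_grid: [], collision_polys: [], spawn_points: [], platform_defs: [],
  scripted_triggers: [], room_size: {}, tile_atlas_ref: "", bg_atlas_ref: "",
};
const driftedLayout = {
  tile_grid: [], collision_polys: [], spawn_points: [], platform_defs: [],
  scripted_triggers: [], room_size: {}, tile_atlas_ref: "", music_ref: "",
};

const okReport = checkTopLevelKeys(completeLayout);
const driftReport = checkTopLevelKeys(driftedLayout);
console.log(okReport, driftReport);
```

The drifted layout is reported as missing `bg_atlas_ref` and carrying the unexpected `music_ref`, which is the kind of structural drift the lint step is meant to catch.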

## Consequences

### Positive

- **Player UX:** rooms can be edited live without dropping connections —
  hot-reload broadcasts `s2c.room_layout` to every connected client in the
  affected room.
- **Ops impact:** private key on Fly Volume, never logged. Phase 5
  RESTORE.md documents key recovery from secret-store backup.
- **Security posture:** a compromised intermediary (CDN, proxy, malicious
  filesystem write) cannot inject forged rooms. The client verifies the
  signature against the pinned public key; the server rejects bad sigs at
  load time and logs `signature_invalid: true` at warn level.
- **Phase 7 PAR-03 reuses this contract verbatim** — no protocol changes
  when the full room set lands, only `tools/room-converter convert` runs
  for each extracted room.

### Negative

1. **Duplicated Ed25519 helpers:**
   [`apps/server/src/room-key.ts`](../../apps/server/src/room-key.ts) and
   [`tools/room-converter/src/sign.ts`](../../tools/room-converter/src/sign.ts)
   are byte-equivalent in API and behavior. Standalone tools (per
   [`.planning/phases/04-server-rebuild-mvp/04-CONTEXT.md`](../../.planning/phases/04-server-rebuild-mvp/04-CONTEXT.md)
   D-18) cannot import workspace packages. Drift risk is mitigated by
   `tools/room-converter verify` running against layouts signed by the
   server's `room-key.ts` — a payload-format drift surfaces as a verify
   failure. Phase 7 may consolidate via a `packages/crypto-utils` package
   if a third consumer surfaces. Documented in Plan 04-12b SUMMARY's
   drift-risk section.
2. **`fs.watch` reliability is host-dependent.** On Windows / NTFS the
   `ReadDirectoryChangesW` backend is reliable; on Linux / ext4
   (production via Fly.io) `inotify` is reliable. macOS dev hosts use
   `FSEvents` and have been observed to fire 1-2 events per atomic rename
   pair; the 200ms debounce covers either count. Forwarded to Phase 5 as a
   "watch-on-Fly-volume sanity check" smoke step.
3. **Editor fallback is not interactive on CI.** `tools/room-converter
   edit` requires a TTY. CI / lint paths use `convert` and `verify` only.
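
The editor fallback and the TTY requirement from item 3 could be expressed roughly as below. Both helper names are hypothetical, and treating a whitespace-only `$EDITOR` as unset and checking both stdin and stdout are assumptions of this sketch rather than documented behavior of `tools/room-converter`.

```typescript
// Picks the editor command: $EDITOR if set, else notepad on Windows, else vi.
// Assumption: an empty or whitespace-only $EDITOR counts as unset.
function resolveEditor(env: Record<string, string | undefined>, platform: string): string {
  const fromEnv = env["EDITOR"];
  if (fromEnv !== undefined && fromEnv.trim() !== "") return fromEnv;
  return platform === "win32" ? "notepad" : "vi";
}

// Hypothetical guard: refuse to launch an editor when stdin/stdout are not terminals.
function canRunEdit(stdinIsTTY: boolean | undefined, stdoutIsTTY: boolean | undefined): boolean {
  return stdinIsTTY === true && stdoutIsTTY === true;
}

const editorChoices = [
  resolveEditor({ EDITOR: "nano" }, "linux"),
  resolveEditor({}, "win32"),
  resolveEditor({}, "darwin"),
];
const ciAllowed = canRunEdit(undefined, undefined); // typical CI: no TTY attached
console.log(editorChoices, ciAllowed);
```

A caller would pass `process.env` and `process.platform` to `resolveEditor`, and `process.stdin.isTTY` and `process.stdout.isTTY` to `canRunEdit`; on a CI runner without a terminal the guard returns `false`.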

### Neutral

- The wire format is engine-agnostic. Client choice (Phaser 3 per
  [ADR 0001](0001-client-engine.md)) is irrelevant to this contract; no
  Phaser 3 vs Phaser 4 caveat applies. Phaser 4 is the engine ADR's
  forcing function, not this one.

## Forcing Functions for Re-Open

Re-open this ADR ONLY if one of the following surfaces:

1. **`fs.watch` proves unreliable on Fly Volume + ext4** during Phase 5
   smoke (e.g. dropped events under burst load, debounce window
   insufficient). Forward-flag.
2. **Ed25519 signature verify exceeds 5ms hot-path budget at scale.** The
   sig verifies once per `tryLoadLatest` (not per packet) so this is
   defensive; if Phase 6 surfaces hot-path costs, re-evaluate.
3. **Phase 7 PAR-03 surfaces a layout shape that doesn't fit the canonical
   schema.** `layoutSchema` would extend rather than replace; this ADR
   re-opens only if the extension breaks the wire-frame contract above.
4. **Better alternative emerges** — e.g. content-hashed S3 + signed URLs
   if Fly Volume hot-reload becomes operationally painful. Currently no
   trigger.
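
To check forcing function 2 against a concrete number, a rough micro-benchmark along these lines could be run on the target host. It is a sketch only: `meanVerifyMs` is a hypothetical helper, the payload is synthetic, and the measurement covers the Ed25519 verify call alone, not the SHA-256 of the layout or file I/O inside `tryLoadLatest`.

```typescript
import { generateKeyPairSync, randomBytes, sign, verify } from "node:crypto";

// Mean wall-clock milliseconds per Ed25519 verify over `iterations` runs.
function meanVerifyMs(iterations: number): number {
  const { publicKey, privateKey } = generateKeyPairSync("ed25519");
  // Synthetic payload shaped like room_id || rev || sha256(...): short ids plus a 32-byte digest.
  const payload = Buffer.concat([Buffer.from("mvp-lobby000", "utf8"), randomBytes(32)]);
  const sig = sign(null, payload, privateKey);
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    if (!verify(null, payload, publicKey, sig)) throw new Error("verify failed");
  }
  return (performance.now() - start) / iterations;
}

const BUDGET_MS = 5; // hot-path budget named in item 2
const measuredMs = meanVerifyMs(200);
console.log(`mean verify: ${measuredMs.toFixed(3)}ms, within budget: ${measuredMs < BUDGET_MS}`);
```

The result depends on the host, so any comparison against the 5ms budget is only meaningful when measured on production-like hardware.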

## References

- [`apps/server/src/RoomRegistry.ts`](../../apps/server/src/RoomRegistry.ts) —
  runtime scan + watch + load pipeline.
- [`apps/server/src/room-key.ts`](../../apps/server/src/room-key.ts) — server-
  side Ed25519 sign/verify (paired with
  [`tools/room-converter/src/sign.ts`](../../tools/room-converter/src/sign.ts)).
- [`packages/protocol/src/intents.ts`](../../packages/protocol/src/intents.ts) —
  `layoutSchema` (zod `.strict()` canonical layout shape).
- [`packages/protocol/src/events.ts`](../../packages/protocol/src/events.ts) —
  `s2c.room_layout` + `s2c.room_deleted` wire events.
- [`tools/room-converter/cli.ts`](../../tools/room-converter/cli.ts) — build-
  time authoring CLI.
- [`tools/scripts/lint-room-layout.mjs`](../../tools/scripts/lint-room-layout.mjs) —
  structural drift guard.
- [`.planning/phases/04-server-rebuild-mvp/04-CONTEXT.md`](../../.planning/phases/04-server-rebuild-mvp/04-CONTEXT.md)
  D-09..D-13 — verbatim source for this contract.
- [`.planning/phases/04-server-rebuild-mvp/04-RESEARCH.md`](../../.planning/phases/04-server-rebuild-mvp/04-RESEARCH.md)
  §Pattern 6 — fs.watch + atomic-rename pattern survey.
- [`.planning/phases/04-server-rebuild-mvp/04-12-SUMMARY.md`](../../.planning/phases/04-server-rebuild-mvp/04-12-SUMMARY.md) —
  Wave-3 sibling that landed the runtime half.
- [ADR 0001 — Client Engine](0001-client-engine.md) — Phaser 3 lock; the
  wire-format here is engine-agnostic.
- [ADR 0003 — Canonical Snapshot](0003-canonical-snapshot.md) — Phase 7
  PAR-03 consumer of this contract for the full room set.
