# M4 — Hardening

M3 made the product: desktop monitors floating in space, daily-driver quality confirmed by
extended eyes-in. M4 makes it **trustworthy and sharp** — instant start, single-resample
sharpness, paced prediction, neck-model truth, color correctness, install story. Polish the
loop users actually live in.

## Inputs (all settled, do not re-derive)

| Fact | Source |
|---|---|
| Render path: **CANTED cameras**; warp per-channel direct POLY3, shared center, `0.5/(1+grow)`, no r² clamp, frame-invariant `u_src=(q+G)/2G` | M2 |
| AHRS: Mahony head-frame, stillness-validated bias capture, rate+\|a\|-gated accel correction, yaw-only recenter, `pose(predictSec)`; micro-gap (<5 s, initialized) soft recovery — keep orientation+bias, skip hole | M2 / M3 |
| GPU P-state pin: `ensureMaxPerfProfile()` NvAPI DRS profile, PREFERRED_PSTATE=PREFER_MAX, **keyed to `spatial_light.exe` name** (renamed exes lose it — exe-swap caveat); D3D12 queue HIGH priority | M3 (rubber-band fix, d4df1ad) |
| Triage status line: `fps imu gap frame gpu vsw corr ahrs \|a\| resets soft worn cap` — keep through M4; it found the P-state bug | M3 |
| Capture: DXGI Desktop Duplication per monitor, latest-frame-wins mailbox, CPU-confirmed publish, consumer COMMON↔PSR barriers only, ACCESS_LOST reacquire never wedges | M3 |
| Settings: `%APPDATA%\sauna\config.json`, live-apply watcher diffs vs last FILE state (flag overrides survive), 's' save; brightness `-1` / ipd `0` sentinels = headset owns | M3 |
| Idle policy: worn (prox) hard-vetoes idle timer — user directive | M3 |
| Flash calib: native ReadWatchmanConfig default (overlapped GET_FEATURE + stall retries + read-twice; hidsdi.h before hidpi.h) | M3 step 6 |
| Onboard store target: Beyond MCU 512 B user-signature flash (U/W/V HID, TLV+CRC8 poly 0x07 init 0xFF; unknown tags skipped by firmware). Watchman config write = **STOP**, never | M3 step 7 research |
| Gyro ±2000 dps/int16; universality pending enlyzeam `gyro_cal_vr` run | M2 |

## Work breakdown (gated, in order; steps largely independent — interleave freely except 3 wants 2's pacing hooks)

### 1. Instant-start bias (per-serial persisted gyro bias)
User-confirmed UX pain: users don the headset *before* launching; current still-window
bias capture forces a take-it-off-or-hold-still start. Fix: persist bias per serial at
`%APPDATA%\sauna\calib\<LHR-serial>.json`; on start, load it and **track from the first
sample** (gravity correction levels pitch/roll instantly; persisted bias covers yaw drift).
Opportunistic background still-window recapture keeps refreshing bias in RAM + file — the
store is self-healing, staleness self-corrects within the first quiet moment. 'b' key
manual recapture and the no-file first-run path keep the existing capture flow.
Touch: `src/track/ahrs.cpp` + small store (~1 h).
**Gate:** launch with headset already on head and moving → tracking correct immediately,
no still-window wait; bias file updates after a natural still moment; first run (no file)
behaves exactly as today.

### 2. Single-resample warp-direct compositing
Today: desktop → quad render → warp pass = two resamples; each one softens. Collapse to
one: per panel pixel, warp gives the ray, ray→quad intersection gives the desktop UV,
sample the desktop texture **once** (with its mips). Next big sharpness win after CANTED.
Open question to answer empirically at the gate: does `--ss` supersampling still buy
anything once double-resample is gone? If no — drop it and reclaim the GPU headroom.
Touch: `src/render/warp_pass.cpp` / `scene_renderer.cpp` restructure; keep the two-pass
path behind a flag until the gate passes (A/B and fallback).
**Gate:** A/B same scene — text at 100% zoom measurably sharper (user eyes-in verdict);
no new edge artifacts at quad borders; `--ss` re-evaluated with a keep/drop decision;
frame budget still comfortable at 75 Hz (gpu= watermark unchanged).
**✅ PASSED 2026-06-10** — user: "looks great… can resolve much finer patterns with
excellent clarity"; gpu= 0–1 ms both paths, clk pinned 2520. Verdict on `--ss`: DROPPED
to default 1.0 (two-pass fallback only keeps the knob). Warp-direct now the default;
`--warp2pass` / 'w' key keep the old path for A/B.

### 3. Paced prediction + pose-age-aware sampling
Today: blit-latest-frame, fixed ~20 ms gyro prediction. M4: measure actual sample→photon
age per frame and predict by *that*; pace against the panel's 75/90 Hz cadence rather than
free-running capture Hz (capture mailbox stays, render side picks pose at the right
moment). The M3 mailbox was structured for exactly this swap.
Fold in the corr= follow-up: corr occasionally hits 0.13–0.18 rad/s during motion (Mahony
tug visible, not rubber-band) — tighten rate fade / narrow the \|a\| gate while in this
code, watch corr= watermark for regression.
**Gate:** measured pose age replaces the magic 20 ms; head-turn latency feel ≤ today
(user eyes-in); corr= watermark during vigorous motion lower than today's 0.13–0.18 with
no new sluggishness; no pacing-induced judder at 75 Hz.
**✅ PASSED 2026-06-10** — age= 10.9 ms latch-on vs 20.6 latch-off at 75 Hz (latency
improved, not just matched); corr= hard-capped at 0.10 rad/s (clamp + 0.85–1.15 g gate +
0.5 rad/s fade knee); no judder. Field hardening en route: (1) free-run detection moved
from work-time to vsync-wait classification (latch sleep masked the old signal —
spurious FREE-RUN flapping read as hitching); (2) latch margin rides a fast-attack/
slow-release submit envelope and the photon estimate targets the first *reachable* flip
(k≤4) — a GPU contention episode had read as every-other-frame rubber-banding from
alternating prediction error; (3) capture worker rate-capped to the panel period — the
copy+GenerateMips pipeline ran per captured frame and amplified 130–170 Hz content
bursts (animated wallpaper) into render contention; (4) `[gpu hogs]` PDH forensic line
names GPU contenders live when gpu= balloons >15 ms. Defaults: predict_auto ON,
late-latch ON ('p'/'l' keys A/B, --no-latelatch); age= and [LL-OFF] in status line.

### 4. User neck calibration (guided lever-arm solve)
User-proposed, design agreed. Tray-driven guided capture: torso still, yaw shakes ~10 s,
then pitch nods ~10 s. LSQ solve `a = g + ω̇×r + ω×(ω×r)` for lever arm `r` (accel bias
as nuisance parameter). Yaw motion observes forward+lateral components, pitch observes
up+forward; lateral≈0 is the sanity gate; residual gate catches torso sway (reject + ask
to redo, don't silently accept garbage). Write `neck_forward_mm`/`neck_up_mm` to settings
(live-apply already wired).
Caveat to resolve first: solve yields pivot→IMU; rendering wants pivot→eye. Check the
config `imu` block for the IMU→eye offset — if absent, document the cm-assumption used.
Touch: `src/track/ahrs.cpp` (capture mode), small solver, tray menu item (~half day).
**Gate:** calibration run completes with pass/fail verdict from the gates; solved values
plausible (forward ~7–12 cm, up ~10–18 cm ballpark); deliberately swaying torso → run
rejected; saved values survive restart and visibly improve translation feel on nods
(user eyes-in).
**✅ PASSED 2026-06-11** — field solve fwd 95 / up 19 / lateral 3 mm, residual
0.19 m/s², yaw 1.8 / pitch 1.4 rad/s RMS over 21 k rows; user: "head rotation feels
more visually natural"; values persisted and survived restart. Anatomy lesson encoded:
**up is posture, not anatomy** — small nods pivot at the skull base (≈ eye height), so
up near zero is a legitimate solve; the 10–18 cm ballpark assumed a lower-neck pivot
and the plausibility warning now reflects that (fwd 20–200 / up −30–250 mm). Config imu
position is zero on this unit → pivot→IMU used as pivot→eye, assumption printed.
Implementation: AHRS sample tap (raw rate, pre-Mahony, append-only under the filter
lock — tracking never blocks), 6-unknown LSQ (lever arm + accel-bias nuisance),
gap-safe central-difference ω̇, gates excitation ≥0.8 rad/s RMS per phase / lateral
≤35 mm / residual ≤0.45 m/s². Tray item + 'n' key.

### 5. HDR / color correctness
M3 assumed SDR sRGB and renders HDR/scRGB desktops wrong. Detect per-monitor advanced
color state (DXGI output desc / colorspace), tone-map scRGB→panel correctly, and handle
the SDR-content-on-HDR-monitor white-level knob. Detect-and-warn was M3; correct is M4.
**Gate:** HDR-enabled monitor captured → colors/brightness in-headset match the physical
monitor to eyes-in satisfaction; SDR monitors byte-identical to today.
**Code complete 2026-06-11, gate HARDWARE-BLOCKED** — no HDR-capable monitor on the
field rig; moved to carried gates. Implementation: non-BGRA8 duplicated desktop →
desktop-format intermediate + in-capture convert pass into the BGRA8 ring (FP16 scRGB:
÷ SDR white from DISPLAYCONFIG_SDR_WHITE_LEVEL, fallback 200 nits, clamp — highlights
above SDR white clip in v1 — then sRGB-encode; 10-bit SDR passthrough). Downstream
never sees the desktop format; SDR path byte-identical (direct copy unchanged); HDR
toggle rides the mode-change rebuild + generation bump; SDR white re-queried per
reacquire. Console: "capture: output N HDR (scRGB FP16) — tone-mapping…" when active.

### 5.5 Hybrid-graphics capture (inserted 2026-06-11, user-prioritized)
Team-share field test: enlyzeam rig cannot duplicate its monitor — output owned by a
different adapter than the render GPU (working hypothesis; DuplicateOutput1 didn't fix
it). Plan + gates: **docs/hybrid-capture.md** (A topology probe → B capture device on
owning adapter → C CPU bridge via D3D12 copy queue → D field validation). Probe FIRST —
confirm the hypothesis before building the bridge.

### 6. Shell polish: settings GUI + tray growth
Settings GUI window proper (replaces notepad-via-tray; the M3 gate said deferred, M4
delivers): plain desktop window, edits the same config.json the watcher already
live-applies — GUI is just a friendlier writer, zero new plumbing. Tray gains "Start Neck
Calibration" (step 4 entry point) and any step 1–5 toggles worth surfacing.
**Gate:** every setting reachable in GUI → live-applies → restart sticks; notepad path
still works (config.json remains source of truth).

### 7. Install story
One artifact a non-dev can run: installer or self-contained zip+exe. First-run experience:
no calib JSON, no config — native flash read (M3 step 6) + defaults get to working
pictures with zero setup. Start-with-Windows option. Uninstall leaves no orphan DRS
profile surprises (document or clean up the NvAPI profile).
Decision in-step: installer tech (MSIX vs Inno vs zip) — pick the cheapest that satisfies
the gate, this is not a packaging hobby.
**Gate:** clean Windows machine (or clean user account), one artifact, zero dev tools →
working floating desktop; uninstall leaves system clean.

## Carried gates from M3 (not new work — field validation)

- **M3 step 2 multi-screen:** code done; gate needs enlyzeam 2+ physical monitors.
- **M3 step 7 onboard store (write path):** strictly gated on both-unit MCU signature
  16-block backups + `gyro_cal_vr` universality verdict (universal constant ⇒ onboard IMU
  cal store may be moot — decide then). Erase-then-write power-loss window ⇒ backup-first,
  write all 16 blocks before SAVE.
- **P-state fix field validation:** DRS pin deployed, user "seems good so far" — confirm
  no rubber-band recurrence over extended sessions; watch gpu=/vsw= watermarks.
- **M4 step 5 HDR gate:** code complete, needs an HDR-capable monitor (none on the field
  rig 2026-06-11). Test: toggle Windows HDR while running → console prints the
  tone-mapping line → eyes-in color/brightness match; SDR monitors unchanged.
- **Periodic system stalls (~30 s, kernel-level):** NOT GPU contention — [gpu hogs] 0%
  during episodes, [stall] ticks 1.4–2.3 s, recurs with Wallpaper Engine off, HAGS on.
  dwm/wallpaper were victims, not culprits. [cpu] forensic line (dpc=/int=/priv= + top
  CPU) prints on stall; high dpc/int → LatencyMon names the driver; all-low → SMI/
  firmware (BIOS angle). Sauna rides the episodes acceptably (pacing degrades clean).

## Parallel track: enlyzeam field session (user-gated, unchanged list)

Step 2 multi-monitor gate; M2/M3 eyes-in; `calib_dump` LHR-599F3B91 backup; MCU signature
16-block backups both units (unblocks M3 step 7); `gyro_cal_vr` universality run; Index
`28DE:2300` collision verify.

## Carry-over engineering

- Triage status line stays through M4 — cheap, and it caught the P-state bug only because
  it was always on.
- Resilience invariants extend to every new path: pacing must not wedge on ACCESS_LOST;
  neck-cal capture mode must not block tracking; GUI crash must not take down the
  compositor.
- DRS profile is keyed to exe name: any installer rename / exe-swap workflow must account
  for it (step 7 owns the cleanup story).

## Out of M4 scope

6DOF. Streaming bridge (trait seam only). Linux/PipeWire. IddCx virtual displays.
Beyond-2e eye-tracked dynamic distortion. Per-window capture (WGC rung stays dormant).
