# Feature wishlist from the HMD photon-latency bench (2026-06-12)

Source: an agent-driven hardware bench in the `sauna` project — VDO.Ninja camera
feed in a Chrome window, repository captures of that window, pixel-level photon
detection on Beyond HMD optics, wallclock correlation against an on-screen timer
(full writeup: `sauna/docs/photon-latency-bench.md`). Everything below was a real
friction point in that session, ranked by time it would have saved.

## 1. Region metrics query (biggest win)

`region_stats(session, rect, metric?)` → per-frame mean luminance (and/or RGB
means, histogram) over a sub-rectangle of every frame in a repository session.

The bench's core detector was "mean luminance of a 52×44 px region per frame,
find the step" (28 dark → 152 lit — unambiguous). Today that required PowerShell
`System.Drawing.GetPixel` sweeps: ~2 minutes per 192-frame session, three
sessions. Server-side this is milliseconds, and it turns the MCP into a generic
"did anything change in THIS spot, when" instrument — useful far beyond HMDs
(LEDs, dialogs, progress bars).

## 2. Single-frame fetch with crop/zoom

`get_frame(session, index, crop?, scale?)` → full-res frame (or cropped/zoomed
region) as an image resource.

Grid compilation shrinks frames too much for fine inspection (a 1500-px-tall
frame becomes a ~380-px tile; the optic glow and timer digits were unreadable).
The workaround was reading JPEGs straight off disk + System.Drawing crops.
Native crop+scale fetch removes the whole detour.

## 3. Event markers in the session timeline

`mark_session(session, label)` → append `{label, timestamp}` into the running
session's manifest; `list_repository_frames` shows markers inline between frames.

Benchmark scripts stamp milestones (INIT_START, FIRST_DRAW…) on their own clock;
correlating means hand-deriving wallclock per frame from `startedAt` +
`elapsed_ms`. Markers put external events directly on the frame timeline.

## 4. Stream bring-up (autonomy unlock)

`open_stream(url, {width?, height?, position?})` → launch a managed borderless
window (VDO.Ninja view link or any URL) and return a capture-ready window
handle; `close_stream` to tear down.

Today the human must pre-arrange the VDO.Ninja window. With this, an agent
starts from nothing: open stream → capture → measure. (A kiosk-mode Edge/Chrome
spawn is 90% of it; owning the window also fixes handle churn when the browser
restarts.)

## 5. Desktop-crop-at-window-rect capture mode

Window-targeted capture excludes overlay windows that visually sit on top (the
session's green-square ShareX pin marking the watch region was invisible in
every window capture — caused real confusion). A mode that captures the desktop
cropped to the target window's rect would include overlays, at the cost of
occlusion artifacts. Worth offering as `target: "window_desktop_crop"`.

## 6. Sub-100 ms intervals

125 ms was the effective floor (`interval_ms` min is 100). Latency work wants
50 ms to halve transition-timing uncertainty. Even a capped burst mode
(e.g. ≤10 s @ 50 ms) would do.

## 7. Absolute per-frame timestamps

`list_repository_frames` returns relative `elapsed_ms` only; `startedAt` lives
in `get_capture_status`. Adding an ISO `wallclock` per frame removes the math
(and the off-by-one-tool-call risk).

## 8. Region OCR (nice-to-have)

`ocr_region(session, index, rect)` for reading on-screen digits (the bench
timer) automatically. Vision-model reads worked fine, so this is the lowest
priority — but it would make fully scripted (no-LLM-in-the-loop) benchmarks
possible.
