# Hybrid-graphics capture (cross-adapter desktop duplication)

**Trigger (2026-06-11, team-share field test):** enlyzeam rig (LHR-599F3B91) gets
`DuplicateOutput failed (0x887a0004)` (DXGI_ERROR_UNSUPPORTED) in a permanent retry
loop — `cap=0Hz [CAP LOST]`, test-card fallback. `DuplicateOutput1` with full format
negotiation (50c46ba) did NOT fix it, and per-app Windows graphics preferences (force
NVIDIA, no Auto-HDR, no windowed-gaming optimizations) changed nothing. Working
hypothesis: **the monitor's output is owned by a different adapter than the NVIDIA GPU
driving the Beyond** (hybrid graphics / iGPU-wired monitor / second card). Desktop
duplication must run on the adapter that owns the output; our capture device is created
on the render adapter only.

**Why it matters:** v1 ships to team machines we don't control. Laptops are almost
always hybrid; desktops with iGPU-wired monitors are common. "Plug the monitor into the
NVIDIA card" is a workaround, not an answer.

## Constraints (settled, do not re-derive)

- Beyond DirectMode requires the NVIDIA adapter — the D3D12 render device CANNOT move.
- Plain DXGI shared textures (`CreateSharedHandle`) do not cross adapters. Crossing
  requires either D3D12 cross-adapter heaps (row-major, no mips, both ends D3D12) or a
  CPU round-trip.
- Consumer contract must hold: publish only CPU-confirmed complete frames; consumer
  does COMMON↔PIXEL_SHADER_RESOURCE (PSR) barriers only; latest-frame-wins mailbox;
  ACCESS_LOST never wedges the pipeline.
- Capture rate cap (one frame per panel period) stays — it bounds the bridge bandwidth.
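
For readers new to the consumer contract, here is a minimal sketch of a latest-frame-wins mailbox over three slots. It is an illustration only: the class name `LatestFrameMailbox`, the three-slot layout, and the flag encoding are assumptions, and the shipping ring may be structured differently. The point it shows is that the producer publishes only after a frame is complete, and an unconsumed frame is silently replaced by a newer one.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical three-slot mailbox: one slot is owned by the producer (back),
// one by the consumer (front), and one is parked in the middle for hand-off.
class LatestFrameMailbox {
public:
    // Slot the producer may write into right now.
    uint32_t writeSlot() const { return back_; }

    // Call only after the frame in writeSlot() is CPU-confirmed complete.
    // Swaps the finished slot into the middle and marks it fresh; whatever
    // was parked there (consumed or not) becomes the next write slot.
    void publish() {
        back_ = middle_.exchange(back_ | kFresh, std::memory_order_acq_rel) & kIndexMask;
    }

    // Consumer side: returns false if nothing new was published since the
    // last acquire; otherwise hands out the newest complete slot.
    bool acquire(uint32_t& slot) {
        if ((middle_.load(std::memory_order_acquire) & kFresh) == 0) return false;
        front_ = middle_.exchange(front_, std::memory_order_acq_rel) & kIndexMask;
        slot = front_;
        return true;
    }

private:
    static constexpr uint32_t kIndexMask = 0x3;  // low two bits: slot index
    static constexpr uint32_t kFresh = 0x4;      // set when the middle slot is unseen
    std::atomic<uint32_t> middle_{1};
    uint32_t back_ = 0;   // touched by the producer thread only
    uint32_t front_ = 2;  // touched by the consumer thread only
};
```

If the producer publishes three frames before the consumer looks, the consumer gets only the third, and the slot it holds is never handed back to the producer until the next `acquire`.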

## Steps (gated, in order)

### A. Adapter topology probe (diagnosis before construction)
At capture init, enumerate ALL DXGI adapters and their outputs; print one line per
output: adapter description + LUID, output name, and whether it matches the render
adapter. When the wanted output is found on a foreign adapter, say so explicitly
(today's UNSUPPORTED hint is a guess; this makes it a statement).
Touch: `monitor_layout.cpp` or capture init in `spatial_light.cpp` (~1 h).
**Gate:** run on enlyzeam rig → topology printed; hypothesis confirmed or refuted
(if refuted: STOP, re-diagnose — do not build the bridge on a guess).
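
The log format for the probe could look roughly like the sketch below. To keep it portable, it works on plain stand-in structs (`Luid`, `AdapterInfo`, and the function name `describeTopology` are hypothetical); in the real probe the fields would presumably be filled from `IDXGIFactory1::EnumAdapters1`, `DXGI_ADAPTER_DESC1` (`Description`, `AdapterLuid`) and `IDXGIAdapter::EnumOutputs` / `DXGI_OUTPUT_DESC::DeviceName`. The exact wording of the lines is an assumption.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Stand-in for the Windows LUID (LowPart / HighPart).
struct Luid {
    uint32_t low;
    int32_t high;
    bool operator==(const Luid& o) const { return low == o.low && high == o.high; }
};

// Stand-in for one enumerated adapter and the device names of its outputs.
struct AdapterInfo {
    std::string description;
    Luid luid;
    std::vector<std::string> outputNames;  // e.g. "\\\\.\\DISPLAY1"
};

// Builds one log line per output: adapter description + LUID, output name,
// and whether the owning adapter is the render adapter. The wanted output is
// flagged explicitly when it sits on a foreign adapter.
std::vector<std::string> describeTopology(const std::vector<AdapterInfo>& adapters,
                                          const Luid& renderLuid,
                                          const std::string& wantedOutput) {
    std::vector<std::string> lines;
    for (const AdapterInfo& a : adapters) {
        const bool isRender = (a.luid == renderLuid);
        char luid[32];
        std::snprintf(luid, sizeof luid, "%08x:%08x",
                      static_cast<unsigned>(a.luid.high), static_cast<unsigned>(a.luid.low));
        for (const std::string& name : a.outputNames) {
            std::string line = "adapter='" + a.description + "' luid=" + luid +
                               " output=" + name +
                               (isRender ? " [render adapter]" : " [foreign adapter]");
            if (name == wantedOutput && !isRender)
                line += " <-- wanted output is on a FOREIGN adapter";
            lines.push_back(line);
        }
    }
    return lines;
}
```

An adapter with no outputs produces no lines in this sketch; a real probe may want to log it anyway so the full adapter list is visible.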

### B. Capture device on the owning adapter
`DuplicationSource` finds the adapter that owns its output (enumerate all, match
output device name) and creates its D3D11 device there. Same-adapter case is
unchanged (zero-copy shared ring as today). The whole worker pipeline (copy, cursor
composite, HDR convert, GenerateMips) already runs on the capture device, so it moves
to the owning adapter along with that device at no extra cost.
Touch: `duplication_source.cpp` start()/acquireDuplication (~half day with C).
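
The owner lookup and the resulting path choice can be sketched as follows. This is a portable illustration over stand-in structs, not the `DuplicationSource` code: `Luid`, `AdapterInfo`, `CapturePath`, `CapturePlan`, and `planCapture` are all hypothetical names, and the real version would match against `DXGI_OUTPUT_DESC::DeviceName` while enumerating.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for the Windows LUID (LowPart / HighPart).
struct Luid {
    uint32_t low;
    int32_t high;
    bool operator==(const Luid& o) const { return low == o.low && high == o.high; }
};

// Stand-in for one enumerated adapter and the device names of its outputs.
struct AdapterInfo {
    std::string description;
    Luid luid;
    std::vector<std::string> outputNames;
};

enum class CapturePath {
    ZeroCopyShared,  // output is on the render adapter: shared ring as today
    CpuBridge        // output is on a foreign adapter: readback + upload (step C)
};

struct CapturePlan {
    Luid captureAdapter;  // adapter on which to create the D3D11 capture device
    CapturePath path;
};

// Finds the adapter that owns wantedOutput and decides which path applies.
// Returns false if no adapter reports that output.
bool planCapture(const std::vector<AdapterInfo>& adapters,
                 const Luid& renderLuid,
                 const std::string& wantedOutput,
                 CapturePlan& plan) {
    for (const AdapterInfo& a : adapters) {
        for (const std::string& name : a.outputNames) {
            if (name != wantedOutput) continue;
            plan.captureAdapter = a.luid;
            plan.path = (a.luid == renderLuid) ? CapturePath::ZeroCopyShared
                                               : CapturePath::CpuBridge;
            return true;
        }
    }
    return false;
}
```

Keeping the decision in one place makes the same-adapter case trivially auditable: it can only ever yield `ZeroCopyShared`.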

### C. CPU bridge for the foreign-adapter case
Chosen approach: **staging readback → upload through a D3D12 COPY queue** (the
simple, robust option; cross-adapter heaps are a later optimization if profiling
demands it).
- Capture device: after mips, CopyResource ring texture → D3D11 staging texture
  (CPU-readable, full mip chain).
- Worker maps each subresource and memcpys into a persistently-mapped D3D12 upload
  buffer (GetCopyableFootprints layout).
- Dedicated D3D12 COPY queue + per-slot command allocator (created at start, owned by
  the worker): CopyBufferRegion/CopyTextureRegion into the D3D12 ring texture
  (DEFAULT heap, created COMMON), fence-signal, CPU-wait, THEN publish. Consumer sees
  the exact same contract as the shared-handle path.
- Bandwidth ballpark: 1080p BGRA ×1.33 (mips) ×75 Hz ≈ 0.8 GB/s worst case; the rate
  cap plus content-change gating keeps the typical rate far lower. 4K ≈ 3.3 GB/s worst
  case, which is acceptable over PCIe, but watch the gate.
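
The ballpark can be reproduced with a few lines of arithmetic. The sketch below sums an exact full mip chain (each level halves, rounding down, with a floor of 1) instead of using the ×1.33 approximation; the function names are hypothetical, and row-pitch padding in the staging and upload buffers is ignored.

```cpp
#include <cstdint>

// Bytes in a full mip chain for a width x height texture, tightly packed.
uint64_t mipChainBytes(uint32_t width, uint32_t height, uint32_t bytesPerPixel) {
    uint64_t total = 0;
    for (;;) {
        total += static_cast<uint64_t>(width) * height * bytesPerPixel;
        if (width <= 1 && height <= 1) break;
        width = width > 1 ? width / 2 : 1;
        height = height > 1 ? height / 2 : 1;
    }
    return total;
}

// Worst case: every panel period delivers a changed frame that crosses the bridge once.
uint64_t worstCaseBytesPerSecond(uint32_t width, uint32_t height,
                                 uint32_t bytesPerPixel, uint32_t panelHz) {
    return mipChainBytes(width, height, bytesPerPixel) * panelHz;
}
```

With 4 bytes per pixel at 75 Hz this gives roughly 0.83 GB/s for 1920×1080 and roughly 3.3 GB/s for 3840×2160. Note that the data crosses memory twice (readback, then upload), so the bus sees the figure in each direction.
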
Touch: `duplication_source.{h,cpp}` (~1 day).
**Gate (B+C together, on enlyzeam rig):** floating desktop visible from an
iGPU-owned monitor; cap= tracks content; no [CAP LOST]; gpu=/frame= watermarks clean
at 75 Hz on the render side; same-adapter rigs byte-identical (zero-copy path
untouched, A/B by log line).
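
The map-and-memcpy step in C has one easy-to-miss detail: the source and destination row pitches generally differ, so each subresource has to be copied row by row. A minimal sketch, assuming the source pointer and pitch come from `D3D11_MAPPED_SUBRESOURCE` (`pData`, `RowPitch`) and the destination offset, pitch, row count, and row size come from `ID3D12Device::GetCopyableFootprints`; the helper names `alignUp` and `copySubresourceRows` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// D3D12 requires the buffer row pitch of a placed footprint to be a multiple
// of 256 bytes (D3D12_TEXTURE_DATA_PITCH_ALIGNMENT).
constexpr size_t kPitchAlignment = 256;

inline size_t alignUp(size_t value, size_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Copies one mapped subresource into the upload buffer.
//   src / srcRowPitch : mapped staging data and its pitch
//   dst / dstRowPitch : upload-buffer base + footprint offset, footprint pitch
//   rowBytes          : payload bytes per row (width * bytes per pixel)
//   rowCount          : number of rows in this mip level
void copySubresourceRows(const uint8_t* src, size_t srcRowPitch,
                         uint8_t* dst, size_t dstRowPitch,
                         size_t rowBytes, size_t rowCount) {
    assert(srcRowPitch >= rowBytes && dstRowPitch >= rowBytes);
    for (size_t y = 0; y < rowCount; ++y) {
        // Only the payload is copied; padding bytes on either side are skipped.
        std::memcpy(dst + y * dstRowPitch, src + y * srcRowPitch, rowBytes);
    }
}
```

When both pitches happen to equal `rowBytes`, a single `memcpy` of the whole level would do, but the row loop is the safe default, especially for small mip levels where padding dominates.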

### D. Field validation + perf characterization
Extended eyes-in on enlyzeam: latency feel of captured content (bridge adds ≤1 frame
of CONTENT latency — pose latency unaffected), CPU% of the capture thread, episode
behavior under the known kernel stalls.
**Gate:** daily-driver verdict on the hybrid machine; capture thread CPU < ~10% of
one core at 1080p; content latency not user-noticeable vs the zero-copy rig.

## Out of scope
- D3D12 cross-adapter heaps (optimization, revisit only if C's CPU cost fails gate D).
- Capturing outputs of a THIRD adapter class (USB display adapters etc.): the same
  bridge should work, but this is not validated.
- Per-window capture (WGC rung stays dormant).
