# Architecture: Native DWM Window Capture Integration

**Domain:** Native C++ addon for flicker-free window capture via Windows.Graphics.Capture API
**Researched:** 2026-04-12
**Overall confidence:** MEDIUM-HIGH

## Recommended Architecture

### Capture API Choice: Windows.Graphics.Capture (not DWM Thumbnail, not DWM Shared Surface)

Three DWM-based capture approaches were evaluated:

| Approach | Pixel Access | Occlusion-Free | Flicker-Free | Complexity | Verdict |
|----------|-------------|----------------|--------------|------------|---------|
| DwmRegisterThumbnail | NO (renders to window, no pixel readback) | Yes | Yes | Low | **Rejected** -- no pixel data extraction |
| DwmGetDxSharedSurface | Yes (undocumented) | Yes | Yes | Medium | **Rejected** -- undocumented API, breaks between Windows versions |
| Windows.Graphics.Capture | Yes (D3D11 texture -> staging copy -> `ID3D11DeviceContext::Map`) | Yes | Yes | High | **Selected** -- official, documented, supported on Win10 1903+ / Win11 |

**Decision:** Use `Windows.Graphics.Capture` via `IGraphicsCaptureItemInterop::CreateForWindow(hwnd)`. This is the official Microsoft API for programmatic window capture. It captures from the DWM composition surface, meaning overlapping windows are ignored and no WM_PRINT is sent.

**Yellow border caveat:** Windows.Graphics.Capture shows a yellow border around captured windows. On Windows 11, this can be disabled via `GraphicsCaptureSession.IsBorderRequired = false` after calling `GraphicsCaptureAccess.RequestAccessAsync(GraphicsCaptureAccessKind.Borderless)`. This requires the `graphicsCaptureWithoutBorder` capability. For an MCP server (non-packaged Win32 app), this capability declaration may not be straightforward. **Phase 1 should accept the yellow border; Phase 2 can investigate borderless capture.**

### Integration Strategy: Modify WindowTarget, Not New Target Class

**Decision:** The native addon replaces the internal capture mechanism inside `WindowTarget.capture()` and `WindowRegionTarget.capture()`, rather than creating a new `DwmTarget` class.

**Rationale:**
1. The `CaptureTarget` interface contract is `capture(): Promise<Buffer>` returning a PNG buffer. DWM capture produces the same output -- a PNG buffer of a window's content.
2. The existing `WindowTarget` already does "capture a window by handle/title" -- DWM capture is a *better implementation* of the same operation, not a different operation.
3. A separate `DwmTarget` would require changes to `server.ts` target factory, `CaptureConfig.target` enum, and MCP tool schemas. This is unnecessary churn.
4. Auto-detection (DWM available? use it; otherwise fall back to monitor-crop) belongs inside `WindowTarget`, keeping the rest of the system unaware.

### Component Boundaries

```
Existing (unchanged)           New/Modified
================================================================================================
CaptureTarget interface   -->  (unchanged)
DesktopTarget             -->  (unchanged)
RegionTarget              -->  (unchanged)
SessionManager            -->  (unchanged)
Scheduler                 -->  (unchanged)
Grid Compiler             -->  (unchanged)
Server (MCP tools)        -->  (unchanged)

WindowTarget.capture()    -->  Modified: try DWM first, fall back to monitor-crop
WindowRegionTarget        -->  Modified: try DWM first, fall back to monitor-crop
window-utils.ts           -->  Modified: add DWM capture function alongside captureWindowViaMonitor

NEW: native/                   C++ NAPI addon source
NEW: native/src/capture.cpp    Windows.Graphics.Capture implementation
NEW: native/src/capture.h      Header
NEW: native/CMakeLists.txt     cmake-js build config
NEW: src/capture/targets/dwm-capture.ts   TypeScript wrapper around native addon
```

## Data Flow: DWM Surface to CaptureFrame

### Detailed Pipeline

```
1. JS: WindowTarget.capture() called by Scheduler
       |
2. JS: dwm-capture.ts: captureWindowDwm(hwnd: number): Promise<Buffer>
       |  (calls native addon async function)
       |
3. C++: Napi::AsyncWorker subclass executes on libuv thread pool:
       |
       a. Initialize D3D11 device + context (cached per-addon-lifetime)
       |
       b. IGraphicsCaptureItemInterop::CreateForWindow(hwnd)
       |     -> GraphicsCaptureItem
       |
       c. Direct3D11CaptureFramePool::CreateFreeThreaded(
       |       device, DXGI_FORMAT_B8G8R8A8_UNORM, 1, size)
       |     -> FramePool (free-threaded: no DispatcherQueue needed)
       |
       d. FramePool.CreateCaptureSession(item)
       |     -> GraphicsCaptureSession
       |
       e. session.StartCapture()
       |
       f. Wait for FrameArrived event (Win32 event + WaitForSingleObject)
       |
       g. frame = framePool.TryGetNextFrame()
       |     -> Direct3D11CaptureFrame (Surface() wraps an ID3D11Texture2D)
       |
       h. Copy frame texture to CPU-readable staging texture:
       |     - CreateTexture2D(D3D11_USAGE_STAGING, D3D11_CPU_ACCESS_READ)
       |     - CopyResource(staging, frameTexture)
       |     - Map(staging, D3D11_MAP_READ) -> D3D11_MAPPED_SUBRESOURCE
       |     -> Raw BGRA pixel buffer; rows are RowPitch bytes apart
       |     - Unmap(staging) once the pixels have been copied out
       |
       i. session.Close(), framePool.Close()
       |     (release DWM resources immediately)
       |
       j. Allocate output buffer, strip row padding, encode BGRA -> PNG
       |
4. C++: AsyncWorker.OnOK() (runs on the JS thread):
       |     Resolve the promise with
       |     Napi::Buffer<uint8_t>::New(env, data, length, finalizer)
       |
5. JS: Promise resolves with Buffer (PNG data)
       |
6. JS: WindowTarget returns buffer to Scheduler
       |
7. JS: Scheduler stores as CaptureFrame { buffer, elapsedMs, index }
```

### Key Design Decisions in Data Flow

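**PNG encoding location:** Encode to PNG in C++ using a lightweight encoder (stb_image_write or libpng). This avoids passing large raw BGRA buffers (4 * width * height bytes) across the NAPI boundary: a 1920x1080 window is 8.3MB raw, while the PNG is typically far smaller (on the order of ~200KB, depending on the window content and the encoder). Because the mapped texture is BGRA with padded rows and stb_image_write expects RGBA, the encoding step must swap the red and blue channels and honor the row pitch. The existing code path already expects PNG buffers from `toPngSync()`.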
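The size arithmetic and the pixel repacking that precedes encoding can be sketched as follows. This is an illustration in TypeScript of logic that would live in C++ (`png_encoder.cpp`); the function names are hypothetical, and the assumption is that the encoder wants tightly packed RGBA while the mapped staging texture delivers BGRA rows that are `rowPitch` bytes apart.

```typescript
/** Bytes needed for a tightly packed 32-bit-per-pixel image. */
function rawFrameBytes(width: number, height: number): number {
  return width * height * 4;
}

/**
 * Convert a BGRA buffer whose rows are `rowPitch` bytes apart into a
 * tightly packed RGBA buffer (hypothetical helper, mirrors the C++ step).
 */
function bgraToPackedRgba(
  src: Uint8Array,
  width: number,
  height: number,
  rowPitch: number,
): Uint8Array {
  if (rowPitch < width * 4) {
    throw new RangeError("rowPitch is smaller than one row of pixels");
  }
  if (height > 0 && src.length < rowPitch * (height - 1) + width * 4) {
    throw new RangeError("source buffer is too small for the given size");
  }
  const dst = new Uint8Array(rawFrameBytes(width, height));
  for (let y = 0; y < height; y++) {
    let s = y * rowPitch;   // source row start (may include padding)
    let d = y * width * 4;  // destination row start (no padding)
    for (let x = 0; x < width; x++, s += 4, d += 4) {
      dst[d] = src[s + 2];     // R comes from byte 2 of B,G,R,A
      dst[d + 1] = src[s + 1]; // G is unchanged
      dst[d + 2] = src[s];     // B comes from byte 0
      dst[d + 3] = src[s + 3]; // A is unchanged
    }
  }
  return dst;
}

console.log(rawFrameBytes(1920, 1080)); // 8294400
```

Skipping the row-pitch handling is a common source of skewed output images, since the driver is free to pad each row beyond `width * 4` bytes.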

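**Async execution:** Use `Napi::AsyncWorker` so the D3D11 operations (including WaitForSingleObject for frame arrival) run on libuv's thread pool, not blocking the Node.js event loop. This is critical because a blocked event loop would stall the scheduler's timers and the MCP server's request handling for the duration of every capture; the scheduler already calls `await target.capture()`, so an async implementation fits without changes.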

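**D3D11 device lifetime:** Create the D3D11 device once when the addon loads and reuse it across captures. Device creation is expensive (~50ms). Store as module-level state in the addon. `ID3D11Device` is thread-safe, but the immediate `ID3D11DeviceContext` is not, so captures running concurrently on different libuv threads must serialize their `CopyResource`/`Map` calls (for example, with a mutex).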

**CreateFreeThreaded over Create:** The `Direct3D11CaptureFramePool::CreateFreeThreaded()` method is essential. The regular `Create()` requires a `DispatcherQueue` on the calling thread, which libuv worker threads don't have. `CreateFreeThreaded` fires `FrameArrived` on an internal thread, which we synchronize with a Win32 event.

## DWM Session Lifecycle Management

### Per-Capture Lifecycle (Recommended for v1.1)

Each call to `captureWindowDwm(hwnd)` creates and destroys a full capture session:

```cpp
// Sketch of single-frame capture. CreateCaptureItem, CreateFramePool,
// ExtractPixelsToPng and CaptureWorker::GetPromise are project helpers
// (hypothetical), not Windows or node-addon-api APIs.
Napi::Value CaptureWindow(const Napi::CallbackInfo& info) {
    Napi::Env env = info.Env();
    HWND hwnd = reinterpret_cast<HWND>(
        static_cast<intptr_t>(info[0].As<Napi::Number>().Int64Value()));
    // Napi::AsyncWorker has no built-in promise; CaptureWorker owns a
    // Napi::Promise::Deferred and resolves/rejects it in OnOK()/OnError().
    auto* worker = new CaptureWorker(env, hwnd, g_d3dDevice);
    worker->Queue();
    return worker->GetPromise();
}

// Inside CaptureWorker (runs on a libuv thread; no N-API calls allowed here):
void Execute() override {
    HANDLE frameEvent = CreateEvent(nullptr, FALSE, FALSE, nullptr);
    if (!frameEvent) { SetError("CreateEvent failed"); return; }
    try {
        // 1. Create capture item from HWND
        auto item = CreateCaptureItem(hwnd_);
        // 2. Create frame pool (free-threaded)
        auto framePool = CreateFramePool(device_, item.Size());
        // 3. Signal the Win32 event when a frame arrives
        framePool.FrameArrived(
            [frameEvent](auto const&, auto const&) { SetEvent(frameEvent); });
        // 4. Start capture session
        auto session = framePool.CreateCaptureSession(item);
        session.StartCapture();

        // 5. Wait for first frame (2s timeout), then 6. extract pixels
        if (WaitForSingleObject(frameEvent, 2000) != WAIT_OBJECT_0) {
            SetError("DWM capture timeout");
        } else if (auto frame = framePool.TryGetNextFrame()) {
            ExtractPixelsToPng(frame, &outputBuffer_, &outputSize_);
        } else {
            SetError("DWM capture returned no frame");
        }

        // 7. Always release DWM resources (no goto: jumping over the
        //    initialization of `frame` would not compile)
        session.Close();
        framePool.Close();
    } catch (const winrt::hresult_error& e) {
        SetError(winrt::to_string(e.message()));
    }
    CloseHandle(frameEvent);
}
```

**Why per-capture, not persistent sessions:**
- The scheduler calls `target.capture()` at intervals (100ms-10s+). Between calls, holding a DWM capture session open is wasteful and may interfere with other capture tools.
- DWM capture item is tied to an HWND. If the window is closed/recreated between captures, a persistent session would break anyway.
- The overhead of session setup (~5-15ms) is acceptable given typical capture intervals of 500ms+.
- Simpler error handling: each capture is independent, no stale session state.
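To put the overhead argument in numbers, the share of each interval spent on session setup is simply the setup time divided by the interval. The sketch below uses the worst-case 15ms estimate from the list above; `setupShare` is a hypothetical helper for illustration, not part of the codebase.

```typescript
/** Fraction of each capture interval spent on session setup and teardown. */
function setupShare(setupMs: number, intervalMs: number): number {
  if (intervalMs <= 0) throw new RangeError("intervalMs must be positive");
  return setupMs / intervalMs;
}

// Worst-case 15ms setup at three scheduler intervals.
for (const intervalMs of [100, 500, 1000]) {
  const percent = (setupShare(15, intervalMs) * 100).toFixed(1);
  console.log(`${intervalMs}ms interval: ${percent}% spent on session setup`);
}
```

At 500ms the setup cost is about 3% of the interval, whereas at 100ms it rises to about 15%, which is why a persistent session only becomes interesting for high-frequency captures.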

### Future Optimization: Persistent Session (v1.2+)

For high-frequency captures (100ms interval), a persistent session could be beneficial:

```typescript
// Future API shape (NOT for v1.1)
interface DwmCaptureSession {
    start(hwnd: number): void;
    grabFrame(): Promise<Buffer>;  // Gets latest frame from pool
    stop(): void;
}
```

This would avoid repeated session setup but adds complexity around session invalidation (window close, minimize, resize) and resource cleanup.

## Native Addon Structure

### Directory Layout

```
native/
  CMakeLists.txt           # cmake-js build configuration
  src/
    addon.cpp              # NAPI module init, exports captureWindow()
    capture.cpp            # Windows.Graphics.Capture implementation
    capture.h              # Header with CaptureWorker class
    d3d_device.cpp         # D3D11 device creation and caching
    d3d_device.h
    png_encoder.cpp        # BGRA -> PNG encoding (stb_image_write)
    png_encoder.h
  vendor/
    stb_image_write.h      # Single-header PNG encoder (public domain)
```

### Build System: cmake-js

**Decision:** Use cmake-js over node-gyp because:
1. C++/WinRT requires `/std:c++17` and specific Windows SDK headers. CMake handles this more naturally than GYP.
2. Linking `d3d11.lib`, `dxgi.lib`, `windowsapp.lib` is straightforward in CMake.
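3. The C++/WinRT projection headers (`winrt/*.h`) ship with the Windows SDK, so no NuGet package is required; CMake picks them up through the normal MSVC/Windows SDK include paths.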

### CMakeLists.txt Skeleton

```cmake
cmake_minimum_required(VERSION 3.15)
project(dwm-capture)

# C++/WinRT requires C++17
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Node-API / cmake-js integration (CMAKE_JS_SRC is supplied by cmake-js)
add_library(${PROJECT_NAME} SHARED
    src/addon.cpp
    src/capture.cpp
    src/d3d_device.cpp
    src/png_encoder.cpp
    ${CMAKE_JS_SRC}
)

target_include_directories(${PROJECT_NAME} PRIVATE
    ${CMAKE_JS_INC}
    ${CMAKE_CURRENT_SOURCE_DIR}/src
    ${CMAKE_CURRENT_SOURCE_DIR}/vendor
)

target_link_libraries(${PROJECT_NAME}
    ${CMAKE_JS_LIB}
    d3d11.lib
    dxgi.lib
    windowsapp.lib  # WinRT runtime
    dwmapi.lib
)

# node-addon-api headers (the printed path is wrapped in quotes, so strip them)
execute_process(
    COMMAND node -p "require('node-addon-api').include"
    WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
    OUTPUT_VARIABLE NODE_ADDON_API_DIR
    OUTPUT_STRIP_TRAILING_WHITESPACE
)
string(REGEX REPLACE "[\r\n\"]" "" NODE_ADDON_API_DIR "${NODE_ADDON_API_DIR}")
target_include_directories(${PROJECT_NAME} PRIVATE ${NODE_ADDON_API_DIR})

# NAPI version
target_compile_definitions(${PROJECT_NAME} PRIVATE NAPI_VERSION=8)

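set_target_properties(${PROJECT_NAME} PROPERTIES
    PREFIX ""
    SUFFIX ".node"
)

# With MSVC, cmake-js may ask for node.lib to be generated from its .def file
if(MSVC AND CMAKE_JS_NODELIB_DEF AND CMAKE_JS_NODELIB_TARGET)
    execute_process(COMMAND ${CMAKE_AR} /def:${CMAKE_JS_NODELIB_DEF}
        /out:${CMAKE_JS_NODELIB_TARGET} ${CMAKE_STATIC_LINKER_FLAGS})
endif()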
```

### NAPI Export Surface

```typescript
// src/capture/targets/dwm-capture.ts
// TypeScript wrapper around native addon

let nativeAddon: NativeAddon | null = null;
let loadAttempted = false;

interface NativeAddon {
    captureWindow(hwnd: number): Promise<Buffer>;
    isAvailable(): boolean;
}

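function loadAddon(): NativeAddon | null {
    if (loadAttempted) return nativeAddon;
    loadAttempted = true;
    try {
        // Path is relative to this file: src/capture/targets/ -> repo root -> native/.
        // In an ES module build, obtain `require` via createRequire(import.meta.url).
        nativeAddon = require('../../../native/build/Release/dwm-capture.node');
        return nativeAddon;
    } catch {
        return null; // Addon missing or failed to load; callers fall back
    }
}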

/** Check if DWM capture is available on this platform. */
export function isDwmCaptureAvailable(): boolean {
    const addon = loadAddon();
    return addon?.isAvailable() ?? false;
}

/**
 * Capture a window via DWM (Windows.Graphics.Capture).
 * Returns PNG buffer, or null if DWM capture is unavailable/fails.
 */
export async function captureWindowDwm(hwnd: number): Promise<Buffer | null> {
    const addon = loadAddon();
    if (!addon?.isAvailable()) return null;
    try {
        return await addon.captureWindow(hwnd);
    } catch {
        return null; // Caller falls back to monitor-crop
    }
}
```
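The load-once behavior of the wrapper can be exercised in isolation. The sketch below generalizes the `loadAttempted` / `nativeAddon` pair into a hypothetical `createLazyLoader` helper with an injectable load function standing in for `require`; it illustrates the pattern and is not code from the project.

```typescript
/**
 * Wrap a loader so it runs at most once. A failed load is cached as null,
 * so a missing addon is not retried on every capture.
 */
function createLazyLoader<T>(load: () => T): () => T | null {
  let attempted = false;
  let cached: T | null = null;
  return () => {
    if (!attempted) {
      attempted = true;
      try {
        cached = load();
      } catch {
        cached = null; // Addon missing or failed to initialize
      }
    }
    return cached;
  };
}

// Usage with a stand-in for require(): the failing load is attempted only once.
let loadAttempts = 0;
const loadMissingAddon = createLazyLoader<{ isAvailable(): boolean }>(() => {
  loadAttempts++;
  throw new Error("Cannot find module");
});
console.log(loadMissingAddon()?.isAvailable() ?? false); // false
console.log(loadMissingAddon(), loadAttempts);           // null 1
```

Caching the failure keeps the fallback path cheap: on machines without the prebuilt binary, every capture after the first skips the module resolution attempt entirely.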

## Modified Files

### window-utils.ts -- Add DWM-first capture function

```typescript
// New export alongside existing captureWindowViaMonitor (which remains for fallback)
import { captureWindowDwm, isDwmCaptureAvailable } from "./dwm-capture.js";

/**
 * Capture a window using the best available method.
 * Tries DWM capture first (flicker-free, occlusion-free), then falls back
 * to monitor capture + crop.
 */
export async function captureWindowBest(win: Window): Promise<Buffer> {
    // Try DWM capture first
    if (isDwmCaptureAvailable()) {
        const hwnd = win.id(); // node-screenshots Window.id() returns HWND as number
        const buffer = await captureWindowDwm(hwnd);
        if (buffer) return buffer; // Already PNG
    }
    
    // Fall back to monitor capture + crop (existing behavior)
    const image = captureWindowViaMonitor(win);
    return image.toPngSync();
}
```

### WindowTarget.capture() -- Use async best-effort

```typescript
// Change from:
//   const image = captureWindowViaMonitor(win);
//   return image.toPngSync();
// To:
async capture(): Promise<Buffer> {
    const win = findWindow(this.handle, this.titleMatch);
    if (!win) { /* existing error */ }
    if (win.isMinimized()) { /* existing error */ }
    
    return captureWindowBest(win);
}
```

**Note:** The current `WindowTarget.capture()` is technically async (returns `Promise<Buffer>`) but the implementation is fully synchronous. The DWM path makes it truly async. This is safe because the scheduler already `await`s the result.

### WindowRegionTarget.capture() -- Similar modification

The DWM capture returns the full window image as PNG. `WindowRegionTarget` would:
1. Get full window PNG via `captureWindowBest(win)` 
2. Use sharp to crop the sub-region from the PNG

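This differs from the current path (monitor crop -> window Image -> region crop via `Image.cropSync()`). The sharp crop adds a PNG decode and re-encode step, which should be small relative to typical capture intervals, and it handles the DWM and fallback cases uniformly because both produce a PNG buffer.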
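Before cropping, the requested sub-region should be checked against the dimensions of the PNG that was actually captured, since an extract rectangle that extends past the image is rejected by sharp. The sketch below reads the size from the PNG header and clamps the region; `readPngSize`, `clampRegion` and `Rect` are hypothetical names, and the assumption is that the region is expressed in window-relative pixels.

```typescript
interface Rect { left: number; top: number; width: number; height: number }

/** Read width/height from a PNG's IHDR chunk (big-endian at byte offsets 16 and 20). */
function readPngSize(png: Uint8Array): { width: number; height: number } {
  const signature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (png.length < 24 || signature.some((byte, i) => png[i] !== byte)) {
    throw new Error("not a PNG buffer");
  }
  const view = new DataView(png.buffer, png.byteOffset, png.byteLength);
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

/** Clamp a requested region to the image bounds; null if nothing overlaps. */
function clampRegion(region: Rect, imageWidth: number, imageHeight: number): Rect | null {
  const left = Math.max(0, Math.floor(region.left));
  const top = Math.max(0, Math.floor(region.top));
  const right = Math.min(imageWidth, Math.floor(region.left + region.width));
  const bottom = Math.min(imageHeight, Math.floor(region.top + region.height));
  if (right <= left || bottom <= top) return null;
  return { left, top, width: right - left, height: bottom - top };
}
```

A Node `Buffer` is a `Uint8Array`, so the PNG returned by `captureWindowBest(win)` can be passed to `readPngSize` directly, and the clamped rectangle has the `{ left, top, width, height }` shape that sharp's `extract()` accepts.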

## Prebuilt Binary Strategy

**Use prebuildify** to bundle prebuilt `.node` binaries with the npm package:

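```json
{
  "scripts": {
    "build:native": "cmake-js compile -d native",
    "prebuildify": "prebuildify --napi --strip --backend cmake-js",
    "install": "node-gyp-build"
  },
  "dependencies": {
    "node-gyp-build": "^4.8.0"
  },
  "devDependencies": {
    "prebuildify": "^6.0.0",
    "cmake-js": "^7.3.0",
    "node-addon-api": "^8.0.0"
  }
}
```

**prebuildify** bundles the `.node` file inside `prebuilds/win32-x64/` in the published npm package. At install time, `node-gyp-build` checks for a matching prebuild before attempting compilation. The script is named `prebuildify` rather than `prebuild` because npm treats `prebuild` as a lifecycle hook and would run it before every `npm run build`; `--backend cmake-js` tells prebuildify to build with cmake-js instead of its node-gyp default. Like `node-screenshots` and `sharp`, this spares users a compiler toolchain at install time, although those packages ship their binaries as per-platform optional dependencies rather than through prebuildify.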

Only Windows x64 prebuilds are needed (per project constraints).

## Suggested Build Order for v1.1 Milestone

```
Phase 1: Native addon scaffold + D3D11 device initialization
  - native/ directory structure, CMakeLists.txt
  - addon.cpp with isAvailable() export
  - d3d_device.cpp: D3D11CreateDevice, cached singleton
  - Verify: addon loads, isAvailable() returns true on Win11
  
Phase 2: Single-frame capture (core C++ implementation)
  - capture.cpp: CaptureWorker with Windows.Graphics.Capture flow
  - png_encoder.cpp: BGRA -> PNG via stb_image_write
  - Verify: captureWindow(hwnd) returns valid PNG buffer

Phase 3: TypeScript integration layer
  - dwm-capture.ts: loader + wrapper
  - window-utils.ts: captureWindowBest() function
  - WindowTarget.capture(): use captureWindowBest
  - WindowRegionTarget.capture(): use captureWindowBest + sharp crop
  - Verify: existing tests pass, window capture uses DWM when available

Phase 4: Fallback + error handling
  - Graceful fallback when DWM unavailable (non-Windows, old Windows)
  - Timeout handling (2s timeout on WaitForSingleObject)
  - Window-not-found / window-closed during capture errors
  - Verify: monitor-crop fallback works when addon missing

Phase 5: Prebuilt binaries + distribution
  - prebuildify configuration
  - CI build for Windows x64
  - cplugs marketplace update
```

**Phase ordering rationale:**
- Phase 1 validates that the C++ build toolchain works (cmake-js, C++/WinRT headers, linking). This is the highest-risk step.
- Phase 2 is the core value delivery -- if capture works, the project succeeds.
- Phase 3 wires it into the existing architecture with minimal changes to existing code.
- Phase 4 hardens for production (the monitor-crop fallback already works, so this is additive safety).
- Phase 5 is distribution concern, not functional.
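For Phase 4, the primary timeout is the 2s `WaitForSingleObject` limit inside the addon. A JS-side guard could act as a second line of defense if the native promise never settles. The sketch below is one possible shape for that guard; `withTimeout` is a hypothetical helper, not an existing project function, and it does not cancel the native work, it only stops waiting for it.

```typescript
/**
 * Resolve with `fallback` if `work` has not settled within `ms` milliseconds.
 * The timer is cleared once either side wins so it cannot keep the process alive.
 */
async function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Usage: a capture that never settles yields null, so the caller can
// fall back to monitor-crop instead of stalling the scheduler tick.
const neverSettles = new Promise<Uint8Array | null>(() => {});
withTimeout(neverSettles, 50, null).then((result) => {
  console.log(result === null ? "fell back" : "captured"); // fell back
});
```

Using `null` as the fallback value matches the contract of `captureWindowDwm`, where `null` already means "use the monitor-crop path".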

## Anti-Patterns to Avoid

### Anti-Pattern 1: Persistent Capture Sessions Across Scheduler Ticks
**What:** Keeping a GraphicsCaptureSession alive between `target.capture()` calls.
**Why bad:** Session invalidation on window close/resize creates complex state management. The session holds GPU resources between captures. Error recovery requires rebuilding the session anyway.
**Instead:** Create and destroy sessions per-capture. The ~5-15ms setup overhead is negligible at typical intervals of 500ms+.

### Anti-Pattern 2: Raw BGRA Buffer Across NAPI Boundary
**What:** Returning raw pixel data from C++ and encoding PNG in JS.
**Why bad:** 8MB+ buffer transfer for 1080p. V8 garbage collector pressure. Unnecessary copy.
**Instead:** Encode PNG in C++ (stb_image_write is a single header, ~50KB). Return the much smaller PNG buffer.

### Anti-Pattern 3: Blocking the Event Loop with Synchronous D3D11
**What:** Using synchronous `Napi::Function` for capture.
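**Why bad:** D3D11 CopyResource + Map + WaitForSingleObject typically block for 5-30ms, and for up to the full 2s timeout when no frame arrives.
**Instead:** Always run the capture in a `Napi::AsyncWorker` and deliver the result through a `Napi::Promise::Deferred`.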

### Anti-Pattern 4: Creating D3D11 Device Per Capture
**What:** Calling `D3D11CreateDevice()` inside every `CaptureWorker::Execute()`.
**Why bad:** Device creation is expensive (~50ms) and unnecessary repeated work.
**Instead:** Create once at addon initialization, store as module-level `ComPtr<ID3D11Device>`.

### Anti-Pattern 5: New DwmTarget Class
**What:** Creating a parallel `DwmTarget implements CaptureTarget` alongside `WindowTarget`.
**Why bad:** Forces changes to server.ts factory, CaptureConfig enum, MCP tool schemas, and user-facing API. DWM capture is not a different *target*, it is a different *capture mechanism* for the same target (a window).
**Instead:** Modify `WindowTarget` internals to prefer DWM, fall back transparently.

## Sources

- [Windows.Graphics.Capture namespace](https://learn.microsoft.com/en-us/uwp/api/windows.graphics.capture?view=winrt-26100) -- Official API reference (HIGH confidence)
- [IGraphicsCaptureItemInterop::CreateForWindow](https://learn.microsoft.com/en-us/windows/win32/api/windows.graphics.capture.interop/nf-windows-graphics-capture-interop-igraphicscaptureiteminterop-createforwindow) -- Programmatic window capture without picker (HIGH confidence)
- [Win32CaptureSample (robmikh)](https://github.com/robmikh/Win32CaptureSample) -- Reference C++ implementation with CaptureSnapshot (HIGH confidence)
- [GraphicsCaptureSession.IsBorderRequired](https://learn.microsoft.com/en-us/uwp/api/windows.graphics.capture.graphicscapturesession.isborderrequired?view=winrt-26100) -- Yellow border control, Win11 only (HIGH confidence)
- [node-addon-api Buffer docs](https://github.com/nodejs/node-addon-api/blob/main/doc/buffer.md) -- NAPI buffer factory methods with finalizers (HIGH confidence)
- [Node-API official docs](https://nodejs.org/api/n-api.html) -- AsyncWorker, Buffer patterns (HIGH confidence)
- [cmake-js](https://github.com/cmake-js/cmake-js) -- CMake-based native addon build tool (HIGH confidence)
- [prebuildify / node-gyp-build](https://nodejs.github.io/node-addon-examples/build-tools/prebuild/) -- Prebuilt binary bundling (HIGH confidence)
- [DWM Thumbnail Overview](https://learn.microsoft.com/en-us/windows/win32/dwm/thumbnail-ovw) -- Evaluated and rejected, no pixel readback (HIGH confidence)
- [DWMCapture OBS plugin](https://github.com/notr1ch/DWMCapture) -- DWM shared surface approach, undocumented APIs (MEDIUM confidence)
- [Screen capture overview](https://learn.microsoft.com/en-us/windows/uwp/audio-video-camera/screen-capture) -- Microsoft's recommended capture approach (HIGH confidence)
- [Asynchronous C++ Addon with N-API](https://codemerx.com/blog/asynchronous-c-addon-for-node-js-with-n-api-and-node-addon-api/) -- AsyncWorker pattern reference (MEDIUM confidence)
