# Project Research Summary

**Project:** Screen Timelapse MCP Server -- v1.1 Native DWM Window Capture
**Domain:** Native C++ addon for flicker-free, occlusion-immune window capture on Windows
**Researched:** 2026-04-12
**Confidence:** MEDIUM

## Executive Summary

The v1.1 milestone adds a native C++ addon to replace the current monitor-crop capture approach (BitBlt full desktop + crop to window bounds) with DWM-based per-window capture. This eliminates two core problems: flicker from PrintWindow and overlapping-window contamination from monitor-crop. The existing project stack (Node.js 20, TypeScript, MCP SDK, sharp, node-screenshots) is validated and unchanged. New additions are a C++ NAPI addon built with cmake-js, distributed via prebuild + prebuild-install, with node-addon-api for ABI-stable bindings.

**There is a critical unresolved disagreement between researchers on which Windows capture API to use.** STACK.md and ARCHITECTURE.md recommend Windows.Graphics.Capture (WGC) -- the official Microsoft API. FEATURES.md recommends DwmGetDxSharedSurface -- an undocumented API proven in OBS. The core tension is the yellow border: WGC shows a visible yellow border around captured windows (a Windows security feature). Disabling it requires either a user consent prompt or a packaged app manifest, both problematic for a headless MCP server. DwmGetDxSharedSurface has no border but is undocumented and could break in any Windows update. This decision must be made before writing any C++ code. See the "API Choice Decision" section below.

The highest-risk aspect of this milestone is the native C++ addon itself. A crash in native code takes down the entire MCP server with no recovery. Defensive coding (RAII wrappers, COM/HRESULT checking, HWND validation) is essential. The build toolchain (cmake-js + C++/WinRT + prebuild) has multiple integration points that must be validated early. The fallback to monitor-crop (already working in v1.0) is the safety net -- the server must always function even if the native addon fails to load.

## API Choice Decision -- MUST RESOLVE BEFORE IMPLEMENTATION

This is the most consequential technical decision for v1.1. The researchers disagree:

### Option A: Windows.Graphics.Capture (WGC)
**Advocated by:** STACK.md, ARCHITECTURE.md

| Pro | Con |
|-----|-----|
| Official Microsoft API, documented, supported | Yellow border shown around captured windows |
| Captures all window types (GDI, DirectX, UWP) | Borderless requires user consent prompt OR packaged app manifest |
| Stable across Windows updates | More complex C++ (WinRT, COM, DispatcherQueue or CreateFreeThreaded) |
| Excellent samples and documentation | Consent prompt is unacceptable for automated headless agent use |
| Will not break in future Windows versions | IsBorderRequired=false needs graphicsCaptureWithoutBorder capability |

**Yellow border workarounds:**
1. Accept the border for v1.1, investigate borderless in v1.2 (ARCHITECTURE.md suggestion)
2. On Windows 11 24H2+, reports suggest the border may no longer appear by default (unverified)
3. The yellow border is NOT in the captured pixels -- it is a visual overlay. Captures are clean.

**Clarification from STACK.md:** "The yellow border can be disabled on Windows 11 via IsBorderRequired = false." However, FEATURES.md counters that this requires graphicsCaptureWithoutBorder capability declaration, which needs either a packaged app or a user consent prompt.

### Option B: DwmGetDxSharedSurface
**Advocated by:** FEATURES.md

| Pro | Con |
|-----|-----|
| No yellow border at all | Undocumented API -- could break in any Windows update |
| No user consent needed | Only works for DWM-composited windows (scope disputed -- see below) |
| Simpler implementation (fewer COM layers) | No Microsoft support or documentation |
| Proven in OBS DWMCapture plugin | Does not work in VMs (returns Invalid function) |
| Fast (~5ms per capture) | Parameters changed between Win7 and Win8 (documented breakage) |
| Frame change detection via updateId | Sparse reference material (one OBS plugin, scattered blog posts) |

**Critical factual disagreement:** STACK.md states DwmGetDxSharedSurface "only works for DirectX windows and returns blank for GDI apps." FEATURES.md states it "works for all DWM-composited windows (GDI, DirectX, WPF, UWP, etc.)." This factual conflict must be resolved empirically by testing the API against GDI windows (Notepad, mspaint) before committing to this path.

### Recommendation

**Test DwmGetDxSharedSurface against GDI windows first.** If it works for GDI apps (contradicting STACK.md), it is the pragmatically better choice for v1.1: the yellow border is genuinely problematic for a visual debugging tool, because an agent observing UI changes should not have the UI altered by the act of observation. If it fails on GDI apps, WGC is the only viable path and the yellow border must be accepted (it appears only on the live screen, not in captured pixels).

Either way, the fallback to monitor-crop ensures the server always works.
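
The decision rule above can be written down as a small function, which may be useful when recording the outcome of the empirical test. This is only a sketch: the `ProbeResult` shape and the names `chooseCaptureApi`, `gdiWindowsTested`, and `gdiWindowsCaptured` are hypothetical and do not come from any of the researcher documents.

```typescript
// Hypothetical record of the empirical GDI test (e.g. Notepad, mspaint).
interface ProbeResult {
  gdiWindowsTested: number;   // how many GDI windows were probed
  gdiWindowsCaptured: number; // how many produced a non-blank frame
}

type CaptureApi = "DwmGetDxSharedSurface" | "WGC";

// Encodes the recommendation: prefer DwmGetDxSharedSurface only if every
// probed GDI window captured correctly; otherwise fall through to WGC.
export function chooseCaptureApi(probe: ProbeResult): CaptureApi {
  const allGdiWork =
    probe.gdiWindowsTested > 0 &&
    probe.gdiWindowsCaptured === probe.gdiWindowsTested;
  return allGdiWork ? "DwmGetDxSharedSurface" : "WGC";
}
```

An empty probe (no windows tested) deliberately resolves to WGC, so the undocumented API is never selected without evidence.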

## Key Findings

### Recommended Stack

The existing stack is unchanged. New additions for native addon development:

**Core new technologies:**
- **cmake-js ^8.0.0**: Build system -- C++/WinRT requires /std:c++17 and MSVC flags that node-gyp handles poorly
- **node-addon-api ^8.7.0**: C++ NAPI wrapper -- ABI-stable across Node.js versions, and consistent with node-screenshots, which is built on napi-rs (also Node-API based)
- **prebuild ^13.0.0 + prebuild-install ^7.1.0**: Binary distribution -- prebuildify does not support cmake-js, so prebuild is the only option
- **stb_image_write**: PNG encoding in C++ -- avoids passing 8MB raw BGRA buffers across NAPI boundary
- **Windows SDK 10.0.19041+**: Required headers for D3D11, DXGI, and either WGC or DWM APIs
- **Microsoft.Windows.CppWinRT ^2.0**: C++/WinRT projection headers (only needed if WGC path chosen)

**Build prerequisites for developers:** Visual Studio Build Tools 2022, CMake 3.20+, Windows SDK. End users need none of these if prebuilt binaries are available.

### Expected Features

**Must have (table stakes):**
- Flicker-free window capture (primary motivation for v1.1)
- Occlusion-immune capture (window content only, ignoring overlapping windows)
- Drop-in CaptureTarget integration (must work with existing session-manager/scheduler/grid-compiler)
- Automatic fallback to monitor-crop when DWM capture unavailable
- Prebuilt binary for Windows x64 (users must not need build tools)

**Should have (differentiators):**
- Frame change detection via updateId (DwmGetDxSharedSurface only -- skip unchanged frames)
- DPI-aware capture using actual texture dimensions rather than coordinate math
- Cursor compositing option for mouse-interaction debugging
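
Frame change detection could be handled on the TypeScript side roughly as follows. This is a sketch under assumptions: the class name `FrameChangeTracker` is hypothetical, and it assumes the native addon would surface the 64-bit update ID from DwmGetDxSharedSurface as a `bigint` alongside each capture, which the research does not specify.

```typescript
// Tracks the last update ID seen per window so unchanged frames can be skipped.
// Assumption: the native addon reports the 64-bit update ID as a bigint.
export class FrameChangeTracker {
  private readonly lastSeen = new Map<number, bigint>();

  // Returns true when the frame is new and should be encoded and stored.
  shouldCapture(hwnd: number, updateId: bigint): boolean {
    const previous = this.lastSeen.get(hwnd);
    if (previous !== undefined && previous === updateId) {
      return false; // surface unchanged since the last capture
    }
    this.lastSeen.set(hwnd, updateId);
    return true;
  }

  // Call when a session ends or a window closes, to avoid stale entries.
  forget(hwnd: number): void {
    this.lastSeen.delete(hwnd);
  }
}
```

Comparing IDs before encoding would let the scheduler skip PNG encoding and disk writes for idle windows, which is where most of the per-frame cost is likely to be.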

**Defer (v2+):**
- ARM64 prebuilt binary (low demand currently)
- Persistent capture sessions for high-frequency intervals (optimization)
- Cross-platform native addon (macOS/Linux have different capture APIs entirely)

### Architecture Approach

The native addon integrates into the existing architecture by modifying WindowTarget.capture() internals rather than creating a new target class. This avoids changes to server.ts, CaptureConfig, MCP tool schemas, or user-facing API. The addon exports a single function `captureWindow(hwnd): Promise<Buffer>` that returns PNG data. A TypeScript wrapper (dwm-capture.ts) handles addon loading and provides `captureWindowBest()` which tries DWM first and falls back to the existing monitor-crop.
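
A minimal sketch of how the wrapper's fallback logic might look, assuming dependency injection so it can be exercised without the compiled binary. The factory name `createCaptureWindowBest`, the `loadAddon` loader, and the `log` parameter are hypothetical; only `captureWindow` and `captureWindowBest` come from the design above. `Uint8Array` stands in for `Buffer` to keep the sketch free of Node type dependencies.

```typescript
// Shape of the native addon described above. Uint8Array stands in for Node's
// Buffer (Buffer is a Uint8Array subclass) to keep the sketch dependency-free.
interface NativeAddon {
  captureWindow(hwnd: number): Promise<Uint8Array>;
}

type Capture = (hwnd: number) => Promise<Uint8Array>;

// Builds captureWindowBest(). `loadAddon` is a hypothetical loader (in practice
// a require() of the compiled .node file); it may throw if the binary is missing.
export function createCaptureWindowBest(
  loadAddon: () => NativeAddon,
  monitorCrop: Capture,
  log: (msg: string) => void = (msg) => console.error(msg), // stderr only
): Capture {
  let addon: NativeAddon | null = null;
  try {
    addon = loadAddon();
  } catch (err) {
    log(`native capture unavailable, using monitor-crop: ${String(err)}`);
  }

  return async (hwnd) => {
    if (addon !== null) {
      try {
        return await addon.captureWindow(hwnd);
      } catch (err) {
        log(`native capture failed for hwnd ${hwnd}: ${String(err)}`);
      }
    }
    return monitorCrop(hwnd); // v1.0 path, always available
  };
}
```

This covers load failures and rejected promises only. A hard crash inside native code would still take down the process, so it does not replace defensive coding in the addon itself.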

**Major components:**
1. **Native C++ addon** (native/ directory) -- D3D11 device management, capture API calls, BGRA-to-PNG encoding, NAPI async worker
2. **TypeScript wrapper** (dwm-capture.ts) -- Addon loading with graceful failure, availability checking, Promise-based interface
3. **Modified WindowTarget** -- Uses captureWindowBest() instead of direct monitor-crop, transparent to the rest of the system
4. **Prebuilt binary pipeline** -- cmake-js build, prebuild for CI, prebuild-install for user machines

**Key design decisions:**
- PNG encoding happens in C++ (stb_image_write) to avoid 8MB raw buffer transfers
- D3D11 device created once at addon load, reused across captures
- Per-capture session lifecycle (create/destroy each call) -- simpler than persistent sessions, ~10ms overhead acceptable at typical intervals
- CreateFreeThreaded for WGC frame pool (no DispatcherQueue needed on libuv threads)
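
The two numeric claims in these decisions can be checked with simple arithmetic. The sketch below assumes a 1920x1080 window for the buffer size (the research does not state a resolution) and uses the 2 captures/second rate mentioned under Critical Pitfalls as the "typical interval". Function names are illustrative.

```typescript
// Bytes in one uncompressed BGRA frame: 4 bytes per pixel.
export function rawFrameBytes(width: number, height: number): number {
  return width * height * 4;
}

// Share of each capture interval spent on per-capture session setup.
export function setupOverheadFraction(overheadMs: number, intervalMs: number): number {
  return overheadMs / intervalMs;
}

// A 1920x1080 window: 8,294,400 bytes, which matches the "8MB" figure.
const fullHd = rawFrameBytes(1920, 1080);

// ~10 ms of setup at 2 captures/second (500 ms interval): 0.02, i.e. 2%.
const overhead = setupOverheadFraction(10, 500);

// Logged to stderr, since stdout is reserved for the MCP stream.
console.error(`raw frame: ${fullHd} bytes, setup overhead fraction: ${overhead}`);
```

At shorter intervals the setup share grows linearly, which is the point at which the deferred persistent-session optimization would start to matter.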

### Critical Pitfalls

1. **Native addon crash kills MCP server** -- No memory isolation between addon and Node.js. Use RAII for all resources, validate all HWNDs with IsWindow(), check all HRESULTs. Consider child-process isolation if stability proves insufficient.
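2. **COM/WinRT not initialized on worker threads** -- Call CoInitializeEx/RoInitialize at the start of every AsyncWorker::Execute(), and balance each successful call with the matching uninitialize before returning, because libuv reuses its pool threads. A missing initialization is the most common cause of "DWM not available" misdiagnosis.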
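3. **GDI/DirectX resource leaks** -- At 2 captures/second, one leaked GDI object per cycle reaches the default per-process quota of 10,000 GDI handles in under 90 minutes (faster if several objects leak per cycle), after which drawing calls start to fail and the rest of the Windows desktop can be affected. RAII wrappers are mandatory for every resource. Monitor the handle count during development.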
4. **BGRA vs RGBA pixel format** -- Windows produces BGRA, while stb_image_write (4-channel input) and sharp both expect RGBA. Swap channels in C++ before encoding, or use DXGI format conversion during the GPU-side copy.
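5. **stdout pollution from C++ code** -- printf/cout in the native addon corrupts the MCP JSON-RPC stream. Ban stdout in all C++ code and route logging through a stderr-only macro. (Cited as Pitfall #12 in the phase sections below, which appear to follow PITFALLS.md numbering rather than this top-five list.)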
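
The channel swap in pitfall 4 is simple enough to illustrate directly. The sketch below is written in TypeScript for readability and could double as a reference for a color test; the production swap would live in the C++ addon. The function name is hypothetical, and it assumes tightly packed rows, whereas a mapped GPU texture may have a row pitch larger than `width * 4`.

```typescript
// Converts tightly packed BGRA pixels to RGBA in place by swapping the
// blue and red channels. Green and alpha are left untouched.
// Assumption: no row padding (row pitch equals width * 4).
export function bgraToRgbaInPlace(pixels: Uint8Array): Uint8Array {
  if (pixels.length % 4 !== 0) {
    throw new RangeError("pixel buffer length must be a multiple of 4");
  }
  for (let i = 0; i < pixels.length; i += 4) {
    const blue = pixels[i];
    pixels[i] = pixels[i + 2]; // red moves to byte 0
    pixels[i + 2] = blue;      // blue moves to byte 2
  }
  return pixels;
}
```

A pure-blue BGRA pixel `[255, 0, 0, 255]` becomes `[0, 0, 255, 255]` in RGBA. If captured images show red and blue exchanged, the swap was skipped or applied twice.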

## Implications for Roadmap

### Phase 1: Build Toolchain and Addon Scaffold
**Rationale:** Highest risk is the C++ build system. cmake-js + C++/WinRT + NAPI integration has multiple failure points. Validate this first before writing capture logic.
**Delivers:** Native addon that loads in Node.js, exports isAvailable(), creates D3D11 device
**Addresses:** Build system validation, prebuilt binary pipeline setup
**Avoids:** Pitfall #6 (build failures), Pitfall #9 (ABI breaks), Pitfall #12 (stdout pollution -- establish stderr macro)
**Includes:** API choice validation -- test DwmGetDxSharedSurface against GDI windows to resolve the researcher disagreement

### Phase 2: Single-Frame Capture Implementation
**Rationale:** Core value delivery. If capture works correctly, the milestone succeeds.
**Delivers:** captureWindow(hwnd) returning valid PNG buffer with correct colors and dimensions
**Addresses:** Flicker-free capture, occlusion immunity, PNG encoding
**Avoids:** Pitfall #2 (COM init), Pitfall #3 (resource leaks -- RAII from line one), Pitfall #4 (BGRA/RGBA -- color test), Pitfall #7 (thread safety)
**Uses:** cmake-js, node-addon-api, stb_image_write, chosen capture API

### Phase 3: TypeScript Integration and Fallback
**Rationale:** Wire the addon into the existing architecture with minimal disruption. Fallback must be developed alongside DWM capture, not after.
**Delivers:** WindowTarget and WindowRegionTarget using DWM capture with automatic fallback
**Addresses:** Drop-in CaptureTarget integration, automatic fallback, DPI handling
**Avoids:** Pitfall #5 (DWM unavailable -- fallback path), Pitfall #8 (DPI mismatch -- use actual texture dimensions)
**Implements:** dwm-capture.ts wrapper, captureWindowBest(), modified WindowTarget.capture()
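
One way to "use actual texture dimensions" for region targets is to derive the scale from the captured texture instead of querying DPI. The sketch below is an assumption about how a region crop might be computed: it presumes regions are expressed in the window's logical coordinates, which the research does not confirm, and all names are hypothetical.

```typescript
interface Size { width: number; height: number; }
interface Rect { x: number; y: number; width: number; height: number; }

// Maps a region given in the window's logical coordinates onto the pixel grid
// of the captured texture. The scale comes from the texture itself, so no DPI
// lookup is needed. The result is clamped to the texture bounds.
export function regionToTexturePixels(region: Rect, logicalWindow: Size, texture: Size): Rect {
  if (logicalWindow.width <= 0 || logicalWindow.height <= 0) {
    throw new RangeError("logical window size must be positive");
  }
  const scaleX = texture.width / logicalWindow.width;
  const scaleY = texture.height / logicalWindow.height;
  const left = Math.max(0, Math.round(region.x * scaleX));
  const top = Math.max(0, Math.round(region.y * scaleY));
  const right = Math.min(texture.width, Math.round((region.x + region.width) * scaleX));
  const bottom = Math.min(texture.height, Math.round((region.y + region.height) * scaleY));
  return {
    x: left,
    y: top,
    width: Math.max(0, right - left),
    height: Math.max(0, bottom - top),
  };
}
```

For an 800x600 logical window captured as a 1200x900 texture (150% scaling), the region `{ x: 10, y: 20, width: 100, height: 50 }` maps to `{ x: 15, y: 30, width: 150, height: 75 }`.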

### Phase 4: Hardening and Error Handling
**Rationale:** Production robustness. The fallback already works from v1.0, so this is additive safety.
**Delivers:** Timeout handling, window-closed-during-capture recovery, minimized/cloaked window detection, backend logging
**Addresses:** Pitfall #1 (crash resilience), Pitfall #11 (minimized/cloaked windows)
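
Timeout handling on the TypeScript side could be as small as the helper below. It is a sketch with a hypothetical name (`withTimeout`); the timeout value and how a timed-out capture is reported to the scheduler are left open by the research.

```typescript
// Rejects if `work` has not settled within `ms` milliseconds.
// Note: this only stops waiting. It cannot cancel work that is already
// running on a native worker thread.
export function withTimeout<T>(work: Promise<T>, ms: number, label = "capture"): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}
```

Because a timed-out native call keeps its worker thread busy, repeated timeouts against a hung window could still exhaust the libuv pool. A timeout is therefore a reason to fall back to monitor-crop for that window rather than to retry immediately.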

### Phase 5: Distribution
**Rationale:** Distribution is a packaging concern, not functional. Do after capture works.
**Delivers:** Prebuilt x64 binary on GitHub Releases, prebuild-install in package.json, cplugs marketplace update
**Addresses:** Prebuilt binary for Windows x64 (table stakes feature)

### Phase Ordering Rationale

- Phase 1 first because the build toolchain is the highest-risk unknown. If cmake-js + C++/WinRT + NAPI does not work, the entire approach must change.
- Phase 1 also resolves the API choice by empirical testing, unblocking Phase 2.
- Phase 2 before Phase 3 because integration is pointless without working capture.
- Phase 3 includes fallback because PITFALLS.md warns against developing capture and fallback in separate phases.
- Phase 4 after Phase 3 because hardening requires integration to be in place.
- Phase 5 last because distribution is orthogonal to functionality.

### Research Flags

Phases likely needing deeper research during planning:
- **Phase 1:** API choice validation requires hands-on testing, not more desk research. The DwmGetDxSharedSurface vs WGC disagreement can only be resolved empirically.
- **Phase 2:** WGC implementation is complex (WinRT, COM, FramePool, CreateFreeThreaded). The Win32CaptureSample reference repo is the key resource.

Phases with standard patterns (skip research-phase):
- **Phase 3:** Standard NAPI wrapper + TypeScript integration. Well-documented patterns.
- **Phase 4:** Error handling is standard defensive coding. No novel patterns.
- **Phase 5:** prebuild + prebuild-install + GitHub Actions CI is well-documented.

## Confidence Assessment

| Area | Confidence | Notes |
|------|------------|-------|
| Stack | HIGH | Official tools (cmake-js, node-addon-api, prebuild) with extensive docs. Version requirements verified. |
| Features | MEDIUM | Table stakes are clear. The API comparison is thorough but the DwmGetDxSharedSurface GDI compatibility claim is disputed between researchers. |
| Architecture | MEDIUM-HIGH | Integration strategy (modify WindowTarget, not new class) is sound. Data flow is well-specified. prebuildify vs prebuild disagreement resolved in favor of prebuild. |
| Pitfalls | HIGH | Comprehensive coverage of native addon risks. Phase-specific warnings are actionable. v1.0 pitfalls already addressed in existing codebase. |

**Overall confidence:** MEDIUM -- The technical approach is sound and well-researched, but the fundamental API choice is unresolved and can only be settled by testing.

### Gaps to Address

- **DwmGetDxSharedSurface GDI compatibility:** Must test empirically against Notepad, mspaint, and other GDI apps. This determines the API choice.
- **Yellow border on Windows 11 24H2+:** Reports suggest it may no longer appear. Needs verification on a current Win11 build.
- **prebuildify vs prebuild:** ARCHITECTURE.md recommends prebuildify; STACK.md correctly identifies that prebuildify does not support cmake-js. Use prebuild + prebuild-install (STACK.md recommendation).
- **DwmGetDxSharedSurface on Windows 11 24H2:** No confirmed reports of breakage, but no confirmation that it still works either. Treat as untested until verified on a current build.
- **Child-process isolation:** PITFALLS.md suggests it as a fallback if in-process addon proves unstable. Not designed yet. Defer unless crashes occur during development.

## Sources

### Primary (HIGH confidence)
- [Windows.Graphics.Capture API](https://learn.microsoft.com/en-us/uwp/api/windows.graphics.capture) -- Official WinRT capture API reference
- [IGraphicsCaptureItemInterop::CreateForWindow](https://learn.microsoft.com/en-us/windows/win32/api/windows.graphics.capture.interop/) -- Programmatic capture without picker
- [Win32CaptureSample](https://github.com/robmikh/Win32CaptureSample) -- Microsoft reference C++ implementation
- [node-addon-api](https://www.npmjs.com/package/node-addon-api) -- v8.7.0, ABI-stable NAPI wrapper
- [cmake-js](https://github.com/cmake-js/cmake-js) -- CMake-based native addon build tool
- [Node-API docs](https://nodejs.org/api/n-api.html) -- ABI stability, AsyncWorker patterns

### Secondary (MEDIUM confidence)
- [OBS DWMCapture plugin](https://github.com/notr1ch/DWMCapture) -- DwmGetDxSharedSurface reference implementation
- [DwmGetDxSharedSurface undocumented reference](https://undoc.airesoft.co.uk/user32.dll/DwmGetDxSharedSurface.php) -- Function signature and risks
- [prebuild + cmake-js support](https://github.com/cmake-js/cmake-js/issues/206) -- Confirms prebuild backend works

### Tertiary (LOW confidence)
- Reports of yellow border removal on Windows 11 24H2+ -- Unverified community reports
- DwmGetDxSharedSurface compatibility with Windows 11 24H2 -- No confirmed testing

---
*Research completed: 2026-04-12*
*Ready for roadmap: yes (after API choice is resolved in Phase 1)*
