# Roadmap: Screen Timelapse MCP Server

## Milestones

- ✅ **v1.0 MVP** - Phases 1-4 (shipped 2026-04-12)
- ✅ **v1.1 Native DWM Window Capture** - Phases 5-7 (shipped 2026-04-13)
- ✅ **v1.2 Capture Profiles & Frame Repository** - Phases 8-11 (shipped 2026-04-13)

## Phases

<details>
<summary>✅ v1.0 MVP (Phases 1-4) - SHIPPED 2026-04-12</summary>

- [x] **Phase 1: Core Capture Pipeline** - MCP server with desktop capture, timed scheduling, and grid image compilation
- [x] **Phase 2: Window and Region Targeting** - Capture specific windows, screen regions, and handle target lifecycle
- [x] **Phase 3: Diagnostic Features** - Delta highlighting, idle frame compression, and animated GIF export
- [x] **Phase 4: Distribution** - Package and publish as cplugs marketplace plugin

### Phase 1: Core Capture Pipeline
**Goal**: Agents can trigger a timed desktop capture session via MCP tools and retrieve a properly-sized grid image with timestamps via MCP resources
**Depends on**: Nothing (first phase)
**Requirements**: MCP-01, MCP-02, MCP-03, MCP-04, MCP-05, CAPT-01, TIME-01, TIME-02, TIME-03, TIME-04, GRID-01, GRID-02, GRID-03, GRID-04, GRID-05
**Plans:** 3 plans

Plans:
- [x] 01-01-PLAN.md — Project scaffold, stdout guard, logger, types, MCP server skeleton
- [x] 01-02-PLAN.md — Capture engine: session manager, scheduler, desktop target, tool wiring
- [x] 01-03-PLAN.md — Grid compiler, timestamp overlays, resource handler, end-to-end verification

### Phase 2: Window and Region Targeting
**Goal**: Agents can target specific application windows or screen regions instead of the full desktop
**Depends on**: Phase 1
**Requirements**: CAPT-02, CAPT-03, CAPT-04, CAPT-05, TIME-05
**Plans:** 2 plans

Plans:
- [x] 02-01-PLAN.md — Types, window-utils, and target classes (WindowTarget, RegionTarget, WindowRegionTarget)
- [x] 02-02-PLAN.md — Server wiring (list_windows tool, start_capture target factory) and scheduler skip logic

### Phase 3: Diagnostic Features
**Goal**: Agents get richer visual diagnostics that highlight what changed between frames and eliminate redundant information
**Depends on**: Phase 2
**Requirements**: DIAG-01, DIAG-02, DIAG-03
**Plans:** 2 plans

Plans:
- [x] 03-01-PLAN.md — Processing modules: pixel comparison, delta highlighter, idle compressor, GIF exporter
- [x] 03-02-PLAN.md — Server wiring: type extensions, grid pipeline integration, GIF resource endpoint
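
As an illustration of the pixel comparison and idle frame compression named in the plans above, a minimal sketch follows. The `Frame` shape, the function names, and the `minChange` threshold are assumptions for this example, not the project's actual modules.

```typescript
// Hypothetical frame shape: raw RGBA bytes plus a capture timestamp.
interface Frame {
  timestampMs: number;
  rgba: Uint8Array; // width * height * 4 bytes
}

// Fraction of pixels whose R, G, or B channel differs by more than `tolerance`.
function changeRatio(a: Uint8Array, b: Uint8Array, tolerance = 0): number {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error("frames must be same-sized RGBA buffers");
  }
  const pixelCount = a.length / 4;
  if (pixelCount === 0) return 0;
  let changed = 0;
  for (let i = 0; i < a.length; i += 4) {
    if (
      Math.abs(a[i] - b[i]) > tolerance ||
      Math.abs(a[i + 1] - b[i + 1]) > tolerance ||
      Math.abs(a[i + 2] - b[i + 2]) > tolerance
    ) {
      changed++;
    }
  }
  return changed / pixelCount;
}

// Keep the first frame, then only frames that differ enough from the last kept frame.
// The default threshold (0.1% of pixels) is an assumed value.
function compressIdleFrames(frames: Frame[], minChange = 0.001): Frame[] {
  const kept: Frame[] = [];
  for (const frame of frames) {
    const last = kept[kept.length - 1];
    if (last === undefined || changeRatio(last.rgba, frame.rgba) >= minChange) {
      kept.push(frame);
    }
  }
  return kept;
}
```

Comparing against the last kept frame, rather than the immediately preceding one, keeps slow cumulative drift from being discarded as idle.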

### Phase 4: Distribution
**Goal**: The tool is installable from the cplugs marketplace with zero manual configuration
**Depends on**: Phase 3
**Requirements**: DIST-01, DIST-02, DIST-03
**Plans:** 2 plans

Plans:
- [x] 04-01-PLAN.md — Plugin manifest files, tsup ESM config, build verification
- [x] 04-02-PLAN.md — Publish to cplugs marketplace, verify installation and MCP registration

</details>

### ✅ v1.1 Native DWM Window Capture (Shipped 2026-04-13)

**Milestone Goal:** Replace monitor-crop window capture with native DWM-based per-window capture for flicker-free, occlusion-immune screenshots

- [x] **Phase 5: Native Addon and DWM Capture** - C++ NAPI addon with build toolchain, API validation, and single-frame window capture
- [x] **Phase 6: Integration and Hardening** - Wire DWM capture into existing targets with automatic fallback and production robustness
- [x] **Phase 7: Native Distribution** - Prebuilt binaries and updated cplugs marketplace plugin

## Phase Details

### Phase 5: Native Addon and DWM Capture
**Goal**: A loadable native addon can capture any window by HWND and return a correct PNG buffer, using the empirically validated best DWM API
**Depends on**: Phase 4 (v1.0 complete)
**Requirements**: DWM-01, DWM-02, DWM-03, DWM-04, DWM-05
**Success Criteria** (what must be TRUE):
  1. Running `node -e "require('./native/build/...')"` loads the addon without errors and reports DWM capture availability
  2. Calling captureWindow(hwnd) on a GDI app (Notepad) returns a PNG buffer that decodes to a valid image matching the window content
  3. Calling captureWindow(hwnd) on a window partially covered by another window produces an image showing only the target window content (no overlapping window bleed)
  4. Capturing a window does not cause visible flicker or WM_PRINT-style rendering disruption in the target application
  5. The API choice (DwmGetDxSharedSurface vs WGC) is resolved via empirical testing and documented
**Plans:** 3 plans

Plans:
- [x] 05-01-PLAN.md — Build scaffold: cmake-js, D3D11 device singleton, PNG encoder, loadable .node binary
- [x] 05-02-PLAN.md — WGC capture: CaptureWorker with Windows.Graphics.Capture pipeline
- [x] 05-03-PLAN.md — TypeScript wrapper and empirical WGC validation (GDI, occlusion, flicker tests)

### Phase 6: Integration and Hardening
**Goal**: WindowTarget and WindowRegionTarget transparently use DWM capture when available, with graceful fallback and crash resilience
**Depends on**: Phase 5
**Requirements**: DWM-06, DWM-07, DWM-08, DWM-09, DWM-10, DWM-11, DWM-12
**Success Criteria** (what must be TRUE):
  1. An agent using start_capture with a window target gets DWM-quality captures without changing any tool parameters (transparent upgrade)
  2. If the native addon is missing or fails to load, window capture falls back to monitor-crop and the agent sees no error
  3. Capturing a minimized, cloaked, or destroyed window returns a structured error (not a crash) and the MCP server continues running
  4. Running 100+ consecutive captures shows no GDI handle leak (handle count stable within +/- 5)
**Plans:** 2 plans

Plans:
- [x] 06-01-PLAN.md — DWM-first capture integration (captureWindowBest helper, target rewiring)
- [x] 06-02-PLAN.md — GDI handle leak test (native addon test helper, 120-capture validation)
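
The DWM-first fallback in success criteria 1 and 2 could be sketched roughly as below. The roadmap names `captureWindowBest` but not its signature, so the parameters, the `CaptureFn` type, and the `method` field are assumptions. The sketch is synchronous for brevity; a real capture call would likely be asynchronous.

```typescript
// Hypothetical capture signature: takes a window handle, returns PNG bytes.
type CaptureFn = (hwnd: number) => Uint8Array;

interface CaptureResult {
  png: Uint8Array;
  method: "dwm" | "monitor-crop"; // assumed field, useful for logging
}

// Try the native addon first; fall back to monitor-crop if it is missing or throws.
// `nativeCapture` is null when the addon failed to load.
function captureWindowBest(
  hwnd: number,
  nativeCapture: CaptureFn | null,
  monitorCropCapture: CaptureFn,
): CaptureResult {
  if (nativeCapture !== null) {
    try {
      return { png: nativeCapture(hwnd), method: "dwm" };
    } catch {
      // Swallow the native failure so the caller sees no error (criterion 2).
    }
  }
  return { png: monitorCropCapture(hwnd), method: "monitor-crop" };
}
```

This sketch falls back on any native failure. Criterion 3 (structured errors for minimized, cloaked, or destroyed windows) would need the native error inspected before deciding whether to fall back.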

### Phase 7: Native Distribution
**Goal**: Users install the plugin with prebuilt native binaries -- no build tools required
**Depends on**: Phase 6
**Requirements**: DWM-13, DWM-14
**Success Criteria** (what must be TRUE):
  1. `npm install` on a clean Windows x64 machine (no Visual Studio, no CMake) downloads and installs the prebuilt .node binary without errors
  2. The updated cplugs marketplace plugin includes the native addon and captures windows using DWM after installation
**Plans:** 2 plans

Plans:
- [x] 07-01-PLAN.md — Prebuild infrastructure: node-gyp-build loader, prebuilds/ directory, package.json updates
- [x] 07-02-PLAN.md — Rebuild and republish cplugs marketplace plugin with native addon

## Progress

**Execution Order:**
Phases execute in numeric order: 5 -> 6 -> 7

| Phase | Milestone | Plans Complete | Status | Completed |
|-------|-----------|----------------|--------|-----------|
| 1. Core Capture Pipeline | v1.0 | 3/3 | Complete | 2026-04-12 |
| 2. Window and Region Targeting | v1.0 | 2/2 | Complete | 2026-04-12 |
| 3. Diagnostic Features | v1.0 | 2/2 | Complete | 2026-04-12 |
| 4. Distribution | v1.0 | 2/2 | Complete | 2026-04-12 |
| 5. Native Addon and DWM Capture | v1.1 | 3/3 | Complete | 2026-04-13 |
| 6. Integration and Hardening | v1.1 | 2/2 | Complete | 2026-04-13 |
| 7. Native Distribution | v1.1 | 2/2 | Complete | 2026-04-13 |

### ✅ v1.2 Capture Profiles & Frame Repository (Shipped 2026-04-13)

**Milestone Goal:** Enable agents to define reusable capture presets and run long capture sessions with on-demand grid compilation

- [x] **Phase 8: Screenshot Profiles** - Named capture area presets reusable by name
- [x] **Phase 9: Timing Profiles** - Named timing parameter presets reusable by name
- [x] **Phase 10: Profile-Aware Capture Tool** - start_capture accepts profile names instead of raw params (delivered in Phases 8-9)
- [x] **Phase 11: Frame Repository and Subset Grid Compilation** - Long captures with on-demand subset grid compilation (completed 2026-04-13)

## Phase Details (v1.2)

### Phase 8: Screenshot Profiles
**Goal**: Per-project named screenshot profiles storing target parameters (x/y/width/height, desktop-relative vs window-relative, window name). Agents define a capture area once and reference it by name in future start_capture calls. New MCP tools: save_screenshot_profile, list_screenshot_profiles, delete_screenshot_profile. Profiles persisted to a project-local JSON file. start_capture extended with screenshot_profile param for profile-based captures with merge semantics.
**Depends on**: Phase 7 (v1.1 complete)
**Requirements**: PROF-01, PROF-02, PROF-03, PROF-04, PROF-05, PROF-06
**Plans:** 2 plans

Plans:
- [x] 08-01-PLAN.md — Profile types, ProfileManager with file persistence, three CRUD MCP tools
- [x] 08-02-PLAN.md — Profile resolver with merge semantics, wire screenshot_profile into start_capture
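
The roadmap does not spell out the merge semantics. One plausible reading, sketched below, is that explicit `start_capture` parameters override the stored profile and omitted parameters fall back to it. The `ScreenshotProfile` fields and the `mergeWithProfile` helper are hypothetical names for illustration.

```typescript
// Hypothetical profile shape based on the parameters listed in the Phase 8 goal.
interface ScreenshotProfile {
  x: number;
  y: number;
  width: number;
  height: number;
  relativeTo: "desktop" | "window";
  windowName?: string;
}

// Assumed rule: explicitly provided values win; undefined values keep the profile's value.
function mergeWithProfile<T extends object>(profile: T, explicit: Partial<T>): T {
  const merged: T = { ...profile };
  for (const key of Object.keys(explicit) as Array<keyof T>) {
    const value = explicit[key];
    if (value !== undefined) {
      merged[key] = value as T[keyof T];
    }
  }
  return merged;
}

const sidebar: ScreenshotProfile = {
  x: 0,
  y: 0,
  width: 800,
  height: 600,
  relativeTo: "window",
  windowName: "Editor",
};

// Override only the width; everything else comes from the stored profile.
const resolved = mergeWithProfile(sidebar, { width: 400, height: undefined });
```

Skipping `undefined` values matters because optional tool parameters often arrive as `undefined` rather than being absent, and a plain object spread would overwrite the profile's value with `undefined`.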

### Phase 9: Timing Profiles
**Goal**: Per-project named timing profiles with built-in presets, diagnostic flags, and parameter summaries. Three new MCP tools (save/list/delete) plus timing_profile param on start_capture with merge semantics.
**Depends on**: Phase 8 (uses ProfileManager, shared JSON file, slugify, resolver pattern)
**Requirements**: D-01, D-02, D-03, D-04, D-05, D-06, D-07, D-08, D-09, D-10, D-11, D-12, D-13, D-14, D-15, D-16, D-17
**Plans:** 2 plans

Plans:
- [x] 09-01-PLAN.md — TimingProfile type, ProfileManager CRUD extension, built-in presets, parameter summary generator
- [x] 09-02-PLAN.md — Three timing profile MCP tools, resolveTimingProfile, wire timing_profile into start_capture

### Phase 10: Profile-Aware Capture Tool
**Goal**: Update start_capture to accept optional screenshot_profile and timing_profile name parameters. When provided, the tool resolves the named profiles and merges their parameters into the capture config. Allows agents to say "capture using my 'sidebar' screenshot profile with 'fast-scan' timing" instead of specifying all parameters every time.
**Depends on**: Phase 8, Phase 9
**Requirements**: TBD
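**Plans:** 0 plans (scope delivered in Phases 8-9)

Plans:
- [x] No separate plans; profile parameters were wired into start_capture by 08-02-PLAN.md and 09-02-PLAN.md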

### Phase 11: Frame Repository and Subset Grid Compilation
**Goal**: Decouple capture from grid compilation. Long-running capture sessions persist frames to disk in a frame repository. Agents query, filter, and select subsets of stored frames to compile into grid images on demand. New MCP tools: start_repository_capture, list_repository_sessions, list_repository_frames, compile_subset_grid, delete_repository_session. Repository is session-scoped with 24h retention-based cleanup.
**Depends on**: Phase 9 (v1.2 profiles complete)
**Requirements**: D-01, D-02, D-03, D-04, D-05, D-06, D-07, D-08, D-09, D-10, D-11, D-12, D-13, D-14, D-15, D-16, D-17, D-18, D-19, D-20, D-21, D-22, D-23, D-24, D-25
**Plans:** 3/3 plans complete

Plans:
- [x] 11-01-PLAN.md — Repository types, Zod schemas, RepositoryManager with disk persistence, frame-query module
- [x] 11-02-PLAN.md — Repository scheduler, start_repository_capture, list/delete session tools, get_capture_status extension
- [x] 11-03-PLAN.md — list_repository_frames and compile_subset_grid tools with change metrics and label overlay
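
To make the subset selection and retention ideas concrete, here is a minimal sketch under stated assumptions. The `StoredFrame` and `RepositorySession` shapes, the even-spacing strategy, and measuring retention from the last captured frame are illustrative choices, not the documented behavior of `compile_subset_grid` or the cleanup job.

```typescript
// Hypothetical metadata shapes for frames persisted to disk.
interface StoredFrame {
  index: number;
  timestampMs: number;
}

interface RepositorySession {
  id: string;
  lastFrameAtMs: number;
  frames: StoredFrame[];
}

const RETENTION_MS = 24 * 60 * 60 * 1000; // 24h retention from the Phase 11 goal

// Filter frames to a time range, then thin them to at most `maxFrames`,
// evenly spaced and always including the first and last frame in range.
function selectFrames(
  frames: StoredFrame[],
  fromMs: number,
  toMs: number,
  maxFrames: number,
): StoredFrame[] {
  const inRange = frames.filter(
    (f) => f.timestampMs >= fromMs && f.timestampMs <= toMs,
  );
  if (maxFrames <= 0) return [];
  if (inRange.length <= maxFrames) return inRange;
  if (maxFrames === 1) return [inRange[0]];
  const step = (inRange.length - 1) / (maxFrames - 1);
  const picked: StoredFrame[] = [];
  for (let i = 0; i < maxFrames; i++) {
    picked.push(inRange[Math.round(i * step)]);
  }
  return picked;
}

// Assumption: a session expires 24h after its last captured frame.
function isExpired(session: RepositorySession, nowMs: number): boolean {
  return nowMs - session.lastFrameAtMs > RETENTION_MS;
}
```

Selecting frame metadata first and compiling the grid afterwards is what lets a long session stay cheap: only the chosen subset needs to be decoded.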

## Progress (v1.2)

**Execution Order:**
Phase 8 -> Phase 9 (depends on Phase 8's ProfileManager) -> Phase 10 (delivered in Phases 8-9) -> Phase 11

| Phase | Milestone | Plans Complete | Status | Completed |
|-------|-----------|----------------|--------|-----------|
| 8. Screenshot Profiles | v1.2 | 2/2 | Complete | 2026-04-13 |
| 9. Timing Profiles | v1.2 | 2/2 | Complete | 2026-04-13 |
| 10. Profile-Aware Capture Tool | v1.2 | 0/0 | Complete (delivered in 8-9) | 2026-04-13 |
| 11. Frame Repository & Subset Grid | v1.2 | 3/3 | Complete | 2026-04-13 |

## Backlog

### Phase 999.1: Floating desktop window for agent-captured images (BACKLOG)

**Goal:** [Captured for future planning]
**Requirements:** TBD
**Plans:** 0 plans

Optional feature: show images the agent captures or compiles in a small floating desktop window that fades in and out. The agent invokes it when the user says phrases like "show me as you go" or "surface along the way". The window defaults to the top-right corner; the agent can reposition it.

Plans:
- [ ] TBD (promote with /gsd-review-backlog when ready)

### Phase 999.2: VDO.Ninja video stream as capture source (BACKLOG)

**Goal:** [Captured for future planning]
**Requirements:** TBD
**Plans:** 0 plans

New optional capture **source**: VDO.Ninja video streams, usable with all screen-timelapse features in place of the screen-capture source. The agent asks the user for their VDO.Ninja code (e.g. `JYMW97gq`), and the tools pull frames from `https://vdo.ninja/?view=JYMW97gq`.

Plans:
- [ ] TBD (promote with /gsd-review-backlog when ready)

### Phase 999.3: Return image filepath directly in tool output (BACKLOG)

**Goal:** [Captured for future planning]
**Requirements:** TBD
**Plans:** 0 plans

In the output data (text/JSON) of relevant tools, directly return the file path of the image resource the agent would inspect, rather than requiring a `readMcpResource` call. This saves time and tokens.

Plans:
- [ ] TBD (promote with /gsd-review-backlog when ready)

