## Substack Creator Newsletter Engine

A tool for brand-driven Substack content creation. Designed for a company like P&G — but not specific to any company, and not dependent on any nonpublic information other than what the user shares at runtime.

---

### Technology

Pure React frontend (no separate backend). Private repo on GitHub.

Deployment readiness for **GitHub -> Railway** is a required deliverable, but it is validated by repository/code review only (configuration, commands, and docs are present and coherent).
Live deployment is explicitly prohibited during build, test, and acceptance verification for this project.

- **Gemini 3.1 Flash** for small fast things.
- **Gemini 3.1 Pro with extended thinking** for anything important.
- **Gemini 2.5 Flash Lite** for tests (replacing both of the above).
- All models are accessed client-side using the user's API key, which is entered during configuration and persisted in IndexedDB.
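
The model assignment above can be sketched as a single lookup. This is a minimal illustration, not the project's actual code: the `RunMode` and `Tier` names and the `chooseModel` function are hypothetical, and the strings are the display names used in this spec rather than real API model identifiers.

```typescript
// Hypothetical names throughout; the strings are spec labels, not API model IDs.
type Tier = "fast" | "important";
type RunMode = "production" | "manual" | "smoke";

interface ModelChoice {
  model: string;
  extendedThinking: boolean;
}

function chooseModel(mode: RunMode, tier: Tier): ModelChoice {
  // Smoke tests replace both production models with Flash Lite.
  if (mode === "smoke") {
    return { model: "Gemini 2.5 Flash Lite", extendedThinking: false };
  }
  // Production and manual test runs share the real configuration.
  return tier === "fast"
    ? { model: "Gemini 3.1 Flash", extendedThinking: false }
    : { model: "Gemini 3.1 Pro", extendedThinking: true };
}
```

Keeping the choice in one function means a smoke run only has to change the mode, not every call site.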

**IndexedDB** (via the `idb` library) is used for all local persistence: configuration, drafts, cached demo sessions, and post history.
All persisted data is retained indefinitely unless the user deletes it.

No RAG. Input prompts are simply capped at generous limits.

All LLM replies use **structured output mode** (JSON replies constrained to a schema). If a reply fails validation, the call is retried with backoff between attempts, feeding back the rejected reply and the reason for failure where appropriate. This is especially important for the fast test mode using Gemini 2.5 Flash Lite, where nondeterminism would otherwise cause crashes.
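
The retry behavior described above might look roughly like the following. It is a sketch under stated assumptions: `callModel` and `validate` are hypothetical injected functions (the real Gemini client and schema validator are not specified here), the feedback wording is illustrative, and the backoff is plain capped exponential with no jitter.

```typescript
// Hypothetical helper types; the real client and validator are not specified.
type Validation<T> = { ok: true; value: T } | { ok: false; reason: string };

interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

// Capped exponential backoff: base, 2x base, 4x base, ... up to maxDelayMs.
function backoffDelayMs(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * Math.pow(2, attempt));
}

async function callWithRetry<T>(
  prompt: string,
  callModel: (prompt: string) => Promise<string>,
  validate: (raw: string) => Validation<T>,
  opts: RetryOptions,
): Promise<T> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let nextPrompt = prompt;
  let lastReason = "no attempts made";
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    const raw = await callModel(nextPrompt);
    const result = validate(raw);
    if (result.ok) return result.value;
    lastReason = result.reason;
    // Feed the rejected reply and the failure reason back to the model.
    nextPrompt =
      `${prompt}\n\nYour previous reply was rejected.\n` +
      `Reply: ${raw}\nReason: ${result.reason}\nReturn corrected JSON only.`;
    if (attempt < opts.maxAttempts - 1) {
      await sleep(backoffDelayMs(attempt, opts.baseDelayMs, opts.maxDelayMs));
    }
  }
  throw new Error(`Structured output failed after ${opts.maxAttempts} attempts: ${lastReason}`);
}
```

This sketch only retries validation failures; network errors thrown by `callModel` would propagate to the caller unless they are handled separately.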

---

### Testing

**Comprehensive integration tests** that run everything end-to-end against "canned" data that is generated by LLMs and hardcoded.

**Smoke tests** that run only after integration tests pass. These use live LLM calls but substitute **Gemini 2.5 Flash Lite** for all models, for speed and cost. They do not depend on strict output adherence (the structured retry logic handles nondeterminism).

**Manual test option** to use the real LLM configuration (3.1 Flash + 3.1 Pro) for everything.

This test matrix is critical because the agentic development system needs a tight iterative loop for QA.

#### Browser Verification Contract

- Integration browser tests must run with canned data and no external network dependency.
- Smoke browser tests must run only after integration tests pass and use live Gemini 2.5 Flash Lite in place of production models.
- Manual browser tests may use the full production model mix (Gemini 3.1 Flash + Gemini 3.1 Pro).
- Browser verification is evidence-based, not exit-code-only. Every browser scenario must emit:
  - a manifest entry in `.ai/test-evidence/latest/manifest.json`
  - required UI artifacts (screenshots and relevant trace/log evidence)
  - a console/unhandled-error summary for the scenario
- On failure, browser tests must still write best-effort evidence artifacts plus a failing manifest entry.
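
One possible shape for a manifest entry is sketched below. The field names and the `buildManifestEntry` helper are assumptions for illustration; this spec fixes only the manifest path and the kinds of evidence, not the schema.

```typescript
// Hypothetical schema; only the evidence categories come from the contract above.
type TestMode = "integration" | "smoke" | "manual";

interface ScenarioEvidence {
  screenshots: string[];     // artifact paths
  traces: string[];          // trace/log artifact paths
  consoleErrors: string[];   // console error summary
  unhandledErrors: string[]; // uncaught exception summary
}

interface ManifestEntry extends ScenarioEvidence {
  scenario: string;
  mode: TestMode;
  status: "passed" | "failed";
  failureReason: string | null;
}

// Accepts partial evidence so a failing scenario can still record what it captured.
function buildManifestEntry(
  scenario: string,
  mode: TestMode,
  evidence: Partial<ScenarioEvidence>,
  failureReason: string | null = null,
): ManifestEntry {
  return {
    scenario,
    mode,
    status: failureReason === null ? "passed" : "failed",
    failureReason,
    screenshots: evidence.screenshots ?? [],
    traces: evidence.traces ?? [],
    consoleErrors: evidence.consoleErrors ?? [],
    unhandledErrors: evidence.unhandledErrors ?? [],
  };
}
```

Because missing evidence defaults to empty arrays, a scenario that crashes early can still emit a well-formed failing entry.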

#### Testability Requirements

- Critical UI actions/states must expose stable `data-testid` selectors to keep tests resilient to cosmetic copy/styling changes.
- Browser test runs must be deterministic where practical: fixed viewport profile, fixed timezone/locale, and seeded/canned fixtures for integration mode.
- Test harness must capture browser console errors, uncaught exceptions, and failed network requests as artifacts.
- Tests must be able to reset to first-run state by clearing IndexedDB between scenarios.

#### v01 E2E Critical Paths

1. First-run setup routing and completion flow (Settings -> Dashboard).
2. Full New Post flow (Topic -> Research -> Outline -> Write/Edit/Guardrails -> Complete) with citation lineage persistence.
3. Demo replay flow (bundled P&G session), including cache-miss error behavior.
4. Reset Everything flow, with complete IndexedDB wipe verification.

---

### Demo Mode and Production Mode

**In production mode**, every session (configuration + post creation) is recorded: all inputs, all LLM responses, all intermediate state. Sessions are stored in IndexedDB.

**In demo mode**, you choose from all recorded sessions. When you do, the full experience replays using the production code path — text fields prefill instantly with a subtle fade-in, attachments appear a moment after the page loads, and the button that was clicked next becomes highlighted. The user clicks through each step, seeing exactly what a real session looks like.
If a demo replay has a cache miss, demo mode shows an error and does not fall back to live API calls.
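
The record-then-replay idea, including the no-fallback rule on a cache miss, can be illustrated as follows. This is a simplified sketch: `LlmCall`, `recordingCall`, `replayCall`, and `CacheMissError` are hypothetical names, and keying the cache by the raw request string is an assumption (a real implementation would likely key on a hash of the full request and store sessions in IndexedDB).

```typescript
// Hypothetical names; an in-memory Map stands in for an IndexedDB-backed session.
class CacheMissError extends Error {
  readonly key: string;
  constructor(key: string) {
    super(`Demo replay cache miss for request: ${key}`);
    this.name = "CacheMissError";
    this.key = key;
  }
}

type LlmCall = (request: string) => Promise<string>;

// Production mode: call the live model and record the exchange.
function recordingCall(live: LlmCall, session: Map<string, string>): LlmCall {
  return async (request: string) => {
    const response = await live(request);
    session.set(request, response);
    return response;
  };
}

// Demo mode: serve recorded responses only; never fall back to a live call.
function replayCall(session: ReadonlyMap<string, string>): LlmCall {
  return async (request: string) => {
    const cached = session.get(request);
    if (cached === undefined) throw new CacheMissError(request);
    return cached;
  };
}
```

Since both wrappers share the `LlmCall` signature, the rest of the app can run the same code path in either mode and only the injected function differs.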

The codebase ships with **one session of a P&G post**, so demo mode works out of the box. The bundled session also covers the first-run configuration, replayed from the cached configuration. This initial session is best-effort quality and can later be replaced with a recorded (cached) run. The easiest way to build a great demo library is to record several live sessions and keep the best ones.

---

### Setup Flow

Setup is centered on a **Settings** page. It runs on first run and can also be opened later from a dashboard option.
On app start, a first run opens Settings; any later run goes directly to the dashboard.

Each setup option is parallel, not a forced series. Users can complete items in any order.
Each setup option shows an icon when it is empty so users can see what still needs to be completed.
v01 supports one company workspace only.

Each input step uses a **rich input control** — a reusable component that accepts any combination of typed/pasted text, uploaded documents, and links. This same component is used everywhere the user provides information to the system.

1. **API key setup** — collect the user's API key and store it in IndexedDB.
2. **Identify your company** — via the rich input control, then **Confirm company** via **Gemini 3.1 Pro**, with a back button to return.
3. **Define voice** — via the rich input control, then **Confirm voice** via **Gemini 3.1 Pro**, with a back button to return.
4. **Define guardrails** — via the rich input control, then **Confirm guardrails** via **Gemini 3.1 Pro**, with a back button to return.
5. **Proceed to dashboard** once required setup items are complete.
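
The setup state behind these steps could be modeled along the following lines. Everything here is a hypothetical sketch: the type names are invented, all four items are assumed to be required, and "first run" is assumed to mean that no setup has been saved or a required item is still empty. The confirmation step via the model is not represented.

```typescript
// Hypothetical shapes; the spec does not define these types.
interface RichInputValue {
  text: string;
  documents: string[]; // names of uploaded files
  links: string[];
}

interface SetupState {
  apiKey: string;
  company: RichInputValue;
  voice: RichInputValue;
  guardrails: RichInputValue;
}

type SetupItem = "apiKey" | "company" | "voice" | "guardrails";

function isRichInputEmpty(value: RichInputValue): boolean {
  return value.text.trim() === "" && value.documents.length === 0 && value.links.length === 0;
}

// Items that would still show the "empty" icon on the Settings page.
function emptySetupItems(state: SetupState): SetupItem[] {
  const empty: SetupItem[] = [];
  if (state.apiKey.trim() === "") empty.push("apiKey");
  if (isRichInputEmpty(state.company)) empty.push("company");
  if (isRichInputEmpty(state.voice)) empty.push("voice");
  if (isRichInputEmpty(state.guardrails)) empty.push("guardrails");
  return empty;
}

// Assumption: "first run" means no saved setup, or a required item still empty.
function startRoute(saved: SetupState | null): "settings" | "dashboard" {
  if (saved === null) return "settings";
  return emptySetupItems(saved).length === 0 ? "dashboard" : "settings";
}
```

Deriving the icons and the start route from the same function keeps the "what is still missing" logic in one place, regardless of the order in which the user fills items in.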

The Settings page also contains a less visually obvious **"Reset everything"** option. It requires confirmation and deletes all local data.
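
A reset along these lines could also serve the test harness when it needs to return to first-run state. The sketch below is hedged: the store names are hypothetical, and `ClearableDb` is a minimal interface that the database handle from the `idb` library is assumed to satisfy through its `clear(storeName)` shortcut.

```typescript
// Minimal interface; assumed to match the clear() shortcut on an idb database handle.
interface ClearableDb {
  clear(storeName: string): Promise<void>;
}

// Hypothetical store names mirroring the persisted data listed in this spec.
const STORE_NAMES = ["configuration", "drafts", "demoSessions", "postHistory"] as const;

// Returns true when data was wiped, false when the user did not confirm.
async function resetEverything(db: ClearableDb, confirmed: boolean): Promise<boolean> {
  if (!confirmed) return false;
  for (const name of STORE_NAMES) {
    await db.clear(name);
  }
  return true;
}
```

Depending on a narrow interface rather than the full database type lets integration tests pass in a fake and verify that every store was cleared.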

---

### Dashboard

- Prominent **"New Post"** button.
- Prominent **"Trending Topics"** button.
- Dashboard option to open the setup Settings page.
- Past posts and drafts, with support to view prior posts and resume drafts.

---

### Trending Topics

As soon as the page opens, the system autonomously launches a set of **Gemini 3.1 Flash calls with search grounding enabled**. Each call is a targeted research query (e.g., "Research recent trends in [industry], summarize, include quotes with sources"). As results return, the system deterministically builds a **visualization of topics and trends** and displays it. In demo mode, the API results are cached, so everything loads after a brief animation of a few seconds.

Then **Gemini 3.1 Pro** synthesizes the sources and presents **3 prompts** for what to write about. Selecting one jumps to "New Post" prefilled with that topic.
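
"Deterministically" here is taken to mean that the same set of results always yields the same visualization data, regardless of the order in which calls return. A minimal sketch of that idea follows; the `ResearchResult` and `TopicBar` shapes, the lowercase normalization, and the choice of a ranked bar layout are all assumptions.

```typescript
// Hypothetical shapes; the structured reply schema is not defined in this spec.
interface ResearchResult {
  query: string;
  topics: string[];
}

interface TopicBar {
  topic: string;
  mentions: number;
  weight: number; // mentions relative to the most-mentioned topic, in (0, 1]
}

function buildTopicBars(results: ResearchResult[]): TopicBar[] {
  const counts = new Map<string, number>();
  for (const result of results) {
    for (const raw of result.topics) {
      const key = raw.trim().toLowerCase();
      if (key === "") continue;
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  const entries = Array.from(counts.entries());
  const max = entries.reduce((m, [, n]) => Math.max(m, n), 0);
  return entries
    .map(([topic, mentions]) => ({ topic, mentions, weight: mentions / max }))
    // Sort by count, then by name, so arrival order never affects the output.
    .sort((a, b) => b.mentions - a.mentions || (a.topic < b.topic ? -1 : a.topic > b.topic ? 1 : 0));
}
```

The function can be re-run on the growing result list each time a call returns, so the display updates incrementally while the final state stays reproducible.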

---

### New Post

#### 1. Topic

Starts with the **rich input control** — the same reusable component. Then a button for "Research," the next step.

#### 2. Research

Uses **Gemini 3.1 Flash with search grounding** to find and retrieve sources. As each source comes back, the user sees a headline and snippet (rapid, and cached in demo mode). Each source carries structured metadata — URL, title, author, publication date — which persists through the pipeline for attribution.

The user can **highlight** interesting sources or **delete** irrelevant/unwanted ones. Then next.
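
The source record and the two curation actions might be modeled as below. The field and action names are hypothetical, and treating highlight as a toggle is an assumption; only the metadata fields (URL, title, author, publication date) come from this spec.

```typescript
// Hypothetical shape; URL, title, author, and publication date come from the spec.
interface Source {
  id: string;
  url: string;
  title: string;
  author: string | null;
  publishedAt: string | null; // ISO date string when known
  snippet: string;
  highlighted: boolean;
}

type SourceAction =
  | { type: "toggleHighlight"; id: string }
  | { type: "delete"; id: string };

// Pure reducer: returns a new list and leaves the input untouched.
function curateSources(sources: Source[], action: SourceAction): Source[] {
  switch (action.type) {
    case "toggleHighlight":
      return sources.map((s) => (s.id === action.id ? { ...s, highlighted: !s.highlighted } : s));
    case "delete":
      return sources.filter((s) => s.id !== action.id);
  }
}
```

A reducer of this shape works directly with React's `useReducer`, and each action is easy to record in a session for demo replay.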

#### 3. Outline

Draft an outline of the post using **Gemini 3.1 Pro**. This is **one-shot** from all materials gathered so far. The user can accept or go back.

#### 4. Write

A **3-cycle process** using **Gemini 3.1 Pro** that the user sees visually. In v01, the three cycles run fully automatically, with no manual pause/edit/rerun controls between steps:

1. **Write** — attempts to one-shot the article without the guardrails, incorporating inline citations that reference the research sources.
2. **Edit** — asks for revisions to better align with the style guide, and rewrites as needed to improve the draft holistically.
3. **Guardrails** — the only pass that receives the guardrails document (and nothing else besides the post). Assigned to fix any guardrails issues.

Each cycle is visible to the user as it progresses.
Citation behavior is best effort for credibility: source-derived claims should be cited, while unsourced generic or opinion statements are allowed.
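
The key property of the three cycles is which inputs each pass is allowed to see. The sketch below makes that explicit. It is illustrative only: `generate` is a hypothetical stand-in for the model call, the prompt wording is invented, and passing only the outline and sources to the first pass is an assumption (the spec states only that guardrails are withheld).

```typescript
// Hypothetical names; prompt text is illustrative, not the project's prompts.
interface WriteInputs {
  outline: string;
  sources: string;
  styleGuide: string;
  guardrails: string;
}

type Generate = (prompt: string) => Promise<string>;
type CycleName = "write" | "edit" | "guardrails";

async function runWriteCycles(
  inputs: WriteInputs,
  generate: Generate,
  onCycle: (name: CycleName, draft: string) => void,
): Promise<string> {
  // 1. Write: guardrails are deliberately withheld from this pass.
  const first = await generate(
    `Write the article with inline citations.\nOutline:\n${inputs.outline}\nSources:\n${inputs.sources}`,
  );
  onCycle("write", first);

  // 2. Edit: revise the draft against the style guide.
  const edited = await generate(
    `Revise the draft to match the style guide.\nStyle guide:\n${inputs.styleGuide}\nDraft:\n${first}`,
  );
  onCycle("edit", edited);

  // 3. Guardrails: receives only the guardrails document and the post.
  const final = await generate(
    `Fix any guardrails violations in the post.\nGuardrails:\n${inputs.guardrails}\nPost:\n${edited}`,
  );
  onCycle("guardrails", final);
  return final;
}
```

The `onCycle` callback is where the UI would update its visible progress; since v01 has no pause or rerun controls, the function simply runs straight through.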

#### 5. Complete

Shows the final formatted post with proper footnotes — each citation rendered with its source title and link. Saves the result as Markdown and displays it in a formatted Markdown viewer. Then return to the dashboard.
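
One way to turn inline citations into numbered footnotes is sketched below. It rests on two assumptions that this spec does not state: that the draft marks citations with a `[[cite:ID]]` token, and that the Markdown viewer supports the `[^n]` footnote extension. The function and type names are hypothetical.

```typescript
// Hypothetical marker format ([[cite:ID]]) and footnote syntax ([^n]).
interface CitedSource {
  id: string;
  title: string;
  url: string;
}

function renderFootnotes(body: string, sources: CitedSource[]): string {
  const byId = new Map<string, CitedSource>();
  for (const source of sources) byId.set(source.id, source);

  // Footnotes are numbered in order of first appearance; repeats reuse the number.
  const order: string[] = [];
  const text = body.replace(/\[\[cite:([\w-]+)\]\]/g, (match: string, id: string) => {
    if (!byId.has(id)) return match; // leave unknown markers visible for review
    let index = order.indexOf(id);
    if (index === -1) {
      order.push(id);
      index = order.length - 1;
    }
    return `[^${index + 1}]`;
  });

  const notes = order.map((id, i) => {
    const source = byId.get(id) as CitedSource;
    return `[^${i + 1}]: [${source.title}](${source.url})`;
  });
  return notes.length === 0 ? text : `${text}\n\n${notes.join("\n")}`;
}
```

Because only cited sources produce footnotes, sources the model never referenced do not appear in the final post.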

### Visual Design

- Use a visual look and feel (font family, colors, visual styling) similar to Substack's.
- Input surface: textarea with an attach and link toolbar at the bottom.
- Multi-step flows show horizontal dots as step indicators: accent = current, green = done, outlined = future.
- Cards as containers: content blocks, source results, confirmations, and previews all render in the same card primitive. Clean spacing, no shadows.
- Horizontal bars as progress. No spinners, no skeletons. Accent color fill animating left-to-right.
- Final output looks like Substack. The post preview is serif, with numbered footnotes and linked sources. If it doesn't look like a real Substack post, it's wrong.
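
The step-indicator rule (accent = current, green = done, outlined = future) reduces to a small mapping. A minimal sketch, with hypothetical names; the mapping from each state to an actual color or CSS class is left to the component.

```typescript
// Hypothetical helper; "done" renders green, "current" accent, "future" outlined.
type DotState = "done" | "current" | "future";

function stepDotStates(totalSteps: number, currentIndex: number): DotState[] {
  const states: DotState[] = [];
  for (let i = 0; i < totalSteps; i++) {
    states.push(i < currentIndex ? "done" : i === currentIndex ? "current" : "future");
  }
  return states;
}
```

For the five-step New Post flow, `stepDotStates(5, 2)` describes the Outline step: two done dots, one current dot, and two future dots.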

### UI Diagram (Graphviz)

See [substack-spec-v01-ui.gv](./substack-spec-v01-ui.gv).