# apps/kb — Knowledge Base (RAG)

A retrieval-augmented-generation service that ingests Bigscreen's internal documents, GitHub commits, and Discord conversations, embeds the chunks, stores them in pgvector, and answers questions using Anthropic Claude. It doubles as the brains behind the internal Discord bot.

**Location:** [apps/kb/](../../apps/kb).
**Package:** `bigscreen_kb` (yarn workspace).
**Depends on:** [`@bigscreen/lib`](../libraries/lib.md), [`@bigscreen/auth`](../libraries/auth.md), `@anthropic-ai/sdk`, `openai`, `pgvector`, `discord.js`, `@octokit/rest`.

## Folder Layout

```
apps/kb/src/
├── api/
│   ├── KBApi.ts                 # main export — consumed by apps/admin_api
│   └── handlers/
│       ├── query.ts             # semantic search + Claude response
│       ├── ingest.ts            # document / GitHub / Discord ingestion
│       └── admin.ts             # stats, docs, feedback, logs
├── db/client.ts                 # pgvector-aware Postgres client
├── embedding/                   # OpenAI embedding service + batching
├── generation/response.ts       # Claude API wrapper
├── ingestion/                   # chunkers, document processing
├── retrieval/                   # vector similarity search
├── sync/                        # periodic GitHub / Discord sync workers
├── bot/                         # Discord bot
└── types/                       # schemas
```
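The `ingestion/` folder holds the chunkers, and the pipeline diagram in the next section labels the chunker as recursive. Below is a minimal sketch of recursive splitting. It assumes character-based size limits, a separator order of paragraph, then line, then word, then raw character, and no overlap between chunks. `recursiveChunk` and its defaults are hypothetical and are not taken from `apps/kb/src/ingestion/`:

```typescript
// Separator hierarchy, coarsest first: paragraphs, lines, words, raw characters.
// Hypothetical defaults for illustration; the real chunker's settings may differ.
const DEFAULT_SEPARATORS = ["\n\n", "\n", " ", ""];

/**
 * Recursively split `text` into chunks of at most `maxChars` characters.
 * Small neighbouring pieces are packed together; only pieces that are still
 * too long are re-split with the next, finer separator.
 */
function recursiveChunk(
  text: string,
  maxChars = 1000,
  separators: string[] = DEFAULT_SEPARATORS,
): string[] {
  if (text.length <= maxChars) return text.trim() ? [text] : [];

  const [sep, ...finer] = separators;
  // No separators left (or the empty separator): hard-cut into fixed-size slices.
  if (sep === undefined || sep === "") {
    const out: string[] = [];
    for (let i = 0; i < text.length; i += maxChars) out.push(text.slice(i, i + maxChars));
    return out;
  }

  const chunks: string[] = [];
  let current = "";
  for (const piece of text.split(sep)) {
    const candidate = current ? current + sep + piece : piece;
    if (candidate.length <= maxChars) {
      current = candidate; // still fits: keep packing
      continue;
    }
    if (current) chunks.push(current);
    if (piece.length <= maxChars) {
      current = piece;
    } else {
      // This piece alone is too big: recurse with the finer separators.
      chunks.push(...recursiveChunk(piece, maxChars, finer));
      current = "";
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Falling back to finer separators only for oversized pieces keeps paragraphs whole whenever they fit, which tends to give each embedded chunk more coherent context than fixed-size slicing.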

## Ingestion & Query Pipeline

```mermaid
flowchart LR
    subgraph Sources
        MD[Markdown]
        PDF[PDFs]
        DOC[Word docs]
        GH[GitHub commits<br/>+ wiki]
        DC[Discord]
    end

    subgraph Ingest["Ingestion"]
        CHUNK["Chunker<br/>(recursive)"]
        EMBED["OpenAI<br/>embeddings"]
    end

    PGVEC[("pgvector<br/>on Postgres")]

    subgraph Query["Query path"]
        Q[User query]
        QEMBED[embed query]
        SEARCH[vector search<br/>+ BM25 fallback]
        CLAUDE[Claude<br/>Anthropic API]
        A[Answer]
    end

    MD --> CHUNK
    PDF --> CHUNK
    DOC --> CHUNK
    GH --> CHUNK
    DC --> CHUNK
    CHUNK --> EMBED
    EMBED --> PGVEC

    Q --> QEMBED
    QEMBED --> SEARCH
    PGVEC --> SEARCH
    SEARCH --> CLAUDE
    CLAUDE --> A
```
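The query path pairs vector search with a BM25 fallback, but the diagram does not say when the fallback applies. The sketch below assumes the fallback fires when no chunk clears a cosine-similarity threshold. It runs in memory purely for illustration, whereas in production the similarity ranking happens inside pgvector. `Chunk`, `hybridSearch`, `minSimilarity = 0.75`, and the BM25 parameters `k1 = 1.2`, `b = 0.75` are illustrative choices, not values read from `apps/kb/src/retrieval/`:

```typescript
interface Chunk {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

function tokenize(s: string): string[] {
  return s.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Okapi BM25 score of every chunk against the query.
function bm25(query: string, chunks: Chunk[], k1 = 1.2, b = 0.75): number[] {
  const docs = chunks.map((c) => tokenize(c.text));
  const n = docs.length;
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / Math.max(n, 1);
  // Document frequency: how many chunks contain each term.
  const docFreq = new Map<string, number>();
  for (const d of docs) new Set(d).forEach((t) => docFreq.set(t, (docFreq.get(t) ?? 0) + 1));
  const terms = tokenize(query).filter((t, i, all) => all.indexOf(t) === i); // unique query terms

  return docs.map((d) => {
    const tf = new Map<string, number>();
    for (const t of d) tf.set(t, (tf.get(t) ?? 0) + 1);
    let score = 0;
    for (const q of terms) {
      const f = tf.get(q) ?? 0;
      if (f === 0) continue;
      const df = docFreq.get(q) ?? 0;
      const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
      score += (idf * f * (k1 + 1)) / (f + k1 * (1 - b + (b * d.length) / avgLen));
    }
    return score;
  });
}

// Vector search first; fall back to BM25 only when nothing is similar enough.
function hybridSearch(
  query: string,
  queryEmbedding: number[],
  chunks: Chunk[],
  k = 3,
  minSimilarity = 0.75,
): { chunk: Chunk; score: number; via: "vector" | "bm25" }[] {
  const vectorHits = chunks
    .map((chunk) => ({ chunk, score: cosine(queryEmbedding, chunk.embedding), via: "vector" as const }))
    .filter((h) => h.score >= minSimilarity)
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
  if (vectorHits.length > 0) return vectorHits;

  const lexical = bm25(query, chunks);
  return chunks
    .map((chunk, i) => ({ chunk, score: lexical[i], via: "bm25" as const }))
    .filter((h) => h.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

In pgvector the vector half corresponds to `ORDER BY embedding <=> $1 LIMIT k`, where `<=>` is cosine distance (so lower means more similar). A threshold-triggered fallback is only one design; another is to always run both searches and merge the two rankings.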

## Integration with `apps/admin_api`

KB endpoints are wired into `apps/admin_api` by importing `KBApi` directly ([admin_api.ts:21](../../apps/admin_api/admin_api.ts)). KB therefore does not run as a separate HTTP server at the edge: it is served from inside `admin_api` and inherits the same auth model. Admin routes for KB are gated behind the `KBAdmin` access policy.
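A framework-free sketch of that arrangement: KB contributes routes to the host's route table, the host authenticates every request the same way, and admin routes additionally require `KBAdmin`. The `Caller`/`Route` shapes, `mountKB`, `dispatch`, the `/kb` prefix, and the route paths are all hypothetical stand-ins. The real policy check lives in `@bigscreen/auth`, whose API is not reproduced here:

```typescript
// Hypothetical shapes; the real admin_api and @bigscreen/auth types will differ.
interface Caller {
  id: string;
  policies: string[]; // access policies granted to the already-authenticated caller
}

type Handler = (caller: Caller, body: unknown) => unknown;

interface Route {
  path: string;
  handler: Handler;
  requiredPolicy?: string;
}

// KB registers its routes in the host server's table under a prefix.
// Admin routes carry the KBAdmin requirement; the query route needs only an authenticated caller.
function mountKB(table: Map<string, Route>, prefix = "/kb"): void {
  const kbRoutes: Route[] = [
    { path: "/query", handler: (_caller, body) => ({ answer: `stub answer for ${JSON.stringify(body)}` }) },
    { path: "/admin/stats", handler: () => ({ documents: 0 }), requiredPolicy: "KBAdmin" },
  ];
  for (const r of kbRoutes) table.set(prefix + r.path, r);
}

// Host-side dispatch: one policy gate applies to every mounted route, KB included.
function dispatch(
  table: Map<string, Route>,
  path: string,
  caller: Caller,
  body: unknown,
): { status: number; body: unknown } {
  const route = table.get(path);
  if (!route) return { status: 404, body: null };
  if (route.requiredPolicy && !caller.policies.includes(route.requiredPolicy)) {
    return { status: 403, body: null };
  }
  return { status: 200, body: route.handler(caller, body) };
}
```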

## Discord Bot

The bot uses `discord.js` to listen on configured channels and answers questions through the same query pipeline. It lives in [apps/kb/src/bot/](../../apps/kb/src/bot). Credentials are read from environment variables.
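Below is a sketch of the bot's message filter. It assumes the bot answers only non-bot messages in an allow-list of channel IDs and hands the text to the shared query pipeline. `makeMessageHandler`, `AnswerFn`, and the allow-list are hypothetical. `IncomingMessage` mirrors only the `discord.js` `Message` fields the handler touches (`channelId`, `content`, `author.bot`, `reply`), so the handler could be attached with `client.on(Events.MessageCreate, handler)` while remaining testable without a live client:

```typescript
// Structural subset of discord.js's Message that this handler uses.
interface IncomingMessage {
  channelId: string;
  content: string;
  author: { bot: boolean };
  reply(text: string): Promise<unknown>;
}

// Hypothetical hook into the KB query pipeline (embed query, search, ask Claude).
type AnswerFn = (question: string) => Promise<string>;

const DISCORD_MAX_LENGTH = 2000; // Discord's character limit for a regular message

function makeMessageHandler(allowedChannels: Set<string>, answerQuestion: AnswerFn) {
  // Resolves to true when the bot replied, false when the message was ignored.
  return async (message: IncomingMessage): Promise<boolean> => {
    // Ignore bots (including this one) and channels the bot is not configured for.
    if (message.author.bot || !allowedChannels.has(message.channelId)) return false;
    const question = message.content.trim();
    if (!question) return false;

    const answer = await answerQuestion(question);
    await message.reply(answer.slice(0, DISCORD_MAX_LENGTH));
    return true;
  };
}
```

In discord.js v14, reading `message.content` in guild channels generally requires the privileged `GatewayIntentBits.MessageContent` intent to be enabled on the client and in the Discord developer portal.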

## Sync Workers

Periodic jobs re-ingest sources:

- **GitHub** — pulls latest commits and wiki pages (`@octokit/rest`)
- **Discord** — snapshots public conversations

Sync cadence lives in [apps/kb/src/sync/](../../apps/kb/src/sync).
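One way to drive such a periodic job is sketched below, assuming a plain `setInterval` timer and a rule that a tick is skipped while the previous run is still in flight, so a slow GitHub or Discord pull never overlaps itself. `createSyncLoop`, `SyncJob`, and the interval are hypothetical; the actual cadence is whatever `apps/kb/src/sync/` configures:

```typescript
type SyncJob = () => Promise<void>;

// Runs `job` every `intervalMs`, skipping a tick if the previous run has not finished.
function createSyncLoop(name: string, job: SyncJob, intervalMs: number) {
  let running = false;
  let timer: ReturnType<typeof setInterval> | undefined;

  // Returns true if a run was started, false if skipped because one is in progress.
  const tick = (): boolean => {
    if (running) return false;
    running = true;
    job()
      .catch((err) => console.error(`[sync:${name}] run failed`, err))
      .finally(() => {
        running = false;
      });
    return true;
  };

  return {
    tick,
    start(): void {
      if (timer) return;
      tick(); // run once immediately, then on the interval
      timer = setInterval(tick, intervalMs);
    },
    stop(): void {
      if (timer) clearInterval(timer);
      timer = undefined;
    },
  };
}
```

Usage would look like `createSyncLoop("github", syncGithub, 15 * 60 * 1000).start()`, where `syncGithub` is a hypothetical async function wrapping the `@octokit/rest` calls.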

## Tests

Tests sit in [tests/kb/](../../tests/kb) and cover ingestion basics, search + query, admin endpoints, and GitHub-wiki diagnostics. Run with `ts-mocha` as usual (see [testing.md](../testing.md)).

## Further reading

- External API setup (Anthropic, OpenAI) → [external-services.md](../external-services.md)
- Access-policy model → [libraries/auth.md](../libraries/auth.md)
