# Building a RAG Knowledge Base API in TypeScript

A comprehensive guide for building a Retrieval-Augmented Generation (RAG) system that indexes company documents and Discord messages, queryable via natural language through a TypeScript API.

---

## Table of Contents

1. [Architecture Overview](#architecture-overview)
2. [Technology Stack](#technology-stack)
3. [Project Structure](#project-structure)
4. [Database Schema](#database-schema)
5. [Document Ingestion Pipeline](#document-ingestion-pipeline)
6. [Discord Ingestion](#discord-ingestion)
7. [Embedding Service](#embedding-service)
8. [Vector Search & Retrieval](#vector-search--retrieval)
9. [Query Enhancement](#query-enhancement)
10. [Hybrid Search](#hybrid-search)
11. [Re-ranking](#re-ranking)
12. [API Design](#api-design)
13. [Incremental Sync & Updates](#incremental-sync--updates)
14. [Access Control](#access-control)
15. [Testing & Evaluation](#testing--evaluation)
16. [Deployment Considerations](#deployment-considerations)

---

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           INGESTION PIPELINE                                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌───────────┐ │
│  │   Documents  │    │   Discord    │    │   Google     │    │   Other   │ │
│  │  (PDF, DOCX, │    │   Messages   │    │    Drive     │    │  Sources  │ │
│  │   MD, etc.)  │    │              │    │              │    │           │ │
│  └──────┬───────┘    └──────┬───────┘    └──────┬───────┘    └─────┬─────┘ │
│         │                   │                   │                  │       │
│         └───────────────────┴───────────────────┴──────────────────┘       │
│                                     │                                       │
│                                     ▼                                       │
│                          ┌──────────────────┐                               │
│                          │  Document Parser │                               │
│                          │  & Preprocessor  │                               │
│                          └────────┬─────────┘                               │
│                                   │                                         │
│                                   ▼                                         │
│                          ┌──────────────────┐                               │
│                          │     Chunker      │                               │
│                          │ (Recursive/      │                               │
│                          │  Semantic)       │                               │
│                          └────────┬─────────┘                               │
│                                   │                                         │
│                                   ▼                                         │
│                          ┌──────────────────┐                               │
│                          │    Embedding     │                               │
│                          │     Service      │                               │
│                          └────────┬─────────┘                               │
│                                   │                                         │
│                                   ▼                                         │
│                          ┌──────────────────┐                               │
│                          │   PostgreSQL +   │                               │
│                          │    pgvector      │                               │
│                          └──────────────────┘                               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│                              QUERY PIPELINE                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────┐                                                           │
│  │  User Query  │                                                           │
│  │  (Natural    │                                                           │
│  │   Language)  │                                                           │
│  └──────┬───────┘                                                           │
│         │                                                                   │
│         ▼                                                                   │
│  ┌──────────────┐    ┌──────────────┐                                       │
│  │    Query     │───▶│   Embed      │                                       │
│  │ Enhancement  │    │   Query      │                                       │
│  │ (Optional)   │    │              │                                       │
│  └──────────────┘    └──────┬───────┘                                       │
│                             │                                               │
│                             ▼                                               │
│                      ┌──────────────┐                                       │
│                      │   Hybrid     │                                       │
│                      │   Search     │                                       │
│                      │ (Vector +    │                                       │
│                      │  Keyword)    │                                       │
│                      └──────┬───────┘                                       │
│                             │                                               │
│                             ▼                                               │
│                      ┌──────────────┐                                       │
│                      │  Re-ranker   │                                       │
│                      │  (Optional)  │                                       │
│                      └──────┬───────┘                                       │
│                             │                                               │
│                             ▼                                               │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐                   │
│  │   Response   │◀───│     LLM      │◀───│  Retrieved   │                   │
│  │              │    │  Generation  │    │   Chunks     │                   │
│  └──────────────┘    └──────────────┘    └──────────────┘                   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

### How RAG Works

1. **Ingestion Phase**: Documents are parsed, split into chunks, converted to vector embeddings, and stored in a vector database alongside metadata.

2. **Query Phase**: When a user asks a question, the query is embedded using the same model. The system retrieves the most semantically similar chunks, optionally re-ranks them, then passes them as context to an LLM to generate a grounded answer.

---

## Technology Stack

### Core Dependencies

```json
{
  "dependencies": {
    // Web Framework
    "hono": "^4.0.0",
    
    // Database
    "pg": "^8.11.0",
    "pgvector": "^0.1.8",
    
    // AI/ML
    "openai": "^4.28.0",
    "@anthropic-ai/sdk": "^0.20.0",
    
    // Document Processing
    "langchain": "^0.1.0",
    "@langchain/community": "^0.0.20",
    "pdf-parse": "^1.1.1",
    "mammoth": "^1.6.0",
    "marked": "^12.0.0",
    
    // Utilities
    "zod": "^3.22.0",
    "p-queue": "^8.0.0",
    "tiktoken": "^1.0.0"
  },
  "devDependencies": {
    "typescript": "^5.3.0",
    "@types/node": "^20.0.0",
    "@types/pg": "^8.10.0",
    "vitest": "^1.0.0"
  }
}
```

### Why These Choices

| Component | Choice | Rationale |
|-----------|--------|-----------|
| **Web Framework** | Hono | Lightweight, fast, excellent TypeScript support, works everywhere (Node, Bun, Workers) |
| **Vector DB** | PostgreSQL + pgvector | Leverages existing Postgres infrastructure, good enough for millions of vectors, supports filtering |
| **Embeddings** | OpenAI text-embedding-3-small | Best price/performance ratio, 1536 dimensions, good multilingual support |
| **LLM** | Claude or GPT-4o | Both excellent for RAG; Claude tends to follow instructions more precisely |
| **Chunking** | LangChain splitters | Battle-tested, handles overlap well, multiple strategies available |

---

## Project Structure

```
src/
├── index.ts                 # Application entry point
├── config.ts                # Environment configuration
│
├── api/
│   ├── routes.ts            # API route definitions
│   ├── middleware/
│   │   ├── auth.ts          # Authentication middleware
│   │   └── rateLimit.ts     # Rate limiting
│   └── handlers/
│       ├── query.ts         # Query endpoint handler
│       ├── ingest.ts        # Document ingestion handler
│       └── admin.ts         # Admin operations
│
├── ingestion/
│   ├── pipeline.ts          # Main ingestion orchestrator
│   ├── parsers/
│   │   ├── index.ts         # Parser factory
│   │   ├── pdf.ts           # PDF parser
│   │   ├── docx.ts          # Word document parser
│   │   ├── markdown.ts      # Markdown parser
│   │   └── plaintext.ts     # Plain text parser
│   ├── chunkers/
│   │   ├── recursive.ts     # Recursive character splitter
│   │   └── semantic.ts      # Semantic chunker (optional)
│   └── sources/
│       ├── discord.ts       # Discord message ingestion
│       ├── filesystem.ts    # Local file ingestion
│       └── googleDrive.ts   # Google Drive integration (optional)
│
├── embedding/
│   ├── service.ts           # Embedding service abstraction
│   ├── openai.ts            # OpenAI embeddings implementation
│   └── batch.ts             # Batch embedding with rate limiting
│
├── retrieval/
│   ├── search.ts            # Main search orchestrator
│   ├── vector.ts            # Vector similarity search
│   ├── keyword.ts           # Keyword/BM25 search
│   ├── hybrid.ts            # Hybrid search combination
│   └── rerank.ts            # Re-ranking service
│
├── generation/
│   ├── llm.ts               # LLM service abstraction
│   ├── prompts.ts           # Prompt templates
│   └── response.ts          # Response generation
│
├── db/
│   ├── client.ts            # Database client setup
│   ├── migrations/          # SQL migration files
│   │   └── 001_initial.sql
│   └── repositories/
│       ├── documents.ts     # Document CRUD operations
│       └── chunks.ts        # Chunk operations
│
├── utils/
│   ├── tokenizer.ts         # Token counting utilities
│   ├── logger.ts            # Logging setup
│   └── errors.ts            # Custom error classes
│
└── types/
    └── index.ts             # TypeScript type definitions
```

---

## Database Schema

### Initial Migration (001_initial.sql)

```sql
-- Enable required extensions
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;  -- For keyword search

-- Documents table: stores source document metadata
CREATE TABLE documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Source information
    source_type TEXT NOT NULL,  -- 'discord', 'file', 'google_drive', etc.
    source_id TEXT NOT NULL,    -- Unique identifier within the source
    source_url TEXT,            -- URL to access the original
    
    -- Content metadata
    title TEXT,
    content_type TEXT,          -- MIME type
    content_hash TEXT NOT NULL, -- SHA-256 of content for deduplication
    
    -- Organizational metadata
    metadata JSONB DEFAULT '{}',
    
    -- Access control
    visibility TEXT DEFAULT 'internal',  -- 'public', 'internal', 'restricted'
    allowed_groups TEXT[],               -- Groups that can access this document
    
    -- Timestamps
    source_created_at TIMESTAMPTZ,
    source_updated_at TIMESTAMPTZ,
    indexed_at TIMESTAMPTZ DEFAULT NOW(),
    
    -- Constraints
    UNIQUE(source_type, source_id)
);

-- Chunks table: stores document chunks with embeddings
CREATE TABLE chunks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    document_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    
    -- Content
    content TEXT NOT NULL,
    content_tokens INTEGER,
    
    -- Position information
    chunk_index INTEGER NOT NULL,       -- Order within document
    start_char INTEGER,                 -- Character offset in original
    end_char INTEGER,
    
    -- Embedding
    embedding vector(1536),             -- OpenAI text-embedding-3-small dimension
    
    -- Metadata inherited from document (denormalized for query efficiency)
    source_type TEXT NOT NULL,
    metadata JSONB DEFAULT '{}',
    
    -- Timestamps
    created_at TIMESTAMPTZ DEFAULT NOW(),
    
    -- Constraints
    UNIQUE(document_id, chunk_index)
);

-- Indexes for vector similarity search
-- IVFFlat: faster queries, requires training data, good for 100k-10M vectors
CREATE INDEX chunks_embedding_ivfflat_idx ON chunks 
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);  -- Adjust based on expected data size

-- Alternative: HNSW index (better recall, more memory)
-- CREATE INDEX chunks_embedding_hnsw_idx ON chunks 
--     USING hnsw (embedding vector_cosine_ops)
--     WITH (m = 16, ef_construction = 64);

-- Indexes for filtering and keyword search
CREATE INDEX chunks_source_type_idx ON chunks(source_type);
CREATE INDEX chunks_document_id_idx ON chunks(document_id);
CREATE INDEX chunks_metadata_gin_idx ON chunks USING gin(metadata);
CREATE INDEX chunks_content_trgm_idx ON chunks USING gin(content gin_trgm_ops);

CREATE INDEX documents_source_type_idx ON documents(source_type);
CREATE INDEX documents_content_hash_idx ON documents(content_hash);
CREATE INDEX documents_metadata_gin_idx ON documents USING gin(metadata);

-- Sync state table: tracks ingestion progress for each source
CREATE TABLE sync_state (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    source_type TEXT NOT NULL,
    source_identifier TEXT NOT NULL,  -- e.g., channel ID for Discord
    last_sync_at TIMESTAMPTZ,
    last_sync_cursor TEXT,            -- Pagination cursor/timestamp
    sync_metadata JSONB DEFAULT '{}',
    
    UNIQUE(source_type, source_identifier)
);

-- Query logs for analytics and evaluation
CREATE TABLE query_logs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    query_text TEXT NOT NULL,
    query_embedding vector(1536),
    
    -- Results
    retrieved_chunk_ids UUID[],
    generated_response TEXT,
    
    -- Performance metrics
    retrieval_time_ms INTEGER,
    generation_time_ms INTEGER,
    total_time_ms INTEGER,
    
    -- User feedback (optional)
    feedback_score INTEGER,  -- e.g., 1-5 rating
    feedback_text TEXT,
    
    -- Metadata
    user_id TEXT,
    session_id TEXT,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
```

### Database Client Setup

```typescript
// src/db/client.ts
import { Pool } from 'pg';
import pgvector from 'pgvector/pg';

let pool: Pool | null = null;

export async function getPool(): Promise<Pool> {
    if (!pool) {
        pool = new Pool({
            connectionString: process.env.DATABASE_URL,
            max: 20,
            idleTimeoutMillis: 30000,
            connectionTimeoutMillis: 2000,
        });

        // Register pgvector types
        await pgvector.registerTypes(pool);
    }
    return pool;
}

export async function query<T = any>(
    text: string, 
    params?: any[]
): Promise<T[]> {
    const pool = await getPool();
    const result = await pool.query(text, params);
    return result.rows;
}

export async function transaction<T>(
    fn: (client: any) => Promise<T>
): Promise<T> {
    const pool = await getPool();
    const client = await pool.connect();
    
    try {
        await client.query('BEGIN');
        const result = await fn(client);
        await client.query('COMMIT');
        return result;
    } catch (error) {
        await client.query('ROLLBACK');
        throw error;
    } finally {
        client.release();
    }
}
```

---

## Document Ingestion Pipeline

### Types

```typescript
// src/types/index.ts

export interface Document {
    id?: string;
    sourceType: 'discord' | 'file' | 'google_drive' | 'notion';
    sourceId: string;
    sourceUrl?: string;
    title?: string;
    contentType?: string;
    contentHash: string;
    content: string;
    metadata: Record<string, any>;
    visibility: 'public' | 'internal' | 'restricted';
    allowedGroups?: string[];
    sourceCreatedAt?: Date;
    sourceUpdatedAt?: Date;
}

export interface Chunk {
    id?: string;
    documentId: string;
    content: string;
    contentTokens: number;
    chunkIndex: number;
    startChar?: number;
    endChar?: number;
    embedding?: number[];
    sourceType: string;
    metadata: Record<string, any>;
}

export interface ChunkingOptions {
    chunkSize: number;      // Target size in tokens
    chunkOverlap: number;   // Overlap between chunks in tokens
    minChunkSize: number;   // Minimum chunk size (don't create tiny chunks)
}

export interface ParsedDocument {
    content: string;
    metadata: Record<string, any>;
}
```

### Main Pipeline

```typescript
// src/ingestion/pipeline.ts
import { createHash } from 'crypto';
import { Document, Chunk, ChunkingOptions } from '../types';
import { parseDocument } from './parsers';
import { chunkText } from './chunkers/recursive';
import { embedBatch } from '../embedding/batch';
import { query, transaction } from '../db/client';

const DEFAULT_CHUNKING_OPTIONS: ChunkingOptions = {
    chunkSize: 512,        // ~512 tokens per chunk
    chunkOverlap: 50,      // ~50 token overlap
    minChunkSize: 100,     // Minimum 100 tokens
};

export async function ingestDocument(
    input: {
        sourceType: Document['sourceType'];
        sourceId: string;
        sourceUrl?: string;
        title?: string;
        content: string;
        contentType?: string;
        metadata?: Record<string, any>;
        visibility?: Document['visibility'];
        allowedGroups?: string[];
        sourceCreatedAt?: Date;
        sourceUpdatedAt?: Date;
    },
    options: Partial<ChunkingOptions> = {}
): Promise<{ documentId: string; chunkCount: number }> {
    const chunkingOptions = { ...DEFAULT_CHUNKING_OPTIONS, ...options };
    
    // 1. Parse document based on content type
    const parsed = await parseDocument(input.content, input.contentType);
    
    // 2. Compute content hash for deduplication
    const contentHash = createHash('sha256')
        .update(parsed.content)
        .digest('hex');
    
    // 3. Check if document already exists with same hash
    const existing = await query<{ id: string }>(
        `SELECT id FROM documents 
         WHERE source_type = $1 AND source_id = $2`,
        [input.sourceType, input.sourceId]
    );
    
    if (existing.length > 0) {
        const existingDoc = await query<{ content_hash: string }>(
            `SELECT content_hash FROM documents WHERE id = $1`,
            [existing[0].id]
        );
        
        if (existingDoc[0]?.content_hash === contentHash) {
            // Document unchanged, skip re-indexing
            return { documentId: existing[0].id, chunkCount: 0 };
        }
        
        // Document changed, delete old chunks and re-index
        await query(`DELETE FROM chunks WHERE document_id = $1`, [existing[0].id]);
    }
    
    // 4. Chunk the document
    const textChunks = await chunkText(parsed.content, chunkingOptions);
    
    if (textChunks.length === 0) {
        throw new Error('Document produced no chunks');
    }
    
    // 5. Generate embeddings for all chunks
    const embeddings = await embedBatch(textChunks.map(c => c.content));
    
    // 6. Store document and chunks in a transaction
    const result = await transaction(async (client) => {
        // Upsert document
        const docResult = await client.query(
            `INSERT INTO documents (
                source_type, source_id, source_url, title, content_type,
                content_hash, metadata, visibility, allowed_groups,
                source_created_at, source_updated_at, indexed_at
            ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, NOW())
            ON CONFLICT (source_type, source_id) DO UPDATE SET
                source_url = EXCLUDED.source_url,
                title = EXCLUDED.title,
                content_hash = EXCLUDED.content_hash,
                metadata = EXCLUDED.metadata,
                visibility = EXCLUDED.visibility,
                allowed_groups = EXCLUDED.allowed_groups,
                source_updated_at = EXCLUDED.source_updated_at,
                indexed_at = NOW()
            RETURNING id`,
            [
                input.sourceType,
                input.sourceId,
                input.sourceUrl,
                input.title || parsed.metadata.title,
                input.contentType,
                contentHash,
                JSON.stringify({ ...parsed.metadata, ...input.metadata }),
                input.visibility || 'internal',
                input.allowedGroups,
                input.sourceCreatedAt,
                input.sourceUpdatedAt,
            ]
        );
        
        const documentId = docResult.rows[0].id;
        
        // Insert chunks
        for (let i = 0; i < textChunks.length; i++) {
            const chunk = textChunks[i];
            await client.query(
                `INSERT INTO chunks (
                    document_id, content, content_tokens, chunk_index,
                    start_char, end_char, embedding, source_type, metadata
                ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)`,
                [
                    documentId,
                    chunk.content,
                    chunk.tokens,
                    i,
                    chunk.startChar,
                    chunk.endChar,
                    JSON.stringify(embeddings[i]),
                    input.sourceType,
                    JSON.stringify(input.metadata || {}),
                ]
            );
        }
        
        return { documentId, chunkCount: textChunks.length };
    });
    
    return result;
}

export async function ingestDocuments(
    documents: Parameters<typeof ingestDocument>[0][],
    options?: Partial<ChunkingOptions>
): Promise<{ success: number; failed: number; errors: Error[] }> {
    const errors: Error[] = [];
    let success = 0;
    let failed = 0;
    
    // Process in batches to avoid overwhelming the embedding API
    const batchSize = 10;
    for (let i = 0; i < documents.length; i += batchSize) {
        const batch = documents.slice(i, i + batchSize);
        
        await Promise.all(
            batch.map(async (doc) => {
                try {
                    await ingestDocument(doc, options);
                    success++;
                } catch (error) {
                    failed++;
                    errors.push(error as Error);
                    console.error(`Failed to ingest ${doc.sourceId}:`, error);
                }
            })
        );
    }
    
    return { success, failed, errors };
}
```

### Document Parsers

```typescript
// src/ingestion/parsers/index.ts
import { ParsedDocument } from '../../types';
import { parsePdf } from './pdf';
import { parseDocx } from './docx';
import { parseMarkdown } from './markdown';

export async function parseDocument(
    content: string | Buffer,
    contentType?: string
): Promise<ParsedDocument> {
    // Detect content type from content or use provided type
    const type = contentType || detectContentType(content);
    
    switch (type) {
        case 'application/pdf':
            return parsePdf(content as Buffer);
        case 'application/vnd.openxmlformats-officedocument.wordprocessingml.document':
            return parseDocx(content as Buffer);
        case 'text/markdown':
            return parseMarkdown(content as string);
        case 'text/plain':
        default:
            return {
                content: content.toString(),
                metadata: {},
            };
    }
}

function detectContentType(content: string | Buffer): string {
    if (Buffer.isBuffer(content)) {
        // Check magic bytes
        if (content[0] === 0x25 && content[1] === 0x50) return 'application/pdf';
        if (content[0] === 0x50 && content[1] === 0x4B) {
            // Could be docx, xlsx, etc. (ZIP-based)
            return 'application/vnd.openxmlformats-officedocument.wordprocessingml.document';
        }
    }
    
    const str = content.toString().slice(0, 1000);
    if (str.includes('# ') || str.includes('## ')) return 'text/markdown';
    
    return 'text/plain';
}
```

```typescript
// src/ingestion/parsers/pdf.ts
import pdfParse from 'pdf-parse';
import { ParsedDocument } from '../../types';

export async function parsePdf(buffer: Buffer): Promise<ParsedDocument> {
    const data = await pdfParse(buffer);
    
    return {
        content: data.text,
        metadata: {
            pageCount: data.numpages,
            title: data.info?.Title,
            author: data.info?.Author,
            createdAt: data.info?.CreationDate,
        },
    };
}
```

```typescript
// src/ingestion/parsers/docx.ts
import mammoth from 'mammoth';
import { ParsedDocument } from '../../types';

export async function parseDocx(buffer: Buffer): Promise<ParsedDocument> {
    const result = await mammoth.extractRawText({ buffer });
    
    return {
        content: result.value,
        metadata: {
            warnings: result.messages,
        },
    };
}
```

```typescript
// src/ingestion/parsers/markdown.ts
import { marked } from 'marked';
import { ParsedDocument } from '../../types';

export async function parseMarkdown(content: string): Promise<ParsedDocument> {
    // Extract title from first heading
    const titleMatch = content.match(/^#\s+(.+)$/m);
    
    // Convert to plain text by stripping markdown
    const plainText = content
        .replace(/^#+\s+/gm, '')           // Remove heading markers
        .replace(/\*\*(.+?)\*\*/g, '$1')   // Remove bold
        .replace(/\*(.+?)\*/g, '$1')       // Remove italic
        .replace(/\[(.+?)\]\(.+?\)/g, '$1') // Convert links to text
        .replace(/`{1,3}[^`]*`{1,3}/g, '') // Remove code blocks
        .replace(/^\s*[-*+]\s+/gm, '• ')   // Normalize list markers
        .trim();
    
    return {
        content: plainText,
        metadata: {
            title: titleMatch?.[1],
            originalFormat: 'markdown',
        },
    };
}
```

### Chunking Strategies

```typescript
// src/ingestion/chunkers/recursive.ts
import { encoding_for_model } from 'tiktoken';
import { ChunkingOptions } from '../../types';

interface TextChunk {
    content: string;
    tokens: number;
    startChar: number;
    endChar: number;
}

// Initialize tokenizer once
const encoder = encoding_for_model('gpt-4');

export function countTokens(text: string): number {
    return encoder.encode(text).length;
}

export async function chunkText(
    text: string,
    options: ChunkingOptions
): Promise<TextChunk[]> {
    const { chunkSize, chunkOverlap, minChunkSize } = options;
    
    // Separators in order of preference (try to split at natural boundaries)
    const separators = [
        '\n\n\n',   // Multiple blank lines (section breaks)
        '\n\n',     // Paragraph breaks
        '\n',       // Line breaks
        '. ',       // Sentence endings
        '? ',
        '! ',
        '; ',
        ', ',
        ' ',        // Words
        '',         // Characters (last resort)
    ];
    
    const chunks: TextChunk[] = [];
    
    function splitRecursively(
        text: string,
        startOffset: number,
        separatorIndex: number
    ): void {
        if (!text.trim()) return;
        
        const tokens = countTokens(text);
        
        // If text fits in chunk, add it
        if (tokens <= chunkSize) {
            if (tokens >= minChunkSize) {
                chunks.push({
                    content: text.trim(),
                    tokens,
                    startChar: startOffset,
                    endChar: startOffset + text.length,
                });
            }
            return;
        }
        
        // Find a separator that works
        let separator = separators[separatorIndex];
        while (separatorIndex < separators.length && !text.includes(separator)) {
            separatorIndex++;
            separator = separators[separatorIndex];
        }
        
        if (separatorIndex >= separators.length) {
            // No separator found, force split by characters
            const midpoint = Math.floor(text.length / 2);
            splitRecursively(text.slice(0, midpoint), startOffset, 0);
            splitRecursively(text.slice(midpoint), startOffset + midpoint, 0);
            return;
        }
        
        // Split by separator and try to combine into chunk-sized pieces
        const parts = text.split(separator);
        let currentChunk = '';
        let currentStart = startOffset;
        
        for (let i = 0; i < parts.length; i++) {
            const part = parts[i];
            const potentialChunk = currentChunk 
                ? currentChunk + separator + part 
                : part;
            
            if (countTokens(potentialChunk) <= chunkSize) {
                currentChunk = potentialChunk;
            } else {
                // Current chunk is full
                if (currentChunk) {
                    splitRecursively(currentChunk, currentStart, separatorIndex + 1);
                }
                currentChunk = part;
                currentStart = startOffset + text.indexOf(part);
            }
        }
        
        // Don't forget the last chunk
        if (currentChunk) {
            splitRecursively(currentChunk, currentStart, separatorIndex + 1);
        }
    }
    
    splitRecursively(text, 0, 0);
    
    // Add overlap between chunks
    if (chunkOverlap > 0 && chunks.length > 1) {
        return addOverlap(chunks, text, chunkOverlap);
    }
    
    return chunks;
}

function addOverlap(
    chunks: TextChunk[],
    originalText: string,
    overlapTokens: number
): TextChunk[] {
    const overlappedChunks: TextChunk[] = [];
    
    for (let i = 0; i < chunks.length; i++) {
        const chunk = chunks[i];
        let content = chunk.content;
        let startChar = chunk.startChar;
        
        // Add overlap from previous chunk
        if (i > 0) {
            const prevChunk = chunks[i - 1];
            const prevText = prevChunk.content;
            
            // Find overlap amount (approximate by characters)
            const overlapChars = Math.floor(
                (overlapTokens / prevChunk.tokens) * prevText.length
            );
            const overlapText = prevText.slice(-overlapChars);
            
            content = overlapText + ' ' + content;
            startChar = prevChunk.endChar - overlapChars;
        }
        
        overlappedChunks.push({
            content: content.trim(),
            tokens: countTokens(content),
            startChar,
            endChar: chunk.endChar,
        });
    }
    
    return overlappedChunks;
}
```

### Semantic Chunking (Optional, Higher Quality)

```typescript
// src/ingestion/chunkers/semantic.ts
import { embedBatch } from '../../embedding/batch';
import { ChunkingOptions } from '../../types';
import { countTokens } from './recursive';

interface SemanticChunk {
    content: string;
    tokens: number;
    startChar: number;
    endChar: number;
}

/**
 * Semantic chunking splits documents at natural topic boundaries
 * by detecting shifts in embedding similarity between sentences.
 * 
 * More expensive (requires embeddings during chunking) but produces
 * more coherent chunks.
 */
export async function semanticChunk(
    text: string,
    options: ChunkingOptions & { similarityThreshold?: number }
): Promise<SemanticChunk[]> {
    const { chunkSize, similarityThreshold = 0.8 } = options;
    
    // 1. Split into sentences
    const sentences = text.match(/[^.!?]+[.!?]+/g) || [text];
    
    if (sentences.length <= 1) {
        return [{
            content: text,
            tokens: countTokens(text),
            startChar: 0,
            endChar: text.length,
        }];
    }
    
    // 2. Embed all sentences
    const embeddings = await embedBatch(sentences);
    
    // 3. Find breakpoints where similarity drops
    const breakpoints: number[] = [0];
    
    for (let i = 1; i < sentences.length; i++) {
        const similarity = cosineSimilarity(embeddings[i - 1], embeddings[i]);
        
        if (similarity < similarityThreshold) {
            breakpoints.push(i);
        }
    }
    breakpoints.push(sentences.length);
    
    // 4. Create chunks from breakpoints
    const chunks: SemanticChunk[] = [];
    let charOffset = 0;
    
    for (let i = 0; i < breakpoints.length - 1; i++) {
        const start = breakpoints[i];
        const end = breakpoints[i + 1];
        const content = sentences.slice(start, end).join(' ').trim();
        const tokens = countTokens(content);
        
        // If chunk is too large, fall back to recursive splitting
        if (tokens > chunkSize * 1.5) {
            // Import and use recursive chunker here
            // For simplicity, just add as-is (you'd want to sub-chunk)
        }
        
        const startChar = charOffset;
        const endChar = charOffset + content.length;
        
        chunks.push({ content, tokens, startChar, endChar });
        charOffset = endChar;
    }
    
    return chunks;
}

function cosineSimilarity(a: number[], b: number[]): number {
    let dotProduct = 0;
    let normA = 0;
    let normB = 0;
    
    for (let i = 0; i < a.length; i++) {
        dotProduct += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    
    return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

---

## Discord Ingestion

### Setup

1. Create a Discord application at https://discord.com/developers/applications
2. Create a Bot for your application
3. Enable the following Privileged Gateway Intents:
   - `MESSAGE_CONTENT` - Read message content
   - `GUILD_MESSAGES` - Receive message events
   - `GUILD_MEMBERS` - Get member info for message attribution (optional)
4. Generate an invite URL with these permissions:
   - `Read Message History` - Read historical messages
   - `View Channels` - See available channels
5. Invite the bot to your server and get the Bot Token

### Discord Ingestion Service

```typescript
// src/ingestion/sources/discord.ts
import { Client, GatewayIntentBits, TextChannel, Message, Snowflake } from 'discord.js';
import { query } from '../../db/client';
import { ingestDocument } from '../pipeline';

interface DiscordConfig {
    token: string;
    guildId: string;                  // Server ID to sync
    includedChannels?: string[];      // If empty, sync all text channels
    excludedChannels?: string[];
    includeThreads?: boolean;
    includeReactions?: boolean;
}

interface DiscordMessage {
    id: string;
    author: { id: string; username: string; globalName?: string };
    content: string;
    createdTimestamp: number;
    hasThread?: boolean;
    thread?: { id: string };
    reactions?: Array<{ emoji: { name: string }; count: number }>;
}

export class DiscordIngestionService {
    private client: Client;
    private config: DiscordConfig;
    private userCache: Map<string, string> = new Map();
    private ready: Promise<void>;

    constructor(config: DiscordConfig) {
        this.client = new Client({
            intents: [
                GatewayIntentBits.Guilds,
                GatewayIntentBits.GuildMessages,
                GatewayIntentBits.MessageContent,
                GatewayIntentBits.GuildMembers,
            ],
        });
        this.config = {
            includeThreads: true,
            includeReactions: false,
            ...config,
        };

        // Initialize connection
        this.ready = new Promise((resolve, reject) => {
            this.client.once('ready', () => resolve());
            this.client.once('error', reject);
            this.client.login(config.token);
        });
    }

    /**
     * Sync all configured channels
     */
    async syncAll(): Promise<{ channelsSynced: number; messagesSynced: number }> {
        await this.ready;
        const channels = await this.getChannelsToSync();
        let totalMessages = 0;

        for (const channel of channels) {
            const count = await this.syncChannel(channel.id, channel.name);
            totalMessages += count;
        }

        return {
            channelsSynced: channels.length,
            messagesSynced: totalMessages
        };
    }

    /**
     * Sync a single channel incrementally
     */
    async syncChannel(channelId: string, channelName?: string): Promise<number> {
        await this.ready;

        const channel = await this.client.channels.fetch(channelId);
        if (!channel || !(channel instanceof TextChannel)) {
            console.warn(`Channel ${channelId} is not a text channel`);
            return 0;
        }

        // Get last sync state
        const syncState = await query<{ last_sync_cursor: string }>(
            `SELECT last_sync_cursor FROM sync_state
             WHERE source_type = 'discord' AND source_identifier = $1`,
            [channelId]
        );

        const afterMessageId = syncState[0]?.last_sync_cursor;
        let messageCount = 0;
        let lastMessageId: string | undefined;
        let latestMessageId: string | undefined;

        // Fetch messages in batches (Discord API limit is 100 per request)
        do {
            const options: { limit: number; after?: Snowflake } = { limit: 100 };
            if (afterMessageId && !lastMessageId) {
                options.after = afterMessageId;
            } else if (lastMessageId) {
                options.after = lastMessageId;
            }

            const messages = await channel.messages.fetch(options);

            if (messages.size === 0) break;

            // Track the newest message ID for sync state
            if (!latestMessageId) {
                latestMessageId = messages.first()?.id;
            }

            // Process messages
            for (const [, message] of messages) {
                if (message.author.bot) continue;  // Skip bot messages
                if (message.system) continue;       // Skip system messages

                await this.processMessage(message, channelId, channelName || channel.name);
                messageCount++;
                lastMessageId = message.id;
            }

            // If we got less than 100, we've reached the end
            if (messages.size < 100) break;

        } while (true);

        // Update sync state
        if (latestMessageId) {
            await query(
                `INSERT INTO sync_state (source_type, source_identifier, last_sync_at, last_sync_cursor)
                 VALUES ('discord', $1, NOW(), $2)
                 ON CONFLICT (source_type, source_identifier) DO UPDATE SET
                     last_sync_at = NOW(),
                     last_sync_cursor = $2`,
                [channelId, latestMessageId]
            );
        }

        return messageCount;
    }

    /**
     * Process a single message (with thread if applicable)
     */
    private async processMessage(
        message: Message,
        channelId: string,
        channelName: string
    ): Promise<void> {
        let content = message.content || '';
        const metadata: Record<string, any> = {
            channel: channelName,
            channelId,
            user: this.getUserDisplayName(message.author),
            userId: message.author.id,
            timestamp: message.id,
            hasThread: message.hasThread,
        };

        // Fetch and include thread replies
        if (this.config.includeThreads && message.hasThread && message.thread) {
            const threadContent = await this.fetchThread(message.thread.id);
            content = `${content}\n\nThread replies:\n${threadContent}`;
        }

        // Include reactions as context
        if (this.config.includeReactions && message.reactions.cache.size > 0) {
            const reactions = message.reactions.cache
                .map(r => `${r.emoji.name} (${r.count})`)
                .join(', ');
            metadata.reactions = reactions;
        }

        // Handle attachments
        if (message.attachments.size > 0) {
            const attachmentList = message.attachments
                .map(a => `[Attachment: ${a.name}]`)
                .join(' ');
            content = `${content}\n${attachmentList}`;
        }

        // Skip very short messages
        if (content.length < 20) return;

        const guildId = this.config.guildId;

        await ingestDocument({
            sourceType: 'discord',
            sourceId: `${channelId}:${message.id}`,
            sourceUrl: `https://discord.com/channels/${guildId}/${channelId}/${message.id}`,
            title: `#${channelName} - ${new Date(message.createdTimestamp).toLocaleDateString()}`,
            content,
            metadata,
            visibility: 'internal',
            sourceCreatedAt: new Date(message.createdTimestamp),
        });
    }

    /**
     * Fetch all messages in a thread
     */
    private async fetchThread(threadId: string): Promise<string> {
        const thread = await this.client.channels.fetch(threadId);
        if (!thread || !thread.isThread()) return '';

        const messages = await thread.messages.fetch({ limit: 100 });

        const replies = messages
            .filter(m => !m.author.bot)
            .map(m => `${this.getUserDisplayName(m.author)}: ${m.content}`)
            .reverse()  // Oldest first
            .join('\n');

        return replies;
    }

    /**
     * Get channels to sync based on config
     */
    private async getChannelsToSync(): Promise<Array<{ id: string; name: string }>> {
        await this.ready;

        const guild = await this.client.guilds.fetch(this.config.guildId);
        const channels = await guild.channels.fetch();

        let textChannels = channels
            .filter(c => c instanceof TextChannel)
            .map(c => ({ id: c!.id, name: (c as TextChannel).name }));

        // Filter by included channels
        if (this.config.includedChannels?.length) {
            textChannels = textChannels.filter(c =>
                this.config.includedChannels!.includes(c.id) ||
                this.config.includedChannels!.includes(c.name)
            );
        }

        // Filter out excluded channels
        if (this.config.excludedChannels?.length) {
            textChannels = textChannels.filter(c =>
                !this.config.excludedChannels!.includes(c.id) &&
                !this.config.excludedChannels!.includes(c.name)
            );
        }

        return textChannels;
    }

    /**
     * Get display name for a user
     */
    private getUserDisplayName(author: { id: string; username: string; globalName?: string | null }): string {
        return author.globalName || author.username;
    }

    /**
     * Cleanup - disconnect from Discord
     */
    async destroy(): Promise<void> {
        await this.client.destroy();
    }
}

// Event handler for real-time updates
export function createDiscordEventHandler(ingestionService: DiscordIngestionService) {
    return async (message: Message) => {
        if (message.author.bot) return;
        if (message.system) return;

        // Process the message immediately for real-time ingestion
        await ingestionService.syncChannel(message.channelId, message.channel.isTextBased() ? (message.channel as TextChannel).name : undefined);
    };
}
```

### Discord Bot (Query Interface)

The Discord bot allows users to query the knowledge base by @mentioning the bot.

```typescript
// src/bot/discord_bot.ts
import { Client, GatewayIntentBits, Message } from "discord.js";
import { hybridSearch } from "../retrieval/hybrid";
import { generateResponse } from "../generation/response";

export class DiscordKBBot {
    private client: Client;
    private config: { token: string; guildId?: string };
    private ready: Promise<void>;

    constructor(config: { token: string; guildId?: string }) {
        this.client = new Client({
            intents: [
                GatewayIntentBits.Guilds,
                GatewayIntentBits.GuildMessages,
                GatewayIntentBits.MessageContent,
                GatewayIntentBits.DirectMessages,
            ],
        });
        this.config = config;

        // Set up message handler
        this.client.on("messageCreate", (message) => this.handleMessage(message));

        this.ready = new Promise((resolve, reject) => {
            this.client.once("ready", () => resolve());
            this.client.once("error", reject);
        });
    }

    async start(): Promise<void> {
        await this.client.login(this.config.token);
        await this.ready;
    }

    async stop(): Promise<void> {
        await this.client.destroy();
    }

    private async handleMessage(message: Message): Promise<void> {
        // Ignore bots
        if (message.author.bot) return;

        // Check if bot was mentioned
        if (!this.client.user || !message.mentions.has(this.client.user)) return;

        // Extract question (remove mentions)
        const question = message.content.replace(/<@!?\d+>/g, "").trim();
        if (!question) {
            await message.reply("Please ask a question!");
            return;
        }

        try {
            await message.channel.sendTyping();

            // Search and generate response
            const results = await hybridSearch(question, { limit: 5 });
            if (results.length === 0) {
                await message.reply("I couldn't find relevant information.");
                return;
            }

            const response = await generateResponse(question, results);
            let reply = response.answer;

            // Add sources
            const sources = results
                .filter(r => r.sourceUrl)
                .slice(0, 3)
                .map(r => r.title || r.sourceUrl);
            if (sources.length > 0) {
                reply += "\n\n**Sources:**\n" + sources.map(s => `• ${s}`).join("\n");
            }

            // Truncate if needed (Discord 2000 char limit)
            if (reply.length > 2000) {
                reply = reply.substring(0, 1950) + "...\n\n*(truncated)*";
            }

            await message.reply(reply);
        } catch (error) {
            await message.reply("Sorry, an error occurred. Please try again.");
        }
    }
}

// Start the bot
export async function startDiscordBot(): Promise<DiscordKBBot> {
    const token = process.env.DISCORD_BOT_TOKEN;
    const guildId = process.env.DISCORD_GUILD_ID;

    if (!token) throw new Error("DISCORD_BOT_TOKEN required");

    const bot = new DiscordKBBot({ token, guildId });
    await bot.start();
    return bot;
}
```

**Usage:**
```bash
# Run the bot
yarn bot

# Or with ts-node directly
npx ts-node src/bot/discord_bot.ts
```

Users can then query the knowledge base by mentioning the bot:
```
@KBBot What is the deployment process?
@KBBot How do I request PTO?
```

---

## Embedding Service

### Main Service

```typescript
// src/embedding/service.ts
export interface EmbeddingService {
    embed(texts: string[]): Promise<number[][]>;
    dimension: number;
}

// Factory function to create embedding service
export function createEmbeddingService(
    provider: 'openai' | 'cohere' = 'openai'
): EmbeddingService {
    switch (provider) {
        case 'openai':
            return new OpenAIEmbeddingService();
        default:
            throw new Error(`Unknown embedding provider: ${provider}`);
    }
}
```

### OpenAI Implementation

```typescript
// src/embedding/openai.ts
import OpenAI from 'openai';
import { EmbeddingService } from './service';

export class OpenAIEmbeddingService implements EmbeddingService {
    private client: OpenAI;
    readonly dimension = 1536;  // text-embedding-3-small dimension
    
    constructor() {
        this.client = new OpenAI();
    }
    
    async embed(texts: string[]): Promise<number[][]> {
        // OpenAI has a limit of 8191 tokens per text and 2048 texts per batch
        const response = await this.client.embeddings.create({
            model: 'text-embedding-3-small',
            input: texts,
        });
        
        // Sort by index to maintain order (API doesn't guarantee order)
        const sorted = response.data.sort((a, b) => a.index - b.index);
        return sorted.map(d => d.embedding);
    }
}
```

### Batch Embedding with Rate Limiting

```typescript
// src/embedding/batch.ts
import PQueue from 'p-queue';
import { createEmbeddingService, EmbeddingService } from './service';

let embeddingService: EmbeddingService | null = null;
const queue = new PQueue({ 
    concurrency: 5,           // Max concurrent requests
    intervalCap: 50,          // Max requests per interval
    interval: 60 * 1000,      // 1 minute interval (adjust based on tier)
});

function getService(): EmbeddingService {
    if (!embeddingService) {
        embeddingService = createEmbeddingService('openai');
    }
    return embeddingService;
}

/**
 * Embed texts with automatic batching and rate limiting
 */
export async function embedBatch(
    texts: string[],
    batchSize: number = 100
): Promise<number[][]> {
    const service = getService();
    const allEmbeddings: number[][] = [];
    
    // Process in batches
    for (let i = 0; i < texts.length; i += batchSize) {
        const batch = texts.slice(i, i + batchSize);
        
        const embeddings = await queue.add(async () => {
            return service.embed(batch);
        });
        
        if (embeddings) {
            allEmbeddings.push(...embeddings);
        }
    }
    
    return allEmbeddings;
}

/**
 * Embed a single text (convenience function)
 */
export async function embedSingle(text: string): Promise<number[]> {
    const [embedding] = await embedBatch([text], 1);
    return embedding;
}
```

---

## Vector Search & Retrieval

### Search Types

```typescript
// src/retrieval/types.ts
export interface SearchOptions {
    limit?: number;
    minSimilarity?: number;
    sourceTypes?: string[];
    metadata?: Record<string, any>;
    dateRange?: {
        start?: Date;
        end?: Date;
    };
    userGroups?: string[];  // For access control
}

export interface SearchResult {
    chunkId: string;
    documentId: string;
    content: string;
    similarity: number;
    sourceType: string;
    sourceUrl?: string;
    title?: string;
    metadata: Record<string, any>;
}
```

### Vector Search

```typescript
// src/retrieval/vector.ts
import { query } from '../db/client';
import { embedSingle } from '../embedding/batch';
import { SearchOptions, SearchResult } from './types';

export async function vectorSearch(
    queryText: string,
    options: SearchOptions = {}
): Promise<SearchResult[]> {
    const {
        limit = 10,
        minSimilarity = 0.5,
        sourceTypes,
        metadata,
        dateRange,
        userGroups,
    } = options;
    
    // 1. Embed the query
    const queryEmbedding = await embedSingle(queryText);
    
    // 2. Build the SQL query with filters
    const conditions: string[] = [];
    const params: any[] = [JSON.stringify(queryEmbedding), limit];
    let paramIndex = 3;
    
    if (minSimilarity > 0) {
        conditions.push(`1 - (c.embedding <=> $1::vector) >= ${minSimilarity}`);
    }
    
    if (sourceTypes?.length) {
        conditions.push(`c.source_type = ANY($${paramIndex})`);
        params.push(sourceTypes);
        paramIndex++;
    }
    
    if (metadata) {
        conditions.push(`c.metadata @> $${paramIndex}`);
        params.push(JSON.stringify(metadata));
        paramIndex++;
    }
    
    if (dateRange?.start) {
        conditions.push(`d.source_created_at >= $${paramIndex}`);
        params.push(dateRange.start);
        paramIndex++;
    }
    
    if (dateRange?.end) {
        conditions.push(`d.source_created_at <= $${paramIndex}`);
        params.push(dateRange.end);
        paramIndex++;
    }
    
    // Access control
    if (userGroups?.length) {
        conditions.push(`(
            d.visibility = 'public' OR
            (d.visibility = 'internal') OR
            (d.visibility = 'restricted' AND d.allowed_groups && $${paramIndex})
        )`);
        params.push(userGroups);
        paramIndex++;
    } else {
        conditions.push(`d.visibility = 'public'`);
    }
    
    const whereClause = conditions.length 
        ? `WHERE ${conditions.join(' AND ')}` 
        : '';
    
    // 3. Execute search
    const results = await query<SearchResult>(
        `SELECT 
            c.id as "chunkId",
            c.document_id as "documentId",
            c.content,
            1 - (c.embedding <=> $1::vector) as similarity,
            c.source_type as "sourceType",
            d.source_url as "sourceUrl",
            d.title,
            c.metadata
        FROM chunks c
        JOIN documents d ON c.document_id = d.id
        ${whereClause}
        ORDER BY c.embedding <=> $1::vector
        LIMIT $2`,
        params
    );
    
    return results;
}
```

### Keyword Search (BM25-like)

```typescript
// src/retrieval/keyword.ts
import { query } from '../db/client';
import { SearchOptions, SearchResult } from './types';

/**
 * Keyword search using PostgreSQL full-text search with trigram similarity
 */
export async function keywordSearch(
    queryText: string,
    options: SearchOptions = {}
): Promise<SearchResult[]> {
    const { limit = 10, sourceTypes, metadata } = options;
    
    const conditions: string[] = [];
    const params: any[] = [queryText, limit];
    let paramIndex = 3;
    
    if (sourceTypes?.length) {
        conditions.push(`c.source_type = ANY($${paramIndex})`);
        params.push(sourceTypes);
        paramIndex++;
    }
    
    if (metadata) {
        conditions.push(`c.metadata @> $${paramIndex}`);
        params.push(JSON.stringify(metadata));
        paramIndex++;
    }
    
    const whereClause = conditions.length 
        ? `AND ${conditions.join(' AND ')}` 
        : '';
    
    // Use trigram similarity for fuzzy matching
    const results = await query<SearchResult>(
        `SELECT 
            c.id as "chunkId",
            c.document_id as "documentId",
            c.content,
            similarity(c.content, $1) as similarity,
            c.source_type as "sourceType",
            d.source_url as "sourceUrl",
            d.title,
            c.metadata
        FROM chunks c
        JOIN documents d ON c.document_id = d.id
        WHERE c.content % $1 ${whereClause}
        ORDER BY similarity(c.content, $1) DESC
        LIMIT $2`,
        params
    );
    
    return results;
}
```

### Hybrid Search

```typescript
// src/retrieval/hybrid.ts
import { vectorSearch } from './vector';
import { keywordSearch } from './keyword';
import { SearchOptions, SearchResult } from './types';

interface HybridSearchOptions extends SearchOptions {
    vectorWeight?: number;   // Weight for vector results (0-1)
    keywordWeight?: number;  // Weight for keyword results (0-1)
}

/**
 * Combine vector and keyword search using Reciprocal Rank Fusion (RRF)
 */
export async function hybridSearch(
    queryText: string,
    options: HybridSearchOptions = {}
): Promise<SearchResult[]> {
    const {
        limit = 10,
        vectorWeight = 0.7,
        keywordWeight = 0.3,
        ...searchOptions
    } = options;
    
    // Run both searches in parallel
    const [vectorResults, keywordResults] = await Promise.all([
        vectorSearch(queryText, { ...searchOptions, limit: limit * 2 }),
        keywordSearch(queryText, { ...searchOptions, limit: limit * 2 }),
    ]);
    
    // Compute RRF scores
    const k = 60;  // RRF constant (standard value)
    const scores = new Map<string, { result: SearchResult; score: number }>();
    
    // Add vector search scores
    vectorResults.forEach((result, rank) => {
        const rrfScore = vectorWeight / (k + rank + 1);
        scores.set(result.chunkId, { result, score: rrfScore });
    });
    
    // Add keyword search scores
    keywordResults.forEach((result, rank) => {
        const rrfScore = keywordWeight / (k + rank + 1);
        const existing = scores.get(result.chunkId);
        
        if (existing) {
            existing.score += rrfScore;
        } else {
            scores.set(result.chunkId, { result, score: rrfScore });
        }
    });
    
    // Sort by combined score and return top results
    return Array.from(scores.values())
        .sort((a, b) => b.score - a.score)
        .slice(0, limit)
        .map(({ result, score }) => ({
            ...result,
            similarity: score,  // Replace with combined score
        }));
}
```

---

## Query Enhancement

### Query Expansion & Rewriting

```typescript
// src/retrieval/queryEnhancer.ts
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

interface EnhancedQuery {
    original: string;
    rewritten: string;
    subQueries?: string[];
    filters?: {
        sourceTypes?: string[];
        dateRange?: { start?: string; end?: string };
    };
}

/**
 * Use an LLM to enhance the query for better retrieval
 */
export async function enhanceQuery(query: string): Promise<EnhancedQuery> {
    const response = await anthropic.messages.create({
        model: 'claude-sonnet-4-20250514',
        max_tokens: 500,
        messages: [{
            role: 'user',
            content: `You are a query enhancement system for a company knowledge base search.

Given the user's question, output a JSON object with:
1. "rewritten": A clearer, more searchable version of the query
2. "subQueries": 2-3 alternative phrasings or related queries (optional)
3. "filters": Suggested filters like sourceTypes (["discord", "file"]) or dateRange (optional)

User query: "${query}"

Output only valid JSON, no explanation.`
        }],
    });
    
    try {
        const content = response.content[0];
        if (content.type !== 'text') {
            throw new Error('Unexpected response type');
        }
        
        const enhanced = JSON.parse(content.text);
        return {
            original: query,
            rewritten: enhanced.rewritten || query,
            subQueries: enhanced.subQueries,
            filters: enhanced.filters,
        };
    } catch {
        // Fall back to original query if parsing fails
        return {
            original: query,
            rewritten: query,
        };
    }
}

/**
 * Hypothetical Document Embedding (HyDE)
 * Generate a hypothetical answer and use it for retrieval
 */
export async function generateHypotheticalAnswer(query: string): Promise<string> {
    const response = await anthropic.messages.create({
        model: 'claude-sonnet-4-20250514',
        max_tokens: 300,
        messages: [{
            role: 'user',
            content: `Write a short, factual paragraph that would answer this question. Write as if you're confident in the answer, even if you're not sure. Be specific and use technical terms if relevant.

Question: ${query}

Answer:`
        }],
    });
    
    const content = response.content[0];
    if (content.type !== 'text') {
        return query;
    }
    return content.text;
}
```

---

## Re-ranking

### Cohere Re-ranker

```typescript
// src/retrieval/rerank.ts
import { SearchResult } from './types';

interface RerankOptions {
    topN?: number;
    model?: string;
}

/**
 * Re-rank results using Cohere's re-ranking model
 * Requires COHERE_API_KEY environment variable
 */
export async function rerankWithCohere(
    query: string,
    results: SearchResult[],
    options: RerankOptions = {}
): Promise<SearchResult[]> {
    const { topN = 5, model = 'rerank-english-v3.0' } = options;
    
    if (results.length === 0) return [];
    
    const response = await fetch('https://api.cohere.ai/v1/rerank', {
        method: 'POST',
        headers: {
            'Authorization': `Bearer ${process.env.COHERE_API_KEY}`,
            'Content-Type': 'application/json',
        },
        body: JSON.stringify({
            model,
            query,
            documents: results.map(r => r.content),
            top_n: topN,
            return_documents: false,
        }),
    });
    
    if (!response.ok) {
        console.error('Cohere rerank failed:', await response.text());
        return results.slice(0, topN);  // Fall back to original order
    }
    
    const data = await response.json();
    
    // Reorder results based on rerank scores
    return data.results.map((r: { index: number; relevance_score: number }) => ({
        ...results[r.index],
        similarity: r.relevance_score,
    }));
}

/**
 * Simple cross-encoder re-ranking using Claude
 * More expensive but doesn't require additional API
 */
export async function rerankWithLLM(
    query: string,
    results: SearchResult[],
    topN: number = 5
): Promise<SearchResult[]> {
    if (results.length <= topN) return results;
    
    const Anthropic = (await import('@anthropic-ai/sdk')).default;
    const anthropic = new Anthropic();
    
    const documentsText = results
        .map((r, i) => `[${i}] ${r.content.slice(0, 500)}`)
        .join('\n\n');
    
    const response = await anthropic.messages.create({
        model: 'claude-sonnet-4-20250514',
        max_tokens: 100,
        messages: [{
            role: 'user',
            content: `Given the query and documents below, return the indices of the ${topN} most relevant documents in order of relevance.

Query: ${query}

Documents:
${documentsText}

Output only a JSON array of indices, e.g., [2, 0, 4, 1, 3]`
        }],
    });
    
    try {
        const content = response.content[0];
        if (content.type !== 'text') {
            throw new Error('Unexpected response type');
        }
        
        const indices: number[] = JSON.parse(content.text);
        return indices
            .filter(i => i >= 0 && i < results.length)
            .slice(0, topN)
            .map(i => results[i]);
    } catch {
        return results.slice(0, topN);
    }
}
```

---

## API Design

### Main Router

```typescript
// src/api/routes.ts
import { Hono } from 'hono';
import { cors } from 'hono/cors';
import { zValidator } from '@hono/zod-validator';
import { z } from 'zod';
import { queryHandler } from './handlers/query';
import { ingestHandler, ingestDiscordHandler } from './handlers/ingest';
import { authMiddleware } from './middleware/auth';

const app = new Hono();

// Middleware
app.use('*', cors());
app.use('/api/*', authMiddleware);

// Health check
app.get('/health', (c) => c.json({ status: 'ok' }));

// Query endpoint
const querySchema = z.object({
    query: z.string().min(1).max(1000),
    options: z.object({
        limit: z.number().min(1).max(20).optional(),
        sourceTypes: z.array(z.string()).optional(),
        useReranking: z.boolean().optional(),
        includeContext: z.boolean().optional(),
    }).optional(),
});

app.post('/api/query', zValidator('json', querySchema), queryHandler);

// Ingest endpoints
const ingestDocumentSchema = z.object({
    sourceType: z.enum(['file', 'google_drive', 'notion']),
    sourceId: z.string(),
    sourceUrl: z.string().optional(),
    title: z.string().optional(),
    content: z.string(),
    contentType: z.string().optional(),
    metadata: z.record(z.any()).optional(),
});

app.post('/api/ingest/document', zValidator('json', ingestDocumentSchema), ingestHandler);

const ingestDiscordSchema = z.object({
    channels: z.array(z.string()).optional(),  // If empty, sync all
});

app.post('/api/ingest/discord', zValidator('json', ingestDiscordSchema), ingestDiscordHandler);

// Admin endpoints
app.get('/api/admin/stats', async (c) => {
    const { query } = await import('../db/client');
    
    const [docCount] = await query<{ count: string }>(
        'SELECT COUNT(*) as count FROM documents'
    );
    const [chunkCount] = await query<{ count: string }>(
        'SELECT COUNT(*) as count FROM chunks'
    );
    const [sourceBreakdown] = await query<{ breakdown: Record<string, number> }>(
        `SELECT json_object_agg(source_type, count) as breakdown
         FROM (SELECT source_type, COUNT(*) as count FROM documents GROUP BY source_type) t`
    );
    
    return c.json({
        documents: parseInt(docCount.count),
        chunks: parseInt(chunkCount.count),
        bySource: sourceBreakdown.breakdown,
    });
});

export default app;
```

### Query Handler

```typescript
// src/api/handlers/query.ts
import { Context } from 'hono';
import { hybridSearch } from '../../retrieval/hybrid';
import { rerankWithCohere } from '../../retrieval/rerank';
import { generateResponse } from '../../generation/response';
import { query } from '../../db/client';

export async function queryHandler(c: Context) {
    const startTime = Date.now();
    const body = await c.req.json();
    const { query: queryText, options = {} } = body;
    
    // Get user context from auth middleware
    const user = c.get('user');
    
    try {
        // 1. Retrieve relevant chunks
        let results = await hybridSearch(queryText, {
            limit: options.limit || 10,
            sourceTypes: options.sourceTypes,
            userGroups: user?.groups,
        });
        
        const retrievalTime = Date.now() - startTime;
        
        // 2. Re-rank if requested and we have enough results
        if (options.useReranking !== false && results.length > 3) {
            results = await rerankWithCohere(queryText, results, { topN: 5 });
        }
        
        // 3. Generate response
        const generationStart = Date.now();
        const response = await generateResponse(queryText, results, {
            includeContext: options.includeContext,
        });
        const generationTime = Date.now() - generationStart;
        
        // 4. Log query for analytics
        await logQuery({
            queryText,
            chunkIds: results.map(r => r.chunkId),
            response: response.answer,
            retrievalTime,
            generationTime,
            totalTime: Date.now() - startTime,
            userId: user?.id,
        });
        
        return c.json({
            answer: response.answer,
            sources: results.map(r => ({
                title: r.title,
                sourceType: r.sourceType,
                sourceUrl: r.sourceUrl,
                relevance: r.similarity,
                ...(options.includeContext && { excerpt: r.content.slice(0, 200) }),
            })),
            metadata: {
                retrievalTimeMs: retrievalTime,
                generationTimeMs: generationTime,
                totalTimeMs: Date.now() - startTime,
            },
        });
    } catch (error) {
        console.error('Query failed:', error);
        return c.json({ error: 'Query processing failed' }, 500);
    }
}

async function logQuery(data: {
    queryText: string;
    chunkIds: string[];
    response: string;
    retrievalTime: number;
    generationTime: number;
    totalTime: number;
    userId?: string;
}) {
    await query(
        `INSERT INTO query_logs (
            query_text, retrieved_chunk_ids, generated_response,
            retrieval_time_ms, generation_time_ms, total_time_ms, user_id
        ) VALUES ($1, $2, $3, $4, $5, $6, $7)`,
        [
            data.queryText,
            data.chunkIds,
            data.response,
            data.retrievalTime,
            data.generationTime,
            data.totalTime,
            data.userId,
        ]
    );
}
```

### Response Generation

```typescript
// src/generation/response.ts
import Anthropic from '@anthropic-ai/sdk';
import { SearchResult } from '../retrieval/types';

const anthropic = new Anthropic();

interface GenerationOptions {
    includeContext?: boolean;
    maxTokens?: number;
}

interface GeneratedResponse {
    answer: string;
    tokensUsed: number;
}

export async function generateResponse(
    query: string,
    context: SearchResult[],
    options: GenerationOptions = {}
): Promise<GeneratedResponse> {
    const { maxTokens = 1000 } = options;
    
    // Build context string
    const contextText = context
        .map((r, i) => {
            const source = r.title || r.sourceType;
            return `[Source ${i + 1}: ${source}]\n${r.content}`;
        })
        .join('\n\n---\n\n');
    
    const systemPrompt = `You are a helpful assistant that answers questions based on the provided context from internal company documents and Discord messages.

Guidelines:
- Answer based ONLY on the provided context
- If the context doesn't contain relevant information, say so clearly
- Cite your sources by referencing [Source N] when using information from that source
- Be concise but complete
- If information seems outdated or contradictory, mention this
- For Discord messages, be aware they may be informal or incomplete`;

    const response = await anthropic.messages.create({
        model: 'claude-sonnet-4-20250514',
        max_tokens: maxTokens,
        system: systemPrompt,
        messages: [{
            role: 'user',
            content: `Context:
${contextText}

Question: ${query}

Please answer based on the context above.`
        }],
    });
    
    const content = response.content[0];
    if (content.type !== 'text') {
        throw new Error('Unexpected response type');
    }
    
    return {
        answer: content.text,
        tokensUsed: response.usage.output_tokens,
    };
}
```

---

## Incremental Sync & Updates

### Background Sync Service

```typescript
// src/sync/scheduler.ts
import { DiscordIngestionService } from '../ingestion/sources/discord';
import { query } from '../db/client';

interface SyncJob {
    sourceType: string;
    sourceIdentifier: string;
    intervalMs: number;
    lastRun?: Date;
}

export class SyncScheduler {
    private jobs: Map<string, NodeJS.Timeout> = new Map();
    private discordService: DiscordIngestionService;

    constructor(discordToken: string, guildId: string) {
        this.discordService = new DiscordIngestionService({ token: discordToken, guildId });
    }

    /**
     * Start scheduled sync jobs
     */
    async start() {
        // Sync Discord every 15 minutes
        this.scheduleJob({
            sourceType: 'discord',
            sourceIdentifier: 'all',
            intervalMs: 15 * 60 * 1000,
        }, async () => {
            console.log('Running Discord sync...');
            const result = await this.discordService.syncAll();
            console.log(`Discord sync complete: ${result.messagesSynced} messages`);
        });
        
        // Add other sync jobs here (Google Drive, etc.)
    }
    
    /**
     * Stop all sync jobs
     */
    stop() {
        for (const [key, timeout] of this.jobs) {
            clearInterval(timeout);
            this.jobs.delete(key);
        }
    }
    
    private scheduleJob(job: SyncJob, handler: () => Promise<void>) {
        const key = `${job.sourceType}:${job.sourceIdentifier}`;
        
        // Run immediately, then schedule
        handler().catch(console.error);
        
        const timeout = setInterval(() => {
            handler().catch(console.error);
        }, job.intervalMs);
        
        this.jobs.set(key, timeout);
    }
}

/**
 * Handle document updates (for webhooks or file watchers)
 */
export async function handleDocumentUpdate(
    sourceType: string,
    sourceId: string,
    newContent: string
) {
    const { ingestDocument } = await import('../ingestion/pipeline');
    
    // Re-ingest will handle deduplication and updating
    await ingestDocument({
        sourceType: sourceType as any,
        sourceId,
        content: newContent,
    });
}

/**
 * Handle document deletion
 */
export async function handleDocumentDeletion(
    sourceType: string,
    sourceId: string
) {
    await query(
        `DELETE FROM documents WHERE source_type = $1 AND source_id = $2`,
        [sourceType, sourceId]
    );
    // Chunks are deleted via CASCADE
}
```

---

## Access Control

### Authentication Middleware

```typescript
// src/api/middleware/auth.ts
import { Context, Next } from 'hono';
import { verify } from 'jsonwebtoken';

interface User {
    id: string;
    email: string;
    groups: string[];
}

declare module 'hono' {
    interface ContextVariableMap {
        user: User | null;
    }
}

export async function authMiddleware(c: Context, next: Next) {
    const authHeader = c.req.header('Authorization');
    
    if (!authHeader?.startsWith('Bearer ')) {
        // Allow unauthenticated access with limited permissions
        c.set('user', null);
        return next();
    }
    
    try {
        const token = authHeader.slice(7);
        const decoded = verify(token, process.env.JWT_SECRET!) as User;
        c.set('user', decoded);
    } catch {
        c.set('user', null);
    }
    
    return next();
}
```

### Document-Level Access Control

```typescript
// src/retrieval/accessControl.ts
import { query } from '../db/client';

interface AccessCheckResult {
    allowed: boolean;
    reason?: string;
}

/**
 * Check if a user can access a document
 */
export async function canAccessDocument(
    documentId: string,
    userGroups: string[] | null
): Promise<AccessCheckResult> {
    const [doc] = await query<{ 
        visibility: string; 
        allowed_groups: string[] 
    }>(
        'SELECT visibility, allowed_groups FROM documents WHERE id = $1',
        [documentId]
    );
    
    if (!doc) {
        return { allowed: false, reason: 'Document not found' };
    }
    
    // Public documents are accessible to all
    if (doc.visibility === 'public') {
        return { allowed: true };
    }
    
    // Internal documents require authentication
    if (doc.visibility === 'internal') {
        if (!userGroups) {
            return { allowed: false, reason: 'Authentication required' };
        }
        return { allowed: true };
    }
    
    // Restricted documents require specific group membership
    if (doc.visibility === 'restricted') {
        if (!userGroups) {
            return { allowed: false, reason: 'Authentication required' };
        }
        
        const hasAccess = doc.allowed_groups?.some(g => userGroups.includes(g));
        if (!hasAccess) {
            return { allowed: false, reason: 'Insufficient permissions' };
        }
        return { allowed: true };
    }
    
    return { allowed: false, reason: 'Unknown visibility level' };
}

/**
 * Build SQL condition for access control filtering
 */
export function buildAccessControlCondition(
    userGroups: string[] | null,
    tableAlias: string = 'd'
): { condition: string; params: any[] } {
    if (!userGroups) {
        return {
            condition: `${tableAlias}.visibility = 'public'`,
            params: [],
        };
    }
    
    return {
        condition: `(
            ${tableAlias}.visibility = 'public' OR
            ${tableAlias}.visibility = 'internal' OR
            (${tableAlias}.visibility = 'restricted' AND ${tableAlias}.allowed_groups && $1)
        )`,
        params: [userGroups],
    };
}
```

---

## Testing & Evaluation

### Unit Tests

```typescript
// src/__tests__/chunking.test.ts
import { describe, it, expect } from 'vitest';
import { chunkText, countTokens } from '../ingestion/chunkers/recursive';

describe('Chunking', () => {
    it('should split long text into chunks', async () => {
        const longText = 'This is a test. '.repeat(100);
        const chunks = await chunkText(longText, {
            chunkSize: 50,
            chunkOverlap: 10,
            minChunkSize: 20,
        });
        
        expect(chunks.length).toBeGreaterThan(1);
        chunks.forEach(chunk => {
            expect(countTokens(chunk.content)).toBeLessThanOrEqual(60); // Allow some margin
        });
    });
    
    it('should preserve small documents', async () => {
        const shortText = 'This is a short document.';
        const chunks = await chunkText(shortText, {
            chunkSize: 100,
            chunkOverlap: 10,
            minChunkSize: 5,
        });
        
        expect(chunks.length).toBe(1);
        expect(chunks[0].content).toBe(shortText);
    });
});
```

### Retrieval Quality Evaluation

```typescript
// src/__tests__/retrieval.test.ts
import { describe, it, expect } from 'vitest';

interface TestCase {
    query: string;
    expectedDocIds: string[];  // Documents that should be in top results
    minRecall: number;         // Minimum fraction of expected docs to find
}

const testCases: TestCase[] = [
    {
        query: 'How do I request PTO?',
        expectedDocIds: ['hr-policy-pto', 'employee-handbook-ch5'],
        minRecall: 0.5,
    },
    {
        query: 'What is the deployment process?',
        expectedDocIds: ['devops-runbook', 'ci-cd-guide'],
        minRecall: 0.5,
    },
];

describe('Retrieval Quality', () => {
    it.each(testCases)('should retrieve relevant docs for: $query', async ({ query, expectedDocIds, minRecall }) => {
        const { hybridSearch } = await import('../retrieval/hybrid');
        
        const results = await hybridSearch(query, { limit: 10 });
        const foundDocIds = results.map(r => r.documentId);
        
        const found = expectedDocIds.filter(id => foundDocIds.includes(id));
        const recall = found.length / expectedDocIds.length;
        
        expect(recall).toBeGreaterThanOrEqual(minRecall);
    });
});
```

### End-to-End Tests

```typescript
// src/__tests__/e2e/query.test.ts
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import app from '../../api/routes';

describe('Query API', () => {
    it('should answer a simple question', async () => {
        const response = await app.request('/api/query', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({
                query: 'What are the office hours?',
            }),
        });
        
        expect(response.status).toBe(200);
        
        const data = await response.json();
        expect(data.answer).toBeDefined();
        expect(data.sources).toBeInstanceOf(Array);
        expect(data.metadata.totalTimeMs).toBeDefined();
    });
    
    it('should handle queries with no relevant context', async () => {
        const response = await app.request('/api/query', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({
                query: 'What is the meaning of life according to internal docs?',
            }),
        });
        
        expect(response.status).toBe(200);
        
        const data = await response.json();
        // Should gracefully handle no relevant results
        expect(data.answer).toContain('context');  // Should mention lack of context
    });
});
```

---

## Deployment Considerations

### Environment Variables

```bash
# .env.example

# Database
DATABASE_URL=postgresql://user:pass@localhost:5432/knowledge_base

# OpenAI (for embeddings)
OPENAI_API_KEY=sk-...

# Anthropic (for generation)
ANTHROPIC_API_KEY=sk-ant-...

# Cohere (for re-ranking, optional)
COHERE_API_KEY=...

# Discord
DISCORD_BOT_TOKEN=...
DISCORD_GUILD_ID=...

# Auth
JWT_SECRET=your-secret-key

# Server
PORT=3000
NODE_ENV=production
```

### Docker Setup

```dockerfile
# Dockerfile
FROM node:20-alpine AS builder

WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine AS runner

WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY package*.json ./

ENV NODE_ENV=production
EXPOSE 3000

CMD ["node", "dist/index.js"]
```

```yaml
# docker-compose.yml
version: '3.8'

services:
  api:
    build: .
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgresql://postgres:postgres@db:5432/knowledge_base
    depends_on:
      - db
    
  db:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_DB: knowledge_base
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    volumes:
      - pgdata:/var/lib/postgresql/data
    ports:
      - "5432:5432"

volumes:
  pgdata:
```

### Performance Tuning

```sql
-- Tune pgvector for your dataset size

-- For ~100k vectors: lists = 100
-- For ~1M vectors: lists = 1000  
-- For ~10M vectors: lists = 5000

-- Rebuild index with optimal settings
DROP INDEX IF EXISTS chunks_embedding_ivfflat_idx;
CREATE INDEX chunks_embedding_ivfflat_idx ON chunks 
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 1000);

-- Set probes at query time (higher = better recall, slower)
SET ivfflat.probes = 10;

-- Or use HNSW for better recall (more memory)
CREATE INDEX chunks_embedding_hnsw_idx ON chunks 
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
SET hnsw.ef_search = 100;
```

### Monitoring

Key metrics to track:
- Query latency (p50, p95, p99)
- Retrieval recall (manual sampling)
- Embedding API costs
- LLM API costs
- Index size and query performance over time

---

## Quick Start Checklist

1. [ ] Set up PostgreSQL with pgvector extension
2. [ ] Run database migrations
3. [ ] Configure environment variables
4. [ ] Ingest initial documents
5. [ ] Set up Discord integration
6. [ ] Configure authentication
7. [ ] Run tests
8. [ ] Deploy API
9. [ ] Set up monitoring
10. [ ] Schedule incremental sync jobs

---

## Further Reading

- [pgvector documentation](https://github.com/pgvector/pgvector)
- [OpenAI Embeddings Guide](https://platform.openai.com/docs/guides/embeddings)
- [Anthropic Claude API](https://docs.anthropic.com/claude/reference)
- [LangChain.js Documentation](https://js.langchain.com/)
- [Cohere Rerank](https://docs.cohere.com/docs/reranking)