RAG architecture — self-hosted semantic search

Status: Live since Phase 71 (2026-06-05). Replaces the Google NotebookLM bridge that shipped in Phase 36.

TL;DR: every Arc project's wiki + issues + skills are embedded into a per-project vector index that lives inside the existing SQLite SSOT. Queries return semantically relevant passages across project content + a shared global skill pool. No external retrieval service, no per-user Google session, no source caps.


1. Why we moved off NotebookLM

The Phase 36 bridge wrapped Google NotebookLM as a vector store using reverse-engineered cookie auth. It produced two problems that the underlying product can't fix:

  1. Source cap. NotebookLM Plus tops out at 100 sources per notebook. By mid-2026 several Arc projects had blown past the cap; eviction logic turned into a permanent band-aid (#324).
  2. Personal-account coupling. Every new project created a notebook inside the CEO's personal NotebookLM. Scaling to N customers meant N notebooks in one person's sidebar — an unmanaged side-effect the bridge couldn't decouple.

Phase 71 swaps the retrieval layer for a self-hosted pipeline the user's Google account never touches.


2. Component map

                Write path                          Read path
                ──────────                          ─────────
 wiki.ts handleWikiSave ────┐                 chat.ts executeAskNotebooklm ──┐
 cli-routes.ts handleCreate/                                                 │
   UpdateIssue ─────────────┤              shared/routes/rag-search.ts ──────┤
 skills.ts handleSave/      │                 GET /api/crm/projects/:name/   │
   Delete + global CRUD ────┤                 rag/search                     │
                            │                                                │
                            ▼                                                ▼
              shared/rag-hooks.ts                              shared/rag.ts search()
                syncIssue / syncWiki /                            │  embeds query →
                syncSkill (fire-and-forget)                       │  KNN over vec0 →
                            │                                     │  project + doc_type filter
                            ▼                                     │  → top-K passages
              shared/rag.ts upsert()                              │
                paragraph-aware chunker (~1800 chars)              │
                  → atomic transaction:                            │
                       embeddings ⨯ embeddings_vec                 │
                            │                                     │
                            ▼                                     │
              shared/embeddings.ts ◄────────────────────────────  │
                Cohere embed-multilingual-v3.0                    │
                (1024-dim float32, batch ≤96 inputs/call)         │
                            │                                     │
                            ▼                                     │
              SQLite (data/citadel.db)  ◄──────────────────────── ┘
                 ┌──────────────────────┐   ┌─────────────────────┐
                 │ embeddings           │   │ embeddings_vec      │
                 │ (project, doc_type,  │   │ (vec0 virtual,      │
                 │  doc_id, chunk_ix,   │   │  embedding FLOAT    │
                 │  text, created_at)   │ ──┤  [1024])            │
                 │ id ←→ rowid          │   │                     │
                 └──────────────────────┘   └─────────────────────┘
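
The paragraph-aware chunker in the write path can be sketched roughly as follows. This is a minimal illustration, not the code in shared/rag.ts: the function name `chunkByParagraph`, the blank-line paragraph split, and the hard-split fallback for oversized paragraphs are all assumptions. Only the ~1800-character target comes from the diagram.

```typescript
// Hypothetical sketch of a paragraph-aware chunker; names and fallback
// behaviour are assumptions, only the ~1800-char budget is from the diagram.
const MAX_CHUNK_CHARS = 1800;

function chunkByParagraph(text: string, maxChars: number = MAX_CHUNK_CHARS): string[] {
  // Split on blank lines so paragraphs stay intact where possible.
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";

  for (const para of paragraphs) {
    if (para.length > maxChars) {
      // Assumed fallback: a single oversized paragraph is hard-split.
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < para.length; i += maxChars) {
        chunks.push(para.slice(i, i + maxChars));
      }
      continue;
    }
    // Greedily pack paragraphs until the next one would exceed the budget.
    const candidate = current ? current + "\n\n" + para : para;
    if (candidate.length > maxChars) {
      chunks.push(current);
      current = para;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each returned string would become one row in embeddings, with its array index as chunk_ix.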

2.1 Modules at a glance

| Module | Role |
|---|---|
| shared/embeddings.ts | Cohere client: embedBatch(texts, inputType) for indexing + embedQuery(text) for queries. Retry with 3× exponential backoff on transient errors, fail-fast on 401/403. |
| shared/rag.ts | Storage layer: upsert / search / removeDoc / removeProject / stats. Owns the chunker and the atomic embeddings ⨯ embeddings_vec transaction. |
| shared/rag-hooks.ts | Fire-and-forget wrappers (syncIssue, syncWiki, syncSkill + remove* mirrors). Errors print but never roll back the disk/SQL write that triggered them. |
| shared/routes/rag-search.ts | Public HTTP entry point. GET /api/crm/projects/:name/rag/search?q=... |
| scripts/phase-71-backfill-rag.ts | One-shot idempotent seeder for existing content. --dry-run / --force / --project / --throttle flags. |
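
The retry policy described for shared/embeddings.ts might look something like the sketch below. The `HttpError` class, the `withRetry` name, and the 500 ms base delay are hypothetical, and "3×" is read here as up to three retries after the first attempt. The source states only that transient errors are retried with exponential backoff and that 401/403 fail fast.

```typescript
// Hypothetical error type; the real client may expose the HTTP status differently.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

// Auth failures will not fix themselves, so they are never retried.
const FATAL_STATUSES = new Set([401, 403]);

async function withRetry<T>(
  call: () => Promise<T>,
  maxRetries: number = 3,
  baseDelayMs: number = 500, // assumed value, not from the source
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const fatal = err instanceof HttpError && FATAL_STATUSES.has(err.status);
      if (fatal || attempt >= maxRetries) throw err;
      // Delay doubles on every retry: base, 2×base, 4×base.
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

Passing `sleep` as a parameter keeps the helper testable without real timers.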

3. Embedding choice — Cohere embed-multilingual-v3.0

| Property | Value |
|---|---|
| Dimensionality | 1024-dim float32 |
| Multilingual | Native: Ukrainian + English + 100+ languages share a sub-space |
| Cost | ~$0.10 / 1M tokens (Production tier). Estimated ~$4/month at 50-user scale per the cost model. |
| Sub-space split | input_type="search_document" for indexing vs input_type="search_query" for runtime queries. Mixing the two tanks recall. |
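
To make the input_type split and the 96-input batch limit concrete, here is a small sketch that only builds request bodies and sends nothing. The `buildEmbedRequests` helper and the `EmbedRequestBody` type are hypothetical. The field names follow Cohere's embed endpoint as commonly documented, so treat the exact wire shape as an assumption to check against the API reference.

```typescript
type InputType = "search_document" | "search_query";

const COHERE_MODEL = "embed-multilingual-v3.0";
const MAX_BATCH = 96; // per-call input limit noted in the component map

// Assumed request shape; verify against the Cohere API reference.
interface EmbedRequestBody {
  model: string;
  texts: string[];
  input_type: InputType;
}

// Split texts into request bodies of at most MAX_BATCH inputs each.
function buildEmbedRequests(texts: string[], inputType: InputType): EmbedRequestBody[] {
  const bodies: EmbedRequestBody[] = [];
  for (let i = 0; i < texts.length; i += MAX_BATCH) {
    bodies.push({
      model: COHERE_MODEL,
      texts: texts.slice(i, i + MAX_BATCH),
      input_type: inputType,
    });
  }
  return bodies;
}

// Indexing uses search_document; runtime queries use search_query.
const indexBodies = buildEmbedRequests(["chunk one", "chunk two"], "search_document");
const queryBodies = buildEmbedRequests(["How does the multi-tenancy gate work?"], "search_query");
```

Keeping the input type as a required argument makes it harder to mix the two sub-spaces by accident.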

Live cross-lingual smoke against the prod corpus:

| Query | Lang | Top hit | Distance |
|---|---|---|---|
| Safari cookie auth bug | EN | issue/174 (auth/password reset) | 1.00 |
| множинні тенанти | UK | issue/49 (Phase 53.11.2 Multi-Worker TG Topics Mode) | 1.07 |

The UK query beat the corresponding EN-EN distance — multilingual sub-space alignment is real, not a marketing claim.


4. Storage — sqlite-vec inside the SSOT

Migration 049_embeddings creates two tables:

CREATE TABLE embeddings (
  id          INTEGER PRIMARY KEY AUTOINCREMENT,
  project     TEXT NOT NULL,
  doc_type    TEXT NOT NULL,         -- 'wiki' | 'issue' | 'skill' | 'transcript'
  doc_id      TEXT NOT NULL,         -- filename, issue id, skill name
  chunk_ix    INTEGER NOT NULL DEFAULT 0,
  text        TEXT NOT NULL,
  created_at  TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE VIRTUAL TABLE embeddings_vec USING vec0(
  embedding FLOAT[1024]
);
-- embeddings.id ←→ embeddings_vec.rowid

The split keeps metadata in regular SQLite (joinable with wiki/issues/skills tables, easy LIST/DELETE) while the vec0 virtual table holds the dense vectors for KNN search.

Sizing. Each 1024-dim float32 vector takes 4096 bytes, so 100K rows come to roughly 400MB of raw vector data. Fine at the current scale; revisit at 1M+ rows.
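
The sizing arithmetic can be checked with a few lines. The helper name `vectorBytes` is illustrative, and the figures cover raw vector bytes only, not the text column or any per-row overhead in the vec0 table.

```typescript
const DIMS = 1024;
const BYTES_PER_FLOAT32 = 4;

// Raw vector storage only; metadata and index overhead are not modelled.
function vectorBytes(rows: number, dims: number = DIMS): number {
  return rows * dims * BYTES_PER_FLOAT32;
}

const perRow = vectorBytes(1);        // 4096 bytes
const at100k = vectorBytes(100_000);  // 409_600_000 bytes, about 410 MB
const at1m = vectorBytes(1_000_000);  // 4_096_000_000 bytes, about 4.1 GB
```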

Loading. sqlite-vec is loaded as a runtime extension in shared/db.ts initDb() before migrations run. Migration 049 sanity-checks vec_version() and aborts cleanly if the extension didn't load.


5. Write path — re-embed hooks (Phase 71.5)

Every write surface that mutates indexable content fires a fire-and-forget hook so the embedding stays fresh without a separate sync job.

| Trigger | Hook | Notes |
|---|---|---|
| PUT /api/crm/projects/:name/wiki/save | syncWiki(project, path, content) | After Bun.write succeeds |
| POST /api/mcp/issues/:project (create) | syncIssue(project, id, title, body) | After SQLite write |
| PUT /api/mcp/issues/:project/:id (update) | syncIssue(project, id, title, body) | Same |
| POST /api/mcp/issues/:project/:id/log (activity) | none | Activity log doesn't change indexed title/body; re-embedding would burn quota on a no-op |
| PUT /api/crm/projects/:name/skills/save | syncSkill(project, name, content) | Per-project skill |
| DELETE /api/crm/projects/:name/skills/delete | removeSkill(project, name) | Mirror |
| POST /api/crm/skills (global create) | syncSkill("_global_", name, content) | Global pool |
| PUT /api/crm/skills/:id (global update) | syncSkill("_global_", name, content) | Only if content field actually changed |
| DELETE /api/crm/skills/:id (global delete) | removeSkill("_global_", name) | Mirror |
| DELETE /api/auth/account (GDPR Art. 17) | removeProject(name) per owned project | Embeddings don't survive account deletion |
| Transcript pipeline (Phase 73.6): status → summarized → embedding → done | upsert(project, "transcript", id, ragText) via transcript-worker.ts | ragText = transcript text + [Xs] frame descriptions + summary tldr/key_points/decisions/action_items. Skipped when embed_to_rag=0. Non-fatal: RAG failure still finalises job to done. |

Rule: a Cohere error prints one line and the hook stops there. It never rolls back the disk/SQL write that triggered it. Persistent failures converge back through the nightly backfill, not through retries in the hot path.
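
A fire-and-forget hook in this spirit could be shaped like the sketch below. `fireAndForget`, `syncWikiSketch`, and the `UpsertFn` type are hypothetical stand-ins, and the real syncWiki signature in shared/rag-hooks.ts may differ. The point is that the caller never awaits the embedding work and never sees its errors.

```typescript
// Assumed shape of the storage-layer upsert; the real signature may differ.
type UpsertFn = (project: string, docType: string, docId: string, text: string) => Promise<void>;

// Start the work without awaiting it; any failure becomes a single log line.
function fireAndForget(
  label: string,
  work: () => Promise<void>,
  log: (line: string) => void = console.error,
): void {
  // Promise.resolve().then(work) also captures synchronous throws from work().
  Promise.resolve()
    .then(work)
    .catch((err: unknown) => {
      const msg = err instanceof Error ? err.message : String(err);
      log(`[rag-hooks] ${label} failed: ${msg}`);
    });
}

// Hypothetical hook: called after the wiki file write has already succeeded.
function syncWikiSketch(upsert: UpsertFn, project: string, path: string, content: string): void {
  fireAndForget(`syncWiki ${project}/${path}`, () => upsert(project, "wiki", path, content));
}
```

Because the hook returns void immediately, a slow or failing embedding call cannot delay or undo the write that triggered it.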


6. Read path — search() + per-surface façades

import { search } from "shared/rag";

const hits = await search("arc-v2", "How does the multi-tenancy gate work?", {
  k: 6,
  doc_types: ["wiki", "issue", "skill", "transcript"], // optional narrow
});
// hits: Array<{ doc_type, doc_id, chunk_ix, text, distance }> sorted by distance

Under the hood:

  1. Embed the query with input_type="search_query" (different sub-space than search_document).
  2. KNN over embeddings_vec with k * 4 overfetch.
  3. Filter by project + optional doc_type at the SQL JOIN.
  4. Return top-K passages sorted by ascending L2 distance.
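
The four steps above can be mimicked in memory to show why the overfetch exists. This is an illustrative stand-in: a plain sort replaces the vec0 KNN query, `searchSketch` and the `Row` shape are hypothetical, and the query vector is assumed to be already embedded (step 1).

```typescript
interface Row {
  id: number; // embeddings.id, mirrors embeddings_vec.rowid
  project: string;
  doc_type: string;
  doc_id: string;
  chunk_ix: number;
  text: string;
  vector: number[];
}

interface Hit {
  doc_type: string;
  doc_id: string;
  chunk_ix: number;
  text: string;
  distance: number;
}

// Euclidean (L2) distance between two vectors of equal length.
function l2(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - (b[i] ?? 0)) ** 2, 0));
}

function searchSketch(
  rows: Row[],
  queryVec: number[],
  project: string,
  k: number,
  docTypes?: string[],
): Hit[] {
  // Step 2: nearest neighbours across all rows, overfetching k * 4.
  const nearest = rows
    .map((row) => ({ row, distance: l2(row.vector, queryVec) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k * 4);

  // Step 3: filter by project and optional doc_type (the SQL JOIN in the real path).
  // Step 4: keep the top-K; the order is already ascending by distance.
  return nearest
    .filter(({ row }) => row.project === project && (!docTypes || docTypes.includes(row.doc_type)))
    .slice(0, k)
    .map(({ row, distance }) => ({
      doc_type: row.doc_type,
      doc_id: row.doc_id,
      chunk_ix: row.chunk_ix,
      text: row.text,
      distance,
    }));
}
```

Since the filter runs after the KNN step, the overfetch is what leaves room for rows from other projects to be discarded while still returning up to k hits. If other projects dominate the nearest k * 4 rows, fewer than k hits can come back.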

Public façades


7. Backfill — scripts/phase-71-backfill-rag.ts

bun scripts/phase-71-backfill-rag.ts                       # all projects + global
bun scripts/phase-71-backfill-rag.ts --dry-run             # inventory only
bun scripts/phase-71-backfill-rag.ts --project arc-v2      # one project + global
bun scripts/phase-71-backfill-rag.ts --skip-global         # projects only
bun scripts/phase-71-backfill-rag.ts --doc-types wiki,issue
bun scripts/phase-71-backfill-rag.ts --force               # re-embed even if row exists
bun scripts/phase-71-backfill-rag.ts --limit 100           # cap docs per project
bun scripts/phase-71-backfill-rag.ts --throttle 700        # ms between Cohere calls (default 700; set 0 on Production keys)

Idempotency. Each candidate is checked for (project, doc_type, doc_id) presence in embeddings and skipped if found, unless --force is set. Re-runs are zero-cost in Cohere quota.
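
The idempotency check boils down to a set lookup on the (project, doc_type, doc_id) triple. The sketch below is a hypothetical in-memory version. The real script presumably queries the embeddings table instead, and `planBackfill` and `Candidate` are illustrative names.

```typescript
interface Candidate {
  project: string;
  doc_type: string;
  doc_id: string;
}

// Serialise the triple so it can be used as a Set key without ambiguity.
const keyOf = (c: Candidate): string => JSON.stringify([c.project, c.doc_type, c.doc_id]);

// Return the candidates that still need embedding.
function planBackfill(candidates: Candidate[], existing: Candidate[], force: boolean): Candidate[] {
  if (force) return candidates; // --force re-embeds everything
  const seen = new Set(existing.map(keyOf));
  return candidates.filter((c) => !seen.has(keyOf(c)));
}
```

On a second run with unchanged content every candidate is already in `existing`, so the plan is empty and no Cohere call is made.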

Production run (2026-06-05). 814 candidates across 20 projects + _global_ (37 wiki + 482 issue + 295 skill). The first pass burned through the Trial key's 1000-call monthly cap while processing global skills. After upgrading to a Production key, the remaining 178 docs finished in 57s with zero errors. Total chunks written: ~3,150.


8. Security boundaries


9. Migration from NotebookLM (one-time, 2026-06-05)

| Step | Done |
|---|---|
| Cohere API key + Platform Settings tab + live probe | #358 |
| sqlite-vec extension + migration 049 | #359 |
| shared/embeddings.ts client | #360 |
| shared/rag.ts façade | #361 |
| Re-embed hooks on wiki / issue / skill writes | #362 |
| Backfill script + prod run | #363 |
| Swap call sites (chat.ts, skills.ts, new /rag/search, arc kb search) | #364 |
| Decommission services/notebooklm-bridge/ + drop projects.notebook_id (migration 050) | #365 |
| Docs (this page + 7 locale stubs) | #366 |
| Soak validation | #367 |

10. Operating notes


11. References