RAG architecture — self-hosted semantic search

Status: Live since Phase 71 (2026-06-05). Replaces the Google NotebookLM bridge that shipped in Phase 36.

TL;DR: every Arc project's wiki + issues + skills are embedded into a per-project vector index that lives inside the existing SQLite SSOT. Queries return semantically relevant passages across project content + a shared global skill pool. No external retrieval service, no per-user Google session, no source caps.

1. Why we moved off NotebookLM

The Phase 36 bridge wrapped Google NotebookLM as a vector store using reverse-engineered cookie auth. It produced two problems that the underlying product can't fix:

Source cap. NotebookLM Plus tops out at 100 sources per notebook. By mid-2026 several Arc projects had blown past the cap; eviction logic turned into a permanent band-aid (#324).
Personal-account coupling. Every new project created a notebook inside the CEO's personal NotebookLM. Scaling to N customers meant N notebooks in one person's sidebar — an unmanaged side-effect the bridge couldn't decouple.

Phase 71 swaps the retrieval layer for a self-hosted pipeline the user's Google account never touches.

2. Component map

                Write path                          Read path
                ──────────                          ─────────
 wiki.ts handleWikiSave ────┐                 chat.ts executeAskNotebooklm ──┐
 cli-routes.ts handleCreate/                                                 │
   UpdateIssue ─────────────┤              shared/routes/rag-search.ts ──────┤
 skills.ts handleSave/      │                 GET /api/crm/projects/:name/   │
   Delete + global CRUD ────┤                 rag/search                     │
                            │                                                │
                            ▼                                                ▼
              shared/rag-hooks.ts                              shared/rag.ts search()
                syncIssue / syncWiki /                            │  embeds query →
                syncSkill (fire-and-forget)                       │  KNN over vec0 →
                            │                                     │  project + doc_type filter
                            ▼                                     │  → top-K passages
              shared/rag.ts upsert()                              │
                paragraph-aware chunker (~1800 chars)              │
                  → atomic transaction:                            │
                       embeddings ⨯ embeddings_vec                 │
                            │                                     │
                            ▼                                     │
              shared/embeddings.ts ◄────────────────────────────  │
                Cohere embed-multilingual-v3.0                    │
                (1024-dim float32, batch ≤96 inputs/call)         │
                            │                                     │
                            ▼                                     │
              SQLite (data/citadel.db)  ◄──────────────────────── ┘
                 ┌──────────────────────┐   ┌─────────────────────┐
                 │ embeddings           │   │ embeddings_vec      │
                 │ (project, doc_type,  │   │ (vec0 virtual,      │
                 │  doc_id, chunk_ix,   │   │  embedding FLOAT    │
                 │  text, created_at)   │ ──┤  [1024])            │
                 │ id ←→ rowid          │   │                     │
                 └──────────────────────┘   └─────────────────────┘
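
The "paragraph-aware chunker (~1800 chars)" box in the write path could look roughly like the sketch below. This is an illustration, not the code in shared/rag.ts: the function name `chunkByParagraph`, the blank-line paragraph rule, and the hard split for oversized paragraphs are all assumptions.

```typescript
// Hypothetical sketch; the real chunker in shared/rag.ts may differ.
const MAX_CHUNK_CHARS = 1800; // "~1800 chars" from the component map

function chunkByParagraph(text: string, maxChars: number = MAX_CHUNK_CHARS): string[] {
  // Assumption: paragraphs are separated by one or more blank lines.
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";

  for (const para of paragraphs) {
    if (para.length > maxChars) {
      // Assumption: a paragraph longer than the budget is hard-split.
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < para.length; i += maxChars) {
        chunks.push(para.slice(i, i + maxChars));
      }
      continue;
    }
    // Pack whole paragraphs together until the next one would overflow.
    const candidate = current ? `${current}\n\n${para}` : para;
    if (candidate.length <= maxChars) {
      current = candidate;
    } else {
      chunks.push(current);
      current = para;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each returned string would map to one `embeddings` row, with its array index as `chunk_ix`.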

2.1 Modules at a glance

Module	Role
`shared/embeddings.ts`	Cohere client — `embedBatch(texts, inputType)` for indexing + `embedQuery(text)` for queries. Retry with 3× exponential backoff on transient errors, fail-fast on 401/403.
`shared/rag.ts`	Storage layer — `upsert` / `search` / `removeDoc` / `removeProject` / `stats`. Owns the chunker and the atomic `embeddings`↔`embeddings_vec` transaction.
`shared/rag-hooks.ts`	Fire-and-forget wrappers (`syncIssue`, `syncWiki`, `syncSkill` + `remove*` mirrors). Errors print but never roll back the disk/SQL write that triggered them.
`shared/routes/rag-search.ts`	Public HTTP entry point. `GET /api/crm/projects/:name/rag/search?q=...`
`scripts/phase-71-backfill-rag.ts`	One-shot idempotent seeder for existing content. `--dry-run` / `--force` / `--project` / `--throttle` flags.
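
A minimal sketch of the retry policy listed for shared/embeddings.ts, assuming "3×" means three retries after the first attempt and that any error other than 401/403 counts as transient. `withBackoff`, `statusOf`, the `status` property on thrown errors, and the 500 ms base delay are hypothetical, not the module's actual API.

```typescript
// Hypothetical: errors are assumed to carry a numeric HTTP `status`.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const s = (err as { status: unknown }).status;
    return typeof s === "number" ? s : undefined;
  }
  return undefined;
}

const FAIL_FAST = new Set<number>([401, 403]); // bad or revoked key: retrying cannot help

async function withBackoff<T>(
  call: () => Promise<T>,
  retries: number = 3,
  baseDelayMs: number = 500,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const status = statusOf(err);
      const authFailure = status !== undefined && FAIL_FAST.has(status);
      if (authFailure || attempt >= retries) throw err;
      // Delay doubles each time: 500 ms, 1 s, 2 s with the defaults.
      await sleep(baseDelayMs * Math.pow(2, attempt));
    }
  }
}
```

Injecting `sleep` keeps the helper testable without real delays; production callers would leave the default.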

3. Embedding choice — Cohere `embed-multilingual-v3.0`

Property	Value
Dimensionality	1024-dim float32
Multilingual	Native — Ukrainian + English + 100+ languages share a sub-space
Cost	~$0.10 / 1M tokens (Production tier). Estimated ~$4/month at 50-user scale per the cost model.
Sub-space split	`input_type="search_document"` for indexing vs `input_type="search_query"` for runtime queries. Mixing the two tanks recall.
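
To make the sub-space split and the 96-input batch limit concrete, here is a sketch that only builds request bodies and sends nothing. The field names `model`, `input_type`, and `texts` follow Cohere's Embed API request body; `buildEmbedRequests` itself is a hypothetical helper, and the exact payload should be checked against the API version in use.

```typescript
type InputType = "search_document" | "search_query";

interface EmbedRequest {
  model: string;
  input_type: InputType;
  texts: string[];
}

const MODEL = "embed-multilingual-v3.0";
const MAX_BATCH = 96; // per-call input limit quoted in the component map

// Hypothetical helper: split texts into batches and tag each with the sub-space.
function buildEmbedRequests(texts: string[], inputType: InputType): EmbedRequest[] {
  const requests: EmbedRequest[] = [];
  for (let i = 0; i < texts.length; i += MAX_BATCH) {
    requests.push({
      model: MODEL,
      input_type: inputType,
      texts: texts.slice(i, i + MAX_BATCH),
    });
  }
  return requests;
}

// Indexing uses "search_document"; runtime queries use "search_query".
const indexRequests = buildEmbedRequests(["chunk one", "chunk two"], "search_document");
const queryRequests = buildEmbedRequests(["Safari cookie auth bug"], "search_query");
```

Keeping `input_type` a required parameter means a caller cannot embed without choosing a sub-space.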

Live cross-lingual smoke against the prod corpus:

Query	Lang	Top hit	Distance
Safari cookie auth bug	EN	`issue/174` (auth/password reset)	1.00
множинні тенанти	UK	`issue/49` (Phase 53.11.2 Multi-Worker TG Topics Mode)	1.07

The UK query retrieved an English-language issue at a distance (1.07) close to the EN-EN hit (1.00), which suggests the multilingual sub-space alignment holds on this corpus and is not just a marketing claim.

4. Storage — `sqlite-vec` inside the SSOT

Migration 049_embeddings creates two tables:

CREATE TABLE embeddings (
  id          INTEGER PRIMARY KEY AUTOINCREMENT,
  project     TEXT NOT NULL,
  doc_type    TEXT NOT NULL,         -- 'wiki' | 'issue' | 'skill' | 'transcript'
  doc_id      TEXT NOT NULL,         -- filename, issue id, skill name
  chunk_ix    INTEGER NOT NULL DEFAULT 0,
  text        TEXT NOT NULL,
  created_at  TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE VIRTUAL TABLE embeddings_vec USING vec0(
  embedding FLOAT[1024]
);
-- embeddings.id ←→ embeddings_vec.rowid

The split keeps metadata in regular SQLite (joinable with wiki/issues/skills tables, easy LIST/DELETE) while the vec0 virtual table holds the dense vectors for KNN search.

Sizing. 1024-dim float32 = 4096 bytes per vector, so 100K rows ≈ 410 MB of raw vector data, before chunk text and SQLite overhead. Fine at the current scale; revisit at 1M+ rows.
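
The sizing arithmetic as a small sketch. `vectorBytes` is a hypothetical helper, and it counts raw vector payload only, not chunk text or SQLite overhead.

```typescript
const DIMS = 1024;

// Raw vector payload: rows × dimensions × 4 bytes per float32.
function vectorBytes(rows: number, dims: number = DIMS): number {
  return rows * dims * Float32Array.BYTES_PER_ELEMENT;
}

const perRow = vectorBytes(1);          // 4096 bytes
const at100k = vectorBytes(100000);     // 409600000 bytes, about 410 MB
const at1m = vectorBytes(1000000);      // 4096000000 bytes, about 4.1 GB
```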

Loading. sqlite-vec is loaded as a runtime extension in shared/db.ts initDb() before migrations run. Migration 049 sanity-checks vec_version() and aborts cleanly if the extension didn't load.

5. Write path — re-embed hooks (Phase 71.5)

Every write surface that mutates indexable content fires a fire-and-forget hook so the embedding stays fresh without a separate sync job.

Trigger	Hook	Notes
`PUT /api/crm/projects/:name/wiki/save`	`syncWiki(project, path, content)`	After `Bun.write` succeeds
`POST /api/mcp/issues/:project` (create)	`syncIssue(project, id, title, body)`	After SQLite write
`PUT /api/mcp/issues/:project/:id` (update)	`syncIssue(project, id, title, body)`	Same
`POST /api/mcp/issues/:project/:id/log` (activity)	none	Activity log doesn't change indexed title/body — re-embedding would burn quota on no-op
`PUT /api/crm/projects/:name/skills/save`	`syncSkill(project, name, content)`	Per-project skill
`DELETE /api/crm/projects/:name/skills/delete`	`removeSkill(project, name)`	Mirror
`POST /api/crm/skills` (global create)	`syncSkill("_global_", name, content)`	Global pool
`PUT /api/crm/skills/:id` (global update)	`syncSkill("_global_", name, content)`	Only if `content` field actually changed
`DELETE /api/crm/skills/:id` (global delete)	`removeSkill("_global_", name)`	Mirror
`DELETE /api/auth/account` (GDPR Art. 17)	`removeProject(name)` per owned project	Embeddings don't survive account deletion
Transcript pipeline (Phase 73.6): `status → summarized → embedding → done`	`upsert(project, "transcript", id, ragText)` via `transcript-worker.ts`	`ragText` = transcript text + `[Xs] frame descriptions` + summary tldr/key_points/decisions/action_items. Skipped when `embed_to_rag=0`. Non-fatal: RAG failure still finalises job to `done`.

Rule: Cohere errors are logged as a single line and then dropped. They never roll back the disk/SQL write that triggered them. Persistent failures converge back through the nightly backfill, not through retries in the hot path.
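
A sketch of the fire-and-forget rule, assuming the hooks wrap an async re-embed call. `fireAndForget` and the `[rag]` log prefix are hypothetical; the real wrappers in shared/rag-hooks.ts may be shaped differently.

```typescript
// Hypothetical wrapper: run the re-embed task without awaiting it, log one line on failure.
function fireAndForget(label: string, task: () => Promise<unknown>): void {
  // Starting from a resolved promise also captures synchronous throws from `task`.
  void Promise.resolve()
    .then(task)
    .catch((err: unknown) => {
      const msg = err instanceof Error ? err.message : String(err);
      console.error(`[rag] ${label} failed: ${msg}`);
    });
}

// Intended call site, after the primary write has already succeeded:
//   fireAndForget(`syncWiki ${path}`, () => upsert(project, "wiki", path, content));
```

Because the caller never awaits the task, a Cohere outage cannot delay or fail the save response; the missing embedding is left for the backfill to repair.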

6. Read path — `search()` + per-surface façades

import { search } from "shared/rag";

const hits = await search("arc-v2", "How does the multi-tenancy gate work?", {
  k: 6,
  doc_types: ["wiki", "issue", "skill", "transcript"], // optional narrow
});
// hits: Array<{ doc_type, doc_id, chunk_ix, text, distance }> sorted by distance

Under the hood:

Embed the query with input_type="search_query" (different sub-space than search_document).
KNN over embeddings_vec with k * 4 overfetch.
Filter by project + optional doc_type at the SQL JOIN.
Return top-K passages sorted by ascending L2 distance.
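
The overfetch, filter, and top-K steps above, sketched in memory. In the real path the filter happens in the SQL JOIN; `selectTopK` and the `Candidate` shape are hypothetical and only illustrate the ordering of the steps.

```typescript
interface Candidate {
  project: string;
  doc_type: string;
  doc_id: string;
  chunk_ix: number;
  text: string;
  distance: number; // L2 distance from the query vector
}

const OVERFETCH = 4; // KNN asks for k * 4 rows before filtering

// `nearest` stands in for the k * OVERFETCH rows returned by the KNN step.
function selectTopK(
  nearest: Candidate[],
  project: string,
  k: number,
  docTypes?: string[],
): Candidate[] {
  return nearest
    .filter((c) => c.project === project)
    .filter((c) => !docTypes || docTypes.includes(c.doc_type))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, k);
}
```

One consequence of filtering after the KNN step: if other projects dominate the nearest k * 4 rows, fewer than k passages can come back for the requested project.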

Public façades

GET /api/crm/projects/:name/rag/search?q=...&k=...&include_global=true&doc_types=... — HTTP entry point. Powers arc kb search + the in-chat ask_notebooklm tool.
chat.ts executeAskNotebooklm — runs the project + global search in parallel, merges by distance, falls back to keyword search when the top hit's distance exceeds 1.6 or there are no results.
skills.ts handleGenerateSkill — uses RAG as house-style retrieval, then calls Claude Sonnet for generation.
help.ts ragSemantic — Arc Help chat surfaces project + global hits as model context.
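
A sketch of the merge-and-fallback rule described for executeAskNotebooklm. The 1.6 threshold is from the text; `mergeHits`, `needsKeywordFallback`, and the assumption that both hit lists share the `search()` result shape are illustrative.

```typescript
interface Hit {
  doc_type: string;
  doc_id: string;
  chunk_ix: number;
  text: string;
  distance: number;
}

const FALLBACK_DISTANCE = 1.6;

// Combine project and global hits, keeping the k closest overall.
function mergeHits(projectHits: Hit[], globalHits: Hit[], k: number): Hit[] {
  return [...projectHits, ...globalHits]
    .sort((a, b) => a.distance - b.distance)
    .slice(0, k);
}

// Fall back to keyword search when nothing came back or the best hit is too far away.
function needsKeywordFallback(merged: Hit[]): boolean {
  const top = merged[0];
  return top === undefined || top.distance > FALLBACK_DISTANCE;
}
```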

7. Backfill — `scripts/phase-71-backfill-rag.ts`

bun scripts/phase-71-backfill-rag.ts                       # all projects + global
bun scripts/phase-71-backfill-rag.ts --dry-run             # inventory only
bun scripts/phase-71-backfill-rag.ts --project arc-v2      # one project + global
bun scripts/phase-71-backfill-rag.ts --skip-global         # projects only
bun scripts/phase-71-backfill-rag.ts --doc-types wiki,issue
bun scripts/phase-71-backfill-rag.ts --force               # re-embed even if row exists
bun scripts/phase-71-backfill-rag.ts --limit 100           # cap docs per project
bun scripts/phase-71-backfill-rag.ts --throttle 700        # ms between Cohere calls (default 700; set 0 on Production keys)

Idempotency. Each candidate is checked for (project, doc_type, doc_id) presence in embeddings and skipped if found, unless --force is set. Re-runs are zero-cost in Cohere quota.
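
The idempotency check could be sketched as below, assuming presence is decided per `(project, doc_type, doc_id)` regardless of how many chunks a document has. `planBackfill` and `keyOf` are hypothetical names, and the real script presumably queries SQLite rather than taking an in-memory list.

```typescript
interface DocRef {
  project: string;
  doc_type: string;
  doc_id: string;
}

// JSON-encode the tuple so separators inside names cannot cause key collisions.
const keyOf = (d: DocRef): string => JSON.stringify([d.project, d.doc_type, d.doc_id]);

// Return the documents that still need a Cohere call.
function planBackfill(candidates: DocRef[], alreadyEmbedded: DocRef[], force: boolean): DocRef[] {
  if (force) return candidates; // --force re-embeds everything
  const seen = new Set(alreadyEmbedded.map(keyOf));
  return candidates.filter((c) => !seen.has(keyOf(c)));
}
```

A second run over an unchanged corpus plans zero documents, which is why re-runs cost no Cohere quota.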

Production run (2026-06-05). 814 candidates across 20 projects + _global_ (37 wiki + 482 issue + 295 skill). The first pass exhausted the Trial key's 1,000-call monthly cap while embedding global skills. After upgrading to a Production key, the remaining 178 docs finished in 57s with zero errors. Total chunks written: ~3,150.

8. Security boundaries

Cohere API key lives in the vault (COHERE_API_KEY), rotates via Platform Settings → RAG / Semantic search. Live Test button performs a 1-token embed call to confirm reachability.
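Limited exposure to Cohere. Chunk text and queries are sent to Cohere's API to be embedded, so that content does leave the host. The returned embeddings are dense, lossy vectors rather than bag-of-words counts; reconstructing the exact source text from them is hard, though it should not be treated as impossible. Cohere's own policy says they don't train on API traffic.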
Multi-tenancy. Project names are namespace keys; canAccessProject guards /api/crm/projects/:name/* before the search handler runs. Global skills live in a sentinel _global_ namespace that no user project name can collide with (the validator forbids names starting with _).
GDPR Art. 17. DELETE /api/auth/account cascades rag.removeProject for every owned project so embeddings don't outlive deletion.
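
A sketch of how the `_global_` sentinel stays collision-free. The only rule taken from the text is that project names may not start with `_`; the function names and the non-empty check are assumptions, and the real validator likely enforces more.

```typescript
const GLOBAL_NAMESPACE = "_global_";

// Hypothetical validator: the leading-underscore ban is the one rule stated in the doc.
function isValidProjectName(name: string): boolean {
  return name.length > 0 && !name.startsWith("_");
}

// Namespaces a search may touch: the caller's project, plus the shared pool on request.
function searchNamespaces(project: string, includeGlobal: boolean): string[] {
  if (!isValidProjectName(project)) {
    throw new Error(`invalid project name: ${project}`);
  }
  return includeGlobal ? [project, GLOBAL_NAMESPACE] : [project];
}
```

Because the sentinel itself fails `isValidProjectName`, no user can create or request a project that aliases the global pool.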

9. Migration from NotebookLM (one-time, 2026-06-05)

Step	Done
Cohere API key + Platform Settings tab + live probe	#358
`sqlite-vec` extension + migration 049	#359
`shared/embeddings.ts` client	#360
`shared/rag.ts` façade	#361
Re-embed hooks on wiki / issue / skill writes	#362
Backfill script + prod run	#363
Swap call sites (`chat.ts`, `skills.ts`, new `/rag/search`, `arc kb search`)	#364
Decommission `services/notebooklm-bridge/` + drop `projects.notebook_id` (migration 050)	#365
Docs (this page + 7 locale stubs)	#366
Soak validation	#367

10. Operating notes

Tool name kept as ask_notebooklm. The Anthropic tool definition ships downstream to Claude in the Cloud PM conversation. Renaming would break in-flight tool_use loops. Implementation is now pure RAG; the public contract stayed stable.
Audio overview is gone. NotebookLM auto-generated audio summaries had no equivalent in the self-hosted pipeline. POST /api/crm/projects/:name/memory/fetch-artifact now returns 410 Gone. If you need this back, that's a separate phase (likely a text-to-speech pass over the project's wiki).
Re-index button still works. POST /api/crm/projects/:name/memory/refresh now re-embeds MANIFEST.md + ROADMAP.md + key files into the local embeddings store rather than uploading to NotebookLM. User-facing behavior: same button, same outcome ("project knowledge is now fresh"), different storage.
No notebook URLs. GET /api/crm/projects/:name/notebooks returns { notebooks: [], retired: "phase-71.8" } to keep old frontend builds from 404-ing. The Neural Memory tab in the CRM will be retired in a follow-up cosmetic pass.

11. References

shared/embeddings.ts, shared/rag.ts, shared/rag-hooks.ts, shared/routes/rag-search.ts
shared/migrations/049_embeddings.ts, shared/migrations/050_drop_notebook_id.ts
scripts/phase-71-backfill-rag.ts
Issues #321 (parent) · #358–#367 (sub-phases) · #322 / #324 (closed alongside)
Decision log: docs/architecture/PHASE_71_RAG_MIGRATION.md