RAG architecture — self-hosted semantic search
Status: Live since Phase 71 (2026-06-05). Replaces the Google NotebookLM bridge that shipped in Phase 36.
TL;DR: every Arc project's wiki + issues + skills are embedded into a per-project vector index that lives inside the existing SQLite SSOT. Queries return semantically relevant passages across project content + a shared global skill pool. No external retrieval service, no per-user Google session, no source caps.
1. Why we moved off NotebookLM
The Phase 36 bridge wrapped Google NotebookLM as a vector store using reverse-engineered cookie auth. It produced two problems that the underlying product can't fix:
- Source cap. NotebookLM Plus tops out at 100 sources per notebook. By mid-2026 several Arc projects had blown past the cap; eviction logic turned into a permanent band-aid (#324).
- Personal-account coupling. Every new project created a notebook inside the CEO's personal NotebookLM. Scaling to N customers meant N notebooks in one person's sidebar, an unmanaged side effect the bridge couldn't decouple.
Phase 71 swaps the retrieval layer for a self-hosted pipeline the user's Google account never touches.
2. Component map
Write path Read path
────────── ─────────
wiki.ts handleWikiSave ────┐ chat.ts executeAskNotebooklm ──┐
cli-routes.ts handleCreate/ │
UpdateIssue ─────────────┤ shared/routes/rag-search.ts ──────┤
skills.ts handleSave/ │ GET /api/crm/projects/:name/ │
Delete + global CRUD ────┤ rag/search │
│ │
▼ ▼
shared/rag-hooks.ts shared/rag.ts search()
syncIssue / syncWiki / │ embeds query →
syncSkill (fire-and-forget) │ KNN over vec0 →
│ │ project + doc_type filter
▼ │ → top-K passages
shared/rag.ts upsert() │
paragraph-aware chunker (~1800 chars) │
→ atomic transaction: │
embeddings ⨯ embeddings_vec │
│ │
▼ │
shared/embeddings.ts ◄──────────────────────────── │
Cohere embed-multilingual-v3.0 │
(1024-dim float32, batch ≤96 inputs/call) │
│ │
▼ │
SQLite (data/citadel.db) ◄──────────────────────── ┘
┌──────────────────────┐ ┌─────────────────────┐
│ embeddings │ │ embeddings_vec │
│ (project, doc_type, │ │ (vec0 virtual, │
│ doc_id, chunk_ix, │ │ embedding FLOAT │
│ text, created_at) │ ──┤ [1024]) │
│ id ←→ rowid │ │ │
└──────────────────────┘ └─────────────────────┘
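The "paragraph-aware chunker (~1800 chars)" step in the diagram can be illustrated with a minimal sketch. The function name, the blank-line paragraph rule, and the hard split for oversized paragraphs are all assumptions for illustration; the real chunker in shared/rag.ts may behave differently.

```typescript
// Hypothetical sketch, not the shared/rag.ts implementation.
const TARGET_CHARS = 1800; // "~1800 chars" from the component map

function chunkByParagraph(text: string, target: number = TARGET_CHARS): string[] {
  // Assumption: paragraphs are separated by one or more blank lines.
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";

  for (const para of paragraphs) {
    if (para.length > target) {
      // Oversized paragraph: flush the pending chunk, then hard-split.
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < para.length; i += target) {
        chunks.push(para.slice(i, i + target));
      }
      continue;
    }
    const candidate = current ? `${current}\n\n${para}` : para;
    if (candidate.length > target) {
      // Adding this paragraph would overflow: close the chunk, start a new one.
      chunks.push(current);
      current = para;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each returned string would map to one chunk_ix row. Packing whole paragraphs keeps a passage readable when it is returned as a search hit.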
2.1 Modules at a glance
| Module | Role |
|---|---|
| shared/embeddings.ts | Cohere client: embedBatch(texts, inputType) for indexing + embedQuery(text) for queries. Retry with 3× exponential backoff on transient errors, fail-fast on 401/403. |
| shared/rag.ts | Storage layer: upsert / search / removeDoc / removeProject / stats. Owns the chunker and the atomic embeddings↔embeddings_vec transaction. |
| shared/rag-hooks.ts | Fire-and-forget wrappers (syncIssue, syncWiki, syncSkill + remove* mirrors). Errors print but never roll back the disk/SQL write that triggered them. |
| shared/routes/rag-search.ts | Public HTTP entry point. GET /api/crm/projects/:name/rag/search?q=... |
| scripts/phase-71-backfill-rag.ts | One-shot idempotent seeder for existing content. --dry-run / --force / --project / --throttle flags. |
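The retry policy described for shared/embeddings.ts (3× exponential backoff, fail-fast on 401/403) might look roughly like the sketch below. The helper names, the 500 ms base delay, and the choice to treat every non-auth error as transient are assumptions; the source states only the retry count and the fail-fast status codes.

```typescript
// Hypothetical sketch; helper names and the base delay are assumptions.
const MAX_RETRIES = 3;     // "3× exponential backoff" from the modules table
const BASE_DELAY_MS = 500; // assumed; the source does not state the base delay

// Reads a numeric `status` field off an unknown error value, if present.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const s = (err as { status: unknown }).status;
    return typeof s === "number" ? s : undefined;
  }
  return undefined;
}

// 401/403 will not fix themselves, so they are never retried.
function isFatalAuthError(err: unknown): boolean {
  const s = statusOf(err);
  return s === 401 || s === 403;
}

// attempt 0 → base, 1 → 2×base, 2 → 4×base
function backoffDelayMs(attempt: number, base: number = BASE_DELAY_MS): number {
  return base * Math.pow(2, attempt);
}

async function withRetry<T>(
  call: () => Promise<T>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Assumption: anything that is not an auth failure counts as transient.
      if (isFatalAuthError(err) || attempt >= MAX_RETRIES) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Injecting sleep keeps the policy testable without real delays.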
3. Embedding choice — Cohere embed-multilingual-v3.0
| Property | Value |
|---|---|
| Dimensionality | 1024-dim float32 |
| Multilingual | Native — Ukrainian + English + 100+ languages share a sub-space |
| Cost | ~$0.10 / 1M tokens (Production tier). Estimated ~$4/month at 50-user scale per the cost model. |
| Sub-space split | input_type="search_document" for indexing vs input_type="search_query" for runtime queries. Mixing the two tanks recall. |
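To make the sub-space split concrete, here is a sketch of how requests could be shaped so that indexing and querying never share an input_type, while respecting the 96-inputs-per-call limit from the component map. The EmbedRequest shape and the buildEmbedRequests helper are illustrative assumptions, not the actual embedBatch / embedQuery code.

```typescript
// Hypothetical sketch; request shape and helper name are assumptions.
type InputType = "search_document" | "search_query";

interface EmbedRequest {
  model: string;
  input_type: InputType;
  texts: string[];
}

const EMBED_MODEL = "embed-multilingual-v3.0";
const MAX_BATCH = 96; // "batch ≤96 inputs/call" from the component map

// Splits texts into request payloads of at most MAX_BATCH inputs each.
function buildEmbedRequests(texts: string[], inputType: InputType): EmbedRequest[] {
  const requests: EmbedRequest[] = [];
  for (let i = 0; i < texts.length; i += MAX_BATCH) {
    requests.push({
      model: EMBED_MODEL,
      input_type: inputType,
      texts: texts.slice(i, i + MAX_BATCH),
    });
  }
  return requests;
}

// Indexing path: chunks are embedded as documents.
const indexRequests = buildEmbedRequests(["chunk one", "chunk two"], "search_document");
// Query path: the user's question is embedded as a query.
const queryRequests = buildEmbedRequests(["How does the multi-tenancy gate work?"], "search_query");
```

Typing input_type as a two-value union makes it harder to mix the two sub-spaces by accident.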
Live cross-lingual smoke against the prod corpus:
| Query | Lang | Top hit | Distance |
|---|---|---|---|
| Safari cookie auth bug | EN | issue/174 (auth/password reset) | 1.00 |
| множинні тенанти | UK | issue/49 (Phase 53.11.2 Multi-Worker TG Topics Mode) | 1.07 |
The UK query beat the corresponding EN-EN distance — multilingual sub-space alignment is real, not a marketing claim.
4. Storage — sqlite-vec inside the SSOT
Migration 049_embeddings creates two tables:
CREATE TABLE embeddings (
id INTEGER PRIMARY KEY AUTOINCREMENT,
project TEXT NOT NULL,
doc_type TEXT NOT NULL, -- 'wiki' | 'issue' | 'skill' | 'transcript'
doc_id TEXT NOT NULL, -- filename, issue id, skill name
chunk_ix INTEGER NOT NULL DEFAULT 0,
text TEXT NOT NULL,
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE VIRTUAL TABLE embeddings_vec USING vec0(
embedding FLOAT[1024]
);
-- embeddings.id ←→ embeddings_vec.rowid
The split keeps metadata in regular SQLite (joinable with wiki/issues/skills tables, easy LIST/DELETE) while the vec0 virtual table holds the dense vectors for KNN search.
Sizing. 1024-dim float32 = 4096 bytes / row. 100K rows ≈ 400MB. Fine at the current scale; revisit at 1M+ rows.
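The sizing estimate can be reproduced with a few lines of arithmetic. This counts only the raw vector payload in embeddings_vec; chunk text, metadata rows, and any SQLite page overhead are not included, so treat the figures as a lower bound.

```typescript
const DIMS = 1024;           // embedding dimensionality
const BYTES_PER_FLOAT32 = 4; // one float32 component

// Raw vector bytes for a given number of rows.
function vectorBytes(rows: number): number {
  return rows * DIMS * BYTES_PER_FLOAT32;
}

const bytesPerRow = vectorBytes(1);         // 4096
const mbAt100k = vectorBytes(100000) / 1e6; // 409.6 (decimal MB), i.e. ≈ 400MB
const gbAt1m = vectorBytes(1000000) / 1e9;  // 4.096 (decimal GB)
```

At 1M rows the vectors alone reach roughly 4 GB, which is why that is the suggested point to revisit the design.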
Loading. sqlite-vec is loaded as a runtime extension in shared/db.ts initDb() before migrations run. Migration 049 sanity-checks vec_version() and aborts cleanly if the extension didn't load.
5. Write path — re-embed hooks (Phase 71.5)
Every write surface that mutates indexable content fires a fire-and-forget hook so the embedding stays fresh without a separate sync job.
| Trigger | Hook | Notes |
|---|---|---|
| PUT /api/crm/projects/:name/wiki/save | syncWiki(project, path, content) | After Bun.write succeeds |
| POST /api/mcp/issues/:project (create) | syncIssue(project, id, title, body) | After SQLite write |
| PUT /api/mcp/issues/:project/:id (update) | syncIssue(project, id, title, body) | Same |
| POST /api/mcp/issues/:project/:id/log (activity) | none | Activity log doesn't change indexed title/body; re-embedding would burn quota on a no-op |
| PUT /api/crm/projects/:name/skills/save | syncSkill(project, name, content) | Per-project skill |
| DELETE /api/crm/projects/:name/skills/delete | removeSkill(project, name) | Mirror |
| POST /api/crm/skills (global create) | syncSkill("_global_", name, content) | Global pool |
| PUT /api/crm/skills/:id (global update) | syncSkill("_global_", name, content) | Only if content field actually changed |
| DELETE /api/crm/skills/:id (global delete) | removeSkill("_global_", name) | Mirror |
| DELETE /api/auth/account (GDPR Art. 17) | removeProject(name) per owned project | Embeddings don't survive account deletion |
| Transcript pipeline (Phase 73.6): status → summarized → embedding → done | upsert(project, "transcript", id, ragText) via transcript-worker.ts | ragText = transcript text + [Xs] frame descriptions + summary tldr/key_points/decisions/action_items. Skipped when embed_to_rag=0. Non-fatal: RAG failure still finalises job to done. |
Rule: Cohere errors print one line and die. They never roll back the disk/SQL write that triggered them. Persistent failures converge back through the nightly backfill, not through retries in the hot path.
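The rule above is essentially a fire-and-forget wrapper: start the embedding work, log a single line on failure, and never let the error reach the caller. A minimal sketch follows; the fireAndForget name, the log format, and the injectable logger are assumptions rather than the actual shared/rag-hooks.ts code.

```typescript
// Hypothetical sketch; name and log format are assumptions.
type LogLine = (line: string) => void;

function fireAndForget(
  label: string,
  task: () => Promise<void>,
  log: LogLine = (line) => console.error(line),
): void {
  // Promise.resolve().then(task) also captures synchronous throws from task.
  Promise.resolve()
    .then(task)
    .catch((err: unknown) => {
      const msg = err instanceof Error ? err.message : String(err);
      // One line, no rethrow: the triggering disk/SQL write is already done.
      log(`[rag-hooks] ${label} failed: ${msg}`);
    });
}

// Usage inside a write handler, after the primary write has succeeded:
// fireAndForget("syncWiki", () => embedAndUpsert(project, path, content));
// where embedAndUpsert is a stand-in for the real embedding call.
```

Because the function returns void immediately, the HTTP response for the wiki or issue save is never delayed by Cohere latency.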
6. Read path — search() + per-surface façades
import { search } from "shared/rag";
const hits = await search("arc-v2", "How does the multi-tenancy gate work?", {
k: 6,
doc_types: ["wiki", "issue", "skill", "transcript"], // optional narrow
});
// hits: Array<{ doc_type, doc_id, chunk_ix, text, distance }> sorted by distance
Under the hood:
- Embed the query with input_type="search_query" (different sub-space than search_document).
- KNN over embeddings_vec with k * 4 overfetch.
- Filter by project + optional doc_type at the SQL JOIN.
- Return top-K passages sorted by ascending L2 distance.
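The steps above can be sketched in memory to show how overfetch and filtering interact. Here a brute-force scan stands in for the vec0 KNN query, the query vector is passed in already embedded, and all names (Row, Hit, searchInMemory) are illustrative assumptions rather than the real shared/rag.ts code.

```typescript
// Hypothetical in-memory model of the read path, not the SQL implementation.
interface Row {
  id: number;
  project: string;
  doc_type: string;
  doc_id: string;
  chunk_ix: number;
  text: string;
}

interface Hit {
  doc_type: string;
  doc_id: string;
  chunk_ix: number;
  text: string;
  distance: number;
}

function l2Distance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
  return Math.sqrt(sum);
}

function searchInMemory(
  queryVec: number[],
  rows: Row[],
  vectors: Map<number, number[]>, // rowid → embedding
  project: string,
  opts: { k: number; doc_types?: string[] },
): Hit[] {
  // KNN across all projects with k * 4 overfetch.
  const nearest = Array.from(vectors.entries())
    .map(([rowid, vec]) => ({ rowid, distance: l2Distance(queryVec, vec) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, opts.k * 4);

  const byId = new Map<number, Row>();
  for (const r of rows) byId.set(r.id, r);

  // Join back to metadata, filtering by project and optional doc_type.
  const hits: Hit[] = [];
  for (const n of nearest) {
    const row = byId.get(n.rowid);
    if (!row || row.project !== project) continue;
    if (opts.doc_types && opts.doc_types.indexOf(row.doc_type) === -1) continue;
    hits.push({
      doc_type: row.doc_type,
      doc_id: row.doc_id,
      chunk_ix: row.chunk_ix,
      text: row.text,
      distance: n.distance,
    });
  }
  // Already in ascending distance order; keep the top K.
  return hits.slice(0, opts.k);
}
```

One property of this shape worth keeping in mind: because the filter runs after the KNN cut, a query can return fewer than k hits when other projects dominate the nearest k * 4 neighbours. The overfetch factor is the knob that trades scan cost against that risk.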
Public façades
- GET /api/crm/projects/:name/rag/search?q=...&k=...&include_global=true&doc_types=...: HTTP entry point. Powers arc kb search + the in-chat ask_notebooklm tool.
- chat.ts executeAskNotebooklm: runs the project + global search in parallel, merges by distance, falls back to keyword search when the top hit has d > 1.6 or there are zero results.
- skills.ts handleGenerateSkill: uses RAG as house-style retrieval, then calls Claude Sonnet for generation.
- help.ts ragSemantic: Arc Help chat surfaces project + global hits as model context.
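The merge-and-fallback behaviour described for executeAskNotebooklm can be sketched as two small pure functions. The function names are assumptions; the 1.6 threshold and the "zero results" condition come from the description above. The two searches are taken as already-resolved inputs here, whereas the real code is described as running them in parallel.

```typescript
// Hypothetical sketch; function names are assumptions.
interface Hit {
  doc_type: string;
  doc_id: string;
  chunk_ix: number;
  text: string;
  distance: number;
}

const FALLBACK_DISTANCE = 1.6; // threshold from the façade description

// Interleaves project and global hits by ascending distance, keeping the top k.
function mergeByDistance(projectHits: Hit[], globalHits: Hit[], k: number): Hit[] {
  return projectHits
    .concat(globalHits)
    .sort((a, b) => a.distance - b.distance)
    .slice(0, k);
}

// Keyword search takes over when nothing came back or the best hit is too far.
function shouldFallBackToKeyword(merged: Hit[]): boolean {
  return merged.length === 0 || merged[0].distance > FALLBACK_DISTANCE;
}
```

Checking only the top hit keeps the fallback decision cheap: if even the best passage is beyond the threshold, the rest of the list will not be better.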
7. Backfill — scripts/phase-71-backfill-rag.ts
bun scripts/phase-71-backfill-rag.ts # all projects + global
bun scripts/phase-71-backfill-rag.ts --dry-run # inventory only
bun scripts/phase-71-backfill-rag.ts --project arc-v2 # one project + global
bun scripts/phase-71-backfill-rag.ts --skip-global # projects only
bun scripts/phase-71-backfill-rag.ts --doc-types wiki,issue
bun scripts/phase-71-backfill-rag.ts --force # re-embed even if row exists
bun scripts/phase-71-backfill-rag.ts --limit 100 # cap docs per project
bun scripts/phase-71-backfill-rag.ts --throttle 700 # ms between Cohere calls (default 700; set 0 on Production keys)
Idempotency. Each candidate is checked for (project, doc_type, doc_id) presence in embeddings and skipped if found, unless --force is set. Re-runs are zero-cost in Cohere quota.
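The idempotency check amounts to a set-membership test on the (project, doc_type, doc_id) triple. A minimal sketch, with hypothetical names and an in-memory Set standing in for the lookup against the embeddings table:

```typescript
// Hypothetical sketch; the real script queries the embeddings table instead.
interface Candidate {
  project: string;
  doc_type: string;
  doc_id: string;
}

// JSON encoding avoids collisions that a naive join on a separator could cause.
function docKey(c: Candidate): string {
  return JSON.stringify([c.project, c.doc_type, c.doc_id]);
}

// Returns the candidates that still need a Cohere call.
function selectForEmbedding(
  candidates: Candidate[],
  existingKeys: Set<string>,
  force: boolean,
): Candidate[] {
  if (force) return candidates.slice(); // --force re-embeds everything
  return candidates.filter((c) => !existingKeys.has(docKey(c)));
}
```

On a re-run where every key is already present, the selection is empty, which is why a second pass costs nothing in Cohere quota.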
Production run (2026-06-05). 814 candidates across 20 projects + _global_ (37 wiki + 482 issue + 295 skill). First-pass burnt the Trial 1000-call monthly cap on global skills. After upgrading to a Production key, the remaining 178 docs finished in 57s with zero errors. Total chunks written: ~3,150.
8. Security boundaries
- Cohere API key lives in the vault (COHERE_API_KEY), rotates via Platform Settings → RAG / Semantic search. The live Test button performs a 1-token embed call to confirm reachability.
- No prompt leakage to Cohere. Embeddings are lossy dense-vector derivatives of the text; reversing them back to the source text is impractical. Cohere's own policy says they don't train on API traffic.
- Multi-tenancy. Project names are namespace keys; canAccessProject guards /api/crm/projects/:name/* before the search handler runs. Global skills live in a sentinel _global_ namespace that no user project name can collide with (the validator forbids names starting with _).
- GDPR Art. 17. DELETE /api/auth/account cascades rag.removeProject for every owned project so embeddings don't outlive deletion.
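The namespace rule in the multi-tenancy bullet can be illustrated with a small sketch. Only the leading-underscore rule and the _global_ sentinel come from the text above; the function names are hypothetical, and the real validator presumably enforces additional constraints not shown here.

```typescript
// Hypothetical sketch; only the "_" prefix rule is taken from the source.
const GLOBAL_NAMESPACE = "_global_";

// User project names may not start with "_", so they can never equal "_global_".
function isValidUserProjectName(projectName: string): boolean {
  return projectName.length > 0 && projectName.charAt(0) !== "_";
}

// Namespaces a search may read: the caller's project, plus the shared pool on request.
function searchNamespaces(project: string, includeGlobal: boolean): string[] {
  if (!isValidUserProjectName(project)) {
    throw new Error(`invalid project name: ${project}`);
  }
  return includeGlobal ? [project, GLOBAL_NAMESPACE] : [project];
}
```

Reserving the whole underscore prefix, rather than the single sentinel string, leaves room for further system namespaces without a migration of user project names.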
9. Migration from NotebookLM (one-time, 2026-06-05)
| Step | Done |
|---|---|
| Cohere API key + Platform Settings tab + live probe | #358 |
| sqlite-vec extension + migration 049 | #359 |
| shared/embeddings.ts client | #360 |
| shared/rag.ts façade | #361 |
| Re-embed hooks on wiki / issue / skill writes | #362 |
| Backfill script + prod run | #363 |
| Swap call sites (chat.ts, skills.ts, new /rag/search, arc kb search) | #364 |
| Decommission services/notebooklm-bridge/ + drop projects.notebook_id (migration 050) | #365 |
| Docs (this page + 7 locale stubs) | #366 |
| Soak validation | #367 |
10. Operating notes
- Tool name kept as ask_notebooklm. The Anthropic tool definition ships downstream to Claude in the Cloud PM conversation. Renaming would break in-flight tool_use loops. Implementation is now pure RAG; the public contract stayed stable.
- Audio overview is gone. NotebookLM's auto-generated audio summaries have no equivalent in the self-hosted pipeline. POST /api/crm/projects/:name/memory/fetch-artifact now returns 410 Gone. If you need this back, that's a separate phase (likely a text-to-speech pass over the project's wiki).
- Re-index button still works. POST /api/crm/projects/:name/memory/refresh now re-embeds MANIFEST.md + ROADMAP.md + key files into the local embeddings store rather than uploading to NotebookLM. User-facing behavior: same button, same outcome ("project knowledge is now fresh"), different storage.
- No notebook URLs. GET /api/crm/projects/:name/notebooks returns { notebooks: [], retired: "phase-71.8" } to keep old frontend builds from 404-ing. The Neural Memory tab in the CRM will be retired in a follow-up cosmetic pass.
11. References
- shared/embeddings.ts, shared/rag.ts, shared/rag-hooks.ts, shared/routes/rag-search.ts
- shared/migrations/049_embeddings.ts, shared/migrations/050_drop_notebook_id.ts
- scripts/phase-71-backfill-rag.ts
- Issues #321 (parent) · #358–#367 (sub-phases) · #322 / #324 (closed alongside)
- Decision log: docs/architecture/PHASE_71_RAG_MIGRATION.md