Meeting Transcription + Analysis

Status: Live since Phase 73 (2026-06-06). Upload audio/video recordings to the chat composer — Arc OS transcribes, analyzes visuals, summarizes, and embeds into project search.

Overview

The meeting transcription pipeline turns a raw recording into searchable, structured knowledge without leaving Arc OS:

Upload (Transcripts page or composer paperclip)
  → ffmpeg audio extraction       (16 kHz mono WAV)
  → Groq whisper-large-v3         (~100× realtime, $0.111/audio-hour)
  → ffmpeg scene-change frames    (video only)
  → Claude vision on key frames   (video only: slide content, screen shares)
  → Claude Sonnet summary         (tldr, key points, action items, decisions, topics)
  → Cohere RAG embed              (transcript searchable via arc kb search)
  → done                          (source file deleted from disk)

Cost estimate (BYOK Anthropic key): ~$0.20–0.45 per 30-minute meeting (vision + summary) + ~$0.056 Groq transcription. Feature requires a paid plan (Starter / Starter Cloud / Beta).

Uploading a Recording

1. Open any project workspace.
2. Click the paperclip icon in the chat composer.
3. Select an audio or video file. Supported formats:
   - Video: mp4, mov, webm, mkv, m4v
   - Audio: mp3, wav, m4a, aac, ogg, opus, flac
   - Max size: 1 GB
4. A progress chip appears in the composer, showing the current step and percentage.
5. When the chip reaches 100% (Summary ready), the Send button is re-enabled and you can immediately send a message that references the transcript.
6. After one more step (RAG indexing), the status changes to done and the source file is deleted.
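A client-side pre-check that mirrors the supported formats and the 1 GB limit could look like the sketch below. The function name `validateRecording`, the extension-based check, and reading "1 GB" as 1024³ bytes are all assumptions; this is not the Composer.jsx implementation.

```typescript
// Hypothetical pre-upload check mirroring the documented formats and size limit.
const VIDEO_EXTENSIONS = new Set(["mp4", "mov", "webm", "mkv", "m4v"]);
const AUDIO_EXTENSIONS = new Set(["mp3", "wav", "m4a", "aac", "ogg", "opus", "flac"]);
// Assumption: "1 GB" is interpreted as 1024^3 bytes.
const MAX_UPLOAD_BYTES = 1024 ** 3;

type RecordingCheck =
  | { ok: true; kind: "video" | "audio" }
  | { ok: false; reason: string };

function validateRecording(filename: string, sizeBytes: number): RecordingCheck {
  // Take the text after the last dot, case-insensitively.
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: "file exceeds the 1 GB limit" };
  }
  if (VIDEO_EXTENSIONS.has(ext)) return { ok: true, kind: "video" };
  if (AUDIO_EXTENSIONS.has(ext)) return { ok: true, kind: "audio" };
  return { ok: false, reason: `unsupported format: .${ext}` };
}
```

The returned `kind` matters downstream: only video goes through the frame extraction and vision steps.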

While processing: You can type other messages in the composer. Only media attachments that are still processing block the Send button.
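The gating rule can be written as a small predicate. The `Attachment` shape and the status names below are hypothetical, chosen to match the chip behavior described above; the real Composer.jsx state may be modeled differently.

```typescript
// Hypothetical pipeline states for a media attachment, in order.
type MediaStatus = "processing" | "summary_ready" | "indexing" | "done";

interface Attachment {
  name: string;
  isMedia: boolean;
  status?: MediaStatus; // only media attachments carry a status
}

// States in which a media attachment no longer blocks sending.
const SENDABLE: ReadonlySet<MediaStatus> = new Set<MediaStatus>(["summary_ready", "indexing", "done"]);

// Send is blocked only while some media attachment has not reached "Summary ready";
// non-media attachments never block, and RAG indexing does not block either.
function canSend(attachments: Attachment[]): boolean {
  return attachments.every(
    (a) => !a.isMedia || (a.status !== undefined && SENDABLE.has(a.status)),
  );
}
```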

What Gets Injected into Chat

When you send a message containing a completed transcript attachment, the transcript text is automatically appended to your message:

Your message text here

--- TRANSCRIPT: meeting-2026-06-06.mp4 (video) ---
[whisper transcript text...]
--- END TRANSCRIPT ---

The AI worker sees the full transcript inline, like any other document. You can ask questions about it, request summaries, or have the worker extract action items.
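The injection format shown above can be reproduced with a short helper. The names `appendTranscripts` and `CompletedTranscript` are hypothetical, and joining several transcript blocks when a message has more than one attachment is an assumption about behavior the page does not describe.

```typescript
// Hypothetical helper reproducing the documented injection format.
interface CompletedTranscript {
  filename: string;
  kind: "video" | "audio";
  text: string;
}

function appendTranscripts(message: string, transcripts: CompletedTranscript[]): string {
  const blocks = transcripts.map(
    (t) => `--- TRANSCRIPT: ${t.filename} (${t.kind}) ---\n${t.text}\n--- END TRANSCRIPT ---`,
  );
  // A blank line separates the user's text from each transcript block.
  return [message, ...blocks].join("\n\n");
}
```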

Pipeline Steps in Detail

Step 1 — Audio extraction (ffmpeg)

For video files, Arc OS extracts the audio track into a 16 kHz mono WAV. Audio files need no track extraction; they are only converted to the same 16 kHz mono WAV format.
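This page does not give the exact ffmpeg command, so the sketch below uses standard ffmpeg options that produce a 16 kHz mono 16-bit WAV. The function names are hypothetical, and the real transcript-worker.ts may pass different flags.

```typescript
import { spawn } from "node:child_process";

// Hypothetical argument list for the documented target: 16 kHz mono WAV.
function audioExtractionArgs(inputPath: string, outputWav: string): string[] {
  return [
    "-y",                // overwrite the output file if it exists
    "-i", inputPath,
    "-vn",               // drop any video stream (no-op for audio-only inputs)
    "-ac", "1",          // downmix to mono
    "-ar", "16000",      // resample to 16 kHz
    "-c:a", "pcm_s16le", // 16-bit PCM, the usual WAV encoding
    outputWav,
  ];
}

// Runs ffmpeg without a shell and resolves once it exits successfully.
function extractAudio(inputPath: string, outputWav: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const proc = spawn("ffmpeg", audioExtractionArgs(inputPath, outputWav), { stdio: "ignore" });
    proc.on("error", reject);
    proc.on("close", (code) =>
      code === 0 ? resolve() : reject(new Error(`ffmpeg exited with code ${code}`)),
    );
  });
}
```

Because the same arguments work for both audio and video inputs, one code path covers both cases.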

Step 2 — Transcription (Groq whisper-large-v3)

The WAV is sent to the Groq API (whisper-large-v3 model) at roughly 100× realtime: an 11 s clip transcribes in under 1 s. An admin configures the platform GROQ_API_KEY in Platform Settings → Transcription. The model is multilingual, so Ukrainian, English, and all other major languages are supported out of the box.
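A minimal sketch of the transcription call, assuming Node 18+ (global `fetch`, `FormData`, `Blob`) and Groq's OpenAI-compatible audio transcription endpoint. The helper names are hypothetical; this is not the code in transcript-worker.ts.

```typescript
// Groq exposes an OpenAI-compatible transcription endpoint.
const GROQ_TRANSCRIPTION_URL = "https://api.groq.com/openai/v1/audio/transcriptions";

// Builds the multipart body: the audio file plus the model name.
function buildGroqForm(wav: Blob, filename: string): FormData {
  const form = new FormData();
  form.append("file", wav, filename);
  form.append("model", "whisper-large-v3");
  form.append("response_format", "json");
  return form;
}

async function transcribeWithGroq(wav: Blob, filename: string, apiKey: string): Promise<string> {
  const res = await fetch(GROQ_TRANSCRIPTION_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildGroqForm(wav, filename),
  });
  if (!res.ok) {
    throw new Error(`Groq transcription failed: HTTP ${res.status}`);
  }
  // With response_format=json the body carries the transcript in "text".
  const data = (await res.json()) as { text: string };
  return data.text;
}
```

No `language` field is sent, which leaves language detection to Whisper; that is what lets mixed Ukrainian and English recordings go through the same call.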

Step 3 — Scene-change frame extraction (video only)

ffmpeg samples frames at scene-change points using a threshold of 0.4, on a scale from 0 (every frame is selected) to 1 (no frame is selected). Each sampled frame's timestamp is recorded in timestamps.json so frame descriptions map back to exact video positions.

Maximum frames per video: 50 (~$0.15 worst case in Claude vision costs).
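One common way to get scene-change frames and their timestamps from ffmpeg is the `select` filter with a `scene` score followed by `showinfo`, which logs each kept frame's `pts_time`. The sketch below assumes that approach; the function names, output file pattern, and keeping the first 50 frames (rather than some other subset) are assumptions.

```typescript
// Hypothetical scene-change extraction mirroring the documented settings.
const SCENE_THRESHOLD = 0.4;
const MAX_FRAMES = 50;

// select keeps frames whose scene-change score exceeds the threshold;
// showinfo logs each kept frame's pts_time to stderr.
function frameExtractionArgs(videoPath: string, outDir: string): string[] {
  return [
    "-i", videoPath,
    "-vf", `select='gt(scene,${SCENE_THRESHOLD})',showinfo`,
    "-vsync", "vfr",                 // emit only selected frames (-fps_mode vfr on newer builds)
    "-frames:v", String(MAX_FRAMES), // hard cap: stop after the first 50 selected frames
    `${outDir}/frame_%03d.jpg`,
  ];
}

// Parses showinfo's stderr into per-frame timestamps in seconds,
// the kind of data that could be written to timestamps.json.
function parseShowinfoTimestamps(stderr: string): number[] {
  const times: number[] = [];
  for (const match of stderr.matchAll(/pts_time:\s*([0-9.]+)/g)) {
    times.push(Number(match[1]));
  }
  return times.slice(0, MAX_FRAMES);
}
```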

Step 4 — Claude vision analysis (video only)

Each sampled frame is sent to Claude Sonnet vision with a prompt focused on:

- Slides and presentation content (titles, key points, charts)
- Screen shares (apps, code, dashboards, visible URLs)
- Diagrams (boxes, arrows, labels)
- Visible text (UI labels, headings, important numbers)

Webcam-only frames or frames with no informational content are marked "No informational content" and contribute nothing to the summary.

Frame failures are non-fatal — the description becomes [vision failed: <reason>] and the pipeline continues.
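A sketch of one vision call using the Anthropic Messages API's base64 image content block, plus the non-fatal wrapper that turns an error into the `[vision failed: <reason>]` marker. The prompt wording, `max_tokens` value, and function names are hypothetical, not the contents of transcript-vision.ts.

```typescript
// Hypothetical prompt; the real prompt text is not published here.
const VISION_PROMPT =
  "Describe informational content only: slides, screen shares, diagrams, visible text. " +
  'If the frame is webcam-only, reply exactly "No informational content".';

type ContentBlock =
  | { type: "image"; source: { type: "base64"; media_type: "image/jpeg"; data: string } }
  | { type: "text"; text: string };

interface VisionRequestBody {
  model: string;
  max_tokens: number;
  messages: Array<{ role: "user"; content: ContentBlock[] }>;
}

function buildVisionRequest(jpegBase64: string, model = "claude-sonnet-4-5"): VisionRequestBody {
  return {
    model,
    max_tokens: 512,
    messages: [{
      role: "user",
      content: [
        { type: "image", source: { type: "base64", media_type: "image/jpeg", data: jpegBase64 } },
        { type: "text", text: VISION_PROMPT },
      ],
    }],
  };
}

async function describeFrame(jpegBase64: string, apiKey: string): Promise<string> {
  const res = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify(buildVisionRequest(jpegBase64)),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = (await res.json()) as { content: Array<{ type: string; text?: string }> };
  // Concatenate the text blocks of the reply.
  return data.content.filter((b) => b.type === "text").map((b) => b.text ?? "").join("");
}

// Non-fatal wrapper: a failed frame becomes a marker string instead of aborting the pipeline.
async function describeFrameSafe(describe: () => Promise<string>): Promise<string> {
  try {
    return await describe();
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return `[vision failed: ${reason}]`;
  }
}
```

Usage would be `describeFrameSafe(() => describeFrame(frameBase64, anthropicKey))` for each sampled frame.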

Step 5 — Claude Sonnet summary

The transcript, plus the frame descriptions when available, is sent to Claude Sonnet. The summary is stored as structured JSON:

{
  "tldr": "Team reviewed Q3 revenue forecast and decided to ship the new feature by end of month.",
  "key_points": [
    "Q3 revenue is 15% above projection",
    "Two blockers remain on the backend API"
  ],
  "action_items": [
    { "task": "Fix auth endpoint before Thursday", "owner": "Serhii" },
    { "task": "Update stakeholder deck", "owner": null }
  ],
  "decisions": [
    "Ship feature to production on June 30"
  ],
  "topics": ["Q3 revenue", "product launch", "API blockers"],
  "model": "claude-sonnet-4-5",
  "generated_at": "2026-06-06T10:42:00Z"
}

Summary failures roll the transcript back to its previous state so the raw text is preserved. You can retry after fixing your Anthropic key.
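The JSON shape above can be captured as TypeScript types with a validator that throws on malformed output, so the caller can roll back as described. Two points are assumptions: that the 60 000-character cap from the Costs table applies to transcript and frame descriptions combined, and that `model` and `generated_at` are stamped by the worker rather than written by Claude. All function names are hypothetical.

```typescript
// Shape of the stored summary, matching the example JSON above.
interface ActionItem {
  task: string;
  owner: string | null;
}

interface SummaryContent {
  tldr: string;
  key_points: string[];
  action_items: ActionItem[];
  decisions: string[];
  topics: string[];
}

interface MeetingSummary extends SummaryContent {
  model: string;
  generated_at: string;
}

// The Costs and Limits table caps summary input at 60 000 characters.
const MAX_SUMMARY_INPUT_CHARS = 60_000;

// Assumption: the cap applies to transcript and frame descriptions combined.
function summaryInput(transcript: string, frameDescriptions: string[]): string {
  return [transcript, ...frameDescriptions].join("\n\n").slice(0, MAX_SUMMARY_INPUT_CHARS);
}

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === "string");

const isActionItem = (v: unknown): v is ActionItem => {
  if (typeof v !== "object" || v === null) return false;
  const item = v as Record<string, unknown>;
  return typeof item.task === "string" && (item.owner === null || typeof item.owner === "string");
};

// Throws on malformed output so the caller can roll the transcript back.
function parseSummaryContent(raw: string): SummaryContent {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("summary is not a JSON object");
  }
  const v = parsed as Record<string, unknown>;
  const valid =
    typeof v.tldr === "string" &&
    isStringArray(v.key_points) &&
    Array.isArray(v.action_items) &&
    v.action_items.every(isActionItem) &&
    isStringArray(v.decisions) &&
    isStringArray(v.topics);
  if (!valid) throw new Error("summary JSON does not match the expected shape");
  return v as unknown as SummaryContent;
}

// Assumption: model and generated_at are added by the worker, not by Claude.
function stampSummary(content: SummaryContent, model: string): MeetingSummary {
  return { ...content, model, generated_at: new Date().toISOString() };
}
```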

Step 6 — RAG embedding (Phase 73.6)

The full text (transcript + frame descriptions + summary) is chunked and embedded via Cohere embed-multilingual-v3.0 into the project's vector index. After this step:

- The recording is searchable via arc kb search and the in-chat knowledge tool.
- The source file is deleted from disk (CEO decision D4: the transcript table is the source of truth).
- The status becomes done.
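The chunk-and-embed step can be sketched as fixed-size character chunks with overlap, batched into Cohere embed request bodies. The chunk size and overlap values are hypothetical (this page does not state them), and the sketch stops at building request bodies because the page does not say which Cohere API version the worker calls. The field names `model`, `input_type`, and `texts` follow Cohere's embed API.

```typescript
// Hypothetical chunk parameters; the real values are not documented here.
const CHUNK_CHARS = 1500;
const OVERLAP_CHARS = 200;
// Cohere's embed endpoint accepts at most 96 texts per request.
const COHERE_BATCH = 96;

interface CohereEmbedBody {
  model: string;
  input_type: "search_document" | "search_query";
  texts: string[];
}

// Splits text into windows of `size` characters, each overlapping the previous by `overlap`.
function chunkText(text: string, size = CHUNK_CHARS, overlap = OVERLAP_CHARS): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than chunk size");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}

// Stored chunks use input_type "search_document"; search queries would use "search_query".
function cohereEmbedBodies(chunks: string[]): CohereEmbedBody[] {
  const bodies: CohereEmbedBody[] = [];
  for (let i = 0; i < chunks.length; i += COHERE_BATCH) {
    bodies.push({
      model: "embed-multilingual-v3.0",
      input_type: "search_document",
      texts: chunks.slice(i, i + COHERE_BATCH),
    });
  }
  return bodies;
}
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side.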

To skip RAG indexing, pass embed_to_rag=false in the upload form. This option is API-only; uploads through the UI are always embedded.
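Only the `embed_to_rag` field name comes from this page. The upload endpoint and the name of the file field are documented in the Phase 73 API reference, so the hypothetical helper below takes the file field name as a parameter rather than guessing it.

```typescript
// Hypothetical helper: builds an upload form that opts out of RAG indexing.
function buildUploadForm(recording: Blob, filename: string, fileField: string, embedToRag: boolean): FormData {
  const form = new FormData();
  form.append(fileField, recording, filename);
  // Multipart fields are strings, so the flag is sent as "true" or "false".
  form.append("embed_to_rag", String(embedToRag));
  return form;
}
```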

BYOK API Key Setup

Claude vision and summary use your Anthropic API key:

1. Go to Profile → API Keys → Anthropic Key.
2. Paste your key. It is encrypted with AES-256-GCM before storage.

If no personal key is configured, the platform's shared key is used (subject to platform rate limits).
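AES-256-GCM storage of a key, and the personal-then-platform fallback, can be sketched with Node's `crypto` module. The storage layout (IV, then auth tag, then ciphertext, base64-encoded) and the 32-byte master key source are assumptions; Arc OS may store these pieces differently.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical storage format: base64 of iv (12 bytes) || auth tag (16 bytes) || ciphertext.
function encryptApiKey(plaintext: string, masterKey: Buffer): string {
  if (masterKey.length !== 32) throw new Error("AES-256 needs a 32-byte key");
  const iv = randomBytes(12); // 96-bit nonce, the recommended size for GCM; fresh per encryption
  const cipher = createCipheriv("aes-256-gcm", masterKey, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decryptApiKey(stored: string, masterKey: Buffer): string {
  const raw = Buffer.from(stored, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", masterKey, iv);
  decipher.setAuthTag(tag); // final() throws if the tag does not verify
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

// Personal key first; the platform's shared key is the fallback.
function resolveAnthropicKey(personalKey: string | null, platformKey: string): string {
  return personalKey && personalKey.length > 0 ? personalKey : platformKey;
}
```

GCM's auth tag means a tampered ciphertext or a wrong master key fails loudly at decryption instead of yielding a garbled key.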

Groq transcription uses the platform GROQ_API_KEY — no personal key needed for transcription.

Searching Transcripts

Once a transcript reaches done, it is findable via semantic search:

arc kb search "action items from last standup"
arc kb search "decision about Q3 launch"

In chat, the AI worker automatically grounds its answers in your transcripts through the ask_notebooklm tool, which queries the same RAG index.

To restrict results to transcripts, pass the doc_types parameter:

GET /api/crm/projects/:name/rag/search?q=...&doc_types=transcript
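A sketch of calling the search endpoint above from TypeScript. The base URL, bearer-token authentication, and the untyped response are assumptions; only the path and the `q` and `doc_types` parameters come from this page.

```typescript
// Builds the documented search URL; baseUrl is whichever host serves the CRM API.
function ragSearchUrl(baseUrl: string, project: string, query: string, docType?: string): string {
  const url = new URL(`/api/crm/projects/${encodeURIComponent(project)}/rag/search`, baseUrl);
  url.searchParams.set("q", query);
  if (docType) url.searchParams.set("doc_types", docType);
  return url.toString();
}

// Assumption: the API accepts a bearer token; the response shape is not documented here.
async function searchTranscripts(baseUrl: string, project: string, query: string, token: string): Promise<unknown> {
  const res = await fetch(ragSearchUrl(baseUrl, project, query, "transcript"), {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`search failed: HTTP ${res.status}`);
  return res.json();
}
```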

Costs and Limits

Resource	Limit	Notes
Upload size	1 GB	Per-file
Vision frames	50 max	Hard cap; excess frames dropped
Claude vision cost	~$0.003/frame	At typical 1024×768 JPEG resolution
Claude summary cost	~$0.005–0.02	Depends on transcript length (≤60 000 chars fed to summary)
Groq transcription	$0.111/audio-hour	Platform key; ~$0.056 per 30 min meeting
Monthly cap (Starter)	60 min	Resets on 1st of each month
Monthly cap (Starter Cloud)	300 min	Resets on 1st of each month
Monthly cap (Beta)	1 200 min	Resets on 1st of each month
Concurrent jobs	1	Jobs queue; one transcription at a time per server
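The per-recording figures in the table can be combined into a rough estimator. The constants come from the table above. The function names are hypothetical, and treating the monthly caps as audio minutes rounded up per recording is an assumption.

```typescript
// Figures taken from the Costs and Limits table.
const GROQ_USD_PER_AUDIO_HOUR = 0.111;
const VISION_USD_PER_FRAME = 0.003;
const MAX_VISION_FRAMES = 50;
const SUMMARY_USD_RANGE: [number, number] = [0.005, 0.02];
const MONTHLY_CAP_MINUTES = { starter: 60, starter_cloud: 300, beta: 1200 } as const;

type Plan = keyof typeof MONTHLY_CAP_MINUTES;

interface CostEstimate {
  transcriptionUsd: number;
  visionUsd: number;
  summaryUsdMin: number;
  summaryUsdMax: number;
}

function estimateCost(audioSeconds: number, sampledFrames: number): CostEstimate {
  const frames = Math.min(sampledFrames, MAX_VISION_FRAMES); // excess frames are dropped
  return {
    transcriptionUsd: (audioSeconds / 3600) * GROQ_USD_PER_AUDIO_HOUR,
    visionUsd: frames * VISION_USD_PER_FRAME,
    summaryUsdMin: SUMMARY_USD_RANGE[0],
    summaryUsdMax: SUMMARY_USD_RANGE[1],
  };
}

// Assumption: caps count audio minutes, rounded up per recording.
function fitsMonthlyCap(plan: Plan, usedMinutes: number, audioSeconds: number): boolean {
  return usedMinutes + Math.ceil(audioSeconds / 60) <= MONTHLY_CAP_MINUTES[plan];
}
```

For a 30-minute recording with 80 sampled frames, this gives about $0.056 for transcription and $0.15 for vision (capped at 50 frames), plus $0.005 to $0.02 for the summary.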

Troubleshooting

Symptom	Likely cause	Fix
Chip stuck at "Transcribing" for >2 min	Groq API timeout, a long recording, or another job ahead in the queue	Check GROQ_API_KEY in Platform Settings; jobs run one at a time per server, so queued jobs must finish first
Vision step shows `[vision failed: ...]`	Anthropic key missing or rate-limited	Set your BYOK key in Profile → API Keys
Summary failed, status rolled back	Claude API error	Fix your Anthropic key, then upload the recording again
RAG embed failed (non-fatal)	Cohere key issue	Transcript is still usable, but `arc kb search` won't find it until it is re-embedded
Send button still blocked after "Summary ready"	Stale browser cache	Refresh the page
"No informational content" on all frames	Webcam-only recording (no screen share)	Expected; frames without slides, screens, or visible text are marked this way

References

master-bot/transcript-worker.ts — pipeline orchestrator (Phase 73.2 + 73.4 + 73.5 + 73.6)
master-bot/transcript-vision.ts — Claude vision per frame (Phase 73.4, #380)
master-bot/transcript-summary.ts — Claude Sonnet summary (Phase 73.5, #381)
shared/migrations/052_transcripts.ts — transcripts table
shared/migrations/053_transcript_jobs.ts — job progress tracker
frontend/src/crm/pages/workspace/Composer.jsx — upload chip + SSE progress
API reference: Phase 73 endpoints
Architecture: RAG — transcript doc_type