Meeting Transcription + Analysis
Status: Live since Phase 73 (2026-06-06). Upload audio/video recordings to the chat composer — Arc OS transcribes, analyzes visuals, summarizes, and embeds into project search.
Overview
The meeting transcription pipeline turns a raw recording into searchable, structured knowledge without leaving Arc OS:
Upload (Transcripts page or composer paperclip)
→ Groq whisper-large-v3 (~100× realtime, $0.111/audio-hour)
→ Claude vision on key frames (video only — slide content, screen shares)
→ Claude Sonnet summary (tldr, action items, decisions, topics)
→ Cohere RAG embed (transcript searchable via arc kb search)
→ done (source file deleted from disk)
Cost estimate (BYOK Anthropic key): ~$0.20–0.45 per 30-minute meeting (vision + summary) + ~$0.056 Groq transcription. Feature requires a paid plan (Starter / Starter Cloud / Beta).
Uploading a Recording
- Open any project workspace.
- Click the paperclip icon in the chat composer.
- Select an audio or video file. Supported formats:
  - Video: mp4, mov, webm, mkv, m4v
  - Audio: mp3, wav, m4a, aac, ogg, opus, flac
  - Max size: 1 GB
- A progress chip appears in the composer showing the current step and percentage.
- When the chip reaches 100% (Summary ready), the Send button is enabled and you can immediately send a message that references the transcript.
- After a brief additional step (RAG indexing), the status becomes done and the source file is deleted.
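The documented format and size limits can be checked in the browser before an upload starts. This is a hypothetical sketch (checkUpload and its result type are not Arc OS names), and it assumes "1 GB" means 1024³ bytes. The server remains the authority on what it accepts.

```typescript
// Hypothetical pre-upload check mirroring the documented limits.
// Assumption: "1 GB" is interpreted as 1024^3 bytes.
const VIDEO_EXTS = ["mp4", "mov", "webm", "mkv", "m4v"];
const AUDIO_EXTS = ["mp3", "wav", "m4a", "aac", "ogg", "opus", "flac"];
const MAX_UPLOAD_BYTES = 1024 * 1024 * 1024;

type UploadCheck =
  | { ok: true; kind: "video" | "audio" }
  | { ok: false; error: string };

function checkUpload(fileName: string, sizeBytes: number): UploadCheck {
  const dot = fileName.lastIndexOf(".");
  // Extensions are compared case-insensitively ("MP4" counts as mp4).
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    return { ok: false, error: "File exceeds the 1 GB upload limit" };
  }
  if (VIDEO_EXTS.indexOf(ext) !== -1) return { ok: true, kind: "video" };
  if (AUDIO_EXTS.indexOf(ext) !== -1) return { ok: true, kind: "audio" };
  return { ok: false, error: `Unsupported format: ${fileName}` };
}
```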
While processing: You can type other messages in the composer. Only media attachments that are still processing block the Send button.
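The Send gating rule reduces to a single predicate. This minimal sketch uses a hypothetical Attachment shape (isMedia, progress). It treats 100% as Summary ready, so the later RAG indexing step never blocks sending.

```typescript
// Hypothetical attachment model; field names are illustrative, not Arc OS's actual types.
interface Attachment {
  name: string;
  isMedia: boolean; // audio/video upload that goes through the transcription pipeline
  progress: number; // 0-100, as shown on the composer chip
}

// Send is blocked only while some media attachment is below 100% (Summary ready).
// Non-media attachments never block, whatever their progress value.
function canSend(attachments: Attachment[]): boolean {
  return attachments.every((a) => !a.isMedia || a.progress >= 100);
}
```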
What Gets Injected into Chat
When you send a message containing a completed transcript attachment, the transcript text is automatically appended to your message:
Your message text here
--- TRANSCRIPT: meeting-2026-06-06.mp4 (video) ---
[whisper transcript text...]
--- END TRANSCRIPT ---
The AI worker sees the full transcript inline, like any other document. You can ask questions about it, request summaries, or have the worker extract action items.
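The injected block can be assembled with a small helper like the one below. This is a minimal sketch: withTranscripts and CompletedTranscript are hypothetical names, and the blank line between the message and each transcript block is an assumption, since the exact separator Arc OS uses is not specified.

```typescript
// Hypothetical shape of a completed transcript attachment.
interface CompletedTranscript {
  fileName: string;
  kind: "video" | "audio";
  text: string;
}

// Appends each transcript using the documented delimiters.
// Assumption: parts are separated by a blank line.
function withTranscripts(message: string, transcripts: CompletedTranscript[]): string {
  const blocks = transcripts.map(
    (t) => `--- TRANSCRIPT: ${t.fileName} (${t.kind}) ---\n${t.text}\n--- END TRANSCRIPT ---`
  );
  return [message].concat(blocks).join("\n\n");
}
```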
Pipeline Steps in Detail
Step 1 — Audio extraction (ffmpeg)
For video files, Arc OS extracts the audio track and converts it to 16 kHz mono WAV. For audio files, the step is only a format conversion to the same WAV format.
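The exact ffmpeg invocation is not documented. The following sketch builds one plausible argument list from standard ffmpeg options (-vn, -ac, -ar, and the pcm_s16le WAV codec); audioExtractArgs is a hypothetical name.

```typescript
// Builds ffmpeg arguments for 16 kHz mono WAV output (assumed flags, standard ffmpeg options).
function audioExtractArgs(input: string, output: string): string[] {
  return [
    "-y", // overwrite the output file if it exists
    "-i", input,
    "-vn", // drop any video stream (no effect on audio-only input)
    "-ac", "1", // mono
    "-ar", "16000", // 16 kHz sample rate
    "-c:a", "pcm_s16le", // 16-bit PCM, the usual WAV codec
    output,
  ];
}
```

In Node.js the list can be passed directly to child_process.spawn("ffmpeg", args), which avoids shell quoting issues with file names.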
Step 2 — Transcription (Groq whisper-large-v3)
The WAV is sent to the Groq API (whisper-large-v3 model). Speed: ~100× realtime (11 s clip → under 1 s). The platform GROQ_API_KEY is configured in Platform Settings → Transcription by an admin. Multilingual — Ukrainian, English, and all other major languages are supported out of the box.
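Groq exposes an OpenAI-compatible transcription endpoint that takes a multipart form with file and model fields. The sketch below only builds the request so it can be inspected or sent with fetch. The helper name and the response_format choice are assumptions, not Arc OS code, and the sketch relies on the global FormData and Blob available in browsers and Node.js 18+.

```typescript
const GROQ_TRANSCRIBE_URL = "https://api.groq.com/openai/v1/audio/transcriptions";

// Builds (but does not send) a whisper-large-v3 transcription request.
function groqTranscriptionRequest(
  wav: Blob,
  fileName: string,
  apiKey: string
): { url: string; init: RequestInit } {
  const form = new FormData();
  form.append("file", wav, fileName);
  form.append("model", "whisper-large-v3");
  // Assumption: plain JSON output, i.e. a body with a "text" field.
  form.append("response_format", "json");
  return {
    url: GROQ_TRANSCRIBE_URL,
    init: { method: "POST", headers: { Authorization: `Bearer ${apiKey}` }, body: form },
  };
}
```

To send it, pass req.url and req.init to fetch. With response_format set to json, the transcript is in the text field of the response body.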
Step 3 — Scene-change frame extraction (video only)
ffmpeg samples frames at scene-change points using a threshold of 0.4 (on a 0 to 1 scale, 0 selects every frame and 1 selects none). Timestamps are written to timestamps.json so that each frame description maps back to an exact position in the video.
Maximum frames per video: 50 (~$0.15 worst case in Claude vision costs).
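Scene-change sampling maps onto ffmpeg's select filter with the scene score. The showinfo filter logs each kept frame's pts_time to stderr, which is enough to build timestamps.json. This is a sketch under assumptions: the helper names, the frame_NNN.jpg layout, and keeping the first 50 frames when a video has more are all illustrative.

```typescript
// Hypothetical helpers for Step 3 (scene-change sampling).
const MAX_FRAMES = 50;

function sceneFrameArgs(input: string, outDir: string, threshold = 0.4): string[] {
  return [
    "-i", input,
    // select keeps frames whose scene-change score exceeds the threshold;
    // showinfo then logs each kept frame (including pts_time) to stderr.
    "-vf", `select='gt(scene,${threshold})',showinfo`,
    "-vsync", "vfr", // write only the selected frames (newer ffmpeg also accepts -fps_mode vfr)
    "-frames:v", String(MAX_FRAMES), // assumption: stop once the 50-frame cap is reached
    `${outDir}/frame_%03d.jpg`,
  ];
}

interface FrameStamp {
  frame: string; // file name written by ffmpeg
  seconds: number; // position in the source video
}

// Turns showinfo's stderr into entries suitable for timestamps.json, capped at maxFrames.
function parseFrameTimestamps(stderr: string, maxFrames: number = MAX_FRAMES): FrameStamp[] {
  const stamps: FrameStamp[] = [];
  const re = /pts_time:\s*([0-9.]+)/g;
  while (stamps.length < maxFrames) {
    const m = re.exec(stderr);
    if (m === null) break;
    const n = stamps.length + 1; // the %03d pattern numbers files from 1
    stamps.push({ frame: `frame_${("00" + n).slice(-3)}.jpg`, seconds: parseFloat(m[1]) });
  }
  return stamps;
}
```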
Step 4 — Claude vision analysis (video only)
Each sampled frame is sent to Claude Sonnet vision with a prompt focused on:
- Slides and presentation content (titles, key points, charts)
- Screen shares (apps, code, dashboards, visible URLs)
- Diagrams (boxes, arrows, labels)
- Visible text (UI labels, headings, important numbers)
Webcam-only frames or frames with no informational content are marked "No informational content" and contribute nothing to the summary.
Frame failures are non-fatal — the description becomes [vision failed: <reason>] and the pipeline continues.
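The non-fatal failure rule amounts to catching errors per frame rather than per video. In the sketch below, describe is a hypothetical stand-in for the Claude vision call, and frames are handled one at a time by assumption (the real worker may run them concurrently).

```typescript
interface Frame {
  frame: string;
  seconds: number;
}
interface FrameDescription extends Frame {
  text: string;
}

const NO_INFO = "No informational content";

// Describes every frame; a failure is recorded as "[vision failed: <reason>]" instead of aborting.
async function describeFrames(
  frames: Frame[],
  describe: (f: Frame) => Promise<string>
): Promise<FrameDescription[]> {
  const out: FrameDescription[] = [];
  for (const f of frames) {
    try {
      out.push({ frame: f.frame, seconds: f.seconds, text: await describe(f) });
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      out.push({ frame: f.frame, seconds: f.seconds, text: `[vision failed: ${reason}]` });
    }
  }
  return out;
}

// Frames marked "No informational content" are left out of the summary input.
function summaryFrames(descs: FrameDescription[]): FrameDescription[] {
  return descs.filter((d) => d.text.trim() !== NO_INFO);
}
```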
Step 5 — Claude Sonnet summary
The transcript, together with the vision frame descriptions when available, is sent to Claude Sonnet. The summary is stored as structured JSON:
{
"tldr": "Team reviewed Q3 revenue forecast and decided to ship the new feature by end of month.",
"key_points": [
"Q3 revenue is 15% above projection",
"Two blockers remain on the backend API"
],
"action_items": [
{ "task": "Fix auth endpoint before Thursday", "owner": "Serhii" },
{ "task": "Update stakeholder deck", "owner": null }
],
"decisions": [
"Ship feature to production on June 30"
],
"topics": ["Q3 revenue", "product launch", "API blockers"],
"model": "claude-sonnet-4-5",
"generated_at": "2026-06-06T10:42:00Z"
}
Summary failures roll the transcript back to its previous state so the raw text is preserved. You can retry after fixing your Anthropic key.
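The summary JSON above maps onto a TypeScript shape, and the rollback rule can be modelled as "validate the new summary, otherwise keep the previous record". The types mirror the JSON example; parseSummary, applySummary, and TranscriptRecord are hypothetical, and the actual worker may persist the previous state in the database rather than in memory.

```typescript
interface ActionItem {
  task: string;
  owner: string | null;
}
interface MeetingSummary {
  tldr: string;
  key_points: string[];
  action_items: ActionItem[];
  decisions: string[];
  topics: string[];
  model: string;
  generated_at: string; // ISO 8601 timestamp
}
interface TranscriptRecord {
  text: string;
  summary: MeetingSummary | null;
}

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

// Throws if the model output is not valid JSON in the expected shape.
function parseSummary(raw: string): MeetingSummary {
  const s = JSON.parse(raw);
  const ok =
    typeof s.tldr === "string" &&
    isStringArray(s.key_points) &&
    Array.isArray(s.action_items) &&
    s.action_items.every(
      (a: any) => typeof a.task === "string" && (a.owner === null || typeof a.owner === "string")
    ) &&
    isStringArray(s.decisions) &&
    isStringArray(s.topics) &&
    typeof s.model === "string" &&
    typeof s.generated_at === "string";
  if (!ok) throw new Error("Summary JSON does not match the expected shape");
  return s as MeetingSummary;
}

// On any failure the previous record is returned unchanged, so the raw transcript survives.
function applySummary(prev: TranscriptRecord, raw: string): { record: TranscriptRecord; error?: string } {
  try {
    return { record: { text: prev.text, summary: parseSummary(raw) } };
  } catch (err) {
    return { record: prev, error: err instanceof Error ? err.message : String(err) };
  }
}
```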
Step 6 — RAG embedding (Phase 73.6)
The full text (transcript + frame descriptions + summary) is chunked and embedded via Cohere embed-multilingual-v3.0 into the project's vector index. After this step:
- The recording is searchable via arc kb search and the in-chat knowledge tool.
- The source file is deleted from disk (CEO decision D4: the transcript table is the source of truth).
- Status becomes done.
To skip RAG indexing, pass embed_to_rag=false in the upload form. This option is available through the API only; uploads from the UI are always embedded.
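A minimal sketch of preparing text for the index: the transcript, frame descriptions, and summary are joined, then split into overlapping chunks. The helper names, the chunk size (1500 characters), and the overlap (200) are assumptions for illustration; the values Arc OS uses with embed-multilingual-v3.0 are not documented here.

```typescript
// Joins the non-empty parts that get indexed, separated by blank lines.
function buildIndexText(transcript: string, frameDescriptions: string[], summaryJson: string): string {
  return [transcript]
    .concat(frameDescriptions, [summaryJson])
    .filter((part) => part.trim() !== "")
    .join("\n\n");
}

// Fixed-size character windows; each chunk repeats the last `overlap` characters of the previous one.
function chunkText(text: string, size = 1500, overlap = 200): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window already reaches the end
  }
  return chunks;
}
```

Each chunk would then be embedded with Cohere using input_type "search_document", which the v3 embed models require when indexing documents (queries use "search_query").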
BYOK API Key Setup
Claude vision and summary use your Anthropic API key:
- Go to Profile → API Keys → Anthropic Key.
- Paste your key. It is encrypted with AES-256-GCM before storage.
- If no personal key is configured, the platform's shared key is used (subject to platform rate limits).
Groq transcription uses the platform GROQ_API_KEY — no personal key needed for transcription.
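For readers implementing something similar, AES-256-GCM encryption of a stored key and the personal-then-platform fallback can be sketched with Node's built-in crypto module. The storage layout (base64 of IV, auth tag, then ciphertext), the 12-byte IV, and the helper names are assumptions; how Arc OS derives and stores its master key is not described here.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Assumed layout: base64(iv[12] || authTag[16] || ciphertext).
const IV_BYTES = 12; // 96-bit nonce, the standard size for GCM
const TAG_BYTES = 16; // Node's default GCM auth tag length

// masterKey must be 32 bytes for AES-256.
function encryptApiKey(plaintext: string, masterKey: Buffer): string {
  const iv = randomBytes(IV_BYTES); // fresh nonce per encryption; never reuse one with the same key
  const cipher = createCipheriv("aes-256-gcm", masterKey, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

function decryptApiKey(stored: string, masterKey: Buffer): string {
  const buf = Buffer.from(stored, "base64");
  const iv = buf.subarray(0, IV_BYTES);
  const tag = buf.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", masterKey, iv);
  decipher.setAuthTag(tag); // final() throws if the data was tampered with
  const plain = Buffer.concat([decipher.update(buf.subarray(IV_BYTES + TAG_BYTES)), decipher.final()]);
  return plain.toString("utf8");
}

type KeySource = { key: string; source: "byok" | "platform" };

// Personal BYOK key first; fall back to the shared platform key.
function resolveAnthropicKey(personal: string | null, platform: string | null): KeySource | null {
  if (personal) return { key: personal, source: "byok" };
  if (platform) return { key: platform, source: "platform" };
  return null;
}
```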
Searching Transcripts
Once a transcript reaches done, it is findable via semantic search:
arc kb search "action items from last standup"
arc kb search "decision about Q3 launch"
From the chat composer, the AI worker can answer questions grounded in your transcripts automatically via the ask_notebooklm tool (which queries the same RAG index).
To restrict results to transcripts, use the doc_types query parameter:
GET /api/crm/projects/:name/rag/search?q=...&doc_types=transcript
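A sketch of building the search URL for the endpoint above. ragSearchUrl is a hypothetical helper, authentication is omitted, and joining several doc types with commas is an assumption; only doc_types=transcript is documented.

```typescript
// Builds a RAG search URL; the project name is percent-encoded as a path segment.
function ragSearchUrl(baseUrl: string, project: string, query: string, docTypes: string[] = []): string {
  const url = new URL(`/api/crm/projects/${encodeURIComponent(project)}/rag/search`, baseUrl);
  url.searchParams.set("q", query);
  // Assumption: multiple doc types are comma-separated.
  if (docTypes.length > 0) url.searchParams.set("doc_types", docTypes.join(","));
  return url.toString();
}
```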
Costs and Limits
| Resource | Limit | Notes |
|---|---|---|
| Upload size | 1 GB | Per-file |
| Vision frames | 50 max | Hard cap; excess frames dropped |
| Claude vision cost | ~$0.003/frame | At typical 1024×768 JPEG resolution |
| Claude summary cost | ~$0.005–0.02 | Depends on transcript length (≤60 000 chars fed to summary) |
| Groq transcription | $0.111/audio-hour | Platform key; ~$0.056 per 30 min meeting |
| Monthly cap (Starter) | 60 min | Resets on 1st of each month |
| Monthly cap (Starter Cloud) | 300 min | Resets on 1st of each month |
| Monthly cap (Beta) | 1 200 min | Resets on 1st of each month |
| Concurrent jobs | 1 | Jobs queue; one transcription at a time per server |
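The per-unit figures in the table can be combined into a rough estimator. This is a sketch: the plan identifiers are hypothetical, the summary cost is passed in because the table only gives a range, and the monthly cap is assumed to count audio minutes.

```typescript
// Rough cost and quota helpers built from the table's per-unit figures.
const GROQ_USD_PER_AUDIO_HOUR = 0.111;
const VISION_USD_PER_FRAME = 0.003; // at ~1024x768 JPEG
const MAX_VISION_FRAMES = 50;

type Plan = "starter" | "starter_cloud" | "beta"; // hypothetical identifiers
const MONTHLY_MINUTES: Record<Plan, number> = { starter: 60, starter_cloud: 300, beta: 1200 };

// summaryUsd is supplied by the caller because the table gives only a $0.005 to $0.02 range.
function estimateCostUsd(audioMinutes: number, sampledFrames: number, summaryUsd: number): number {
  const transcription = (audioMinutes / 60) * GROQ_USD_PER_AUDIO_HOUR;
  // Frames beyond the hard cap are dropped, so they cost nothing.
  const vision = Math.min(sampledFrames, MAX_VISION_FRAMES) * VISION_USD_PER_FRAME;
  return transcription + vision + summaryUsd;
}

// Assumes the monthly cap counts audio minutes.
function fitsMonthlyCap(plan: Plan, usedMinutes: number, newMinutes: number): boolean {
  return usedMinutes + newMinutes <= MONTHLY_MINUTES[plan];
}
```

For a 30-minute video that yields 50 or more frames and a $0.02 summary, this gives about $0.23 (about $0.056 Groq + $0.15 vision + $0.02 summary).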
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
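| Chip stuck at "Transcribing" for >2 min | Groq API timeout, a long recording, or the job is queued behind another | Check GROQ_API_KEY in Platform Settings → Transcription; only one job runs per server at a time, so a queued job starts when the current one finishes |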
| Vision step shows [vision failed: ...] | Anthropic key missing or rate-limited | Set BYOK key in Profile → API Keys |
| Summary failed, status rolled back | Claude API error | Fix the Anthropic key, then retry; the raw transcript is preserved |
| RAG embed failed (non-fatal) | Cohere key issue | Transcript is still usable, but arc kb search won't find it until it is re-embedded |
| Send button still blocked after "Summary ready" | Browser cache stale | Refresh the page |
| "No informational content" on all frames | Webcam-only recording (no screen share) | Expected; vision skips pure webcam frames |
References
- master-bot/transcript-worker.ts: pipeline orchestrator (Phase 73.2 + 73.4 + 73.5 + 73.6)
- master-bot/transcript-vision.ts: Claude vision per frame (Phase 73.4, #380)
- master-bot/transcript-summary.ts: Claude Sonnet summary (Phase 73.5, #381)
- shared/migrations/052_transcripts.ts: transcripts table
- shared/migrations/053_transcript_jobs.ts: job progress tracker
- frontend/src/crm/pages/workspace/Composer.jsx: upload chip + SSE progress
- API reference: Phase 73 endpoints
- Architecture: RAG — transcript doc_type