Voice Input

Status: Live since Phase 62 (2026-06-05). The L2 layer runs on self-hosted whisper.cpp on Contabo: no OpenAI dependency, and audio sent to L2 never leaves your server.

Arc OS supports two layers of voice input in the chat composer:

Layer	Requires	Works in
L1 — Web Speech API	Chrome or Edge (Chromium)	All Arc plans
L2 — Self-hosted whisper	Any browser (WebM/WAV MediaRecorder)	Arc Standard Cloud, On-premise

Both layers write plain text into the same composer text box, so the AI worker cannot tell which layer produced a message.

Using Voice in Chat

Open any project workspace and focus the chat composer.
Click the microphone button (bottom-left of the composer) or press Ctrl+Shift+V (works on any keyboard layout — the shortcut matches the physical V key, not the layout character).
Speak. A waveform animation and an orange border indicate that recording is active.
Click the mic button again (or press the shortcut again) to stop.
The transcribed text appears in the composer — edit if needed, then Send.
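
The layout-independent shortcut described in the steps above can be sketched with KeyboardEvent.code, which reports the physical key ("KeyV") rather than the character it produces. This is a minimal sketch, not the actual handler in Composer.jsx; the names isVoiceShortcut and KeyLike are hypothetical.

```typescript
// Hypothetical helper: the real handler in Composer.jsx may differ.
// KeyLike is the subset of KeyboardEvent this check needs, so it can also
// run outside a browser.
interface KeyLike {
  code: string; // physical key position, "KeyV" on every layout
  ctrlKey: boolean;
  shiftKey: boolean;
}

function isVoiceShortcut(e: KeyLike): boolean {
  // Compare `code`, not `key`: on a Cyrillic layout the same physical key
  // reports a Cyrillic letter in `key`, but `code` is still "KeyV".
  return e.ctrlKey && e.shiftKey && e.code === "KeyV";
}
```

In a browser, a keydown listener on the composer textarea could pass the event straight to isVoiceShortcut, since KeyboardEvent already has these three fields.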

Recording modes: L1 listens continuously until you stop it; L2 records a single clip and sends it for transcription when you stop.

L1 — Web Speech API (Chrome/Edge)

Uses the browser's built-in speech recognition API. Works instantly, with no round-trip to the Arc OS server (the browser talks to its own speech service).

Locale: Automatically set from your Arc language preference. Supported BCP-47 codes: uk-UA, en-US, de-DE, es-ES, fr-FR, pl-PL, ru-RU, pt-BR.
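
As an illustration of the locale selection, the sketch below maps a language preference to one of the eight supported tags. The shape of the Arc language preference (a bare code such as "uk" or a full tag) and the en-US fallback are assumptions, and resolveSpeechLocale is a hypothetical name.

```typescript
// The eight locales listed above. The preference format and the en-US
// fallback are assumptions, not documented Arc OS behavior.
const SUPPORTED_LOCALES = [
  "uk-UA", "en-US", "de-DE", "es-ES", "fr-FR", "pl-PL", "ru-RU", "pt-BR",
] as const;

type SpeechLocale = (typeof SUPPORTED_LOCALES)[number];

function resolveSpeechLocale(preference: string): SpeechLocale {
  const wanted = preference.trim().toLowerCase();
  for (let i = 0; i < SUPPORTED_LOCALES.length; i++) {
    const tag = SUPPORTED_LOCALES[i];
    const lower = tag.toLowerCase();
    // Accept either a full tag ("pt-BR") or a bare language code ("pt").
    if (lower === wanted || lower.split("-")[0] === wanted) {
      return tag;
    }
  }
  return "en-US";
}
```

The result would be assigned to the lang property of the browser's SpeechRecognition instance before recognition starts.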

Permissions: Browser prompts for microphone access on first use. If you accidentally blocked it, go to Site settings → Microphone → Allow for arc-os.co. The Permissions-Policy header on arc-os.co uses microphone=(self) — same-origin access is allowed while cross-origin embeds are locked out.

Limitations:

Requires Chrome or Edge (Chromium-based). Firefox and Safari fall through to L2 automatically.
Transcription quality depends on the browser vendor's speech service (Google's servers in Chrome), which the browser calls directly; Arc OS is not involved.
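
The automatic fall-through to L2 can be sketched as plain feature detection. This is an assumption about how the composer chooses a layer, and pickVoiceLayer is a hypothetical name. Recent Safari versions also expose a prefixed constructor, so the real composer may apply a further browser check that is not shown here.

```typescript
type VoiceLayer = "L1" | "L2";

// Pass `window` in the browser; a plain object works for tests.
function pickVoiceLayer(globalScope: object): VoiceLayer {
  // Chromium exposes the constructor under the webkit prefix.
  const hasRecognition =
    "SpeechRecognition" in globalScope ||
    "webkitSpeechRecognition" in globalScope;
  return hasRecognition ? "L1" : "L2";
}
```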

L2 — Self-hosted whisper.cpp

Arc OS runs a persistent arc-whisper.service on Contabo with the ggml-base model (142 MB) preloaded. When the browser doesn't support the Web Speech API, the composer falls back to this layer automatically:

Browser records a WebM clip via MediaRecorder.
On stop, the clip is POST-ed to /api/crm/voice/transcribe (multipart, max 25 MB).
The server forwards it to the local whisper-server at 127.0.0.1:19214 — audio bytes never leave Contabo.
Transcribed text returns and is inserted into the composer.
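
The upload step above could look roughly like the following on the client. Only the endpoint path and the 25 MB cap come from this page; the multipart field name "audio", the file name, and reading "25 MB" as 25 * 1024 * 1024 bytes are assumptions, and both function names are hypothetical.

```typescript
// Assumptions: field name "audio", file name "clip.webm", and
// 25 MB interpreted as 25 * 1024 * 1024 bytes.
const TRANSCRIBE_URL = "/api/crm/voice/transcribe";
const MAX_CLIP_BYTES = 25 * 1024 * 1024;

function isClipWithinLimit(sizeBytes: number): boolean {
  return sizeBytes > 0 && sizeBytes <= MAX_CLIP_BYTES;
}

// Builds the request without sending it, so the size check runs first.
function buildTranscribeRequest(clip: Blob): {
  url: string;
  init: { method: string; body: FormData };
} {
  if (!isClipWithinLimit(clip.size)) {
    throw new Error("Clip is empty or larger than 25 MB");
  }
  const form = new FormData();
  form.append("audio", clip, "clip.webm");
  // No Content-Type header is set here: fetch adds the multipart
  // boundary itself when the body is a FormData instance.
  return { url: TRANSCRIBE_URL, init: { method: "POST", body: form } };
}
```

A caller would pass the result to fetch(req.url, req.init) once MediaRecorder has delivered the recorded Blob.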

Latency: ~3.4 s for an 11-second clip (model warm, 3.1× realtime on the current 6-vCPU EPYC box).

Concurrency: The whisper-server has a 2-slot semaphore. If both slots are busy, the API returns 429 ("server busy") and the UI shows a toast — try again in a few seconds.
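
A 2-slot limiter that rejects instead of queueing can be sketched as follows. This is a hypothetical in-process version for illustration; where the real semaphore lives (route handler, proxy, or whisper-server itself) is not stated here, and the shape of the 429 body is an assumption.

```typescript
// Hypothetical limiter: rejects immediately when all slots are taken.
class SlotLimiter {
  private inUse = 0;
  private readonly slots: number;

  constructor(slots: number) {
    this.slots = slots;
  }

  tryAcquire(): boolean {
    if (this.inUse >= this.slots) return false;
    this.inUse += 1;
    return true;
  }

  release(): void {
    if (this.inUse > 0) this.inUse -= 1;
  }
}

const whisperSlots = new SlotLimiter(2);

// Returns the HTTP status a request would receive on arrival.
function admitTranscription(): { status: number; error?: string } {
  return whisperSlots.tryAcquire()
    ? { status: 200 }
    : { status: 429, error: "server busy" };
}
```

Each admitted request should call whisperSlots.release() in a finally block once transcription finishes or fails, otherwise a crashed request would hold its slot forever.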

Daily Quota

Each user has a soft 60 min/day limit across L2 transcriptions. The server estimates clip duration from upload size (~32 kbps voice codec assumption, ±30% accuracy). When the limit is reached:
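
The size-based estimate is simple arithmetic: bytes times 8, divided by the assumed bitrate. The sketch below uses the 32 kbps and 3600-second figures from this section; the function names are hypothetical, and whether the server checks the cap before or after transcribing is an assumption.

```typescript
const ASSUMED_BITRATE_BPS = 32000; // ~32 kbps voice codec assumption
const DAILY_CAP_SECONDS = 3600; // 60 min/day

// Estimated clip duration in seconds, derived only from upload size.
function estimateClipSeconds(uploadBytes: number): number {
  return (uploadBytes * 8) / ASSUMED_BITRATE_BPS;
}

// Assumption: the cap is checked against seconds already logged today.
function isQuotaReached(usedSeconds: number): boolean {
  return usedSeconds >= DAILY_CAP_SECONDS;
}
```

Under this assumption a 44,000-byte upload counts as 11 seconds, and a 240,000-byte upload counts as one minute of quota.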

The API returns 429 with { "error": "Daily voice quota reached (60 min/day)", "used": <seconds>, "cap": 3600 }
The mic button is disabled for the rest of the day
Quota resets at midnight UTC
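
Because both "server busy" and the daily quota use status 429, a client has to look at the response body to tell them apart. The sketch below assumes that only the quota response carries numeric used and cap fields; classify429 and nextUtcMidnight are hypothetical names.

```typescript
interface QuotaError {
  error: string;
  used: number; // seconds consumed today
  cap: number; // 3600
}

type Rejection =
  | { kind: "quota"; remainingSeconds: number; resetsAt: Date }
  | { kind: "busy" };

// First midnight UTC strictly after `now`; Date.UTC handles month
// and year rollover when the day index overflows.
function nextUtcMidnight(now: Date): Date {
  return new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1)
  );
}

// Assumption: a 429 body without numeric `used` and `cap` means "server busy".
function classify429(body: unknown, now: Date): Rejection {
  const b = body as Partial<QuotaError> | null;
  if (b && typeof b.used === "number" && typeof b.cap === "number") {
    return {
      kind: "quota",
      remainingSeconds: Math.max(0, b.cap - b.used),
      resetsAt: nextUtcMidnight(now),
    };
  }
  return { kind: "busy" };
}
```

A UI could disable the mic button until resetsAt for the quota case and show a retry toast for the busy case.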

L1 (Web Speech API) is not subject to this limit — it uses the browser's own service.

Troubleshooting

Symptom	Likely cause	Fix
Mic button grayed out	Daily quota reached	Wait until midnight UTC
"Microphone access blocked"	Browser blocked mic	Site settings → Microphone → Allow for arc-os.co
L1 not working in Firefox	Firefox doesn't support the speech recognition part of the Web Speech API	No action needed; the composer falls back to L2 automatically
429 "server busy"	Both transcription slots are in use	Retry after ~5 s
Transcript quality poor	Background noise / L2 base model	Speak clearly; Pro transcription (larger model) is a future phase
Ctrl+Shift+V does nothing	Focus not in composer	Click inside the composer textarea first

References

shared/routes/voice.ts — endpoint handler (Phase 62.4, #373)
shared/migrations/051_voice_usage.ts — voice_usage_log quota table
frontend/src/crm/pages/workspace/Composer.jsx — MicButton component
arc-whisper.service on Contabo — whisper.cpp daemon config
API endpoint: POST /api/crm/voice/transcribe