Voice Input
Status: Live since Phase 62 (2026-06-05). The L2 layer runs self-hosted whisper.cpp on Contabo: no OpenAI dependency, and L2 audio never leaves your server.
Arc OS supports two layers of voice input in the chat composer:
| Layer | Requires | Works in |
|---|---|---|
| L1 — Web Speech API | Chrome or Edge (Chromium) | All Arc plans |
| L2 — Self-hosted whisper | Any browser (WebM/WAV MediaRecorder) | Arc Standard Cloud, On-premise |
Both layers feed into the same composer text box and are transparent to the AI worker.
Using Voice in Chat
- Open any project workspace and focus the chat composer.
- Click the microphone button (bottom-left of the composer) or press Ctrl+Shift+V (works on any keyboard layout — the shortcut matches the physical V key, not the layout character).
- Speak. A waveform animation and an orange border indicate that recording is active.
- Click the mic button again (or press the shortcut) to stop.
- The transcribed text appears in the composer — edit if needed, then Send.
Continuous mode: L1 keeps listening until you stop; L2 records a single clip and sends it for transcription.
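The layout-independent shortcut described in the steps above can be matched on `KeyboardEvent.code`, which reports the physical key, rather than `KeyboardEvent.key`, which reports the character the current layout produces. The following is a minimal sketch, not the actual Composer.jsx code; `isVoiceShortcut` and the `KeyLike` type are hypothetical names introduced for illustration.

```typescript
// Minimal structural type so the sketch runs outside a browser;
// a real KeyboardEvent satisfies it.
interface KeyLike {
  code: string;      // physical key position, e.g. "KeyV", independent of layout
  key: string;       // layout-dependent character, listed only for contrast
  ctrlKey: boolean;
  shiftKey: boolean;
}

// Hypothetical helper: true when the event is Ctrl+Shift+V by physical key position.
function isVoiceShortcut(e: KeyLike): boolean {
  return e.ctrlKey && e.shiftKey && e.code === "KeyV";
}
```

In a keydown handler on the composer textarea, a match would typically call `preventDefault()` and toggle recording. On a Ukrainian layout the same physical key produces "М", which is why a comparison on `key` would miss it.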
L1 — Web Speech API (Chrome/Edge)
Uses the browser's built-in speech recognition API. Works instantly with no server round-trip.
Locale: Automatically set from your Arc language preference. Supported BCP-47 codes: uk-UA, en-US, de-DE, es-ES, fr-FR, pl-PL, ru-RU, pt-BR.
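As an illustration of the locale mapping, the sketch below resolves a language preference to one of the supported codes. The format of the Arc language preference is not documented here, so the sketch assumes it is either a bare language code ("uk") or a full tag ("pt-BR"); the `en-US` fallback and the name `resolveSpeechLocale` are also assumptions.

```typescript
// BCP-47 codes listed in this section.
const SUPPORTED_LOCALES = [
  "uk-UA", "en-US", "de-DE", "es-ES", "fr-FR", "pl-PL", "ru-RU", "pt-BR",
] as const;
type SpeechLocale = (typeof SUPPORTED_LOCALES)[number];

// Assumption: use en-US when the preference matches nothing in the list.
const FALLBACK_LOCALE: SpeechLocale = "en-US";

// Hypothetical helper: map a stored language preference to a supported locale.
function resolveSpeechLocale(preference: string): SpeechLocale {
  const wanted = preference.trim().toLowerCase();
  // 1. Exact, case-insensitive match ("pt-br" -> "pt-BR").
  const exact = SUPPORTED_LOCALES.find((l) => l.toLowerCase() === wanted);
  if (exact) return exact;
  // 2. Match on the language subtag only ("uk" -> "uk-UA", "en-GB" -> "en-US").
  const lang = wanted.split("-")[0];
  const byLang = SUPPORTED_LOCALES.find((l) => l.toLowerCase().split("-")[0] === lang);
  return byLang ?? FALLBACK_LOCALE;
}
```

The result would be assigned to the `lang` property of the browser's `SpeechRecognition` instance before recognition starts.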
Permissions: Browser prompts for microphone access on first use. If you accidentally blocked it, go to Site settings → Microphone → Allow for arc-os.co. The Permissions-Policy header on arc-os.co uses microphone=(self) — same-origin access is allowed while cross-origin embeds are locked out.
Limitations:
- Requires Chrome or Edge (Chromium-based). Firefox and Safari fall through to L2 automatically.
- Transcription quality depends on Google's servers (called by the browser, not by Arc OS).
L2 — Self-hosted whisper.cpp
Arc OS runs a persistent arc-whisper.service on Contabo with the ggml-base model (142 MB) preloaded. When the browser doesn't support the Web Speech API, the composer falls back to this layer automatically:
- Browser records a WebM clip via MediaRecorder.
- On stop, the clip is POST-ed to /api/crm/voice/transcribe (multipart, max 25 MB).
- The server forwards it to the local whisper-server at 127.0.0.1:19214, so audio bytes never leave Contabo.
- Transcribed text returns and is inserted into the composer.
Latency: ~3.4 s for an 11-second clip (model warm, 3.1× realtime on the current 6-vCPU EPYC box).
Concurrency: The whisper-server has a 2-slot semaphore. If both slots are busy, the API returns 429 ("server busy") and the UI shows a toast — try again in a few seconds.
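The 2-slot behaviour can be pictured as a non-blocking counting semaphore: a request that finds no free slot is rejected with 429 rather than queued. This is a sketch under that reading, not the code in shared/routes/voice.ts; `SlotSemaphore`, `handleTranscribe`, and the response shape are hypothetical, and `transcribe` stands in for the call to the local whisper-server.

```typescript
// Non-blocking counting semaphore: callers that find no free slot are
// rejected immediately instead of waiting.
class SlotSemaphore {
  private inUse = 0;
  private readonly slots: number;

  constructor(slots: number) {
    this.slots = slots;
  }

  tryAcquire(): boolean {
    if (this.inUse >= this.slots) return false;
    this.inUse += 1;
    return true;
  }

  release(): void {
    if (this.inUse > 0) this.inUse -= 1;
  }
}

// Assumed response shape, for illustration only.
interface TranscribeResult {
  status: number;
  body: { text: string } | { error: string };
}

const whisperSlots = new SlotSemaphore(2);

async function handleTranscribe(
  clip: Uint8Array,
  transcribe: (clip: Uint8Array) => Promise<string>,
): Promise<TranscribeResult> {
  if (!whisperSlots.tryAcquire()) {
    return { status: 429, body: { error: "server busy" } };
  }
  try {
    return { status: 200, body: { text: await transcribe(clip) } };
  } finally {
    // Free the slot even if transcription throws.
    whisperSlots.release();
  }
}
```

Rejecting instead of queueing keeps latency predictable for the requests that are admitted, at the cost of pushing retries to the client.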
Daily Quota
Each user has a soft 60 min/day limit across L2 transcriptions. The server estimates clip duration from upload size (~32 kbps voice codec assumption, ±30% accuracy). When the limit is reached:
- The API returns 429 with { "error": "Daily voice quota reached (60 min/day)", "used": <seconds>, "cap": 3600 }
- The mic button is disabled for the rest of the day
- Quota resets at midnight UTC
L1 (Web Speech API) is not subject to this limit — it uses the browser's own service.
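The size-based duration estimate used for the L2 quota comes down to simple arithmetic. The sketch below assumes "32 kbps" means 32,000 bits per second and that usage is tracked in seconds, as the `used` and `cap` fields suggest; the function names are hypothetical, and the server's actual bookkeeping may differ.

```typescript
const ASSUMED_BITRATE_BPS = 32000; // ~32 kbps voice codec assumption from the text
const DAILY_CAP_SECONDS = 3600;    // 60 min/day, matches "cap": 3600

// Estimated clip duration in seconds, derived only from the upload size in bytes.
function estimateClipSeconds(uploadBytes: number): number {
  return (uploadBytes * 8) / ASSUMED_BITRATE_BPS;
}

// Hypothetical helper: seconds of quota left after the usage recorded so far.
function remainingQuotaSeconds(usedSeconds: number): number {
  return Math.max(0, DAILY_CAP_SECONDS - usedSeconds);
}

// Hypothetical helper: true once recorded usage is at or over the cap.
function quotaReached(usedSeconds: number): boolean {
  return usedSeconds >= DAILY_CAP_SECONDS;
}
```

Under these assumptions a 44,000-byte upload counts as 11 seconds, and the full 60-minute cap corresponds to roughly 14.4 MB of uploaded audio per day. Because real bitrates vary, the stated ±30% accuracy means the recorded usage can drift from the true speaking time by a similar margin.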
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| Mic button grayed out | Daily quota reached | Wait until midnight UTC |
| "Microphone access blocked" | Browser blocked mic | Site settings → Microphone → Allow for arc-os.co |
| L1 not working in Firefox | Firefox doesn't support Web Speech API | Falls back to L2 automatically |
| 429 "server busy" | Both transcription slots are in use | Retry after ~5 s |
| Transcript quality poor | Background noise / L2 base model | Speak clearly; Pro transcription (larger model) is a future phase |
| Ctrl+Shift+V does nothing | Focus not in composer | Click inside the composer textarea first |
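Two rows in the table above correspond to the same HTTP status, 429, so a client has to look at the response body to tell them apart. The sketch below shows one way to do that. It assumes the "server busy" response does not carry the numeric `used` and `cap` fields that the quota response does; the body of the busy response is not documented here, and `classifyVoiceError` is a hypothetical name.

```typescript
type VoiceError =
  | { kind: "quota"; usedSeconds: number; capSeconds: number }
  | { kind: "busy" }
  | { kind: "other"; status: number };

// `body` is the parsed JSON of a non-2xx response from the transcribe endpoint.
function classifyVoiceError(status: number, body: unknown): VoiceError {
  if (status === 429 && typeof body === "object" && body !== null) {
    const b = body as { used?: unknown; cap?: unknown };
    // The documented quota response carries numeric `used` and `cap` fields.
    if (typeof b.used === "number" && typeof b.cap === "number") {
      return { kind: "quota", usedSeconds: b.used, capSeconds: b.cap };
    }
  }
  // Assumption: any other 429 means both whisper slots were busy.
  if (status === 429) return { kind: "busy" };
  return { kind: "other", status };
}
```

A UI built on this could disable the mic button until midnight UTC for `quota`, and show a toast with a retry after about 5 seconds for `busy`.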
References
- shared/routes/voice.ts: endpoint handler (Phase 62.4, #373)
- shared/migrations/051_voice_usage.ts: voice_usage_log quota table
- frontend/src/crm/pages/workspace/Composer.jsx: MicButton component
- arc-whisper.service on Contabo: whisper.cpp daemon config
- API endpoint: POST /api/crm/voice/transcribe