Voice Input

Status: Live since Phase 62 (2026-06-05). Self-hosted whisper.cpp on Contabo — no OpenAI dependency, no audio leaves your server.

Arc OS supports two layers of voice input in the chat composer:

Layer                   | Requires                             | Works in
L1: Web Speech API      | Chrome or Edge (Chromium)            | All Arc plans
L2: Self-hosted whisper | Any browser (WebM/WAV MediaRecorder) | Arc Standard Cloud, On-premise

Both layers feed into the same composer text box and are transparent to the AI worker.
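
As a rough illustration of how a composer might choose between the two layers, the sketch below probes for the Web Speech API first and falls back to MediaRecorder. The names `VoiceEnv` and `pickVoiceLayer` are hypothetical and not taken from the Arc OS codebase; the plan check (L2 is limited to Arc Standard Cloud and On-premise) is left out for brevity.

```typescript
// Which input layer the composer would use in a given browser environment.
type VoiceLayer = "L1" | "L2" | "none";

// Minimal view of the browser globals being probed (hypothetical type).
// In a real page you would pass `window` here.
interface VoiceEnv {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown; // Chromium exposes the prefixed constructor
  MediaRecorder?: unknown;
}

function pickVoiceLayer(env: VoiceEnv): VoiceLayer {
  // L1: the browser ships its own speech recognition.
  if (env.SpeechRecognition !== undefined || env.webkitSpeechRecognition !== undefined) {
    return "L1";
  }
  // L2: the browser can at least record a clip for server-side transcription.
  if (env.MediaRecorder !== undefined) {
    return "L2";
  }
  return "none";
}
```

Passing an explicit environment object rather than reading `window` directly keeps the decision easy to exercise outside a browser.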


Using Voice in Chat

  1. Open any project workspace and focus the chat composer.
  2. Click the microphone button (bottom-left of the composer) or press Ctrl+Shift+V (works on any keyboard layout — the shortcut matches the physical V key, not the layout character).
  3. Speak. A waveform animation and an orange border indicate that recording is active.
  4. Click the mic button again (or press the shortcut) to stop.
  5. The transcribed text appears in the composer — edit if needed, then Send.

Continuous mode: L1 keeps listening until you stop; L2 records a single clip and sends it for transcription.
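
The layout-independent shortcut from step 2 can be approximated by comparing `KeyboardEvent.code` (the physical key) rather than `KeyboardEvent.key` (the character the layout produces). This is a minimal sketch; `KeyLike` and `isVoiceShortcut` are illustrative names, and the real handler may differ.

```typescript
// The subset of KeyboardEvent fields the check needs (hypothetical type).
interface KeyLike {
  code: string;      // physical key, e.g. "KeyV" on any layout
  ctrlKey: boolean;
  shiftKey: boolean;
  altKey?: boolean;
  metaKey?: boolean;
}

function isVoiceShortcut(e: KeyLike): boolean {
  // Match Ctrl+Shift plus the physical V key; ignore combos with Alt or Meta.
  return e.ctrlKey && e.shiftKey && !e.altKey && !e.metaKey && e.code === "KeyV";
}

// In the browser this would typically be wired up as:
//   composer.addEventListener("keydown", (e) => {
//     if (isVoiceShortcut(e)) { e.preventDefault(); toggleRecording(); }
//   });
// where `composer` and `toggleRecording` are placeholders for the page's own code.
```

Because `code` is `"KeyV"` regardless of the active layout, the same check works on a Ukrainian or German keyboard where the key produces a different character.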


L1 — Web Speech API (Chrome/Edge)

Uses the browser's built-in speech recognition API. Works instantly with no server round-trip.

Locale: Automatically set from your Arc language preference. Supported BCP-47 codes: uk-UA, en-US, de-DE, es-ES, fr-FR, pl-PL, ru-RU, pt-BR.
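
One plausible way to derive the recognition locale is a lookup from the language preference to the supported BCP-47 codes listed above. The sketch assumes the preference is stored as a short language tag such as `de` or `pt-PT`, and that unsupported languages fall back to `en-US`; both are assumptions, as are the names `SPEECH_LOCALES` and `toSpeechLocale`.

```typescript
// Supported locales from the list above, keyed by primary language subtag.
const SPEECH_LOCALES: Record<string, string> = {
  uk: "uk-UA",
  en: "en-US",
  de: "de-DE",
  es: "es-ES",
  fr: "fr-FR",
  pl: "pl-PL",
  ru: "ru-RU",
  pt: "pt-BR",
};

function toSpeechLocale(pref: string, fallback: string = "en-US"): string {
  // Keep only the primary subtag: "pt-PT" -> "pt", "de_AT" -> "de".
  const [primary = ""] = pref.trim().toLowerCase().split(/[-_]/);
  // The en-US fallback is an assumption, not documented behavior.
  return SPEECH_LOCALES[primary] ?? fallback;
}
```

The result would then be assigned to the `lang` property of the browser's `SpeechRecognition` instance before recognition starts.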

Permissions: The browser prompts for microphone access on first use. If you accidentally blocked it, go to Site settings → Microphone → Allow for arc-os.co. The Permissions-Policy header on arc-os.co uses microphone=(self), so same-origin access is allowed while cross-origin embeds are locked out.

Limitations:


L2 — Self-hosted whisper.cpp

Arc OS runs a persistent arc-whisper.service on Contabo with the ggml-base model (142 MB) preloaded. When the browser doesn't support the Web Speech API, the composer falls back to this layer automatically:

  1. Browser records a WebM clip via MediaRecorder.
  2. On stop, the clip is POST-ed to /api/crm/voice/transcribe (multipart, max 25 MB).
  3. The server forwards it to the local whisper-server at 127.0.0.1:19214 — audio bytes never leave Contabo.
  4. Transcribed text returns and is inserted into the composer.
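
The client side of steps 2 and 4 might look roughly like the sketch below. Only the endpoint path, the multipart encoding, the 25 MB cap, and the 429 status come from this page; the form field name `audio`, the JSON response shape `{ text }`, and the reading of "25 MB" as 25 × 1024 × 1024 bytes are assumptions made for illustration.

```typescript
const TRANSCRIBE_URL = "/api/crm/voice/transcribe";
// Assumes binary megabytes; the server's exact cutoff may differ.
const MAX_CLIP_BYTES = 25 * 1024 * 1024;

// Reject empty or oversized clips before spending an upload on them.
function clipFitsLimit(sizeBytes: number): boolean {
  return sizeBytes > 0 && sizeBytes <= MAX_CLIP_BYTES;
}

async function transcribeClip(clip: Blob): Promise<string> {
  if (!clipFitsLimit(clip.size)) {
    throw new Error("clip is empty or larger than 25 MB");
  }
  const form = new FormData();
  form.append("audio", clip, "clip.webm"); // field name is hypothetical

  const res = await fetch(TRANSCRIBE_URL, { method: "POST", body: form });
  if (res.status === 429) {
    throw new Error("server busy, retry in a few seconds");
  }
  if (!res.ok) {
    throw new Error(`transcription failed: HTTP ${res.status}`);
  }
  // Response shape is hypothetical.
  const body = (await res.json()) as { text?: string };
  return body.text ?? "";
}
```

Checking the size on the client avoids uploading a clip the server would reject anyway.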

Latency: ~3.4 s for an 11-second clip (model warm, 3.1× realtime on the current 6-vCPU EPYC box).

Concurrency: The whisper-server has a 2-slot semaphore. If both slots are busy, the API returns 429 ("server busy") and the UI shows a toast — try again in a few seconds.
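
The two-slot behavior can be pictured as a non-blocking semaphore: a request either takes a free slot immediately or is turned away with 429. This is a simplified model, not the actual server code; `SlotSemaphore` and `handleTranscribe` are hypothetical names.

```typescript
// A counting semaphore that never waits: callers get a slot or a refusal.
class SlotSemaphore {
  private inUse = 0;
  private readonly slots: number;

  constructor(slots: number) {
    this.slots = slots;
  }

  tryAcquire(): boolean {
    if (this.inUse >= this.slots) return false;
    this.inUse += 1;
    return true;
  }

  release(): void {
    if (this.inUse > 0) this.inUse -= 1;
  }
}

const whisperSlots = new SlotSemaphore(2);

// Wraps one transcription job; `run` stands in for the call to whisper-server.
async function handleTranscribe(
  run: () => Promise<string>,
): Promise<{ status: number; body: string }> {
  if (!whisperSlots.tryAcquire()) {
    return { status: 429, body: "server busy" };
  }
  try {
    return { status: 200, body: await run() };
  } finally {
    whisperSlots.release(); // free the slot even if transcription throws
  }
}
```

Refusing immediately instead of queueing keeps latency predictable for the requests that do get a slot, at the cost of an occasional retry.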


Daily Quota

Each user has a soft limit of 60 minutes per day across L2 transcriptions. The server estimates clip duration from upload size (assuming a ~32 kbps voice codec, ±30% accuracy). When the limit is reached, the mic button is grayed out until midnight UTC (see Troubleshooting).

L1 (Web Speech API) is not subject to this limit — it uses the browser's own service.
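
The size-based duration estimate is simple arithmetic: bytes × 8 gives bits, and dividing by the assumed 32 kbps bitrate gives seconds. The sketch below works through it; the function names are illustrative, and the actual server accounting may differ in detail.

```typescript
const ASSUMED_BITRATE_BPS = 32000;   // ~32 kbps voice codec assumption
const DAILY_LIMIT_SECONDS = 60 * 60; // 60 min/day

// Estimated clip length in seconds, derived only from the upload size.
function estimateClipSeconds(uploadBytes: number): number {
  return (uploadBytes * 8) / ASSUMED_BITRATE_BPS;
}

// True if accepting this upload would push the user past the daily limit.
function wouldExceedQuota(usedSecondsToday: number, uploadBytes: number): boolean {
  return usedSecondsToday + estimateClipSeconds(uploadBytes) > DAILY_LIMIT_SECONDS;
}
```

For example, a 44,000-byte upload is counted as 44,000 × 8 / 32,000 = 11 seconds. Given the stated ±30% accuracy, the real clip could be anywhere from roughly 8 to 14 seconds long.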


Troubleshooting

Symptom                     | Likely cause                           | Fix
Mic button grayed out       | Daily quota reached                    | Wait until midnight UTC
"Microphone access blocked" | Browser blocked mic                    | Site settings → Microphone → Allow for arc-os.co
L1 not working in Firefox   | Firefox doesn't support Web Speech API | Falls back to L2 automatically
429 "server busy"           | Two other users transcribing           | Retry after ~5 s
Transcript quality poor     | Background noise / L2 base model       | Speak clearly; Pro transcription (larger model) is a future phase
Ctrl+Shift+V does nothing   | Focus not in composer                  | Click inside the composer textarea first
