Voice

Everything that controls how ORBIS hears you and how it speaks back. These settings live in Settings → Voice (mic, speech-to-text, text-to-speech); the same options can be set in orbis.yaml for headless setups.

For the why behind the pipeline, see The voice loop.

Microphone

The Voice → Microphone panel manages audio input.

| Control | What it does |
| --- | --- |
| Permission | macOS mic access. If it's not granted, ORBIS shows a button to request it (or to open System Settings if it was previously denied). |
| Input device | Choose which microphone to use. Shown only when the audio mode uses a selectable device; in voice-processing mode ORBIS follows the macOS system input instead. |
| Level meter | A live meter: speak and confirm the bars move. Your single best "is the mic working?" check. |

M-series internal mic

The built-in mic on M-series Macs is quiet without hardware AGC (automatic gain control). ORBIS applies input gain so normal speech registers; if an external mic clips with that gain applied, switch to a quieter input or a different external device.

Speech-to-text (STT)

How spoken audio becomes text. Set the backend, plus per-backend options. Config lives under stt: in orbis.yaml; the STT_BACKEND env var overrides stt.backend.

| Backend | What it is | When to use |
| --- | --- | --- |
| local (Whisper) | In-process Whisper on Apple Silicon (MPS). Segmented per turn: it transcribes after you stop talking, not while you talk. Default. | The private, offline default. |
| parakeet | NVIDIA Parakeet-TDT via Apple MLX (opt-in [parakeet] extra). Faster than Whisper, with far fewer silence hallucinations. ~600 MB model on first use; restart to apply. | Best local quality/speed if you can install the extra. |
| openai | An OpenAI-compatible Whisper endpoint. | You want a hosted transcriber. |
| protoLabs | faster-whisper on the protoLabs gateway (same key as the LLM). | protoLabs-hosted setups. |

Keys (stt:):

| Key | Backend | Meaning |
| --- | --- | --- |
| backend | all | local · parakeet · openai |
| whisper_model | local | Whisper model id (takes effect on restart). |
| model | openai | Remote model id (e.g. whisper-1). |
| url / api_key | openai | Endpoint + key (take effect on the next session). |
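
As an illustration of how these keys fit together, here is a minimal sketch of an stt: block for a hosted, OpenAI-compatible transcriber. The key names come from the table above; the endpoint URL and the API key are placeholders, and whether url expects a base URL or a full path is an assumption to check against your endpoint.

```yaml
# orbis.yaml (sketch): hosted speech-to-text via an OpenAI-compatible endpoint.
stt:
  backend: openai                     # overridden by the STT_BACKEND env var if set
  model: whisper-1                    # remote model id
  url: https://stt.example.com/v1     # placeholder endpoint, not a real service
  api_key: YOUR_API_KEY               # placeholder; takes effect on the next session

  # For the local default, the block would instead look like:
  #   backend: local
  #   whisper_model: <Whisper model id>   # takes effect on restart
```

Because STT_BACKEND overrides stt.backend, a value set in the environment wins over whatever this file says.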

Text-to-speech (TTS)

How ORBIS's replies become the orb's voice. Set the backend and a voice. Config lives under voice: in orbis.yaml; TTS_BACKEND and KOKORO_VOICE override voice.tts_backend and voice.voice.

| Backend | What it is | Voice selection |
| --- | --- | --- |
| kokoro | The default. Runs on CPU, fully local. | A fixed catalogue (e.g. af_heart); pick from the dropdown. |
| openai | OpenAI-compatible /v1/audio/speech. | Type any voice id the endpoint supports. |
| protoLabs (Fish) | Fish S2-Pro via the protoLabs gateway (same key as the LLM). | protolabs/fish. |
| fish | Opt-in local sidecar: cloneable voices, needs a GPU. | Custom. |

Keys (voice:):

| Key | Meaning |
| --- | --- |
| tts_backend | kokoro · openai · fish · gateway. |
| voice | Voice id (Kokoro voice, OpenAI voice, ElevenLabs voice_id, …). |
| tts_url / tts_model / tts_api_key | For OpenAI-compatible endpoints. |
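
A minimal sketch of a voice: block for the default local setup, using only the keys and the example voice id listed above. The commented alternative shows roughly what an OpenAI-compatible setup might look like; its URL, model id, voice id, and key are placeholders rather than values taken from ORBIS.

```yaml
# orbis.yaml (sketch): default local text-to-speech with Kokoro.
voice:
  tts_backend: kokoro    # overridden by the TTS_BACKEND env var if set
  voice: af_heart        # overridden by the KOKORO_VOICE env var if set

  # Hypothetical OpenAI-compatible alternative (all values are placeholders):
  #   tts_backend: openai
  #   voice: <voice id the endpoint supports>
  #   tts_url: https://tts.example.com/v1
  #   tts_model: <model id>
  #   tts_api_key: YOUR_API_KEY
```

Keeping the environment overrides unset makes the file the single source of truth, which can be easier to reason about on headless machines.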

Voice cache

Kokoro voices are cached after first use; the TTS panel shows which voices are already cached so you know which will start instantly.

See also