Voice

Everything that controls how ORBIS hears you and how it speaks back. These settings live in Settings → Voice (mic, speech-to-text, text-to-speech), and the same options can be set in orbis.yaml for headless setups.

For the why behind the pipeline, see The voice loop.

Microphone

The Voice → Microphone panel manages audio input.

Control	What it does
Permission	macOS mic access. If it's not granted, ORBIS shows a button to request it (or to open System Settings if it was previously denied).
Input device	Choose which microphone to use. Shown only when the audio mode uses a selectable device; in voice-processing mode ORBIS follows the macOS system input instead.
Level meter	A live meter — speak and confirm the bars move. Your single best "is the mic working?" check.

M-series internal mic

The built-in mic on M-series Macs is quiet without hardware automatic gain control (AGC). ORBIS applies input gain so normal speech registers. If you use an external mic and it clips, lower its input level or switch to a quieter device.

Speech-to-text (STT)

How spoken audio becomes text. Set the backend, plus per-backend options. Config lives under stt: in orbis.yaml; the STT_BACKEND env var overrides stt.backend.

Backend	What it is	When to use
`local` (Whisper)	In-process Whisper on Apple Silicon (MPS). Segmented per turn: it transcribes after you stop talking, not while you speak. Default.	The private, offline default.
`parakeet`	NVIDIA Parakeet-TDT via Apple MLX (opt-in `[parakeet]` extra). Faster than Whisper, with far fewer silence hallucinations. Downloads a ~600 MB model on first use; restart to apply.	Best local quality/speed if you can install the extra.
`openai`	An OpenAI-compatible Whisper endpoint.	You want a hosted transcriber.
protoLabs	faster-whisper on the protoLabs gateway (same key as the LLM).	protoLabs-hosted setups.

Keys (stt:):

Key	Backend	Meaning
`backend`	all	`local` · `parakeet` · `openai`
`whisper_model`	`local`	Whisper model id (takes effect on restart).
`model`	`openai`	Remote model id (e.g. `whisper-1`).
`url` / `api_key`	`openai`	Endpoint + key (take effect on the next session).
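
As a rough illustration of how these keys might sit together in orbis.yaml, here is a minimal sketch. The key names come from the table above; the placeholder values (the Whisper model id, the endpoint URL, and the API key) are assumptions for illustration only, and the exact URL shape your endpoint expects may differ. The two YAML documents are alternatives, not meant to be combined.

```yaml
# Option A: the local, offline default (in-process Whisper).
stt:
  backend: local
  # Placeholder: substitute the Whisper model id you want. Takes effect on restart.
  whisper_model: "YOUR_WHISPER_MODEL_ID"
---
# Option B: a hosted, OpenAI-compatible Whisper endpoint.
stt:
  backend: openai
  model: whisper-1                    # remote model id (example from the table above)
  url: "https://stt.example.com"      # hypothetical endpoint, replace with your own
  api_key: "YOUR_API_KEY"             # placeholder; takes effect on the next session
```

If the STT_BACKEND env var is set, it takes precedence over the `backend` value shown here.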

Text-to-speech (TTS)

How ORBIS's replies become the orb's voice. Set the backend and a voice. Config lives under voice: in orbis.yaml; TTS_BACKEND and KOKORO_VOICE override voice.tts_backend and voice.voice.

Backend	What it is	Voice selection
`kokoro`	The default. Runs on CPU, fully local.	A fixed catalogue (e.g. `af_heart`) — pick from the dropdown.
`openai`	OpenAI-compatible `/v1/audio/speech`.	Type any voice id the endpoint supports.
protoLabs (Fish)	Fish S2-Pro via the protoLabs gateway (same key as the LLM).	`protolabs/fish`.
`fish`	Opt-in local sidecar — cloneable voices, needs a GPU.	Custom.

Keys (voice:):

Key	Meaning
`tts_backend`	`kokoro` · `openai` · `fish` · gateway.
`voice`	Voice id (Kokoro voice, OpenAI voice, ElevenLabs `voice_id`, …).
`tts_url` / `tts_model` / `tts_api_key`	For OpenAI-compatible endpoints.
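
A comparable sketch for the voice: block, again using only the keys listed above. The voice id `af_heart` is the example from the backend table; the URL, model id, voice id, and key in the second variant are placeholders, not values ORBIS is known to ship with. Treat the two YAML documents as alternatives.

```yaml
# Option A: the default, fully local Kokoro backend.
voice:
  tts_backend: kokoro
  voice: af_heart                     # a Kokoro voice from the fixed catalogue
---
# Option B: an OpenAI-compatible /v1/audio/speech endpoint.
voice:
  tts_backend: openai
  voice: "YOUR_VOICE_ID"              # placeholder: any voice id the endpoint supports
  tts_url: "https://tts.example.com"  # hypothetical endpoint, replace with your own
  tts_model: "YOUR_TTS_MODEL"         # placeholder model id
  tts_api_key: "YOUR_API_KEY"         # placeholder
```

TTS_BACKEND and KOKORO_VOICE, when set, override `tts_backend` and `voice` respectively.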

Voice cache

Kokoro voices are cached after first use; the TTS panel shows which voices are already cached so you know which will start instantly.
