Voice
Voice lets blumi, a local-first, bring-your-own-key (BYOK) AI coding agent, speak its replies (text-to-speech, TTS) and accept spoken input (speech-to-text, STT). It works in the web UI and the blugo phone app; configuration lives in the voice section of settings.json and is editable from the in-app Control Center → Voice.
TL;DR — Voice key facts
- Two directions. Voice covers both TTS (hear blumi's replies aloud) and STT (talk to blumi by voice).
- TTS providers. Text-to-speech runs through ElevenLabs (recommended) or OpenAI (or an OpenAI-compatible endpoint).
- STT provider. Speech-to-text uses an OpenAI-compatible Whisper endpoint.
- Bring your own key (BYOK). Voice needs your own provider API key(s); it is the one feature beyond the LLM that may require a key.
- Keys stay on your machine. TTS is synthesized on the gateway (which holds the key) and streamed to the phone, and keys are write-only over the API — saved but never returned.
- Optional. Voice is entirely optional — everything else in blumi works without it.
To hear blumi speak, enable text-to-speech (TTS) and pick a provider. Two providers are supported: ElevenLabs (recommended) and OpenAI. To set up ElevenLabs:
- Control Center → Voice → enable, pick provider elevenlabs.
- Paste your ElevenLabs API key.
- Tap "Authenticate & load voices" — this validates the key and fills a dropdown of your account's voices. Pick one.
- Save. Tap the 🔊 on any assistant message to hear it.
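The "Authenticate & load voices" step can be pictured with a short sketch. It assumes the public ElevenLabs voices-list endpoint (GET /v1/voices, authenticated with an xi-api-key header) and a response shaped as a "voices" array of objects with "voice_id" and "name". The function names are hypothetical, and this is not necessarily how blumi implements the button.

```python
import json
import urllib.request

# Assumption: the public ElevenLabs voices-list endpoint.
ELEVENLABS_VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def voices_to_options(payload: dict) -> list[tuple[str, str]]:
    """Turn a voices-list response into (voice_id, name) dropdown options."""
    options = []
    for voice in payload.get("voices", []):
        voice_id = voice.get("voice_id")
        if voice_id:  # skip malformed entries rather than fail the whole list
            options.append((voice_id, voice.get("name", voice_id)))
    # Sort by display name so the dropdown order is stable between reloads.
    return sorted(options, key=lambda opt: opt[1].lower())


def load_voices(api_key: str) -> list[tuple[str, str]]:
    """Fetch the account's voices; a rejected key raises urllib.error.HTTPError."""
    request = urllib.request.Request(
        ELEVENLABS_VOICES_URL, headers={"xi-api-key": api_key}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return voices_to_options(json.load(response))
```

A successful call doubles as key validation: a rejected key surfaces as an HTTP error before any voice is listed, which matches the "validates the key and fills a dropdown" behavior described above.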
To set up OpenAI instead, pick provider openai, paste a TTS API key, and set a voice (e.g. alloy). Save.
Equivalent settings.json for the ElevenLabs setup:
"voice": { "enabled": true, "tts_provider": "elevenlabs", "tts_api_key": "...", "tts_voice": "<voice_id>", "tts_model": "eleven_multilingual_v2" }
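If you prefer to script the change rather than use the Control Center, the same fields can be merged into settings.json programmatically. The sketch below is a hypothetical helper, not part of blumi. It assumes the file is plain JSON, that "elevenlabs" and "openai" are the only accepted tts_provider values, and that you pass the path to your own settings.json.

```python
import json
from pathlib import Path
from typing import Optional


def set_voice_tts(settings: dict, provider: str, api_key: str, voice: str,
                  model: Optional[str] = None) -> dict:
    """Return a copy of settings with the TTS fields of "voice" updated."""
    # Assumption: these are the only two provider identifiers.
    if provider not in ("elevenlabs", "openai"):
        raise ValueError(f"unsupported tts_provider: {provider!r}")
    voice_section = dict(settings.get("voice", {}))  # keep existing STT fields
    voice_section.update(
        enabled=True, tts_provider=provider, tts_api_key=api_key, tts_voice=voice
    )
    if model is not None:
        voice_section["tts_model"] = model
    return {**settings, "voice": voice_section}


def update_settings_file(path: Path, **tts) -> None:
    """Read settings.json, apply the TTS fields, and write it back."""
    settings = json.loads(path.read_text(encoding="utf-8"))
    updated = set_voice_tts(settings, **tts)
    path.write_text(json.dumps(updated, indent=2), encoding="utf-8")
```

Merging into the existing voice section, rather than replacing it, keeps any speech-to-text fields that are already configured.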
To talk to blumi, use speech-to-text (STT), which transcribes your microphone via an OpenAI-compatible Whisper endpoint. In Control Center → Voice, set the Mic key; the app fills in the Whisper endpoint and model. Then tap the 🎤 in the composer and speak; the transcript is dropped into the message box.
Equivalent settings.json:
"voice": { "voice_api_key": "sk-...", "stt_base_url": "https://api.openai.com/v1", "stt_model": "whisper-1" }
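To make "OpenAI-compatible Whisper endpoint" concrete, here is a rough sketch of how the three STT settings might map onto a transcription request. It assumes the OpenAI-style route POST {stt_base_url}/audio/transcriptions with a Bearer token and a multipart form carrying the audio under "file" plus a "model" field. The helper name is hypothetical, the fallback defaults simply mirror the values shown above, and blumi's own request code may differ.

```python
def build_transcription_request(voice: dict) -> dict:
    """Describe the HTTP request for an OpenAI-compatible transcription call."""
    # Assumption: fall back to the endpoint and model shown in the settings example.
    base_url = voice.get("stt_base_url", "https://api.openai.com/v1").rstrip("/")
    return {
        "method": "POST",
        "url": f"{base_url}/audio/transcriptions",
        "headers": {"Authorization": f"Bearer {voice['voice_api_key']}"},
        # Sent as multipart/form-data alongside the audio in the "file" field.
        "form": {"model": voice.get("stt_model", "whisper-1")},
    }
```

Pointing stt_base_url at another server that exposes the same route is what makes a self-hosted or third-party Whisper deployment usable here.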
- Keys are write-only over the API: the app shows saved ✓ but never returns the stored key. To change a voice later, re-enter the key to re-authenticate and reload the dropdown.
- TTS is synthesized on the gateway (which holds the key) and streamed to the phone, so the key stays on your machine.
- Voice is optional — everything else works without it.
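The write-only behavior in the notes above can be illustrated with a small sketch: when the gateway reports voice settings back to a client, it would replace each stored key with a flag saying only whether one is saved. The function and the "_saved" field names are hypothetical and chosen for illustration; the actual API response shape is not documented here.

```python
# Assumption: these are the two secret fields in the voice section.
SECRET_FIELDS = ("tts_api_key", "voice_api_key")


def public_voice_settings(voice: dict) -> dict:
    """Return the voice section as a client would see it, with keys withheld."""
    public = {k: v for k, v in voice.items() if k not in SECRET_FIELDS}
    for field in SECRET_FIELDS:
        # Report only whether a key is stored, never its value.
        public[f"{field}_saved"] = bool(voice.get(field))
    return public
```

A client can render the saved flag as "saved ✓", but it has nothing to prefill the key field with, which is why changing a voice later means re-entering the key.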
blumi uses ElevenLabs or OpenAI (or an OpenAI-compatible endpoint) for text-to-speech (TTS), and an OpenAI-compatible Whisper endpoint for speech-to-text (STT). ElevenLabs is the recommended TTS provider because the app can validate your key and load a dropdown of your account's voices.
Voice needs your own API key. It is bring-your-own-key (BYOK): you supply your own provider API key for TTS, STT, or both. It is the one feature beyond the LLM that may require a key; blumi's memory and code search do not.
Keys stay on your machine. TTS is synthesized on the gateway (which holds the key) and streamed to the phone, so the key never leaves your machine. Keys are also write-only over the API — the app shows saved ✓ but never returns the stored key, so to change a voice later you re-enter the key to re-authenticate and reload the voice dropdown.
Voice works in both the web UI and the blugo phone app. You configure it from the in-app Control Center → Voice, or directly in the voice section of settings.json.
Voice is not required. It is entirely optional, and everything else in blumi works without it. Enable it only when you want to hear replies (TTS) or speak your input (STT).