Local Ubuntu-first text-to-speech daemon with a stable REST API, model lifecycle management, and systemd --user operation.
- Default engine is `mock` for local development and API validation.
- Real `sherpa-onnx` Go runtime is integrated; use `config/config.sherpa.yaml` to run with native inference.
- KO/ZH/JA/EN runtime uses installed models from local manifest plus remote list from sherpa release.
Runtime dependencies for ./bin/tts on Ubuntu:
- `libc6`
- `libstdc++6`
- `libgcc-s1`
- `ca-certificates` (for downloading remote models)
Optional runtime dependencies:
- `alsa-utils` (for local speaker playback via `aplay`)
Build-time dependencies (if building from source):
- `golang` (Go toolchain)
- `gcc` (required for cgo link step with sherpa runtime)
- `task` (Taskfile runner, only if using `build.sh` / Taskfile workflows)
Ubuntu install example:
- `sudo apt update`
- `sudo apt install -y libc6 libstdc++6 libgcc-s1 ca-certificates alsa-utils golang gcc`
- Sherpa ONNX TTS docs:
- Sherpa ONNX TTS pretrained models:
- Sherpa ONNX full TTS catalog:
- Sherpa ONNX Go bindings:
Pinned model artifact URLs used in this repo:
- Korean (Mimic3 VITS):
- Chinese (Piper):
- Japanese (Kokoro int8 multi-lang):
- English (Kitten):
linux-tts-onnx currently builds with dynamic linking (not fully static).
Sherpa runtime shared libraries are provided by the Go module at:
$(go env GOPATH)/pkg/mod/github.com/k2-fsa/sherpa-onnx-go-linux@<version>/lib/x86_64-unknown-linux-gnu/
Expected files:
- `libsherpa-onnx-c-api.so`
- `libonnxruntime.so`
Quick checks:
- list libs: `ls "$(go env GOPATH)"/pkg/mod/github.com/k2-fsa/sherpa-onnx-go-linux@*/lib/x86_64-unknown-linux-gnu`
- inspect linkage: `ldd ./bin/tts`
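The same check can be scripted. The following is a minimal sketch, assuming only the two file names listed above; the helper name `missingLibs` and the command-line handling are hypothetical and not part of this repo.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// expectedLibs lists the shared libraries named in the "Expected files" list.
var expectedLibs = []string{"libsherpa-onnx-c-api.so", "libonnxruntime.so"}

// missingLibs returns the expected library names that cannot be found in dir.
// (Hypothetical helper for illustration.)
func missingLibs(dir string) []string {
	var missing []string
	for _, name := range expectedLibs {
		if _, err := os.Stat(filepath.Join(dir, name)); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Pass the module lib directory as the first argument; defaults to ".".
	dir := "."
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	if m := missingLibs(dir); len(m) > 0 {
		fmt.Println("missing:", m)
		return
	}
	fmt.Println("all expected libraries found in", dir)
}
```

Pointing it at the `lib/x86_64-unknown-linux-gnu` directory of the installed module version reports which of the two libraries, if any, are absent.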
Downloaded models are stored under:
~/.local/share/tts-onnx/models
Current language model paths:
- English: `~/.local/share/tts-onnx/models/en/<version>/<model-id>/`
- Korean: `~/.local/share/tts-onnx/models/ko/<version>/`
- Chinese: `~/.local/share/tts-onnx/models/zh/<version>/`
- Japanese: `~/.local/share/tts-onnx/models/ja/<version>/<model-id>/`
Voice and model asset files are inside each model directory, e.g.:
- `voices.bin`
- `tokens.txt`
- `model.onnx` or `model.fp16.onnx`
- `espeak-ng-data/`
Installed model state is tracked in:
~/.local/share/tts-onnx/models/manifest.json
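To make the layout concrete, here is a small sketch that builds these paths. It assumes only the directory structure listed above; the function names, the `/home/user` home directory, and the `v1` / `example-model` values are placeholders, not names from this repo.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// modelsRoot returns the model store under the given home directory.
func modelsRoot(home string) string {
	return filepath.Join(home, ".local", "share", "tts-onnx", "models")
}

// modelDir builds <root>/<lang>/<version>[/<model-id>].
// modelID may be empty for layouts without a model-id level
// (Korean and Chinese in the list above).
func modelDir(home, lang, version, modelID string) string {
	dir := filepath.Join(modelsRoot(home), lang, version)
	if modelID != "" {
		dir = filepath.Join(dir, modelID)
	}
	return dir
}

// manifestPath returns the location of manifest.json.
func manifestPath(home string) string {
	return filepath.Join(modelsRoot(home), "manifest.json")
}

func main() {
	home := "/home/user" // placeholder; os.UserHomeDir() would be used in practice
	fmt.Println(modelDir(home, "en", "v1", "example-model"))
	fmt.Println(modelDir(home, "ko", "v1", ""))
	fmt.Println(manifestPath(home))
}
```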
Quick checks:
- Installed models from service: `curl -fsS http://127.0.0.1:18741/v1/models`
- Build and run:
  - `go build -o ./bin/tts ./cmd/tts`
  - `./bin/tts --service --config ./config/config.example.yaml`
- Check health: `curl http://127.0.0.1:18741/v1/health`
- Speak test: `curl -X POST http://127.0.0.1:18741/v1/speak -H 'content-type: application/json' -d '{"text":"hello world","lang":"en","format":"wav"}' --output out.wav`
- Speaker playback (service):
  - In `config/config.sherpa.yaml`, `play_on_speak: true` plays audio on the host speaker immediately when `/v1/speak` is called.
- In
Service mode (same tts binary):
- `--config` (default: `./config/config.sherpa.yaml`): config file path.
- `--service`: run HTTP daemon mode.
Direct CLI (cmd/tts/main.go):
- All flags use double-dash form (example: `--voice-list`).
- Running `./bin/tts` with no arguments prints help automatically.
- `--lang` (optional for synthesis): explicit language bucket for model selection; if omitted, `tts` infers from `--voice` selector or falls back to first installed model
- `--voice` (optional): installed model selector (`id`/`version`) or numeric speaker id (`sid`)
- `--format` (default: `wav`): `wav|pcm_s16le`
- `--out` (optional): output path; file is written only when this is set
- `--config` (default: `./config/config.sherpa.yaml`)
- `--rate` (default: `1.0`)
- `--sample-rate` (default: `0`, optional override)
- `--request-id` (optional): correlation/cancel id
- `--no-play` (default: `false`): disable immediate speaker playback
- `--voice-list`: list installed models and voices (all languages by default, or one language with `--lang`); when voice names are unavailable, it shows numeric sid range
- `--remote-models`: list online TTS model packages from sherpa-onnx release
- `--install-remote-id`: download+extract model from remote list (language inferred from remote model ID)
- `--menu`: interactive language/model/voice selector
- `--auto-install` (default: `true`): used with `--menu` when selected model is not installed
- positional `text...`: synthesis input text
Examples:
- `./bin/tts "Sentence test without explicit language"`
- `./bin/tts --voice kitten-nano-en-v0_1-fp16 "Sentence test"`
- `./bin/tts --voice-list --lang en`
- `./bin/tts --remote-models --lang en`
- `./bin/tts --install-remote-id kitten-nano-en-v0_1-fp16`
- `./bin/tts --menu`
If you want to run tts without ./bin/ prefix:
- `sudo ln -sf "$(pwd)/bin/tts" /usr/local/bin/tts`
- or add `./bin` to your `PATH`
Model selection behavior:
- If `--voice` matches an installed model `id` or `version`, that model is used for synthesis and language is inferred from that model.
- Otherwise, `--voice` is treated as numeric speaker id (`sid`) for the selected model.
- If no model is specified, `tts` uses the first installed model for selected language; when `--lang` is omitted, it uses the first installed model across all languages.
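The three rules above can be sketched as a single resolver. This is an illustration only, not the code in `cmd/tts/main.go`: the `Model` and `Selection` types, the function name, and the error returned for a non-numeric `--voice` are all assumptions.

```go
package main

import (
	"fmt"
	"strconv"
)

// Model is a hypothetical view of one installed model entry.
type Model struct {
	ID, Version, Lang string
}

// Selection is the outcome of resolving --lang and --voice.
type Selection struct {
	Model Model
	SID   int // numeric speaker id; 0 when --voice named a model or was empty
}

// selectModel applies the three selection rules in order.
func selectModel(installed []Model, lang, voice string) (Selection, error) {
	// Rule 1: --voice matching an installed id or version picks that model,
	// and the language comes from the model itself.
	if voice != "" {
		for _, m := range installed {
			if m.ID == voice || m.Version == voice {
				return Selection{Model: m}, nil
			}
		}
	}
	// Rule 3: first installed model for lang, or across all languages.
	var chosen *Model
	for i := range installed {
		if lang == "" || installed[i].Lang == lang {
			chosen = &installed[i]
			break
		}
	}
	if chosen == nil {
		return Selection{}, fmt.Errorf("no installed model for lang %q", lang)
	}
	sel := Selection{Model: *chosen}
	// Rule 2: otherwise --voice is a numeric speaker id for that model.
	if voice != "" {
		sid, err := strconv.Atoi(voice)
		if err != nil {
			return Selection{}, fmt.Errorf("voice %q is neither a model nor a sid", voice)
		}
		sel.SID = sid
	}
	return sel, nil
}

func main() {
	installed := []Model{ // placeholder entries
		{ID: "model-a", Version: "v1", Lang: "ko"},
		{ID: "model-b", Version: "v2", Lang: "en"},
	}
	for _, c := range []struct{ lang, voice string }{
		{"", "model-b"}, {"en", "3"}, {"", ""},
	} {
		sel, err := selectModel(installed, c.lang, c.voice)
		fmt.Println(sel.Model.ID, sel.Model.Lang, sel.SID, err)
	}
}
```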
Base URL: http://127.0.0.1:18741/v1
Auth behavior (internal/httpapi/server.go):
- If `bearer_token` is empty, no auth is required.
- If `bearer_token` is set, `/v1/models`, `/v1/models/install`, `/v1/models/{lang}/{version}`, `/v1/speak`, `/v1/stop`, and `/v1/metrics` require `Authorization: Bearer <token>`.
- `/v1/health`, `/v1/capabilities`, and `/` remain accessible without auth.
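As a rough illustration of these rules, the sketch below decides whether a path needs a token and whether a header matches it. It is not the code in `internal/httpapi/server.go`; the function names are hypothetical, and the constant-time comparison and the treatment of unlisted paths are choices made for this sketch.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"strings"
)

// requiresAuth reports whether a request path needs a bearer token,
// given the configured token (an empty token disables auth entirely).
func requiresAuth(bearerToken, path string) bool {
	if bearerToken == "" {
		return false
	}
	switch path {
	case "/", "/v1/health", "/v1/capabilities":
		return false
	case "/v1/models", "/v1/models/install", "/v1/speak", "/v1/stop", "/v1/metrics":
		return true
	}
	// Covers /v1/models/{lang}/{version}; other paths are left open here
	// (an assumption of this sketch).
	return strings.HasPrefix(path, "/v1/models/")
}

// authorized compares an Authorization header value against the token
// without leaking timing information about the match position.
func authorized(bearerToken, header string) bool {
	want := []byte("Bearer " + bearerToken)
	return subtle.ConstantTimeCompare([]byte(header), want) == 1
}

func main() {
	token := "example-token" // placeholder value
	for _, p := range []string{"/v1/health", "/v1/speak", "/v1/models/en/v1"} {
		fmt.Println(p, requiresAuth(token, p))
	}
	fmt.Println(authorized(token, "Bearer example-token"))
}
```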
Endpoints:
- `GET /v1/health`
- `GET /v1/capabilities`
- `GET /v1/models`
- `POST /v1/models/install`
- `DELETE /v1/models/{lang}/{version}?force=true|false`
- `POST /v1/speak`
- `POST /v1/stop`
- `GET /v1/metrics`
- `GET /` (root sanity endpoint returning `{status,time}`)
Common request fields:
- `/v1/models/install`: `lang` (required), `url` (required), `model_id`, `checksum`, `version`
- `/v1/speak`: `text`, `lang` (optional), `voice`, `rate`, `format`, `sample_rate`, `request_id`
- `/v1/stop`: `request_id`
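A client might model these bodies as Go structs. The JSON field names below come from the list above; the struct names, Go types (for example `float64` for `rate`), the `omitempty` choices, and the example URL are assumptions of this sketch, so check `API_FULL.md` before relying on them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SpeakRequest mirrors the /v1/speak fields; the Go types are assumptions.
type SpeakRequest struct {
	Text       string  `json:"text"`
	Lang       string  `json:"lang,omitempty"`
	Voice      string  `json:"voice,omitempty"`
	Rate       float64 `json:"rate,omitempty"`
	Format     string  `json:"format,omitempty"`
	SampleRate int     `json:"sample_rate,omitempty"`
	RequestID  string  `json:"request_id,omitempty"`
}

// InstallRequest mirrors the /v1/models/install fields.
type InstallRequest struct {
	Lang     string `json:"lang"`
	URL      string `json:"url"`
	ModelID  string `json:"model_id,omitempty"`
	Checksum string `json:"checksum,omitempty"`
	Version  string `json:"version,omitempty"`
}

// StopRequest mirrors the /v1/stop body.
type StopRequest struct {
	RequestID string `json:"request_id"`
}

// encode marshals a request value to its JSON text.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(SpeakRequest{Text: "hello world", Lang: "en", Format: "wav", RequestID: "req-1"}))
	fmt.Println(encode(StopRequest{RequestID: "req-1"}))
	// Placeholder URL; a real request would use a model archive URL.
	fmt.Println(encode(InstallRequest{Lang: "en", URL: "https://example.invalid/model.tar.bz2"}))
}
```

Sharing one `request_id` between the speak and stop bodies is what lets a client cancel a specific synthesis request.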
/v1/speak response:
- `200 OK` with binary audio body
- `Content-Type`: `audio/wav` or `application/octet-stream`
- `X-Sample-Rate`: output sample rate
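The sketch below shows one way a Go client could consume this response. So that it runs without the daemon, it talks to a local stub server whose values (`22050`, a four-byte body) are placeholders. The `speak` helper and `SpeakResult` type are hypothetical, and parsing `X-Sample-Rate` as an integer is an assumption.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

// SpeakResult holds the parts of a /v1/speak response described above.
type SpeakResult struct {
	ContentType string
	SampleRate  int
	Audio       []byte
}

// speak posts a JSON body to baseURL+"/speak" and decodes the response.
func speak(baseURL, jsonBody string) (SpeakResult, error) {
	resp, err := http.Post(baseURL+"/speak", "application/json", strings.NewReader(jsonBody))
	if err != nil {
		return SpeakResult{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return SpeakResult{}, fmt.Errorf("unexpected status %s", resp.Status)
	}
	rate, err := strconv.Atoi(resp.Header.Get("X-Sample-Rate"))
	if err != nil {
		return SpeakResult{}, fmt.Errorf("bad X-Sample-Rate: %w", err)
	}
	audio, err := io.ReadAll(resp.Body)
	if err != nil {
		return SpeakResult{}, err
	}
	return SpeakResult{
		ContentType: resp.Header.Get("Content-Type"),
		SampleRate:  rate,
		Audio:       audio,
	}, nil
}

// newStub stands in for the daemon so the sketch runs offline.
func newStub() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "audio/wav")
		w.Header().Set("X-Sample-Rate", "22050") // placeholder rate
		w.Write([]byte("RIFF"))                  // placeholder audio bytes
	}))
}

func main() {
	srv := newStub()
	defer srv.Close()
	// Against a real daemon, baseURL would be http://127.0.0.1:18741/v1.
	res, err := speak(srv.URL+"/v1", `{"text":"hello world","lang":"en","format":"wav"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(res.ContentType, res.SampleRate, len(res.Audio))
}
```

When a `bearer_token` is configured, the request would also need an `Authorization: Bearer <token>` header, which means building it with `http.NewRequest` instead of `http.Post`.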
- `bash ./test.sh`
- Current behavior: downloads KO/ZH/JA/EN test models and plays samples through speaker (`aplay`) without writing WAV files.
- `task dev:run`
- `task test:unit`
- `task service:install-user-unit`
- `task service:enable`
- `task release:build`
- `task release:package VERSION=v0.1.0`
- `cmd/tts`: single entrypoint for CLI + service mode
- `cmd/runtime-check`: native runtime visibility check helper
- `internal/httpapi`: REST handlers and error model
- `internal/modelmgr`: model install/delete + manifest
- `internal/synth`: synthesis queue/engine abstraction and mock audio generation
- `deploy/systemd`: `systemd --user` unit
- `cmd/tts`: direct CLI for non-service synthesis
- Full API reference: `API_FULL.md`