Feature Request: Add SenseVoice/FunASR as STT option #40
Open
Hi! Verbi's modular architecture for experimenting with different STT/LLM/TTS components is excellent.
I'd like to suggest adding SenseVoice as a new STT option. It fits Verbi's modular philosophy well:
Why SenseVoice:
- 5x faster than Whisper large-v3 (234M non-autoregressive model)
- Emotion detection built-in — useful for adjusting assistant behavior
- Audio event detection — laugh, applause, music, cough, etc.
- 50+ languages with strong Chinese/English/Japanese/Korean support
- Simple `pip install funasr`, no extra dependencies
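To make the emotion and audio-event points concrete, here is a minimal sketch of how an assistant could separate those labels from the transcript. It assumes the raw SenseVoice text is prefixed with tags of the form `<|...|>` (for example `<|en|><|HAPPY|><|Speech|>`); that format and the helper name `split_tags` are assumptions for illustration and should be checked against real model output.

```python
import re
from typing import List, Tuple

# Assumption: tags look like <|en|>, <|HAPPY|>, <|Speech|> and precede the text.
TAG_PATTERN = re.compile(r"<\|([^|]*)\|>")


def split_tags(raw: str) -> Tuple[List[str], str]:
    """Return (tags, plain_text) from a tagged transcript string."""
    tags = TAG_PATTERN.findall(raw)            # tag names without the <| |> wrapper
    text = TAG_PATTERN.sub("", raw).strip()    # transcript with tags removed
    return tags, text


# Hypothetical example string, not captured from a real run.
tags, text = split_tags("<|en|><|HAPPY|><|Speech|><|withitn|>Thanks, that worked!")
```

The tags could then feed whatever logic adjusts the assistant's tone, while the plain text goes to the LLM unchanged.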
Integration example:
from funasr import AutoModel
model = AutoModel(model="iic/SenseVoiceSmall")
result = model.generate(input="audio.wav")
text = result[0]["text"]
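As a rough idea of how this could sit behind a pluggable STT function, the sketch below wraps the same calls. The names `load_sensevoice` and `transcribe_with_sensevoice` are hypothetical and not part of Verbi or FunASR; the only FunASR calls used are the ones from the snippet above.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def load_sensevoice():
    """Load the model once and reuse it across calls."""
    from funasr import AutoModel  # imported lazily so funasr stays optional
    return AutoModel(model="iic/SenseVoiceSmall")


def transcribe_with_sensevoice(audio_path, model=None):
    """Return the transcript for audio_path, or '' if the model returns nothing."""
    if model is None:
        model = load_sensevoice()
    result = model.generate(input=audio_path)
    if not result:
        return ""
    return result[0]["text"]
```

Passing `model` explicitly keeps the function easy to test with a stub and avoids reloading the weights on every request.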
Or via OpenAI-compatible API:
funasr-server --device cuda
# POST http://localhost:8000/v1/audio/transcriptions
Would be a great addition to the existing Deepgram/AssemblyAI/Groq STT options.
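For the API route, a client could look roughly like the sketch below. It assumes the server accepts the same multipart form as OpenAI's transcription endpoint (a `file` part plus a `model` field) and answers with JSON containing `text`. The function name, the `model` value `"sensevoice"`, and the injectable `post` parameter are all hypothetical, and `requests` is an extra dependency.

```python
def transcribe_via_api(audio_path, base_url="http://localhost:8000",
                       model="sensevoice", post=None):
    """POST an audio file to an OpenAI-style transcription endpoint."""
    if post is None:
        import requests  # third-party; only needed for real HTTP calls
        post = requests.post
    url = base_url.rstrip("/") + "/v1/audio/transcriptions"
    with open(audio_path, "rb") as audio_file:
        response = post(
            url,
            files={"file": audio_file},   # multipart upload of the audio
            data={"model": model},        # assumed field, placeholder value
            timeout=60,
        )
    response.raise_for_status()
    return response.json()["text"]
```

The `post` argument exists only so the function can be exercised without a running server.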
- FunASR: https://github.com/modelscope/FunASR (16K+ stars)
- SenseVoice: https://github.com/FunAudioLLM/SenseVoice (8K+ stars)