Feature Request: Add SenseVoice/FunASR as STT option #40

Open

Description

Hi! Verbi's modular architecture for experimenting with different STT/LLM/TTS components is excellent.

I'd like to suggest adding SenseVoice as a new STT option. It fits Verbi's modular philosophy well:

Why SenseVoice:

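  • Fast inference: SenseVoiceSmall is a 234M-parameter non-autoregressive model, reported by its authors to be about 5x faster than Whisper-Small and 15x faster than Whisper-Large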
  • Emotion detection built-in — useful for adjusting assistant behavior
  • Audio event detection — laugh, applause, music, cough, etc.
  • 50+ languages with strong Chinese/English/Japanese/Korean support
  • Simple install: pip install funasr (PyTorch and torchaudio need to be installed already)

Integration example:

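from funasr import AutoModel

# Load SenseVoiceSmall (weights are downloaded on first use)
model = AutoModel(model="iic/SenseVoiceSmall")
result = model.generate(input="audio.wav")
# The raw text may carry inline tags such as <|en|> before the transcript
text = result[0]["text"]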
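
To make use of the emotion and audio-event output, the raw text would need a small post-processing step. Below is a rough sketch, assuming SenseVoice prefixes the transcript with `<|...|>` tags (language, emotion, event and so on); the exact tag set and order should be checked against real model output. `Transcript`, `parse_sensevoice_output` and `transcribe_file` are hypothetical names, not part of FunASR or Verbi.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed tag syntax: <|en|>, <|HAPPY|>, <|Laughter|>, ...
_TAG = re.compile(r"<\|([^|]*)\|>")


@dataclass
class Transcript:
    text: str        # transcript with all tags stripped
    tags: List[str]  # tag contents, in the order they appeared


def parse_sensevoice_output(raw: str) -> Transcript:
    """Split a raw SenseVoice string into clean text and its tags."""
    tags = _TAG.findall(raw)
    text = _TAG.sub("", raw).strip()
    return Transcript(text=text, tags=tags)


def transcribe_file(path: str, model=None) -> Transcript:
    """Hypothetical STT entry point; funasr is imported lazily so the
    parser above can be used and tested without the model installed."""
    if model is None:
        from funasr import AutoModel
        model = AutoModel(model="iic/SenseVoiceSmall")
    result = model.generate(input=path)
    return parse_sensevoice_output(result[0]["text"])
```

Keeping the tags as a plain list leaves it up to the caller to decide which ones matter, for example switching the assistant's tone when an emotion tag is present.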

Or via OpenAI-compatible API:

funasr-server --device cuda
# POST http://localhost:8000/v1/audio/transcriptions
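
On the client side, a call to that endpoint could look roughly like the sketch below. It uses only the standard library and assumes the server follows the OpenAI transcription convention: a multipart form with `file` and `model` fields, and a JSON response containing `text`. The helper names and the `model` value are illustrative assumptions, not a documented server contract.

```python
import json
import os
import urllib.request
import uuid


def build_transcription_request(audio_bytes: bytes,
                                filename: str = "audio.wav",
                                model: str = "iic/SenseVoiceSmall",
                                base_url: str = "http://localhost:8000/v1"):
    """Build a multipart/form-data POST for /audio/transcriptions."""
    boundary = uuid.uuid4().hex
    sep = b"--" + boundary.encode()
    crlf = b"\r\n"
    body = b"".join([
        # "model" form field (value is an assumption)
        sep + crlf,
        b'Content-Disposition: form-data; name="model"' + crlf + crlf,
        model.encode() + crlf,
        # "file" form field carrying the audio bytes
        sep + crlf,
        ('Content-Disposition: form-data; name="file"; '
         f'filename="{filename}"').encode() + crlf,
        b"Content-Type: application/octet-stream" + crlf + crlf,
        audio_bytes + crlf,
        # closing boundary
        sep + b"--" + crlf,
    ])
    return urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe_via_server(path: str, **kwargs) -> str:
    """Send a local audio file to the server and return the transcript."""
    with open(path, "rb") as f:
        req = build_transcription_request(
            f.read(), filename=os.path.basename(path), **kwargs)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

Because the request shape matches what the hosted STT providers already expect, an integration along these lines might be able to reuse most of an existing OpenAI-style code path with only the base URL changed.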

This would be a great addition to the existing Deepgram/AssemblyAI/Groq STT options.
