πŸŽ‰ Join our Discord Community! Connect with other users, get help, and stay updated on the latest features: https://discord.gg/4Q5YVrePzZ

Whisper ASR Box

Whisper ASR Box is a general-purpose speech recognition toolkit. Whisper models are trained on a large dataset of diverse audio and are multitask models that can perform multilingual speech recognition as well as speech translation and language identification.

Features

The current release (v1.9.1) supports multiple Whisper engines and models; see the Key Features and Environment Variables sections below for the available options.

Quick Usage

CPU

docker run -d -p 9000:9000 \
 -e ASR_MODEL=base \
 -e ASR_ENGINE=openai_whisper \
 onerahmet/openai-whisper-asr-webservice:latest

GPU

docker run -d --gpus all -p 9000:9000 \
 -e ASR_MODEL=base \
 -e ASR_ENGINE=openai_whisper \
 onerahmet/openai-whisper-asr-webservice:latest-gpu

Cache

To reduce container startup time by avoiding repeated downloads, you can persist the cache directory:

docker run -d -p 9000:9000 \
 -v $PWD/cache:/root/.cache/ \
 onerahmet/openai-whisper-asr-webservice:latest
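
The cache mount can be combined with the GPU image and the model and engine settings from the quick-start examples above. A consolidated sketch (the model and engine chosen here are only illustrative):

docker run -d --gpus all -p 9000:9000 \
 -e ASR_MODEL=large-v3 \
 -e ASR_ENGINE=faster_whisper \
 -v $PWD/cache:/root/.cache/ \
 onerahmet/openai-whisper-asr-webservice:latest-gpu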

Key Features

  • Multiple ASR engines support (OpenAI Whisper, Faster Whisper, WhisperX)
  • Multiple output formats (text, JSON, VTT, SRT, TSV)
  • Word-level timestamps support
  • Voice activity detection (VAD) filtering
  • Speaker diarization (with WhisperX)
  • FFmpeg integration for broad audio/video format support
  • GPU acceleration support
  • Configurable model loading/unloading
  • REST API with Swagger documentation

Environment Variables

Key configuration options (combined in the example after this list):

  • ASR_ENGINE: Engine selection (openai_whisper, faster_whisper, whisperx)
  • ASR_MODEL: Model selection (tiny, base, small, medium, large-v3, etc.)
  • ASR_MODEL_PATH: Custom path to store/load models
  • ASR_DEVICE: Device selection (cuda, cpu)
  • MODEL_IDLE_TIMEOUT: Timeout for model unloading
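
These options can be combined in a single invocation. A sketch with illustrative values: the host models directory and the /data/whisper container path are arbitrary, and MODEL_IDLE_TIMEOUT is assumed to take seconds.

docker run -d -p 9000:9000 \
 -e ASR_ENGINE=faster_whisper \
 -e ASR_MODEL=medium \
 -e ASR_MODEL_PATH=/data/whisper \
 -e ASR_DEVICE=cpu \
 -e MODEL_IDLE_TIMEOUT=300 \
 -v $PWD/models:/data/whisper \
 onerahmet/openai-whisper-asr-webservice:latest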

Documentation

For complete documentation, visit: https://ahmetoner.github.io/whisper-asr-webservice

Development

# Install poetry v2.x
pip3 install poetry
# Install dependencies for CPU
poetry install --extras cpu
# Install dependencies for CUDA
poetry install --extras cuda
# Run the service
poetry run whisper-asr-webservice --host 0.0.0.0 --port 9000

After starting the service, visit http://localhost:9000 in your browser to access the Swagger UI documentation and try out the API endpoints (the service listens on 0.0.0.0, so it is also reachable from other hosts on port 9000).
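
Requests can also be sent directly from the command line. A minimal sketch, assuming the /asr transcription endpoint with an audio_file form field and task/output query parameters; confirm the exact paths and parameter names in the Swagger UI:

# Transcribe a local file and return JSON (sample.wav is a placeholder)
curl -X POST "http://localhost:9000/asr?task=transcribe&output=json" \
 -F "audio_file=@sample.wav"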

Credits

  • This software uses libraries from the FFmpeg project under the LGPLv2.1
