-
The announcement of Parakeet (ultra-fast?) was good news until I discovered I don't have it and don't know how to get it.
Replies: 3 comments 1 reply
-
You need to do two things:
1. Use the latest version of whisper.cpp
Make sure you pull the latest code:
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
git pull origin master
make
2. Enable Parakeet using the --decoding-parakeet flag
Once built, just run the CLI with:
./main -m models/ggml-base.en.q5_1.bin --decoding-parakeet -f samples/jfk.wav
This activates the Parakeet decoder instead of the default decoder.
Want to Compare Speeds?
Try running with and without the flag and measure the time:
./main --decoding-parakeet -f your_audio.wav   # With Parakeet
./main -f your_audio.wav                       # Without Parakeet
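If you want actual numbers rather than eyeballing it, wrap each run in the shell's time builtin (the --decoding-parakeet flag here is the one suggested above; your build may not accept it):

```shell
# Benchmark both runs on the same file; `time` prints real (wall-clock),
# user, and sys time after each command finishes.
time ./main -m models/ggml-base.en.q5_1.bin --decoding-parakeet -f your_audio.wav
time ./main -m models/ggml-base.en.q5_1.bin -f your_audio.wav
```

Run each a couple of times and compare the "real" figures, since the first run also pays model-load cost.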
Notes:
- Works best with quantized models (Q5_1, Q6_K).
- May still be under tuning — watch for updates or issues.
- If building from scratch, run make clean && make -j to ensure it's fully updated.
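(Note: recent whisper.cpp versions have moved from plain make to CMake, so if make only prints a deprecation notice, a clean CMake build looks roughly like this, assuming the repo's current layout:)

```shell
# Clean CMake rebuild from the whisper.cpp checkout.
# Binaries land in build/bin/, e.g. build/bin/whisper-cli
rm -rf build
cmake -B build
cmake --build build -j --config Release
```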
Hope this helps.
-
Not sure that helps. I am using the GUI app downloaded pre-built, rather than command line. As a retired software engineer, I could probably do this, though I'm a bit rusty after eleven years of retirement. However, I would rather not have to download and build a separate command-line version.
-
If you're using the GUI version, Parakeet isn't available yet: it's currently only supported in the CLI (main) binary via the --decoding-parakeet flag. You'd need to build from source or use a precompiled CLI binary to try it. Hopefully GUI support will be added soon!
-
@officiallyutso these instructions didn't work for using Parakeet via the CLI. Everything I have tried fails.
Firstly, models/ggml-base.en.q5_1.bin does not exist. How do I get it?
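(For anyone hitting the same wall: the repo ships a download helper that can fetch prebuilt models, and the quantized file names appear to use a dash rather than a dot, so the path in the instructions above may simply be misspelled:)

```shell
# From the whisper.cpp checkout; fetches the model from Hugging Face.
./models/download-ggml-model.sh base.en-q5_1
# should produce models/ggml-base.en-q5_1.bin
```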
Also, calling 'main' returns an error:
./build/bin/main -m models/ggml-base.en.q5_1.bin --decoding-parakeet -f samples/jfk.wav
WARNING: The binary 'main' is deprecated.
Please use 'whisper-cli' instead.
See https://github.com/ggerganov/whisper.cpp/tree/master/examples/deprecation-warning/README.md for more information.
So, I switched to whisper-cli, but that fails also:
./build/bin/whisper-cli -m models/ggml-base.en.bin --decoding-parakeet -f samples/jfk.wav
error: unknown argument: --decoding-parakeet
usage: ./build/bin/whisper-cli [options] file0 file1 ...
supported audio formats: flac, mp3, ogg, wav
options:
-h, --help [default] show this help message and exit
-t N, --threads N [4 ] number of threads to use during computation
-p N, --processors N [1 ] number of processors to use during computation
-ot N, --offset-t N [0 ] time offset in milliseconds
-on N, --offset-n N [0 ] segment index offset
-d N, --duration N [0 ] duration of audio to process in milliseconds
-mc N, --max-context N [-1 ] maximum number of text context tokens to store
-ml N, --max-len N [0 ] maximum segment length in characters
-sow, --split-on-word [false ] split on word rather than on token
-bo N, --best-of N [5 ] number of best candidates to keep
-bs N, --beam-size N [5 ] beam size for beam search
-ac N, --audio-ctx N [0 ] audio context size (0 - all)
-wt N, --word-thold N [0.01 ] word timestamp probability threshold
-et N, --entropy-thold N [2.40 ] entropy threshold for decoder fail
-lpt N, --logprob-thold N [-1.00 ] log probability threshold for decoder fail
-nth N, --no-speech-thold N [0.60 ] no speech threshold
-tp, --temperature N [0.00 ] The sampling temperature, between 0 and 1
-tpi, --temperature-inc N [0.20 ] The increment of temperature, between 0 and 1
-debug, --debug-mode [false ] enable debug mode (eg. dump log_mel)
-tr, --translate [false ] translate from source language to english
-di, --diarize [false ] stereo audio diarization
-tdrz, --tinydiarize [false ] enable tinydiarize (requires a tdrz model)
-nf, --no-fallback [false ] do not use temperature fallback while decoding
-otxt, --output-txt [false ] output result in a text file
-ovtt, --output-vtt [false ] output result in a vtt file
-osrt, --output-srt [false ] output result in a srt file
-olrc, --output-lrc [false ] output result in a lrc file
-owts, --output-words [false ] output script for generating karaoke video
-fp, --font-path [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
-ocsv, --output-csv [false ] output result in a CSV file
-oj, --output-json [false ] output result in a JSON file
-ojf, --output-json-full [false ] include more information in the JSON file
-of FNAME, --output-file FNAME [ ] output file path (without file extension)
-np, --no-prints [false ] do not print anything other than the results
-ps, --print-special [false ] print special tokens
-pc, --print-colors [false ] print colors
--print-confidence [false ] print confidence
-pp, --print-progress [false ] print progress
-nt, --no-timestamps [false ] do not print timestamps
-l LANG, --language LANG [en ] spoken language ('auto' for auto-detect)
-dl, --detect-language [false ] exit after automatically detecting language
--prompt PROMPT [ ] initial prompt (max n_text_ctx/2 tokens)
-m FNAME, --model FNAME [models/ggml-base.en.bin] model path
-f FNAME, --file FNAME [ ] input audio file path
-oved D, --ov-e-device DNAME [CPU ] the OpenVINO device used for encode inference
-dtw MODEL --dtw MODEL [ ] compute token-level timestamps
-ls, --log-score [false ] log best decoder scores of tokens
-ng, --no-gpu [false ] disable GPU
-fa, --flash-attn [false ] flash attention
-sns, --suppress-nst [false ] suppress non-speech tokens
--suppress-regex REGEX [ ] regular expression matching tokens to suppress
--grammar GRAMMAR [ ] GBNF grammar to guide decoding
--grammar-rule RULE [ ] top-level GBNF grammar rule name
--grammar-penalty N [100.0 ] scales down logits of nongrammar tokens
Voice Activity Detection (VAD) options:
--vad [false ] enable Voice Activity Detection (VAD)
-vm FNAME, --vad-model FNAME [ ] VAD model path
-vt N, --vad-threshold N [0.50 ] VAD threshold for speech recognition
-vspd N, --vad-min-speech-duration-ms N [250 ] VAD min speech duration (0.0-1.0)
-vsd N, --vad-min-silence-duration-ms N [100 ] VAD min silence duration (to split segments)
-vmsd N, --vad-max-speech-duration-s N [FLT_MAX] VAD max speech duration (auto-split longer)
-vp N, --vad-speech-pad-ms N [30 ] VAD speech padding (extend segments)
-vo N, --vad-samples-overlap N [0.10 ] VAD samples overlap (seconds between segments)
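For what it's worth, grepping the help text confirms the flag simply isn't there:

```shell
# Search the CLI's help output for any mention of the flag.
./build/bin/whisper-cli --help 2>&1 | grep -i parakeet || echo "flag not found"
```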