How do I get "Parakeet"? #3312

Unanswered
WGroleau asked this question in Q&A
Jul 8, 2025 · 3 comments · 1 reply

The announcement of Parakeet (ultra-fast?) was good news until I discovered I don't have it and don't know how to get it.


You need to do two things:

1. Use the latest version of whisper.cpp

Make sure you pull the latest code:

git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
git pull origin master
make

2. Enable Parakeet using the --decoding-parakeet flag

Once built, just run the CLI with:

./main -m models/ggml-base.en.q5_1.bin --decoding-parakeet -f samples/jfk.wav

This activates the Parakeet decoder instead of the default decoder.


Want to Compare Speeds?

Try running with and without the flag and measure the time:

./main --decoding-parakeet -f your_audio.wav # With Parakeet
./main -f your_audio.wav # Without Parakeet
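
The comparison above can be made a little more repeatable with a small timing helper. This is only a sketch under stated assumptions: `time_runs` is a hypothetical function name, not part of whisper.cpp, and it relies on `date +%s`, so it measures whole seconds only. Use enough runs for the difference to show.

```shell
# time_runs N CMD [ARGS...]
# Hypothetical helper (not part of whisper.cpp): runs CMD N times,
# discards its output, and prints the total elapsed wall-clock time
# in whole seconds.
time_runs() {
    tr_runs=$1
    shift
    tr_start=$(date +%s)
    tr_i=0
    while [ "$tr_i" -lt "$tr_runs" ]; do
        "$@" >/dev/null 2>&1
        tr_i=$((tr_i + 1))
    done
    tr_end=$(date +%s)
    echo $((tr_end - tr_start))
}

# Example (paths and options are whatever your build actually accepts):
#   time_runs 5 ./main -f your_audio.wav
```

Run it once per configuration you want to compare and compare the printed totals.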

Notes:

  • Works best with quantized models (Q5_1, Q6_K).
  • May still be under tuning — watch for updates or issues.
  • If building from scratch: make clean && make -j to ensure it's fully updated.

Hope this helps.


Not sure that helps. I am using the GUI app downloaded pre-built, rather than command line. As a retired software engineer, I could probably do this, though I'm a bit rusty after eleven years of retirement. However, I would rather not have to download and build a separate command-line version.


If you're using the GUI version, Parakeet isn't available yet — it's currently only supported in the CLI (main) binary via the --decoding-parakeet flag. You'd need to build from source or use a precompiled CLI binary to try it. Hopefully GUI support will be added soon!


@officiallyutso these instructions didn't work for using Parakeet via the CLI. Everything I have tried fails.
Firstly, models/ggml-base.en.q5_1.bin does not exist. How do I get it?

Also, calling 'main' returns an error:
./build/bin/main -m models/ggml-base.en.q5_1.bin --decoding-parakeet -f samples/jfk.wav

WARNING: The binary 'main' is deprecated.
 Please use 'whisper-cli' instead.
 See https://github.com/ggerganov/whisper.cpp/tree/master/examples/deprecation-warning/README.md for more information.

So, I switched to whisper-cli, but that fails also:

 ./build/bin/whisper-cli -m models/ggml-base.en.bin --decoding-parakeet -f samples/jfk.wav
error: unknown argument: --decoding-parakeet
usage: ./build/bin/whisper-cli [options] file0 file1 ...
supported audio formats: flac, mp3, ogg, wav
options:
 -h, --help [default] show this help message and exit
 -t N, --threads N [4 ] number of threads to use during computation
 -p N, --processors N [1 ] number of processors to use during computation
 -ot N, --offset-t N [0 ] time offset in milliseconds
 -on N, --offset-n N [0 ] segment index offset
 -d N, --duration N [0 ] duration of audio to process in milliseconds
 -mc N, --max-context N [-1 ] maximum number of text context tokens to store
 -ml N, --max-len N [0 ] maximum segment length in characters
 -sow, --split-on-word [false ] split on word rather than on token
 -bo N, --best-of N [5 ] number of best candidates to keep
 -bs N, --beam-size N [5 ] beam size for beam search
 -ac N, --audio-ctx N [0 ] audio context size (0 - all)
 -wt N, --word-thold N [0.01 ] word timestamp probability threshold
 -et N, --entropy-thold N [2.40 ] entropy threshold for decoder fail
 -lpt N, --logprob-thold N [-1.00 ] log probability threshold for decoder fail
 -nth N, --no-speech-thold N [0.60 ] no speech threshold
 -tp, --temperature N [0.00 ] The sampling temperature, between 0 and 1
 -tpi, --temperature-inc N [0.20 ] The increment of temperature, between 0 and 1
 -debug, --debug-mode [false ] enable debug mode (eg. dump log_mel)
 -tr, --translate [false ] translate from source language to english
 -di, --diarize [false ] stereo audio diarization
 -tdrz, --tinydiarize [false ] enable tinydiarize (requires a tdrz model)
 -nf, --no-fallback [false ] do not use temperature fallback while decoding
 -otxt, --output-txt [false ] output result in a text file
 -ovtt, --output-vtt [false ] output result in a vtt file
 -osrt, --output-srt [false ] output result in a srt file
 -olrc, --output-lrc [false ] output result in a lrc file
 -owts, --output-words [false ] output script for generating karaoke video
 -fp, --font-path [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
 -ocsv, --output-csv [false ] output result in a CSV file
 -oj, --output-json [false ] output result in a JSON file
 -ojf, --output-json-full [false ] include more information in the JSON file
 -of FNAME, --output-file FNAME [ ] output file path (without file extension)
 -np, --no-prints [false ] do not print anything other than the results
 -ps, --print-special [false ] print special tokens
 -pc, --print-colors [false ] print colors
 --print-confidence [false ] print confidence
 -pp, --print-progress [false ] print progress
 -nt, --no-timestamps [false ] do not print timestamps
 -l LANG, --language LANG [en ] spoken language ('auto' for auto-detect)
 -dl, --detect-language [false ] exit after automatically detecting language
 --prompt PROMPT [ ] initial prompt (max n_text_ctx/2 tokens)
 -m FNAME, --model FNAME [models/ggml-base.en.bin] model path
 -f FNAME, --file FNAME [ ] input audio file path
 -oved D, --ov-e-device DNAME [CPU ] the OpenVINO device used for encode inference
 -dtw MODEL --dtw MODEL [ ] compute token-level timestamps
 -ls, --log-score [false ] log best decoder scores of tokens
 -ng, --no-gpu [false ] disable GPU
 -fa, --flash-attn [false ] flash attention
 -sns, --suppress-nst [false ] suppress non-speech tokens
 --suppress-regex REGEX [ ] regular expression matching tokens to suppress
 --grammar GRAMMAR [ ] GBNF grammar to guide decoding
 --grammar-rule RULE [ ] top-level GBNF grammar rule name
 --grammar-penalty N [100.0 ] scales down logits of nongrammar tokens
Voice Activity Detection (VAD) options:
 --vad [false ] enable Voice Activity Detection (VAD)
 -vm FNAME, --vad-model FNAME [ ] VAD model path
 -vt N, --vad-threshold N [0.50 ] VAD threshold for speech recognition
 -vspd N, --vad-min-speech-duration-ms N [250 ] VAD min speech duration (0.0-1.0)
 -vsd N, --vad-min-silence-duration-ms N [100 ] VAD min silence duration (to split segments)
 -vmsd N, --vad-max-speech-duration-s N [FLT_MAX] VAD max speech duration (auto-split longer)
 -vp N, --vad-speech-pad-ms N [30 ] VAD speech padding (extend segments)
 -vo N, --vad-samples-overlap N [0.10 ] VAD samples overlap (seconds between segments)
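
The usage listing above has no `--decoding-parakeet` entry. A quick way to check whether a given build accepts an option is to search its help text. This is a minimal sketch, assuming the binary prints its options when called with `--help`; `has_flag` is a hypothetical helper name, not something whisper.cpp provides.

```shell
# has_flag BINARY FLAG
# Hypothetical helper: succeeds (exit status 0) if FLAG appears anywhere
# in the text BINARY prints for --help, and fails (exit status 1) otherwise.
# This is a plain substring match, so "--vad" also matches "--vad-model".
has_flag() {
    hf_help=$("$1" --help 2>&1) || true
    case "$hf_help" in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example:
#   if has_flag ./build/bin/whisper-cli --decoding-parakeet; then
#       echo "flag is listed"
#   else
#       echo "flag is not listed in this build"
#   fi
```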
 