-
The announcement of Parakeet (ultra-fast?) was good news until I discovered I don't have it and don't know how to get it.
Replies: 3 comments 1 reply
-
You need to do two things:
1. Use the latest version of whisper.cpp
Make sure you pull the latest code:
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
git pull origin master
make
2. Enable Parakeet using the --decoding-parakeet flag
Once built, just run the CLI with:
./main -m models/ggml-base.en.q5_1.bin --decoding-parakeet -f samples/jfk.wav
This activates the Parakeet decoder instead of the default decoder.
Want to Compare Speeds?
Try running with and without the flag and measure the time:
./main --decoding-parakeet -f your_audio.wav   # With Parakeet
./main -f your_audio.wav                       # Without Parakeet
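If you want actual numbers rather than eyeballing it, wrap each run in the shell's time builtin (the --decoding-parakeet flag here is the one suggested above; your build may not accept it):

```shell
# Benchmark both runs on the same file; `time` prints real (wall-clock),
# user, and sys time after each command finishes.
time ./main -m models/ggml-base.en.q5_1.bin --decoding-parakeet -f your_audio.wav
time ./main -m models/ggml-base.en.q5_1.bin -f your_audio.wav
```

Run each a couple of times and compare the "real" figures, since the first run also pays model-load cost.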
Notes:
- Works best with quantized models (Q5_1, Q6_K).
- May still be under tuning — watch for updates or issues.
- If building from scratch, run make clean && make -j to ensure it's fully updated.
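(Note: recent whisper.cpp versions have moved from plain make to CMake, so if make only prints a deprecation notice, a clean CMake build looks roughly like this, assuming the repo's current layout:)

```shell
# Clean CMake rebuild from the whisper.cpp checkout.
# Binaries land in build/bin/, e.g. build/bin/whisper-cli
rm -rf build
cmake -B build
cmake --build build -j --config Release
```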
Hope this helps.
-
Not sure that helps. I am using the GUI app downloaded pre-built, rather than command line. As a retired software engineer, I could probably do this, though I'm a bit rusty after eleven years of retirement. However, I would rather not have to download and build a separate command-line version.
-
If you're using the GUI version, Parakeet isn't available yet: it's currently only supported in the CLI (main) binary via the --decoding-parakeet flag. You'd need to build from source or use a precompiled CLI binary to try it. Hopefully GUI support will be added soon!
-
@officiallyutso these instructions didn't work for using Parakeet via the CLI. Everything I have tried fails.
Firstly, models/ggml-base.en.q5_1.bin does not exist. How do I get it?
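(For anyone hitting the same wall: the repo ships a download helper that can fetch prebuilt models, and the quantized file names appear to use a dash rather than a dot, so the path in the instructions above may simply be misspelled:)

```shell
# From the whisper.cpp checkout; fetches the model from Hugging Face.
./models/download-ggml-model.sh base.en-q5_1
# should produce models/ggml-base.en-q5_1.bin
```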
Also, calling 'main' returns an error:
./build/bin/main -m models/ggml-base.en.q5_1.bin --decoding-parakeet -f samples/jfk.wav
WARNING: The binary 'main' is deprecated.
Please use 'whisper-cli' instead.
See https://github.com/ggerganov/whisper.cpp/tree/master/examples/deprecation-warning/README.md for more information.
So, I switched to whisper-cli, but that fails also:
./build/bin/whisper-cli -m models/ggml-base.en.bin --decoding-parakeet -f samples/jfk.wav
error: unknown argument: --decoding-parakeet
usage: ./build/bin/whisper-cli [options] file0 file1 ...
supported audio formats: flac, mp3, ogg, wav
options:
-h, --help [default] show this help message and exit
-t N, --threads N [4 ] number of threads to use during computation
-p N, --processors N [1 ] number of processors to use during computation
-ot N, --offset-t N [0 ] time offset in milliseconds
-on N, --offset-n N [0 ] segment index offset
-d N, --duration N [0 ] duration of audio to process in milliseconds
-mc N, --max-context N [-1 ] maximum number of text context tokens to store
-ml N, --max-len N [0 ] maximum segment length in characters
-sow, --split-on-word [false ] split on word rather than on token
-bo N, --best-of N [5 ] number of best candidates to keep
-bs N, --beam-size N [5 ] beam size for beam search
-ac N, --audio-ctx N [0 ] audio context size (0 - all)
-wt N, --word-thold N [0.01 ] word timestamp probability threshold
-et N, --entropy-thold N [2.40 ] entropy threshold for decoder fail
-lpt N, --logprob-thold N [-1.00 ] log probability threshold for decoder fail
-nth N, --no-speech-thold N [0.60 ] no speech threshold
-tp, --temperature N [0.00 ] The sampling temperature, between 0 and 1
-tpi, --temperature-inc N [0.20 ] The increment of temperature, between 0 and 1
-debug, --debug-mode [false ] enable debug mode (eg. dump log_mel)
-tr, --translate [false ] translate from source language to english
-di, --diarize [false ] stereo audio diarization
-tdrz, --tinydiarize [false ] enable tinydiarize (requires a tdrz model)
-nf, --no-fallback [false ] do not use temperature fallback while decoding
-otxt, --output-txt [false ] output result in a text file
-ovtt, --output-vtt [false ] output result in a vtt file
-osrt, --output-srt [false ] output result in a srt file
-olrc, --output-lrc [false ] output result in a lrc file
-owts, --output-words [false ] output script for generating karaoke video
-fp, --font-path [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
-ocsv, --output-csv [false ] output result in a CSV file
-oj, --output-json [false ] output result in a JSON file
-ojf, --output-json-full [false ] include more information in the JSON file
-of FNAME, --output-file FNAME [ ] output file path (without file extension)
-np, --no-prints [false ] do not print anything other than the results
-ps, --print-special [false ] print special tokens
-pc, --print-colors [false ] print colors
--print-confidence [false ] print confidence
-pp, --print-progress [false ] print progress
-nt, --no-timestamps [false ] do not print timestamps
-l LANG, --language LANG [en ] spoken language ('auto' for auto-detect)
-dl, --detect-language [false ] exit after automatically detecting language
--prompt PROMPT [ ] initial prompt (max n_text_ctx/2 tokens)
-m FNAME, --model FNAME [models/ggml-base.en.bin] model path
-f FNAME, --file FNAME [ ] input audio file path
-oved D, --ov-e-device DNAME [CPU ] the OpenVINO device used for encode inference
-dtw MODEL --dtw MODEL [ ] compute token-level timestamps
-ls, --log-score [false ] log best decoder scores of tokens
-ng, --no-gpu [false ] disable GPU
-fa, --flash-attn [false ] flash attention
-sns, --suppress-nst [false ] suppress non-speech tokens
--suppress-regex REGEX [ ] regular expression matching tokens to suppress
--grammar GRAMMAR [ ] GBNF grammar to guide decoding
--grammar-rule RULE [ ] top-level GBNF grammar rule name
--grammar-penalty N [100.0 ] scales down logits of nongrammar tokens
Voice Activity Detection (VAD) options:
--vad [false ] enable Voice Activity Detection (VAD)
-vm FNAME, --vad-model FNAME [ ] VAD model path
-vt N, --vad-threshold N [0.50 ] VAD threshold for speech recognition
-vspd N, --vad-min-speech-duration-ms N [250 ] VAD min speech duration (0.0-1.0)
-vsd N, --vad-min-silence-duration-ms N [100 ] VAD min silence duration (to split segments)
-vmsd N, --vad-max-speech-duration-s N [FLT_MAX] VAD max speech duration (auto-split longer)
-vp N, --vad-speech-pad-ms N [30 ] VAD speech padding (extend segments)
-vo N, --vad-samples-overlap N [0.10 ] VAD samples overlap (seconds between segments)
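For what it's worth, grepping the help text confirms the flag simply isn't there:

```shell
# Search the CLI's help output for any mention of the flag.
./build/bin/whisper-cli --help 2>&1 | grep -i parakeet || echo "flag not found"
```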