transcribe() v4.0.131
Transcribes a media file by utilizing Whisper.cpp.
You should first install Whisper.cpp, for example through installWhisperCpp().
This function only works with Whisper.cpp 1.5.5 or later, unless tokenLevelTimestamps is set to false.
transcribe.mjs
```tsx
import path from 'path';
import {transcribe} from '@remotion/install-whisper-cpp';

const {transcription} = await transcribe({
  inputPath: '/path/to/audio.wav',
  whisperPath: path.join(process.cwd(), 'whisper.cpp'),
  whisperCppVersion: '1.5.5',
  model: 'medium.en',
  tokenLevelTimestamps: true,
});

for (const token of transcription) {
  console.log(token.timestamps.from, token.timestamps.to, token.text);
}
```
Options
inputPath
The path to the file you want to extract text from.
The file has to be a 16-bit, 16 kHz WAVE file. See Resample audio to 16kHz for more information.
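A minimal sketch of producing a compatible input file, assuming ffmpeg is installed and available on the PATH. The helper names are hypothetical. The -ar 16000 flag sets the sample rate and -c:a pcm_s16le selects 16-bit PCM; downmixing to mono with -ac 1 is an assumption based on common Whisper.cpp usage, not a requirement stated above.

```typescript
import {spawnSync} from 'node:child_process';

// Hypothetical helper: builds ffmpeg arguments for a 16-bit, 16 kHz WAVE file.
const buildResampleArgs = (input: string, output: string): string[] => [
  '-y', // overwrite the output file if it already exists
  '-i',
  input,
  '-ar',
  '16000', // 16 kHz sample rate
  '-ac',
  '1', // mono (assumption, see the note above)
  '-c:a',
  'pcm_s16le', // 16-bit signed PCM
  output,
];

// Runs ffmpeg synchronously and throws if it cannot be started or fails.
const resampleTo16Khz = (input: string, output: string): void => {
  const result = spawnSync('ffmpeg', buildResampleArgs(input, output), {
    stdio: 'inherit',
  });
  if (result.error) {
    throw result.error;
  }
  if (result.status !== 0) {
    throw new Error(`ffmpeg exited with code ${result.status}`);
  }
};
```

The resulting output path can then be passed as inputPath.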
whisperPath
The path to your whisper.cpp folder.
If you haven't installed Whisper.cpp, you can do so for example through installWhisperCpp() and use the same folder.
tokenLevelTimestamps v4.0.131
Passes the --dtw flag to Whisper.cpp to generate more accurate timestamps, which are returned under the t_dtw field.
Recommended for accurate timings, but only available with Whisper.cpp 1.5.5 or later.
Set to false if you use an older version of Whisper.cpp.
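One way to pick the value automatically is to compare the installed version against the 1.5.5 minimum stated at the top of this page. This is a sketch only: the helper name is hypothetical, and it assumes a plain major.minor.patch version string without pre-release suffixes.

```typescript
// Hypothetical helper: true if `version` (e.g. '1.5.5') is 1.5.5 or later.
// Missing or non-numeric parts are treated as 0.
const supportsTokenLevelTimestamps = (version: string): boolean => {
  const [major = 0, minor = 0, patch = 0] = version
    .split('.')
    .map((part) => Number.parseInt(part, 10) || 0);

  if (major !== 1) {
    return major > 1;
  }
  if (minor !== 5) {
    return minor > 5;
  }
  return patch >= 5;
};
```

Because the return type of transcribe() depends on whether tokenLevelTimestamps is the literal true or false, branching on this result and calling transcribe() separately in each branch keeps the types precise.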
model? default: base.en
Specify the Whisper model to use for the transcription.
Possible values:
tiny,tiny.en,base,base.en,small,small.en,medium,medium.en,large-v1,large-v2,large-v3,large-v3-turbo.
Make sure the model you want to use exists in your whisper.cpp/models folder. You can ensure a specific model is available locally by using the downloadWhisperModel() API.
Note: large-v3-turbo only works properly with Whisper.cpp versions built from November 2024 or later and Remotion v4.0.229 or greater.
modelFolder? default: whisperPath/models
If you saved Whisper models to a specific folder, pass its path here.
Uses the whisper.cpp/models folder at the location defined through whisperPath as default.
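The documented fallback can be sketched as follows. The helper name is hypothetical and only mirrors the default described above; it is not the library's internal implementation.

```typescript
import * as path from 'node:path';

// Hypothetical helper: an explicit modelFolder wins,
// otherwise fall back to <whisperPath>/models.
const resolveModelFolder = (
  whisperPath: string,
  modelFolder?: string,
): string => modelFolder ?? path.join(whisperPath, 'models');
```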
translateToEnglish? default: false
Set this flag to true if you want to get an English translation of the provided file. Make sure not to use a *.en model, as these models cannot translate a foreign language to English.
Note: We recommend using at least the medium model to get satisfactory results when translating.
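A small guard can catch the *.en mistake before a long transcription starts. This is a sketch with hypothetical helper names; it relies only on the naming rule above that English-only models end in .en.

```typescript
// Hypothetical guard: *.en models are English-only and cannot translate.
const canTranslateWith = (model: string): boolean => !model.endsWith('.en');

// Throws early if translation was requested with an English-only model.
const assertTranslationModel = (
  model: string,
  translateToEnglish: boolean,
): void => {
  if (translateToEnglish && !canTranslateWith(model)) {
    throw new Error(
      `Model "${model}" is English-only; use a multilingual model such as "medium" to translate.`,
    );
  }
};
```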
printOutput? v4.0.132
Whether to print the output of the transcription process to the console. Defaults to true.
tokensPerItem? v4.0.141 default: 1
The maximum number of tokens included in each transcription item.
Set this option to null to use whisper.cpp's default token grouping (useful for generating a movie-style transcription).
Info: tokensPerItem can only be set when tokenLevelTimestamps is set to false.
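The constraint between the two options can be expressed as a union type. The type and constant names below are hypothetical and cover only these two options; they illustrate the rule rather than reproduce the package's own typings.

```typescript
// Hypothetical option shapes: tokensPerItem is only accepted
// when tokenLevelTimestamps is false.
type WithTokenTimestamps = {
  tokenLevelTimestamps: true;
  tokensPerItem?: undefined;
};
type WithoutTokenTimestamps = {
  tokenLevelTimestamps: false;
  tokensPerItem?: number | null;
};
type GroupingOptions = WithTokenTimestamps | WithoutTokenTimestamps;

// One token per item (the documented default).
const perToken: GroupingOptions = {
  tokenLevelTimestamps: false,
  tokensPerItem: 1,
};

// null defers to whisper.cpp's own token grouping.
const movieStyle: GroupingOptions = {
  tokenLevelTimestamps: false,
  tokensPerItem: null,
};

// Runtime check for untyped input, e.g. values read from a config file.
const isValidGrouping = (options: {
  tokenLevelTimestamps: boolean;
  tokensPerItem?: number | null;
}): boolean =>
  options.tokenLevelTimestamps === false || options.tokensPerItem === undefined;
```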
splitOnWord? v4.0.208
Adds the --split-on-word flag to Whisper.cpp for cleaner word-for-word output.
language? v4.0.142 default: null
Passes the -l flag to Whisper.cpp to specify the spoken language of the audio file.
Possible values:
Afrikaans,Albanian,Amharic,Arabic,Armenian,Assamese,Azerbaijani,Bashkir,Basque,Belarusian,Bengali,Bosnian,Breton,Bulgarian,Burmese,Castilian,Catalan,Chinese,Croatian,Czech,Danish,Dutch,English,Estonian,Faroese,Finnish,Flemish,French,Galician,Georgian,German,Greek,Gujarati,Haitian,Haitian Creole,Hausa,Hawaiian,Hebrew,Hindi,Hungarian,Icelandic,Indonesian,Italian,Japanese,Javanese,Kannada,Kazakh,Khmer,Korean,Lao,Latin,Latvian,Letzeburgesch,Lingala,Lithuanian,Luxembourgish,Macedonian,Malagasy,Malay,Malayalam,Maltese,Maori,Marathi,Moldavian,Moldovan,Mongolian,Myanmar,Nepali,Norwegian,Nynorsk,Occitan,Panjabi,Pashto,Persian,Polish,Portuguese,Punjabi,Pushto,Romanian,Russian,Sanskrit,Serbian,Shona,Sindhi,Sinhala,Sinhalese,Slovak,Slovenian,Somali,Spanish,Sundanese,Swahili,Swedish,Tagalog,Tajik,Tamil,Tatar,Telugu,Thai,Tibetan,Turkish,Turkmen,Ukrainian,Urdu,Uzbek,Valencian,Vietnamese,Welsh,Yiddish,Yoruba,Zulu.
af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,zh or auto.
signal? v4.0.156
A signal from an AbortController to cancel the transcription process.
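A common use is cancelling a transcription that runs too long. The sketch below uses the standard AbortController available in current Node.js versions; the helper name and the timeout value in the usage comment are hypothetical.

```typescript
// Hypothetical helper: returns a signal that aborts after `ms` milliseconds,
// plus functions to abort manually or to clear the pending timer.
const createTimeoutSignal = (ms: number) => {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);

  return {
    signal: controller.signal,
    abort: () => controller.abort(),
    clear: () => clearTimeout(timer),
  };
};

// Usage sketch (not executed here):
// const {signal, clear} = createTimeoutSignal(10 * 60 * 1000);
// try {
//   await transcribe({inputPath, whisperPath, whisperCppVersion, model, tokenLevelTimestamps: true, signal});
// } finally {
//   clear(); // avoid keeping the process alive once transcription is done
// }
```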
onProgress? v4.0.156
Listen for progress updates from the transcription process.
The progress is a number between 0 and 1.
```tsx
import type {TranscribeOnProgress} from '@remotion/install-whisper-cpp';

const onProgress: TranscribeOnProgress = (progress) => {
  console.log(`Transcription progress: ${progress * 100}%`);
};
```
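Multiplying a fractional progress value by 100 can print long decimal values. A small formatter, sketched here with a hypothetical name, rounds to whole percentages and clamps to the documented 0 to 1 range.

```typescript
// Hypothetical formatter: clamps to [0, 1] and rounds to a whole percentage.
const formatProgress = (progress: number): string => {
  const clamped = Math.min(1, Math.max(0, progress));
  return `${Math.round(clamped * 100)}%`;
};
```

It could be called inside the onProgress callback, for example console.log(formatProgress(progress)).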
flashAttention? v4.0.324
Whether to enable flash attention.
additionalArgs? v4.0.324
Additional arguments to pass to Whisper.cpp, as an array. The array can contain strings or string pairs, for example:
```js
transcribe({
  // ...other options
  additionalArgs: ['-tdrz', ['--max-len', '1']],
});
```
Return value
TranscriptionJson
An object containing all the metadata and transcriptions resulting from the transcription process.
```ts
type Timestamps = {
  from: string;
  to: string;
};

type Offsets = {
  from: number;
  to: number;
};

type WordLevelToken = {
  t_dtw: number;
  text: string;
  timestamps: Timestamps;
  offsets: Offsets;
  id: number;
  p: number;
};

type TranscriptionItem = {
  timestamps: Timestamps;
  offsets: Offsets;
  text: string;
};

type TranscriptionItemWithTimestamp = TranscriptionItem & {
  tokens: WordLevelToken[];
};

type Model = {
  type: string;
  multilingual: boolean;
  vocab: number;
  audio: {
    ctx: number;
    state: number;
    head: number;
    layer: number;
  };
  text: {
    ctx: number;
    state: number;
    head: number;
    layer: number;
  };
  mels: number;
  ftype: number;
};

type Params = {
  model: string;
  language: string;
  translate: boolean;
};

type Result = {
  language: string;
};

export type TranscriptionJson<WithTokenLevelTimestamp extends boolean> = {
  systeminfo: string;
  model: Model;
  params: Params;
  result: Result;
  transcription: true extends WithTokenLevelTimestamp
    ? TranscriptionItemWithTimestamp[]
    : TranscriptionItem[];
};
```
Prefer relying on the t_dtw value for accurate timestamps over offsets.
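When tokenLevelTimestamps is true, each transcription item carries its own tokens array. The sketch below flattens those into a single list while keeping the raw t_dtw value. The helper name and the trimmed-down local types are hypothetical, and no unit conversion is applied to t_dtw.

```typescript
// Minimal local copies of the fields used from the shapes documented above.
type TokenSummary = {t_dtw: number; text: string};
type ItemWithTokens = {text: string; tokens: TokenSummary[]};

// Hypothetical helper: collects the tokens of all items in order,
// keeping only the text and the raw t_dtw value.
const listTokens = (transcription: ItemWithTokens[]): TokenSummary[] => {
  const result: TokenSummary[] = [];
  for (const item of transcription) {
    for (const token of item.tokens) {
      result.push({t_dtw: token.t_dtw, text: token.text});
    }
  }
  return result;
};
```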
Use convertToCaptions() to apply our opinionated suggestion for postprocessing the captions.
See also