@remotion/install-whisper-cpp

Available from v4.0.115

With Whisper.cpp, you can transcribe audio locally on your machine.
This package provides easy-to-use, cross-platform functions to install Whisper.cpp and a model.


npm i --save-exact @remotion/install-whisper-cpp@4.0.398

pnpm i @remotion/install-whisper-cpp@4.0.398

bun i @remotion/install-whisper-cpp@4.0.398

yarn --exact add @remotion/install-whisper-cpp@4.0.398

This assumes you are currently using v4.0.398 of Remotion.
Also update `remotion` and all `@remotion/*` packages to the same version.
Remove any `^` characters in front of their version numbers, as they can lead to a version conflict.
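
As a rough illustration of the version rule above, the following sketch scans a parsed package.json object and lists Remotion packages whose version starts with `^` or `~`. The helper name `findUnpinnedRemotionPackages` and the `PackageJsonLike` type are hypothetical and not part of any Remotion package.

```typescript
// Hypothetical helper (not part of Remotion): find Remotion packages
// whose version is a range (^ or ~) instead of an exact version.
type DependencyMap = Record<string, string>;

interface PackageJsonLike {
  dependencies?: DependencyMap;
  devDependencies?: DependencyMap;
}

// Matches the core `remotion` package and every scoped `@remotion/*` package.
const isRemotionPackage = (name: string): boolean =>
  name === 'remotion' || name.startsWith('@remotion/');

const findUnpinnedRemotionPackages = (pkg: PackageJsonLike): string[] => {
  // Merge both dependency maps; dependencies win on duplicate names.
  const all: DependencyMap = {...pkg.devDependencies, ...pkg.dependencies};
  return Object.entries(all)
    .filter(
      ([name, version]) => isRemotionPackage(name) && /^[\^~]/.test(version),
    )
    .map(([name, version]) => `${name}@${version}`);
};
```

Passing the result of `JSON.parse` on your package.json file to this function returns an empty array when every Remotion package is pinned to an exact version.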

Example usage

Install Whisper 1.5.5 (the latest version at the time of writing that we find works well and supports token-level timestamps) and the medium.en model to the whisper.cpp folder.

install-whisper.cpp
tsx
import path from 'path';
import {downloadWhisperModel, installWhisperCpp, transcribe, convertToCaptions} from '@remotion/install-whisper-cpp';

const to = path.join(process.cwd(), 'whisper.cpp');

await installWhisperCpp({
 to,
 version: '1.5.5',
});
await downloadWhisperModel({
 model: 'medium.en',
 folder: to,
});
// Convert the audio to a 16 kHz WAV file first if needed:
// import {execSync} from 'child_process';
// execSync('ffmpeg -i /path/to/audio.mp4 -ar 16000 /path/to/audio.wav -y');

const {transcription} = await transcribe({
 model: 'medium.en',
 whisperPath: to,
 whisperCppVersion: '1.5.5',
 inputPath: '/path/to/audio.wav',
 tokenLevelTimestamps: true,
});
for (const token of transcription) {
 console.log(token.timestamps.from, token.timestamps.to, token.text);
}
// Optional: Apply our recommended postprocessing
const {captions} = convertToCaptions({
 transcription,
 combineTokensWithinMilliseconds: 200,
});
for (const line of captions) {
 console.log(line.text, line.startInSeconds);
}
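
The commented-out conversion step in the example could also be wrapped in a small helper. This is a minimal sketch, assuming `ffmpeg` is installed and available on the PATH; `buildFfmpegArgs` and `convertTo16kHzWav` are hypothetical names and not part of `@remotion/install-whisper-cpp`. It uses `execFileSync` with an argument array instead of a shell string, so paths containing spaces need no quoting.

```typescript
import {execFileSync} from 'node:child_process';

// Hypothetical helper: build the same ffmpeg arguments as the commented-out
// command in the example (-ar 16000 resamples to 16 kHz, -y overwrites).
const buildFfmpegArgs = (inputPath: string, outputPath: string): string[] => [
  '-i',
  inputPath,
  '-ar',
  '16000',
  outputPath,
  '-y',
];

// Hypothetical helper: run ffmpeg synchronously and forward its output.
// Assumes an `ffmpeg` binary can be found on the PATH.
const convertTo16kHzWav = (inputPath: string, outputPath: string): void => {
  execFileSync('ffmpeg', buildFfmpegArgs(inputPath, outputPath), {
    stdio: 'inherit',
  });
};
```

Calling `convertTo16kHzWav('/path/to/audio.mp4', '/path/to/audio.wav')` before `transcribe()` would produce the WAV file that the example passes as `inputPath`.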