
@remotion/install-whisper-cpp

Available from v4.0.115

With Whisper.cpp, you can transcribe audio locally on your machine.
This package provides easy-to-use, cross-platform functions for installing Whisper.cpp and downloading a model.

npm i --save-exact @remotion/install-whisper-cpp@4.0.398
This assumes you are currently using v4.0.398 of Remotion.
Also update remotion and all `@remotion/*` packages to the same version.
Remove any `^` characters in front of the version numbers, as they can lead to version conflicts.
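The pinning advice above can be checked mechanically. The following is a minimal sketch, not part of Remotion: `findUnpinnedRemotionDeps` and its helpers are hypothetical names, and the dependency map is made-up sample data standing in for the `dependencies` field of a `package.json`. It reports every `remotion` or `@remotion/*` entry whose version is a range rather than an exact version.

```typescript
// Hypothetical helper (not a Remotion API): find Remotion-related
// dependencies that are not pinned to an exact version.
type Dependencies = Record<string, string>;

// Matches the core package and every scoped @remotion/* package.
const isRemotionPackage = (name: string): boolean =>
  name === 'remotion' || name.startsWith('@remotion/');

// An exact version looks like "4.0.398" (optionally with a prerelease tag).
// Anything carrying ^, ~, >, <, * or similar is treated as a range.
const isExactVersion = (version: string): boolean =>
  /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/.test(version);

const findUnpinnedRemotionDeps = (deps: Dependencies): string[] =>
  Object.entries(deps)
    .filter(
      ([name, version]) => isRemotionPackage(name) && !isExactVersion(version),
    )
    .map(([name, version]) => `${name}@${version}`);

// Sample data standing in for the "dependencies" field of a package.json.
const sampleDeps: Dependencies = {
  remotion: '4.0.398',
  '@remotion/cli': '^4.0.398',
  '@remotion/install-whisper-cpp': '4.0.398',
  react: '^18.2.0',
};

// Only the Remotion package with a ^ range is reported; react is ignored.
console.log(findUnpinnedRemotionDeps(sampleDeps));
```

In a real project the map would come from reading and parsing `package.json`. The check only covers the range prefix; it does not verify that all Remotion packages share the same version.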

Example usage

Install Whisper 1.5.5 (the latest version at the time of writing that we find works well and supports token-level timestamps) and the medium.en model to the whisper.cpp folder.

install-whisper.cpp
tsx
import path from 'path';
import {
  downloadWhisperModel,
  installWhisperCpp,
  transcribe,
  convertToCaptions,
} from '@remotion/install-whisper-cpp';

const to = path.join(process.cwd(), 'whisper.cpp');

await installWhisperCpp({
  to,
  version: '1.5.5',
});

await downloadWhisperModel({
  model: 'medium.en',
  folder: to,
});

// Convert the audio to a 16 kHz WAV file first if needed:
// import {execSync} from 'child_process';
// execSync('ffmpeg -i /path/to/audio.mp4 -ar 16000 /path/to/audio.wav -y');

const {transcription} = await transcribe({
  model: 'medium.en',
  whisperPath: to,
  whisperCppVersion: '1.5.5',
  inputPath: '/path/to/audio.wav',
  tokenLevelTimestamps: true,
});

for (const token of transcription) {
  console.log(token.timestamps.from, token.timestamps.to, token.text);
}

// Optional: Apply our recommended postprocessing
const {captions} = convertToCaptions({
  transcription,
  combineTokensWithinMilliseconds: 200,
});

for (const line of captions) {
  console.log(line.text, line.startInSeconds);
}
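
The commented-out ffmpeg call in the example builds a shell command as a single string, which can break on paths that contain spaces. As a minimal sketch, the same conversion can pass its arguments as an array instead. `buildFfmpegArgs` and `convertTo16kHzWav` are hypothetical helper names, not part of this package, and the sketch assumes an `ffmpeg` binary is available on the PATH.

```typescript
import {execFileSync} from 'node:child_process';

// Hypothetical helper: the same flags as the ffmpeg command in the example
// (resample to 16000 Hz, overwrite the output file), as an argument array.
const buildFfmpegArgs = (inputPath: string, outputPath: string): string[] => [
  '-i',
  inputPath,
  '-ar',
  '16000',
  outputPath,
  '-y',
];

// Hypothetical wrapper: runs ffmpeg without going through a shell, so
// paths with spaces or quotes are passed through unchanged.
// Assumes `ffmpeg` is installed and on the PATH.
const convertTo16kHzWav = (inputPath: string, outputPath: string): void => {
  execFileSync('ffmpeg', buildFfmpegArgs(inputPath, outputPath), {
    stdio: 'inherit',
  });
};

// Usage (not executed here):
// convertTo16kHzWav('/path/to/audio.mp4', '/path/to/audio.wav');
```

The resulting WAV file can then be passed as `inputPath` to `transcribe()`.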

Functions

  • installWhisperCpp()
  • downloadWhisperModel()
  • transcribe()
  • convertToCaptions()

License

MIT
