A Go CLI that transcribes audio and video files via the OpenAI speech-to-text API and writes results as txt, md, json, srt, or vtt.
Install the latest release binary. The example below fetches the macOS Apple Silicon (darwin_arm64) archive, verifies it against checksums.txt, and installs it to ~/.local/bin:

```sh
base="https://github.com/mssoftjp/ai-transcriber-cli/releases/latest/download"
curl -L -O "$base/checksums.txt"
archive="$(awk '/darwin_arm64\.tar\.gz$/ {print $2; exit}' checksums.txt)"
curl -L -O "$base/$archive"
shasum -a 256 -c checksums.txt --ignore-missing
tar -xzf "$archive"
mkdir -p "$HOME/.local/bin"
install -m 0755 "${archive%.tar.gz}" "$HOME/.local/bin/transcriber"
export PATH="$HOME/.local/bin:$PATH"
transcriber version
```
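The shasum step checks the archive against checksums.txt. Where shasum is unavailable, the same check can be sketched in Go with only the standard library. This is an illustrative program, not part of the project; it assumes checksums.txt uses the common `<sha256 hex>  <filename>` layout, which is what the awk command above relies on when it prints the second field.

```go
package main

import (
	"bufio"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// parseChecksums turns "<sha256 hex>  <filename>" lines into a filename -> digest map.
// Assumption: this is the layout of checksums.txt.
func parseChecksums(listing string) map[string]string {
	sums := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(listing))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue // skip blank or malformed lines
		}
		// In binary mode shasum marks the file name with a leading '*'.
		sums[strings.TrimPrefix(fields[1], "*")] = strings.ToLower(fields[0])
	}
	return sums
}

// verify reports whether content hashes to the digest listed for name.
// A file missing from the listing counts as a failure.
func verify(name string, content []byte, sums map[string]string) bool {
	want, ok := sums[name]
	if !ok {
		return false
	}
	got := sha256.Sum256(content)
	return hex.EncodeToString(got[:]) == want
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: verifysum checksums.txt <archive>")
		return
	}
	listing, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	archive, err := os.ReadFile(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	name := filepath.Base(os.Args[2])
	if !verify(name, archive, parseChecksums(string(listing))) {
		fmt.Fprintln(os.Stderr, name+": checksum mismatch or not listed")
		os.Exit(1)
	}
	fmt.Println(name + ": OK")
}
```

Hypothetical usage, with the program saved as verifysum.go: `go run verifysum.go checksums.txt "$archive"`.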
If you prefer a manual install, download the latest archive from GitHub Releases, extract it, and place transcriber on your PATH.
ffmpeg and ffprobe are needed for video input, long-file chunking, trimming, and format normalization. Small audio files in a provider-compatible format can be transcribed without them.
When ffmpeg is used, the CLI now produces provider-ready intermediate audio as compact .m4a files rather than large uncompressed WAV files. If you keep the workdir, the files left behind are the same files that were uploaded.
```sh
brew install ffmpeg        # macOS (https://formulae.brew.sh/formula/ffmpeg)
sudo apt install ffmpeg    # Debian / Ubuntu
```
Windows: download from ffmpeg.org/download.html and add the bin directory to your PATH.
```sh
export OPENAI_API_KEY="sk-..."
```
You can also store the key in the OS keychain:
```sh
printf '%s' "$OPENAI_API_KEY" | transcriber config key set --method keychain --stdin
transcriber config key status
```
The CLI resolves the API key in this order: the environment variable named by [api].key_env in the config, then OPENAI_API_KEY, then the OS keychain, and finally an optional key.txt file in the same directory as the config file. It never writes keys to config files, log files, or transcript output. Audio data is sent to the OpenAI API for transcription and is subject to OpenAI's data usage policies; no audio or transcript data is sent anywhere else.
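To make the lookup order concrete, here is a minimal Go sketch of the resolution chain described above. It is not the CLI's implementation: the keychain is modeled as an injected function (a real one would need a platform keychain library), and resolveAPIKey and its source labels are hypothetical names. It reports where the key came from without printing the key itself.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// keychainFunc stands in for an OS keychain lookup (hypothetical).
type keychainFunc func() (string, error)

// resolveAPIKey follows the documented order: the variable named by
// [api].key_env, then OPENAI_API_KEY, then the OS keychain, then
// <configDir>/key.txt. It returns the key and a label for its source.
func resolveAPIKey(keyEnv string, getenv func(string) string, keychain keychainFunc, configDir string) (string, string, error) {
	if keyEnv != "" {
		if v := strings.TrimSpace(getenv(keyEnv)); v != "" {
			return v, "env " + keyEnv, nil
		}
	}
	if v := strings.TrimSpace(getenv("OPENAI_API_KEY")); v != "" {
		return v, "env OPENAI_API_KEY", nil
	}
	if keychain != nil {
		if v, err := keychain(); err == nil {
			if v = strings.TrimSpace(v); v != "" {
				return v, "keychain", nil
			}
		}
	}
	if configDir != "" {
		if b, err := os.ReadFile(filepath.Join(configDir, "key.txt")); err == nil {
			if v := strings.TrimSpace(string(b)); v != "" {
				return v, "key file", nil
			}
		}
	}
	return "", "", errors.New("no API key found")
}

func main() {
	home, _ := os.UserHomeDir()
	// Assumed macOS/Linux config directory; no keychain lookup in this sketch.
	configDir := filepath.Join(home, ".config", "transcriber")
	key, source, err := resolveAPIKey("", os.Getenv, nil, configDir)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Report only the length and source, never the key itself.
	fmt.Printf("found a %d-character key via %s\n", len(key), source)
}
```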
```sh
transcriber doctor
```
This checks API key visibility, ffmpeg / ffprobe availability, temp directory access, provider connectivity, and config validity.
To build from source instead (requires Go and make):

```sh
git clone https://github.com/mssoftjp/ai-transcriber-cli.git
cd ai-transcriber-cli
make install
transcriber version
```

Basic usage:

```sh
# Simplest form: writes Markdown output next to the input file
transcriber transcribe input.m4a

# Print plain text to stdout
transcriber transcribe input.m4a --format txt --stdout --events none

# Write JSON to a specific directory
transcriber transcribe input.m4a --format json --out-dir ./out
```
By default, the output file is written next to the input as `<name>.transcript.md`. A manifest sidecar (`<name>.transcript.manifest.json`) is also created. Use --out or --out-dir to change the destination, and --overwrite to allow replacing existing output.
Supported input formats include .mp3, .m4a, .wav, .flac, .ogg, .mp4, .mov, .mkv, and others. Run transcriber transcribe --help for the full list. Long files are automatically split into chunks and reassembled.
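Chunking can be pictured as cutting the timeline into consecutive spans and joining the per-chunk transcripts in order. The Go sketch below uses fixed-length cuts with no overlap and joins with newlines; both choices, and the planChunks and reassemble names, are assumptions, since the CLI's actual planner may pick boundaries differently.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// chunk is a time span in seconds.
type chunk struct{ Start, End float64 }

// planChunks cuts [0, total) into consecutive spans of at most maxLen seconds.
// Start times are computed as i*maxLen to avoid accumulating rounding error.
func planChunks(total, maxLen float64) []chunk {
	if total <= 0 || maxLen <= 0 {
		return nil
	}
	var out []chunk
	for i := 0; float64(i)*maxLen < total; i++ {
		start := float64(i) * maxLen
		out = append(out, chunk{Start: start, End: math.Min(start+maxLen, total)})
	}
	return out
}

// reassemble joins per-chunk transcripts in chunk order.
func reassemble(texts []string) string {
	return strings.Join(texts, "\n")
}

func main() {
	for i, c := range planChunks(250, 100) {
		fmt.Printf("chunk %d: %.0fs-%.0fs\n", i, c.Start, c.End)
	}
	fmt.Println(reassemble([]string{"first part", "second part"}))
}
```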
If a long client-chunked job fails partway through, re-run the same command with --resume to reuse completed chunks from the manifest sidecar and chunk cache next to the output artifacts.
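The resume idea can be sketched as follows: completed chunk transcripts are recorded, and a re-run only sends the chunks that are still missing. The manifest type here is a hypothetical stand-in; the real sidecar and chunk-cache formats are not described in this section.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// manifest is a hypothetical stand-in for the manifest sidecar.
type manifest struct {
	Chunks map[int]string `json:"chunks"` // chunk index -> transcript text
}

// runWithResume transcribes chunks 0..n-1 in order, skipping any index the
// manifest already holds. On failure the manifest keeps the completed
// chunks, so a later run can pick up where this one stopped.
func runWithResume(n int, m *manifest, transcribe func(i int) (string, error)) ([]string, error) {
	if m.Chunks == nil {
		m.Chunks = make(map[int]string)
	}
	out := make([]string, n)
	for i := 0; i < n; i++ {
		if text, ok := m.Chunks[i]; ok {
			out[i] = text // reuse a completed chunk
			continue
		}
		text, err := transcribe(i)
		if err != nil {
			return nil, fmt.Errorf("chunk %d: %w", i, err)
		}
		m.Chunks[i] = text
		out[i] = text
	}
	return out, nil
}

func main() {
	m := &manifest{}
	calls := 0
	// First run: chunk 2 fails.
	_, err := runWithResume(3, m, func(i int) (string, error) {
		calls++
		if i == 2 {
			return "", errors.New("simulated network error")
		}
		return fmt.Sprintf("text %d", i), nil
	})
	fmt.Println("first run:", err)

	// Persist and reload the manifest, as a sidecar file would be.
	data, _ := json.Marshal(m)
	var saved manifest
	if err := json.Unmarshal(data, &saved); err != nil {
		fmt.Println(err)
		return
	}

	// Resumed run: only the missing chunk is sent again.
	out, err := runWithResume(3, &saved, func(i int) (string, error) {
		calls++
		return fmt.Sprintf("text %d", i), nil
	})
	fmt.Println(out, err, "transcribe calls:", calls)
}
```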
For long client-chunked jobs with gpt-4o-transcribe or gpt-4o-mini-transcribe, you can add --parallel to send chunks concurrently. This speeds up long runs, but it disables prompt carryover for those chunks.
| Model | Strengths | Good for |
|---|---|---|
| gpt-4o-transcribe | High accuracy, preserves code-switched audio | General transcription (default) |
| gpt-4o-mini-transcribe | Lighter, lower cost | Cost-sensitive workloads |
| whisper-1 | Timestamp-capable output | Subtitle generation (srt/vtt) |
| gpt-4o-transcribe-diarize | Speaker-labeled output | Meetings, multi-speaker recordings |
```sh
# Generate subtitles
transcriber transcribe input.m4a --model whisper-1 --format srt

# Speaker diarization
transcriber transcribe call.m4a --diarize --format json
```
The --language option defaults to auto (automatic language detection).
```sh
# Auto-detect: best for mixed-language audio
transcriber transcribe input.m4a --language auto

# Force a single language: suppresses other-language content
transcriber transcribe input.m4a --language ja
```
For audio that mixes multiple languages, auto tends to preserve the original speech more faithfully. Forcing a single language improves readability but may drop content in other languages.
To transcribe only part of a file, pass --start and --end as seconds or as HH:MM:SS:

```sh
transcriber transcribe input.m4a --start 30 --end 90
transcriber transcribe input.m4a --start 00:01:30 --end 00:03:00
```
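A sketch of how the two notations map to seconds. Plain seconds and HH:MM:SS come from the examples above; accepting MM:SS and fractional seconds, and rejecting an end that is not after the start, are assumptions of this sketch rather than documented CLI behavior. parseTimestamp and trimRange are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseTimestamp converts "30", "1:30", or "00:01:30" into seconds.
// Only the last (seconds) field may be fractional.
func parseTimestamp(s string) (float64, error) {
	parts := strings.Split(strings.TrimSpace(s), ":")
	if len(parts) > 3 {
		return 0, fmt.Errorf("invalid timestamp %q", s)
	}
	total := 0.0
	for i, p := range parts {
		v, err := strconv.ParseFloat(p, 64)
		if err != nil || v < 0 || math.IsNaN(v) || math.IsInf(v, 0) {
			return 0, fmt.Errorf("invalid timestamp %q", s)
		}
		if i < len(parts)-1 && v != math.Trunc(v) {
			return 0, fmt.Errorf("only the seconds field may be fractional in %q", s)
		}
		total = total*60 + v // shift hours/minutes up one base-60 place
	}
	return total, nil
}

// trimRange parses and validates a --start/--end pair.
func trimRange(start, end string) (float64, float64, error) {
	s, err := parseTimestamp(start)
	if err != nil {
		return 0, 0, err
	}
	e, err := parseTimestamp(end)
	if err != nil {
		return 0, 0, err
	}
	if e <= s {
		return 0, 0, fmt.Errorf("end (%gs) must be after start (%gs)", e, s)
	}
	return s, e, nil
}

func main() {
	s, e, err := trimRange("00:01:30", "00:03:00")
	fmt.Println(s, e, err)
}
```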
A YAML dictionary file can automatically fix common recognition errors.
```sh
transcriber transcribe input.m4a --dictionary ./dict.yaml --dictionary-enabled
```
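The dictionary's YAML schema is not shown in this section, so the sketch below assumes rules have already been decoded into hypothetical from/to pairs (decoding YAML in Go needs a third-party package such as gopkg.in/yaml.v3, omitted here to stay on the standard library). It applies plain, case-sensitive substring replacement in a single pass; the CLI's actual matching rules may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a hypothetical dictionary rule: replace From with To.
type entry struct {
	From string
	To   string
}

// applyDictionary rewrites every occurrence of each From string in one pass.
// strings.Replacer does not rescan replaced text, so rules cannot cascade.
func applyDictionary(text string, rules []entry) string {
	pairs := make([]string, 0, 2*len(rules))
	for _, r := range rules {
		if r.From == "" {
			continue // an empty pattern would match at every position
		}
		pairs = append(pairs, r.From, r.To)
	}
	if len(pairs) == 0 {
		return text
	}
	return strings.NewReplacer(pairs...).Replace(text)
}

func main() {
	rules := []entry{
		{From: "open ai", To: "OpenAI"},
		{From: "whisper one", To: "whisper-1"},
	}
	fmt.Println(applyDictionary("open ai released whisper one", rules))
}
```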
The --postprocess option runs a transcript-level correction pass after transcription. It does not summarize or translate.

```sh
transcriber transcribe input.m4a --postprocess
```
For gpt-4o-transcribe and gpt-4o-mini-transcribe, --parallel sends client chunks concurrently.
```sh
transcriber transcribe meeting.m4a --model gpt-4o-mini-transcribe --chunking-mode client --parallel
```
Notes:
- --parallel is useful only when the execution plan uses client-side chunking
- when the input fits in a single request or uses server-side chunking, --parallel has no effect
- parallel chunk sending disables prompt carryover, so the default sequential mode remains the safer, quality-first option
- if you resume a partial client-chunked run, use the same --parallel setting as the original run
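The quality tradeoff in these notes comes from ordering: with prompt carryover, a chunk's request includes text from the previous chunk, which only exists once that chunk has finished. The sketch below contrasts the two modes. transcribeFunc, the 200-rune carryover limit, and the worker count are assumptions for illustration, not the CLI's internals; errors.Join requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// transcribeFunc sends one chunk with an optional prompt and returns its text.
// Hypothetical signature; a real client would also take a context and audio.
type transcribeFunc func(chunk int, prompt string) (string, error)

// carryoverRunes is an assumed limit on how much previous text is reused.
const carryoverRunes = 200

// lastRunes returns at most n trailing runes of s.
func lastRunes(s string, n int) string {
	r := []rune(s)
	if len(r) <= n {
		return s
	}
	return string(r[len(r)-n:])
}

// runSequential sends chunks one at a time and passes the tail of each
// transcript as the prompt for the next chunk (prompt carryover).
func runSequential(n int, tf transcribeFunc) ([]string, error) {
	out := make([]string, n)
	prompt := ""
	for i := 0; i < n; i++ {
		text, err := tf(i, prompt)
		if err != nil {
			return nil, fmt.Errorf("chunk %d: %w", i, err)
		}
		out[i] = text
		prompt = lastRunes(text, carryoverRunes)
	}
	return out, nil
}

// runParallel sends up to workers chunks at once. No chunk can wait for the
// previous transcript, so every prompt is empty: carryover is lost.
func runParallel(n, workers int, tf transcribeFunc) ([]string, error) {
	if workers < 1 {
		workers = 1
	}
	out := make([]string, n)
	errs := make([]error, n)
	sem := make(chan struct{}, workers) // bounds concurrency
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		sem <- struct{}{}
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }()
			out[i], errs[i] = tf(i, "") // each goroutine writes only its own slot
		}(i)
	}
	wg.Wait()
	if err := errors.Join(errs...); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	fake := func(i int, prompt string) (string, error) {
		return fmt.Sprintf("chunk %d (prompt %q)", i, prompt), nil
	}
	seq, _ := runSequential(3, fake)
	par, _ := runParallel(3, 2, fake)
	fmt.Println(seq)
	fmt.Println(par)
}
```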
Inspect the execution plan without calling the API.
```sh
# probe: returns input metadata and the planned strategy as JSON
transcriber probe input.m4a

# dry-run: same entry point as transcribe, but stops after planning
transcriber transcribe input.m4a --dry-run --events none
```
```sh
transcriber transcribe input.m4a --events jsonl > events.jsonl
```

With --events jsonl, the CLI emits machine-readable JSONL progress events to stdout. This mode is designed for GUI wrappers and automation pipelines.
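A consumer for this stream can stay schema-agnostic by decoding each line into a generic map. The sketch below reads a file such as events.jsonl from the command above; the "type" field it prints is a hypothetical name, so check docs/contracts.md for the actual event fields. decodeEvent and readEvents are illustrative names.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// decodeEvent parses one JSONL line into a generic map, since this sketch
// does not assume the event schema.
func decodeEvent(line []byte) (map[string]any, error) {
	var ev map[string]any
	if err := json.Unmarshal(line, &ev); err != nil {
		return nil, err
	}
	return ev, nil
}

// readEvents calls handle for every non-blank line read from r.
func readEvents(r io.Reader, handle func(map[string]any)) error {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long event lines
	for sc.Scan() {
		line := sc.Bytes()
		if len(bytes.TrimSpace(line)) == 0 {
			continue
		}
		ev, err := decodeEvent(line)
		if err != nil {
			return fmt.Errorf("bad event line: %w", err)
		}
		handle(ev)
	}
	return sc.Err()
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: readevents events.jsonl")
		return
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	err = readEvents(f, func(ev map[string]any) {
		// "type" is a hypothetical field name used for illustration.
		if t, ok := ev["type"]; ok {
			fmt.Println("event:", t)
		} else {
			fmt.Println("event:", ev)
		}
	})
	f.Close()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```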
```sh
transcriber tui
```
The TUI is a helper for one-off local jobs. The batch CLI remains the primary interface for scripts, redirected output, and full option coverage.
Use arrow keys or j/k to move, Enter to select or edit, s to start from the job screen, and Esc to go back or quit.
A TOML config file lets you persist frequently used options as defaults.
- macOS / Linux: ~/.config/transcriber/config.toml
- Windows: %AppData%/transcriber/config.toml
```sh
# Generate a sample config
transcriber config init

# Validate the current config
transcriber config validate
```
Precedence: CLI flags > environment variables > config file > built-in defaults
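The precedence rule amounts to "the first source that was explicitly set wins." The Go sketch below models it for a single option. The TRANSCRIBER_LANGUAGE variable name is hypothetical, the config layer is left unset rather than parsing TOML, and resolveSetting is an illustrative name, not the CLI's code.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// layer is one configuration source; set distinguishes "not provided"
// from "provided as an empty string".
type layer struct {
	name  string
	value string
	set   bool
}

// resolveSetting returns the first layer that is set, checked in the order
// given (highest precedence first), falling back to the built-in default.
func resolveSetting(def string, layers ...layer) (value, from string) {
	for _, l := range layers {
		if l.set {
			return l.value, l.name
		}
	}
	return def, "default"
}

func main() {
	fs := flag.NewFlagSet("demo", flag.ExitOnError)
	lang := fs.String("language", "", "transcription language")
	_ = fs.Parse(os.Args[1:])

	// FlagSet.Visit only visits flags that were explicitly set.
	flagSet := false
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "language" {
			flagSet = true
		}
	})

	// Hypothetical environment variable name, for illustration only.
	envVal, envSet := os.LookupEnv("TRANSCRIBER_LANGUAGE")
	// A value read from config.toml would go here; unset in this sketch.
	cfgVal, cfgSet := "", false

	value, from := resolveSetting("auto",
		layer{"flag", *lang, flagSet},
		layer{"env", envVal, envSet},
		layer{"config", cfgVal, cfgSet},
	)
	fmt.Printf("language=%s (from %s)\n", value, from)
}
```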
Keep API keys in environment variables or the OS keychain rather than in the config file. The CLI never logs API keys or embeds them in any output. The --postprocess option sends transcript text (not audio) to the OpenAI API for correction; this is the only case where transcript content leaves the local machine after the initial transcription call.
See docs/config.md for the full reference.
| Command | Description |
|---|---|
| `transcriber transcribe <input>` | Run transcription |
| `transcriber probe <input>` | Inspect input and return the execution plan |
| `transcriber doctor` | Check environment (API key, dependencies) |
| `transcriber tui` | Open the interactive terminal UI for a single job |
| `transcriber version` | Print version metadata |
| `transcriber config init` | Print a sample config |
| `transcriber config validate` | Validate config and dictionary |
Run transcriber <command> --help for the full flag reference of each command.
- docs/contracts.md — Public contracts for GUI and automation integrations
- docs/config.md — Full config reference for operators and advanced users
- docs/limitations.md — Known constraints and tradeoffs for maintainers and adopters
```sh
make build   # build
make test    # test
make ci      # lint + test + vet
make hooks   # enable Git hooks (once)
```
Tests that call the real API are not run by go test ./... and must be invoked explicitly:

```sh
OPENAI_API_KEY=... go test ./internal/provider/openai ./internal/postprocess -run Integration -count=1
```

Build a local binary:

```sh
make build
```

Build a versioned release archive plus checksums.txt:

```sh
make package VERSION=v0.4.0
```

Build a cross-target release archive:

```sh
make release-archive VERSION=v0.4.0 GOOS=darwin GOARCH=arm64
```
Packaging notes:
- macOS and Linux archives are produced as .tar.gz
- Windows archives are produced as .zip and contain `transcriber_..._windows_amd64.exe`
This project is licensed under the MIT License. See LICENSE.