Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

BaruchEric/whisperlog

Repository files navigation

whisperlog

License: MIT Python 3.11+

Local-first transcription pipeline that copies recordings off a Sony ICD-UX570 (or any audio source), transcribes them with Whisper into a searchable SQLite/FTS5 archive, and adds opt-in, per-call Claude/Ollama enrichment plus an MCP server so Claude can search your archive directly.

TL;DR

  • What: Plug in a UX570, run whisperlog watch, and recordings flow into a local archive — copied off the device, transcribed, optionally summarized, and indexed for full-text search. Audio never leaves your machine.
  • How: ingest copies/dedupes files → transcribe runs faster-whisper → text is indexed in SQLite FTS5 → optional enrich/agent passes summarize via local Ollama or Claude → an MCP server exposes the archive to Claude Desktop/Code.
  • Stack: Python 3.11+, faster-whisper, typer CLI, pydantic-settings, SQLite FTS5; optional anthropic + keyring, pyyaml, and mcp extras. Managed with uv.
  • Run: uv pip install -e . then whisperlog watch (local mode, no network calls). Add Claude with uv pip install -e '.[cloud]' and whisperlog config set-key anthropic.

What this is

Plug in a UX570, run whisperlog watch, and your recordings flow into a local archive: copied off the device, transcribed with Whisper, optionally summarized with a local LLM, and indexed for full-text search. Claude features (summaries, action items, multi-step agent workflows, MCP) are opt-in and per-call — never silently triggered.

The pipeline is generic — the UX570 is just the default. Pass --source <dir> to ingest or watch to point it at any USB drive, network share, or local folder. If the Sony REC_FILE/FOLDER01..05 layout is detected the device convention is used; otherwise the directory is walked recursively for any common audio format (.mp3 .wav .m4a .mp4 .flac .ogg .aac .opus .webm).

Privacy model

There are two modes. The default never makes a network call.

Mode Audio Transcript Summarization
Local (default) stays on disk stays on disk local Ollama
Cloud (opt-in) stays on disk sent to Claude API or Claude CLI Anthropic
  • Audio never leaves your machine, in either mode. Even agents see only the transcript text.
  • API keys are stored in the OS keychain (Keychain on macOS, Secret Service on Linux) via keyring, never in .env, never logged.
  • Audit log at ~/.whisperlog/audit.log records every cloud call: timestamp, backend, model, transcript SHA-256, first 80 characters, token counts, cost. The transcript content itself is not written.
  • Fully offline mode is real. With DEFAULT_ENRICH_BACKEND=ollama and --backend ollama, the anthropic SDK is lazily imported and never loaded.

Tech stack

  • Language: Python 3.11+ (requires-python = ">=3.11"), managed with uv.
  • Core deps: faster-whisper (transcription), typer + rich (CLI), pydantic / pydantic-settings (config), httpx, platformdirs.
  • Storage: SQLite with FTS5 full-text search at ~/.whisperlog/index.db; archive files under ~/Documents/whisperlog-archive/.
  • Optional extras: cloud (anthropic, keyring), agent (adds pyyaml), mcp (mcp), all, and dev (pytest, ruff, mypy).
  • Entry points: whisperlog (CLI) and whisperlog-mcp (stdio MCP server).

Prerequisites

  • Python 3.11+ and uv (or pip)
  • ffmpeg — for audio decoding (brew install ffmpeg / apt install ffmpeg)

Optional, depending on which features you want:

  • Ollama — local LLM for summarization. brew install ollama, then ollama serve and ollama pull qwen2.5:7b.
  • claude CLI — Claude Code CLI, for the claude-cli enrichment backend and in-repo agent proposals.
  • Anthropic API key — for the claude-api backend and most agent workflows.

Quick start (local mode)

git clone https://github.com/BaruchEric/whisperlog.git
cd whisperlog
uv pip install -e .
# Copy the env template and tweak as needed.
cp .env.example .env
# Plug in the UX570, then:
whisperlog ingest # copy new recordings to the archive
whisperlog transcribe 'archive/**/audio.mp3'
# Or, in one shot:
whisperlog watch # daemon: ingest+transcribe whenever the device is plugged in
whisperlog watch --enrich # also runs the default Ollama summarizer
# Any other audio source — a folder, USB drive, network share, etc.:
whisperlog ingest --source ~/Dropbox/voice-memos
whisperlog watch --source ~/Dropbox/voice-memos --once

Search later:

whisperlog search "sarah AND project"

The archive lives at ~/Documents/whisperlog-archive/<YYYY>/<YYYY-MM-DD>/<HH-MM>_<sha8>/ and contains:

  • audio.mp3 — the original file (read-only)
  • transcript.txt — plain text
  • transcript.srt — subtitles
  • transcript.md — metadata + transcript + appended enrichments

Commands

whisperlog --help for the full list; key subcommands:

Command What it does
ingest [--source DIR] [--once] Copy + dedupe recordings off the device (or any source) into the archive.
transcribe '<glob>' Run faster-whisper over matching audio, writing transcript.{txt,srt,md}.
watch [--source DIR] [--enrich] [--once] Daemon that ingests + transcribes (and optionally enriches) on plug-in.
search '<FTS5 query>' Full-text search across all transcripts.
enrich <path> [--backend ...] [--task ...] [--model ...] [--yes] Run one enrichment pass and append to the transcript .md.
redact <path> [--ollama] [--out ...] Strip PII via regex (+ optional local Ollama pass) before sending anything out.
stats [--days N] Show cloud spend over a window.
config set-key anthropic / config show Store the Anthropic key in the OS keychain / print effective config.
agent meeting-debrief <path> Notes + action items + follow-up email + .ics, from one transcript.
agent code-review <path> [--repo ...] Extract decisions/questions, optionally propose code changes via Claude CLI.
agent custom <path> --workflow file.yaml Run a YAML-defined sequence of prompts.
version Print the version.

Adding Claude

Claude features are an extra, paid layer. Install the extras and store your key:

uv pip install -e '.[cloud]'
whisperlog config set-key anthropic # prompts; key goes to OS keychain

Then pick a backend per call:

whisperlog enrich archive/2026/2026-04-27/14-30_abcd1234/ \
 --backend claude-api \
 --task meeting_notes

Cost expectations

Defaults use claude-sonnet-4-6 (cheaper/faster). Agent workflows default to claude-opus-4-7 (higher quality, ×ばつ the cost). The daily cap (MAX_DAILY_CLAUDE_USD, default 5ドル.00) is enforced before any call is sent — hitting it raises an error, with no silent failover. whisperlog stats --days 30 shows your spend.

Redact before sending

Regex catches email/phone/SSN/credit-card patterns; a local Ollama pass also strips names and addresses (better recall, still imperfect):

whisperlog redact archive/.../transcript.txt --ollama --out cleaned.txt
whisperlog enrich cleaned.txt --backend claude-api --task summarize

Regex alone is not real DLP. Run with --ollama for anything sensitive.

Configuration

All settings load from .env (see .env.example) via pydantic-settings; CLI flags override per call. Notable keys:

Key Default Purpose
WHISPERLOG_ARCHIVE_DIR ~/Documents/whisperlog-archive Where the archive lives.
WHISPERLOG_STATE_DIR ~/.whisperlog DB, audit log, enrich log.
WHISPER_MODEL small.en Whisper model size (see guide below).
WHISPER_DEVICE / WHISPER_COMPUTE_TYPE auto / int8 float16 only on Nvidia GPUs.
DEFAULT_ENRICH_BACKEND ollama ollama | claude-api | claude-cli.
OLLAMA_HOST / OLLAMA_MODEL http://localhost:11434 / qwen2.5:7b Local LLM.
CLAUDE_MODEL / CLAUDE_AGENT_MODEL claude-sonnet-4-6 / claude-opus-4-7 Enrich vs. agent default models.
MAX_DAILY_CLAUDE_USD / COST_CONFIRM_USD 5.00 / 0.10 Daily cap and per-call confirmation threshold.

The Anthropic key is never read from .env — it lives in the OS keychain.

Whisper model size guide

Model Disk RAM Speed (M-series) Speed (CPU) Quality
tiny.en 75 MB ~1 GB ×ばつ real-time ×ばつ rough
base.en 142 MB ~1 GB ×ばつ ×ばつ usable
small.en 466 MB ~2 GB ×ばつ ×ばつ good default
medium.en 1.5 GB ~5 GB ×ばつ &l×ばつ great on M-series
large-v3 2.9 GB ~10 GB ×ばつ painful best

Custom prompts & agent workflows

Prompts. Drop a file in prompts/<task>.md using {{transcript}} as the placeholder. The CLI auto-discovers it: whisperlog enrich /path/to/transcript.txt --task my-custom-task. Bundled prompts include summarize, meeting_notes, action_items, followup_email, calendar_ics, coding_session, and identify_speakers.

Agent workflows. A workflow is a YAML file with a steps list; each step has a name, an output filename, and either a template (referencing prompts/<template>.md) or an inline prompt:

steps:
 - name: tldr
 template: summarize
 output: tldr.md
 - name: facts-only
 output: facts.md
 prompt: |
 Extract only verifiable factual claims from the transcript below.
 Output a numbered list. Skip opinions and intentions.

 ---

 {{transcript}}

Run it: whisperlog agent custom archive/.../transcript.txt --workflow my_workflow.yaml.

MCP server

Expose your transcript archive to Claude Desktop or Claude Code:

uv pip install -e '.[mcp]'

Add this to your MCP config:

{
 "mcpServers": {
 "whisperlog": { "command": "whisperlog-mcp" }
 }
}

Tools the server provides:

  • search_transcripts(query, limit) — FTS5 search
  • get_transcript(recording_id) — full text
  • list_recent(limit) — recent recordings
  • list_enrichments(recording_id) — past summaries/agent outputs

Now you can ask Claude "What did I talk about with Sarah last week?" and it queries this archive directly.

Architecture & data model

src/whisperlog/ holds the package:

  • ingest.py — copy/dedupe recordings into the archive (safe_copy, source detection).
  • transcribe.py — faster-whisper runner producing .txt/.srt/.md.
  • archive.py / db.py — SQLite FTS5 index, recording records, enrichment records, search.
  • enrich/ — pluggable backends (base.py, ollama.py, claude_api.py, claude_cli.py) + prompt loader and audit logging.
  • agent.py — multi-step workflows (meeting_debrief, code_review, custom_workflow).
  • mcp_server.py — stdio MCP server (whisperlog-mcp).
  • config.py, secrets.py, ledger.py, redact.py, watch.py, utils.py, cli.py — settings, keychain, spend cap, PII redaction, the watch daemon, helpers, and the Typer entry point.

State lives in three places: the archive (~/Documents/whisperlog-archive/), ~/.whisperlog/ (DB, audit log, enrich log), and OS keychain entries under whisperlog.

Troubleshooting

  • Device not detected. On macOS, look for /Volumes/IC RECORDER. diskutil list shows whether the OS sees the recorder.
  • Ollama call fails. Confirm ollama serve is running and ollama list shows the OLLAMA_MODEL. There's no silent fallback — intentional.
  • "No Anthropic API key in OS keychain." Run whisperlog config set-key anthropic; re-run to rotate.
  • Cost-cap error mid-day. Raise MAX_DAILY_CLAUDE_USD or wait — the cap is per local calendar day.

Development

uv pip install -e '.[dev,all]'
ruff check .
pytest

Tests live in tests/ (redaction, ingest dedup, transcribe outputs, enrich backends, spend cap, archive search, SRT utils) and use a tmp-dir-isolated SQLite DB — they never touch your real archive or keychain.

Status

  • The core pipeline (ingest → transcribe → search), local Ollama enrichment, Claude API/CLI enrichment, agent workflows, redaction, spend cap, and the MCP server are all implemented and covered by the test suite.
  • src/ux570_transcribe/ is a stale artifact from the project rename (commit d5071ea, ux570-transcribe → whisperlog). It contains only __pycache__, .DS_Store, and an empty enrich/ subfolder — no live code. The wheel build target is src/whisperlog, so it does not ship. Safe to delete.

License

MIT © Eric Baruch

About

Local-first audio transcription pipeline (Whisper + SQLite/FTS5, optional Claude/Ollama enrichment, MCP). Auto-detects Sony ICD-UX570; works with any audio source.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

Contributors

AltStyle によって変換されたページ (->オリジナル) /