Releases: Arthur-Ficial/ohr
v0.1.6 — fix --listen SIGTRAP
v0.1.5 — QA Report & Documentation
Comprehensive QA Report
What was tested
- Formats: m4a, wav, mp3, aiff, flac — all 5 work
- Durations: 5s, 30s, 1min, 3min, 10min — all work, no upper limit found
- Performance: ~50x real-time on Apple M2 (10 min audio → 4.7 seconds)
- Edge cases: empty files, corrupted data, silence, wrong format — all handled cleanly
- Languages: 30 reported, English verified, others need real speech testing
- Server: All endpoints, all response formats, security, CORS — all passing
How it was tested
All tests used synthetic speech generated by the macOS say command. This caveat is documented in the README and docs/testing.md. Real human speech with accents, noise, and natural pauses has not been tested and will likely produce lower accuracy.
Known limitations documented
- ~90-95% accuracy on clear synthetic speech
- Number/ordinal confusion ("five second" → "52nd")
- No speaker diarization
- apfel's context window limits how much of a long transcript can be piped into it
See docs/testing.md for the full report.
v0.1.4 — Demo Scripts
10+ Demo Scripts
Real-world shell scripts showcasing ohr's capabilities.
ohr only
- subtitle — generate SRT/VTT subtitles with --save
- batch-transcribe — transcribe all audio files in a directory
- audio-grep — grep-like search with timestamps and JSON output
- voice-search — search inside audio files by spoken content
- dictate — speak into a text file via live microphone
- live-caption — real-time captions in the terminal
- whisper-compat — drop-in replacement for OpenAI Whisper CLI
ohr + apfel
- minutes — meeting recording to structured meeting minutes
- action-items — extract to-dos and commitments from meetings
- translate-audio — transcribe then translate to any language
- podcast-chapters — generate timestamped chapter markers
- voice-note — record, transcribe, and optionally summarize
Quick examples
demo/subtitle lecture.m4a --save
demo/audio-grep "budget" meetings/*.m4a
demo/minutes standup.m4a -o markdown > standup.md
demo/batch-transcribe ~/recordings/ -o srt
demo/whisper-compat audio.m4a --output_format srt
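The idea behind a script like audio-grep can be sketched in a few lines of Python. This is only an illustration, not the demo script itself: it assumes the JSON output follows the OpenAI verbose_json shape (a "segments" list whose entries carry "start", "end", and "text"), and the names grep_segments and fmt are hypothetical.

```python
import json


def grep_segments(verbose_json: str, term: str):
    """Return (start, end, text) for every segment whose text contains term.

    Assumes an OpenAI-style verbose_json document with a "segments" list;
    the exact shape of ohr's JSON output is an assumption here.
    """
    data = json.loads(verbose_json)
    needle = term.casefold()
    hits = []
    for seg in data.get("segments", []):
        text = seg.get("text", "")
        if needle in text.casefold():  # case-insensitive substring match
            hits.append((seg["start"], seg["end"], text.strip()))
    return hits


def fmt(seconds: float) -> str:
    """Format a position in seconds as MM:SS (fractions are truncated)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


# Hypothetical transcript used only to exercise the functions above.
sample = json.dumps({
    "text": "Welcome everyone. The budget is due Friday.",
    "segments": [
        {"start": 0.0, "end": 4.2, "text": " Welcome everyone."},
        {"start": 4.2, "end": 9.0, "text": " The budget is due Friday."},
    ],
})

for start, end, text in grep_segments(sample, "budget"):
    print(f"[{fmt(start)}-{fmt(end)}] {text}")
```

Matching on casefolded text keeps the search case-insensitive, which suits transcripts where capitalization is decided by the recognizer.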
v0.1.3 — QA + Homebrew + Integration Tests
QA + Homebrew + Integration Tests
QA Fixes
- Fixed stdin piping: binary audio was corrupted by readLine; stdin is now read with raw FileHandle.readDataToEndOfFile, and the format is detected from magic bytes
- All CLI paths verified: file, stdin, srt, vtt, json, timestamps, language, errors
- All server endpoints verified: health, models, transcription (5 formats), auth, CORS, stubs
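Magic byte detection, as mentioned in the stdin fix above, means identifying the container from the first bytes of the stream rather than from a file name. The following is a minimal Python sketch of that technique, not ohr's Swift implementation; the function name detect_audio_format and the returned labels are hypothetical, while the signatures themselves are the standard ones for each container.

```python
import sys


def detect_audio_format(header: bytes):
    """Guess the audio container from its leading bytes, or return None."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"caff":
        return "caf"
    if header[4:8] == b"ftyp":
        # ISO base media file; m4a and mp4 share this signature.
        return "m4a"
    if header[:3] == b"ID3":
        return "mp3"  # MP3 with an ID3v2 tag in front
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"  # bare MPEG audio frame sync (11 set bits)
    return None


def read_stdin_audio():
    """Read all of stdin as raw bytes (never as text lines) and sniff it."""
    data = sys.stdin.buffer.read()
    return data, detect_audio_format(data[:12])
```

Reading through sys.stdin.buffer mirrors the point of the fix: binary audio must be read as raw bytes, because line-oriented text reads alter or truncate the data.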
Integration Tests (42 tests, all green)
- cli_e2e_test.py — help, version, exit codes, file transcription, output formats, stdin
- server_test.py — health, models, transcription (json/verbose_json/text/srt/vtt), error handling, stubs
- security_test.py — token auth, origin validation, CORS preflight
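The checks that a security test suite like this exercises can be illustrated with a small Python sketch. It is not ohr's OriginValidator: the function names, the set of accepted hosts, and the choice to accept requests without an Origin header (as sent by curl) are all assumptions made for the example.

```python
import hmac
from urllib.parse import urlsplit

# Hosts treated as "local" in this sketch (an assumption, not ohr's list).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_local_origin(origin):
    """Accept a missing Origin header or one that points at this machine."""
    if origin is None:
        return True  # assumption: non-browser clients send no Origin
    parts = urlsplit(origin)
    return parts.scheme in ("http", "https") and parts.hostname in LOCAL_HOSTS


def has_valid_token(authorization, expected):
    """Check an 'Authorization: Bearer <token>' header value."""
    if authorization is None:
        return False
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(token.encode(), expected.encode())
```

Comparing the parsed hostname, rather than searching the raw header for "localhost", is what rejects look-alike origins such as http://localhost.evil.example.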
Homebrew
brew tap Arthur-Ficial/tap
brew install Arthur-Ficial/tap/ohr
Test Coverage
- 109 unit tests (Swift, pure logic)
- 42 integration tests (Python, end-to-end)
- Total: 151 tests
Binary
ohr-0.1.3-arm64-macos.tar.gz — Apple Silicon macOS binary
v0.1.1 — Full Implementation
Full Implementation — CLI + Server + Transcription Engine
ohr is now a complete, production-grade tool matching apfel's architecture.
CLI Tool
ohr meeting.m4a                      # Transcribe → plain text
ohr -o srt lecture.wav               # Generate SRT subtitles
ohr -o vtt interview.m4a             # Generate VTT subtitles
ohr -o json recording.mp3            # JSON with segments and timestamps
ohr --timestamps meeting.m4a         # Plain text with timestamps
ohr meeting.m4a | apfel "summarize"  # Pipe to apfel
ohr --listen                         # Live microphone transcription
cat audio.wav | ohr                  # Transcribe from stdin
OpenAI-Compatible HTTP Server
ohr --serve # Start server on :11434
curl -X POST http://localhost:11434/v1/audio/transcriptions \
  -F file=@meeting.m4a -F model=apple-speechanalyzer
Features
- 6 test suites, 109 tests (all green, TDD red-to-green)
- 4 output formats: plain text, JSON, SRT subtitles, VTT subtitles
- OpenAI API compatible: POST /v1/audio/transcriptions, GET /v1/models, GET /health
- Security: origin validation, Bearer token auth, CORS control
- 30 languages supported via Apple's on-device SpeechAnalyzer
- 100% on-device — no cloud, no API keys, no network for inference
- Zero dependencies beyond Hummingbird (HTTP server) and macOS 26 SDK
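For callers that prefer a script over curl, the multipart request to POST /v1/audio/transcriptions can be built with the Python standard library alone. This is a sketch under stated assumptions: the helper names are hypothetical, the default port and model name are taken from the curl example above, and the optional token is sent as a Bearer header only if the server was started with token auth.

```python
import uuid
import urllib.request


def build_transcription_request(audio, filename,
                                model="apple-speechanalyzer",
                                base_url="http://localhost:11434",
                                token=None):
    """Build a multipart/form-data POST equivalent to the curl call."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Plain form field: model
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # File field: raw audio bytes
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode(),
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body, headers=headers, method="POST",
    )


def transcribe(path, **kwargs):
    """Send the file at path to a running `ohr --serve`; return the raw reply."""
    with open(path, "rb") as f:
        request = build_transcription_request(f.read(), path, **kwargs)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the server running, transcribe("meeting.m4a") returns the response body as text. Splitting request construction from sending keeps the multipart logic checkable without a live server.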
Architecture (same as apfel)
- OhrCore — pure-logic library (testable without Speech framework)
- ohr — executable (CLI + HTTP server)
- ohr-tests — pure Swift test runner (no XCTest needed)
v0.1.0 — OhrCore Library
OhrCore Library — Foundation Complete
First release of ohr: on-device speech-to-text from the command line.
What's in this release
OhrCore pure-logic library with 6 test suites, 109 tests (all green):
- OhrError — Speech-to-text error classification with OpenAI API error mapping
- AudioFormat — Audio format detection from file extensions and MIME types (m4a, wav, mp3, caf, aiff, flac, mp4)
- SubtitleFormatter — SRT and VTT subtitle generation with precise timestamp formatting
- OpenAIModels — Transcription API types (TranscriptionResponse, VerboseTranscriptionResponse, segments)
- TranscriptionValidator — Request parameter validation (format, temperature, response format)
- OriginValidator — Localhost CSRF protection and Bearer token authentication
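To make the subtitle formatting concrete: SRT and VTT differ mainly in the timestamp separator (comma versus period before the milliseconds), in SRT's numbered cues, and in VTT's WEBVTT header. The Python sketch below illustrates those format rules; it is not SubtitleFormatter's Swift code, and the function names and the (start, end, text) tuple layout are hypothetical.

```python
def format_timestamp(seconds, vtt=False):
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    sep = "." if vtt else ","
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_srt(segments):
    """Render (start, end, text) tuples as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text}\n"
        )
    return "\n".join(blocks)  # blank line between cues


def to_vtt(segments):
    """Render (start, end, text) tuples as a WebVTT document."""
    cues = [
        f"{format_timestamp(start, vtt=True)} --> "
        f"{format_timestamp(end, vtt=True)}\n{text}\n"
        for start, end, text in segments
    ]
    return "WEBVTT\n\n" + "\n".join(cues)
```

Converting to whole milliseconds first and then using integer division avoids floating-point drift in the displayed fields.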
Project scaffolding
- Package.swift with 3 targets: OhrCore (library), ohr (executable), ohr-tests (test runner)
- Makefile with version bumping, build, install, release asset packaging
- Pure Swift test runner (no XCTest needed, Command Line Tools only)
- Same architecture as apfel
What's next
CLI implementation, SpeechAnalyzer integration, and OpenAI-compatible HTTP server.