Releases: Arthur-Ficial/ohr

v0.1.6 — fix --listen SIGTRAP

15 Apr 12:46
@Arthur-Ficial

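Fixes the SIGTRAP crash in ohr --listen (#1) by converting microphone input with AVAudioConverter to the format reported by SpeechAnalyzer.bestAvailableAudioFormat. Also adds the TCC (privacy permission) keys to Info.plist and a locale fallback. Thanks @pallas.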

brew upgrade ohr

Contributors

pallas

v0.1.5 — QA Report & Documentation

04 Apr 08:08
@Arthur-Ficial

Comprehensive QA Report

What was tested

  • Formats: m4a, wav, mp3, aiff, flac; all 5 work
  • Durations: 5s, 30s, 1min, 3min, 10min; all work, and no duration limit was hit
  • Performance: ~50x real-time on Apple M2 (10 min of audio transcribed in 4.7 seconds)
  • Edge cases: empty files, corrupted data, silence, wrong format; all handled cleanly
  • Languages: 30 reported as supported; only English verified, the others still need testing with real speech
  • Server: all endpoints, all response formats, security, CORS; all passing

How it was tested

All tests used synthetic speech generated with the macOS say command. This caveat is documented in the README and docs/testing.md. Real human speech with accents, background noise, and natural pauses has NOT been tested and will likely produce lower accuracy.

Known limitations documented

  • ~90-95% accuracy on clear synthetic speech
  • Number/ordinal confusion ("five second" → "52nd")
  • No speaker diarization
  • apfel's context window limits how much of a long transcript can be piped into it

See docs/testing.md for the full report.


v0.1.4 — Demo Scripts

04 Apr 07:34
@Arthur-Ficial

10+ Demo Scripts

Real-world shell scripts showcasing ohr's capabilities.

ohr only

  • subtitle — generate SRT/VTT subtitles with --save
  • batch-transcribe — transcribe all audio files in a directory
  • audio-grep — grep-like search with timestamps and JSON output
  • voice-search — search inside audio files by spoken content
  • dictate — speak into a text file via live microphone
  • live-caption — real-time captions in the terminal
  • whisper-compat — drop-in replacement for OpenAI Whisper CLI
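The audio-grep idea (search a transcript and report where a phrase was spoken) can be sketched in Python on top of ohr's JSON output. This is a minimal sketch, not the demo script itself: it assumes the JSON holds a segments list whose entries carry start and end (in seconds) and text fields, as in OpenAI's verbose_json shape. Those field names and the helpers grep_segments and fmt_ts are hypothetical.

```python
import json
import re


def grep_segments(transcript_json: str, pattern: str) -> list:
    """Return segments whose text matches pattern (case-insensitive).

    Assumes a hypothetical shape:
    {"segments": [{"start": 0.0, "end": 4.2, "text": "..."}, ...]}
    """
    data = json.loads(transcript_json)
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for seg in data.get("segments", []):
        if rx.search(seg.get("text", "")):
            hits.append({"start": seg["start"], "end": seg["end"],
                         "text": seg["text"].strip()})
    return hits


def fmt_ts(seconds: float) -> str:
    """Format seconds as MM:SS for grep-style output."""
    m, s = divmod(int(seconds), 60)
    return f"{m:02d}:{s:02d}"


if __name__ == "__main__":
    # In practice the input would come from: ohr -o json meeting.m4a
    sample = json.dumps({"segments": [
        {"start": 0.0, "end": 4.2, "text": "Welcome to the meeting."},
        {"start": 4.2, "end": 9.8, "text": "The budget is due Friday."},
    ]})
    for hit in grep_segments(sample, "budget"):
        print(f"[{fmt_ts(hit['start'])}] {hit['text']}")
```

Matching per segment rather than on the flattened text is what keeps a timestamp attached to every hit.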

ohr + apfel

  • minutes — meeting recording to structured meeting minutes
  • action-items — extract to-dos and commitments from meetings
  • translate-audio — transcribe then translate to any language
  • podcast-chapters — generate timestamped chapter markers
  • voice-note — record, transcribe, and optionally summarize

Quick examples

demo/subtitle lecture.m4a --save
demo/audio-grep "budget" meetings/*.m4a
demo/minutes standup.m4a -o markdown > standup.md
demo/batch-transcribe ~/recordings/ -o srt
demo/whisper-compat audio.m4a --output_format srt
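The batch-transcribe pattern amounts to listing the audio files in a directory and running ohr -o <format> on each one. The sketch below shows one way to do that from Python. It assumes that ohr prints the transcript to stdout and writes a sidecar file next to each input; the helper names and the sidecar naming are hypothetical, not taken from the demo script. The extension list mirrors the formats OhrCore's AudioFormat recognizes.

```python
import subprocess
from pathlib import Path

# Extensions recognized by OhrCore's AudioFormat (m4a, wav, mp3, caf, aiff, flac, mp4).
AUDIO_EXTS = {".m4a", ".wav", ".mp3", ".caf", ".aiff", ".flac", ".mp4"}


def find_audio(directory: str) -> list:
    """Return audio files directly inside directory, sorted by path."""
    return sorted(p for p in Path(directory).iterdir()
                  if p.is_file() and p.suffix.lower() in AUDIO_EXTS)


def build_command(path: Path, fmt: str = "srt") -> list:
    # Mirrors the documented CLI form: ohr -o <format> <file>
    return ["ohr", "-o", fmt, str(path)]


def transcribe_all(directory: str, fmt: str = "srt") -> None:
    """Run ohr on each file and save <name>.<fmt> beside it.

    Assumption: ohr writes the transcript to stdout.
    """
    for path in find_audio(directory):
        result = subprocess.run(build_command(path, fmt),
                                capture_output=True, text=True, check=True)
        path.with_suffix("." + fmt).write_text(result.stdout)
```

Passing an argument list to subprocess.run, rather than a shell string, keeps filenames that contain spaces intact.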

v0.1.3 — QA + Homebrew + Integration Tests

03 Apr 23:28
@Arthur-Ficial

QA + Homebrew + Integration Tests

QA Fixes

  • Fixed stdin piping: binary audio was corrupted by readLine; stdin is now read raw with FileHandle.readDataToEndOfFile, and the audio format is detected from magic bytes
  • All CLI paths verified: file, stdin, srt, vtt, json, timestamps, language, errors
  • All server endpoints verified: health, models, transcription (5 formats), auth, CORS, stubs
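The stdin fix described above has two parts: read the stream as raw bytes, then identify the container from its leading bytes because a pipe carries no file extension. The Python sketch below illustrates that idea. sys.stdin.buffer.read() stands in for FileHandle.readDataToEndOfFile. The signatures are the standard container magic numbers. The function names are hypothetical, and this is not ohr's Swift code.

```python
import sys
from typing import Optional


def read_stdin_bytes() -> bytes:
    # Read raw bytes; line-oriented reads can mangle binary audio.
    return sys.stdin.buffer.read()


def detect_format(data: bytes) -> Optional[str]:
    """Guess an audio container from its first bytes (None if unknown)."""
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if len(data) >= 12 and data[:4] == b"FORM" and data[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"
    if data[:4] == b"fLaC":
        return "flac"
    if data[:4] == b"caff":
        return "caf"
    if len(data) >= 8 and data[4:8] == b"ftyp":
        return "m4a"  # ISO BMFF: m4a and mp4 share this signature
    # MP3: an ID3 tag or an MPEG frame sync (11 set bits).
    # Raw AAC/ADTS streams also match the frame-sync check.
    if data[:3] == b"ID3" or (len(data) >= 2 and data[0] == 0xFF
                              and (data[1] & 0xE0) == 0xE0):
        return "mp3"
    return None
```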

Integration Tests (42 tests, all green)

  • cli_e2e_test.py — help, version, exit codes, file transcription, output formats, stdin
  • server_test.py — health, models, transcription (json/verbose_json/text/srt/vtt), error handling, stubs
  • security_test.py — token auth, origin validation, CORS preflight

Homebrew

brew tap Arthur-Ficial/tap
brew install Arthur-Ficial/tap/ohr

Test Coverage

  • 109 unit tests (Swift, pure logic)
  • 42 integration tests (Python, end-to-end)
  • Total: 151 tests

Binary

  • ohr-0.1.3-arm64-macos.tar.gz — Apple Silicon macOS binary

v0.1.1 — Full Implementation

03 Apr 23:13
@Arthur-Ficial

Full Implementation — CLI + Server + Transcription Engine

ohr is now a complete, production-grade tool matching apfel's architecture.

CLI Tool

ohr meeting.m4a # Transcribe → plain text
ohr -o srt lecture.wav # Generate SRT subtitles
ohr -o vtt interview.m4a # Generate VTT subtitles
ohr -o json recording.mp3 # JSON with segments and timestamps
ohr --timestamps meeting.m4a # Plain text with timestamps
ohr meeting.m4a | apfel "summarize" # Pipe to apfel
ohr --listen # Live microphone transcription
cat audio.wav | ohr # Transcribe from stdin
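The srt and vtt outputs differ mainly in timestamp syntax. SRT numbers each cue and writes HH:MM:SS,mmm with a comma before the milliseconds. WebVTT starts with a WEBVTT header and uses a period instead. The sketch below is a minimal illustration in Python, not OhrCore's SubtitleFormatter. The (start, end, text) tuple shape for segments is an assumption.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as SRT HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)  # round once to avoid float drift
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def vtt_timestamp(seconds: float) -> str:
    """WebVTT uses a period instead of a comma before the milliseconds."""
    return srt_timestamp(seconds).replace(",", ".")


def to_srt(segments) -> str:
    """segments: iterable of (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)


def to_vtt(segments) -> str:
    """WebVTT output: header line, blank line, then cues (cue numbers optional)."""
    cues = [f"{vtt_timestamp(s)} --> {vtt_timestamp(e)}\n{t}\n" for s, e, t in segments]
    return "WEBVTT\n\n" + "\n".join(cues)
```

Converting to integer milliseconds once, before splitting into fields, keeps values such as 2.9995 s from producing a 1000 ms field.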

OpenAI-Compatible HTTP Server

ohr --serve # Start server on :11434
curl -X POST http://localhost:11434/v1/audio/transcriptions \
 -F file=@meeting.m4a -F model=apple-speechanalyzer
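The curl call above can also be made from Python using only the standard library. The sketch builds the multipart/form-data body manually and posts it to the documented /v1/audio/transcriptions endpoint. Several details are assumptions. The response_format value json follows OpenAI's API. The returned object is assumed to carry a text field, as OpenAI's json response does. The Bearer header applies only if the server was started with a token. The helper names are hypothetical.

```python
import json
import os
import urllib.request
import uuid
from typing import Optional


def build_multipart(fields: dict, file_field: str, filename: str,
                    file_bytes: bytes):
    """Encode form fields plus one file; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append((f'--{boundary}\r\nContent-Disposition: form-data; '
                      f'name="{name}"\r\n\r\n{value}\r\n').encode())
    parts.append((f'--{boundary}\r\nContent-Disposition: form-data; '
                  f'name="{file_field}"; filename="{filename}"\r\n'
                  'Content-Type: application/octet-stream\r\n\r\n').encode()
                 + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, base_url: str = "http://localhost:11434",
               token: Optional[str] = None) -> dict:
    with open(path, "rb") as f:
        body, content_type = build_multipart(
            {"model": "apple-speechanalyzer", "response_format": "json"},
            "file", os.path.basename(path), f.read())
    req = urllib.request.Request(f"{base_url}/v1/audio/transcriptions",
                                 data=body, method="POST")
    req.add_header("Content-Type", content_type)
    if token:  # only if the server requires Bearer auth
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With ohr --serve running, a call would look like transcribe("meeting.m4a")["text"].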

Features

  • 6 test suites, 109 tests (all green, TDD red-to-green)
  • 4 output formats: plain text, JSON, SRT subtitles, VTT subtitles
  • OpenAI API compatible: POST /v1/audio/transcriptions, GET /v1/models, GET /health
  • Security: origin validation, Bearer token auth, CORS control
  • 30 languages supported via Apple's on-device SpeechAnalyzer
  • 100% on-device — no cloud, no API keys, no network for inference
  • Zero dependencies beyond Hummingbird (HTTP server) and macOS 26 SDK
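The origin validation and Bearer token checks listed above guard a localhost server against requests from arbitrary web pages. The Python sketch below shows one way to implement those two checks. It is not ohr's implementation, and several choices in it are assumptions: the set of loopback hosts, the decision to allow requests that carry no Origin header (such as curl), and treating a missing configured token as "auth disabled".

```python
import hmac
from typing import Optional
from urllib.parse import urlparse

# Assumed allow-list of loopback hosts.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def origin_allowed(origin: Optional[str]) -> bool:
    """Allow no Origin header (non-browser clients) or a loopback Origin."""
    if origin is None:
        return True
    return urlparse(origin).hostname in LOCAL_HOSTS


def token_ok(auth_header: Optional[str], expected: Optional[str]) -> bool:
    """If a token is configured, require 'Authorization: Bearer <token>'."""
    if expected is None:
        return True  # assumption: no configured token means auth is off
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    supplied = auth_header[len("Bearer "):]
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(supplied.encode(), expected.encode())
```

Parsing the Origin with urlparse and comparing the hostname avoids prefix tricks such as http://localhost.evil.example.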

Architecture (same as apfel)

  • OhrCore — pure-logic library (testable without Speech framework)
  • ohr — executable (CLI + HTTP server)
  • ohr-tests — pure Swift test runner (no XCTest needed)

v0.1.0 — OhrCore Library

03 Apr 22:59
@Arthur-Ficial

OhrCore Library — Foundation Complete

First release of ohr: on-device speech-to-text from the command line.

What's in this release

OhrCore pure-logic library with 6 test suites, 109 tests (all green):

  • OhrError — Speech-to-text error classification with OpenAI API error mapping
  • AudioFormat — Audio format detection from file extensions and MIME types (m4a, wav, mp3, caf, aiff, flac, mp4)
  • SubtitleFormatter — SRT and VTT subtitle generation with precise timestamp formatting
  • OpenAIModels — Transcription API types (TranscriptionResponse, VerboseTranscriptionResponse, segments)
  • TranscriptionValidator — Request parameter validation (format, temperature, response format)
  • OriginValidator — Localhost CSRF protection and Bearer token authentication
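TranscriptionValidator's job (checking the audio format, temperature, and response format of a request) can be illustrated with a short Python sketch that collects error messages. The allowed response formats are the five exercised by the server tests. The 0 to 1 temperature range follows OpenAI's transcription API and is an assumption for ohr. The function name and message wording are hypothetical.

```python
from typing import Optional

AUDIO_EXTENSIONS = {"m4a", "wav", "mp3", "caf", "aiff", "flac", "mp4"}
RESPONSE_FORMATS = {"json", "verbose_json", "text", "srt", "vtt"}


def validate_request(filename: str, response_format: str = "json",
                     temperature: Optional[float] = None) -> list:
    """Return validation error messages; an empty list means valid."""
    errors = []
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in AUDIO_EXTENSIONS:
        errors.append(f"unsupported audio format: {filename!r}")
    if response_format not in RESPONSE_FORMATS:
        errors.append(f"unsupported response_format: {response_format!r}")
    if temperature is not None and not (0.0 <= temperature <= 1.0):
        errors.append(f"temperature must be between 0 and 1, got {temperature}")
    return errors
```

Returning every error at once, rather than stopping at the first, lets a client fix a bad request in one round trip.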

Project scaffolding

  • Package.swift with 3 targets: OhrCore (library), ohr (executable), ohr-tests (test runner)
  • Makefile with version bumping, build, install, release asset packaging
  • Pure Swift test runner (no XCTest needed, Command Line Tools only)
  • Same architecture as apfel

What's next

CLI implementation, SpeechAnalyzer integration, and OpenAI-compatible HTTP server.
