Releases: FoxNoseTech/diarize

v0.1.2

06 May 10:03
@loookashow loookashow
4f25d27

What's Changed

diarize 0.1.2 focuses on diarization quality, reproducible benchmarks, and clearer accuracy documentation.

Improvements

  • Reduced short-lived speaker label switches by applying temporal smoothing during diarization assembly.
  • Improved automatic speaker-count selection by combining silhouette refinement with a small prior that favors larger k (more speakers).
  • Added scripts/benchmark_rttm.py for reproducible audio+RTTM benchmark runs across VoxConverse, AMI, and similar datasets.
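
These notes do not describe how the temporal smoothing works. As an illustration only (not diarize's actual implementation), one common approach is to relabel very short segments that sit between two segments of the same speaker, then merge neighbouring segments. The `Segment` tuple, `min_dur`, and `max_gap` below are hypothetical names and thresholds.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    start: float   # seconds
    end: float     # seconds
    speaker: str


def smooth_short_switches(
    segments: List[Segment], min_dur: float = 0.5, max_gap: float = 0.1
) -> List[Segment]:
    """Relabel short segments flanked by one speaker, then merge adjacent runs."""
    relabeled = list(segments)
    for i in range(1, len(relabeled) - 1):
        prev, cur, nxt = relabeled[i - 1], relabeled[i], relabeled[i + 1]
        is_short = (cur.end - cur.start) < min_dur
        flanked = prev.speaker == nxt.speaker and cur.speaker != prev.speaker
        if is_short and flanked:
            relabeled[i] = cur._replace(speaker=prev.speaker)

    merged: List[Segment] = []
    for seg in relabeled:
        last = merged[-1] if merged else None
        # Merge only when the speaker matches and the segments (nearly) touch.
        if last and last.speaker == seg.speaker and seg.start - last.end <= max_gap:
            merged[-1] = last._replace(end=seg.end)
        else:
            merged.append(seg)
    return merged


demo = [
    Segment(0.0, 5.0, "A"),
    Segment(5.0, 5.2, "B"),   # 0.2 s blip between two "A" segments
    Segment(5.2, 9.0, "A"),
    Segment(9.0, 12.0, "B"),
]
print(smooth_short_switches(demo))
```

In this sketch the 0.2 s "B" blip is absorbed into the surrounding "A" speech, while the longer "B" turn at the end is left alone.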

Benchmarks and Docs

  • Updated VoxConverse dev benchmark numbers:
    • Weighted DER: ~4.8%
    • Speaker count: 125/216 exact, 178/216 within ±1
  • Added preliminary AMI Mix-Headset test validation:
    • Weighted DER: 14.96%
    • Speaker count: 4/16 exact, 8/16 within ±1
  • Documented known limitations around speaker-count errors and speaker label fragmentation.
  • Added a Changelog page to the documentation.
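
To make the two datasets easier to compare, the speaker-count figures above can be converted to percentages. The `weighted_der` helper is a sketch that assumes "weighted" means total error time divided by total reference speech time across files. The notes do not define the term, and the per-file numbers in its usage are invented for illustration.

```python
from typing import Iterable, Tuple


def accuracy_pct(hits: int, total: int) -> float:
    """Share of files, in percent."""
    return 100.0 * hits / total


# Speaker-count figures taken from the release notes above.
vox_exact = accuracy_pct(125, 216)
vox_within_1 = accuracy_pct(178, 216)
ami_exact = accuracy_pct(4, 16)
ami_within_1 = accuracy_pct(8, 16)

print(f"VoxConverse dev: {vox_exact:.1f}% exact, {vox_within_1:.1f}% within ±1")
print(f"AMI Mix-Headset test: {ami_exact:.1f}% exact, {ami_within_1:.1f}% within ±1")


def weighted_der(files: Iterable[Tuple[float, float]]) -> float:
    """DER in percent, weighting each file by its reference speech duration.

    Each item is (error_seconds, reference_speech_seconds).
    Assumed definition; not confirmed by the release notes.
    """
    files = list(files)
    total_error = sum(err for err, _ in files)
    total_ref = sum(ref for _, ref in files)
    return 100.0 * total_error / total_ref


# Hypothetical files: 3% DER on 100 s of speech, 15% DER on 200 s of speech.
print(f"{weighted_der([(3.0, 100.0), (30.0, 200.0)]):.1f}%")
```

With duration weighting, the longer file dominates: the two hypothetical files average to 11.0% rather than the unweighted 9.0%.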

Package

  • Synced package metadata and runtime diarize.__version__ to 0.1.2.
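
One way to confirm the sync locally is to compare the installed distribution metadata with the module attribute. This is a minimal sketch using only the standard library; `version_in_sync` is a hypothetical helper, and it assumes the distribution name and the import name are both `diarize`.

```python
from importlib import import_module, metadata
from typing import Optional


def version_in_sync(name: str) -> Optional[bool]:
    """Compare installed package metadata with the module's __version__.

    Returns None when the package is not installed or cannot be imported.
    """
    try:
        meta_version = metadata.version(name)
        module = import_module(name)
    except (metadata.PackageNotFoundError, ImportError):
        return None
    return getattr(module, "__version__", None) == meta_version


# Prints True when metadata and diarize.__version__ agree, None if not installed.
print(version_in_sync("diarize"))
```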

v0.1.1

06 Mar 17:20
@loookashow loookashow

This patch release fixes dependency compatibility for audio loading.

Fixed

  • Pinned torch and torchaudio to a compatible range:
    • torch>=1.13,<2.9
    • torchaudio>=0.13,<2.9
  • Prevents failures where newer torchaudio requires torchcodec.
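
To check whether an existing environment already falls inside the pinned range, a rough comparison on the major and minor components is enough for these bounds. The helpers below are a hypothetical sketch; for full PEP 440 semantics, the third-party `packaging` library is the more robust choice.

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '2.8.0+cpu' or '1.13'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def in_pinned_range(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper, comparing (major, minor) only."""
    return major_minor(lower) <= major_minor(version) < major_minor(upper)


# Bounds from the release notes: torch>=1.13,<2.9 and torchaudio>=0.13,<2.9
print(in_pinned_range("2.8.0+cpu", "1.13", "2.9"))
print(in_pinned_range("2.9.0", "1.13", "2.9"))
```

With torch installed, `torch.__version__` can be passed as the first argument.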

Docs

  • Clarified that diarize now installs a compatible torch/torchaudio range automatically.

No API changes.

v0.1.0 — Initial Release

01 Mar 11:30
@loookashow loookashow

diarize v0.1.0

Speaker diarization for Python — answers "who spoke when?" in any audio file. CPU-only, no GPU, no API keys, no account signup.

Highlights

  • ~10.8% DER on VoxConverse dev set — lower than pyannote's free models (community-1 and 3.1 legacy, both ~11.2%)
  • ~8x faster than real-time on CPU (RTF 0.12 vs pyannote community-1's 0.86)
  • Automatic speaker count detection via GMM BIC with silhouette refinement (1–7 speakers)
  • Zero setup friction: pip install diarize and you're done; no HuggingFace token or account needed
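
The real-time factor (RTF) is processing time divided by audio duration, so the speed-up over real time is its reciprocal. The short calculation below shows how the "~8x" figure follows from RTF 0.12; the one-hour recording is just an example input.

```python
def speedup_over_real_time(rtf: float) -> float:
    """Speed-up relative to real time is the reciprocal of the RTF."""
    return 1.0 / rtf


def processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimated wall-clock processing time for a recording."""
    return audio_seconds * rtf


for name, rtf in [("diarize", 0.12), ("pyannote community-1", 0.86)]:
    minutes = processing_seconds(3600, rtf) / 60
    print(f"{name}: {speedup_over_real_time(rtf):.1f}x real time, "
          f"about {minutes:.1f} min per hour of audio")
```

At these RTFs, an hour of audio takes roughly 7.2 minutes with diarize versus about 51.6 minutes with pyannote community-1.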

Pipeline

Silero VAD → WeSpeaker ResNet34-LM (ONNX) → GMM BIC → Spectral Clustering

All four stages run on CPU. All components are open-source with permissive licenses.
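
The GMM BIC stage estimates the speaker count by scoring mixtures with different numbers of components and keeping the lowest Bayesian Information Criterion, BIC = p·ln(n) − 2·ln(L). The sketch below shows the criterion only. The diagonal-covariance parameter count, the embedding dimension, and the log-likelihood values are assumptions for illustration, since these notes do not say how diarize configures its GMMs.

```python
import math
from typing import Dict


def gmm_param_count(k: int, dim: int) -> int:
    """Free parameters of a diagonal-covariance GMM (assumed covariance type).

    k*dim means + k*dim variances + (k - 1) mixing weights.
    """
    return 2 * k * dim + (k - 1)


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Bayesian Information Criterion; lower is better."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def select_k(log_likelihoods: Dict[int, float], dim: int, n_samples: int) -> int:
    """Pick the component count with the lowest BIC."""
    scores = {
        k: bic(ll, gmm_param_count(k, dim), n_samples)
        for k, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get)


# Hypothetical total log-likelihoods for k = 1..3 on 1000 two-dimensional points.
fits = {1: -5000.0, 2: -4000.0, 3: -3995.0}
print(select_k(fits, dim=2, n_samples=1000))
```

Here k=3 fits only slightly better than k=2, so the parameter penalty makes k=2 the winner.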

Usage

from diarize import diarize

result = diarize("meeting.wav")
for seg in result.segments:
    print(f" [{seg.start:.1f}s - {seg.end:.1f}s] {seg.speaker}")
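
A common next step is to total the speaking time per speaker. The sketch below relies only on the `start`, `end`, and `speaker` attributes shown above. It uses stand-in objects so that it runs without diarize installed, and the `SPEAKER_00`-style labels are an assumption about the label format.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable


def speaking_time(segments: Iterable) -> Dict[str, float]:
    """Sum segment durations (seconds) per speaker label."""
    totals: Dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


# Stand-in segments; with diarize installed, pass result.segments instead.
demo_segments = [
    SimpleNamespace(start=0.0, end=4.0, speaker="SPEAKER_00"),
    SimpleNamespace(start=4.0, end=7.0, speaker="SPEAKER_01"),
    SimpleNamespace(start=7.0, end=9.0, speaker="SPEAKER_00"),
]

for speaker, seconds in sorted(speaking_time(demo_segments).items()):
    print(f"{speaker}: {seconds:.1f}s")
```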

Known Limitations

  • Benchmarked on a single dataset (VoxConverse). Cross-dataset validation is planned.
  • Speaker count estimation degrades for 8+ speakers — pass num_speakers explicitly when known.
  • Overlapping speech is not modeled — each segment is assigned to one speaker.