pyannote

💚 Simply detect, segment, label, and separate speakers in any language

🎤 What is speaker diarization?

Speaker diarization is the process of automatically partitioning the audio recording of a conversation into segments and labeling them by speaker, answering the question "who spoke when?". As the foundational layer of conversational AI, speaker diarization provides high-level insights for human-human and human-machine conversations, and unlocks a wide range of downstream applications: meeting transcription, call center analytics, voice agents, video dubbing.

▶️ Getting started

Install the latest release of pyannote.audio with either uv (recommended) or pip:

$ uv add pyannote.audio
$ pip install pyannote.audio

Enjoy state-of-the-art speaker diarization:

# download the pretrained pipeline from Hugging Face
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained('pyannote/speaker-diarization-community-1', token="HUGGINGFACE_TOKEN")
# perform speaker diarization locally
output = pipeline('/path/to/audio.wav')
# print who speaks when
for turn, speaker in output.speaker_diarization:
    print(f"{speaker} speaks between t={turn.start}s and t={turn.end}s")

Read the community-1 model card to make the most of it.
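
Once the turns are available, a common next step is to summarize them. The sketch below assumes the `(turn, speaker)` pairs from the loop above have been copied into plain `(start, end, speaker)` tuples. The helper name `speaking_time` and the sample values are illustrative and not part of pyannote.audio.

```python
from collections import defaultdict

def speaking_time(turns):
    """Sum the duration (in seconds) of each speaker's turns."""
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)

# Hypothetical turns, e.g. built with:
#   turns = [(turn.start, turn.end, speaker)
#            for turn, speaker in output.speaker_diarization]
turns = [
    (0.0, 4.0, "SPEAKER_00"),
    (4.0, 6.5, "SPEAKER_01"),
    (6.5, 9.0, "SPEAKER_00"),
]

totals = speaking_time(turns)
for speaker, seconds in sorted(totals.items()):
    print(f"{speaker} spoke for {seconds:.1f}s")
```

Note that overlapping speech is counted once per speaker here, so the totals can add up to more than the duration of the recording.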

🏆 State-of-the-art models

The pyannoteAI research team trains cutting-edge speaker diarization models, thanks to the Jean Zay 🇫🇷 supercomputer managed by GENCI 💚. They come in two flavors:

- pyannote.audio open models, available on Hugging Face and used by 140k+ developers around the world;
- premium models, available on pyannoteAI cloud (and on-premise for enterprise customers), that provide state-of-the-art speaker diarization as well as additional enterprise features.

Benchmark (last updated in 2025-09)	`legacy` (3.1)	`community-1`	`precision-2`
AISHELL-4	12.2	11.7	11.4 🏆
AliMeeting (channel 1)	24.5	20.3	15.2 🏆
AMI (IHM)	18.8	17.0	12.9 🏆
AMI (SDM)	22.7	19.9	15.6 🏆
AVA-AVD	49.7	44.6	37.1 🏆
CALLHOME (part 2)	28.5	26.7	16.6 🏆
DIHARD 3 (full)	21.4	20.2	14.7 🏆
Ego4D (dev.)	51.2	46.8	39.0 🏆
MSDWild	25.4	22.8	17.3 🏆
RAMC	22.2	20.8	10.5 🏆
REPERE (phase2)	7.9	8.9	7.4 🏆
VoxConverse (v0.3)	11.2	11.2	8.5 🏆

Diarization error rate (in %, the lower, the better)
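
For readers new to the metric, diarization error rate is conventionally defined as the sum of false alarm, missed detection, and speaker confusion durations, divided by the total duration of reference speech. A minimal sketch of that arithmetic follows. The function name and sample durations are illustrative, and the benchmark numbers above were not computed with this snippet (evaluation details such as collar or overlap handling are not specified here).

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total):
    """Return DER in percent from error durations (all in seconds)."""
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return 100.0 * (false_alarm + missed_detection + confusion) / total

# Hypothetical durations for a recording with 500s of reference speech.
der = diarization_error_rate(
    false_alarm=10.0,
    missed_detection=25.0,
    confusion=30.0,
    total=500.0,
)
print(f"DER = {der:.1f}%")
```

Because false alarms are counted in the numerator but not in the denominator, DER can exceed 100% on very hard recordings.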

⏩️ Going further, better, and faster

The precision-2 premium model further improves accuracy and processing speed, and brings additional features.

Features	`community-1`	`precision-2`
Set exact/min/max number of speakers	✅	✅
Exclusive speaker diarization (for transcription)	✅	✅
Segmentation confidence scores	❌	✅
Speaker confidence scores	❌	✅
Voiceprinting	❌	✅
Speaker identification	❌	✅
Time to process 1h of audio (on H100)	37s	14s
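
The processing times in the last row can also be read as a speed-up over real time. A quick sketch of the arithmetic, using only the two figures from the table (the function name is illustrative):

```python
def speedup_over_real_time(audio_seconds, processing_seconds):
    """Seconds of audio processed per second of compute."""
    return audio_seconds / processing_seconds

ONE_HOUR = 3600.0  # duration of the audio, in seconds
community_1 = speedup_over_real_time(ONE_HOUR, 37.0)
precision_2 = speedup_over_real_time(ONE_HOUR, 14.0)

print(f"community-1: about {community_1:.0f}x faster than real time")
print(f"precision-2: about {precision_2:.0f}x faster than real time")
```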

Create a pyannoteAI account, change one line of code, and enjoy free cloud credits to try precision-2 premium diarization:

# perform premium speaker diarization on pyannoteAI cloud
pipeline = Pipeline.from_pretrained('pyannote/speaker-diarization-precision-2', token="PYANNOTEAI_API_KEY")
better_output = pipeline('/path/to/audio.wav')
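
The feature table above lists setting an exact, minimum, or maximum number of speakers. The sketch below assumes the pipeline accepts `num_speakers`, `min_speakers`, and `max_speakers` keyword arguments when called; check the model card before relying on these names. The helper `speaker_count_options` and its validation rules are this sketch's own choice, not part of pyannote.audio.

```python
def speaker_count_options(num_speakers=None, min_speakers=None, max_speakers=None):
    """Build keyword arguments constraining the number of speakers."""
    has_range = min_speakers is not None or max_speakers is not None
    if num_speakers is not None and has_range:
        raise ValueError("give either an exact count or a min/max range, not both")
    if (min_speakers is not None and max_speakers is not None
            and min_speakers > max_speakers):
        raise ValueError("min_speakers must not exceed max_speakers")
    options = {
        "num_speakers": num_speakers,
        "min_speakers": min_speakers,
        "max_speakers": max_speakers,
    }
    # Keep only the constraints that were actually provided.
    return {name: value for name, value in options.items() if value is not None}

exact = speaker_count_options(num_speakers=2)
bounded = speaker_count_options(min_speakers=2, max_speakers=5)
print(exact, bounded)

# Assumed usage (keyword names to be confirmed against the model card):
#   output = pipeline('/path/to/audio.wav', **bounded)
```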
