I need to upload an audio file in which two or more speakers are having a conversation, and at times their speech overlaps. The requirement is to segment the audio into distinct chunks, each clearly attributed to a single speaker (i.e., which speaker said what).
I have used speaker diarization for this task, which relies on pitch as one of the features for distinguishing speakers. However, when speakers have a similar pitch, the model fails to separate them and treats them as the same person. What additional techniques or features can I incorporate to improve diarization accuracy and handle overlapping speech more effectively?
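For example, would it help to extract a speaker embedding for each diarized segment and re-cluster the segments by embedding distance instead of relying on pitch alone? Below is a rough sketch of what I mean. It assumes SpeechBrain's pretrained spkrec-ecapa-voxceleb model and scikit-learn clustering; the helper names and the fixed speaker count are illustrative, not something I have working:

import torch
import torchaudio
from speechbrain.pretrained import EncoderClassifier
from sklearn.cluster import AgglomerativeClustering

# Pretrained ECAPA-TDNN speaker encoder (expects 16 kHz mono audio)
encoder = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")

def embed_segments(audio_path, wav_splits):
    waveform, sr = torchaudio.load(audio_path)
    waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
    if sr != 16000:
        waveform = torchaudio.functional.resample(waveform, sr, 16000)
        sr = 16000
    embeddings = []
    for start, end in wav_splits:
        chunk = waveform[:, int(start * sr):int(end * sr)]
        emb = encoder.encode_batch(chunk).squeeze()  # one embedding per segment
        embeddings.append(emb.detach().numpy())
    return embeddings

def recluster(embeddings, n_speakers=2):
    # Group segments by cosine distance between embeddings; n_speakers is
    # assumed known here, which is a simplification.
    # (Older scikit-learn versions use affinity= instead of metric=.)
    clustering = AgglomerativeClustering(n_clusters=n_speakers, metric="cosine", linkage="average")
    return clustering.fit_predict(embeddings)

Would re-labeling my diarization segments with clusters like these be a sensible way to separate same-pitch speakers?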
Here is my current diarization code:
import os
from pyannote.audio import Pipeline
from pydub import AudioSegment

def run_speaker_diarization(audio_path):
    print("inside speaker diarization method")
    # Pretrained diarization pipeline from Hugging Face
    pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization", use_auth_token="huggingface_token")
    diarization = pipeline(audio_path)
    audio = AudioSegment.from_wav(audio_path)

    # Output values
    wav_splits = []
    labels = []

    # Optional: save chunks next to the input file
    base_dir = os.path.dirname(audio_path)
    base_name = os.path.splitext(os.path.basename(audio_path))[0]
    output_dir = os.path.join(base_dir, base_name + "_sliced_by_pyannote")
    os.makedirs(output_dir, exist_ok=True)

    for i, turn in enumerate(diarization.itertracks(yield_label=True)):
        segment, _, speaker = turn
        start = segment.start
        end = segment.end
        wav_splits.append((start, end))  # In seconds
        labels.append(speaker)

        # Optional: save chunk to file (pydub slices in milliseconds)
        chunk = audio[int(start * 1000):int(end * 1000)]
        chunk_filename = f"{base_name}_Speaker{speaker}_chunk{i+1}.wav"
        chunk_path = os.path.join(output_dir, chunk_filename)
        chunk.export(chunk_path, format="wav")
        print(f"Saved: {chunk_filename}")

    print(f"\nAll speaker segments saved to: {output_dir}")
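One other idea I have been considering is running a separate overlapped-speech-detection pass first, so that regions where two people talk at once can be flagged and handled separately (e.g., excluded from the per-speaker chunks or sent to a source-separation step). A minimal sketch of that idea, assuming the pyannote/overlapped-speech-detection pretrained pipeline is available with the same token setup:

from pyannote.audio import Pipeline

def find_overlap_regions(audio_path):
    # Flags regions where more than one speaker is active at the same time
    osd = Pipeline.from_pretrained("pyannote/overlapped-speech-detection", use_auth_token="huggingface_token")
    annotation = osd(audio_path)
    # support() merges adjacent/overlapping regions into a clean timeline
    return [(speech.start, speech.end) for speech in annotation.get_timeline().support()]

Is combining the diarization output with an overlap timeline like this a reasonable approach, or is there a better way to handle overlapping speech?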
-
Did you just provide us your authentication token? Do not just edit this question, go delete it right now on Hugging Face. – NiziL, May 14, 2025 at 13:07