Newest 'speaker-diarization' Questions

16 questions

0 votes

0 answers

149 views

Why is pyannote speaker diarization returning "Unknown" for speaker label in real-time audio processing?

I'm working on a real-time speech processing pipeline using pyannote-audio, and I’m using the pyannote/speaker-diarization-3.1 pipeline with Hugging Face token authentication. My code captures live ...

Hadil Sghair

asked May 21, 2025 at 12:10

2 votes

1 answer

267 views

Trying to build azure speech program that can transcribe and diarize audio real-time, how do I do this on javascript/html? Can't find working examples

Specifically, I am trying to build an application that runs an HTML/JavaScript file that can recognize speech input from a microphone, transcribe it, and assign it to a speaker, continuously ...

user29960912

asked Mar 10, 2025 at 20:04

0 votes

0 answers

19 views

404 Error during Azure Speaker Identification despite valid profiles

I’m using Azure’s Speaker Recognition API for speaker identification in my Python script, but I’m encountering a 404 error with the message "Resource not found". This error occurs when I try to identify ...

Om Ladumor

asked Feb 5, 2025 at 19:58

0 votes

0 answers

262 views

How to use speech_recognition and pyannote.audio simultaneously

How can I use the data from speech_recognition's listen() function as an embedding to compare with previously recorded .wav files of different speakers talking so that I can print (speaker): (...

Flamethrower

asked Jan 21, 2025 at 17:08

0 votes

0 answers

395 views

Fine-Tuning Pyannote Model for VAD Task — Issues After Training

I am trying to fine-tune a pretrained Pyannote model for a VAD task. I can fine-tune it for the segmentation task without problems, and the model's results improve. Here is the code I use to fine-tune it: ...

Eliya

asked Jan 20, 2025 at 16:41

0 votes

1 answer

200 views

Cache Nvidia Nemo model

How do I cache the NVIDIA Nemo model diar_msdd_telephonic.nemo so that it is pre-downloaded and can be referenced in my config file? I am using msdd_model = NeuralDiarizer(cfg=create_config(temp_path)).to("...

Simon Palmer

asked Jan 5, 2025 at 8:02

1 vote

0 answers

164 views

How to leverage pyannote / speaker-diarization-3.0 with Transformers.js?

pyannote/segmentation-3.0 suggests using pyannote/speaker-diarization-3.0 since it has a better embedding model for diarization. I am trying to use this in client-side JS. It seems like I am supposed ...

Student

asked Dec 20, 2024 at 3:49

1 vote

2 answers

5k views

Pyannote: Load and Apply Speaker Diarization Offline

I am trying to use pyannote's models offline. I was loading and applying models like this: from pyannote.audio import Pipeline access_token = 'xxxxxxxxxxx' model = Pipeline.from_pretrained( "...

Tütü

asked Aug 1, 2024 at 12:28

0 votes

1 answer

165 views

Speaker identification embeddings audio fragment length

I have a database of audio samples, each matched to a specific speaker, like nick_sample1.mp3 nick_sample2.mp3 ... nick_sampleN.mp3 john_sample1.mp3 john_sample2.mp3 ... john_sampleK.mp3 The task is to match a ...

Anton Maiorov

asked Jul 2, 2024 at 8:53

2 votes

1 answer

348 views

Is there any way to transliterate hindi audio to english using OpenAI whisper

I have a task where, given an audio file, I have to perform speaker diarization on it and then transcribe it accordingly. For speaker diarization I am using pyannote, ...

Chaitanya Kale

asked May 31, 2024 at 6:12

2 votes

2 answers

9k views

RuntimeError: Library cublas64_12.dll is not found or cannot be loaded. While using WhisperX diarization

I was trying to use whisperx to do speaker diarization. I did it successfully on Google Colab, but I'm encountering this error while trying to transcribe the audio file. Traceback (most recent call last)...

St.Destiny

asked Apr 13, 2024 at 11:18

1 vote

0 answers

562 views

Whisper and pyannote 3.1 : AttributeError: 'list' object has no attribute 'get'

I'm using this script to diarize and then transcribe speech using pyannote.audio and whisper. With pyannote 2.1 it works perfectly, but when I change the version to the latest (3.1), I get ...

boredgirl

asked Dec 30, 2023 at 13:11

1 vote

1 answer

195 views

Azure Speech diarization failing to tag speakers properly until a long 7second statement is spoken

The Azure Speech private preview for diarization previously set an "unknown" speaker tag until it recognized a long 7-second statement from a speaker; with the API in public preview, it started tagging ...

Goofy

asked Dec 27, 2023 at 7:27

1 vote

0 answers

300 views

Why am I getting "index 0 is out of bounds for axis 0 with size 0 when using pyAudioAnalysis library?

This question is about speaker diarization. I'm trying to make a script that separates an mp4 file into different segments depending on the speaker. (The input mp4 file contains the dialogue of ...

RonaLightfoot

asked Aug 23, 2023 at 4:11

5 votes

1 answer

6k views

Way to Offline Speaker Diarization with Hugging Face

I am looking for an offline, locally saved model for speaker diarization with Hugging Face that works without authentication. I have searched Google and found no relevant links. Is there any link/...

san1

asked Jul 26, 2023 at 9:28
