20 questions
0 votes · 1 answer · 229 views
Deepgram API returns error code 1011 during short silences or randomly in group voice transcription
At work we are building a voice-based group conversation application that uses Deepgram’s real-time transcription and diarization features. The transcription works well most of the time, but I ...
0 votes · 0 answers · 396 views
Fine-Tuning Pyannote Model for VAD Task — Issues After Training
I am trying to fine-tune a pre-trained Pyannote model for a VAD task.
I can fine-tune it for the Segmentation task; everything goes well and the model results improve.
Here is the code I use to fine-tune it:
...
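For context, the usual pyannote fine-tuning pattern for a VAD task looks roughly like this (a minimal sketch, assuming a recent pyannote.audio and pyannote.database; the database.yml path, protocol name, and hyperparameters are assumptions, not the asker's code):

    # Sketch: fine-tune a pretrained segmentation checkpoint on a VAD task.
    import pytorch_lightning as pl
    from pyannote.audio import Model
    from pyannote.audio.tasks import VoiceActivityDetection
    from pyannote.database import registry

    registry.load_database("database.yml")  # describes the training data (assumed path)
    protocol = registry.get_protocol("MyDatabase.SpeakerDiarization.MyProtocol")  # hypothetical name

    # start from the pretrained segmentation checkpoint
    model = Model.from_pretrained("pyannote/segmentation", use_auth_token="HF_TOKEN")

    # attach a VAD task so training optimises voice activity detection
    model.task = VoiceActivityDetection(protocol, duration=2.0, batch_size=32)

    trainer = pl.Trainer(max_epochs=10)
    trainer.fit(model)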
1 vote · 0 answers · 1k views
Use whisperx and pyannote in Colab without HuggingFace token
Hello Stack Overflow Community! I would like to use WhisperX and Pyannote as described in this GitHub repository to combine automatic transcription and diarization. I can do it on Colab using the Huggingface (HF)...
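One common workaround for the token requirement is to load the pyannote pipeline from a local config.yaml instead of the Hub. A sketch, assuming the pipeline files have already been downloaded to disk (the path below is hypothetical):

    # Sketch: load a locally stored pyannote diarization pipeline, so no HF token
    # is needed at runtime. The config.yaml must reference local
    # segmentation/embedding checkpoints rather than Hub IDs.
    from pyannote.audio import Pipeline

    pipeline = Pipeline.from_pretrained("models/pyannote_diarization/config.yaml")
    diarization = pipeline("audio.wav")

    for turn, _, speaker in diarization.itertracks(yield_label=True):
        print(f"{speaker}: {turn.start:.1f}s - {turn.end:.1f}s")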
1 vote · 0 answers · 562 views
Whisper and pyannote 3.1 : AttributeError: 'list' object has no attribute 'get'
I'm using this script to diarize and then transcribe speech using pyannote.audio and whisper. With pyannote 2.1 it works perfectly, but when I change to the latest version (3.1), I get ...
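The diarize-then-transcribe pattern in question typically looks like this (a sketch assuming pyannote.audio 3.1 and openai-whisper; the overlap-based speaker assignment is one possible approach, not the asker's script):

    # Sketch: run pyannote 3.1 diarization and Whisper transcription separately,
    # then assign each Whisper segment to the speaker turn it overlaps most.
    import whisper
    from pyannote.audio import Pipeline

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token="HF_TOKEN")
    diarization = pipeline("audio.wav")

    model = whisper.load_model("base")
    result = model.transcribe("audio.wav")

    for segment in result["segments"]:
        best_overlap, speaker = 0.0, "UNKNOWN"
        for turn, _, label in diarization.itertracks(yield_label=True):
            overlap = min(turn.end, segment["end"]) - max(turn.start, segment["start"])
            if overlap > best_overlap:
                best_overlap, speaker = overlap, label
        print(f"{speaker}: {segment['text'].strip()}")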
1 vote · 1 answer · 195 views
Azure Speech diarization failing to tag speakers properly until a long 7-second statement is spoken
The Azure Speech private preview for diarization used to assign an "unknown" speaker tag until it recognised a long 7-second statement from a speaker; with the API in public preview it started tagging ...
0 votes · 1 answer · 476 views
Google Speech-to-Text API Speaker Diarization with Python .long_running_recognize() method
I was following the answer in this question, but my audio is longer than 1 min, so I have to use the .long_running_recognize(config, audio) method instead of .recognize(config, audio). Here is the code:
from ...
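The long-running variant with diarization generally looks like this (a sketch assuming the google-cloud-speech v1 client; the GCS URI, speaker counts, and sample rate are assumptions):

    # Sketch: long-running recognition with speaker diarization enabled.
    from google.cloud import speech

    client = speech.SpeechClient()

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
        diarization_config=speech.SpeakerDiarizationConfig(
            enable_speaker_diarization=True,
            min_speaker_count=2,
            max_speaker_count=2,
        ),
    )
    audio = speech.RecognitionAudio(uri="gs://my-bucket/my-audio.wav")

    # long-running recognition is required for audio longer than about one minute
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=600)

    # speaker tags are attached to the words of the last result
    for word in response.results[-1].alternatives[0].words:
        print(word.speaker_tag, word.word)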
0 votes · 1 answer · 93 views
Google Speech-to-Text API returns only one side of the conversation
I am using the Google Speech-to-Text API to transcribe audio files (WAV files) stored in a GCS bucket. The audio files are phone recordings with 3 speakers (IVR, Customer, and Engineer), and the ...
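If the recordings are actually stereo with one party per channel (an assumption; for mono files diarization is the usual alternative), the per-channel settings of the v1 API are the relevant knobs. A sketch with a hypothetical bucket path:

    # Sketch: transcribe each channel separately so both sides of the call appear.
    from google.cloud import speech

    client = speech.SpeechClient()

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        language_code="en-US",
        audio_channel_count=2,
        enable_separate_recognition_per_channel=True,
    )
    audio = speech.RecognitionAudio(uri="gs://my-bucket/call.wav")

    response = client.long_running_recognize(config=config, audio=audio).result(timeout=600)
    for result in response.results:
        print(result.channel_tag, result.alternatives[0].transcript)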
1 vote · 3 answers · 4k views
Diart (torchaudio) on Windows x64 results in torchaudio error "ImportError: FFmpeg libraries are not found. Please install FFmpeg."
I am trying out a speech diarization project named diart
(based on Hugging Face models).
I am following the instructions using a miniconda environment, which are essentially:
conda create -n diart python=...
1 vote · 1 answer · 2k views
Segmentation instead of diarization for speaker count estimation
I'm using pyannote's diarization to determine the number of speakers in an audio file, where the number of speakers cannot be predetermined. Here is the code to determine the speaker count by diarization:
from ...
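The diarization-based count the excerpt refers to typically boils down to something like this (a sketch assuming pyannote.audio 3.x and a valid HF token):

    # Sketch: count speakers as the number of distinct labels in the diarization output.
    from pyannote.audio import Pipeline

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token="HF_TOKEN")
    diarization = pipeline("audio.wav")

    num_speakers = len(diarization.labels())  # one label per detected speaker
    print(num_speakers)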
5 votes · 1 answer · 4k views
Efficient speaker diarization
I am running a VM instance on Google Cloud. My goal is to apply speaker diarization to several .wav files stored in cloud buckets.
I have tried the following alternatives with the subsequent problems:
...
2 votes · 0 answers · 564 views
Extracting voice of different speakers in overlapping speech using pyannote
I am using Pyannote for speaker diarization. I am able to get the overlapping speech's start and end time but not able to do voice separation. Is there a way to use Pyannote for voice separation?
If ...
0 votes · 1 answer · 510 views
Can speech diarization be integrated with DeepSpeech?
In an online meeting such as Google Meet/Zoom, I want to detect a change of speaker and then transcribe the audio for different speakers.
I am using the DeepSpeech model for speech-to-text. I have fine-...
1 vote · 1 answer · 759 views
AttributeError: 'NoneType' object has no attribute 'items' in pyannote speaker diarization package
When working with the pyannote Python package from GitHub (tutorial link -> https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/voice_activity_detection.ipynb),
I receive the following ...
2 votes · 1 answer · 3k views
How can I count the number of people who speak in an audio file
I'm working on an audio project. My goal is to count the number of people who spoke in an audio file. We can assume the noise has already been removed from the audio (for example, if there are two ...
0 votes · 0 answers · 600 views
How to split 1 channel audio into 2 channels?
I have an audio file with two speakers on 1 channel. I would like to separate the audio into 2 channels (one per speaker).
I was thinking of splitting on silences, or more complicated things like ...
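One possible approach (a sketch, not the asker's code, assuming pyannote.audio for the diarization and soundfile/numpy for the audio I/O) is to diarize the mono file and route each speaker's segments to its own channel:

    # Sketch: diarize a mono recording, then write a stereo file with one speaker
    # per channel. File names, model, and token are assumptions.
    import numpy as np
    import soundfile as sf
    from pyannote.audio import Pipeline

    samples, sr = sf.read("mono.wav")  # mono input, shape (n_samples,)

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token="HF_TOKEN")
    diarization = pipeline("mono.wav")

    stereo = np.zeros((len(samples), 2))
    channel_of = {}  # speaker label -> channel index (clean mapping for 2 speakers)
    for turn, _, label in diarization.itertracks(yield_label=True):
        ch = channel_of.setdefault(label, len(channel_of) % 2)
        start, end = int(turn.start * sr), int(turn.end * sr)
        stereo[start:end, ch] = samples[start:end]

    sf.write("stereo.wav", stereo, sr)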