Stack Overflow
0 votes
1 answer
229 views

At work we are building a voice-based group conversation application that uses Deepgram’s real-time transcription and diarization features. The transcription works well most of the time, but I ...
0 votes
0 answers
396 views

I am trying to fine-tune a pretrained Pyannote model for a VAD task. I can fine-tune it for the segmentation task, everything goes well, and the model results improve. Here is the code I use to fine-tune it: ...
1 vote
0 answers
1k views

Hello Stack Overflow Community! I would like to use WhisperX and Pyannote as described on this GitHub page to combine automatic transcription and diarization. I can do it on Colab using the Hugging Face (HF)...
1 vote
0 answers
562 views

I'm using this script to diarize and then transcribe speech using pyannote.audio and whisper. With pyannote 2.1 it works perfectly, but when I change the version to the latest (3.1), I get ...
1 vote
1 answer
195 views

The Azure Speech private preview for diarization used to set an "unknown" speaker tag until it recognised a long, 7-second statement from a speaker. With the API in public preview, it started tagging ...
0 votes
1 answer
476 views

I was following the answer in this question, but my audio is longer than 1 min, so I have to use the .long_running_recognize(config, audio) method instead of .recognize(config, audio). Here is the code: from ...
0 votes
1 answer
93 views

I am using the Google Speech-to-Text API to transcribe audio files (wav files) that are stored in a GCS bucket. The audio files are phone records and have 3 speakers (IVR, Customer, and Engineer) and the ...
1 vote
3 answers
4k views

I am trying out a speech diarization project named diart (based on Hugging Face models). I followed the instructions using a Miniconda environment, which are essentially: conda create -n diart python=...
1 vote
1 answer
2k views

I'm using pyannote's diarization to determine the number of speakers in an audio file, where the number of speakers cannot be known in advance. Here is the code to determine the speaker count by diarization: from ...
5 votes
1 answer
4k views

I am running a VM instance on Google Cloud. My goal is to apply speaker diarization to several .wav files stored in cloud buckets. I have tried the following alternatives, with the subsequent problems: ...
2 votes
0 answers
564 views

I am using Pyannote for speaker diarization. I am able to get the start and end times of overlapping speech, but I am not able to do voice separation. Is there a way to use Pyannote for voice separation? If ...
0 votes
1 answer
510 views

In an online meeting such as Google Meet or Zoom, I want to detect speaker changes and then transcribe the audio for the different speakers. I am using the DeepSpeech model for speech to text. I have fine-...
1 vote
1 answer
759 views

When working with the pyannote Python package from GitHub (tutorial link: https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/voice_activity_detection.ipynb), I receive the following ...
2 votes
1 answer
3k views

I'm working on an audio project. My goal is to count the number of people who spoke in an audio file. We can assume that the noise has already been removed from that audio (for example, if there are two ...
0 votes
0 answers
600 views

I have an audio file with two speakers on 1 channel. I would like to separate the audio into 2 channels (one per speaker). I was thinking of splitting on silences, or more complicated things like ...
