16 questions
0
votes
0
answers
149
views
Why is pyannote speaker diarization returning "Unknown" for speaker label in real-time audio processing?
I'm working on a real-time speech processing pipeline using pyannote-audio, and I’m using the pyannote/speaker-diarization-3.1 pipeline with Hugging Face token authentication.
My code captures live ...
2
votes
1
answer
267
views
Trying to build an Azure Speech program that can transcribe and diarize audio in real time. How do I do this in JavaScript/HTML? Can't find working examples
Specifically, I am trying to build an application that runs from an HTML/JavaScript file and can recognize speech input from a microphone, transcribe it, and assign it to a speaker, continuously ...
0
votes
0
answers
19
views
404 Error during Azure Speaker Identification despite valid profiles
I’m using Azure’s Speaker Recognition API for speaker identification in my Python script, but I’m encountering a 404 error with the message:
Resource not found
This error occurs when I try to identify ...
0
votes
0
answers
262
views
How to use speech_recognition and pyannote.audio simultaneously
How can I use the data from speech_recognition's listen() function as an embedding to compare with previously recorded .wav files of different speakers talking so that I can print (speaker): (...
0
votes
0
answers
395
views
Fine-Tuning Pyannote Model for VAD Task — Issues After Training
I am trying to fine-tune a pretrained pyannote model for the VAD task.
I can fine-tune it for the segmentation task, and everything goes well: the model's results improve.
Here is the code I use to fine-tune it:
...
0
votes
1
answer
200
views
Cache NVIDIA NeMo model
How do I cache the NVIDIA NeMo model diar_msdd_telephonic.nemo so that it is pre-downloaded and can be referenced in my config file?
I am using
msdd_model = NeuralDiarizer(cfg=create_config(temp_path)).to("...
1
vote
0
answers
164
views
How to leverage pyannote / speaker-diarization-3.0 with Transformers.js?
pyannote/segmentation-3.0 suggests using pyannote/speaker-diarization-3.0, since it has a better embedding model for diarization. I am trying to use this in client-side JS. It seems like I am supposed ...
1
vote
2
answers
5k
views
Pyannote: Load and Apply Speaker Diarization Offline
I am trying to use pyannote's models offline.
I was loading and applying models like this:
from pyannote.audio import Pipeline
access_token = 'xxxxxxxxxxx'
model = Pipeline.from_pretrained(
"...
0
votes
1
answer
165
views
Speaker identification embeddings audio fragment length
I have a set of audio samples, each matched to a specific speaker, like:
nick_sample1.mp3
nick_sample2.mp3
...
nick_sampleN.mp3
john_sample1.mp3
john_sample2.mp3
...
john_sampleK.mp3
The task is to match a ...
2
votes
1
answer
348
views
Is there any way to transliterate Hindi audio to English using OpenAI Whisper?
I have a task where, given an audio file, I have to perform speaker diarization on it and then transcribe it accordingly.
For speaker diarization I am using pyannote, ...
2
votes
2
answers
9k
views
RuntimeError: Library cublas64_12.dll is not found or cannot be loaded. While using WhisperX diarization
I was trying to use whisperx to do speaker diarization. I did it successfully on Google Colab, but I'm encountering this error while trying to transcribe the audio file.
Traceback (most recent call last)...
1
vote
0
answers
562
views
Whisper and pyannote 3.1 : AttributeError: 'list' object has no attribute 'get'
I'm using this script to diarize and then transcribe speech using pyannote.audio and whisper. With pyannote 2.1 it works perfectly, but when I change the version to the latest (3.1), I get ...
1
vote
1
answer
195
views
Azure Speech diarization failing to tag speakers properly until a long 7-second statement is spoken
The Azure Speech private preview for diarization previously set an "unknown" speaker tag until it recognized a long 7-second statement from a speaker; with the API in public preview, it started tagging ...
1
vote
0
answers
300
views
Why am I getting "index 0 is out of bounds for axis 0 with size 0" when using the pyAudioAnalysis library?
This question is about speaker diarization. I'm trying to make a script that separates an MP4 file into different segments depending on different speakers. (The input mp4 file contains the dialogue of ...
5
votes
1
answer
6k
views
Way to Offline Speaker Diarization with Hugging Face
I am looking for an offline, locally saved model for speaker diarization with Hugging Face that works without authentication.
I have searched Google and found no relevant links.
Is there any link/...