555 questions
0 votes, 0 answers, 55 views
TarsosDSP Pitch Detection Implementation: Sudden Pitch Drops After Note Release with FFT_YIN
Introduction
I'm building a real-time pitch detection app in Kotlin/Android using TarsosDSP. The app captures audio input, detects the fundamental frequency using the FFT_YIN algorithm, and displays ...
0 votes, 0 answers, 214 views
How to transcribe local audio File/Blob with Transformers.js pipeline? (JSON.parse error)
I'm working on a browser-based audio transcription app using Transformers.js by Xenova. I'm trying to transcribe a .wav file selected by the user using the following code:
import { pipeline } from '@...
0 votes, 0 answers, 150 views
Why is pyannote speaker diarization returning "Unknown" for speaker label in real-time audio processing?
I'm working on a real-time speech processing pipeline using pyannote-audio, and I’m using the pyannote/speaker-diarization-3.1 pipeline with Hugging Face token authentication.
My code captures live ...
2 votes, 0 answers, 106 views
Speaker Diarization
I need to upload an audio file where two or more speakers are having a conversation, and at times their speech overlaps. The requirement is to segment the audio into distinct chunks, each ...
1 vote, 0 answers, 214 views
How to Link Zoom (X-Axis) of Two Separate Plotly Plots in Streamlit?
I want to visualize audio data in Streamlit with two separate Plotly plots: one for the Time Domain waveform and one for the MFCC (Mel-frequency cepstral coefficients). I want to link their X-axes so ...
0 votes, 0 answers, 63 views
PJSIP audio has low volume at the beginning if AEC is enabled
After a call is negotiated and connected, the outgoing (tx) sound is very low and often distorted for roughly the first 5 seconds. The same happens after a long period of silence. If we disable aec (...
0 votes, 1 answer, 58 views
None Gradients for a model with 2 outputs
I have a model that has a GRU implementation inside and processes audio samples. In each forward pass I process a single sample of an audio file. To imitate the GRU behavior correctly, I have returned ...
1 vote, 0 answers, 269 views
Twilio Real-Time Media Streaming to WebSocket Receives Only Noise Instead of Speech
I'm setting up a Twilio Voice call with real-time media streaming to a WebSocket server for speech-to-text processing using Google Cloud Speech-to-Text. The connection is established successfully, and ...
0 votes, 0 answers, 322 views
Trying to Collect Audio from YouTube video without Downloading
I am trying to get just the audio data of songs from YouTube videos to analyze in Python, without downloading them. I started by using yt-dlp with the following code:
def search_youtube(song_name, ...
0 votes, 0 answers, 73 views
PyAnnote Speaker Verification: All Speakers Getting Perfect 1.000 Similarity Scores
I'm experiencing an issue with PyAnnote's speaker verification where all speakers are getting perfect similarity scores (1.000), even when they are clearly different voices.
Environment
pyannote....
1 vote, 0 answers, 100 views
Why can't I send an audio stream from JavaScript via SignalR to a .NET Hub?
I’m trying to send an audio stream captured in JavaScript from a browser tab to a .NET SignalR Hub. My goal is to stream audio in chunks in real time to the server and broadcast it to all connected ...
0 votes, 1 answer, 197 views
How to Restrict Azure Speech SDK AudioConfig to Only System Audio and Exclude Microphone Input?
Question:
I am working on a Blazor project where I integrate Azure Speech Service to perform speech-to-text transcription on system audio during screen sharing. However, I am facing an issue where ...
1 vote, 0 answers, 45 views
How do DAWs process sends/returns with busses if they are routed to themselves?
I'm looking into creating the sends/returns/busses logic for my own audio application. When checking existing DAWs (Logic Pro / Ableton Live) I noticed you can make a routing loop by sending busses/...
1 vote, 0 answers, 98 views
Trouble converting PDM audio from a Seeed Studio XIAO nRF52840 for speech transcription — only getting white noise
I'm currently working on an iOS app that uses Bluetooth to stream audio data from a Seeed Studio XIAO nRF52840 (Sense) board, which has a PDM microphone. The board is running the OMI Friend firmware, ...
0 votes, 0 answers, 54 views
How to calculate the total number of different frequencies present in an audio signal's spectrum?
How can I calculate the total number of different frequencies present in an audio signal's spectrum? Is that possible? Linux platform.