Newest 'audio-processing' Questions

555 questions

0 votes

0 answers

55 views

TarsosDSP Pitch Detection Implementation: Sudden Pitch Drops After Note Release with FFT_YIN

Introduction I'm building a real-time pitch detection app in Kotlin/Android using TarsosDSP. The app captures audio input, detects the fundamental frequency using the FFT_YIN algorithm, and displays ...

Manu campano ortega

asked Jul 25, 2025 at 10:30

0 votes

0 answers

214 views

How to transcribe local audio File/Blob with Transformers.js pipeline? (JSON.parse error)

I'm working on a browser-based audio transcription app using Transformers.js by Xenova. I'm trying to transcribe a .wav file selected by the user using the following code: import { pipeline } from '@...

piyush

asked Jul 10, 2025 at 8:44

0 votes

0 answers

150 views

Why is pyannote speaker diarization returning "Unknown" for speaker label in real-time audio processing?

I'm working on a real-time speech processing pipeline using pyannote-audio, and I’m using the pyannote/speaker-diarization-3.1 pipeline with Hugging Face token authentication. My code captures live ...

Hadil Sghair

asked May 21, 2025 at 12:10

2 votes

0 answers

106 views

Speaker Diarization

I need to upload an audio file where two or more speakers are having a conversation, and at times their speech overlaps. The requirement is to segment the audio into distinct chunks, each ...

Anjali Pandey

asked May 14, 2025 at 11:33

1 vote

0 answers

214 views

How to Link Zoom (X-Axis) of Two Separate Plotly Plots in Streamlit?

I want to visualize audio data in Streamlit with two separate Plotly plots: one for the Time Domain waveform and one for the MFCC (Mel-frequency cepstral coefficients). I want to link their X-axes so ...

faith76

asked Apr 12, 2025 at 21:37

0 votes

0 answers

63 views

PJSIP audio has low volume in beginning, if aec enabled

After a call is negotiated and connected, the outgoing (tx) sound is very low and often distorted for approximately the first 5 seconds. The same happens after a long period of silence. If we disable aec (...

Serdar KÖYLÜ

asked Mar 16, 2025 at 16:50

0 votes

1 answer

58 views

None Gradients for a model with 2 outputs

I have a model that contains a GRU implementation and processes audio samples. In each forward pass I process a single sample of an audio file. To imitate the GRU behavior correctly, I have returned ...

Zahra Kokhazad

asked Feb 24, 2025 at 15:15

1 vote

0 answers

269 views

Twilio Real-Time Media Streaming to WebSocket Receives Only Noise Instead of Speech

I'm setting up a Twilio Voice call with real-time media streaming to a WebSocket server for speech-to-text processing using Google Cloud Speech-to-Text. The connection is established successfully, and ...

dannym25

asked Feb 21, 2025 at 22:02

0 votes

0 answers

322 views

Trying to Collect Audio from YouTube video without Downloading

I am trying to get just the audio data of songs from YouTube videos to analyze without downloading (Python). I started with using yt-dlp with the following code def search_youtube(song_name, ...

Anthony Reid

asked Feb 14, 2025 at 1:22

0 votes

0 answers

73 views

PyAnnote Speaker Verification: All Speakers Getting Perfect 1.000 Similarity Scores

I'm experiencing an issue with PyAnnote's speaker verification where all speakers are getting perfect similarity scores (1.000), even when they are clearly different voices. Environment pyannote....

user29588450

asked Feb 10, 2025 at 20:47

1 vote

0 answers

100 views

Why can't I send an audio stream from JavaScript via SignalR to a .NET Hub?

I’m trying to send an audio stream captured in JavaScript from a browser tab to a .NET SignalR Hub. My goal is to stream audio in chunks/realTime to the server and broadcast it to all connected ...

Levan Amashukeli

asked Jan 1, 2025 at 19:19

0 votes

1 answer

197 views

How to Restrict Azure Speech SDK AudioConfig to Only System Audio and Exclude Microphone Input?

Question: I am working on a Blazor project where I integrate Azure Speech Service to perform speech-to-text transcription on system audio during screen sharing. However, I am facing an issue where ...

Levan Amashukeli

asked Dec 19, 2024 at 19:50

1 vote

0 answers

45 views

How do DAWs process sends/returns with busses if they are routed to themselves?

I'm looking into creating the sends/returns/busses logic for my own audio application. When checking existing DAWs (Logic Pro / Ableton Live) I noticed you can make a routing loop by sending busses/...

Rene

asked Nov 9, 2024 at 22:08

1 vote

0 answers

98 views

Trouble converting PDM audio from a Seeed Studio XIAO nRF52840 for speech transcription — only getting white noise

I'm currently working on an iOS app that uses Bluetooth to stream audio data from a Seeed Studio XIAO nRF52840 (Sense) board, which has a PDM microphone. The board is running the OMI Friend firmware, ...

guillaume olivieri

asked Nov 5, 2024 at 17:16

0 votes

0 answers

54 views

How to calculate the total number of different frequencies that are present in audio signal spectrum?

How can I calculate the total number of different frequencies present in an audio signal's spectrum? Is that possible? Linux platform.

Lexx Luxx

asked Sep 14, 2024 at 11:33
