Newest 'speech-recognition' Questions

1. Home
2. Questions
3. AI Assist Labs
4. Tags
5. Challenges
6. Chat
7. Articles
8. Users
9. Companies
11. Communities for your favorite technologies. Explore all Collectives
Teams

Ask questions, find answers and collaborate at work with Stack Overflow for Teams.
Try Teams for free Explore Teams
Ask questions, find answers and collaborate at work with Stack Overflow for Teams. Explore Teams

5,357 questions

0 votes

1 answer

32 views

TTS onDone callback never fires on Samsung (Android 15) post-SpeechRecognizer, even with AUDIOFOCUS_REQUEST_GRANTED

I'm facing a very specific, reproducible bug and I've hit a wall after trying all the standard solutions. I would appreciate any insight. I am developing a voice assistant setup flow where the app ...

Andrei Babenko's user avatar

Andrei Babenko

asked Oct 4 at 12:20

0 votes

1 answer

44 views

Amazon Nova Sonic — should contentStart / contentEnd be sent once per session or once per user turn?

I'm integrating Amazon Nova Sonic (the speech-to-speech foundation model available through Amazon Bedrock) using the bidirectional streaming API The official Amazon Nova Sonic User Guide explains that:...

JJ Kam's user avatar

JJ Kam

asked Sep 28 at 22:10

1 vote

1 answer

41 views

How to transcribe audio files (m4a/wav) on Android? Can SpeechRecognizer API be used for this?

I have an audio file (in .m4a / .wav format) stored on the Android device, and I need to transcribe the speech content from it into text. From my understanding, the built-in SpeechRecognizer API in ...

Tushar raina's user avatar

Tushar raina

asked Sep 24 at 8:13

0 votes

1 answer

34 views

Android: Google Recognizer Intent: EXTRA_PREFER_OFFLINE and API 33+

Consider this Kotlin code to init a Google speech recognizer: recognizerIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH) .apply { putExtra( ...

Yanay Lehavi's user avatar

Yanay Lehavi

asked Sep 15 at 0:25

0 votes

0 answers

55 views

How to handle limitations and platform differences when using expo-speech-recognition for voice input?

I’m implementing a virtual assistant in my Expo app and want to use expo-speech-recognition for voice input. I’ve read that Android and iOS handle speech recognition differently at the engine level: ...

hoangnv_ral's user avatar

hoangnv_ral

asked Sep 11 at 10:31

0 votes

0 answers

58 views

How to remove the overlay STT box from the SpeechRecognizer API in Android Studio?

While making a STT app in Android Studio (Jetpack Compose). I encountered this in the SpeechRecognizer when I ran the app: STT in app I want to delete that so the UI looks more clean. Is there a way ...

Cold's user avatar

Cold

asked Sep 6 at 8:09

0 votes

0 answers

14 views

Azure Speech Service Speaker Diarization: How to Optimize Real-Time Transcription Latency (Node.js + Angular)

I'm using Azure Speech-to-Text with speaker diarization in a real-time transcription app. Backend: Node.js (v18), using microsoft-cognitiveservices-speech-sdk and WebSocket server. Frontend: Angular (...

SGR's user avatar

SGR

2,375

asked Sep 5 at 10:30

0 votes

0 answers

69 views

Python TensorFlow Speech Recognition -1073741819 (0xC0000005) Error

I'm working on a speech recognition project using TensorFlow in Python. Normally, TensorFlow can only be used with a CPU or an NVIDIA GPU. I have an AMD Radeon 7600S GPU. Because of this, I installed ...

Ömer Faruk Solmaz's user avatar

Ömer Faruk Solmaz

asked Aug 22 at 17:04

1 vote

0 answers

60 views

Voice Wake Word Not Working on Mobile Browsers Using SpeechRecognition in React

I'm building a React web app that uses the Web Speech API (SpeechRecognition) to detect a wake word (like "Hey Wiz") from the user’s microphone input. The functionality works perfectly in ...

Varun V's user avatar

Varun V

asked Jul 30 at 12:33

0 votes

0 answers

61 views

Standalone Android application with AlphaCephei (Vosk) library

I need to integrate the AlphaCephei library to my Android application. I found a sample but it contains two modules - one is app with demo functionality, and another one is model located in the ...

Carlos's user avatar

Carlos

asked Jul 11 at 11:51

0 votes

1 answer

120 views

How to detect speech silence in Twilio Media Streams for real-time transcription using deepgram?

Twilio continuously sends audio chunks every 20 milliseconds, even during periods of silence. These chunks may contain silent audio data, making it challenging to identify "real silence" by ...

Lahfir's user avatar

Lahfir

asked Jul 9 at 17:41

0 votes

0 answers

48 views

Speech recognition model giving garbled output

I used the following github repo: Speech Recognition. But since it didn't have code to train and save the model, I looked online and added code to speech_recognition to train and save the model and ...

FaisalShakeel's user avatar

FaisalShakeel

asked Jul 6 at 12:37

0 votes

0 answers

116 views

How to create a speech recognition model from scratch in Python

I am looking to create a speech recognition model from scratch without using an existing model. I have already used Whisper successfully but I need to create a model that I can train myself whose ...

FaisalShakeel's user avatar

FaisalShakeel

asked Jul 3 at 21:40

0 votes

1 answer

302 views

Why is Ollama answering every question and past question I have asked?

I am currently hosting Ollama locally on my laptop and importing it into a Python file. Every time I ask it a question, I append it to my 'messages' array. I then feed the entire 'messages' array to ...

Shmuck's user avatar

Shmuck

asked Jul 3 at 17:20

0 votes

1 answer

86 views

In Apple's Speech framework is SFTranscriptionSegment timing supposed to be off and speechRecognitionMetadata nil until isFinal?

I'm working in Swift/SwiftUI, running XCode 16.3 on macOS 15.4 and I've seen this when running in the iOS simulator and in a macOS app run from XCode. I've also seen this behaviour with 3 different ...

colourmebrad's user avatar

colourmebrad

asked May 22 at 12:35

15 30 50 per page

2 3 4 5

...

358 Next

CollectivesTM on Stack Overflow

TTS onDone callback never fires on Samsung (Android 15) post-SpeechRecognizer, even with AUDIOFOCUS_REQUEST_GRANTED

Amazon Nova Sonic — should contentStart / contentEnd be sent once per session or once per user turn?

How to transcribe audio files (m4a/wav) on Android? Can SpeechRecognizer API be used for this?

Android: Google Recognizer Intent: EXTRA_PREFER_OFFLINE and API 33+

How to handle limitations and platform differences when using expo-speech-recognition for voice input?

How to remove the overlay STT box from the SpeechRecognizer API in Android Studio?

Azure Speech Service Speaker Diarization: How to Optimize Real-Time Transcription Latency (Node.js + Angular)

Python TensorFlow Speech Recognition -1073741819 (0xC0000005) Error

Voice Wake Word Not Working on Mobile Browsers Using SpeechRecognition in React

Standalone Android application with AlphaCephei (Vosk) library

How to detect speech silence in Twilio Media Streams for real-time transcription using deepgram?

Speech recognition model giving garbled output

How to create a speech recognition model from scratch in Python

Why is Ollama answering every question and past question I have asked?

In Apple's Speech framework is SFTranscriptionSegment timing supposed to be off and speechRecognitionMetadata nil until isFinal?

Hot Network Questions