5,357 questions
0
votes
1
answer
32
views
TTS onDone callback never fires on Samsung (Android 15) post-SpeechRecognizer, even with AUDIOFOCUS_REQUEST_GRANTED
I'm facing a very specific, reproducible bug and I've hit a wall after trying all the standard solutions. I would appreciate any insight.
I am developing a voice assistant setup flow where the app ...
0
votes
1
answer
44
views
Amazon Nova Sonic — should contentStart / contentEnd be sent once per session or once per user turn?
I'm integrating Amazon Nova Sonic (the speech-to-speech foundation model available through Amazon Bedrock) using the bidirectional streaming API.
The official Amazon Nova Sonic User Guide explains that:...
1
vote
1
answer
41
views
How to transcribe audio files (m4a/wav) on Android? Can SpeechRecognizer API be used for this?
I have an audio file (in .m4a / .wav format) stored on the Android device, and I need to transcribe the speech content from it into text.
From my understanding, the built-in SpeechRecognizer API in ...
0
votes
1
answer
34
views
Android: Google Recognizer Intent: EXTRA_PREFER_OFFLINE and API 33+
Consider this Kotlin code to init a Google speech recognizer:
recognizerIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
.apply {
putExtra(
...
0
votes
0
answers
55
views
How to handle limitations and platform differences when using expo-speech-recognition for voice input?
I’m implementing a virtual assistant in my Expo app and want to use expo-speech-recognition for voice input. I’ve read that Android and iOS handle speech recognition differently at the engine level:
...
0
votes
0
answers
58
views
How to remove the overlay STT box from the SpeechRecognizer API in Android Studio?
While making an STT app in Android Studio (Jetpack Compose), I encountered this in the SpeechRecognizer when I ran the app:
STT in app
I want to remove that so the UI looks cleaner. Is there a way ...
0
votes
0
answers
14
views
Azure Speech Service Speaker Diarization: How to Optimize Real-Time Transcription Latency (Node.js + Angular)
I'm using Azure Speech-to-Text with speaker diarization in a real-time transcription app.
Backend: Node.js (v18), using microsoft-cognitiveservices-speech-sdk and WebSocket server.
Frontend: Angular (...
0
votes
0
answers
69
views
Python TensorFlow Speech Recognition -1073741819 (0xC0000005) Error
I'm working on a speech recognition project using TensorFlow in Python. Normally, TensorFlow can only be used with a CPU or an NVIDIA GPU. I have an AMD Radeon 7600S GPU. Because of this, I installed ...
1
vote
0
answers
60
views
Voice Wake Word Not Working on Mobile Browsers Using SpeechRecognition in React
I'm building a React web app that uses the Web Speech API (SpeechRecognition) to detect a wake word (like "Hey Wiz") from the user’s microphone input.
The functionality works perfectly in ...
0
votes
0
answers
61
views
Standalone Android application with AlphaCephei (Vosk) library
I need to integrate the AlphaCephei library into my Android application.
I found a sample, but it contains two modules: one is an app with demo functionality, and the other is a model located in the ...
0
votes
1
answer
120
views
How to detect speech silence in Twilio Media Streams for real-time transcription using Deepgram?
Twilio continuously sends audio chunks every 20 milliseconds, even during periods of silence. These chunks may contain silent audio data, making it challenging to identify "real silence" by ...
0
votes
0
answers
48
views
Speech recognition model giving garbled output
I used the following GitHub repo: Speech Recognition.
But since it didn't have code to train and save the model, I looked online and added code to speech_recognition to train and save the model and ...
0
votes
0
answers
116
views
How to create a speech recognition model from scratch in Python
I am looking to create a speech recognition model from scratch without using an existing model. I have already used Whisper successfully, but I need to create a model that I can train myself whose ...
0
votes
1
answer
302
views
Why is Ollama answering every question and past question I have asked?
I am currently hosting Ollama locally on my laptop and importing it into a Python file. Every time I ask it a question, I append it to my 'messages' array. I then feed the entire 'messages' array to ...
0
votes
1
answer
86
views
In Apple's Speech framework, is SFTranscriptionSegment timing supposed to be off and speechRecognitionMetadata nil until isFinal?
I'm working in Swift/SwiftUI, running Xcode 16.3 on macOS 15.4, and I've seen this when running in the iOS simulator and in a macOS app run from Xcode. I've also seen this behaviour with 3 different ...