Stack Overflow
0 votes
1 answer
32 views

TTS onDone callback never fires on Samsung (Android 15) post-SpeechRecognizer, even with AUDIOFOCUS_REQUEST_GRANTED

I'm facing a very specific, reproducible bug and I've hit a wall after trying all the standard solutions. I would appreciate any insight. I am developing a voice assistant setup flow where the app ...
0 votes
1 answer
44 views

Amazon Nova Sonic — should contentStart / contentEnd be sent once per session or once per user turn?

I'm integrating Amazon Nova Sonic (the speech-to-speech foundation model available through Amazon Bedrock) using the bidirectional streaming API. The official Amazon Nova Sonic User Guide explains that:...
1 vote
1 answer
41 views

How to transcribe audio files (m4a/wav) on Android? Can SpeechRecognizer API be used for this?

I have an audio file (in .m4a / .wav format) stored on the Android device, and I need to transcribe the speech content from it into text. From my understanding, the built-in SpeechRecognizer API in ...
0 votes
1 answer
34 views

Android: Google Recognizer Intent: EXTRA_PREFER_OFFLINE and API 33+

Consider this Kotlin code to init a Google speech recognizer: recognizerIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH) .apply { putExtra( ...
0 votes
0 answers
55 views

How to handle limitations and platform differences when using expo-speech-recognition for voice input?

I’m implementing a virtual assistant in my Expo app and want to use expo-speech-recognition for voice input. I’ve read that Android and iOS handle speech recognition differently at the engine level: ...
0 votes
0 answers
58 views

How to remove the overlay STT box from the SpeechRecognizer API in Android Studio?

While making an STT app in Android Studio (Jetpack Compose), I encountered this in the SpeechRecognizer when I ran the app: [image: STT in app]. I want to remove it so the UI looks cleaner. Is there a way ...
0 votes
0 answers
14 views

Azure Speech Service Speaker Diarization: How to Optimize Real-Time Transcription Latency (Node.js + Angular)

I'm using Azure Speech-to-Text with speaker diarization in a real-time transcription app. Backend: Node.js (v18), using microsoft-cognitiveservices-speech-sdk and WebSocket server. Frontend: Angular (...
0 votes
0 answers
69 views

Python TensorFlow Speech Recognition -1073741819 (0xC0000005) Error

I'm working on a speech recognition project using TensorFlow in Python. Normally, TensorFlow can only be used with a CPU or an NVIDIA GPU. I have an AMD Radeon 7600S GPU. Because of this, I installed ...
1 vote
0 answers
60 views

Voice Wake Word Not Working on Mobile Browsers Using SpeechRecognition in React

I'm building a React web app that uses the Web Speech API (SpeechRecognition) to detect a wake word (like "Hey Wiz") from the user’s microphone input. The functionality works perfectly in ...
0 votes
0 answers
61 views

Standalone Android application with AlphaCephei (Vosk) library

I need to integrate the AlphaCephei library into my Android application. I found a sample, but it contains two modules: one is an app with demo functionality, and the other is a model located in the ...
0 votes
1 answer
120 views

How to detect speech silence in Twilio Media Streams for real-time transcription using Deepgram?

Twilio continuously sends audio chunks every 20 milliseconds, even during periods of silence. These chunks may contain silent audio data, making it challenging to identify "real silence" by ...
0 votes
0 answers
48 views

Speech recognition model giving garbled output

I used the following GitHub repo: Speech Recognition. But since it didn't have code to train and save the model, I looked online and added code to speech_recognition to train and save the model and ...
0 votes
0 answers
116 views

How to create a speech recognition model from scratch in Python

I am looking to create a speech recognition model from scratch without using an existing model. I have already used Whisper successfully but I need to create a model that I can train myself whose ...
0 votes
1 answer
302 views

Why is Ollama answering every question and past question I have asked?

I am currently hosting Ollama locally on my laptop and importing it into a Python file. Every time I ask it a question, I append it to my 'messages' array. I then feed the entire 'messages' array to ...
0 votes
1 answer
86 views

In Apple's Speech framework is SFTranscriptionSegment timing supposed to be off and speechRecognitionMetadata nil until isFinal?

I'm working in Swift/SwiftUI, running Xcode 16.3 on macOS 15.4, and I've seen this when running in the iOS simulator and in a macOS app run from Xcode. I've also seen this behaviour with 3 different ...
