SpeechRecognize
SpeechRecognize[audio]
recognizes speech in audio and returns it as a string.
SpeechRecognize[audio, level]
returns a list of strings at the specified structural level.
SpeechRecognize[audio, level, prop]
returns prop for text at the given level.
Details and Options
- Speech recognition aims to convert a spoken audio signal to text. It is also known as speech-to-text and is typically used in voice-enabled human-machine interactions and digital personal assistants.
- SpeechRecognize[audio] returns all recognized speech in audio as a single string.
- Structural elements specified in level include:
  - Automatic: speech found in the whole audio signal (default)
  - "Segment": a list of transcription segments
  - "Sentence": a list of sentences
  - "Word": a list of words
- The property prop can be one of the following:
  - "Audio": trimmed audio containing the recognized text
  - "Confidence": strength of the recognized text
  - "Interval": interval containing the text
  - "SubtitleRules": a list of time intervals and texts
  - "Text": recognized text (default)
  - {prop1,prop2,…}: a list of properties
- The following options can be given:
  - TargetDevice (default "CPU"): the device on which to perform recognition
- Use Language->lang1->lang2 to recognize speech assumed to be in language lang1 and return translated text in language lang2.
- By default, speech in the whole signal is recognized. Use Masking->{int1,int2,…} to limit the recognition to the intervals int1, int2, ….
- Possible settings for Method are:
  - Automatic: automatic method
  - "GoogleSpeech": uses Google speech-to-text
  - "NeuralNetwork": uses built-in neural networks
  - "OpenAI": uses OpenAI speech-to-text
- By default, if a method returns non-speech tokens (e.g. [applause]), they are kept in the result. Use Method->{method,"NonSpeechReplacement"->replacements} to specify different replacements. Use "NonSpeechReplacement"->"" to remove them.
- SpeechRecognize works for English speech as well as various other languages, such as Chinese, Dutch, French, Japanese and Spanish.
- SpeechRecognize uses machine learning. Its methods, training sets and biases included therein may change and yield varied results in different versions of the Wolfram Language.
- SpeechRecognize may download resources that will be stored in your local object store at $LocalBase. They can be listed using LocalObjects[] and removed using ResourceRemove.
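The option forms described in the list above can be sketched as follows. This is an illustration under stated assumptions, not an excerpt from the original examples: the file name speech-fr.wav is hypothetical, the language names are assumed to be given as strings, and the "GPU" setting assumes a supported GPU is available.

```wolfram
(* Hypothetical file containing French speech; replace with your own audio *)
frenchAudio = Import["speech-fr.wav"];

(* Recognize speech assumed to be French and return English text,
   following the Language->lang1->lang2 form; string names are an assumption *)
SpeechRecognize[frenchAudio, Language -> "French" -> "English"]

(* Bundled sample recording, used here as a stand-in for arbitrary speech *)
audio = ExampleData[{"Audio", "MaleVoice"}];

(* Remove non-speech tokens such as [applause] from the result *)
SpeechRecognize[audio, Method -> {"NeuralNetwork", "NonSpeechReplacement" -> ""}]

(* Run recognition on a GPU instead of the default "CPU";
   assumes a supported GPU is present *)
SpeechRecognize[audio, TargetDevice -> "GPU"]
```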
Examples
Basic Examples (2)
Recognize speech in an audio signal:
Recognize speech from a recording:
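The original inputs for these two examples did not survive extraction. A minimal sketch of what such calls might look like, assuming a synthesized utterance for the first case and a bundled sample recording as a stand-in for the second:

```wolfram
(* Synthesize a short utterance so the example needs no external file *)
signal = SpeechSynthesize["The quick brown fox jumps over the lazy dog."];

(* Recognize all speech in the signal and return it as a single string *)
SpeechRecognize[signal]

(* A bundled sample recording, standing in for a recording of your own *)
recording = ExampleData[{"Audio", "MaleVoice"}];
SpeechRecognize[recording]
```

Recognition is performed by a machine learning model, so the returned text may differ slightly from the synthesized sentence.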
Scope (4)
Basic Uses (2)
Recognize speech in a short audio track:
Recognize speech in an audio track of a video file:
Recognize speech in a non-English language:
Classify the language from the recognized text:
Classify the language from the original audio:
Level Specification (1)
By default, all recognized text is returned as one string:
Extract a list of recognized sentences:
Extract a list of words:
Extract a list of segments, typically used for splitting text for subtitles:
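The level specifications above might be exercised as in the following sketch. The sample recording is an assumption chosen for illustration; any Audio object containing speech should work the same way.

```wolfram
(* Bundled sample recording used as illustrative input *)
audio = ExampleData[{"Audio", "MaleVoice"}];

(* Default level: all recognized text as one string *)
SpeechRecognize[audio]

(* A list of recognized sentences *)
SpeechRecognize[audio, "Sentence"]

(* A list of words *)
SpeechRecognize[audio, "Word"]

(* A list of transcription segments, e.g. for subtitle splitting *)
SpeechRecognize[audio, "Segment"]
```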
Properties (1)
By default, recognized speech is returned as a string or as lists of strings:
Return the speech interval, corresponding chunk of the audio and recognition strength:
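A sketch of a property request matching the caption above, again assuming the bundled sample recording as input. The property names are those listed under Details and Options.

```wolfram
(* Bundled sample recording used as illustrative input *)
audio = ExampleData[{"Audio", "MaleVoice"}];

(* Default property: the recognized text for each sentence *)
SpeechRecognize[audio, "Sentence"]

(* For each sentence, request the time interval, the matching audio chunk
   and the recognition strength *)
SpeechRecognize[audio, "Sentence", {"Interval", "Audio", "Confidence"}]
```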
Options (3)
Masking (1)
Use the Masking option to recognize parts of a signal:
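A minimal sketch of the Masking option. It assumes that each interval is written as a {start, end} pair in seconds; the specific times are arbitrary values chosen for illustration.

```wolfram
(* Bundled sample recording used as illustrative input *)
audio = ExampleData[{"Audio", "MaleVoice"}];

(* Recognize only the first 1.5 seconds of the signal;
   the {start, end} form in seconds is an assumption *)
SpeechRecognize[audio, Masking -> {{0, 1.5}}]
```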
Method (1)
By default, a local model is used for speech recognition:
Use OpenAI speech recognition:
Use GoogleSpeech speech recognition:
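The three method settings above could be selected as sketched here. The external services presumably require an account and configured credentials, which this sketch does not set up.

```wolfram
(* Bundled sample recording used as illustrative input *)
audio = ExampleData[{"Audio", "MaleVoice"}];

(* Local recognition with the built-in neural networks *)
SpeechRecognize[audio, Method -> "NeuralNetwork"]

(* External services; assumes access to each service is already configured *)
SpeechRecognize[audio, Method -> "OpenAI"]
SpeechRecognize[audio, Method -> "GoogleSpeech"]
```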
PerformanceGoal (1)
By default, a medium-speed model with moderate quality is used:
Get the result fast:
Get the higher-quality result:
A balanced speed and quality result:
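One way to compare performance goals is to time each call, as in this sketch. The "Speed" and "Quality" values are the standard PerformanceGoal settings and are assumed to apply here; the setting used for the balanced case is not shown because the original input was lost.

```wolfram
(* Bundled sample recording used as illustrative input *)
audio = ExampleData[{"Audio", "MaleVoice"}];

(* Favor speed; AbsoluteTiming returns {seconds, result} *)
AbsoluteTiming[SpeechRecognize[audio, PerformanceGoal -> "Speed"]]

(* Favor recognition quality *)
AbsoluteTiming[SpeechRecognize[audio, PerformanceGoal -> "Quality"]]
```

The first run with a given setting may include time spent downloading model resources, so a second run gives a fairer comparison.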
Applications (4)
Use AudioIntervals to select which parts of the signal to recognize:
Interpret a spoken city:
Show the recognized city on the map:
Find the answer from a spoken question in a text:
Build an automatic assistant based on Wolfram|Alpha:
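Two of the applications above can be sketched as follows. The inputs are assumptions: a bundled sample recording for the masking case and a synthesized city name for the interpretation case. The recognized text may carry punctuation or spelling that Interpreter does not accept, in which case it returns a Failure object and the map step should be skipped.

```wolfram
(* Restrict recognition to the audible intervals of a recording *)
audio = ExampleData[{"Audio", "MaleVoice"}];
intervals = AudioIntervals[audio];
SpeechRecognize[audio, Masking -> intervals]

(* Interpret a spoken city name; the utterance is synthesized for illustration *)
spoken = SpeechSynthesize["Chicago"];
city = Interpreter["City"][SpeechRecognize[spoken]];

(* Show the recognized city on a map, only if interpretation succeeded *)
If[! FailureQ[city], GeoGraphics[GeoMarker[city]]]
```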
Wolfram Research (2019), SpeechRecognize, Wolfram Language function, https://reference.wolfram.com/language/ref/SpeechRecognize.html (updated 2024).