Enable language recognition in Cloud Speech-to-Text
This page describes how to enable language recognition for audio transcription requests sent to Cloud Speech-to-Text.
In some situations, you don't know for certain what language your audio recordings contain. For example, if you publish your service, app, or product in a country with multiple official languages, you can potentially receive audio input from users in a variety of languages. This can make specifying a single language code for transcription requests significantly more difficult.
Multiple language recognition
Cloud Speech-to-Text lets you specify a set of alternative languages that your audio data might contain. If you include a list of alternative languages in a transcription request, Cloud Speech-to-Text transcribes the audio in whichever candidate language (your primary language plus the alternatives you provide) best fits the sample, and then labels the transcription results with the predicted language code.
This feature is ideal for apps that transcribe short utterances such as voice commands or search queries. In addition to your primary language, you can list up to three alternative languages from among those that Cloud Speech-to-Text supports, for four languages total.
Even though you can specify alternative languages for your speech transcription request, you must still provide a primary language code in the languageCode field. Also, keep the number of languages you request to a bare minimum: the fewer alternative language codes you request, the more reliably Cloud Speech-to-Text selects the correct one. Specifying just a single language produces the best results.
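The constraints above (a required primary code, at most three alternatives, and as few alternatives as possible) can be enforced on the client before you send a request. The following is a minimal Python sketch; the helper name `normalize_language_codes` is hypothetical, and its choice to drop duplicate alternatives and repeats of the primary code (compared case-insensitively, as BCP-47 tags are) is an illustrative assumption rather than documented API behavior.

```python
from typing import List, Sequence, Tuple

# Documented limit: up to three alternatives in addition to the primary language.
MAX_ALTERNATIVE_LANGUAGES = 3


def normalize_language_codes(
    primary: str, alternatives: Sequence[str]
) -> Tuple[str, List[str]]:
    """Hypothetical helper: validate a primary language code and its alternatives.

    Duplicates and repeats of the primary code are dropped (compared
    case-insensitively) because they add no new candidate language.
    """
    if not primary:
        raise ValueError("A primary languageCode is always required.")
    seen = {primary.lower()}
    cleaned: List[str] = []
    for code in alternatives:
        key = code.lower()
        if key not in seen:
            seen.add(key)
            cleaned.append(code)
    if len(cleaned) > MAX_ALTERNATIVE_LANGUAGES:
        raise ValueError(
            f"At most {MAX_ALTERNATIVE_LANGUAGES} alternative languages are allowed, "
            f"got {len(cleaned)}: {cleaned}"
        )
    return primary, cleaned
```

For example, `normalize_language_codes("en-US", ["es-ES", "en-US"])` returns `("en-US", ["es-ES"])`, because listing the primary language again as an alternative would only use up one of the three slots.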
Enable language recognition in audio transcription requests
To specify alternative languages in your audio transcription, set the alternativeLanguageCodes field of the RecognitionConfig for the request to a list of language codes. Cloud STT supports alternative language codes for all speech recognition methods: speech:recognize, speech:longrunningrecognize, and streaming recognition.
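As a sketch of where the field sits, the following Python builds the JSON body for a speech:recognize request using only the standard library. The helper name `build_recognize_body` is hypothetical; the field names (`config`, `encoding`, `languageCode`, `alternativeLanguageCodes`, `audio.content`, `audio.uri`) follow the REST example on this page. Audio from a local file is sent base64-encoded in `content`, whereas audio stored in Cloud Storage is referenced through `uri`.

```python
import base64
import json
from typing import Optional, Sequence


def build_recognize_body(
    language_code: str,
    alternative_language_codes: Sequence[str],
    *,
    encoding: str = "LINEAR16",
    content: Optional[bytes] = None,
    uri: Optional[str] = None,
) -> str:
    """Hypothetical helper: return a speech:recognize request body as JSON text.

    Exactly one audio source must be given: raw bytes from a local file
    (sent base64-encoded in "content") or a gs:// URI (sent in "uri").
    """
    if (content is None) == (uri is None):
        raise ValueError("Provide exactly one of content or uri.")
    audio = (
        {"content": base64.b64encode(content).decode("ascii")}
        if content is not None
        else {"uri": uri}
    )
    body = {
        "config": {
            "encoding": encoding,
            "languageCode": language_code,
            # The alternatives are in addition to the primary languageCode.
            "alternativeLanguageCodes": list(alternative_language_codes),
        },
        "audio": audio,
    }
    return json.dumps(body)
```

The same config and audio objects also appear in speech:longrunningrecognize request bodies, so only the endpoint changes between the two methods.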
Use a local file
Protocol
Refer to the speech:recognize API endpoint for complete details.
To perform synchronous speech recognition, make a POST request and provide the appropriate request body. The following shows an example of a POST request using curl. The example uses the Google Cloud CLI to generate an access token. For instructions on installing the gcloud CLI, see the quickstart.
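The following example shows how to request transcription of an audio file that may include speech in English, French, or German. Although this section covers local files, the example references audio stored in Cloud Storage through the uri field; to send a local file instead, replace uri with a content field that contains the base64-encoded audio.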
curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    https://speech.googleapis.com/v1p1beta1/speech:recognize \
    --data '{
    "config": {
        "encoding": "LINEAR16",
        "languageCode": "en-US",
        "alternativeLanguageCodes": ["fr-FR", "de-DE"],
        "model": "command_and_search"
    },
    "audio": {
        "uri": "gs://cloud-samples-tests/speech/commercial_mono.wav"
    }
}' > multi-language.txt
If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format, saved to a file named multi-language.txt.
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "hi I'd like to buy a Chromecast I'm ...",
          "confidence": 0.9466864
        }
      ],
      "languageCode": "en-us"
    },
    {
      "alternatives": [
        {
          "transcript": " let's go with the black one",
          "confidence": 0.9829583
        }
      ],
      "languageCode": "en-us"
    }
  ]
}
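Because each result carries its own languageCode, a single response can mix languages. The following minimal Python sketch, using only the standard library, groups the top transcript of each result by detected language. The helper name `transcripts_by_language` is hypothetical, and it assumes the response shape shown above. Since the sample response reports en-us even though the request used en-US, the sketch lowercases codes before grouping.

```python
import json
from collections import defaultdict
from typing import Dict, List


def transcripts_by_language(response_json: str) -> Dict[str, List[str]]:
    """Hypothetical helper: map each detected languageCode to its top transcripts."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for result in json.loads(response_json).get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue  # Skip results that carry no transcript.
        # The first alternative is the most likely one.
        transcript = alternatives[0].get("transcript", "").strip()
        # Lowercase so that "en-US" and "en-us" land in the same bucket.
        language = result.get("languageCode", "unknown").lower()
        grouped[language].append(transcript)
    return dict(grouped)
```

For example, passing the contents of multi-language.txt to `transcripts_by_language` would return both transcripts from the sample response under the single key en-us.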
Java
To learn how to install and use the client library for Cloud STT, see Cloud STT client libraries. For more information, see the Cloud STT Java API reference documentation.
To authenticate to Cloud STT, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
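import com.google.cloud.speech.v1p1beta1.RecognitionAudio;
import com.google.cloud.speech.v1p1beta1.RecognitionConfig;
import com.google.cloud.speech.v1p1beta1.RecognitionConfig.AudioEncoding;
import com.google.cloud.speech.v1p1beta1.RecognizeResponse;
import com.google.cloud.speech.v1p1beta1.SpeechClient;
import com.google.cloud.speech.v1p1beta1.SpeechRecognitionAlternative;
import com.google.cloud.speech.v1p1beta1.SpeechRecognitionResult;
import com.google.protobuf.ByteString;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;

/**
 * Transcribe a local audio file with multi-language recognition
 *
 * @param fileName the path to the audio file
 */
public static void transcribeMultiLanguage(String fileName) throws Exception {
  Path path = Paths.get(fileName);
  // Get the contents of the local audio file
  byte[] content = Files.readAllBytes(path);

  try (SpeechClient speechClient = SpeechClient.create()) {
    RecognitionAudio recognitionAudio =
        RecognitionAudio.newBuilder().setContent(ByteString.copyFrom(content)).build();
    ArrayList<String> languageList = new ArrayList<>();
    languageList.add("es-ES");
    languageList.add("en-US");

    // Configure request to enable multiple languages
    RecognitionConfig config =
        RecognitionConfig.newBuilder()
            .setEncoding(AudioEncoding.LINEAR16)
            .setSampleRateHertz(16000)
            .setLanguageCode("ja-JP")
            .addAllAlternativeLanguageCodes(languageList)
            .build();
    // Perform the transcription request
    RecognizeResponse recognizeResponse = speechClient.recognize(config, recognitionAudio);

    // Print out the results
    for (SpeechRecognitionResult result : recognizeResponse.getResultsList()) {
      // There can be several alternative transcripts for a given chunk of speech. Just use the
      // first (most likely) one here.
      SpeechRecognitionAlternative alternative = result.getAlternatives(0);
      // Each result is labeled with the language that was detected for it.
      System.out.format("Language code : %s%n", result.getLanguageCode());
      System.out.format("Transcript : %s%n%n", alternative.getTranscript());
    }
  }
}
Node.js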
To learn how to install and use the client library for Cloud STT, see Cloud STT client libraries. For more information, see the Cloud STT Node.js API reference documentation.
To authenticate to Cloud STT, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
const fs = require('fs');

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech').v1p1beta1;

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const fileName = 'Local path to audio file, e.g. /path/to/audio.raw';

async function transcribeMultiLanguage() {
  const config = {
    encoding: 'LINEAR16',
    sampleRateHertz: 44100,
    languageCode: 'en-US',
    // Alternatives are in addition to languageCode, so 'en-US' is not repeated here.
    alternativeLanguageCodes: ['es-ES'],
  };
  const audio = {
    content: fs.readFileSync(fileName).toString('base64'),
  };
  const request = {
    config: config,
    audio: audio,
  };

  const [response] = await client.recognize(request);
  // Prefix each transcript with the language detected for that result
  const transcription = response.results
    .map(result => `${result.languageCode}: ${result.alternatives[0].transcript}`)
    .join('\n');
  console.log(`Transcription: ${transcription}`);
}

transcribeMultiLanguage().catch(console.error);
Python
To learn how to install and use the client library for Cloud STT, see Cloud STT client libraries. For more information, see the Cloud STT Python API reference documentation.
To authenticate to Cloud STT, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
from google.cloud import speech_v1p1beta1 as speech


def transcribe_file_with_multilanguage():
    """Transcribe a local audio file with multi-language recognition."""
    client = speech.SpeechClient()

    speech_file = "resources/multi.wav"
    first_lang = "en-US"
    second_lang = "es"

    with open(speech_file, "rb") as audio_file:
        content = audio_file.read()

    audio = speech.RecognitionAudio(content=content)

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=44100,
        audio_channel_count=2,
        language_code=first_lang,
        alternative_language_codes=[second_lang],
    )

    print("Sending synchronous recognition request...")
    response = client.recognize(config=config, audio=audio)

    for i, result in enumerate(response.results):
        alternative = result.alternatives[0]
        print("-" * 20)
        print(f"First alternative of result {i}: {alternative}")
        print(f"Language code: {result.language_code}")
        print(f"Transcript: {alternative.transcript}")

    return response.results
Use a remote file
Java
To learn how to install and use the client library for Cloud STT, see Cloud STT client libraries. For more information, see the Cloud STT Java API reference documentation.
To authenticate to Cloud STT, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
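import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.speech.v1p1beta1.LongRunningRecognizeMetadata;
import com.google.cloud.speech.v1p1beta1.LongRunningRecognizeResponse;
import com.google.cloud.speech.v1p1beta1.RecognitionAudio;
import com.google.cloud.speech.v1p1beta1.RecognitionConfig;
import com.google.cloud.speech.v1p1beta1.RecognitionConfig.AudioEncoding;
import com.google.cloud.speech.v1p1beta1.SpeechClient;
import com.google.cloud.speech.v1p1beta1.SpeechRecognitionAlternative;
import com.google.cloud.speech.v1p1beta1.SpeechRecognitionResult;
import java.util.ArrayList;

/**
 * Transcribe a remote audio file with multi-language recognition
 *
 * @param gcsUri the path to the remote audio file
 */
public static void transcribeMultiLanguageGcs(String gcsUri) throws Exception {
  try (SpeechClient speechClient = SpeechClient.create()) {
    ArrayList<String> languageList = new ArrayList<>();
    languageList.add("es-ES");
    languageList.add("en-US");

    // Configure request to enable multiple languages
    RecognitionConfig config =
        RecognitionConfig.newBuilder()
            .setEncoding(AudioEncoding.LINEAR16)
            .setSampleRateHertz(16000)
            .setLanguageCode("ja-JP")
            .addAllAlternativeLanguageCodes(languageList)
            .build();

    // Set the remote path for the audio file
    RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUri).build();

    // Use non-blocking call for getting file transcription
    OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata> response =
        speechClient.longRunningRecognizeAsync(config, audio);

    // Poll every 10 seconds until the long-running operation completes
    while (!response.isDone()) {
      System.out.println("Waiting for response...");
      Thread.sleep(10000);
    }

    for (SpeechRecognitionResult result : response.get().getResultsList()) {
      // There can be several alternative transcripts for a given chunk of speech. Just use the
      // first (most likely) one here.
      SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);

      // Print out the result, including the language detected for it
      System.out.printf("Language code : %s%n", result.getLanguageCode());
      System.out.printf("Transcript : %s%n%n", alternative.getTranscript());
    }
  }
}
Node.js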
To learn how to install and use the client library for Cloud STT, see Cloud STT client libraries. For more information, see the Cloud STT Node.js API reference documentation.
To authenticate to Cloud STT, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
// Imports the Google Cloud client library
const speech = require('@google-cloud/speech').v1p1beta1;

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const gcsUri = 'gs://bucket/audio.wav';

async function transcribeMultiLanguageGcs() {
  const config = {
    encoding: 'LINEAR16',
    sampleRateHertz: 44100,
    languageCode: 'en-US',
    // Alternatives are in addition to languageCode, so 'en-US' is not repeated here.
    alternativeLanguageCodes: ['es-ES'],
  };
  const audio = {
    uri: gcsUri,
  };
  const request = {
    config: config,
    audio: audio,
  };

  // Start a long-running operation and wait for it to complete
  const [operation] = await client.longRunningRecognize(request);
  const [response] = await operation.promise();
  // Prefix each transcript with the language detected for that result
  const transcription = response.results
    .map(result => `${result.languageCode}: ${result.alternatives[0].transcript}`)
    .join('\n');
  console.log(`Transcription: ${transcription}`);
}

transcribeMultiLanguageGcs().catch(console.error);
Python
To learn how to install and use the client library for Cloud STT, see Cloud STT client libraries. For more information, see the Cloud STT Python API reference documentation.
To authenticate to Cloud STT, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
from google.cloud import speech_v1p1beta1 as speech


def transcribe_file_with_multilanguage_gcs(audio_uri: str) -> str:
    """Transcribe a remote audio file with multi-language recognition.

    Args:
        audio_uri (str): The Google Cloud Storage path to an audio file.
            E.g., gs://[BUCKET]/[FILE]
    Returns:
        str: The generated transcript from the audio file provided.
    """
    client = speech.SpeechClient()

    first_language = "es-ES"
    alternate_languages = ["en-US", "fr-FR"]

    # Configure request to enable multiple languages
    recognition_config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=44100,
        language_code=first_language,
        alternative_language_codes=alternate_languages,
    )

    # Set the remote path for the audio file
    audio = speech.RecognitionAudio(uri=audio_uri)

    # Start a long-running operation and block until it completes (up to 300 seconds)
    response = client.long_running_recognize(
        config=recognition_config, audio=audio
    ).result(timeout=300)

    transcript_builder = []
    for i, result in enumerate(response.results):
        alternative = result.alternatives[0]
        transcript_builder.append("-" * 20 + "\n")
        transcript_builder.append(f"First alternative of result {i}: {alternative}\n")
        transcript_builder.append(f"Language code: {result.language_code}\n")
        transcript_builder.append(f"Transcript: {alternative.transcript}\n")

    transcript = "".join(transcript_builder)
    print(transcript)

    return transcript
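The Java sample for remote files polls isDone() every 10 seconds, while the Python sample blocks on result(timeout=300). If you call speech:longrunningrecognize over REST instead of through a client library, the response is a long-running operation that you poll until its done field is true, after which it holds either a response or an error. The sketch below shows that polling loop with a timeout. The helper name `wait_for_operation` is hypothetical, and `fetch_operation` is a hypothetical placeholder for whatever code retrieves the current operation JSON (for example, an authenticated GET on the operation's name).

```python
import time
from typing import Any, Callable, Dict


def wait_for_operation(
    fetch_operation: Callable[[], Dict[str, Any]],
    *,
    poll_interval_s: float = 10.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a long-running operation until it is done and return its "response".

    fetch_operation is a hypothetical callable that returns the operation as a
    dict; once finished, it has "done": True and either "response" or "error".
    """
    deadline = time.monotonic() + timeout_s
    while True:
        operation = fetch_operation()
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(f"Operation failed: {operation['error']}")
            return operation.get("response", {})
        if time.monotonic() >= deadline:
            raise TimeoutError("Operation did not finish before the timeout.")
        # Wait before asking again, like Thread.sleep(10000) in the Java sample.
        sleep(poll_interval_s)
```

Injecting `sleep` keeps the loop testable without real delays; in normal use the default `time.sleep` applies.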