New Cloud Speech-to-Text users should use the V2 API. Read our migration guide to learn how to migrate existing projects to the latest version.

Transcribe long audio files into text

This page demonstrates how to transcribe long audio files (longer than 1 minute) to text using the Cloud Speech-to-Text API and asynchronous speech recognition.

About asynchronous speech recognition

Asynchronous speech recognition starts a long-running audio processing operation. Use asynchronous speech recognition to transcribe audio that is longer than 60 seconds. For shorter audio, synchronous speech recognition is faster and simpler. The upper limit for asynchronous speech recognition is 480 minutes.

Cloud Speech-to-Text and asynchronous processing

Audio content can be sent directly to Cloud Speech-to-Text from a local file for asynchronous processing. However, the audio time limit for local files is 60 seconds, and attempting to transcribe a longer local file results in an error. To use asynchronous speech recognition to transcribe audio longer than 60 seconds, you must save your data in a Cloud Storage bucket.
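The limits described above can be summarized as a small decision helper. The following is a minimal sketch, not part of the Cloud Speech-to-Text client library: the function name `choose_recognition_method` and its return strings are hypothetical, and the thresholds are taken from the limits stated on this page.

```python
# Hypothetical helper; the thresholds come from the limits stated on this page.
SYNC_LIMIT_SECONDS = 60          # longer audio needs asynchronous recognition
ASYNC_LIMIT_SECONDS = 480 * 60   # upper limit for asynchronous recognition


def choose_recognition_method(duration_seconds: float, in_cloud_storage: bool) -> str:
    """Return which recognition method fits audio of the given duration."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if duration_seconds > ASYNC_LIMIT_SECONDS:
        raise ValueError("audio exceeds the 480-minute asynchronous limit")
    if duration_seconds <= SYNC_LIMIT_SECONDS:
        # Short audio: synchronous recognition is faster and simpler.
        return "synchronous"
    if not in_cloud_storage:
        # Local files longer than 60 seconds result in an error.
        raise ValueError("audio longer than 60 seconds must be in Cloud Storage")
    return "asynchronous"
```

For example, `choose_recognition_method(600, True)` returns `"asynchronous"`, while the same duration with `in_cloud_storage=False` raises a `ValueError`.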

You can retrieve the results of the operation using the google.longrunning.Operations method. Results remain available for retrieval for 5 days (120 hours). You also have the option of uploading your results directly to a Cloud Storage bucket.

Transcribe long audio files using a Cloud Storage bucket

These samples use a Cloud Storage bucket to store the raw audio input for the long-running transcription process. For an example of a typical longrunningrecognize operation response, see the reference documentation.

Protocol

Refer to the speech:longrunningrecognize API endpoint for complete details.

To perform asynchronous speech recognition, make a POST request and provide the appropriate request body. The following shows an example of a POST request using curl. The example uses the Google Cloud CLI to generate an access token. For instructions on installing the gcloud CLI, see the quickstart.

curl -X POST \
  -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
  -H "Content-Type: application/json; charset=utf-8" \
  --data "{
  'config': {
    'language_code': 'en-US'
  },
  'audio': {
    'uri': 'gs://cloud-samples-tests/speech/brooklyn.flac'
  }
}" "https://speech.googleapis.com/v1/speech:longrunningrecognize"

See the RecognitionConfig and RecognitionAudio reference documentation for more information on configuring the request body.

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
 "name": "7612202767953098924"
}

Where name is the name of the long-running operation created for the request.

Wait for processing to complete. Processing time differs depending on your source audio. In most cases, results are available in about half the duration of the source audio. You can get the status of your long-running operation by making a GET request to the https://speech.googleapis.com/v1/operations/ endpoint. Replace your-operation-name with the name returned from your longrunningrecognize request.

curl -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
  -H "Content-Type: application/json; charset=utf-8" \
  "https://speech.googleapis.com/v1/operations/your-operation-name"

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "name": "7612202767953098924",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2017-07-20T16:36:55.033650Z",
    "lastUpdateTime": "2017-07-20T16:37:17.158630Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
    "results": [
      {
        "alternatives": [
          {
            "transcript": "how old is the Brooklyn Bridge",
            "confidence": 0.96096134
          }
        ]
      },
      {
        "alternatives": [
          {
            ...
          }
        ]
      }
    ]
  }
}

If the operation has not completed, you can poll the endpoint by repeatedly making the GET request until the done property of the response is true.
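The polling loop described above can be sketched as follows. This is an illustration under stated assumptions rather than an official client: `poll_until_done` and its parameters are hypothetical names, and `fetch_operation` stands for any callable that performs the GET request shown earlier and returns the parsed JSON as a dictionary.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_operation: Callable[[], dict],
    interval_seconds: float = 10.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_operation until the returned operation reports done: true."""
    for attempt in range(max_attempts):
        operation = fetch_operation()
        # The "done" property is absent or false while processing continues.
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"operation not done after {max_attempts} attempts")
```

Passing the `sleep` function as a parameter keeps the loop easy to exercise without real delays. A fixed interval is used here for simplicity.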

gcloud

Refer to the recognize-long-running command for complete details.

To perform asynchronous speech recognition, use the Google Cloud CLI, providing the path of a local file or a Cloud Storage URL.

gcloud ml speech recognize-long-running \
  'gs://cloud-samples-tests/speech/brooklyn.flac' \
  --language-code='en-US' --async

If the request is successful, the server returns the ID of the long-running operation in JSON format.

{
 "name": OPERATION_ID
}

You can then get information about the operation by running the following command.

gcloud ml speech operations describe OPERATION_ID

You can also poll the operation until it completes by running the following command.

gcloud ml speech operations wait OPERATION_ID

After the operation completes, the operation returns a transcript of the audio in JSON format.

{
  "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.9840146,
          "transcript": "how old is the Brooklyn Bridge"
        }
      ]
    }
  ]
}
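A response in this shape can be reduced to its most likely transcripts with a few lines of code. The sketch below assumes the JSON has already been captured as text; `best_transcripts` is a hypothetical helper name, and the sample data is copied from the response above.

```python
import json


def best_transcripts(response: dict) -> list:
    """Return (transcript, confidence) for the first alternative of each result."""
    pairs = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        # The first alternative is the most likely one for this portion of audio.
        top = alternatives[0]
        pairs.append((top.get("transcript", ""), top.get("confidence")))
    return pairs


sample = json.loads("""
{
  "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
  "results": [
    {"alternatives": [{"confidence": 0.9840146,
                       "transcript": "how old is the Brooklyn Bridge"}]}
  ]
}
""")
for transcript, confidence in best_transcripts(sample):
    print(f"{transcript} (confidence={confidence})")
```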

Go

To learn how to install and use the client library for Cloud STT, see Cloud STT client libraries. For more information, see the Cloud STT Go API reference documentation.

To authenticate to Cloud STT, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import (
	"context"
	"fmt"
	"io"

	speech "cloud.google.com/go/speech/apiv1"
	"cloud.google.com/go/speech/apiv1/speechpb"
)

// sendGCS transcribes the audio file at gcsURI and writes the results to w.
func sendGCS(w io.Writer, client *speech.Client, gcsURI string) error {
	ctx := context.Background()

	// Send the Cloud Storage URI of the audio file, along with its encoding
	// and sample rate information, to be transcribed.
	req := &speechpb.LongRunningRecognizeRequest{
		Config: &speechpb.RecognitionConfig{
			Encoding:        speechpb.RecognitionConfig_LINEAR16,
			SampleRateHertz: 16000,
			LanguageCode:    "en-US",
		},
		Audio: &speechpb.RecognitionAudio{
			AudioSource: &speechpb.RecognitionAudio_Uri{Uri: gcsURI},
		},
	}

	op, err := client.LongRunningRecognize(ctx, req)
	if err != nil {
		return err
	}
	// Block until the long-running operation completes.
	resp, err := op.Wait(ctx)
	if err != nil {
		return err
	}

	// Print the results.
	for _, result := range resp.Results {
		for _, alt := range result.Alternatives {
			fmt.Fprintf(w, "\"%v\" (confidence=%3f)\n", alt.Transcript, alt.Confidence)
		}
	}
	return nil
}

Java

To learn how to install and use the client library for Cloud STT, see Cloud STT client libraries. For more information, see the Cloud STT Java API reference documentation.

To authenticate to Cloud STT, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * Performs non-blocking speech recognition on a remote FLAC file and prints the transcription.
 *
 * @param gcsUri the path to the remote FLAC audio file to transcribe.
 */
public static void asyncRecognizeGcs(String gcsUri) throws Exception {
  // Configure polling algorithm
  SpeechSettings.Builder speechSettings = SpeechSettings.newBuilder();
  TimedRetryAlgorithm timedRetryAlgorithm =
      OperationTimedPollAlgorithm.create(
          RetrySettings.newBuilder()
              .setInitialRetryDelay(Duration.ofMillis(500L))
              .setRetryDelayMultiplier(1.5)
              .setMaxRetryDelay(Duration.ofMillis(5000L))
              .setInitialRpcTimeout(Duration.ZERO) // ignored
              .setRpcTimeoutMultiplier(1.0) // ignored
              .setMaxRpcTimeout(Duration.ZERO) // ignored
              .setTotalTimeout(Duration.ofHours(24L)) // set polling timeout to 24 hours
              .build());
  speechSettings.longRunningRecognizeOperationSettings().setPollingAlgorithm(timedRetryAlgorithm);

  // Instantiates a client with GOOGLE_APPLICATION_CREDENTIALS
  try (SpeechClient speech = SpeechClient.create(speechSettings.build())) {

    // Configure remote file request for FLAC
    RecognitionConfig config =
        RecognitionConfig.newBuilder()
            .setEncoding(AudioEncoding.FLAC)
            .setLanguageCode("en-US")
            .setSampleRateHertz(16000)
            .build();
    RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUri).build();

    // Use non-blocking call for getting file transcription
    OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata> response =
        speech.longRunningRecognizeAsync(config, audio);
    while (!response.isDone()) {
      System.out.println("Waiting for response...");
      Thread.sleep(10000);
    }

    List<SpeechRecognitionResult> results = response.get().getResultsList();

    for (SpeechRecognitionResult result : results) {
      // There can be several alternative transcripts for a given chunk of speech. Just use the
      // first (most likely) one here.
      SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
      System.out.printf("Transcription: %s\n", alternative.getTranscript());
    }
  }
}
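The retry settings in the Java sample describe a capped exponential polling schedule: a 500 ms initial delay, a multiplier of 1.5, and a 5000 ms maximum. The Python sketch below prints the nominal delays such a schedule would produce. It is an illustration of the arithmetic only, under the assumption that each delay is the previous one times the multiplier, capped at the maximum; it ignores any jitter or rounding the client library may apply, and `polling_delays_ms` is a hypothetical name.

```python
def polling_delays_ms(initial_ms: float, multiplier: float, max_ms: float, count: int) -> list:
    """Return the first `count` nominal delays of a capped exponential schedule."""
    delays = []
    delay = initial_ms
    for _ in range(count):
        delays.append(min(delay, max_ms))
        # Grow the delay for the next attempt, never exceeding the cap.
        delay = min(delay * multiplier, max_ms)
    return delays


# Values from the Java sample: 500 ms initial delay, 1.5 multiplier, 5000 ms cap.
print(polling_delays_ms(500, 1.5, 5000, 8))
```

Under these assumptions the delay reaches the 5000 ms cap on the seventh attempt and stays there.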

Node.js

To learn how to install and use the client library for Cloud STT, see Cloud STT client libraries. For more information, see the Cloud STT Node.js API reference documentation.

To authenticate to Cloud STT, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech');

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const gcsUri = 'gs://my-bucket/audio.raw';
// const encoding = 'Encoding of the audio file, e.g. LINEAR16';
// const sampleRateHertz = 16000;
// const languageCode = 'BCP-47 language code, e.g. en-US';

const config = {
  encoding: encoding,
  sampleRateHertz: sampleRateHertz,
  languageCode: languageCode,
};

const audio = {
  uri: gcsUri,
};

const request = {
  config: config,
  audio: audio,
};

// Detects speech in the audio file. This creates a recognition job that you
// can wait for now, or get its result later.
const [operation] = await client.longRunningRecognize(request);
// Get a Promise representation of the final result of the job
const [response] = await operation.promise();
const transcription = response.results
  .map(result => result.alternatives[0].transcript)
  .join('\n');
console.log(`Transcription: ${transcription}`);

Python

To learn how to install and use the client library for Cloud STT, see Cloud STT client libraries. For more information, see the Cloud STT Python API reference documentation.

To authenticate to Cloud STT, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.cloud import speech


def transcribe_gcs(gcs_uri: str) -> str:
    """Asynchronously transcribes the audio file from Cloud Storage
    Args:
        gcs_uri: The Google Cloud Storage path to an audio file.
            E.g., "gs://storage-bucket/file.flac".
    Returns:
        The generated transcript from the audio file provided.
    """
    client = speech.SpeechClient()

    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=44100,
        language_code="en-US",
    )

    operation = client.long_running_recognize(config=config, audio=audio)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=90)

    transcript_builder = []
    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        transcript_builder.append(f"\nTranscript: {result.alternatives[0].transcript}")
        transcript_builder.append(f"\nConfidence: {result.alternatives[0].confidence}")

    transcript = "".join(transcript_builder)
    print(transcript)

    return transcript

Additional languages

C#: Please follow the C# setup instructions on the client libraries page and then visit the Cloud STT reference documentation for .NET.

PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Cloud STT reference documentation for PHP.

Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Cloud STT reference documentation for Ruby.

Upload your transcription results to a Cloud Storage bucket

Cloud Speech-to-Text supports uploading your long-running recognition results directly to a Cloud Storage bucket. If you implement this feature with Cloud Storage Triggers, Cloud Storage uploads can trigger notifications that call Cloud Functions, which removes the need to poll Cloud Speech-to-Text for recognition results.

To have your results uploaded to a Cloud Storage bucket, provide the optional TranscriptOutputConfig output configuration in your long-running recognition request.

message TranscriptOutputConfig {
  oneof output_type {
    // Specifies a Cloud Storage URI for the recognition results. Must be
    // specified in the format: `gs://bucket_name/object_name`
    string gcs_uri = 1;
  }
}

Protocol

Refer to the longrunningrecognize API endpoint for complete details.

The following example shows how to send a POST request using curl, where the body of the request specifies the path to a Cloud Storage bucket. The results are uploaded to this location as a JSON file that stores SpeechRecognitionResult.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data "{
  'config': {...},
  'output_config': {
    'gcs_uri': 'gs://bucket/result-output-path.json'
  },
  'audio': {
    'uri': 'gs://bucket/audio-path'
  }
}" "https://speech.googleapis.com/v1/speech:longrunningrecognize"

The LongRunningRecognizeResponse includes the path to the Cloud Storage bucket where the upload was attempted. If the upload was unsuccessful, an output error will be returned. If a file with the same name already exists, the upload writes the results to a new file with a timestamp as the suffix.

{
  ...
  "metadata": {
    ...
    "outputConfig": {...}
  },
  ...
  "response": {
    ...
    "results": [...],
    "outputConfig": {
      "gcs_uri": "gs://bucket/result-output-path"
    },
    "outputError": {...}
  }
}
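A request body with an output configuration, like the one in the curl example in this section, can also be assembled programmatically. The following sketch uses only the Python standard library and is not part of any client library: `build_request_body` is a hypothetical helper, and because the `config` object is elided in the example, the single `language_code` field used here is an assumption borrowed from the earlier request on this page.

```python
import json


def build_request_body(audio_uri: str, output_uri: str, language_code: str = "en-US") -> str:
    """Serialize a longrunningrecognize request body that includes output_config."""
    for uri in (audio_uri, output_uri):
        # Both locations must look like gs://bucket_name/object_name.
        if not uri.startswith("gs://") or "/" not in uri[len("gs://"):]:
            raise ValueError(f"expected gs://bucket_name/object_name, got {uri!r}")
    body = {
        "config": {"language_code": language_code},  # assumed minimal config
        "audio": {"uri": audio_uri},
        "output_config": {"gcs_uri": output_uri},
    }
    return json.dumps(body, indent=2)


print(build_request_body("gs://bucket/audio-path", "gs://bucket/result-output-path.json"))
```

The resulting string can be passed as the `--data` argument of the POST request. The URI check is a simple format guard and does not verify that the bucket or object exists.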

Try it for yourself

If you're new to Google Cloud, create an account to evaluate how Cloud STT performs in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

