Generate WebVTT and SRT captions
This page describes how to use the Cloud Speech-to-Text API to automatically generate captions from audio files, in SRT and WebVTT formats. These formats can store the text and timing information of audio, making it possible to display subtitles or captions in sync with the media for subtitling and closed captioning.
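Both formats carry the same cue text and timing information; the main syntactic difference is the timestamp notation (SRT separates milliseconds with a comma, WebVTT with a period and a leading WEBVTT header). A minimal illustration of the two timing syntaxes, independent of the API:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    return srt_timestamp(seconds).replace(",", ".")

# The same cue rendered in each format (cue text is illustrative):
start, end, text = 1.25, 3.5, "Ask not what your country can do for you."
srt_cue = f"1\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
vtt_cue = f"WEBVTT\n\n{vtt_timestamp(start)} --> {vtt_timestamp(end)}\n{text}\n"
```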
Caption output in a request to Cloud Speech-to-Text is supported only in the
V2 API. Specifically, you can only use BatchRecognize to transcribe
long audio files. Outputs can be saved to a Cloud Storage bucket, or they can
be returned inline. For the Cloud Storage output configuration, multiple formats
can be specified at the same time; each is written to the specified bucket
with a different file extension.
Enable caption outputs in a request
To generate SRT or VTT caption outputs for your audio using Cloud Speech-to-Text, follow these steps to enable caption outputs in your transcription request:
- Make a request to the Cloud Speech-to-Text API
BatchRecognize method with the output_format_config field populated. Supported values are:
- srt, for output that follows the SRT format.
- vtt, for output that follows the WebVTT format.
- native, the default format if none is specified; results are returned as a serialized BatchRecognizeResults message.
- Since the operation is async, poll the request until it's complete.
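The steps above hinge on polling the asynchronous operation until it completes. A minimal, client-library-independent polling helper (the get_operation callable and its done/result attributes are illustrative assumptions, not part of the API):

```python
import time

def poll_until_done(get_operation, interval_s=5.0, timeout_s=600.0):
    """Poll a long-running operation until it reports done, or time out.

    get_operation is any callable returning an object with `done` (bool)
    and `result` attributes -- for example, a wrapper around the client
    library's operations.get call.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        op = get_operation()
        if op.done:
            return op.result
        time.sleep(interval_s)
    raise TimeoutError("operation did not complete in time")
```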
Multiple formats can be specified at the same time for the Cloud Storage
output configuration. They're written to the specified bucket with different
file extensions (either .json, .srt, or .vtt).
If multiple formats are specified for the inline output config, each format will
be available as a field in the BatchRecognizeFileResult.inline_result message.
The following code snippet demonstrates how to enable caption outputs in a transcription request to Cloud Speech-to-Text for an audio file stored in Cloud Storage:
API
curl -X POST \
  -H "Content-Type: application/json; charset=utf-8" \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  https://speech.googleapis.com/v2/projects/my-project/locations/global/recognizers/_:batchRecognize \
  --data '{
    "files": [{
      "uri": "gs://my-bucket/jfk_and_the_press.wav"
    }],
    "config": {
      "features": { "enableWordTimeOffsets": true },
      "autoDecodingConfig": {},
      "model": "long",
      "languageCodes": ["en-US"]
    },
    "recognitionOutputConfig": {
      "gcsOutputConfig": { "uri": "gs://my-bucket" },
      "output_format_config": { "srt": {} }
    }
  }'
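Once the operation finishes, the bucket holds an .srt file for each input. A small sketch for inspecting the cues in such a file (assumes well-formed SRT input; this parser is illustrative and not part of the Speech-to-Text client):

```python
def parse_srt(srt_text: str):
    """Parse SRT text into a list of (start, end, text) cues."""
    cues = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks (index, timing, text expected)
        start, _, end = lines[1].partition(" --> ")
        cues.append((start.strip(), end.strip(), "\n".join(lines[2:])))
    return cues
```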
What's next
- Learn how to transcribe long audio files.
- Learn how to choose the best transcription model.
- Transcribe audio files using Chirp.
- For best performance, accuracy, and other tips, see the best practices documentation.