Evaluate models

Use the benchmarking functionality of the Cloud Speech-to-Text Console to measure the accuracy of any of the transcription models available in the Cloud Speech-to-Text V2 API.

The Cloud Speech-to-Text Console provides visual benchmarking for pre-trained and Custom Speech-to-Text models. You can inspect recognition quality by comparing word error rate (WER) metrics across multiple transcription models, which helps you decide which model best fits your application.
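
For reference, WER is the word-level edit distance between the ground-truth transcript and the model's hypothesis, normalized by the number of words in the ground truth: WER = (substitutions + deletions + insertions) / reference words. The following Python sketch illustrates the metric only; it is not part of the Console workflow, and the function name and sample strings are hypothetical.

    def word_error_rate(reference: str, hypothesis: str) -> float:
        """Compute WER = (S + D + I) / N via word-level edit distance."""
        ref_words = reference.split()
        hyp_words = hypothesis.split()
        # dp[i][j] = minimum edits to turn the first i reference words
        # into the first j hypothesis words.
        dp = [[0] * (len(hyp_words) + 1) for _ in range(len(ref_words) + 1)]
        for i in range(len(ref_words) + 1):
            dp[i][0] = i  # deletions
        for j in range(len(hyp_words) + 1):
            dp[0][j] = j  # insertions
        for i in range(1, len(ref_words) + 1):
            for j in range(1, len(hyp_words) + 1):
                substitution_cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
                dp[i][j] = min(
                    dp[i - 1][j] + 1,                      # deletion
                    dp[i][j - 1] + 1,                      # insertion
                    dp[i - 1][j - 1] + substitution_cost,  # substitution or match
                )
        return dp[len(ref_words)][len(hyp_words)] / max(len(ref_words), 1)

    # Hypothetical example: 1 substitution over 5 reference words -> WER = 0.2
    print(word_error_rate("please call stella and ask", "please call stellar and ask"))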

Before you begin

Ensure you have signed up for a Google Cloud account, created a project, trained a custom speech model, and deployed it to an endpoint.

Create a ground-truth dataset

To create a custom benchmarking dataset, gather audio samples that accurately reflect the type of traffic the transcription model will encounter in a production environment. The aggregate duration of these audio files should ideally span a minimum of 30 minutes and not exceed 10 hours. To assemble the dataset, you will need to:

  1. Create a directory in a Cloud Storage bucket of your choice to store the audio and text files for the dataset.
  2. Create a reasonably accurate ground-truth transcription for every audio file in the dataset. For each audio file (such as example_audio_1.wav), create a corresponding ground-truth text file (example_audio_1.txt). The service uses these audio-text pairings in the Cloud Storage bucket to assemble the dataset, as sketched in the example after this list.
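
The following Python sketch shows one way to stage such audio-text pairings, assuming the google-cloud-storage client library is installed and you have permission to write to the bucket; the bucket name, directory prefix, and local dataset path are placeholders.

    from pathlib import Path

    from google.cloud import storage  # assumes the google-cloud-storage package

    # Placeholders: substitute your own bucket, directory, and local dataset path.
    BUCKET_NAME = "my-benchmarking-bucket"
    DATASET_PREFIX = "benchmark-dataset/"
    LOCAL_DATASET_DIR = Path("./dataset")

    client = storage.Client()
    bucket = client.bucket(BUCKET_NAME)

    # Upload each audio file together with its ground-truth transcript,
    # for example example_audio_1.wav alongside example_audio_1.txt.
    for audio_path in LOCAL_DATASET_DIR.glob("*.wav"):
        transcript_path = audio_path.with_suffix(".txt")
        if not transcript_path.exists():
            raise FileNotFoundError(f"Missing ground truth for {audio_path.name}")
        for path in (audio_path, transcript_path):
            blob = bucket.blob(DATASET_PREFIX + path.name)
            blob.upload_from_filename(str(path))
            print(f"Uploaded gs://{BUCKET_NAME}/{blob.name}")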

Benchmark the model

To assess the accuracy of your Custom Speech-to-Text model against your benchmarking dataset, follow the Measure and improve accuracy guide.
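
Before running the full evaluation, you may want to spot-check a single audio-text pairing programmatically. The Python sketch below is an illustration under several assumptions: it uses the google-cloud-speech (V2) client library, the project ID, recognizer, model, and file names are placeholders, and synchronous recognition only accepts short audio; a WER helper such as the one sketched earlier in this page can be applied to the two printed strings.

    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    # Placeholders: substitute your project ID and, if applicable, the recognizer
    # backed by your deployed Custom Speech-to-Text model.
    PROJECT_ID = "my-project"
    RECOGNIZER = f"projects/{PROJECT_ID}/locations/global/recognizers/_"

    client = SpeechClient()
    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",  # or the model you are benchmarking
    )

    # Read one audio-text pairing from the dataset. Synchronous recognition is
    # limited to short audio; use batch recognition for longer files.
    with open("example_audio_1.wav", "rb") as f:
        audio_bytes = f.read()
    with open("example_audio_1.txt", "r", encoding="utf-8") as f:
        ground_truth = f.read().strip()

    response = client.recognize(
        request=cloud_speech.RecognizeRequest(
            recognizer=RECOGNIZER,
            config=config,
            content=audio_bytes,
        )
    )
    hypothesis = " ".join(r.alternatives[0].transcript for r in response.results)

    # Compare the two strings manually, or pass them to a WER helper like the
    # word_error_rate sketch shown earlier.
    print("Ground truth:", ground_truth)
    print("Hypothesis:  ", hypothesis)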
