Bidirectional API

The BidiStreamingAnalyzeContent API is the primary API for next-generation audio and multi-modal experiences in both Conversational Agents and Agent Assist. You stream audio data to the API, and it returns transcriptions or human agent suggestions.

Compared to previous APIs, BidiStreamingAnalyzeContent offers a simplified audio configuration, optimized support for human-to-human conversations, and an extended deadline limit of 15 minutes. It also supports all the Agent Assist features that StreamingAnalyzeContent supports, except live translation.

Streaming basics

The stream works as follows.

Start a stream by sending an audio configuration to the server. Then send audio data, and the server returns transcripts or suggestions for a human agent. Send more audio data to receive more transcripts and suggestions. This exchange continues until you end it by half closing the stream.
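
As a minimal sketch, assuming a default-authenticated environment, the exchange has the following shape. The participant name and audio chunks are placeholders; the full runnable sample appears later on this page.

from google.cloud import dialogflow_v2beta1

# Placeholders: a real participant resource name and raw audio byte strings.
PARTICIPANT_NAME = "projects/PROJECT_ID/conversations/CONVERSATION_ID/participants/PARTICIPANT_ID"
AUDIO_CHUNKS = [b"..."]

def requests():
    # First request: the audio configuration.
    yield dialogflow_v2beta1.BidiStreamingAnalyzeContentRequest(
        config={"participant": PARTICIPANT_NAME}
    )
    # Subsequent requests: audio data.
    for chunk in AUDIO_CHUNKS:
        yield dialogflow_v2beta1.BidiStreamingAnalyzeContentRequest(
            input={"audio": chunk}
        )
    # Exhausting this generator half closes the stream.

client = dialogflow_v2beta1.ParticipantsClient()
for response in client.bidi_streaming_analyze_content(requests=requests()):
    print(response.recognition_result.transcript)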

Streaming guide

To use the BidiStreamingAnalyzeContent API at conversation runtime, follow these guidelines.

  1. Call the BidiStreamingAnalyzeContent method and set the following fields:
    • BidiStreamingAnalyzeContentRequest.participant
    • (Optional) BidiStreamingAnalyzeContentRequest.voice_session_config.input_audio_sample_rate_hertz (When specified, this overrides the configuration from ConversationProfile.stt_config.sample_rate_hertz.)
    • (Optional) BidiStreamingAnalyzeContentRequest.voice_session_config.input_audio_encoding (When specified, this overrides the configuration from ConversationProfile.stt_config.audio_encoding.)
  2. Prepare the stream and set your audio configuration with your first BidiStreamingAnalyzeContent request.
  3. In subsequent requests, send audio bytes to the stream through BidiStreamingAnalyzeContentRequest.audio.
  4. After you send the second request with an audio payload, you should start receiving BidiStreamingAnalyzeContentResponse messages from the stream.
    • Intermediate and final transcription results are available in BidiStreamingAnalyzeContentResponse.recognition_result.
    • Human agent suggestions and processed conversation messages are available in BidiStreamingAnalyzeContentResponse.analyze_content_response.
  5. You can half close the stream at any time. After you half close the stream, the server sends back responses containing the remaining recognition results, along with any Agent Assist suggestions.
  6. Start or restart a new stream in the following cases:
    • The stream is broken. For example, the stream stopped when it wasn't supposed to.
    • Your conversation is approaching the maximum stream duration of 15 minutes.
  7. For best quality when you restart a stream, send audio data generated after the last speech_end_offset of the BidiStreamingAnalyzeContentResponse.recognition_result that has is_final=true, as illustrated in the sketch after this list.
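
A minimal sketch of that response loop, assuming bidi_responses is the stream's response iterator; handle_suggestions is a hypothetical placeholder for your own suggestion handling.

# Track the restart position while consuming responses.
last_final_end_offset = None

for response in bidi_responses:
    result = response.recognition_result
    if result and result.is_final:
        # A restarted stream should send audio generated after this offset.
        last_final_end_offset = result.speech_end_offset
    if response.analyze_content_response:
        handle_suggestions(response.analyze_content_response)  # hypothetical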

Use the API through the Python client library

Client libraries help you access Google APIs from a particular programming language. You can use the Python client library for Agent Assist with BidiStreamingAnalyzeContent as follows.

from google.cloud import dialogflow_v2beta1
from google.api_core.client_options import ClientOptions
from google.cloud import storage
import time
import google.auth
import participant_management
import conversation_management

PROJECT_ID = "your-project-id"
CONVERSATION_PROFILE_ID = "your-conversation-profile-id"
BUCKET_NAME = "your-audio-bucket-name"
SAMPLE_RATE = 48000
# Calculate the bytes per second: sample_rate_hertz * bit depth / 8.
# 48000 (samples/second) * 16 (bits/sample) / 8 = 96000 bytes per second.
# 96000 / 10 = 9600, so each request sends 0.1 second of audio to the stream API.
POINT_ONE_SECOND_IN_BYTES = 9600
FOLDER_PATH_FOR_CUSTOMER_AUDIO = "your-customer-audio-files-path"
FOLDER_PATH_FOR_AGENT_AUDIO = "your-agent-audio-files-path"

client_options = ClientOptions(api_endpoint="dialogflow.googleapis.com")
credentials, _ = google.auth.default(
    scopes=[
        "https://www.googleapis.com/auth/cloud-platform",
        "https://www.googleapis.com/auth/dialogflow",
    ]
)
storage_client = storage.Client(credentials=credentials, project=PROJECT_ID)
participant_client = dialogflow_v2beta1.ParticipantsClient(
    client_options=client_options, credentials=credentials
)


def download_blob(bucket_name, folder_path, audio_array: list):
    """Downloads audio files from the bucket."""
    bucket = storage_client.bucket(bucket_name, user_project=PROJECT_ID)
    blobs = bucket.list_blobs(prefix=folder_path)
    for blob in blobs:
        if not blob.name.endswith("/"):
            audio_array.append(blob.download_as_string())


def request_iterator(participant: dialogflow_v2beta1.Participant, audios):
    """Generates the requests for BidiStreamingAnalyzeContent."""
    # The first request carries the audio configuration.
    yield dialogflow_v2beta1.BidiStreamingAnalyzeContentRequest(
        config={
            "participant": participant.name,
            "voice_session_config": {
                "input_audio_encoding": dialogflow_v2beta1.AudioEncoding.AUDIO_ENCODING_LINEAR_16,
                "input_audio_sample_rate_hertz": SAMPLE_RATE,
            },
        }
    )
    print(f"participant {participant}")
    # Subsequent requests carry the audio payload in 0.1-second chunks.
    for i in range(0, len(audios)):
        audios_array = audio_request_iterator(audios[i])
        for chunk in audios_array:
            if not chunk:
                break
            yield dialogflow_v2beta1.BidiStreamingAnalyzeContentRequest(
                input={"audio": chunk},
            )
            time.sleep(0.1)
        time.sleep(0.1)


def participant_bidi_streaming_analyze_content(participant, audios):
    """Calls the BidiStreamingAnalyzeContent API."""
    bidi_responses = participant_client.bidi_streaming_analyze_content(
        requests=request_iterator(participant, audios)
    )
    for response in bidi_responses:
        bidi_streaming_analyze_content_response_handler(response)


def bidi_streaming_analyze_content_response_handler(
    response: dialogflow_v2beta1.BidiStreamingAnalyzeContentResponse,
):
    """Handles BidiStreamingAnalyzeContent responses."""
    if response.recognition_result:
        print(f"Recognition result: {response.recognition_result.transcript}")


def audio_request_iterator(audio):
    """Splits one audio file into 0.1-second chunks."""
    total_audio_length = len(audio)
    print(f"total audio length {total_audio_length}")
    array = []
    for i in range(0, total_audio_length, POINT_ONE_SECOND_IN_BYTES):
        chunk = audio[i : i + POINT_ONE_SECOND_IN_BYTES]
        array.append(chunk)
        if not chunk:
            break
    return array


def python_client_handler():
    """Downloads audio files from the Cloud Storage bucket and streams them
    to the BidiStreamingAnalyzeContent API."""
    print("Start streaming")
    conversation = conversation_management.create_conversation(
        project_id=PROJECT_ID, conversation_profile_id=CONVERSATION_PROFILE_ID
    )
    conversation_id = conversation.name.split("conversations/")[1].rstrip()
    human_agent = participant_management.create_participant(
        project_id=PROJECT_ID, conversation_id=conversation_id, role="HUMAN_AGENT"
    )
    end_user = participant_management.create_participant(
        project_id=PROJECT_ID, conversation_id=conversation_id, role="END_USER"
    )
    end_user_requests = []
    agent_requests = []
    download_blob(BUCKET_NAME, FOLDER_PATH_FOR_CUSTOMER_AUDIO, end_user_requests)
    download_blob(BUCKET_NAME, FOLDER_PATH_FOR_AGENT_AUDIO, agent_requests)
    participant_bidi_streaming_analyze_content(human_agent, agent_requests)
    participant_bidi_streaming_analyze_content(end_user, end_user_requests)
    conversation_management.complete_conversation(PROJECT_ID, conversation_id)
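
The sample depends on the conversation_management and participant_management helper modules imported at the top, which create and complete the conversation and its participants. To run it end to end, call the handler directly:

if __name__ == "__main__":
    python_client_handler()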

Enable for telephony SipRec integration

You can enable telephony SipRec integration to use BidiStreamingAnalyzeContent for audio processing. Configure your audio processing either with the Agent Assist console or a direct API request.

Console

Follow these steps to configure your audio processing to use BidiStreamingAnalyzeContent.

  1. Go to the Agent Assist console and select your project.

  2. Click Conversation Profiles > the name of a profile.

  3. Navigate to Telephony settings.

  4. Click to enable Use Bidirectional Streaming API > Save.

API

You can call the API directly to create or update a conversation profile, setting the ConversationProfile.use_bidi_streaming flag.

Example configuration:

{
  "name": "projects/PROJECT_ID/locations/global/conversationProfiles/CONVERSATION_PROFILE_ID",
  "displayName": "CONVERSATION_PROFILE_NAME",
  "automatedAgentConfig": {},
  "humanAgentAssistantConfig": {
    "notificationConfig": {
      "topic": "projects/PROJECT_ID/topics/FEATURE_SUGGESTION_TOPIC_ID",
      "messageFormat": "JSON"
    }
  },
  "useBidiStreaming": true,
  "languageCode": "en-US"
}
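
As a sketch, assuming the v2beta1 Python client library, you can set the same flag with an update call like the following; the conversation profile name is a placeholder.

from google.cloud import dialogflow_v2beta1
from google.protobuf import field_mask_pb2

# Enable bidirectional streaming on an existing conversation profile.
profiles_client = dialogflow_v2beta1.ConversationProfilesClient()

profile = dialogflow_v2beta1.ConversationProfile(
    name="projects/PROJECT_ID/locations/global/conversationProfiles/CONVERSATION_PROFILE_ID",
    use_bidi_streaming=True,
)
profiles_client.update_conversation_profile(
    conversation_profile=profile,
    update_mask=field_mask_pb2.FieldMask(paths=["use_bidi_streaming"]),
)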

Quotas

The number of concurrent BidiStreamingAnalyzeContent requests is limited by a new quota, ConcurrentBidiStreamingSessionsPerProjectPerRegion. See the Google Cloud quotas guide for information about quota usage and how to request a quota limit increase.

For quota purposes, BidiStreamingAnalyzeContent requests sent to the global Dialogflow endpoint count against the us-central1 region.
