Using speech-to-speech translation

The speech-to-speech translation feature uses AI to interpret language, enabling conversations between individuals and systems that speak different languages. Your application can use this feature to process an audio stream containing speech in one language and translate it into another language in real time.

Unlike other Live API features that support turn-based conversations, speech-to-speech translation continuously processes audio input and streams the following outputs as they become available:

  • Transcription: The recognized text from the input audio stream in the original language.
  • Translation: The translated text in the target language.
  • Synthesized audio: An audio stream of the translated text spoken in the target language that matches the original speaker's voice.
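
As a rough illustration of how these three outputs arrive together, the following sketch sorts one `serverContent` payload into transcription, translation, and audio. It assumes the camelCase field names used in the WebSocket example on this page (`inputTranscription`, `outputTranscription`, and `modelTurn` with base64 `inlineData` parts). The helper name `split_outputs` is hypothetical, and the sketch is not an official client.

```python
import base64
from typing import Any


def split_outputs(server_content: dict[str, Any]) -> dict[str, Any]:
    """Sort one serverContent payload into the three output streams.

    Hypothetical helper; field names follow the WebSocket example on this page.
    """
    transcription = server_content.get("inputTranscription", {}).get("text")
    translation = server_content.get("outputTranscription", {}).get("text")
    audio_chunks = [
        base64.b64decode(part["inlineData"]["data"])
        for part in server_content.get("modelTurn", {}).get("parts", [])
        if "inlineData" in part
    ]
    return {
        "transcription": transcription,  # text in the original language
        "translation": translation,  # text in the target language
        "audio": audio_chunks,  # raw PCM bytes of the synthesized speech
    }


# A hand-built payload standing in for one server message.
example = {
    "inputTranscription": {"text": "Hello"},
    "outputTranscription": {"text": "你好"},
    "modelTurn": {"parts": [{"inlineData": {"data": base64.b64encode(b"\x00\x01").decode()}}]},
}
print(split_outputs(example))
```

Any of the three keys can be missing from a given message, because the outputs are streamed as they become available; the helper returns `None` or an empty list in that case.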

Supported models

You can use speech-to-speech translation with the following model:

Model version                      Availability level
gemini-2.5-flash-s2st-exp-11-2025  Private experimental

Input audio requirements

Speech-to-speech translation supports only audio input. For information on supported audio formats, codecs, and specifications such as sample rate, see Supported audio formats.
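
The WebSocket example on this page sends audio with the `audio/pcm` MIME type, so a WAV recording needs its header removed first. The following sketch uses Python's standard `wave` module to extract raw 16-bit frames and split them into base64-encoded `realtime_input` messages shaped like the one in that example. The 16-bit mono requirement, the 16 kHz default rate, and the 100 ms chunk size are assumptions for illustration; confirm the actual values in Supported audio formats. The function names are hypothetical.

```python
import base64
import io
import json
import wave


def wav_to_pcm(wav_bytes: bytes, expected_rate: int = 16000) -> bytes:
    """Return the raw PCM frames of a WAV file, assuming 16-bit mono input."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        if wav.getframerate() != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz, got {wav.getframerate()} Hz")
        return wav.readframes(wav.getnframes())


def realtime_input_messages(pcm: bytes, rate: int = 16000, chunk_ms: int = 100) -> list[str]:
    """Split PCM audio into JSON realtime_input messages of about chunk_ms each."""
    chunk_bytes = rate * 2 * chunk_ms // 1000  # 2 bytes per 16-bit mono sample
    messages = []
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        messages.append(json.dumps({
            "realtime_input": {
                "audio": {
                    "mime_type": "audio/pcm",
                    "data": base64.b64encode(chunk).decode("utf-8"),
                }
            }
        }))
    return messages


# Build a 0.25-second silent 16 kHz WAV file in memory to try the helpers.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 4000)

pcm = wav_to_pcm(buffer.getvalue())
messages = realtime_input_messages(pcm)
print(len(pcm), len(messages))  # 8000 3
```

Sending several short messages rather than one large one fits the continuous, streaming nature of the feature, but whether the service prefers a particular chunk size is not stated here.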

Use speech-to-speech translation

To use speech-to-speech translation, see the following code examples:

Gen AI SDK for Python

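from google import genai
from google.genai.types import AudioTranscriptionConfig, Blob, LiveConnectConfig, SpeechConfig

# PROJECT_ID, LOCATION, MODEL_ID, and pcm_data (raw PCM audio bytes) are assumed to be defined.
client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
# Set language_code to your desired language, in this case, Mandarin Chinese.
speech_config = SpeechConfig(language_code="cmn")
# Note: the WebSocket example also sets enable_speech_to_speech_translation in its setup message.
config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    speech_config=speech_config,
    input_audio_transcription=AudioTranscriptionConfig(),
    output_audio_transcription=AudioTranscriptionConfig(),
)
# Open a Live API session, send the audio, and print both transcriptions.
async with client.aio.live.connect(model=MODEL_ID, config=config) as session:
    await session.send_realtime_input(audio=Blob(data=pcm_data, mime_type="audio/pcm"))
    async for message in session.receive():
        content = message.server_content
        if content and content.input_transcription:
            print("Input>", content.input_transcription.text)
        if content and content.output_transcription:
            print("Response>", content.output_transcription.text)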

WebSockets

import asyncio
import base64
import json

import numpy as np
import websockets
from IPython.display import Audio, Markdown, display
from websockets.asyncio.client import connect

# SERVICE_URL, MODEL, bearer_token, and wav_data are assumed to be defined earlier.
# Set model generation_config
CONFIG = {
    "response_modalities": ["AUDIO"],
    "speech_config": {
        "language_code": "cmn",
    },
}
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {bearer_token[0]}",
}
# Connect to the server
async with connect(SERVICE_URL, additional_headers=headers) as ws:
    # Set up the session
    await ws.send(
        json.dumps(
            {
                "setup": {
                    "model": MODEL,
                    "generation_config": CONFIG,
                    "input_audio_transcription": {},
                    "output_audio_transcription": {},
                    "enable_speech_to_speech_translation": True,
                }
            }
        )
    )
    # Receive setup response
    raw_response = await ws.recv(decode=False)
    setup_response = json.loads(raw_response.decode("ascii"))
    print(setup_response)
    # Send the input audio; wav_data should hold raw PCM bytes to match "audio/pcm".
    msg = {
        "realtime_input": {
            "audio": {
                "mime_type": "audio/pcm",
                "data": base64.b64encode(wav_data).decode("utf-8"),
            }
        }
    }
    await ws.send(json.dumps(msg))
    overall_responses = []
    timeout_seconds = 10  # Stop waiting after 10 seconds without a server message
    # Receive chunks of server response with a timeout
    try:
        while True:
            try:
                raw_response = await asyncio.wait_for(ws.recv(decode=False), timeout_seconds)
                response = json.loads(raw_response.decode())
                server_content = response.pop("serverContent", None)
                if server_content is None:
                    break
                # Input transcription: recognized text in the original language.
                input_transcription = server_content.pop("inputTranscription", None)
                if input_transcription is not None:
                    raw_text = input_transcription.pop("text", None)
                    if raw_text is not None:
                        display(Markdown(f"**Input>** {raw_text}"))
                # Output transcription: translated text in the target language.
                output_transcription = server_content.pop("outputTranscription", None)
                if output_transcription is not None:
                    raw_text = output_transcription.pop("text", None)
                    if raw_text is not None:
                        display(Markdown(f"**Response>** {raw_text}"))
                # Synthesized audio: 16-bit PCM chunks of the translated speech.
                model_turn = server_content.pop("modelTurn", None)
                if model_turn is not None:
                    parts = model_turn.pop("parts", None)
                    if parts is not None:
                        for part in parts:
                            pcm_data = base64.b64decode(part["inlineData"]["data"])
                            overall_responses.append(np.frombuffer(pcm_data, dtype=np.int16))
                # End of turn
                # turn_complete = server_content.pop("turnComplete", None)
                # if turn_complete:
                #     break
            except asyncio.TimeoutError:
                print(f"Timeout: No response received from the websocket within {timeout_seconds} seconds.")
                if overall_responses:
                    display(Audio(np.concatenate(overall_responses), rate=24000, autoplay=True))
                break  # Exit the loop on timeout
            except websockets.exceptions.ConnectionClosed as e:
                print(f"Connection closed by exception, code: {e.code}, reason: {e.reason}")
                if overall_responses:
                    display(Audio(np.concatenate(overall_responses), rate=24000, autoplay=True))
                break  # Exit the loop on connection closed
            except Exception as e:
                print(f"An unexpected error occurred: {e}")
                if overall_responses:
                    display(Audio(np.concatenate(overall_responses), rate=24000, autoplay=True))
                break  # Exit the loop on other exceptions
    finally:
        try:
            await ws.close(code=1000, reason="Normal closure")  # Close the connection normally
        except websockets.exceptions.ConnectionClosed as e:
            print(f"Connection closed by exception, code: {e.code}, reason: {e.reason}")
        except Exception as e:
            print(f"An unexpected error occurred: {e}")
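
The WebSocket example collects the synthesized audio as `int16` arrays and plays them back at 24,000 Hz in a notebook. Outside a notebook, you might write the same chunks to a WAV file instead. The following sketch does that with the standard `wave` module and NumPy; the function name `save_translation_audio`, the mono channel layout, and the output path are assumptions, while the 24,000 Hz rate comes from the example's playback call.

```python
import wave

import numpy as np


def save_translation_audio(chunks: list[np.ndarray], path: str, rate: int = 24000) -> int:
    """Concatenate int16 PCM chunks and write them as a mono 16-bit WAV file.

    Returns the number of samples written. Mono output is an assumption.
    """
    if chunks:
        samples = np.concatenate(chunks).astype("<i2")  # little-endian 16-bit, as WAV expects
    else:
        samples = np.zeros(0, dtype="<i2")
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 16-bit samples
        out.setframerate(rate)
        out.writeframes(samples.tobytes())
    return int(samples.size)


# Two fake chunks standing in for overall_responses from the WebSocket example.
chunks = [np.arange(10, dtype=np.int16), np.arange(5, dtype=np.int16)]
written = save_translation_audio(chunks, "translation.wav")
print(written)  # 15
```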

Supported languages

Language code  Language
aa             Afar
ab             Abkhazian
ace            Achinese
ach            Acoli
af             Afrikaans
ak             Akan
alz            Alur
am             Amharic
an             Aragonese
ar             Arabic
as             Assamese
av             Avaric
awa            Awadhi
ay             Aymara
az             Azerbaijani
ba             Bashkir
bal            Baluchi
ban            Balinese
bbc            Batak Toba
bci            Baoulé
be             Belarusian
bem            Bemba
ber            Berber
bew            Betawi
bg             Bulgarian
bgc            Haryanvi
bho            Bhojpuri
bi             Bislama
bm             Bambara
bn             Bengali
bo             Tibetan
br             Breton
bs             Bosnian
bts            Batak Simalungun
btx            Batak Karo
ca             Catalan
ce             Chechen
ceb            Cebuano
cgg            Chiga
ch             Chamorro
chk            Chuukese
cmn            Mandarin Chinese
cnh            Hakha Chin
co             Corsican
cr             Cree
crh            Crimean Tatar
crs            Seselwa Creole French
cs             Czech
cv             Chuvash
cy             Welsh
da             Danish
de             German
din            Dinka
doi            Dogri
dov            Dombe
dv             Divehi
dyu            Dyula
dz             Dzongkha
ee             Ewe
el             Greek
en             English
eo             Esperanto
es             Spanish
et             Estonian
eu             Basque
fa             Farsi
ff             Fulah
fi             Finnish
fil            Filipino
fj             Fijian
fo             Faroese
fon            Fon
fr             French
fur            Friulian
fy             Western Frisian
ga             Irish
gaa            Ga
gd             Gaelic
gl             Galician
gn             Guarani
gu             Gujarati
gv             Manx
ha             Hausa
haw            Hawaiian
he             Hebrew
hi             Hindi
hil            Hiligaynon
hmn            Hmong
ho             Hiri Motu
hr             Croatian
hrx            Hunsrik
ht             Haitian, Haitian Creole
hu             Hungarian
hy             Armenian
hz             Herero
iba            Iban
id             Indonesian
ig             Igbo
ilo            Iloko
is             Icelandic
it             Italian
iu             Inuktitut
ja             Japanese
jam            Jamaican Creole English
jv             Javanese
ka             Georgian
kac            Kachin
kek            Kekchi
kg             Kongo
kha            Khasi
ki             Kikuyu
kj             Kuanyama
kk             Kazakh
kl             Greenlandic
km             Central Khmer
kn             Kannada
ko             Korean
kok            Konkani
kr             Kanuri
kri            Krio
ks             Kashmiri
ktu            Kituba
ku             Kurdish
kv             Komi
kw             Cornish
ky             Kyrgyz
la             Latin
lb             Luxembourgish
lg             Ganda
li             Limburgan
lij            Ligurian
lmo            Lombard
ln             Lingala
lo             Lao
lt             Lithuanian
lu             Luba-Katanga
lua            Luba-Lulua
luo            Dholuo
lus            Mizo
lv             Latvian
mad            Madurese
mai            Maithili
mak            Makasar
mam            Mam
mfe            Morisyen
mg             Malagasy
mh             Marshallese
min            Minangkabau
mk             Macedonian
ml             Malayalam
mn             Mongolian
mr             Marathi
ms             Malay
mt             Maltese
mwr            Marwari
my             Burmese
na             Nauru
nb             Norwegian Bokmål
nd             North Ndebele
ndc            Ndau
ne             Nepali
new            Newari
ng             Ndonga
nhe            Eastern Huasteca Nahuatl
nl             Dutch
nn             Norwegian Nynorsk
nr             South Ndebele
nso            Pedi
nus            Nuer
nv             Navajo
ny             Chichewa
oc             Occitan
oj             Ojibwa
om             Oromo
or             Oriya
os             Ossetian
pa             Punjabi
pag            Pangasinan
pam            Pampanga
pap            Papiamento
pl             Polish
ps             Pashto
pt             Portuguese
qu             Quechua
rm             Romansh
rn             Rundi
ro             Romanian
ru             Russian
rw             Kinyarwanda
sa             Sanskrit
sah            Yakut
sat            Santali
sc             Sardinian
scn            Sicilian
sd             Sindhi
se             Northern Sami
sg             Sango
shn            Shan
si             Sinhala
sk             Slovak
sl             Slovenian
sm             Samoan
sn             Shona
so             Somali
sq             Albanian
sr             Serbian
ss             Swati
st             Southern Sotho
su             Sundanese
sv             Swedish
sw             Swahili
szl            Silesian
ta             Tamil
tcy            Tulu
te             Telugu
tet            Tetum
tg             Tajik
th             Thai
ti             Tigrinya
tiv            Tiv
tk             Turkmen
tl             Tagalog
tn             Tswana
to             Tonga
tpi            Tok Pisin
tr             Turkish
trp            Kok Borok
ts             Tsonga
tt             Tatar
tum            Tumbuka
tw             Twi
ty             Tahitian
tyv            Tuvinian
udm            Udmurt
ug             Uighur
uk             Ukrainian
ur             Urdu
uz             Uzbek
ve             Venda
vec            Venetian
vi             Vietnamese
wa             Walloon
war            Waray
wo             Wolof
xh             Xhosa
yi             Yiddish
yo             Yoruba
yua            Yucatec Maya
yue            Cantonese
za             Zhuang
zh             Chinese
zu             Zulu
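
The code examples pass one of these codes as `language_code` (for example, `cmn`). A client might reject an unsupported code before opening a session. The following sketch hard-codes only a handful of codes from the table for brevity, which is an assumption you would replace with the full list, and builds a `generation_config` dictionary shaped like the `CONFIG` in the WebSocket example. The helper name `build_generation_config` is hypothetical, and the sketch treats codes as exact, case-sensitive strings.

```python
# A small subset of the table above; extend with the full list as needed.
SUPPORTED_CODES = {
    "cmn": "Mandarin Chinese",
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "hi": "Hindi",
    "ja": "Japanese",
    "yue": "Cantonese",
}


def build_generation_config(language_code: str) -> dict:
    """Return a generation_config dict like CONFIG in the WebSocket example."""
    if language_code not in SUPPORTED_CODES:
        raise ValueError(f"unsupported language code: {language_code!r}")
    return {
        "response_modalities": ["AUDIO"],
        "speech_config": {"language_code": language_code},
    }


print(build_generation_config("cmn"))
```

Validating locally gives a clearer error than a rejected setup message, but the service remains the authority on which codes it accepts.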

Billing

Because speech-to-speech translation is an experimental feature, you aren't charged for using it.

For more information on pricing and billing, see Vertex AI pricing.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated November 18, 2025 UTC.