Examples

Call Gemini with the Chat Completions API

The following sample shows you how to send non-streaming requests:

REST

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/${MODEL_ID}",
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response)

The following sample shows you how to send streaming requests to a Gemini model by using the Chat Completions API:

REST

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/${MODEL_ID}",
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in response:
    print(chunk)
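With stream=True, the client yields chunk objects whose text arrives incrementally in choices[0].delta.content. The sketch below shows one way to stitch those deltas back into a single string; it uses SimpleNamespace stand-ins for the chunk objects rather than a live endpoint, and assumes the OpenAI client's documented chunk shape.

```python
from types import SimpleNamespace


def collect_text(stream):
    """Accumulate the text deltas from a Chat Completions stream."""
    parts = []
    for chunk in stream:
        # Each chunk carries a list of choices; the incremental text lives in
        # choices[0].delta.content (it can be None, e.g. in the final chunk).
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)


# Mock chunks standing in for the objects yielded by a real stream.
mock_stream = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="The sky "))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="is blue."))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))]),
]
print(collect_text(mock_stream))  # The sky is blue.
```

In a real run you would pass the `response` object from the streaming example directly to `collect_text`.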

Send a prompt and an image to the Gemini API in Vertex AI

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the following image:"},
                {
                    "type": "image_url",
                    "image_url": "gs://cloud-samples-data/generative-ai/image/scones.jpg",
                },
            ],
        }
    ],
)
print(response)

Call a self-deployed model with the Chat Completions API

The following sample shows you how to send non-streaming requests:

REST

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions \
  -d '{
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"
# model_id = "gemma-2-9b-it"
# endpoint_id = "YOUR_ENDPOINT_ID"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/{endpoint_id}",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response)

The following sample shows you how to send streaming requests to a self-deployed model by using the Chat Completions API:

REST

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions \
  -d '{
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"
# model_id = "gemma-2-9b-it"
# endpoint_id = "YOUR_ENDPOINT_ID"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/{endpoint_id}",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in response:
    print(chunk)

extra_body examples

You can use either the SDK or the REST API to pass in extra_body.

Add thought_tag_marker

{
  ...,
  "extra_body": {
    "google": {
      ...,
      "thought_tag_marker": "..."
    }
  }
}

Add extra_body using the SDK

client.chat.completions.create(
    ...,
    extra_body={
        'extra_body': {'google': {...}}
    },
)
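Filled in with concrete values (illustrative only), the nesting looks like this. Note that the SDK's `extra_body` keyword argument is merged into the request JSON, so the inner `"extra_body"` key is what becomes the top-level `extra_body` field on the wire:

```python
# Illustrative values; thinking_config and thought_tag_marker are the
# Google-specific fields shown in the surrounding examples.
google_fields = {
    "thinking_config": {"include_thoughts": True, "thinking_budget": 10000},
    "thought_tag_marker": "think",
}

# The SDK merges the `extra_body` kwarg into the request body, so the inner
# "extra_body" key here ends up as the top-level "extra_body" JSON field.
request_kwargs = {
    "extra_body": {"extra_body": {"google": google_fields}},
}

# Passed to the call as:
# client.chat.completions.create(model=..., messages=..., **request_kwargs)
print(request_kwargs["extra_body"]["extra_body"]["google"]["thought_tag_marker"])
```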

extra_content examples

You can populate this field by using the REST API directly.

extra_content with string content

{
  "messages": [
    {"role": "...", "content": "...", "extra_content": {"google": {...}}}
  ]
}

Per-message extra_content

{
  "messages": [
    {
      "role": "...",
      "content": [
        {"type": "...", ..., "extra_content": {"google": {...}}}
      ]
    }
  ]
}

Per-tool call extra_content

{
  "messages": [
    {
      "role": "...",
      "tool_calls": [
        {
          ...,
          "extra_content": {"google": {...}}
        }
      ]
    }
  ]
}

Sample curl requests

You can use these curl requests directly, rather than going through the SDK.

Use thinking_config with extra_body

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.5-flash-preview-04-17",
    "messages": [{
      "role": "user",
      "content": [{
        "type": "text",
        "text": "Are there any prime numbers of the form n*ceil(log(n))?"
      }]
    }],
    "extra_body": {
      "google": {
        "thinking_config": {
          "include_thoughts": true,
          "thinking_budget": 10000
        },
        "thought_tag_marker": "think"
      }
    },
    "stream": true
  }'

Multimodal requests

The Chat Completions API supports a variety of multimodal inputs, including audio and video.

Use image_url to pass in image data

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this image" },
        { "type": "image_url", "image_url": "gs://cloud-samples-data/generative-ai/image/scones.jpg" }
      ]
    }]
  }'

Use input_audio to pass in audio data

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this: " },
        { "type": "input_audio", "input_audio": {
          "format": "audio/mp3",
          "data": "gs://cloud-samples-data/generative-ai/audio/pixel.mp3"
        } }
      ]
    }]
  }'

Structured output

You can use the response_format parameter to get structured output.

Example using SDK

from pydantic import BaseModel
from openai import OpenAI

# Configure base_url and api_key as shown in the earlier samples to call Vertex AI.
client = OpenAI()


class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]


completion = client.beta.chat.completions.parse(
    model="google/gemini-2.5-flash-preview-04-17",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)
print(completion.choices[0].message.parsed)
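When calling the REST API directly rather than the SDK, the same constraint can be expressed with a `response_format` of type `json_schema`, following the OpenAI request shape. The request body below is a sketch under that assumption; which JSON Schema keywords are honored can vary by model.

```json
{
  "model": "google/gemini-2.5-flash-preview-04-17",
  "messages": [
    {"role": "system", "content": "Extract the event information."},
    {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."}
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "CalendarEvent",
      "schema": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "date": {"type": "string"},
          "participants": {"type": "array", "items": {"type": "string"}}
        },
        "required": ["name", "date", "participants"]
      }
    }
  }
}
```

This schema mirrors the `CalendarEvent` Pydantic model from the SDK example.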


Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated November 24, 2025 UTC.