Using OpenAI libraries with Vertex AI
The Chat Completions API works as an OpenAI-compatible endpoint, designed to make it easier to interface with Gemini on Vertex AI by using the OpenAI libraries for Python and REST. If you're already using the OpenAI libraries, you can use this API as a low-cost way to switch between calling OpenAI models and Vertex AI hosted models to compare output, cost, and scalability, without changing your existing code. If you aren't already using the OpenAI libraries, we recommend that you use the Google Gen AI SDK.
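As a sketch of what "without changing your existing code" means in practice, the OpenAI Python client can be pointed at Vertex AI by overriding `base_url` and passing a short-lived access token as the API key. The endpoint pattern, API version, and model name below follow the documented OpenAI-compatibility format but should be verified against the current documentation:

```python
def chat_completions_base_url(project_id: str, location: str) -> str:
    """OpenAI-compatible base URL pattern for Vertex AI (verify against current docs)."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1beta1/"
        f"projects/{project_id}/locations/{location}/endpoints/openapi"
    )

# With the openai and google-auth libraries installed, existing OpenAI client
# code works unchanged apart from base_url and api_key:
#
#   import google.auth
#   import google.auth.transport.requests
#   from openai import OpenAI
#
#   creds, _ = google.auth.default(
#       scopes=["https://www.googleapis.com/auth/cloud-platform"])
#   creds.refresh(google.auth.transport.requests.Request())
#
#   client = OpenAI(
#       base_url=chat_completions_base_url("my-project", "us-central1"),
#       api_key=creds.token,  # short-lived OAuth token, not a long-lived key
#   )
#   response = client.chat.completions.create(
#       model="google/gemini-2.0-flash-001",  # example model name
#       messages=[{"role": "user", "content": "Hello"}],
#   )
```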
Supported models
The Chat Completions API supports both Gemini models and select self-deployed models from Model Garden.
Gemini models
The following models provide support for the Chat Completions API:
- Gemini 3 Pro Preview model
- Gemini 2.5 Pro
- Gemini 2.5 Flash Preview model
- Gemini 2.5 Flash
- Gemini 2.0 Flash
- Gemini 2.0 Flash-Lite
Self-deployed models from Model Garden
The Hugging Face Text Generation Inference (HF TGI) and Vertex AI Model Garden prebuilt vLLM containers support the Chat Completions API. However, not every model deployed to these containers supports it. The following table lists the most popular supported models by container:
| HF TGI | vLLM |
|---|---|
Supported parameters
For Google models, the Chat Completions API supports the following OpenAI parameters. For a description of each parameter, see OpenAI's documentation on Creating chat completions. Parameter support for third-party models varies by model. To see which parameters are supported, consult the model's documentation.
messages
- System message
- User message: The text and image_url types are supported. The image_url type supports images stored as a Cloud Storage URI or as a base64 encoding in the form "data:<MIME-TYPE>;base64,<BASE64-ENCODED-BYTES>". To learn how to create a Cloud Storage bucket and upload a file to it, see Discover object storage. The detail option is not supported.
- Assistant message
- Tool message
- Function message: This field is deprecated, but supported for backwards compatibility.
model
max_completion_tokens
Alias for max_tokens.
max_tokens
n
frequency_penalty
presence_penalty
reasoning_effort
Configures how much time and how many tokens are used on a response.
- low: 1024
- medium: 8192
- high: 24576
Only one of reasoning_effort or extra_body.google.thinking_config may be specified.
response_format
- json_object: Interpreted as passing "application/json" to the Gemini API.
- json_schema: Fully recursive schemas are not supported. additional_properties is supported.
- text: Interpreted as passing "text/plain" to the Gemini API.
- Any other MIME type is passed as is to the model, such as passing "application/json" directly.
seed
Corresponds to GenerationConfig.seed.
stop
stream
temperature
top_p
tools
- type
- function
- name
- description
- parameters: Specify parameters by using the OpenAPI specification. This differs from the OpenAI parameters field, which is described as a JSON Schema object. To learn about keyword differences between OpenAPI and JSON Schema, see the OpenAPI guide.
tool_choice
- none
- auto
- required: Corresponds to the mode ANY in the FunctionCallingConfig.
- validated: Corresponds to the mode VALIDATED in the FunctionCallingConfig. This is Google-specific.
web_search_options
Corresponds to the GoogleSearch tool. No sub-options are
supported.
function_call
This field is deprecated, but supported for backwards
compatibility.
functions
This field is deprecated, but supported for backwards
compatibility.
If you pass any unsupported parameter, it is ignored.
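Taken together, a request body that exercises several of the parameters above might look like the following sketch. The model name, function declaration, and values are hypothetical, and note that only one of reasoning_effort or extra_body.google.thinking_config may be set:

```python
# Hypothetical request body combining several supported parameters.
request = {
    "model": "google/gemini-2.5-flash",  # example model name
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What's the weather in Paris?"},
    ],
    "max_completion_tokens": 256,
    "temperature": 0.2,
    "reasoning_effort": "low",  # maps to a thinking budget of 1024 tokens
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical function
                "description": "Look up current weather for a location.",
                # OpenAPI-style parameters, as described above.
                "parameters": {
                    "type": "object",
                    "properties": {"location": {"type": "string"}},
                    "required": ["location"],
                },
            },
        }
    ],
    "tool_choice": "validated",  # Google-specific VALIDATED mode
}
```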
Multimodal input parameters
The Chat Completions API supports select multimodal inputs.
input_audio
- data: Any URI or valid blob format. All blob types are supported, including image, audio, and video. Anything supported by GenerateContent is supported (HTTP, Cloud Storage, and so on).
- format: OpenAI supports both wav (audio/wav) and mp3 (audio/mp3). With Gemini, all valid MIME types are supported.
image_url
- data: Like input_audio, any URI or valid blob format is supported. Note that image_url as a URL defaults to the image/* MIME type, and image_url as blob data can be used as any multimodal input.
- detail: Similar to media resolution, this determines the maximum tokens per image for the request. While OpenAI's field is per-image, Gemini enforces the same detail across the entire request; passing multiple detail values in one request throws an error.
In general, the data parameter can be a URI or a combination of MIME type and
base64 encoded bytes in the form "data:<MIME-TYPE>;base64,<BASE64-ENCODED-BYTES>".
For a full list of MIME types, see GenerateContent.
For more information on OpenAI's base64 encoding, see their documentation.
For usage, see our multimodal input examples.
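As a minimal sketch of the base64 data form described above, the following builds an input_audio part. The bytes here are stand-ins, not a playable file:

```python
import base64

# Stand-in bytes; a real request needs actual WAV audio content.
audio_bytes = b"RIFF\x00\x00\x00\x00WAVE"

# Build the "data:<MIME-TYPE>;base64,<BASE64-ENCODED-BYTES>" form.
data_url = "data:audio/wav;base64," + base64.b64encode(audio_bytes).decode("ascii")

part = {
    "type": "input_audio",
    "input_audio": {"data": data_url, "format": "wav"},
}
```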
Gemini-specific parameters
Gemini supports several features that are not available in OpenAI models.
These features can still be passed in as parameters, but they must be contained
within an extra_content or extra_body field or they are ignored.
extra_body features
Include a google field to contain any Gemini-specific
extra_body features.
{
...,
"extra_body":{
"google":{
...,
// Add extra_body features here.
}
}
}
safety_settings
This corresponds to Gemini's SafetySetting.
cached_content
This corresponds to Gemini's GenerateContentRequest.cached_content.
thinking_config
This corresponds to Gemini's GenerationConfig.ThinkingConfig.
thought_tag_marker
Used to separate a model's thoughts from its responses for models with Thinking available. If not specified, no tags are returned around the model's thoughts. If present, subsequent queries strip the thought tags and mark the thoughts appropriately for context. This helps preserve the appropriate context for subsequent queries.
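A hypothetical sketch of Gemini-specific settings nested under extra_body.google, following the structure shown above. The field names mirror Gemini's SafetySetting and ThinkingConfig, but the values are illustrative and should be verified against the current reference:

```python
# Hypothetical extra_body payload; values are illustrative only.
extra_body = {
    "google": {
        "safety_settings": [
            {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"}
        ],
        "thinking_config": {"include_thoughts": True, "thinking_budget": 1024},
        "thought_tag_marker": "think",
    }
}
```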
extra_part features
extra_part lets you specify additional settings at a per-Part level.
Include a google field to contain any Gemini-specific
extra_part features.
{
...,
"extra_part":{
"google":{
...,
// Add extra_part features here.
}
}
}
extra_content
A field for adding Gemini-specific content that shouldn't be
ignored.
thought
This field explicitly marks if a field is a thought and takes precedence
over thought_tag_marker. It helps distinguish between different
steps in a thought process, especially in tool use scenarios where intermediate
steps might be mistaken for final answers. By tagging specific parts of the
input as thoughts, you can guide the model to treat them as internal
reasoning rather than user-facing responses.
thought_signature
A bytes field that provides a thought signature to validate against
thoughts returned by the model. This field is distinct from
thought, which is a boolean field. For more information, see
Thought signatures.
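A hypothetical sketch of marking a part as internal reasoning via extra_content.google.thought, which, as described above, takes precedence over thought_tag_marker. The message content is illustrative:

```python
# Hypothetical: tag an intermediate assistant part as a thought so the model
# treats it as internal reasoning rather than a user-facing response.
assistant_part = {
    "role": "assistant",
    "content": "Step 1: check divisibility by 2, 3, and 5...",
    "extra_content": {"google": {"thought": True}},
}
```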
What's next
- Learn more about authentication and credentialing with the OpenAI-compatible syntax.
- See examples of calling the Chat Completions API with the OpenAI-compatible syntax.
- See examples of calling the Inference API with the OpenAI-compatible syntax.
- See examples of calling the Function Calling API with OpenAI-compatible syntax.
- Learn more about the Gemini API.
- Learn more about migrating from Azure OpenAI to the Gemini API.