Cloud - Ollama

Cloud Models

Ollama’s cloud models are a new kind of model in Ollama that can run without a powerful GPU. Requests to cloud models are automatically offloaded to Ollama’s cloud service, which offers the same capabilities as local models. This lets you keep using your local tools while running larger models that wouldn’t fit on a personal computer.

Supported models

For a list of supported models, see Ollama’s model library.

Running Cloud models

Ollama’s cloud models require an account on ollama.com. To sign in or create an account, run:

ollama signin

The examples below show the same task with the CLI, Python, JavaScript, and cURL.

CLI

To run a cloud model, open a terminal and run:

ollama run gpt-oss:120b-cloud

Python

First, pull a cloud model so it can be accessed:

ollama pull gpt-oss:120b-cloud

Next, install Ollama’s Python library:

pip install ollama

Then create and run a simple Python script:

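from ollama import Client

# Connects to the local Ollama server (http://localhost:11434 by default).
client = Client()

messages = [
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
]

# Stream the reply and print each fragment as it arrives.
for part in client.chat('gpt-oss:120b-cloud', messages=messages, stream=True):
    print(part['message']['content'], end='', flush=True)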
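The `messages` list carries the whole conversation: each entry is a dict with a `role` (such as `'user'` or `'assistant'`) and its `content`. A follow-up question is typically sent by appending the model's previous reply and the new user turn to the same list. The sketch below shows that bookkeeping in plain Python; `add_turn` is a hypothetical helper, not part of the `ollama` library, and the reply text is example text.

```python
def add_turn(messages, assistant_reply, next_question):
    """Return a new conversation with the assistant's reply and a follow-up question appended.

    Hypothetical helper: the input list is not modified.
    """
    return messages + [
        {'role': 'assistant', 'content': assistant_reply},
        {'role': 'user', 'content': next_question},
    ]


messages = [{'role': 'user', 'content': 'Why is the sky blue?'}]

# In practice, assistant_reply would be the text accumulated from the streamed parts.
messages = add_turn(messages, 'Because of Rayleigh scattering.', 'Why are sunsets red?')
```

The updated `messages` list can then be passed to `client.chat('gpt-oss:120b-cloud', messages=messages, stream=True)` exactly as in the script above.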

JavaScript

First, pull a cloud model so it can be accessed:

ollama pull gpt-oss:120b-cloud

Next, install Ollama’s JavaScript library:

npm i ollama

Then use the library to run a cloud model:

import { Ollama } from "ollama";

const ollama = new Ollama();

const response = await ollama.chat({
  model: "gpt-oss:120b-cloud",
  messages: [{ role: "user", content: "Explain quantum computing" }],
  stream: true,
});

for await (const part of response) {
  process.stdout.write(part.message.content);
}

cURL

First, pull a cloud model so it can be accessed:

ollama pull gpt-oss:120b-cloud

Then send a chat request to Ollama’s local API with cURL:

curl http://localhost:11434/api/chat -d '{
  "model": "gpt-oss:120b-cloud",
  "messages": [{
    "role": "user",
    "content": "Why is the sky blue?"
  }],
  "stream": false
}'
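With `"stream": false`, the API returns a single JSON object. With `"stream": true` (the default for `/api/chat`), it instead returns newline-delimited JSON: one object per line, each carrying a text fragment in `message.content`, with `"done": true` on the final object. The library loops in the Python and JavaScript examples consume that stream for you. The sketch below reassembles it by hand with the standard library, assuming the raw response body is available as text; the function name `join_stream` and the sample lines are illustrative, not real model output.

```python
import json


def join_stream(body: str) -> str:
    """Concatenate message.content from a newline-delimited JSON chat stream."""
    pieces = []
    for line in body.splitlines():
        if not line.strip():
            continue  # skip blank lines between objects
        chunk = json.loads(line)
        pieces.append(chunk.get('message', {}).get('content', ''))
        if chunk.get('done'):
            break  # the final object marks the end of the response
    return ''.join(pieces)


# Illustrative stream body, trimmed to the fields used above.
sample = '\n'.join([
    '{"message": {"role": "assistant", "content": "The sky "}, "done": false}',
    '{"message": {"role": "assistant", "content": "scatters blue light."}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
])
print(join_stream(sample))  # prints "The sky scatters blue light."
```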

Cloud API access

Cloud models can also be accessed directly through ollama.com’s API. In this mode, ollama.com acts as a remote Ollama host.

Authentication

For direct access to ollama.com’s API, first create an API key on ollama.com, then set the OLLAMA_API_KEY environment variable to that key:

export OLLAMA_API_KEY=your_api_key
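Client code usually reads the key from the environment rather than hard-coding it. A minimal sketch, assuming the `Authorization: Bearer <key>` header used in the Python, JavaScript, and cURL examples below; `auth_headers` is a hypothetical helper that fails with a clear message when the variable is unset, instead of sending a malformed header.

```python
import os


def auth_headers(env=os.environ) -> dict:
    """Build the Authorization header for ollama.com from OLLAMA_API_KEY (hypothetical helper)."""
    key = env.get('OLLAMA_API_KEY')
    if not key:
        raise RuntimeError('OLLAMA_API_KEY is not set; create an API key and export it first.')
    return {'Authorization': 'Bearer ' + key}
```

The returned dict can be passed as `headers=auth_headers()` when constructing `Client(host="https://ollama.com", ...)`.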

Listing models

To list the models available directly through ollama.com’s API, run:

curl https://ollama.com/api/tags
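Because ollama.com acts as a remote Ollama host, the response is assumed here to have the same shape as a local `/api/tags` response: a JSON object with a `models` array whose entries include a `name` field. The sketch below extracts the names with the standard library; `model_names` and `fetch_model_names` are hypothetical helpers, and the parsing is kept separate from the network call so it can be reused on saved output.

```python
import json
import urllib.request


def model_names(payload: str) -> list[str]:
    """Return the model names from an /api/tags JSON response body."""
    data = json.loads(payload)
    return [m['name'] for m in data.get('models', [])]


def fetch_model_names(url: str = 'https://ollama.com/api/tags') -> list[str]:
    """Fetch the tag list over HTTP and return the model names."""
    with urllib.request.urlopen(url) as resp:
        return model_names(resp.read().decode('utf-8'))
```

Calling `print(fetch_model_names())` performs the same request as the cURL command above.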

Generating a response

The examples below show the same request in Python, JavaScript, and cURL.

Python

First, install Ollama’s Python library:

pip install ollama

Then make a request:

import os
from ollama import Client

# Point the client at ollama.com and authenticate with the API key.
# os.environ[...] raises KeyError if OLLAMA_API_KEY is not set.
client = Client(
    host="https://ollama.com",
    headers={'Authorization': 'Bearer ' + os.environ['OLLAMA_API_KEY']},
)

messages = [
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
]

for part in client.chat('gpt-oss:120b', messages=messages, stream=True):
    print(part['message']['content'], end='', flush=True)
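Note that the model is addressed as `gpt-oss:120b` when talking to ollama.com directly, but as `gpt-oss:120b-cloud` through the local Ollama server. If an application switches between the two modes, a small helper can translate the name. This sketch assumes the `-cloud` suffix pattern in these examples holds for other models too (check the model library to confirm); `to_direct_name` is hypothetical, and names without the suffix pass through unchanged.

```python
def to_direct_name(local_name: str) -> str:
    """Map a local cloud-model name (e.g. 'gpt-oss:120b-cloud') to its ollama.com API name.

    Assumes the '-cloud' suffix convention shown in the examples; other names are returned as-is.
    """
    suffix = '-cloud'
    if local_name.endswith(suffix):
        return local_name[: -len(suffix)]
    return local_name


print(to_direct_name('gpt-oss:120b-cloud'))  # prints "gpt-oss:120b"
```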

JavaScript

First, install Ollama’s JavaScript library:

npm i ollama

Next, make a request to the model:

import { Ollama } from "ollama";

const ollama = new Ollama({
  host: "https://ollama.com",
  headers: {
    Authorization: "Bearer " + process.env.OLLAMA_API_KEY,
  },
});

const response = await ollama.chat({
  model: "gpt-oss:120b",
  messages: [{ role: "user", content: "Explain quantum computing" }],
  stream: true,
});

for await (const part of response) {
  process.stdout.write(part.message.content);
}

cURL

Generate a response via Ollama’s chat API:

curl https://ollama.com/api/chat \
  -H "Authorization: Bearer $OLLAMA_API_KEY" \
  -d '{
    "model": "gpt-oss:120b",
    "messages": [{
      "role": "user",
      "content": "Why is the sky blue?"
    }],
    "stream": false
  }'
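For environments without the `ollama` package, the same request can be made with Python's standard library. A minimal sketch that mirrors the cURL command (non-streaming, bearer token from `OLLAMA_API_KEY`); `build_chat_request` and `chat_once` are hypothetical names, and the non-streaming response is read as a single JSON object with the reply in `message.content`.

```python
import json
import os
import urllib.request


def build_chat_request(model: str, prompt: str, api_key: str,
                       host: str = 'https://ollama.com') -> urllib.request.Request:
    """Build a non-streaming POST to {host}/api/chat, mirroring the cURL example."""
    body = {
        'model': model,
        'messages': [{'role': 'user', 'content': prompt}],
        'stream': False,
    }
    return urllib.request.Request(
        host + '/api/chat',
        data=json.dumps(body).encode('utf-8'),
        headers={
            'Authorization': 'Bearer ' + api_key,
            'Content-Type': 'application/json',
        },
        method='POST',
    )


def chat_once(model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(model, prompt, os.environ['OLLAMA_API_KEY'])
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())['message']['content']
```

Calling `chat_once('gpt-oss:120b', 'Why is the sky blue?')` sends the same request as the cURL command; building the request separately makes it easy to inspect before anything goes over the network.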

Local only

Ollama can run in local-only mode by disabling Ollama’s cloud features. In this mode, cloud models are not available.

Deprecations

Ollama will occasionally deprecate and retire older cloud models as newer and better open-source models are released. Tools and applications that rely on Ollama Cloud models may need to be updated to keep working. Impacted users will be notified in advance of a model’s deprecation and retirement, by email and on the Ollama website. Retiring an Ollama Cloud model does not affect local models.

Upcoming deprecations

Retirement date	Model	Recommended alternative
June 16, 2026	`kimi-k2-thinking`	`kimi-k2.6`
June 16, 2026	`kimi-k2:1t`	`kimi-k2.6`
June 16, 2026	`minimax-m2`	`minimax-m3`
June 16, 2026	`glm-4.6`	`glm-5.1`
June 16, 2026	`qwen3-next:80b`	`qwen3.5`
June 16, 2026	`qwen3-vl:235b`	`qwen3.5`
June 16, 2026	`qwen3-vl:235b-instruct`	`qwen3.5`
June 16, 2026	`cogito-2.1:671b`	`deepseek-v4-flash`
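Applications that pin a cloud model name can check it against this table ahead of the retirement date. The sketch below encodes the table above as a dictionary, with the names exactly as listed; `RETIREMENTS` and `check_model` are hypothetical, and the published table remains the source to update from as entries change.

```python
from datetime import date
from typing import Optional

# Upcoming retirements, copied from the table above:
# model name -> (retirement date, recommended alternative).
RETIREMENTS = {
    'kimi-k2-thinking': (date(2026, 6, 16), 'kimi-k2.6'),
    'kimi-k2:1t': (date(2026, 6, 16), 'kimi-k2.6'),
    'minimax-m2': (date(2026, 6, 16), 'minimax-m3'),
    'glm-4.6': (date(2026, 6, 16), 'glm-5.1'),
    'qwen3-next:80b': (date(2026, 6, 16), 'qwen3.5'),
    'qwen3-vl:235b': (date(2026, 6, 16), 'qwen3.5'),
    'qwen3-vl:235b-instruct': (date(2026, 6, 16), 'qwen3.5'),
    'cogito-2.1:671b': (date(2026, 6, 16), 'deepseek-v4-flash'),
}


def check_model(name: str, today: Optional[date] = None) -> Optional[str]:
    """Return a warning if `name` is scheduled for retirement, else None."""
    entry = RETIREMENTS.get(name)
    if entry is None:
        return None
    retire_on, alternative = entry
    today = today or date.today()
    status = 'was retired on' if today >= retire_on else 'will be retired on'
    return f'{name} {status} {retire_on.isoformat()}; consider {alternative}.'
```

A name carrying a local cloud suffix would need to be normalized to the form in the table before the lookup.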
