Anthropic compatibility

Ollama provides compatibility with the Anthropic Messages API to help connect existing applications to Ollama, including tools like Claude Code.

Usage

Environment variables

To use Ollama with tools that expect the Anthropic API (like Claude Code), set these environment variables:

```shell
export ANTHROPIC_AUTH_TOKEN=ollama # required but ignored
export ANTHROPIC_BASE_URL=http://localhost:11434
```
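Clients append the endpoint path to the base URL themselves, which is why `ANTHROPIC_BASE_URL` stops at the host and port. Below is a minimal sketch of how a client might resolve these variables. The helper name `resolve_messages_url` is hypothetical, and the fallback values are assumptions that mirror the settings above.

```python
import os


def resolve_messages_url(environ=None):
    """Return (messages_url, token) from Anthropic-style environment variables.

    Hypothetical helper. The defaults mirror the values above and are only
    assumptions for when the variables are unset.
    """
    env = os.environ if environ is None else environ
    base_url = env.get("ANTHROPIC_BASE_URL", "http://localhost:11434").rstrip("/")
    token = env.get("ANTHROPIC_AUTH_TOKEN", "ollama")  # required by clients; Ollama ignores it
    # The client adds the endpoint path, so the base URL itself has no /v1 suffix.
    return f"{base_url}/v1/messages", token
```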

Simple `/v1/messages` example

basic.py

```python
import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',  # required but ignored
)

message = client.messages.create(
    model='qwen3-coder',
    max_tokens=1024,
    messages=[
        {'role': 'user', 'content': 'Hello, how are you?'}
    ]
)
print(message.content[0].text)
```
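The SDK call above corresponds to a single JSON POST. For environments without the `anthropic` package, here is a standard-library sketch that builds the same request. The helper names are hypothetical, and actually sending the request requires a running Ollama server.

```python
import json
import urllib.request


def build_messages_request(prompt, model="qwen3-coder", max_tokens=1024,
                           base_url="http://localhost:11434"):
    """Build (but do not send) a POST to Ollama's /v1/messages endpoint."""
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": "ollama",  # accepted but not validated by Ollama
        },
        method="POST",
    )


def send(request):
    """Send the request and return the text of the first content block."""
    with urllib.request.urlopen(request) as resp:
        reply = json.load(resp)
    return reply["content"][0]["text"]
```

With Ollama running, `send(build_messages_request('Hello, how are you?'))` should return the same text as the SDK example.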

Streaming example

streaming.py

```python
import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',
)

with client.messages.stream(
    model='qwen3-coder',
    max_tokens=1024,
    messages=[{'role': 'user', 'content': 'Count from 1 to 10'}]
) as stream:
    for text in stream.text_stream:
        print(text, end='', flush=True)
```
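`stream.text_stream` hides the underlying server-sent events. Without the SDK, you would set `"stream": true` in the request body and read the `data:` lines yourself. The sketch below assumes the response is iterated line by line, as with the object returned by `urllib.request.urlopen`. The function name is hypothetical, and only `text_delta` events are handled.

```python
import json


def iter_text_deltas(lines):
    """Yield text fragments from raw server-sent-event lines (bytes or str)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        event = json.loads(line[len("data:"):])
        if event.get("type") == "content_block_delta":
            delta = event["delta"]
            if delta.get("type") == "text_delta":
                yield delta["text"]
```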

Tool calling example

tools.py

```python
import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',
)

message = client.messages.create(
    model='qwen3-coder',
    max_tokens=1024,
    tools=[
        {
            'name': 'get_weather',
            'description': 'Get the current weather in a location',
            'input_schema': {
                'type': 'object',
                'properties': {
                    'location': {
                        'type': 'string',
                        'description': 'The city and state, e.g. San Francisco, CA'
                    }
                },
                'required': ['location']
            }
        }
    ],
    messages=[{'role': 'user', 'content': "What's the weather in San Francisco?"}]
)

for block in message.content:
    if block.type == 'tool_use':
        print(f'Tool: {block.name}')
        print(f'Input: {block.input}')
```
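The example above stops at the model's tool call. To finish the exchange, the caller runs the tool and sends its output back as a `tool_result` block in a new user turn. The sketch below shows that step and works on content blocks converted to plain dicts. `get_weather` here is a hypothetical stand-in for a real lookup.

```python
import json


def get_weather(location):
    """Hypothetical local tool; replace with a real weather lookup."""
    return {"location": location, "forecast": "sunny", "temperature_f": 68}


TOOLS = {"get_weather": get_weather}


def tool_results_turn(messages, assistant_content):
    """Return a new message list with the assistant turn and its tool results appended."""
    results = []
    for block in assistant_content:
        if block["type"] != "tool_use":
            continue
        output = TOOLS[block["name"]](**block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # links this result to the tool call
            "content": json.dumps(output),
        })
    return messages + [
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": results},
    ]
```

Pass the returned list as `messages`, with the same `tools`, in the next `messages.create` call so the model can answer using the tool output.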

Using with Claude Code

Claude Code can be configured to use Ollama as its backend.

Recommended models

For coding use cases, models like glm-4.7, minimax-m2.1, and qwen3-coder are recommended. Download a model before use:

```shell
ollama pull qwen3-coder
```

Note: qwen3-coder is a 30B-parameter model that needs at least 24 GB of VRAM to run smoothly; longer context lengths need more.

To use a cloud model instead:

```shell
ollama pull glm-4.7:cloud
```

Quick setup

```shell
ollama launch claude
```

This will prompt you to select a model, configure Claude Code automatically, and launch it. To configure without launching:

```shell
ollama launch claude --config
```

Manual setup

Set the environment variables and run Claude Code:

```shell
ANTHROPIC_AUTH_TOKEN=ollama ANTHROPIC_BASE_URL=http://localhost:11434 claude --model qwen3-coder
```

Or set the environment variables in your shell profile:

```shell
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434
```

Then run Claude Code with any Ollama model:

```shell
claude --model qwen3-coder
```

Endpoints

`/v1/messages`

Supported features

Messages
Streaming
System prompts
Multi-turn conversations
Vision (images)
Tools (function calling)
Tool results
Thinking/extended thinking

Supported request fields

model
max_tokens
messages
- Text content
- Image content (base64)
- Array of content blocks
- tool_use blocks
- tool_result blocks
- thinking blocks
system (string or array)
stream
temperature
top_p
top_k
stop_sequences
tools
thinking
tool_choice
metadata
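The sketch below shows how several of these fields fit together in one request body. The values are arbitrary, and `build_request_body` and `thinking_budget` are hypothetical names. `thinking` is added only on request, since it is presumably relevant only for models that support thinking.

```python
def build_request_body(prompt, thinking_budget=None):
    """Assemble a /v1/messages request body using several supported fields.

    Hypothetical helper; all values are illustrative.
    """
    body = {
        "model": "qwen3-coder",
        "max_tokens": 4096,
        # system may be a plain string or an array of text blocks
        "system": [{"type": "text", "text": "You are a concise coding assistant."}],
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": prompt}]},
        ],
        "temperature": 0.2,
        "top_p": 0.9,
        "top_k": 40,
        "stop_sequences": ["\n\nUser:"],
        "stream": False,
    }
    if thinking_budget is not None:
        # budget_tokens is accepted but not enforced by Ollama (see Partial support)
        body["thinking"] = {"type": "enabled", "budget_tokens": thinking_budget}
    return body
```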

Supported response fields

id
type
role
model
content (text, tool_use, thinking blocks)
stop_reason (end_turn, max_tokens, tool_use)
usage (input_tokens, output_tokens)
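Here is a sketch of reading these fields from a parsed, non-streaming JSON response. The helper name is hypothetical. The `usage` numbers are approximate, as noted under Behavior differences.

```python
def summarize_response(response):
    """Split a parsed /v1/messages response into text, thinking, tool calls and usage."""
    text, thinking, tool_calls = [], [], []
    for block in response["content"]:
        if block["type"] == "text":
            text.append(block["text"])
        elif block["type"] == "thinking":
            thinking.append(block["thinking"])
        elif block["type"] == "tool_use":
            tool_calls.append((block["name"], block["input"]))
    usage = response.get("usage", {})
    return {
        "text": "".join(text),
        "thinking": "".join(thinking),
        "tool_calls": tool_calls,
        "stop_reason": response["stop_reason"],  # end_turn, max_tokens or tool_use
        "total_tokens": usage.get("input_tokens", 0) + usage.get("output_tokens", 0),
    }
```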

Streaming events

message_start
content_block_start
content_block_delta (text_delta, input_json_delta, thinking_delta)
content_block_stop
message_delta
message_stop
ping
error
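When streaming, content arrives as these events and has to be reassembled. The sketch below uses a hypothetical helper that takes the JSON payloads of the `data:` lines as dicts. It accumulates `text_delta` and `thinking_delta` fragments, joins `input_json_delta` fragments and parses them when the block stops, and reads `stop_reason` from `message_delta`.

```python
import json


def assemble_message(events):
    """Rebuild content blocks and stop_reason from parsed streaming events."""
    blocks, partial_json, stop_reason = {}, {}, None
    for ev in events:
        kind = ev["type"]
        if kind == "content_block_start":
            blocks[ev["index"]] = dict(ev["content_block"])
            partial_json[ev["index"]] = ""
        elif kind == "content_block_delta":
            block, delta = blocks[ev["index"]], ev["delta"]
            if delta["type"] == "text_delta":
                block["text"] = block.get("text", "") + delta["text"]
            elif delta["type"] == "thinking_delta":
                block["thinking"] = block.get("thinking", "") + delta["thinking"]
            elif delta["type"] == "input_json_delta":
                partial_json[ev["index"]] += delta["partial_json"]
        elif kind == "content_block_stop":
            block = blocks[ev["index"]]
            if block["type"] == "tool_use" and partial_json[ev["index"]]:
                # Tool input arrives as JSON fragments; parse once the block closes.
                block["input"] = json.loads(partial_json[ev["index"]])
        elif kind == "message_delta":
            stop_reason = ev["delta"].get("stop_reason", stop_reason)
        # message_start, message_stop and ping carry nothing needed here.
    return [blocks[i] for i in sorted(blocks)], stop_reason
```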

Models

Ollama supports both local and cloud models.

Local models

Pull a local model before use:

```shell
ollama pull qwen3-coder
```

Recommended local models:

qwen3-coder - Excellent for coding tasks
gpt-oss:20b - Strong general-purpose model

Cloud models

Cloud models are available immediately without pulling:

glm-4.7:cloud - High-performance cloud model
minimax-m2.1:cloud - Fast cloud model

Default model names

For tooling that relies on default Anthropic model names such as claude-3-5-sonnet, use `ollama cp` to copy an existing model under that name:

```shell
ollama cp qwen3-coder claude-3-5-sonnet
```

Afterwards, this new model name can be specified in the model field:

```shell
curl http://localhost:11434/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```

Differences from the Anthropic API

Behavior differences

API key is accepted but not validated
anthropic-version header is accepted but not used
Token counts are approximations based on the underlying model’s tokenizer

Not supported

The following Anthropic API features are not currently supported:

Feature	Description
`/v1/messages/count_tokens`	Token counting endpoint
`tool_choice`	Forcing specific tool use or disabling tools
`metadata`	Request metadata (user_id)
Prompt caching	`cache_control` blocks for caching prefixes
Batches API	`/v1/messages/batches` for async batch processing
Citations	`citations` content blocks
PDF support	`document` content blocks with PDF files
Server-sent errors	`error` events during streaming (errors return HTTP status)

Partial support

Feature	Status
Image content	Base64 images supported; URL images not supported
Extended thinking	Basic support; `budget_tokens` accepted but not enforced
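Because URL image sources are not supported, local images have to be inlined as base64. The sketch below builds such a content block. The helper names are hypothetical, and the allowed media types come from Anthropic's API, which is an assumption as far as Ollama is concerned.

```python
import base64
import mimetypes
from pathlib import Path

# Assumed list, mirroring the media types Anthropic's API accepts.
ALLOWED_MEDIA_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block_from_bytes(data, filename):
    """Build a base64 image content block; the media type is guessed from filename."""
    media_type, _ = mimetypes.guess_type(filename)
    if media_type not in ALLOWED_MEDIA_TYPES:
        raise ValueError(f"unsupported image type: {media_type}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("ascii"),
        },
    }


def image_block(path):
    """Read a local image file and wrap it as a base64 content block."""
    return image_block_from_bytes(Path(path).read_bytes(), str(path))
```

A user turn can then combine the image with text, for example `{"role": "user", "content": [image_block("diagram.png"), {"type": "text", "text": "What does this diagram show?"}]}`, sent to a vision-capable model.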
