Usage
Environment variables
To use Ollama with tools that expect the Anthropic API (like Claude Code), set these environment variables:
export ANTHROPIC_AUTH_TOKEN=ollama # required but ignored
export ANTHROPIC_BASE_URL=http://localhost:11434
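The same two variables can also be read from a script. The following is a minimal sketch, assuming a hypothetical helper named `ollama_client_settings` (it is not part of Ollama or the Anthropic SDK), that falls back to the values shown above when the variables are unset:

```python
import os

# Defaults mirror the values shown above.
DEFAULT_BASE_URL = 'http://localhost:11434'
DEFAULT_AUTH_TOKEN = 'ollama'  # required by clients, ignored by Ollama


def ollama_client_settings(env=None):
    """Return base_url / api_key settings for an Anthropic-style client.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return {
        # Strip a trailing slash so paths can be appended consistently.
        'base_url': env.get('ANTHROPIC_BASE_URL', DEFAULT_BASE_URL).rstrip('/'),
        'api_key': env.get('ANTHROPIC_AUTH_TOKEN', DEFAULT_AUTH_TOKEN),
    }
```

The returned dictionary uses the same keyword names as the client constructor in the examples that follow, so it could be passed as `anthropic.Anthropic(**ollama_client_settings())`.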
Simple /v1/messages example
basic.py
import anthropic
client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama', # required but ignored
)
message = client.messages.create(
    model='qwen3-coder',
    max_tokens=1024,
    messages=[
        {'role': 'user', 'content': 'Hello, how are you?'}
    ]
)
print(message.content[0].text)
Streaming example
streaming.py
import anthropic
client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',
)
with client.messages.stream(
    model='qwen3-coder',
    max_tokens=1024,
    messages=[{'role': 'user', 'content': 'Count from 1 to 10'}]
) as stream:
    for text in stream.text_stream:
        print(text, end='', flush=True)
Tool calling example
tools.py
import anthropic
client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',
)
message = client.messages.create(
    model='qwen3-coder',
    max_tokens=1024,
    tools=[
        {
            'name': 'get_weather',
            'description': 'Get the current weather in a location',
            'input_schema': {
                'type': 'object',
                'properties': {
                    'location': {
                        'type': 'string',
                        'description': 'The city and state, e.g. San Francisco, CA'
                    }
                },
                'required': ['location']
            }
        }
    ],
    messages=[{'role': 'user', 'content': "What's the weather in San Francisco?"}]
)
for block in message.content:
    if block.type == 'tool_use':
        print(f'Tool: {block.name}')
        print(f'Input: {block.input}')
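Once the model returns a tool_use block, the tool's output is normally sent back in a tool_result block on the next turn. The following is a minimal sketch of building that follow-up message list, assuming the standard Anthropic Messages block shapes; `build_tool_result_turn` and the example result string are hypothetical:

```python
def build_tool_result_turn(messages, tool_use_blocks, results):
    """Return a new message list that answers each tool_use block.

    messages:        the conversation sent so far (list of dicts)
    tool_use_blocks: dicts with 'id', 'name', and 'input' taken from the response
    results:         mapping of tool_use id -> result string from your own code
    """
    # Echo the model's tool calls back as the assistant turn.
    assistant_content = [
        {'type': 'tool_use', 'id': b['id'], 'name': b['name'], 'input': b['input']}
        for b in tool_use_blocks
    ]
    # Answer each call with a tool_result that references its id.
    user_content = [
        {'type': 'tool_result', 'tool_use_id': b['id'], 'content': results[b['id']]}
        for b in tool_use_blocks
    ]
    # Build a new list so the caller's original messages stay unchanged.
    return messages + [
        {'role': 'assistant', 'content': assistant_content},
        {'role': 'user', 'content': user_content},
    ]
```

With the SDK response from tools.py, each tool_use block can be turned into the expected dict with `{'id': block.id, 'name': block.name, 'input': block.input}`. The returned list is then passed as `messages` in a second `client.messages.create` call with the same `tools`.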
Using with Claude Code
Claude Code can be configured to use Ollama as its backend.
Recommended models
For coding use cases, models like glm-4.7, minimax-m2.1, and qwen3-coder are recommended.
Download a model before use:
ollama pull qwen3-coder
Note: Qwen 3 coder is a 30B parameter model requiring at least 24GB of VRAM to run smoothly. More is required for longer context lengths.
ollama pull glm-4.7:cloud
Quick setup
ollama launch claude
ollama launch claude --config
Manual setup
Set the environment variables and run Claude Code:
ANTHROPIC_AUTH_TOKEN=ollama ANTHROPIC_BASE_URL=http://localhost:11434 claude --model qwen3-coder
Or export the variables first, then run Claude Code:
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434
claude --model qwen3-coder
Endpoints
/v1/messages
Supported features
- Messages
- Streaming
- System prompts
- Multi-turn conversations
- Vision (images)
- Tools (function calling)
- Tool results
- Thinking/extended thinking
Supported request fields
- model
- max_tokens
- messages
  - Text content
  - Image content (base64)
  - Array of content blocks
    - tool_use blocks
    - tool_result blocks
    - thinking blocks
- system (string or array)
- stream
- temperature
- top_p
- top_k
- stop_sequences
- tools
- thinking
- tool_choice (not currently supported; see Not supported)
- metadata (not currently supported; see Not supported)
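As an illustration of the image content entry above, the following sketch builds a user message that carries a base64 image block followed by a text block. It assumes the standard Anthropic Messages block shape; `image_block` and `image_question` are hypothetical helper names:

```python
import base64


def image_block(image_bytes, media_type='image/png'):
    """Wrap raw image bytes in a base64 image content block."""
    return {
        'type': 'image',
        'source': {
            'type': 'base64',
            'media_type': media_type,
            'data': base64.b64encode(image_bytes).decode('ascii'),
        },
    }


def image_question(image_bytes, question, media_type='image/png'):
    """Build one user message holding an image block followed by a text block."""
    return {
        'role': 'user',
        'content': [
            image_block(image_bytes, media_type),
            {'type': 'text', 'text': question},
        ],
    }
```

The resulting message can be placed in the `messages` list of a `client.messages.create` call. The selected model presumably needs to accept image input for this to produce a useful answer.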
Supported response fields
- id
- type
- role
- model
- content (text, tool_use, thinking blocks)
- stop_reason (end_turn, max_tokens, tool_use)
- usage (input_tokens, output_tokens)
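The response fields above can be read from the raw JSON body as well as through the SDK. The following is a minimal sketch, assuming the response has already been decoded into a Python dict; `summarize_response` is a hypothetical helper name:

```python
def summarize_response(response):
    """Collect text, tool calls, and token usage from a /v1/messages response dict."""
    text_parts = []
    tool_calls = []
    for block in response.get('content', []):
        if block.get('type') == 'text':
            text_parts.append(block.get('text', ''))
        elif block.get('type') == 'tool_use':
            tool_calls.append((block.get('name'), block.get('input')))
        # thinking blocks are skipped in this summary
    usage = response.get('usage', {})
    return {
        'text': ''.join(text_parts),
        'tool_calls': tool_calls,
        'stop_reason': response.get('stop_reason'),
        'total_tokens': usage.get('input_tokens', 0) + usage.get('output_tokens', 0),
    }
```

A stop_reason of tool_use indicates that `tool_calls` should be handled before the conversation continues.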
Streaming events
- message_start
- content_block_start
- content_block_delta (text_delta, input_json_delta, thinking_delta)
- content_block_stop
- message_delta
- message_stop
- ping
- error
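When the stream is consumed without the SDK, the events listed above arrive as server-sent events. The following is a minimal sketch of parsing them and collecting the text deltas, assuming the usual `event:` / `data:` line framing with a blank line between events; `iter_sse_events` and `collect_text` are hypothetical names:

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, data_dict) pairs from an iterable of SSE text lines."""
    event_name, data_lines = None, []
    for raw in lines:
        line = raw.rstrip('\r\n')
        if line.startswith('event:'):
            event_name = line[len('event:'):].strip()
        elif line.startswith('data:'):
            data_lines.append(line[len('data:'):].strip())
        elif line == '' and data_lines:
            # A blank line ends the current event.
            yield event_name, json.loads('\n'.join(data_lines))
            event_name, data_lines = None, []
    if data_lines:  # stream ended without a trailing blank line
        yield event_name, json.loads('\n'.join(data_lines))


def collect_text(lines):
    """Concatenate text_delta payloads until message_stop is seen."""
    parts = []
    for name, data in iter_sse_events(lines):
        delta = data.get('delta', {})
        if name == 'content_block_delta' and delta.get('type') == 'text_delta':
            parts.append(delta['text'])
        elif name == 'message_stop':
            break
    return ''.join(parts)
```

Here `lines` can be any iterator of decoded text lines read from the HTTP response body. Events such as ping are parsed but ignored by `collect_text`.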
Models
Ollama supports both local and cloud models.
Local models
Pull a local model before use:
ollama pull qwen3-coder
- qwen3-coder - Excellent for coding tasks
- gpt-oss:20b - Strong general-purpose model
Cloud models
Cloud models are available immediately without pulling:
- glm-4.7:cloud - High-performance cloud model
- minimax-m2.1:cloud - Fast cloud model
Default model names
For tooling that relies on default Anthropic model names such as claude-3-5-sonnet, use ollama cp to copy an existing model to that name:
ollama cp qwen3-coder claude-3-5-sonnet
The new model name can then be specified in the model field:
curl http://localhost:11434/v1/messages \
-H "Content-Type: application/json" \
-d '{
"model": "claude-3-5-sonnet",
"max_tokens": 1024,
"messages": [
{
"role": "user",
"content": "Hello!"
}
]
}'
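The curl request above can also be built with Python's standard library, without the Anthropic SDK. This is a minimal sketch; `build_messages_request` and `send` are hypothetical helper names, and the defaults simply mirror the curl example:

```python
import json
import urllib.request


def build_messages_request(prompt, model='claude-3-5-sonnet',
                           base_url='http://localhost:11434', max_tokens=1024):
    """Build (but do not send) a POST request for /v1/messages."""
    payload = {
        'model': model,
        'max_tokens': max_tokens,
        'messages': [{'role': 'user', 'content': prompt}],
    }
    return urllib.request.Request(
        base_url.rstrip('/') + '/v1/messages',
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))
```

Calling `send(build_messages_request('Hello!'))` requires a running Ollama server and a model available under the given name.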
Differences from the Anthropic API
Behavior differences
- API key is accepted but not validated
- anthropic-version header is accepted but not used
- Token counts are approximations based on the underlying model’s tokenizer
Not supported
The following Anthropic API features are not currently supported:
| Feature | Description |
|---|---|
| /v1/messages/count_tokens | Token counting endpoint |
| tool_choice | Forcing specific tool use or disabling tools |
| metadata | Request metadata (user_id) |
| Prompt caching | cache_control blocks for caching prefixes |
| Batches API | /v1/messages/batches for async batch processing |
| Citations | citations content blocks |
| PDF support | document content blocks with PDF files |
| Server-sent errors | error events during streaming (errors return HTTP status) |
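Client code that is shared between Anthropic and Ollama backends may want to check a request against this table before sending it. The following is a rough sketch of such a check; `find_unsupported` is a hypothetical helper, it covers only some rows of the table, and it makes no assumption about how Ollama reacts when these fields are present:

```python
def find_unsupported(payload):
    """Return names of features in `payload` that the table above lists as unsupported."""
    found = []
    # Top-level request fields.
    for field in ('tool_choice', 'metadata'):
        if field in payload:
            found.append(field)
    # Content-block level features.
    for message in payload.get('messages', []):
        content = message.get('content')
        if not isinstance(content, list):
            continue  # plain string content has no blocks to inspect
        for block in content:
            if 'cache_control' in block and 'cache_control' not in found:
                found.append('cache_control')
            if block.get('type') == 'document' and 'document' not in found:
                found.append('document')
    return found
```

An empty list means none of the checked features were detected. What to do with a non-empty list (drop the fields, warn, or fail) is left to the caller.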
Partial support
| Feature | Status |
|---|---|
| Image content | Base64 images supported; URL images not supported |
| Extended thinking | Basic support; budget_tokens accepted but not enforced |
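To illustrate the extended thinking row, the following sketch builds request arguments with thinking enabled and separates thinking text from answer text in a response. It assumes the standard Anthropic shapes for the thinking parameter and thinking blocks; both helper names are hypothetical. As the table notes, budget_tokens is accepted but not enforced:

```python
def thinking_request(prompt, model='qwen3-coder', max_tokens=2048, budget_tokens=1024):
    """Build keyword arguments for client.messages.create with thinking enabled."""
    return {
        'model': model,
        'max_tokens': max_tokens,
        # Accepted by Ollama, but the budget is not enforced.
        'thinking': {'type': 'enabled', 'budget_tokens': budget_tokens},
        'messages': [{'role': 'user', 'content': prompt}],
    }


def split_thinking(content_blocks):
    """Separate thinking text from answer text in a list of content block dicts."""
    thinking = ''.join(
        b.get('thinking', '') for b in content_blocks if b.get('type') == 'thinking'
    )
    answer = ''.join(
        b.get('text', '') for b in content_blocks if b.get('type') == 'text'
    )
    return thinking, answer
```

The arguments can be passed as `client.messages.create(**thinking_request('...'))`. Whether thinking blocks come back presumably depends on the selected model.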