stream parameter to True.
Key streaming concepts
- Chatting: Stream partial assistant messages. Each chunk includes the
contentso you can render messages as they arrive. - Thinking: Thinking-capable models emit a
thinkingfield alongside regular content in each chunk. Detect this field in streaming chunks to show or hide reasoning traces before the final answer arrives. - Tool calling: Watch for streamed
tool_callsin each chunk, execute the requested tool, and append tool outputs back into the conversation.
Handling streamed chunks
It is necessary to accumulate the partial fields in order to maintain the history of the conversation. This is particularly important for tool calling where the thinking, tool call from the model, and the executed tool result must be passed back to the model in the next request.
- Python
- JavaScript
from ollama import chat
stream = chat(
model='qwen3',
messages=[{'role': 'user', 'content': 'What is 17 ×ばつ 23?'}],
stream=True,
)
in_thinking = False
content = ''
thinking = ''
for chunk in stream:
if chunk.message.thinking:
if not in_thinking:
in_thinking = True
print('Thinking:\n', end='', flush=True)
print(chunk.message.thinking, end='', flush=True)
# accumulate the partial thinking
thinking += chunk.message.thinking
elif chunk.message.content:
if in_thinking:
in_thinking = False
print('\n\nAnswer:\n', end='', flush=True)
print(chunk.message.content, end='', flush=True)
# accumulate the partial content
content += chunk.message.content
# append the accumulated fields to the messages for the next request
new_messages = [{ role: 'assistant', thinking: thinking, content: content }]
import ollama from 'ollama'
async function main() {
const stream = await ollama.chat({
model: 'qwen3',
messages: [{ role: 'user', content: 'What is 17 ×ばつ 23?' }],
stream: true,
})
let inThinking = false
let content = ''
let thinking = ''
for await (const chunk of stream) {
if (chunk.message.thinking) {
if (!inThinking) {
inThinking = true
process.stdout.write('Thinking:\n')
}
process.stdout.write(chunk.message.thinking)
// accumulate the partial thinking
thinking += chunk.message.thinking
} else if (chunk.message.content) {
if (inThinking) {
inThinking = false
process.stdout.write('\n\nAnswer:\n')
}
process.stdout.write(chunk.message.content)
// accumulate the partial content
content += chunk.message.content
}
}
// append the accumulated fields to the messages for the next request
new_messages = [{ role: 'assistant', thinking: thinking, content: content }]
}
main().catch(console.error)