How do I stream chat completions with OpenAI’s Python API? #2462

I'm using the official openai Python package to call the Chat API (gpt-3.5 or gpt-4), and I'd like to stream the response instead of waiting for the full reply.

I tried this:

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}]
)

But it waits until everything is returned.

How can I make it stream the tokens one by one as they’re generated?

Great question!

To stream chat completions with the openai Python package, set stream=True in the ChatCompletion.create call and then iterate over the response, which becomes a generator that yields chunks as the model produces them.

Here’s how you can do it:

import openai

openai.api_key = "your-api-key"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,  # ✅ this enables streaming
)

for chunk in response:
    if "choices" in chunk:
        # each chunk carries a small "delta"; the first/last may have no content
        content = chunk["choices"][0]["delta"].get("content", "")
        print(content, end="", flush=True)

This will print the generated message token-by-token in real time.
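
If you also need the complete reply once the stream finishes (say, to append it to the conversation history), you can collect the pieces while printing them. Here's a minimal sketch building on the snippet above (the full_reply name is just illustrative):

import openai

openai.api_key = "your-api-key"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

full_reply = []  # collect each streamed piece as it arrives
for chunk in response:
    content = chunk["choices"][0]["delta"].get("content", "")
    print(content, end="", flush=True)
    full_reply.append(content)

message = "".join(full_reply)  # the complete assistant message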

Let me know if that works — and feel free to mark this as the answer if it helps! ✅
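
One caveat: the snippet above targets the pre-1.0 openai package. If you have openai>=1.0 installed, openai.ChatCompletion was removed; a rough equivalent with the newer client interface looks like this:

from openai import OpenAI

client = OpenAI(api_key="your-api-key")

stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content is not None:  # some chunks (first/last) carry no content
        print(delta.content, end="", flush=True)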

Replies: 2 comments 2 replies


Thank you so much!

Answer selected by Istituto-freudinttheprodev

@AhmedGMurtaza Thanks for the link to the article; I didn't realize OpenAI had released an official paper on this specific topic!
In any case, I'll gladly read it to learn more.
