try/except with time.sleep(), you're doing it the hard way. Here's the fix in one decorator.
The Code
import logging

import openai
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type, before_sleep_log

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


@retry(
    retry=retry_if_exception_type((
        openai.RateLimitError,
        openai.APITimeoutError,
        openai.APIConnectionError,
    )),
    wait=wait_exponential(multiplier=1, min=2, max=30),
    stop=stop_after_attempt(3),
    before_sleep=before_sleep_log(logger, logging.WARNING),
)
def call_llm(prompt: str) -> str:
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


result = call_llm("Explain retry logic in one sentence.")
print(result)
Install the dependencies:
pip install tenacity openai
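Run it. If OpenAI returns a rate limit error, the call backs off (at least 2 seconds, at most 30, growing exponentially) and tries again, for up to 3 attempts in total. If the third attempt also fails, tenacity raises a RetryError that wraps the last exception. Pass reraise=True to @retry if you would rather get the original exception back.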
How It Works
retry_if_exception_type tells tenacity which errors to retry. We target three specific OpenAI exceptions:
- RateLimitError (429) -- you hit the token or request limit
- APITimeoutError -- the request took too long
- APIConnectionError -- network issues between you and OpenAI
All other errors (like AuthenticationError or BadRequestError) raise immediately. You don't want to retry a bad API key three times.
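wait_exponential(multiplier=1, min=2, max=30) sets the backoff schedule. The wait grows exponentially with each failed attempt, never drops below 2 seconds, and caps at 30 seconds. The exact first few values depend on how your tenacity release computes the exponent, because the min=2 floor clamps the earliest waits. This is critical for rate limits -- hammering the API with instant retries makes the problem worse.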
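If you want to eyeball a schedule before deploying it, here is a rough model in plain Python. It assumes the formula that recent tenacity releases appear to use, multiplier * 2 ** (attempt - 1) clamped between min and max. Other releases may compute the exponent differently, so treat the numbers as illustrative. backoff_schedule is a hypothetical helper, not part of tenacity.

```python
def backoff_schedule(attempts, multiplier=1, min_wait=2, max_wait=30) -> list:
    """Model the wait before each retry for a given number of total attempts.

    Assumption: wait = multiplier * 2 ** (n - 1), clamped to [min_wait, max_wait],
    where n is the number of the attempt that just failed.
    """
    waits = []
    # There is no wait after the final attempt, so stop at attempts - 1.
    for n in range(1, attempts):
        raw = multiplier * 2 ** (n - 1)
        waits.append(max(min_wait, min(raw, max_wait)))
    return waits


print(backoff_schedule(3))  # [2, 2]
print(backoff_schedule(7))  # [2, 2, 4, 8, 16, 30]
```

Under this model, three attempts mean only two waits, and the 30-second cap only kicks in from the sixth wait onward.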
stop_after_attempt(3) caps the total attempts at three: the initial call plus two retries. That is a reasonable default for most LLM calls. If a call is still failing after that, the issue usually isn't transient.
before_sleep_log logs a warning before each retry so you know exactly when and why retries happen. No silent failures.
What You're Replacing
Here's what most codebases have instead:
# Don't do this
import time

# Assumes a module-level client = openai.OpenAI()
def call_llm_bad(prompt: str) -> str:
    for attempt in range(3):
        try:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except Exception:
            time.sleep(2)
    raise Exception("LLM call failed after 3 attempts")
This catches every exception (including auth errors that will never self-resolve), sleeps a fixed 2 seconds (no backoff, and one pointless sleep after the final failure), and throws away the original error, so you get zero visibility into what failed. The tenacity version declares the same policy in one decorator and handles all of this correctly.
Quick Customizations
Retry more persistently for batch jobs (longer waits, more attempts):
@retry(
    retry=retry_if_exception_type(openai.RateLimitError),
    wait=wait_exponential(multiplier=2, min=4, max=60),
    stop=stop_after_attempt(5),
)
def call_llm(prompt: str) -> str:
    ...  # same body as call_llm above
Add a fallback when all retries fail (by default, tenacity raises RetryError once it runs out of attempts):
from tenacity import RetryError

try:
    result = call_llm("Your prompt here")
except RetryError:
    result = "Fallback: LLM unavailable. Using cached response."
The pattern works with any LLM provider -- swap openai.RateLimitError for the equivalent exception from Anthropic, Google, or your provider's SDK.
Next Steps
Retry logic is one piece of production-ready LLM code. For the full picture of what breaks in production agents and how to fix each failure mode, check out 5 AI Agent Failures in Production.
If you're building agents that chain multiple LLM calls and tool actions, Nebula handles retry and fallback logic automatically for every tool call -- no decorators needed.
Part of the AI Agent Quick Tips series.