Replicate's v2 Python SDK is now in public beta. 🎉
As always, the replicate package is published on PyPI, and you can install it with pip using the --pre flag:
pip install --pre replicate
What’s new?
This new version is a complete rewrite of the SDK, built in partnership with Stainless, the team that helps design and maintain official SDKs for companies like OpenAI, Anthropic, and Cloudflare.
Replicate's v2 Python SDK is generated dynamically from our public OpenAPI schema. This allows us to automate client code generation and provide a Python API with method names, type hints, and documentation that is perfectly consistent with our HTTP API.
Now that most of the client code is generated dynamically, all changes to Replicate’s HTTP API are automatically supported by the Python SDK. This means whenever we add a new operation (like the new search API) or improve our docs for an existing API (like predictions.create()), the changes are automatically published in a new release of the Python SDK.
Running models
We think running AI models should be as easy as installing and running a package from PyPI.
With this idea in mind, we designed a new replicate.use() method that lets you run models as Python functions:
```python
# pip install --pre replicate
import replicate

claude = replicate.use("anthropic/claude-4.5-sonnet")
seedream = replicate.use("bytedance/seedream-4")
veo = replicate.use("google/veo-3-fast")

# Enhance a simple prompt
image_prompt = claude(
    prompt="bananas wearing cowboy hats",
    system_prompt="turn prompts into image prompts",
)

# Generate an image from the enhanced prompt
images = seedream(prompt=image_prompt)

# Generate a video from the image
video = veo(prompt="dancing bananas", image_input=images[0])

open(video)
```
The new .use() method also supports streaming output. Here’s an example showing how to consume output tokens from Claude Sonnet 4.5 while the model is running:
```python
import replicate

claude = replicate.use("anthropic/claude-4.5-sonnet", streaming=True)

for chunk in claude(prompt="Write a haiku about streaming output."):
    print(str(chunk), end="")

# Bytes flow through the pipe
# Data chunks arrive in waves
# Code drinks from the stream
```
API design
Our new SDK was designed to be approachable for newcomers while also being feature-complete for power users. There are three levels of APIs built into the new SDK, varying from simple high-level abstractions to powerful low-level methods that give you complete control:
🍰 High-level API
The v2 SDK provides a new replicate.use() method that makes it easy to run models and get their output all at once or as a streaming response. The replicate.run() method is still supported so your applications will continue to work, but we recommend using use() going forward.
🛠️ Mid-level API
The v2 SDK has methods for every single operation available in our public HTTP API, like search(), predictions.create(), and collections.list(). These fine-grained methods are defined by our OpenAPI schema and updated in lock-step with our API. Every new feature, bug fix, or documentation improvement in our API becomes available immediately in a new release of the Python SDK. See our HTTP API docs and Python SDK docs for reference.
The SDK now supports all of these API operations:
- `search` - Search models, collections, and docs (beta)
- `predictions.create` - Create a prediction
- `predictions.get` - Get a prediction
- `predictions.list` - List predictions
- `predictions.cancel` - Cancel a prediction
- `models.create` - Create a model
- `models.get` - Get a model
- `models.list` - List public models
- `models.update` - Update metadata for a model
- `models.search` - Search public models
- `models.delete` - Delete a model
- `models.examples.list` - List examples for a model
- `models.predictions.create` - Create a prediction using an official model
- `models.readme.get` - Get a model's README
- `models.versions.get` - Get a model version
- `models.versions.list` - List model versions
- `models.versions.delete` - Delete a model version
- `collections.get` - Get a collection of models
- `collections.list` - List collections of models
- `deployments.create` - Create a deployment
- `deployments.get` - Get a deployment
- `deployments.list` - List deployments
- `deployments.update` - Update a deployment
- `deployments.delete` - Delete a deployment
- `deployments.predictions.create` - Create a prediction using a deployment
- `trainings.create` - Create a training
- `trainings.get` - Get a training
- `trainings.list` - List trainings
- `trainings.cancel` - Cancel a training
- `hardware.list` - List available hardware for models
- `account.get` - Get the authenticated account
- `webhooks.default.secret.get` - Get the signing secret for the default webhook
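Methods like predictions.create() and predictions.get() expose the prediction lifecycle directly: a prediction moves through states (roughly starting → processing → succeeded/failed/canceled), and you can poll it until it reaches a terminal state. Here is a minimal, self-contained sketch of that polling pattern, using a stand-in fetch_prediction() function (hypothetical, for illustration) in place of real API calls:

```python
import time

# Stand-in for replicate.predictions.get(); returns a dict shaped like the
# API's prediction object. In real code this would call the Replicate API.
FAKE_STATES = iter(["starting", "processing", "succeeded"])

def fetch_prediction(prediction_id):
    return {"id": prediction_id, "status": next(FAKE_STATES), "output": "done!"}

def wait_for_prediction(prediction_id, poll_interval=0.01):
    """Poll until the prediction reaches a terminal state."""
    terminal = {"succeeded", "failed", "canceled"}
    while True:
        prediction = fetch_prediction(prediction_id)
        if prediction["status"] in terminal:
            return prediction
        time.sleep(poll_interval)

result = wait_for_prediction("pred_123")
print(result["status"])  # succeeded
```

In practice the mid-level methods return typed prediction objects rather than plain dicts, but the create-then-poll shape is the same.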
🔬 Low-level API
The v2 SDK includes generic request methods like replicate.get() and replicate.post() for making custom API requests with full control over the request and response. This is useful for testing undocumented APIs, setting custom headers, or getting lower-level access to response objects.
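At the HTTP level, a call like replicate.get("/account") corresponds to an authenticated GET request against the API. A stdlib sketch of the equivalent raw request, built but not sent (assuming the Bearer authorization scheme and a placeholder token), might look like this:

```python
import urllib.request

API_BASE = "https://api.replicate.com/v1"
token = "<your-api-token>"  # placeholder; read from REPLICATE_API_TOKEN in real code

# Build (but don't send) the raw request that replicate.get("/account")
# roughly corresponds to at the HTTP level.
req = urllib.request.Request(
    f"{API_BASE}/account",
    headers={"Authorization": f"Bearer {token}"},
    method="GET",
)

print(req.full_url)      # https://api.replicate.com/v1/account
print(req.get_method())  # GET
```

The SDK's generic request methods handle all of this for you (auth, serialization, retries) while still letting you hit arbitrary paths.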
New SDK features
In addition to the new API design, there are loads of new features in the v2 SDK:
- Type hints: Typed requests and responses provide autocomplete and documentation within your editor.
- Pagination: All list methods are paginated, and the SDK provides auto-paginating iterators with each list response so you do not have to request successive pages manually.
- Retries: Certain errors like 408, 409, 429, and >=500 are automatically retried 2 times by default, with a short exponential backoff.
- Async/await support: Full async client with `AsyncReplicate` that supports all SDK methods including `run()` and `stream()`.
- Alternative HTTP backends: Optional aiohttp support for improved concurrency performance in async applications.
- Streaming output: Stream model outputs in real-time with `replicate.stream()` for language models.
- File upload flexibility: Pass files as URLs, file handles, bytes, PathLike objects, or tuples of `(filename, contents, media_type)`.
- Raw response access: Access response headers and raw data with `.with_raw_response` and `.with_streaming_response`.
- Per-request configuration: Override client options on a per-request basis with `.with_options()`.
- Configurable timeouts: Fine-grained timeout control at the client or request level, including separate read/write/connect timeouts.
- Better error handling: Specific exception types for different HTTP status codes (`BadRequestError`, `AuthenticationError`, `RateLimitError`, etc.) with access to underlying response data.
- Manual pagination control: Granular page control with `has_next_page()`, `next_page_info()`, and `get_next_page()` methods.
- Response serialization: Pydantic models with built-in `.to_json()` and `.to_dict()` methods.
- Logging support: Built-in logging via the `REPLICATE_LOG` environment variable.
- Context manager support: Proper resource management with context managers for both sync and async clients.
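The retry policy described above (408, 409, 429, and 5xx responses retried twice by default with short exponential backoff) can be sketched in a few lines. This is an illustration of the policy as stated, not the SDK's actual implementation, and the delay constants are made up:

```python
RETRYABLE_STATUSES = {408, 409, 429}

def is_retryable(status_code):
    """Mirror the documented policy: 408, 409, 429, and any 5xx."""
    return status_code in RETRYABLE_STATUSES or status_code >= 500

def backoff_delays(max_retries=2, base=0.5):
    """Short exponential backoff: base, 2*base, ... (illustrative constants)."""
    return [base * (2 ** attempt) for attempt in range(max_retries)]

print(is_retryable(429))  # True
print(is_retryable(404))  # False
print(backoff_delays())   # [0.5, 1.0]
```

In the real client, the retry count and timeouts are configurable at the client or request level, so you don't need to write loops like this yourself.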
For more details about all these new features, see the README.
Installation
To get started using the public beta pre-release, pip install it with the --pre flag:
pip install --pre replicate
Usage
For basic usage, the new 2.x client works just like the old 1.x client.
As always, you can import the library and run models with just a few lines of code:
```python
import replicate

model = "anthropic/claude-4.5-sonnet"
prompt = "Write me a poem about SDKs"

output = replicate.run(model, input={"prompt": prompt})
print(output)
```
You can also stream output as the model is running:
```python
import replicate

claude = replicate.use("anthropic/claude-4.5-sonnet", streaming=True)

for event in claude(prompt="Write a haiku about streaming output."):
    print(str(event), end="")
```
Want to try it out but don’t have Python at your disposal? Check out the Google Colab notebook.
See sdks.replicate.com/python for a complete summary of available methods.
Migrating from v1 to v2
v2 is a complete rewrite of the Python SDK, so there are user-facing breaking changes you should be aware of if you’re migrating an existing codebase from v1 to v2. We’ve published a guide to simplify the migration process:
https://github.com/replicate/replicate-python-beta/blob/main/UPGRADING.md
☝️ We recommend feeding this guide to an agent like Claude Code, OpenAI Codex, Gemini CLI, or Cursor to help you automate the process of migrating your project from v1 to v2.
Pinning to v1
v2 has lots of new bells and whistles, but you are not required to upgrade. If you’re already using the v1 version and want to continue using it, pin the version number in your dependency file.
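For example, with pip you could keep a project on the 1.x line with a version specifier like the following (the exact syntax depends on your packaging tool; a requirements.txt line would use the same specifier):

```shell
# Stay on the v1 line by excluding 2.x releases
pip install 'replicate<2'
```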
See pinning in the migration guide for details.