Generate videos

These models can generate and edit videos from text prompts and images. They use advanced AI techniques like diffusion models and latent space interpolation to create high-quality, controllable video content.
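Latent space interpolation is a general technique, not an API exposed by the models on this page. The sketch below is a generic illustration and does not describe how any particular model works internally. It uses spherical linear interpolation (slerp) to produce a smooth sequence of latent vectors between two endpoints, one per frame. Plain Python lists stand in for real latent tensors, and `slerp` and `latent_path` are hypothetical helper names.

```python
import math
from typing import List


def slerp(t: float, v0: List[float], v1: List[float]) -> List[float]:
    """Spherical linear interpolation between two latent vectors.

    t=0 returns v0 and t=1 returns v1. Values in between follow the arc
    between the two vectors instead of the straight line used by lerp.
    Antipodal vectors (angle close to pi) have no unique arc and are not
    handled by this sketch.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the vectors, clamped against rounding error.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    if theta < 1e-6:
        # Nearly parallel vectors: plain linear interpolation is accurate enough.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    sin_theta = math.sin(theta)
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def latent_path(v0: List[float], v1: List[float], num_frames: int) -> List[List[float]]:
    """Evenly spaced latents from v0 to v1, one per output frame (num_frames >= 2)."""
    return [slerp(i / (num_frames - 1), v0, v1) for i in range(num_frames)]
```

Slerp is often preferred over straight-line interpolation for Gaussian latents because it keeps the vector length close to that of the endpoints. For example, the slerp midpoint of two perpendicular unit vectors still has length 1, while their linear midpoint has a length of about 0.71.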

Key capabilities:

Text-to-video generation - Convert text prompts into video clips and animations. Useful for quickly prototyping video concepts.
Image-to-video generation - Animate still images into video.
Inpainting for infinite zoom - Use image inpainting to extrapolate video frames and create infinite zoom effects.
Stylization - Apply artistic filters like cartoonification to give videos a unique look and feel.
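This page does not say how any specific model implements infinite zoom. One common approach, assumed here, works as follows. Shrink the last frame, paste it in the centre of a blank canvas, and have an inpainting model fill the border. Then interpolate the zoom between successive images to produce the video. The helper below only plans the geometry and the inpainting mask for one step. `zoom_out_step` is a hypothetical name, and the inpainting call itself is left out.

```python
from typing import List, Tuple


def zoom_out_step(
    width: int, height: int, scale: float
) -> Tuple[Tuple[int, int, int, int], List[List[bool]]]:
    """Plan one zoom-out-and-inpaint step for an infinite-zoom effect.

    The previous frame is shrunk by `scale` and pasted in the centre of a new
    canvas of the same size. Everything outside that box must be filled in by
    an inpainting model. Returns the paste box as (left, top, right, bottom)
    and a mask in which True marks pixels to inpaint.
    """
    if not 0 < scale < 1:
        raise ValueError("scale must be strictly between 0 and 1")
    inner_w = round(width * scale)
    inner_h = round(height * scale)
    left = (width - inner_w) // 2
    top = (height - inner_h) // 2
    right, bottom = left + inner_w, top + inner_h
    mask = [
        [not (left <= x < right and top <= y < bottom) for x in range(width)]
        for y in range(height)
    ]
    return (left, top, right, bottom), mask
```

Repeating this step and playing the images back gives a continuous zoom-out, especially when you interpolate the zoom factor between images. Reversing the sequence turns it into a zoom-in.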

State of the art: google/veo-3-fast

For most people looking to generate custom videos from text prompts, we recommend google/veo-3.

Open source: wan-video

The Wan video models by Wan-AI are an excellent open-source option, competitive with the best proprietary video models. Try adjusting the number of sampling steps to trade off generation speed against detail.

Other rankings

Generative video is a rapidly advancing field. Check out the arena and leaderboard at Artificial Analysis to see what's popular today.

Featured models

openai/sora-2

OpenAI's flagship video generation model with synced audio

158.4K runs

Official

wan-video/wan-2.5-t2v

Alibaba Wan 2.5 text-to-video generation model

28.9K runs

Official

wan-video/wan-2.5-i2v

Alibaba Wan 2.5 image-to-video generation with background audio

142.2K runs

Official

google/veo-3.1-fast

New and improved version of Veo 3 Fast, with higher-fidelity video, context-aware audio and last frame support

199.4K runs

Official

google/veo-3.1

New and improved version of Veo 3, with higher-fidelity video, context-aware audio, reference image and last frame support

243.6K runs

Official

pixverse/pixverse-v5

Create 5-8 second videos with enhanced character movement, visual effects, and exclusive support for 8-second 1080p output. Optimized for anime characters and complex actions.

734.4K runs

Official

wan-video/wan-2.5-t2v-fast

Wan 2.5 text-to-video, optimized for speed

32.1K runs

Official

bytedance/seedance-1-pro-fast

A faster and cheaper version of Seedance 1 Pro

390.2K runs

Official

kwaivgi/kling-v2.5-turbo-pro

Kling 2.5 Turbo Pro: Unlock pro-level text-to-video and image-to-video creation with smooth motion, cinematic depth, and remarkable prompt adherence.

1.5M runs

Official

wan-video/wan-2.5-i2v-fast

Wan 2.5 image-to-video, optimized for speed

40.8K runs

Official

minimax/hailuo-2.3

A high-fidelity video generation model optimized for realistic human motion, cinematic VFX, expressive characters, and strong prompt and style adherence across both text-to-video and image-to-video workflows

31.2K runs

Official

minimax/hailuo-2.3-fast

A lower-latency image-to-video version of Hailuo 2.3 that preserves core motion quality, visual consistency, and stylization performance while enabling faster iteration cycles.

24.9K runs

Official

Frequently asked questions

Which models are the fastest?

The open-source Wan suite (like wan-video/wan-2.1-t2v-480p) is among the faster text-to-video options on Replicate, especially at lower resolutions and shorter durations. Many models also have "fast" variants, like google/veo-3-fast, designed for quicker turnaround.
Note: Faster runs usually mean lower resolution or simpler motion.

Which models give the best balance of cost and quality?

PixVerse v4 offers a strong balance for many use cases. It uses a unit-based system at $0.01 per unit; for example, a 5-second, 360p video costs about $0.30. Hailuo 02 is another good middle-ground option, with both standard and pro modes for different quality levels. Your ideal choice depends on how much resolution and runtime you need and how much you want to spend.
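At $0.01 per unit, the roughly $0.30 cost of a 5-second, 360p clip works out to about 30 units. The sketch below shows unit-based pricing and assumes the $0.01-per-unit figure quoted here. This page does not give unit counts for other durations or resolutions, so check the model page for those. `unit_cost_usd` and `units_for_budget` are hypothetical names.

```python
UNIT_PRICE_USD = 0.01  # PixVerse v4 price per unit, as quoted on this page


def unit_cost_usd(units: int, unit_price: float = UNIT_PRICE_USD) -> float:
    """Cost in US dollars for a run billed in units, rounded to the cent."""
    return round(units * unit_price, 2)


def units_for_budget(budget_usd: float, unit_price: float = UNIT_PRICE_USD) -> int:
    """Whole number of units that fit in a budget.

    Converts to whole cents with round() because float arithmetic can land
    just below a whole number (0.29 * 100 is 28.999999999999996 in Python,
    so int() would wrongly give 28).
    """
    budget_cents = round(budget_usd * 100)
    unit_cents = round(unit_price * 100)
    return budget_cents // unit_cents
```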

Which models are best for specific use cases within this collection?

For cinematic realism with high resolution and optional audio, try Veo 3.
For fast prototyping (short clips, lower res), Wan models or PixVerse v4 work well.
For both text-to-video and image-to-video, Hailuo 02 supports both.
If you’re on a budget, stick with 480p or 360p outputs to keep costs low.

What’s the difference between key sub-types or approaches in this collection?

Text-to-video (T2V): You write a prompt and get a video.
Image-to-video (I2V): You provide a still image (or first frame) and animate it. Not all models support this.
Quality / resolution tiers: Some models focus on speed and lower res (e.g., Wan Fast), while others aim for higher resolution and richer motion (e.g., Hailuo 02, Veo 3).
Open-source vs proprietary: Open models like Wan are cheaper and often faster. Proprietary models like Veo 3 offer higher fidelity but can be more expensive.

Which models are good for short, stylized clips?

For short, stylized clips (5–10 seconds at lower resolution), PixVerse v4 and Wan models are great picks. They’re fast and relatively inexpensive, making them ideal for concept work, storyboarding, or rapid iteration.

Which models are good for high-fidelity, realistic video?

If you want high-fidelity motion, longer clips, or more realistic physics, Veo 3 or Hailuo 02 Pro are better options. Hailuo 02 supports 768p in standard mode and 1080p in Pro mode, which makes it a solid choice for more polished results.

What types of outputs do these models produce?

Most text-to-video models generate short video clips (5–10 seconds) at 24 or 30 fps. Supported resolutions range from 360p to 1080p, depending on the model. Some, like Veo 3, can include audio as part of the output.
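The sketch below counts frames and pixels per clip to show how duration, frame rate, and resolution multiply. It assumes a 16:9 aspect ratio and rounds widths to an even number, which common 4:2:0 video encoders require. The helper names are hypothetical, and pixel count is only a rough proxy for compute, not a pricing formula.

```python
from typing import Tuple


def frame_count(duration_s: float, fps: int) -> int:
    """Frames in a clip of the given duration and frame rate."""
    return round(duration_s * fps)


def frame_size(height_p: int, aspect: Tuple[int, int] = (16, 9)) -> Tuple[int, int]:
    """(width, height) for an 'Np' resolution, assuming the given aspect ratio.

    The width is rounded to an even number of pixels.
    """
    aspect_w, aspect_h = aspect
    width = round(height_p * aspect_w / aspect_h / 2) * 2
    return width, height_p


def clip_pixels(duration_s: float, fps: int, height_p: int) -> int:
    """Total pixels rendered across every frame of a clip (16:9 assumed)."""
    width, height = frame_size(height_p)
    return frame_count(duration_s, fps) * width * height
```

With these helpers, a 10-second 1080p clip at 30 fps has 22.5 times as many pixels as a 5-second 360p clip at 24 fps. That helps explain why longer, higher-resolution runs take more time.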

How much do runs typically cost?

Costs vary by model and resolution:

PixVerse v4: about $0.30 for a 5-second, 360p video.
Wan models: generally very inexpensive for short, low-res clips.
Veo 3 and Hailuo 02: prices vary and aren’t always listed publicly, so check the model page for up-to-date details.
Generally, you’ll pay more for longer durations and higher resolutions.

How can I self-host or push a model to Replicate?

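You can push your own model by packaging it with Cog, Replicate's open-source tool for containerizing machine learning models, and then pushing it to Replicate. Because Cog builds a standard Docker image, you can also run the same container on your own infrastructure. If you're working with open-source video models, you can also fine-tune them and publish your version for others to use.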

Can I use these models for commercial work?

Yes, but always check the model’s license. Most text-to-video models on Replicate are available for commercial use, but some authors include additional restrictions.

How do I use or run these models?

You can use the Replicate playground or run them programmatically.

Pick a model from the text-to-video collection.
Add your prompt (and optionally an image for I2V).
Run the model and wait for the video to generate.
Download or embed your output.
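One way to run a model programmatically is through Replicate's HTTP API. The sketch below uses only Python's standard library to build the request, so you can inspect it before anything is sent. It assumes the `POST /v1/models/{owner}/{name}/predictions` endpoint for official models and an API token in the `REPLICATE_API_TOKEN` environment variable. The input keys `prompt` and `image` are also assumptions, since every model defines its own input schema on its model page.

```python
import json
import os
import urllib.request
from typing import Optional

API_BASE = "https://api.replicate.com/v1"


def build_prediction_request(
    model: str,
    prompt: str,
    image_url: Optional[str] = None,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build, but do not send, a request that starts a prediction.

    `model` is "owner/name", e.g. "google/veo-3". The input keys "prompt" and
    "image" are assumptions: each model defines its own input schema, so check
    the model page before relying on them.
    """
    if token is None:
        token = os.environ.get("REPLICATE_API_TOKEN", "")
    model_input = {"prompt": prompt}
    if image_url is not None:
        # Only image-to-video (I2V) models accept a starting image.
        model_input["image"] = image_url
    return urllib.request.Request(
        f"{API_BASE}/models/{model}/predictions",
        data=json.dumps({"input": model_input}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` would start the prediction. The JSON response describes a prediction whose status you poll until the output is ready. If you prefer not to handle HTTP yourself, the official `replicate` Python client wraps these steps.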

Any other collection-specific tips or considerations?

Start with short durations and lower resolutions to experiment without overspending.
If animating a still image, choose a clean, well-framed starting image for better results.
Be specific in your prompts — details like camera motion or scene type improve output quality.
Not all models handle character consistency or motion equally well; higher-tier models tend to do better here.
Compare resolution and duration to match your budget and needs.
Check for updates, as text-to-video models evolve quickly and new versions can improve speed and quality.
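A fixed checklist of prompt ingredients makes it easier to be specific. The minimal helper below joins subject, scene, camera motion, and style into one prompt string. `ShotPrompt` is hypothetical and not part of any Replicate API.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """Hypothetical helper that assembles a detailed text-to-video prompt."""

    subject: str
    scene: str = ""
    camera: str = ""  # camera motion, e.g. "slow dolly-in"
    style: str = ""

    def render(self) -> str:
        # Join only the parts that were filled in, in a fixed order.
        parts = (self.subject, self.scene, self.camera, self.style)
        return ", ".join(p.strip() for p in parts if p.strip())
```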
