Generate natural-sounding speech from text with these powerful models. Clone your own voice or pick from a variety of languages and speaking styles.
Featured models
Text-to-Audio (T2A) that offers voice synthesis, emotional expression, and multilingual capabilities. Designed for real-time applications with low latency.
Updated 1 month, 3 weeks ago
7.3M runs
Text-to-Audio (T2A) that offers voice synthesis, emotional expression, and multilingual capabilities. Optimized for high-fidelity applications like voiceovers and audiobooks.
Updated 1 month, 3 weeks ago
1.3M runs
Generate expressive, natural speech in 23 languages. Features instant voice cloning from short audio, emotion control, and seamless cross-language voice transfer.
Updated 3 months, 4 weeks ago
10.5K runs
Generate expressive, natural speech. Features unique emotion control, instant voice cloning from short audio, and built-in watermarking.
Updated 6 months, 1 week ago
201K runs
Generate expressive, natural speech with Resemble AI's Chatterbox.
Updated 6 months, 1 week ago
16K runs
Kokoro v1.0 - text-to-speech (82M params, based on StyleTTS2)
Updated 11 months ago
71.4M runs
Recommended Models
If low latency matters most, minimax/speech-02-turbo is the standout model in the text-to-speech collection. It’s designed for near real-time audio generation, making it ideal for interactive experiences like chatbots, voice assistants, and in-game dialogue.
Higher-fidelity models like afiaka87/tortoise-tts are slower and better suited for offline rendering or projects where speed isn’t critical.
minimax/speech-02-hd is a strong all-around option in the text-to-speech collection. It provides clear, natural voices with expressive control and reasonable generation time.
Open-source options like afiaka87/tortoise-tts may be more cost-efficient to self-host, but they’re slower and less predictable in performance.
For polished audio content like voiceovers, podcasts, audiobooks, or narration, minimax/speech-02-hd is a great fit. It supports expressive delivery, natural pacing, and multiple languages. If you need finer emotional control or unique character voices, resemble-ai/chatterbox also performs well.
For applications where speed is essential, such as voice-enabled apps, live interactions, or game characters, minimax/speech-02-turbo is the best match. It prioritizes fast generation and low latency while maintaining solid audio clarity.
For projects like games, animations, or storytelling, resemble-ai/chatterbox excels. It supports emotion control and fast voice cloning, letting you generate distinct character voices from just a few seconds of reference audio.
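The recommendations above can be condensed into a small lookup. This is only a sketch: the use-case labels (realtime, narration, characters, offline) and the function name are hypothetical, chosen for illustration, while the model identifiers are the ones named in this collection.

```python
# Hypothetical use-case labels mapped to the models recommended in this collection.
RECOMMENDED_MODELS = {
    "realtime": "minimax/speech-02-turbo",   # lowest latency: chatbots, assistants, games
    "narration": "minimax/speech-02-hd",     # voiceovers, podcasts, audiobooks
    "characters": "resemble-ai/chatterbox",  # emotion control and fast voice cloning
    "offline": "afiaka87/tortoise-tts",      # slower, suited to offline rendering
}

# The collection describes speech-02-hd as the all-around option, so use it as the fallback.
DEFAULT_MODEL = "minimax/speech-02-hd"


def pick_model(use_case: str) -> str:
    """Return the recommended model for a use-case label, or the all-around default."""
    return RECOMMENDED_MODELS.get(use_case, DEFAULT_MODEL)


if __name__ == "__main__":
    for label in ("realtime", "characters", "something-else"):
        print(label, "->", pick_model(label))
```

Falling back to a general-purpose model keeps the helper usable for cases the table does not cover. Adjust the mapping after checking each model page for current capabilities.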
Most text-to-speech models return audio files, typically in MP3 format. Some also support WAV. Output voice options and supported languages vary by model, so check the model page for specifics.
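Because the output format differs between models, it can help to check what a model actually returned before choosing a file extension. The sketch below is a heuristic under stated assumptions: it inspects only the leading bytes of the downloaded audio, the function name and the ".bin" fallback are hypothetical, and the frame-sync check can also match some non-MP3 MPEG audio streams.

```python
def guess_audio_extension(data: bytes) -> str:
    """Guess a file extension from the leading bytes of an audio payload."""
    # WAV files are RIFF containers: b"RIFF", a 4-byte size, then b"WAVE".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return ".wav"
    # MP3 files often begin with an ID3v2 metadata tag...
    if data[:3] == b"ID3":
        return ".mp3"
    # ...or directly with an MPEG audio frame, whose header starts with 11 set bits.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return ".mp3"
    # Unknown format: keep the bytes but flag the file for manual inspection.
    return ".bin"


if __name__ == "__main__":
    sample_wav_header = b"RIFF\x24\x00\x00\x00WAVEfmt "
    print("speech" + guess_audio_extension(sample_wav_header))
```

In practice you would pass the bytes read from the model's output file, then write them to a path built from the returned extension.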
Open-source models like afiaka87/tortoise-tts can be self-hosted with standard tooling. If you want to publish your own model on Replicate, package it with the required files and configuration and push it from your account.
Many models in the text-to-speech collection support commercial use, but always check the license. Some models include watermarking or usage restrictions that may affect how you use the audio in commercial projects.
Recommended Models
Clone voices to use with Minimax's speech-02-hd and speech-02-turbo
Updated 1 month, 3 weeks ago
27.4K runs
Dia 1.6B by Nari Labs generates realistic dialogue audio from text, including non-verbal cues and voice cloning
Updated 5 months, 2 weeks ago
10.1K runs
CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs
Updated 9 months, 1 week ago
1.1K runs
Orpheus 3B - high quality, emotive Text to Speech
Updated 9 months, 1 week ago
32.9K runs
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Updated 9 months, 2 weeks ago
10.8K runs
An F5-TTS model fine-tuned for Spanish
Updated 1 year, 1 month ago
1.4K runs
F5-TTS, the new state-of-the-art in open source voice cloning
Updated 1 year, 2 months ago
38.1K runs
A novel speech model for insane prosody.
Updated 1 year, 6 months ago
533 runs
Updated to OpenVoice v2: Versatile Instant Voice Cloning
Updated 1 year, 7 months ago
81.5K runs
Lightweight text-to-speech (TTS) model, trained on 10.5K hours of audio data
Updated 1 year, 8 months ago
2.7K runs
Generates speech from text
Updated 1 year, 11 months ago
132K runs
Pheme generates a variety of conversational voices in 16 kHz for phone-call applications
Updated 1 year, 11 months ago
566 runs
Coqui XTTS-v2: Multilingual Text To Speech Voice Cloning
Updated 2 years, 1 month ago
4.7M runs
Create song covers with any RVC v2 trained AI voice from audio files.
Updated 2 years, 1 month ago
1.4M runs
SeamlessM4T—Massively Multilingual & Multimodal Machine Translation
Updated 2 years, 3 months ago
92.7K runs
NeonAI Coqui AI TTS Plugin.
Updated 2 years, 4 months ago
175.7K runs
🔊 Text-Prompted Generative Audio Model
Updated 2 years, 8 months ago
303.4K runs
Generate speech from text, clone voices from mp3 files. From James Betker AKA "neonbjb".
Updated 3 years, 4 months ago
173K runs