License: MIT Python 3.10+ Docker
Turn a text prompt into a fully produced video. 6 AI agents collaborate through a structured pipeline -- from creative direction to final cut.
User Prompt
|
v
[Director Agent] -- Analyzes creative vision, plans shots
|
v
[Script Agent] -- Writes production script with video prompts, narration, subtitles
|
+------------------+------------------+
| | |
v v v
[Visual Agent] [Voice Agent] [Music Agent]
Hailuo Video Speech TTS Music Gen
| | |
+------------------+------------------+
|
v
[Editor Agent] -- ffmpeg: concat, mix audio, burn subtitles
|
v
Final MP4
Visual, Voice, and Music agents run in parallel after the Script is ready. The Editor waits for all three, then composites the final video.
- Multi-agent SOP -- 6 specialized agents, each with a single responsibility, orchestrated in a production pipeline
- Parallel execution -- Video, voice, and music generation run concurrently to minimize total wait time
- Real-time progress -- SSE streaming shows every agent's status and logs as the pipeline runs
- One prompt, full video -- From text description to a complete MP4 with narration, subtitles, and background music
- ~0ドル.50 per video -- Typical cost for a 3-shot video using MiniMax APIs
| Component | Technology |
|---|---|
| LLM | MiniMax M1 (Director + Script agents) |
| Video | MiniMax Hailuo 2.3 (1080P, 6s clips) |
| Voice | MiniMax Speech-2.6-HD (TTS with emotion) |
| Music | MiniMax Music-2.0 (instrumental BGM) |
| Composition | ffmpeg (concat, audio mix, subtitles) |
| Backend | Python FastAPI + SSE streaming |
| Frontend | HTML + Tailwind CSS |
- Python 3.10+
- ffmpeg (
brew install ffmpegorapt install ffmpeg) - MiniMax API key
git clone https://github.com/calderbuild/agentcut.git cd agentcut pip install -r backend/requirements.txt cp .env.example .env # Edit .env and add your MiniMax API key
python -m backend.main
cp .env.example .env
# Edit .env and add your MiniMax API key
docker compose up- Enter a video description (e.g. "A sunset over Tokyo, cinematic aerial view")
- Choose duration and shot count
- Click Start Production
- Watch the 6 agents work in real-time
- Download the final MP4
| Endpoint | Method | Description |
|---|---|---|
/api/create |
POST | Start a production job |
/api/stream/{job_id} |
GET | SSE event stream for progress |
/api/status/{job_id} |
GET | Poll job status |
/api/download/{job_id} |
GET | Download final video |
/api/health |
GET | Health check |
curl -X POST http://localhost:8000/api/create \ -H "Content-Type: application/json" \ -d '{"prompt": "A sunset over Tokyo", "duration": 18, "num_shots": 3}'
Each video generation uses MiniMax API credits:
- LLM calls (Director + Script): ~0ドル.01
- Video generation (3 shots): ~0ドル.30-0.60
- Voice TTS (3 shots): ~0ドル.01
- Music generation: ~0ドル.05
Estimated total per video: ~0ドル.40-0.70
ffmpeg subtitle filter not available: If subtitles are not burned into the video, your ffmpeg build lacks libass. Install with brew install ffmpeg (macOS, usually includes it) or apt install ffmpeg libass-dev (Linux). AgentCut will still produce videos without subtitles if the filter is missing.
API key errors: Make sure .env contains a valid MINIMAX_API_KEY. The server will refuse to start if the key is missing.
Video generation timeout: Hailuo video generation can take 2-5 minutes per shot. The pipeline polls every 10 seconds with a 5-minute timeout per shot.
backend/
agents/
director.py -- Creative director: prompt -> shot outline
script.py -- Scriptwriter: outline -> production script
visual.py -- Visual artist: script -> video clips (parallel)
voice.py -- Voice artist: script -> narration audio
music.py -- Composer: style+mood -> background music
editor.py -- Editor: ffmpeg composition
pipeline.py -- Agent orchestration (parallel execution)
main.py -- FastAPI server + SSE streaming
config.py -- Configuration + validation
frontend/
index.html -- Single-page app with real-time pipeline UI