CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models

This repository contains the official PyTorch implementation of CineTrans, a novel framework for generating videos with controllable cinematic transitions via masked diffusion models.

Paper | Project Page

πŸŽ₯ Demo

Teaser video: teaser_video_compressed_.mp4

πŸ“₯ Installation

1. Clone the repository

git clone https://github.com/UknowSth/CineTrans.git
cd CineTrans

2. Set up the environment

conda create -n cinetrans python=3.11.9
conda activate cinetrans
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
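As a quick sanity check after installation, a small script like the following can confirm that the pinned packages resolved correctly. This helper is our own illustration, not part of the repository; the version pins are the ones from the setup commands above.

```python
from importlib import metadata

# Packages pinned by the setup steps above.
EXPECTED = {"torch": "2.5.1", "torchvision": "0.20.1"}

def check_pins(expected):
    """Return a dict mapping package -> (installed_version, ok_flag)."""
    report = {}
    for pkg, want in expected.items():
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            have = None
        # CUDA wheels report versions like "2.5.1+cu118", so match the prefix.
        report[pkg] = (have, have is not None and have.startswith(want))
    return report

if __name__ == "__main__":
    for pkg, (have, ok) in check_pins(EXPECTED).items():
        print(f"{pkg}: {have or 'missing'} ({'OK' if ok else 'check your install'})")
```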

πŸ€— Checkpoint

CineTrans-Unet

Download the required model weights and place them in the ckpt/ directory.

ckpt/
β”œβ”€β”€ stable-diffusion-v1-4/
β”‚   β”œβ”€β”€ scheduler/
β”‚   β”œβ”€β”€ text_encoder/
β”‚   β”œβ”€β”€ tokenizer/
β”‚   β”œβ”€β”€ unet/
β”‚   └── vae_temporal_decoder/
β”œβ”€β”€ checkpoint.pt
└── longclip-L.pt
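Before running inference, a small helper like the one below (our own sketch, not shipped with the repository) can verify that every entry from the directory tree above is actually in place:

```python
from pathlib import Path

# Paths taken from the directory tree above.
REQUIRED = [
    "stable-diffusion-v1-4/scheduler",
    "stable-diffusion-v1-4/text_encoder",
    "stable-diffusion-v1-4/tokenizer",
    "stable-diffusion-v1-4/unet",
    "stable-diffusion-v1-4/vae_temporal_decoder",
    "checkpoint.pt",
    "longclip-L.pt",
]

def missing_weights(ckpt_dir="ckpt"):
    """Return the required entries that are absent under ckpt_dir."""
    root = Path(ckpt_dir)
    return [rel for rel in REQUIRED if not (root / rel).exists()]

if __name__ == "__main__":
    gaps = missing_weights()
    print("all weights in place" if not gaps else f"missing: {gaps}")
```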

CineTrans-DiT

Download the Wan2.1-T2V-1.3B weights and the LoRA weights, and place them as follows:

Wan2.1-T2V-1.3B/                           # original weights
β”œβ”€β”€ google/
β”‚   └── umt5-xxl/
β”œβ”€β”€ config.json
β”œβ”€β”€ diffusion_pytorch_model.safetensors
β”œβ”€β”€ models_t5_umt5-xxl-enc-bf16.pth
└── Wan2.1_VAE.pth
ckpt/
└── weights.pt                             # LoRA weights

πŸ–₯️ Inference

To run inference, use one of the following commands:

CineTrans-Unet

python pipelines/sample.py --config configs/sample.yaml

On a single A100 GPU, generating one video takes approximately 40 seconds. You can modify the configuration and prompt in configs/sample.yaml to adjust the generation process.

CineTrans-DiT

python generate.py

On a single A100 GPU, generating one video takes approximately 5 minutes. You can modify the configuration and prompt in configs/t2v.yaml to adjust the generation process.

πŸ–ΌοΈ Gallery

Sample results and their shot boundaries:

- coffee_cup: Shot1 [0s, 4s], Shot2 [4s, 8s]
- white_flower: Shot1 [0s, 4s], Shot2 [4s, 8s]
- snow: Shot1 [0s, 2.75s], Shot2 [2.75s, 5.5s], Shot3 [5.5s, 8s]
- vintage: Shot1 [0s, 2.5s], Shot2 [2.5s, 5s], Shot3 [5s, 8s]
- city_night: Shot1 [0s, 2.5s], Shot2 [2.5s, 5s], Shot3 [5s, 8s]
- sea: Shot1 [0s, 3s], Shot2 [3s, 6s], Shot3 [6s, 8s]
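For illustration, the per-shot time spans shown above can be converted into frame index ranges. Both the helper and the 16 fps frame rate here are assumptions for this sketch, not part of the CineTrans codebase:

```python
def shots_to_frames(spans, fps=16):
    """Map [(start_s, end_s), ...] in seconds to frame index ranges.

    The frame rate is an assumed example value, not a CineTrans setting.
    """
    return [(round(s * fps), round(e * fps)) for s, e in spans]

# The three-shot "snow" sample above, over an 8-second clip:
print(shots_to_frames([(0, 2.75), (2.75, 5.5), (5.5, 8)]))
# -> [(0, 44), (44, 88), (88, 128)]
```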

πŸ“‘ BibTeX

If you find CineTrans useful for your research and applications, please cite using this BibTeX:

@misc{wu2025cinetranslearninggeneratevideos,
 title={CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models}, 
 author={Xiaoxue Wu and Bingjie Gao and Yu Qiao and Yaohui Wang and Xinyuan Chen},
 year={2025},
 eprint={2508.11484},
 archivePrefix={arXiv},
 primaryClass={cs.CV},
 url={https://arxiv.org/abs/2508.11484}, 
}
