Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

jcelerier/librediffusion

Repository files navigation

librediffusion

A C++ / CUDA / TensorRT implementation of StreamDiffusion

Implemented in ossia score

Benchmarks

On a RTX 5090 at 1 step:

SDXL Turbo 1024x1024: stable 26 fps

sdxl

SD Turbo 512x512: stable 96 fps

sdturbo

SDXS: above 600 fps

sdxs

Models need to be converted to TensorRT through the Python script [train-lora.py] beforehand:

$ uv run train-lora.py --model stabilityai/sd-turbo --min-batch 1 --max-batch 1 --opt-batch 1 --min-resolution 512 --max-resolution 1024 --output ./engines-sd-turbo

Engine build recipes (validated)

All commands run as uv run python train-lora.py <args>. Resolution flags are --min-resolution/--max-resolution/--opt-width/--opt-height (512 for SD1.5, 1024 for SDXL); add -l REPO (or -l "REPO|file.safetensors") to fuse a LoRA, repeatable to stack LoRAs.

Base models

# SD1.5 family (--type sd15)
train-lora.py --type sd15 --model stabilityai/sd-turbo --min-resolution 512 --max-resolution 512 --output engines/sd-turbo
train-lora.py --type sd15 --model SimianLuo/LCM_Dreamshaper_v7 --min-resolution 512 --max-resolution 512 --output engines/lcm-dreamshaper
train-lora.py --type sd15 --model Lykon/dreamshaper-8-lcm --min-resolution 512 --max-resolution 512 --output engines/dreamshaper-8-lcm
train-lora.py --type sd15 --model IDKiro/sdxs-512-dreamshaper --min-resolution 512 --max-resolution 512 --output engines/sdxs
train-lora.py --type sd15 --model runwayml/stable-diffusion-v1-5 -l latent-consistency/lcm-lora-sdv1-5 --min-resolution 512 --max-resolution 512 --output engines/sd15-lcm-lora
train-lora.py --type sd15 --model runwayml/stable-diffusion-v1-5 -l "ByteDance/Hyper-SD|Hyper-SD15-4steps-lora.safetensors" --min-resolution 512 --max-resolution 512 --output engines/hyper-sd15
# SDXL family (--type sdxl)
train-lora.py --type sdxl --model stabilityai/sdxl-turbo --min-resolution 1024 --max-resolution 1024 --output engines/sdxl-turbo
train-lora.py --type sdxl --model stabilityai/stable-diffusion-xl-base-1.0 -l latent-consistency/lcm-lora-sdxl --min-resolution 1024 --max-resolution 1024 --output engines/sdxl-lcm-lora
train-lora.py --type sdxl --model stabilityai/stable-diffusion-xl-base-1.0 -l "ByteDance/Hyper-SD|Hyper-SDXL-4steps-lora.safetensors" --min-resolution 1024 --max-resolution 1024 --output engines/hyper-sdxl
train-lora.py --type sdxl --model stabilityai/stable-diffusion-xl-base-1.0 -l "ByteDance/SDXL-Lightning|sdxl_lightning_4step_lora.safetensors" --min-resolution 1024 --max-resolution 1024 --output engines/sdxl-lightning
train-lora.py --type sdxl --model segmind/Segmind-Vega -l segmind/Segmind-VegaRT --min-resolution 1024 --max-resolution 1024 --output engines/vega-rt
# img2img-turbo (GaParmar pix2pix-turbo skip-VAE) and FLUX.2-klein-4B
train-lora.py --type img2img-turbo --model edge_to_image --min-resolution 512 --max-resolution 512 --output engines/img2img-turbo
train-lora.py --type klein --model black-forest-labs/FLUX.2-klein-4B --output engines/klein

ControlNet (--controlnet REPO)

The ControlNet must match the base UNet's block/residual count: standard SD1.5/SDXL ControlNets for standard models, and the model-native ControlNet for reduced architectures (e.g. sdxs).

# SD1.5 (12+mid residuals) — works on any standard SD1.5 base (swap --model/-l)
train-lora.py --type sd15 --model SimianLuo/LCM_Dreamshaper_v7 --controlnet lllyasviel/control_v11p_sd15_canny --min-resolution 512 --max-resolution 512 --output engines/lcm-dreamshaper-canny
# other SD1.5 ControlNets: control_v11f1p_sd15_depth, control_v11p_sd15_openpose, control_v11p_sd15_softedge
# SDXL (works on any SDXL base, incl. LoRA-fused)
train-lora.py --type sdxl --model stabilityai/stable-diffusion-xl-base-1.0 -l latent-consistency/lcm-lora-sdxl --controlnet diffusers/controlnet-canny-sdxl-1.0 --min-resolution 1024 --max-resolution 1024 --output engines/sdxl-lcm-lora-canny
# sdxs needs its NATIVE sketch ControlNet (7 residuals) — generic SD1.5 canny (13) is incompatible
train-lora.py --type sd15 --model IDKiro/sdxs-512-dreamshaper --controlnet IDKiro/sdxs-512-dreamshaper-sketch --min-resolution 512 --max-resolution 512 --output engines/sdxs-sketch

IP-Adapter (--ipadapter CKPT)

# SD1.5 (h94 ip-adapter_sd15.bin) — works on standard SD1.5 bases (NOT sdxs, whose attention differs)
train-lora.py --type sd15 --model SimianLuo/LCM_Dreamshaper_v7 --ipadapter /path/to/h94/IP-Adapter/models/ip-adapter_sd15.bin --min-resolution 512 --max-resolution 512 --output engines/lcm-dreamshaper-ip
# SDXL (h94 sdxl_models/ip-adapter_sdxl_vit-h.bin)
train-lora.py --type sdxl --model stabilityai/sdxl-turbo --ipadapter /path/to/h94/IP-Adapter/sdxl_models/ip-adapter_sdxl_vit-h.bin --min-resolution 1024 --max-resolution 1024 --output engines/sdxl-turbo-ip

Merged / stacked LoRAs (-l repeated)

Any base + accel LoRA + arbitrary style LoRA(s); LoRAs are fused at export time.

# accel (LCM) + a style LoRA, stacked
train-lora.py --type sdxl --model stabilityai/stable-diffusion-xl-base-1.0 \
 -l latent-consistency/lcm-lora-sdxl \
 -l "ostris/crayon_style_lora_sdxl|crayons_v1_sdxl.safetensors" \
 --min-resolution 1024 --max-resolution 1024 --output engines/sdxl-lcm-crayon
# single style LoRA on a turbo base
train-lora.py --type sdxl --model stabilityai/sdxl-turbo \
 -l "ostris/ikea-instructions-lora-sdxl|ikea_instructions_xl_v1_5.safetensors" \
 --min-resolution 1024 --max-resolution 1024 --output engines/sdxlturbo-ikea

Optional flags

  • --fp8 — FP8 (e4m3) UNet quantization (SDXL/SD; diffusion recipe).
  • --v2v / --v2v-inject — StreamV2V kvo extended-attention UNet (SD1.5 only).
  • --hw-compat ampere_plus — portable engine across SM 8.0+ GPUs (~5–15% slower, larger).
  • -l "REPO:WEIGHT" — set LoRA fusion scale; --lora-scale for the global scale.

About

C++ / CUDA implementation of StreamDiffusion

Topics

Resources

License

Stars

Watchers

Forks

Sponsor this project

Contributors

AltStyle によって変換されたページ (->オリジナル) /