Wan2.1 VAE takes more GPU memory after compile #12082

Open
Labels: bug (Something isn't working)
@gameofdimension

Description

Describe the bug

After applying torch.compile to the Wan2.1 VAE decoder, the VAE consumes more GPU memory than it does without compilation, which is unexpected in my opinion.

compiled
Image

no-compile
Image

Reproduction

import sys

import torch
from diffusers import AutoencoderKLWan


def compile_wan_vae(compile):
    model_id = 'Wan-AI/Wan2.1-T2V-14B-Diffusers'
    dtype = torch.float32
    device = 'cuda'

    # Start recording allocator history so a snapshot can be dumped at the end.
    torch.cuda.memory._record_memory_history()

    vae = AutoencoderKLWan.from_pretrained(
        model_id, subfolder="vae", torch_dtype=dtype
    ).to(device)
    if compile:
        vae.decoder = torch.compile(vae.decoder)

    shape = (1, 16, 13, 120, 120)

    # Warm-up decode (this is where compilation happens when compile=True).
    with torch.no_grad():
        latents = torch.randn(shape, device=device, dtype=dtype)
        video = vae.decode(latents, return_dict=False)[0]
    torch.cuda.empty_cache()

    # Repeated decodes after warm-up.
    with torch.no_grad():
        for _ in range(3):
            latents = torch.randn(shape, device=device, dtype=dtype)
            video = vae.decode(latents, return_dict=False)[0]

    # Writes True-compile.pickle or False-compile.pickle.
    torch.cuda.memory._dump_snapshot(f"{compile}-compile.pickle")


if __name__ == '__main__':
    compile_wan_vae(sys.argv[1] == 'compile')
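
To compare the two runs as numbers rather than only through the snapshot images, the peak allocator statistics can be read around a single decode call. The following is a minimal sketch, not part of the original report: `to_mib` and `peak_memory_mib` are hypothetical helper names, and it assumes a CUDA device is available when `peak_memory_mib` is called.

```python
def to_mib(num_bytes):
    """Convert a byte count to MiB."""
    return num_bytes / (1024 ** 2)


def peak_memory_mib(fn, *args, **kwargs):
    """Run fn once and return (result, peak allocated MiB, peak reserved MiB).

    Hypothetical helper for illustration. torch is imported lazily so the
    module can be loaded on a machine without PyTorch installed.
    """
    import torch

    torch.cuda.synchronize()
    # Reset the peak counters so earlier work (model loading, warm-up) is excluded.
    torch.cuda.reset_peak_memory_stats()
    result = fn(*args, **kwargs)
    torch.cuda.synchronize()
    allocated = to_mib(torch.cuda.max_memory_allocated())
    reserved = to_mib(torch.cuda.max_memory_reserved())
    return result, allocated, reserved


# Possible usage inside compile_wan_vae, after the warm-up decode:
#
#     with torch.no_grad():
#         _, allocated, reserved = peak_memory_mib(
#             vae.decode, latents, return_dict=False
#         )
#     print(f"compile={compile}: peak allocated {allocated:.0f} MiB, "
#           f"peak reserved {reserved:.0f} MiB")
```

Reporting both values helps separate memory held by live tensors (allocated) from memory held by the caching allocator (reserved), which may differ between the compiled and non-compiled runs.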

Logs

System Info

  • 🤗 Diffusers version: 0.34.0
  • Platform: Linux-5.10.134-16.1.3.vip.an8.x86_64-x86_64-with-glibc2.39
  • Running on Google Colab?: No
  • Python version: 3.12.3
  • PyTorch version (GPU?): 2.7.1+cu126 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.34.2
  • Transformers version: 4.54.0
  • Accelerate version: 1.9.0
  • PEFT version: 0.16.0
  • Bitsandbytes version: not installed
  • Safetensors version: 0.5.3
  • xFormers version: not installed
  • Accelerator: NVIDIA L20, 46068 MiB
    NVIDIA L20, 46068 MiB
    NVIDIA L20, 46068 MiB
    NVIDIA L20, 46068 MiB
    NVIDIA L20, 46068 MiB
    NVIDIA L20, 46068 MiB
    NVIDIA L20, 46068 MiB
    NVIDIA L20, 46068 MiB
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response
