fix crash if tiling mode is enabled #12521

Open

sywangyi wants to merge 2 commits into huggingface:main

from sywangyi:wan2.2_fix_tile

@sywangyi

Contributor

@sywangyi sywangyi commented Oct 21, 2025 (edited by sayakpaul)

@sayakpaul @dg845 please help review. Test script:

import torch
import numpy as np
from diffusers import WanPipeline, AutoencoderKLWan, WanTransformer3DModel, UniPCMultistepScheduler
from diffusers.utils import export_to_video, load_image
dtype = torch.bfloat16
device = "xpu"
access_token = "hf_xxxxxxxxxxxxxxxxxxxxxx"
model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32, token=access_token)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=dtype, token=access_token)
pipe.enable_model_cpu_offload()
print(torch.xpu.max_memory_allocated())
pipe.vae.enable_tiling(tile_sample_min_height=480,tile_sample_min_width=960,tile_sample_stride_height=352,tile_sample_stride_width=640)
height = 704
width = 1280
num_frames = 20
num_inference_steps = 50
guidance_scale = 5.0
prompt = "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
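# Negative prompt kept in Chinese because it is model input (roughly: garish colors, overexposed, static,
# blurry details, subtitles, worst quality, low quality, JPEG artifacts, deformed limbs, fused fingers, walking backwards).
negative_prompt = "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"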
output = pipe(
 prompt=prompt,
 negative_prompt=negative_prompt,
 height=height,
 width=width,
 num_frames=num_frames,
 guidance_scale=guidance_scale,
 num_inference_steps=num_inference_steps,
).frames[0]
export_to_video(output, "5bit2v_output.mp4", fps=24)
print(torch.xpu.max_memory_allocated())
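
To make the `enable_tiling` arguments in the script above easier to read, here is a minimal pure-Python sketch of how a tile size and stride could cover a 704x1280 frame. It is an illustration in pixel space only: `tile_starts` and `tile_grid` are hypothetical helpers, not diffusers functions, and the library itself tiles in latent space and blends the overlapping regions.

```python
def tile_starts(size, stride):
    """Start offsets of tiles along one axis, stepping by `stride`."""
    return list(range(0, size, stride))


def tile_grid(height, width, tile_h, tile_w, stride_h, stride_w):
    """Return (top, left, bottom, right) boxes covering a height x width frame.

    Tiles are clipped at the frame border. Hypothetical helper for illustration.
    """
    boxes = []
    for top in tile_starts(height, stride_h):
        for left in tile_starts(width, stride_w):
            bottom = min(top + tile_h, height)
            right = min(left + tile_w, width)
            boxes.append((top, left, bottom, right))
    return boxes


# Values taken from the test script: 704x1280 output, 480x960 tiles, 352x640 stride.
tiles = tile_grid(704, 1280, 480, 960, 352, 640)
for box in tiles:
    print(box)

# Neighbouring tiles overlap by (tile size - stride) pixels along each axis.
overlap_h = 480 - 352  # 128 rows
overlap_w = 960 - 640  # 320 columns
```

With these settings the sketch yields a 2x2 grid of four tiles, so each decoder call sees a much smaller input than the full frame.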

@sywangyi


 fix crash in tiling mode is enabled

0cc20ee

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

@sywangyi

Contributor Author

sywangyi commented Oct 21, 2025

CUDA should have a similar issue.

@sywangyi sywangyi changed the title from "fix crash in tiling mode is enabled" to "fix crash if tiling mode is enabled"

Oct 21, 2025

sayakpaul

sayakpaul reviewed

Oct 21, 2025

Member

@sayakpaul sayakpaul left a comment

Thanks for your PR.

But before we go on reviewing it, could you please:

- Include an error trace that you get without the changes from this PR?
- Include an output with the changes from this PR?

Additionally, the changes introduced in this PR seem non-intrusive to me. So, if you add comments to explain those changes, that'd be super nice.

@sywangyi

fmt

d777895

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Oct 21, 2025

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sywangyi

Contributor Author

sywangyi commented Oct 21, 2025

Without the change, it crashes like this:
Traceback (most recent call last):
File "/workspace/test.py", line 27, in <module>
output = pipe(
^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/diffusers/src/diffusers/pipelines/wan/pipeline_wan.py", line 645, in __call__
video = self.vae.decode(latents, return_dict=False)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/diffusers/src/diffusers/utils/accelerate_utils.py", line 46, in wrapper
return method(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 1248, in decode
decoded = self._decode(z).sample
^^^^^^^^^^^^^^^
File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 1204, in _decode
return self.tiled_decode(z, return_dict=return_dict)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 1374, in tiled_decode
decoded = self.decoder(tile, feat_cache=self._feat_map, feat_idx=self._conv_idx)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 892, in forward
x = up_block(x, feat_cache, feat_idx, first_chunk=first_chunk)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 708, in forward
x = x + self.avg_shortcut(x_copy, first_chunk=first_chunk)
~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 2
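
For readers decoding the final `RuntimeError`: the failing line adds two tensors whose shapes disagree on dimension 2, which for a (batch, channels, frames, height, width) video tensor is the frame axis. The following is a small pure-Python sketch of the broadcasting check behind that message. The helper name and the example shapes are hypothetical; only the sizes 2 and 4 at dimension 2 come from the trace.

```python
def first_broadcast_mismatch(shape_a, shape_b):
    """Return the first dimension where two equal-rank shapes cannot broadcast.

    Two sizes are compatible when they are equal or when either one is 1.
    Returns None if every dimension is compatible.
    """
    for dim, (a, b) in enumerate(zip(shape_a, shape_b)):
        if a != b and a != 1 and b != 1:
            return dim
    return None


# Hypothetical shapes: only the 2-vs-4 mismatch at dimension 2 is from the traceback.
residual = (1, 1024, 2, 30, 60)
shortcut = (1, 1024, 4, 30, 60)
print(first_broadcast_mismatch(residual, shortcut))  # 2
```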

@sywangyi

Contributor Author

sywangyi commented Oct 21, 2025

[image] These lines aim to fix the crash. However, there is another crash after this one is fixed.

@sayakpaul

Member

sayakpaul commented Oct 21, 2025

Thanks! What about the outputs? Cc: @asomoza if you wanna help test it out a bit?

@sayakpaul

Member

sayakpaul commented Oct 21, 2025

> however, there's another crash after this crash is fixed.

So, it doesn't work yet?

@sywangyi

Contributor Author

sywangyi commented Oct 21, 2025

> > however, there's another crash after this crash is fixed.
>
> So, it doesn't work yet?

It works. The other crash happens because patch_size is not taken into account in tiling mode; in this model it is 2. This PR fixes that as well.

@sywangyi

Contributor Author

sywangyi commented Oct 21, 2025

The second crash looks like this:
Traceback (most recent call last):
File "/workspace/test.py", line 36, in <module>
export_to_video(output, "5bit2v_output.mp4", fps=24)
File "/workspace/diffusers/src/diffusers/utils/export_utils.py", line 177, in export_to_video
return _legacy_export_to_video(video_frames, output_video_path, fps)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/diffusers/src/diffusers/utils/export_utils.py", line 135, in _legacy_export_to_video
img = cv2.cvtColor(video_frames[i], cv2.COLOR_RGB2BGR)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
cv2.error: OpenCV(4.11.0) /io/opencv/modules/imgproc/src/color.simd_helpers.hpp:92: error: (-15:Bad number of channels) in function 'cv::impl::{anonymous}::CvtHelper<VScn, VDcn, VDepth, sizePolicy>::CvtHelper(cv::InputArray, cv::OutputArray, int) [with VScn = cv::impl::{anonymous}::Set<3, 4>; VDcn = cv::impl::{anonymous}::Set<3, 4>; VDepth = cv::impl::{anonymous}::Set<0, 2, 5>; cv::impl::{anonymous}::SizePolicy sizePolicy = cv::impl::<unnamed>::NONE; cv::InputArray = const cv::_InputArray&; cv::OutputArray = const cv::_OutputArray&]'

Invalid number of channels in input image:
'VScn::contains(scn)'
where
'scn' is 12

This PR also fixes it.
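
The channel count of 12 in the OpenCV error is consistent with the patch_size of 2 mentioned earlier: 3 RGB channels times a 2x2 patch gives 12. Below is a minimal, shape-only sketch of that arithmetic, assuming the decoder output packs each 2x2 pixel patch into the channel axis and that it has to be unfolded before export. `unpatchified_shape` is a hypothetical helper for illustration, not the diffusers function, and the frame count is an arbitrary example value.

```python
def unpatchified_shape(shape, patch_size):
    """Shape of a (batch, channels, frames, height, width) tensor after
    moving patch_size x patch_size patches from channels back into space.

    Hypothetical helper: it only computes shapes, it does not move data.
    """
    batch, channels, frames, height, width = shape
    factor = patch_size * patch_size
    if channels % factor != 0:
        raise ValueError(
            f"channels ({channels}) must be divisible by patch_size**2 ({factor})"
        )
    return (batch, channels // factor, frames, height * patch_size, width * patch_size)


# Assumed decoder output for a 704x1280 video: 12 channels at half resolution.
decoder_out = (1, 12, 21, 352, 640)
print(unpatchified_shape(decoder_out, 2))  # (1, 3, 21, 704, 1280)
```

If the tiled path skips this step, frames reach `cv2.cvtColor` with 12 channels instead of 3, which matches the error above.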

@sayakpaul sayakpaul mentioned this pull request

Oct 22, 2025

Wan2.2 TI2V-5B Tiled VAE Tensor size mismatch #12529

Open

@sayakpaul

Member

sayakpaul commented Oct 22, 2025

@sywangyi would you be able to post some outputs after applying the fix?

@asomoza

Member

asomoza commented Oct 22, 2025 (edited)

tested it with a simple pipe.vae.enable_tiling() over the example code:

~~in fact, it doesn't work with main, but this PR also doesn't fix it, still got:~~

RuntimeError: The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 2

Edit: I stand corrected, I made a silly mistake. This PR does fix the issue for the 5B model; I'll do a comparison with main.

@asomoza

Member

asomoza commented Oct 23, 2025

here they are:

main (without tiling)

5bit2v__main_output.mp4

PR with pipe.vae.enable_tiling()

5bit2v__pr_output.mp4

@vladmandic vladmandic mentioned this pull request

Oct 24, 2025

[Issue]: WAN 2.2 5B TI2V Tiled VAE Error vladmandic/sdnext#4289

Open

2 tasks

@yao-matrix

Contributor

yao-matrix commented Oct 27, 2025

@sayakpaul, it seems the PR works. Could you please take a look again? Thanks very much.

Labels

None yet

5 participants

@sywangyi @HuggingFaceDocBuilderDev @sayakpaul @asomoza @yao-matrix
