
fix crash if tiling mode is enabled #12521


Open
sywangyi wants to merge 2 commits into huggingface:main from sywangyi:wan2.2_fix_tile


sywangyi (Contributor) commented Oct 21, 2025 (edited by sayakpaul):

@sayakpaul @dg845 please help review. Test script:

import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

dtype = torch.bfloat16
device = "xpu"  # Intel GPU; on NVIDIA use "cuda" and torch.cuda.max_memory_allocated()
access_token = "hf_xxxxxxxxxxxxxxxxxxxxxx"  # placeholder Hugging Face token
model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"

# Keep the VAE in float32; the rest of the pipeline runs in bfloat16.
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32, token=access_token)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=dtype, token=access_token)
pipe.enable_model_cpu_offload(device=device)
print(torch.xpu.max_memory_allocated())

# Enable tiled VAE decoding; tile sizes and strides are given in sample (pixel) space.
pipe.vae.enable_tiling(
    tile_sample_min_height=480,
    tile_sample_min_width=960,
    tile_sample_stride_height=352,
    tile_sample_stride_width=640,
)

height = 704
width = 1280
num_frames = 20
num_inference_steps = 50
guidance_scale = 5.0
prompt = "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
# Chinese negative prompt. English gloss: garish colors, overexposed, static, blurry details,
# subtitles, style, artwork, painting, frame, still, overall grayish, worst quality, low quality,
# JPEG compression artifacts, ugly, mutilated, extra fingers, poorly drawn hands, poorly drawn faces,
# deformed, disfigured, malformed limbs, fused fingers, motionless scene, cluttered background,
# three legs, many people in the background, walking backwards.
negative_prompt = (
    "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,"
    "JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,"
    "形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"
)
output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=height,
    width=width,
    num_frames=num_frames,
    guidance_scale=guidance_scale,
    num_inference_steps=num_inference_steps,
).frames[0]
export_to_video(output, "5bit2v_output.mp4", fps=24)
print(torch.xpu.max_memory_allocated())
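To see what those enable_tiling arguments imply for a 704×1280 frame, here is a minimal pure-Python sketch. It assumes tiles start every stride pixels along each axis (a simple range-based loop) and are clipped at the frame edge. tile_spans is a hypothetical helper, not a diffusers function; the actual tiled_decode works on latent-space tiles (the traceback below shows it passing a latent tile to self.decoder) and blends the overlaps.

```python
def tile_spans(length: int, tile: int, stride: int) -> list[tuple[int, int]]:
    """Return (start, end) pixel spans of overlapping tiles along one axis.

    Hypothetical helper: tiles start every `stride` pixels and are clipped
    to `length`, so neighbouring tiles overlap by `tile - stride` pixels.
    """
    return [(start, min(start + tile, length)) for start in range(0, length, stride)]


# Values from the test script: height=704, width=1280,
# tile_sample_min_height=480, tile_sample_stride_height=352,
# tile_sample_min_width=960, tile_sample_stride_width=640.
rows = tile_spans(704, 480, 352)
cols = tile_spans(1280, 960, 640)
print(rows)  # [(0, 480), (352, 704)]
print(cols)  # [(0, 960), (640, 1280)]
print(f"{len(rows) * len(cols)} tiles, overlap {480 - 352}px vertically, {960 - 640}px horizontally")
```

Under this assumption the frame is split into a 2×2 grid of overlapping tiles.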

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
sywangyi (Contributor Author):

CUDA should have a similar issue.

sywangyi changed the title from "fix crash in tiling mode is enabled" to "fix crash if tiling mode is enabled" on Oct 21, 2025.
sayakpaul (Member) left a comment:

Thanks for your PR.

But before we go on reviewing it, could you please:

  • Include an error trace that you get without the changes from this PR?
  • Include an output with the changes from this PR?
  • Additionally, the changes introduced in this PR seem non-intrusive to me. So, if you add comments to explain those changes, that'd be super nice.



sywangyi (Contributor Author):

Without the change, it crashes like this:
Traceback (most recent call last):
  File "/workspace/test.py", line 27, in <module>
    output = pipe(
             ^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/diffusers/src/diffusers/pipelines/wan/pipeline_wan.py", line 645, in __call__
    video = self.vae.decode(latents, return_dict=False)[0]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/diffusers/src/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 1248, in decode
    decoded = self._decode(z).sample
              ^^^^^^^^^^^^^^^
  File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 1204, in _decode
    return self.tiled_decode(z, return_dict=return_dict)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 1374, in tiled_decode
    decoded = self.decoder(tile, feat_cache=self._feat_map, feat_idx=self._conv_idx)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 892, in forward
    x = up_block(x, feat_cache, feat_idx, first_chunk=first_chunk)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 708, in forward
    x = x + self.avg_shortcut(x_copy, first_chunk=first_chunk)
        ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 2
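This RuntimeError is PyTorch's standard broadcasting failure. Assuming the VAE's 5D video tensors use the usual (batch, channels, frames, height, width) layout, dimension 2 is the frame axis, so the main path x and the avg_shortcut output disagree on the number of frames (2 vs 4). The sketch below re-implements the broadcasting rule in plain Python to show why only that axis triggers the error; broadcast_shape is a hypothetical helper, and every size in the example except the 2 and 4 on axis 2 is made up.

```python
def broadcast_shape(a: tuple[int, ...], b: tuple[int, ...]) -> tuple[int, ...]:
    """Broadcast two shapes using NumPy/PyTorch rules (hypothetical helper).

    Shapes are right-aligned; at each axis the sizes must be equal, or one
    of them must be 1 (a "singleton" that gets stretched).
    """
    ndim = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so both have the same rank.
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)
    out = []
    for dim, (x, y) in enumerate(zip(a, b)):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise RuntimeError(
                f"The size of tensor a ({x}) must match the size of tensor b ({y}) "
                f"at non-singleton dimension {dim}"
            )
    return tuple(out)


# Hypothetical (batch, channels, frames, height, width) shapes; only the
# frame counts 2 and 4 come from the traceback.
main_path = (1, 8, 2, 30, 60)
shortcut = (1, 8, 4, 30, 60)
try:
    broadcast_shape(main_path, shortcut)
except RuntimeError as err:
    print(err)  # ... tensor a (2) must match ... tensor b (4) at non-singleton dimension 2
```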

sywangyi (Contributor Author):

[screenshot of the changed lines] These lines aim to fix the crash. However, there's another crash after this one is fixed.

Member:

Thanks! What about the outputs? Cc: @asomoza if you wanna help test it out a bit?


Member:

> however, there's another crash after this crash is fixed.

So, it doesn't work yet?

sywangyi (Contributor Author):

> however, there's another crash after this crash is fixed.

> So, it doesn't work yet?

It works. The other crash happens because patch_size is not taken into account in tiling mode; for this model it is 2. This PR fixes that too.
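Why patch_size matters for tiling: the tile sizes passed to enable_tiling are in sample (pixel) space and have to be divided by the VAE's full spatial downsampling factor to get latent tile sizes. A minimal sketch, assuming the effective factor is the convolutional downsampling (taken here as 8, an assumption not read from the model config) multiplied by patch_size (2 for this model, per the comment above); sample_to_latent is a hypothetical helper, not the diffusers code.

```python
def sample_to_latent(pixels: int, conv_ratio: int, patch_size: int = 1) -> int:
    """Convert a sample-space size to a latent-space size (hypothetical helper).

    Assumes the effective spatial factor is conv_ratio * patch_size and that
    the size divides evenly.
    """
    factor = conv_ratio * patch_size
    if pixels % factor:
        raise ValueError(f"{pixels} is not divisible by {factor}")
    return pixels // factor


CONV_RATIO = 8  # assumption: spatial downsampling done by the conv layers alone
for name, px in [("tile height", 480), ("tile width", 960),
                 ("stride height", 352), ("stride width", 640)]:
    with_patch = sample_to_latent(px, CONV_RATIO, patch_size=2)
    without_patch = sample_to_latent(px, CONV_RATIO)
    print(f"{name}: {px}px -> {with_patch} latents (patch_size ignored: {without_patch})")
```

Under these assumptions, leaving out the patch_size factor makes every latent tile size and stride twice as large as intended (for example 60 instead of 30 latents for the 480px tile height), so the tiles no longer match the requested sample-space sizes.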

sywangyi (Contributor Author):

The second crash looks like this:
Traceback (most recent call last):
  File "/workspace/test.py", line 36, in <module>
    export_to_video(output, "5bit2v_output.mp4", fps=24)
  File "/workspace/diffusers/src/diffusers/utils/export_utils.py", line 177, in export_to_video
    return _legacy_export_to_video(video_frames, output_video_path, fps)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/diffusers/src/diffusers/utils/export_utils.py", line 135, in _legacy_export_to_video
    img = cv2.cvtColor(video_frames[i], cv2.COLOR_RGB2BGR)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
cv2.error: OpenCV(4.11.0) /io/opencv/modules/imgproc/src/color.simd_helpers.hpp:92: error: (-15:Bad number of channels) in function 'cv::impl::{anonymous}::CvtHelper<VScn, VDcn, VDepth, sizePolicy>::CvtHelper(cv::InputArray, cv::OutputArray, int) [with VScn = cv::impl::{anonymous}::Set<3, 4>; VDcn = cv::impl::{anonymous}::Set<3, 4>; VDepth = cv::impl::{anonymous}::Set<0, 2, 5>; cv::impl::{anonymous}::SizePolicy sizePolicy = cv::impl::<unnamed>::NONE; cv::InputArray = const cv::_InputArray&; cv::OutputArray = const cv::_OutputArray&]'

Invalid number of channels in input image:
    'VScn::contains(scn)'
where
    'scn' is 12
This PR also fixes it.
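The "'scn' is 12" in the OpenCV error fits the patch_size explanation: 12 = 3 × 2 × 2, plausibly three RGB channels still packed with 2×2 spatial patches, which cv2.cvtColor cannot convert. Unpacking (unpatchifying) such data is a pixel-shuffle-style rearrangement. The sketch below shows it on nested lists for a single frame. It assumes PyTorch PixelShuffle channel ordering (input channel c*p*p + i*p + j goes to output pixel offset (i, j)), which may differ from the ordering Wan's VAE actually uses; unpatchify here is a hypothetical helper.

```python
def unpatchify(x: list[list[list[int]]], patch_size: int) -> list[list[list[int]]]:
    """Rearrange (C*p*p, H, W) planes into (C, H*p, W*p), PixelShuffle-style.

    Hypothetical helper: `x` is a list of channel planes, each a list of rows.
    """
    p = patch_size
    if len(x) % (p * p):
        raise ValueError(f"{len(x)} channels is not a multiple of {p * p}")
    channels = len(x) // (p * p)
    h, w = len(x[0]), len(x[0][0])
    out = [[[0] * (w * p) for _ in range(h * p)] for _ in range(channels)]
    for c in range(channels):
        for i in range(p):          # row offset inside each p x p patch
            for j in range(p):      # column offset inside each p x p patch
                plane = x[c * p * p + i * p + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * p + i][col * p + j] = plane[row][col]
    return out


frame = [[[v]] for v in range(12)]  # 12 planes of 1x1 pixels, like "'scn' is 12"
rgb = unpatchify(frame, patch_size=2)
print(len(rgb), len(rgb[0]), len(rgb[0][0]))  # 3 2 2
print(rgb[0])  # [[0, 1], [2, 3]]
```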

Member:

@sywangyi would you be able to post some outputs after applying the fix?

asomoza (Member) commented Oct 22, 2025 (edited):

Tested it with a simple pipe.vae.enable_tiling() on top of the example code:

(struck through) In fact, it doesn't work with main, but this PR also doesn't fix it; I still got: (end of strikethrough)

RuntimeError: The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 2

Edit: I stand corrected. I made a silly mistake; this PR does fix the issue for the 5B model. I'll do a comparison with main.

asomoza (Member) commented Oct 23, 2025:

Here they are:

main (without tiling)

5bit2v__main_output.mp4

PR with pipe.vae.enable_tiling()

5bit2v__pr_output.mp4

Contributor:

@sayakpaul, it seems the PR works. Could you please take another look? Thanks very much!


Reviewers: @sayakpaul left review comments

At least 1 approving review is required to merge this pull request.

Assignees: No one assigned
Labels: None yet
Projects: None yet
Milestone: No milestone
