Has anyone had any success running WAN2.2 TI2V 5B on a GPU with 16 GB VRAM or less at resolutions lower than 832x480? #868
I've been struggling to get this model to produce usable results while staying within the 16 GB limit of my RX 7600 XT (ROCm 6.4.3, Linux). The VAE step is the problem: at 832x480, a 2-second video needs a 20895.80 MB VAE compute buffer. With --vae-on-cpu it works, but a 2-second clip then takes 27 min 15 s, which isn't practical. sd.cpp appears to ignore --vae-tiling with WAN models, so that's no help either.
WAN2.1 has the same VAE buffer problem, but I can work around it by lowering the resolution to 416x240. At that resolution it's fully capable, though much slower than 2.2 (an 8-second clip takes ~23 minutes), which is why I'd like to get 2.2 working.
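Halving both dimensions leaves only a quarter of the pixels per frame, which helps explain why the half-size run fits where 832x480 does not. A quick Python check of the pixel-count ratios (this compares pixel counts only; it is not a model of how sd.cpp sizes its VAE buffer, which also depends on frame count and implementation details):

```python
# Pixel count of each resolution relative to 832x480.
# Plain arithmetic only; it does not estimate the VAE compute buffer itself.
BASE_W, BASE_H = 832, 480

def area_ratio(w: int, h: int) -> float:
    """Fraction of the 832x480 pixel count contained in a w x h frame."""
    return (w * h) / (BASE_W * BASE_H)

for w, h, label in [(832, 480, "full size"), (624, 360, "3/4 size"), (416, 240, "half size")]:
    print(f"{w}x{h} ({label}): {area_ratio(w, h):.4f}")
# 832x480 (full size): 1.0000
# 624x360 (3/4 size): 0.5625
# 416x240 (half size): 0.2500
```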
Here's the problem: at the same half resolution (416x240), 2.2 produces garbage while 2.1's output is clear. I can do IMG2VID with an -i reference image at the same resolution, but the video is still a garbled mess, even though the subject of the reference image is somewhat recognizable. I've tried every combination of options I can think of, from varying --steps and --cfg values, to --flow-shift settings, to combinations of --sampling-method and --scheduler, to --diffusion-conv-direct and/or --vae-conv-direct, all to no avail. WAN2.2 simply doesn't "like" this resolution.
If I raise the resolution to 3/4 size (624x360) with just a text prompt, I get valid video output, but the frame is "cropped", as if WAN were still generating at 832x480 and cutting off the pixels that don't fit, which makes it useless. If I try IMG2VID at this resolution, I get a crash from a failed assertion regardless of the -i image size:
```
[INFO ] stable-diffusion.cpp:2618 - IMG2VID
[INFO ] ggml_extend.hpp:1648 - wan_vae offload params (1344.24 MB, 196 tensors) to runtime backend (ROCm0), taking 0.41s
/home/b/Reckless/AlernativeBuilds/2025-09-24/stable-diffusion.cpp/ggml/src/ggml.c:3436: GGML_ASSERT(ggml_nelements(a) == ne0*ne1*ne2*ne3) failed
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x683376) [0x56382a921376]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x6837b3) [0x56382a9217b3]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x683950) [0x56382a921950]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x68a2e7) [0x56382a9282e7]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x13d81c) [0x56382a3db81c]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x13e628) [0x56382a3dc628]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x13f5f1) [0x56382a3dd5f1]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x13fc85) [0x56382a3ddc85]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x1016a8) [0x56382a39f6a8]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x101ddd) [0x56382a39fddd]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x106926) [0x56382a3a4926]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0xe650f) [0x56382a38450f]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x46248) [0x56382a2e4248]
/usr/lib/libc.so.6(+0x27675) [0x7f3cec827675]
/usr/lib/libc.so.6(__libc_start_main+0x89) [0x7f3cec827729]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x4abc5) [0x56382a2e8bc5]
zsh: IOT instruction (core dumped) /home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd -M vid_gen
```
This is especially weird because that crash doesn't happen at 416x240.
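The assertion that fails compares a tensor's element count with the product of four target dimensions, which is the check ggml_reshape_4d performs before reshaping. One possible explanation, unconfirmed here, is a rounding mismatch: WAN2.2's VAE compresses each spatial dimension by 16, and 360 is not a multiple of 16 while 240 is. The Python sketch below only illustrates that idea; the 16x factor is applied naively, the 3-channel single frame is a placeholder, and the function names are hypothetical rather than anything from sd.cpp.

```python
# Hypothetical illustration: if a 16x spatial compression floors each dimension,
# the latent no longer describes the exact pixel grid, so a tensor sized for the
# original frame and one sized from the latent disagree in element count.
# ASSUMPTIONS: 16x spatial factor; 3 channels and a single frame as placeholders.
SPATIAL_FACTOR = 16

def round_trip(px: int) -> int:
    """Pixels recovered after dividing down to latent size and scaling back up."""
    return (px // SPATIAL_FACTOR) * SPATIAL_FACTOR

def shapes_agree(w: int, h: int, channels: int = 3) -> bool:
    """Compare element counts, the same kind of test ggml's reshape assertion makes."""
    original = w * h * channels
    reconstructed = round_trip(w) * round_trip(h) * channels
    return original == reconstructed

for w, h in [(416, 240), (624, 360)]:
    print(f"{w}x{h} -> {round_trip(w)}x{round_trip(h)}, agree={shapes_agree(w, h)}")
# 416x240 -> 416x240, agree=True
# 624x360 -> 624x352, agree=False
```

Here 416x240 survives the round trip unchanged, while 624x360 comes back as 624x352, so tensors derived from the two sizes could not be reshaped into each other.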
So, is anyone else seeing these problems at lower resolutions or on lower-VRAM cards, or does anyone know a way to get the VAE compute buffer below 16 GB?
For additional reference, I'm running the Wan2.2-TI2V-5B-Q8_0.gguf model, and have tried both the fp16 and the Q8_0 versions of the umt5_xxl text encoder.
Reply:
So... after a ridiculously long series of tests and web searches, I managed to figure out a few things:
- Using --diffusion-fa with ROCm is absolutely necessary to get viable, non-scrambled output. --clip-on-cpu is needed to avoid black output images, but I've been using that with Chroma as a kind of default, so I forgot to check it earlier.
- WAN2.2 really only "likes" 16:9 images, especially those with both dimensions evenly divisible by 8. Why 832x480, an odd 26:15 ratio, is listed as a recommended resolution in various online docs is beyond me. Perhaps it was mistaken for a "480p" resolution? What most folks call 480p goes back to the old 4:3 640x480 or the oddball 3:2 DVD resolution of 720x480. The closest 16:9 resolution would be 832x468, although 468 isn't divisible by 8. The nearest clean values above and below are 896x504 and 768x432.
- 640x360 is the largest 16:9 resolution at which the VAE can still run in VRAM. It also appears to solve the cropping problem I noted above, but that's very prompt dependent: I have to specify things like "wide shot, centered on subject". For this resolution the sd.cpp log reports "wan_vae compute buffer size: 12235.26 MB(VRAM)", but amdgpu_top shows well over 14,000 MiB in use while generating a 24 FPS, 121-frame, 5-second video.
- 512x288 is the largest 16:9 resolution at which I can use IMG2VID. Above that I still get the crash described above.
- I have not tested IMG2VID at any resolution higher than 640x360. (There's really no point, since my primary goal is to see what I can run with the VAE in VRAM.)
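The aspect-ratio arithmetic in these notes is easy to check mechanically. The Python sketch below reduces 832x480 to lowest terms and lists the exact 16:9 sizes up to 896 pixels wide where both sides are multiples of 8. It also flags sizes where both sides are multiples of 16; that flag is a hypothesis of mine, not a rule sd.cpp documents, and the helper names are made up for illustration.

```python
from math import gcd

def aspect(w: int, h: int) -> str:
    """Reduce w:h to lowest terms, e.g. 832x480 -> '26:15'."""
    g = gcd(w, h)
    return f"{w // g}:{h // g}"

def sixteen_nine(max_width: int, multiple: int = 8):
    """Yield exact 16:9 sizes up to max_width with both sides divisible by `multiple`."""
    # Stepping by 16 guarantees h = 9 * w / 16 is a whole number.
    for w in range(16, max_width + 1, 16):
        h = w * 9 // 16
        if w % multiple == 0 and h % multiple == 0:
            yield w, h

print(aspect(832, 480))  # 26:15
print(832 * 9 / 16)      # 468.0, and 468 is not a multiple of 8
for w, h in sixteen_nine(896):
    # HYPOTHESIS: also flag sizes where both sides are multiples of 16.
    note = " (both sides also divisible by 16)" if w % 16 == 0 and h % 16 == 0 else ""
    print(f"{w}x{h}{note}")
```

Running it lists 768x432 and 896x504 as the nearest clean sizes around 832 wide, and 512x288 as the largest flagged size at or below 640x360, which happens to match the IMG2VID limit reported above. That match comes from only a handful of data points, so treat it as a lead to test rather than a confirmed cause.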
Does anyone else have any info/insights they can share for additional reference?