Qwen image edit doesn't always work on AMD HIP/ROCm #133

wasd-tech started this conversation in AMD HIP/ROCm Support

Hi! I'm a huge fan of this project; it's helped me with video generation on my RX 9070 XT. Recently I created a workflow for Qwen image editing and wanted to use DistorchLoader to load the GGUF. The problem is that when it comes to processing the prompt, it loads the CLIP but also allocates memory for the model, because Power LoRA takes both the model and the CLIP as input. This behaviour completely breaks the workflow: instead of using the memory it already allocated, it tries to reload the model, and at the KSampler it just hangs indefinitely.
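
As a side note for anyone trying to reproduce this: a few lines of PyTorch are enough to snapshot the VRAM state at each stage of the workflow. This is a minimal sketch, not part of the original report; on ROCm builds the AMD GPU is addressed through the torch.cuda namespace.

import torch

def vram_snapshot(tag, device=0):
    # Print PyTorch's own allocation counters plus what the driver reports as free.
    free, total = torch.cuda.mem_get_info(device)
    print(f"[{tag}] allocated={torch.cuda.memory_allocated(device) / 2**30:.2f} GiB, "
          f"reserved={torch.cuda.memory_reserved(device) / 2**30:.2f} GiB, "
          f"free={free / 2**30:.2f} / {total / 2**30:.2f} GiB")

# Example: call before/after the CLIP load and again right before the KSampler runs.
vram_snapshot("after CLIP load")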

Replies: 9 comments

I recently reverted a few changes that were causing some memory issues.

Can you please try again? If you get the same results, please post a workflow and I will attempt to replicate your issue.

Cheers!

My workflow:
Test.json

Settings:
pytorch version: 2.9.0.dev20250908+rocm6.4
AMD arch: gfx1201
ROCm version: (6, 4)
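
(These values can be gathered with a short PyTorch check; a minimal sketch, assuming a ROCm build where the HIP version is exposed via torch.version.hip:)

import torch

print("pytorch version:", torch.__version__)
print("HIP/ROCm version:", torch.version.hip)  # None on CUDA-only builds
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # gcnArchName (e.g. gfx1201) is only present on ROCm builds; fall back to the device name.
    print("AMD arch:", getattr(props, "gcnArchName", props.name))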

The memory situation when it doesn't work: [screenshot]

The memory situation when it works: [screenshot]

I tried the update, but unfortunately it didn't change anything. I can tell that the GPU actually crashes.

I think this is relevant, from the Power LoRA:

[MultiGPU Initialization] current_device set to: cuda:0
[MultiGPU_DisTorch2] Successfully patched ModelPatcher.partially_load
gguf qtypes: F32 (1087), BF16 (6), Q5_K (28), Q4_K (580), Q6_K (232)
model weight dtype torch.bfloat16, manual cast: None
model_type FLUX
[MultiGPU_DisTorch2] Settings changed for model 9613af3a. Previous settings hash: None, New settings hash: 0c1d17df681ee861c5929ebf28e73a7361eec3e5dd27031928de093b6cc63b23. Forcing reload.
gguf qtypes: F32 (1087), BF16 (6), Q5_K (28), Q4_K (580), Q6_K (232)
model weight dtype torch.bfloat16, manual cast: None
model_type FLUX
[MultiGPU_DisTorch2] Full allocation string: #cuda:0;15.0;cpu

And later, at the KSampler:

[MultiGPU_DisTorch2] Using static allocation for model 9613af3a
[MultiGPU_DisTorch2] Compute Device: cuda:0
[MultiGPU_DisTorch2] Expert String Examples:
 Direct(byte) Mode - cuda:0,500mb;cuda:1,3.0g;cpu,5gb* -> '*' cpu = over/underflow device, put 0.50gb on cuda0, 3.00gb on cuda1, and 5.00gb (or the rest) on cpu
 Ratio(%) Mode - cuda:0,8%;cuda:1,8%;cpu,4% -> 8:8:4 ratio, put 40% on cuda0, 40% on cuda1, and 20% on cpu
===============================================
 DisTorch2 Model Virtual VRAM Analysis
===============================================
Object   Role    Original(GB)   Total(GB)   Virt(GB)
-----------------------------------------------
cuda:0   recip   15.92GB        30.92GB     +15.00GB
cpu      donor   30.95GB        15.95GB     -15.00GB
-----------------------------------------------
model    model   12.17GB        0.00GB      -15.00GB
[MultiGPU_DisTorch2] Final Allocation String: cuda:0,0.0000;cpu,0.4847
==================================================
 DisTorch2 Model Device Allocations
==================================================
Device   VRAM GB   Dev %   Model GB   Dist %
--------------------------------------------------
cuda:0   15.92     0.0%    0.00       0.0%
cpu      30.95     48.5%   15.00      100.0%
--------------------------------------------------
 DisTorch2 Model Layer Distribution
--------------------------------------------------
Layer Type   Layers   Memory (MB)   % Total
--------------------------------------------------
Linear       846      12568.20      100.0%
RMSNorm      241      0.07          0.0%
LayerNorm    241      0.00          0.0%
--------------------------------------------------
 DisTorch2 Model Final Device/Layer Assignments
--------------------------------------------------
Device            Layers   Memory (MB)   % Total
--------------------------------------------------
cuda:0 (<0.01%)   484      0.83          0.0%
cpu               844      12567.43      100.0%
--------------------------------------------------
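
A side note on the "Expert String Examples" in the log above: the allocation string is a semicolon-separated list of device,amount pairs, where the amount is either an absolute size (500mb, 3.0g) or a ratio percentage, and a trailing '*' marks the over/underflow device. Below is a rough, purely illustrative parser for that format, not the project's actual implementation.

import re

_UNIT_TO_GB = {"mb": 1 / 1000, "m": 1 / 1000, "gb": 1.0, "g": 1.0}  # decimal, matching the log's 500mb -> 0.50gb

def parse_expert_string(spec):
    # Returns ({device: (mode, value)}, overflow_device) for strings like
    # "cuda:0,500mb;cuda:1,3.0g;cpu,5gb*" or "cuda:0,8%;cuda:1,8%;cpu,4%".
    allocations, overflow_device = {}, None
    for entry in spec.split(";"):
        device, amount = entry.split(",")
        if amount.endswith("*"):              # '*' = over/underflow device
            overflow_device, amount = device, amount[:-1]
        if amount.endswith("%"):              # Ratio mode: relative shares
            allocations[device] = ("percent", float(amount[:-1]))
        else:                                 # Direct mode: absolute size, normalised to GB
            value, unit = re.fullmatch(r"([\d.]+)(mb|gb|m|g)", amount.lower()).groups()
            allocations[device] = ("gb", float(value) * _UNIT_TO_GB[unit])
    return allocations, overflow_device

print(parse_expert_string("cuda:0,500mb;cuda:1,3.0g;cpu,5gb*"))
# ({'cuda:0': ('gb', 0.5), 'cuda:1': ('gb', 3.0), 'cpu': ('gb', 5.0)}, 'cpu')
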
Hey, @wasd-tech - I am @italbar over on Discord if you are on that platform; interactions there will likely be more timely.

Still trying to get your workflow going. The CLIP .gguf I am using is giving the workflow fits. I'll post once I can get a standard generation to run.

Of course I can join a Discord; just leave the link.

"The CLIP .gguf I am using is giving the workflow fits"

You need another file to use qwen2.5vl with the GGUF format: https://huggingface.co/QuantStack/Qwen-Image-Edit-GGUF

I'm afraid this is a problem exclusively related to AMD. If you come to the same conclusion, do not hesitate to ask me to run tests or anything else.

Hey @wasd-tech, I know you have seen some variable results on your end. I am happy to keep this issue open if there is something we can work on that looks like a MultiGPU-specific issue. 😸

@pollockjj Thanks a lot for the help 👍. After some testing it seems that the problem is Qwen image edit and not the Power LoRA, so I changed the title of the issue.

Since this seems to be a problem specific to my AMD GPU (ROCm fails to allocate and use that much memory; it only works with a virtual VRAM of about 10 GB), I suggest leaving this open for future reference so people will not open multiple issues about the same thing. I will keep testing and see if anything changes, maybe with the major release of ROCm 7.0 or different versions of PyTorch.

@pollockjj Are you okay with this?
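
A hypothetical way to narrow this down outside of ComfyUI is to step up single allocations on the ROCm device and see at what size HIP starts failing; a minimal sketch, with sizes chosen to bracket the roughly 10-15 GB range discussed in this thread:

import torch

device = torch.device("cuda:0")  # ROCm GPUs are addressed as cuda devices in PyTorch
for gib in (4, 8, 10, 12, 14, 15):
    try:
        # gib GiB of float32 (2**28 elements per GiB at 4 bytes each)
        buf = torch.empty(gib * 2**28, dtype=torch.float32, device=device)
        torch.cuda.synchronize(device)
        print(f"{gib} GiB allocation OK")
        del buf
        torch.cuda.empty_cache()
    except RuntimeError as err:
        print(f"{gib} GiB allocation failed: {err}")
        break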

Migrating this to a discussion - that way it stays an open, ongoing resource and can remain open indefinitely.

Hope that works @wasd-tech.

Labels: documentation (Improvements or additions to documentation); expected/normal behavior (This issue describes normal operation and is not currently targeted for change)
Converted from issue

This discussion was converted from issue #106 on October 16, 2025 07:41.
