fix scale_shift_factor being on cpu for wan and ltx #12347

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Merged

DN6 merged 7 commits into huggingface:main from vladmandic:main

Oct 5, 2025

Merged

fix scale_shift_factor being on cpu for wan and ltx #12347

DN6 merged 7 commits into huggingface:main from vladmandic:main

Oct 5, 2025

+7 −5

Conversation

vladmandic

Copy link

Contributor

@vladmandic vladmandic commented Sep 18, 2025 •

edited

Loading

wan transformer block creates scale_shift_table on cpu and then adds it regardless of where temb tensor actually resides
and this causes typical cpu-vs-cuda device mismatch

│ 473 │ │ │ shift_msa, scale_msa, gate_msa, c_shift_msa, c_scale_msa, c_gate_msa = ( │
│❱ 474 │ │ │ │ self.scale_shift_table + temb.float() │
│ 475 │ │ │ ).chunk(6, dim=1) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

cc @sayakpaul @yiyixuxu @a-r-r-o-w @DN6

@vladmandic


 wan fix scale_shift_factor being on cpu

a2e8ff1

@vladmandic vladmandic mentioned this pull request

Sep 18, 2025

[Issue]: Error about tensor expected in same place not on cuda0 and cpu, when using wan 2.2 14b vladmandic/sdnext#4209

Open

2 tasks

@vladmandic


 Merge branch 'main' into main

5bef842

@DN6

Copy link

Collaborator

DN6 commented Sep 24, 2025

@vladmandic Do you have a snippet to reproduce the error?

@vladmandic

Copy link

Contributor Author

vladmandic commented Sep 24, 2025

@vladmandic Do you have a snippet to reproduce the error?

I dont have easy reproduction as it happens relatively randomly when offloading is enabled and on low-memory systems.
but if you look at other pipelines (e.g. sana, pixart, etc.) that implement similar methods, they all guard against this.
and i dont see any risk/side-effects of the device cast.

vladmandic added 2 commits

September 25, 2025 10:41

@vladmandic


 Merge branch 'main' into main

d72b50e

@vladmandic


 apply device cast to ltx transformer

6e08b7b

@vladmandic vladmandic changed the title ~~(削除) wan fix scale_shift_factor being on cpu (削除ここまで)~~ (追記) fix scale_shift_factor being on cpu for wan and ltx (追記ここまで)

Sep 25, 2025

@vladmandic

Copy link

Contributor Author

vladmandic commented Sep 25, 2025

update: same issue occurs for some of my users for ltxvideo so i've extended the pr with the same fix in that pipeline.
see vladmandic/sdnext#4223 for details

@vladmandic vladmandic mentioned this pull request

Sep 25, 2025

[Issue]: LTX fails with tensor device error when model spills into shared GPU memory vladmandic/sdnext#4223

Open

2 tasks

@DN6


 Merge branch 'main' into main

1857c56

@HuggingFaceDocBuilderDev

Copy link

HuggingFaceDocBuilderDev commented Sep 29, 2025

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@vladmandic

Copy link

Contributor Author

vladmandic commented Sep 30, 2025

gentle nudge - any particular reason why this pr is not merged?
i understand lack of clear reproduction, but on the other hand, same guards exist in other pipelines plus its clear that scale_shift_factor tensor is created on cpu so even if it works, the operation is performed on cpu which is non-desirable.

@DN6

Copy link

Collaborator

DN6 commented Oct 2, 2025

Hi @vladmandic sorry for the delay. It's just that since the introduction of hooks for offloading and casting, we're trying to avoid direct assignment of device, dtype in the models in case of unexpected behaviour, which is why I asked for the repro to check if there's something else going on.

Could you just share the type of offloading you're using? Is it group offloading?

@vladmandic

Copy link

Contributor Author

vladmandic commented Oct 2, 2025 •

edited

Loading

@DN6 i understand the general desire, however:

most pipelines already do this exact thing, this is not new. i'm just adding it to pipelines which dont have it!
scale_shift_factor is a manually created tensor on cpu and there is no mechanism anywhere that manages its placement. its not part of the model so it can be managed by hooks.
i'm not assigning to manually specified device, i'm assigning just to same as other part of the equation
this is not readily reproducible because it works on torch-cuda, but it fails on other flavors of torch (zluda, rocm) i don't know why, but imo correct behavior is to FAIL because like i said, right now there is NO mechanism in the codebase that would handle this
sdnext uses custom offloading, not model/sequential/group from diffusers. there are reasons why we felt we needed to develop our own - if you want, i can do a writeup, but thats besides the poinf for this pr. fyi, its onloading is also based on accelerate hooks, its just decision making when to offload are very different.

if you don't want to merge this pr, then please propose an alternative as right now my users are blocked.

@vladmandic


 Merge branch 'main' into main

46293d9

@DN6

Copy link

Collaborator

DN6 commented Oct 4, 2025

@bot /style

@github-actions GitHub Actions

Copy link

Contributor

github-actions bot commented Oct 4, 2025 •

edited

Loading

Style bot fixed some files and pushed the changes.

@github-actions


 Apply style fixes

4b97719

DN6

DN6 approved these changes

Oct 5, 2025

View reviewed changes

@DN6

Copy link

Collaborator

DN6 commented Oct 5, 2025 •

edited

Loading

@vladmandic Parameters in the module are subject to offloading hooks

diffusers/src/diffusers/hooks/group_offloading.py

Lines 628 to 629 in 941ac9c

parameters = _gather_parameters_with_no_group_offloading_parent(module, modules_with_group_offloading)

buffers = _gather_buffers_with_no_group_offloading_parent(module, modules_with_group_offloading)

The direct casts in Pixart and SANA were added before offloading hooks were introduced and no other instances use this type of casting (it shouldn't be necessary)

Based on your description it seems to be an issue with custom offloading logic applied in non-CUDA environments. This really feels like an edge case to me and the issue doesn't seem related to diffusers code.

However, I've run the offloading tests on the models and they're passing so I'm okay to merge this, but FYI, there is a chance we remove direct casts from models in the future if we see any issues. I'll will give you a heads up if that's the case to avoid breaking anything downstream.

@DN6 DN6 merged commit 2b7deff into huggingface:main

Oct 5, 2025

9 of 10 checks passed

Labels

None yet

3 participants

@vladmandic @DN6 @HuggingFaceDocBuilderDev

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix scale_shift_factor being on cpu for wan and ltx #12347

fix scale_shift_factor being on cpu for wan and ltx #12347

Conversation

@vladmandic vladmandic commented Sep 18, 2025 •

edited

Loading

Uh oh!

Uh oh!

DN6 commented Sep 24, 2025

Uh oh!

vladmandic commented Sep 24, 2025

Uh oh!

vladmandic commented Sep 25, 2025

Uh oh!

HuggingFaceDocBuilderDev commented Sep 29, 2025

Uh oh!

vladmandic commented Sep 30, 2025

Uh oh!

DN6 commented Oct 2, 2025

Uh oh!

vladmandic commented Oct 2, 2025 •

edited

Loading

Uh oh!

Uh oh!

DN6 commented Oct 4, 2025

Uh oh!

github-actions bot commented Oct 4, 2025 •

edited

Loading

Uh oh!

Uh oh!

DN6 commented Oct 5, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

fix scale_shift_factor being on cpu for wan and ltx #12347

fix scale_shift_factor being on cpu for wan and ltx #12347

Conversation

@vladmandic vladmandic commented Sep 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DN6 commented Sep 24, 2025

Uh oh!

vladmandic commented Sep 24, 2025

Uh oh!

vladmandic commented Sep 25, 2025

Uh oh!

HuggingFaceDocBuilderDev commented Sep 29, 2025

Uh oh!

vladmandic commented Sep 30, 2025

Uh oh!

DN6 commented Oct 2, 2025

Uh oh!

vladmandic commented Oct 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DN6 commented Oct 4, 2025

Uh oh!

github-actions bot commented Oct 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DN6 commented Oct 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

@vladmandic vladmandic commented Sep 18, 2025 •

edited

Loading

vladmandic commented Oct 2, 2025 •

edited

Loading

github-actions bot commented Oct 4, 2025 •

edited

Loading

DN6 commented Oct 5, 2025 •

edited

Loading