[docs] Parallel loading of shards #12135

Merged

sayakpaul merged 5 commits into huggingface:main from stevhliu:parallel-loading

Aug 14, 2025

Conversation

stevhliu

Member

@stevhliu stevhliu commented Aug 13, 2025

Docs to go along with #12028.

I didn't mention #11904 since it doesn't appear to require any action on a user's behalf and it just works in the background. It's a cool design/implementation detail though and I think it'd be pretty interesting to maybe do a blog post about optimizations like this.

@stevhliu: initial (b0dd3b7)

@stevhliu stevhliu requested a review from sayakpaul

August 13, 2025 03:47

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Aug 13, 2025

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sayakpaul

sayakpaul reviewed

Aug 13, 2025

docs/source/en/using-diffusers/loading.md

pipeline = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.2-I2V-A14B-Diffusers",
    torch_dtype=torch.bfloat16,
    device_map="cuda"

Member

@sayakpaul sayakpaul Aug 13, 2025

Do we want to talk a bit about device_map? The motivation behind passing device_map essentially comes from this PR: #11904. I have also tried providing a bit of motivation behind adding it at the pipeline-level here: #12122 (comment).
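For readers following the thread, the truncated snippet under review can be sketched as a complete call. This is a minimal sketch and not the merged docs text: the `HF_ENABLE_PARALLEL_LOADING` environment variable is assumed to be the opt-in switch that goes with #12028, and the helper name `load_pipeline` is hypothetical.

```python
import os

# Assumption: parallel shard loading is opted into through this environment
# variable, and it must be set before diffusers is imported. Check the merged
# docs for the exact name and accepted values.
os.environ["HF_ENABLE_PARALLEL_LOADING"] = "YES"


def load_pipeline(repo_id="Wan-AI/Wan2.2-I2V-A14B-Diffusers", device="cuda"):
    """Hypothetical helper: load a sharded pipeline straight onto `device`."""
    # Imported lazily so that defining the helper neither needs a GPU
    # nor triggers a large checkpoint download.
    import torch
    from diffusers import DiffusionPipeline

    return DiffusionPipeline.from_pretrained(
        repo_id,
        torch_dtype=torch.bfloat16,
        device_map=device,  # place weights on the accelerator while loading
    )
```

Calling `load_pipeline()` needs a CUDA device and enough memory for the checkpoint, so the sketch only defines the helper and leaves the call to the reader.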

stevhliu and others added 2 commits

August 13, 2025 10:52

@stevhliu: feedback (e06b21f)

@sayakpaul: Merge branch 'main' into parallel-loading (9a9fd95)

@sayakpaul

Member

sayakpaul commented Aug 14, 2025

> I didn't mention #11904 since it doesn't appear to require any action on a user's behalf and it just works in the background. It's a cool design/implementation detail though and I think it'd be pretty interesting to maybe do a blog post about optimizations like this.

@stevhliu it comes to fruition when we initialize the model directly on the accelerator device through device_map instead of doing to("cuda"). So, it could be important to mention I guess.
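The contrast described here can be sketched as two loading paths. This is an illustrative sketch only: the function names `load_then_move` and `load_on_device` are hypothetical, and whether the second path is faster in practice depends on the optimization from #11904, which this thread does not quantify.

```python
def load_then_move(repo_id):
    """Load the weights with the default placement, then move them to the GPU."""
    # Lazy imports keep this sketch importable on machines without a GPU.
    import torch
    from diffusers import DiffusionPipeline

    pipeline = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
    return pipeline.to("cuda")  # separate transfer step after loading


def load_on_device(repo_id):
    """Initialize the models directly on the accelerator through device_map."""
    import torch
    from diffusers import DiffusionPipeline

    return DiffusionPipeline.from_pretrained(
        repo_id,
        torch_dtype=torch.bfloat16,
        device_map="cuda",  # no separate .to("cuda") call afterwards
    )
```

Both helpers return a pipeline on the GPU; the difference is only in when the weights reach the device.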

sayakpaul

sayakpaul approved these changes

Aug 14, 2025

@sayakpaul: Merge branch 'main' into parallel-loading (2f42f08)

sayakpaul

sayakpaul approved these changes

Aug 14, 2025

docs/source/en/using-diffusers/loading.md (outdated, resolved)

@sayakpaul: Update docs/source/en/using-diffusers/loading.md (ca661ef)

@sayakpaul sayakpaul merged commit 421ee07 into huggingface:main

Aug 14, 2025

1 check passed

@stevhliu stevhliu deleted the parallel-loading branch

August 14, 2025 16:47

Labels

None yet

3 participants

@stevhliu @HuggingFaceDocBuilderDev @sayakpaul
