<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

[[open-in-colab]]

# Models

A diffusion pipeline relies on several individual models working together to generate an output. These models are responsible for denoising, encoding inputs, and decoding latents into the actual outputs.

This guide will show you how to load models.

## Loading a model

All models are loaded with the [`~ModelMixin.from_pretrained`] method, which downloads and caches the latest model version. If the latest files are already in the local cache, [`~ModelMixin.from_pretrained`] reuses the cached files instead of redownloading them.
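
As a quick, minimal sketch of the caching behavior (the repository and `subfolder` value are simply this guide's running example), a repeated call can be restricted to the local cache with `local_files_only=True` once the files have been downloaded.

```py
from diffusers import QwenImageTransformer2DModel

# First call downloads the weights and stores them in the local cache.
model = QwenImageTransformer2DModel.from_pretrained("Qwen/Qwen-Image", subfolder="transformer")

# Subsequent calls reuse the cache; `local_files_only=True` skips the Hub entirely
# and fails if the files aren't cached yet.
model = QwenImageTransformer2DModel.from_pretrained(
    "Qwen/Qwen-Image", subfolder="transformer", local_files_only=True
)
```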

Pass the `subfolder` argument to [`~ModelMixin.from_pretrained`] to specify where to load the model weights from. Omit the `subfolder` argument if the repository doesn't have a subfolder structure or if you're loading a standalone model.

```py
from diffusers import QwenImageTransformer2DModel

model = QwenImageTransformer2DModel.from_pretrained("Qwen/Qwen-Image", subfolder="transformer")
```
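
For the standalone case, a sketch using `stabilityai/sd-vae-ft-mse` (a standalone VAE repository, chosen here only for illustration) loads the model without a `subfolder` argument.

```py
from diffusers import AutoencoderKL

# A standalone repository stores the config and weights at the top level,
# so no `subfolder` is needed.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
```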

## AutoModel

[`AutoModel`] detects the model class from a `model_index.json` file or a model's `config.json` file. It resolves the correct model class from these files and delegates the actual loading to that class. [`AutoModel`] is useful for automatic model type detection without needing to know the exact model class beforehand.

```py
from diffusers import AutoModel

model = AutoModel.from_pretrained(
    "Qwen/Qwen-Image", subfolder="transformer"
)
```
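
Because the class is resolved from the repository's configuration, the same call pattern works for other components. As a sketch (assuming the repository exposes a `vae` subfolder, as `Qwen/Qwen-Image` does), the returned object is whatever model class the config declares.

```py
from diffusers import AutoModel

# AutoModel reads the config in the `vae` subfolder and instantiates
# the corresponding autoencoder class.
vae = AutoModel.from_pretrained("Qwen/Qwen-Image", subfolder="vae")
print(type(vae).__name__)
```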

## Model data types

Use the `torch_dtype` argument in [`~ModelMixin.from_pretrained`] to load a model with a specific data type. This allows you to load a model in a lower precision to reduce memory usage.

```py
import torch
from diffusers import QwenImageTransformer2DModel

model = QwenImageTransformer2DModel.from_pretrained(
    "Qwen/Qwen-Image",
    subfolder="transformer",
    torch_dtype=torch.bfloat16
)
```
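
To see the effect of the lower precision, a minimal sketch (parameters only, ignoring buffers and activation memory) sums the size of the loaded weights.

```py
# Rough size of the weights in memory; bfloat16 roughly halves this compared to float32.
num_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"{num_bytes / 1e9:.2f} GB")
```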

[nn.Module.to](https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.to) can also convert a model to a specific data type on the fly. However, it converts *all* weights to the requested data type, unlike `torch_dtype`, which respects `_keep_in_fp32_modules`. This attribute preserves certain layers in `torch.float32` for numerical stability and best generation quality (see this [_keep_in_fp32_modules](https://github.com/huggingface/diffusers/blob/f864a9a352fa4a220d860bfdd1782e3e5af96382/src/diffusers/models/transformers/transformer_wan.py#L374) example).

```py
import torch
from diffusers import QwenImageTransformer2DModel

model = QwenImageTransformer2DModel.from_pretrained(
    "Qwen/Qwen-Image", subfolder="transformer"
)
model = model.to(dtype=torch.float16)
```
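
A short sketch for checking which data types the parameters actually ended up in (after `torch_dtype=...` at load time, any `_keep_in_fp32_modules` layers stay in `torch.float32`; after `.to(dtype=...)`, everything is converted).

```py
from collections import Counter

# Count parameters by dtype to verify the conversion behavior described above.
print(Counter(p.dtype for p in model.parameters()))
```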

## Device placement

Use the `device_map` argument in [`~ModelMixin.from_pretrained`] to place a model on an accelerator like a GPU. It is especially helpful when there are multiple GPUs.

Diffusers currently provides three `device_map` options for individual models: `"cuda"`, `"balanced"`, and `"auto"`. Refer to the table below to compare the three placement strategies.

| parameter | description |
|---|---|
| `"cuda"` | places the model on a supported accelerator (CUDA) |
| `"balanced"` | evenly distributes the model across all GPUs |
| `"auto"` | distributes the model from the fastest device first to the slowest |
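
A minimal sketch of the single-accelerator case (`device_map="cuda"`), using the same transformer as above, looks like this.

```py
import torch
from diffusers import QwenImageTransformer2DModel

# Load the weights directly onto the CUDA device.
model = QwenImageTransformer2DModel.from_pretrained(
    "Qwen/Qwen-Image",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
```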

Use the `max_memory` argument in [`~ModelMixin.from_pretrained`] to allocate a maximum amount of memory to use on each device. By default, Diffusers uses the maximum amount available.

```py
import torch
from diffusers import QwenImageTransformer2DModel

max_memory = {0: "16GB", 1: "16GB"}
transformer = QwenImageTransformer2DModel.from_pretrained(
    "Qwen/Qwen-Image",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    max_memory=max_memory
)
```

The `hf_device_map` attribute lets you access and inspect the resolved `device_map`.

```py
print(transformer.hf_device_map)
# {'': device(type='cuda')}
```

## Saving models

Save a model with the [`~ModelMixin.save_pretrained`] method.

```py
from diffusers import QwenImageTransformer2DModel

model = QwenImageTransformer2DModel.from_pretrained("Qwen/Qwen-Image", subfolder="transformer")
model.save_pretrained("./local/model")
```
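
The saved directory can then be loaded back with [`~ModelMixin.from_pretrained`] by pointing at the local path; a short sketch of the round trip is shown below.

```py
from diffusers import QwenImageTransformer2DModel

# Load the model from the locally saved config and weights instead of the Hub.
model = QwenImageTransformer2DModel.from_pretrained("./local/model")
```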

For large models, it is helpful to use `max_shard_size` to save a model as multiple shards. Shards can be loaded faster and use less memory (refer to the [parallel loading](./loading#parallel-loading) docs for more details), especially if there is more than one GPU.

```py
model.save_pretrained("./local/model", max_shard_size="5GB")
```
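
As a sketch for inspecting the result (the exact file names depend on the model and shard size), the saved directory contains several weight shards plus an index file that maps each tensor to its shard.

```py
import os

# List the sharded checkpoint files written by `save_pretrained`.
print(sorted(os.listdir("./local/model")))
```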