[Auto] update AuraFlow transformer docstrings (cluster-9591-5): merged 1 of 5 PRs #3

Open
evalstate wants to merge 4 commits into main
base: main
from merge-cluster-cluster-9591-5-20260427125508
24 changes: 13 additions & 11 deletions src/diffusers/models/transformers/auraflow_transformer_2d.py
@@ -202,7 +202,7 @@ class AuraFlowJointTransformerBlock(nn.Module):
         * No bias in the attention blocks
         * Most LayerNorms are in FP32

-    Parameters:
+    Args:
         dim (`int`): The number of channels in the input and output.
         num_attention_heads (`int`): The number of heads to use for multi-head attention.
         attention_head_dim (`int`): The number of channels in each head.
@@ -279,21 +279,21 @@ class AuraFlowTransformer2DModel(ModelMixin, AttentionMixin, ConfigMixin, PeftAd
     r"""
     A 2D Transformer model as introduced in AuraFlow (https://blog.fal.ai/auraflow/).

-    Parameters:
+    Args:
         sample_size (`int`): The width of the latent images. This is fixed during training since
             it is used to learn a number of position embeddings.
         patch_size (`int`): Patch size to turn the input data into small patches.
-        in_channels (`int`, *optional*, defaults to 4): The number of channels in the input.
-        num_mmdit_layers (`int`, *optional*, defaults to 4): The number of layers of MMDiT Transformer blocks to use.
-        num_single_dit_layers (`int`, *optional*, defaults to 32):
+        in_channels (`int`, *optional*, defaults to `4`): The number of channels in the input.
+        num_mmdit_layers (`int`, *optional*, defaults to `4`): The number of layers of MMDiT Transformer blocks to use.
+        num_single_dit_layers (`int`, *optional*, defaults to `32`):
             The number of layers of Transformer blocks to use. These blocks use concatenated image and text
             representations.
-        attention_head_dim (`int`, *optional*, defaults to 256): The number of channels in each head.
-        num_attention_heads (`int`, *optional*, defaults to 12): The number of heads to use for multi-head attention.
+        attention_head_dim (`int`, *optional*, defaults to `256`): The number of channels in each head.
+        num_attention_heads (`int`, *optional*, defaults to `12`): The number of heads to use for multi-head attention.
         joint_attention_dim (`int`, *optional*): The number of `encoder_hidden_states` dimensions to use.
         caption_projection_dim (`int`): Number of dimensions to use when projecting the `encoder_hidden_states`.
-        out_channels (`int`, defaults to 4): Number of output channels.
-        pos_embed_max_size (`int`, defaults to 1024): Maximum positions to embed from the image latents.
+        out_channels (`int`, defaults to `4`): Number of output channels.
+        pos_embed_max_size (`int`, defaults to `1024`): Maximum positions to embed from the image latents.
     """

     _no_split_modules = ["AuraFlowJointTransformerBlock", "AuraFlowSingleTransformerBlock", "AuraFlowPatchEmbed"]
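
The docstring changes above follow two mechanical conventions: the `Parameters:` section header becomes `Args:`, and bare default values gain backticks. The tool that generated this PR is not shown on this page, so the following is only a hypothetical sketch of how such a normalization could be scripted. The function names `backtick_defaults` and `rename_parameters_header` are invented for illustration, and the regex is assumed to cover simple literals only (integers, decimals, `True`, `False`, `None`).

```python
import re

# Matches "defaults to <literal>" when the literal is not already in backticks.
# Assumption: only simple literals are handled; anything else is left untouched.
_DEFAULT_RE = re.compile(r"(defaults to )(?!`)(-?\d+(?:\.\d+)?|True|False|None)\b")

# Matches a line that contains only the "Parameters:" header (plus indentation).
_HEADER_RE = re.compile(r"^([ \t]*)Parameters:[ \t]*$", flags=re.MULTILINE)


def backtick_defaults(docstring: str) -> str:
    """Wrap bare default literals in backticks; already-quoted ones are skipped."""
    return _DEFAULT_RE.sub(r"\1`\2`", docstring)


def rename_parameters_header(docstring: str) -> str:
    """Replace a standalone 'Parameters:' header with 'Args:', keeping its indent."""
    return _HEADER_RE.sub(r"\1Args:", docstring)


sample = (
    "    Parameters:\n"
    "        in_channels (`int`, *optional*, defaults to 4): The number of channels.\n"
    "        out_channels (`int`, defaults to `4`): Number of output channels.\n"
)
normalized = rename_parameters_header(backtick_defaults(sample))
print(normalized)
```

Running both passes a second time leaves the text unchanged, which is a useful property for an automated docstring job that may touch the same file repeatedly.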
@@ -368,8 +368,10 @@ def __init__(
     # Copied from diffusers.models.unets.unet_2d_condition.UNet2DConditionModel.fuse_qkv_projections with FusedAttnProcessor2_0->FusedAuraFlowAttnProcessor2_0
     def fuse_qkv_projections(self):
         """
-        Enables fused QKV projections. For self-attention modules, all projection matrices (i.e., query, key, value)
-        are fused. For cross-attention modules, key and value projection matrices are fused.
+        Enables fused QKV projections.
+
+        For self-attention modules, all projection matrices (i.e., query, key, value) are fused.
+        For cross-attention modules, only key and value projection matrices are fused.

         > [!WARNING] > This API is 🧪 experimental.
         """
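
The `fuse_qkv_projections` docstring says that self-attention fuses all three projections while cross-attention fuses only key and value. One way to see why: projections can be merged into a single matrix multiply only when they read the same input. The sketch below is not the diffusers implementation; it is a dependency-free illustration using nested lists, with hypothetical helper names (`matmul`, `fuse`, `split`) and weights stored as (input x output) matrices, which is an assumption made for readability.

```python
import random


def matmul(x, w):
    """Multiply x (rows x inner) by w (inner x cols); both are nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def fuse(*weights):
    """Concatenate projection matrices column-wise into one wider matrix."""
    return [sum(rows, []) for rows in zip(*weights)]


def split(y, sizes):
    """Split the columns of a fused output back into one output per projection."""
    outputs, start = [], 0
    for size in sizes:
        outputs.append([row[start:start + size] for row in y])
        start += size
    return outputs


def random_matrix(rows, cols):
    """Small integer matrix, so results can be compared exactly."""
    return [[random.randint(-3, 3) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
dim = 4
w_q, w_k, w_v = (random_matrix(dim, dim) for _ in range(3))
hidden = random_matrix(5, dim)   # stand-in for image tokens
context = random_matrix(3, dim)  # stand-in for encoder (text) tokens

# Self-attention: Q, K and V all read `hidden`, so one fused matmul covers all three.
q, k, v = split(matmul(hidden, fuse(w_q, w_k, w_v)), [dim, dim, dim])
self_attn_matches = (q, k, v) == (
    matmul(hidden, w_q),
    matmul(hidden, w_k),
    matmul(hidden, w_v),
)

# Cross-attention: Q reads `hidden` but K and V read `context`, so only K and V fuse.
cross_k, cross_v = split(matmul(context, fuse(w_k, w_v)), [dim, dim])
cross_attn_matches = (cross_k, cross_v) == (matmul(context, w_k), matmul(context, w_v))

print(self_attn_matches, cross_attn_matches)
```

The fused and unfused paths produce identical outputs; the benefit of fusing is fewer, larger matrix multiplies, not a change in results.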