Open
@Voveka98
Description
Hi! I am trying to convert a Transformer2DModel to ONNX and have run into a problem I cannot solve.
I am trying to export a UNet model with the following architecture:
Transformer2DModel(
  (pos_embed): PatchEmbed(
    (proj): Conv2d(96, 1584, kernel_size=(2, 2), stride=(2, 2))
  )
  (transformer_blocks): ModuleList(
    (0-23): 24 x BasicTransformerBlock(
      (norm1): LayerNorm((1584,), eps=1e-06, elementwise_affine=False)
      (attn1): Attention(
        (to_q): Linear(in_features=1584, out_features=1584, bias=True)
        (to_k): Linear(in_features=1584, out_features=1584, bias=True)
        (to_v): Linear(in_features=1584, out_features=1584, bias=True)
        (to_out): ModuleList(
          (0): Linear(in_features=1584, out_features=1584, bias=True)
          (1): Dropout(p=0.0, inplace=False)
        )
      )
      (norm2): LayerNorm((1584,), eps=1e-06, elementwise_affine=False)
      (ff): FeedForward(
        (net): ModuleList(
          (0): GELU(
            (proj): Linear(in_features=1584, out_features=6336, bias=True)
          )
          (1): Dropout(p=0.0, inplace=False)
          (2): Linear(in_features=6336, out_features=1584, bias=True)
        )
      )
    )
  )
  (norm_out): LayerNorm((1584,), eps=1e-06, elementwise_affine=False)
  (proj_out): Linear(in_features=1584, out_features=128, bias=True)
  (adaln_single): AdaLayerNormSingleFlow(
    (emb): PixArtAlphaCombinedFlowEmbeddings(
      (timestep_embedder): TimestepEmbedding(
        (linear_1): Linear(in_features=512, out_features=1584, bias=True)
        (act): SiLU()
        (linear_2): Linear(in_features=1584, out_features=1584, bias=True)
      )
    )
    (silu): SiLU()
    (linear): Linear(in_features=1584, out_features=9504, bias=True)
  )
)
As input to my model I use the following tensors, with shapes:
hidden_states -> (B, 96, Height, Width)
timestep -> (B)
resolution -> (B, 2)
aspect_ratio -> (B, 1)
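For concreteness, dummy tensors with these shapes could be built roughly as follows. This is only a sketch: the batch size, the 32x32 spatial size, the timestep range, and the values placed in `resolution` and `aspect_ratio` are all assumptions made for illustration, since the issue does not state them. The token count follows from the 2x2, stride-2 patch projection shown in the printed architecture.

```python
import torch

# Assumed sizes for illustration only; the issue does not state them.
batch, height, width = 2, 32, 32

hidden_states = torch.randn(batch, 96, height, width)  # (B, 96, Height, Width)
timestep = torch.randint(0, 1000, (batch,))            # (B); range is an assumption
resolution = torch.tensor([[height, width]] * batch, dtype=torch.float32)  # (B, 2)
aspect_ratio = torch.full((batch, 1), height / width)  # (B, 1)

# Tuple in the same order as the four inputs listed above.
dummy_inputs = (hidden_states, timestep, resolution, aspect_ratio)

# PatchEmbed uses a 2x2 kernel with stride 2, so each 2x2 patch becomes one token.
num_tokens = (height // 2) * (width // 2)
```

With these assumed sizes the patch embedding would produce 256 tokens per sample; other spatial sizes give a different token count.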
In the Python version, resolution and aspect_ratio are part of added_cond_kwargs, but since ONNX export doesn't support dict inputs, I wrote a wrapper that takes them as separate tensors and rebuilds the dict:
import torch
import torch.nn as nn


class Transformer2DWrapper(nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(
        self,
        hidden_states: torch.Tensor,
        timestep: torch.Tensor,
        resolution: torch.Tensor,
        aspect_ratio: torch.Tensor,
    ):
        timestep = timestep.float()
        added_cond_kwargs = {
            "resolution": resolution,
            "aspect_ratio": aspect_ratio,
        }
        out = self.model(
            hidden_states=hidden_states,
            timestep=timestep,
            added_cond_kwargs=added_cond_kwargs,
            return_dict=False,
        )
        return out[0]  # sample tensor
I export to ONNX with torch.onnx.export:
wrapper = Transformer2DWrapper(unet_model)
torch.onnx.export(
    wrapper,
    dummy_inputs,
    "unet_converted/model.onnx",
    input_names=["hidden_states", "timestep", "resolution", "aspect_ratio"],
    output_names=["out_sample"],
    dynamic_axes={
        "hidden_states": {0: "batch", 2: "height", 3: "width"},
        "timestep": {0: "batch"},
        "resolution": {0: "batch"},
        "aspect_ratio": {0: "batch"},
        "out_sample": {0: "batch", 2: "height", 3: "width"},
    },
    opset_version=17,
)
But after loading the ONNX model, there is an error from a Squeeze operation:
Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from unet_converted/model.onnx failed:Node (/model/transformer_blocks.0/If) Op (If) [TypeInferenceError] Graph attribute inferencing failed: Node (/model/transformer_blocks.0/Squeeze) Op (Squeeze) [ShapeInferenceError] Dimension of input 1 must be 1 instead of 256
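One possible reading of this message, offered as an assumption rather than a confirmed diagnosis, is a mismatch between the two squeeze semantics. In eager PyTorch, `tensor.squeeze(1)` silently does nothing when dimension 1 is not of size 1, whereas the ONNX `Squeeze` operator requires every selected axis to have size 1. When the size of that dimension is not known statically, the exporter may wrap the call in an `If` node, and shape inference on the `Squeeze` branch can then fail once the dimension turns out to be 256. The sketch below only illustrates the PyTorch side of that difference; the tensor sizes are illustrative and are not taken from the model.

```python
import torch

# Dimension 1 has size 256: squeeze(1) is a silent no-op in PyTorch.
tokens = torch.zeros(2, 256, 8)
unchanged = tokens.squeeze(1)

# Dimension 1 has size 1: squeeze(1) removes it.
singleton = torch.zeros(2, 1, 256, 8)
dropped = singleton.squeeze(1)

print(tuple(unchanged.shape))  # (2, 256, 8)
print(tuple(dropped.shape))    # (2, 256, 8)
```

If this reading applies, the place to look would be a `squeeze(1)` call inside the transformer block that is applied to a tensor whose dimension 1 is the token count.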
Can you please help with the conversion?
Versions:
diffusers==0.27.2
torch==2.2.0+cu118