Why is a dimension-adding problem occurring during the conversion from my onnx to tflite model?

Question 1

I am fairly new to model training and machine learning in general, so sorry in advance if my question seems weird. Last week I managed to train a model with PyTorch and got a .pth file. To use that model on Android, I wanted to convert it to .tflite. I learned that this conversion has to be done in two steps:

Convert the .pth model to an .onnx model
Convert the resulting model to .tf/.tflite

The conversion from .pth to .onnx worked: I can inspect the rather large model in netron.app, and no errors occurred.

My problem occurs when I try to convert the .onnx model to .tflite with this script:

import json

import onnx
import onnx2tf
import tensorflow as tf

# Paths
onnx_model_path = "raft_model.onnx"
tflite_model_path = "raft_model.tflite"
tf_model_path = "raft_tf_model"
param_replacement_path = "param_replacement.json"

# Step 1: JSON for onnx2tf conversion fixes
param_replacement = {
    "operations": [
        {
            "op_name": "/fnet/layer1/layer1.1/Add",
            "param_target": "inputs",
            "param_name": "x",
            "value": "tf.transpose(x, perm=[0, 2, 3, 1])"
        },
        {
            "op_name": "/fnet/layer1/layer1.1/Add",
            "param_target": "inputs",
            "param_name": "y",
            "value": "tf.transpose(y, perm=[0, 2, 3, 1])"
        }
    ]
}

# Save JSON file
with open(param_replacement_path, "w") as f:
    json.dump(param_replacement, f, indent=4)

# Step 2: Convert ONNX → TensorFlow
onnx2tf.convert(
    input_onnx_file_path=onnx_model_path,
    output_folder_path=tf_model_path,
    keep_ncw_or_nchw_or_ncdhw_input_names=[
        "/fnet/layer1/layer1.0/relu_2/Relu_output_0",
        "/fnet/layer1/layer1.1/relu_1/Relu_output_0"
    ],
    keep_nwc_or_nhwc_or_ndhwc_input_names=["/fnet/layer1/layer1.1/Add_output_0"],
    param_replacement_file=param_replacement_path
)

# Step 3: Convert TensorFlow → TensorFlow Lite
try:
    converter = tf.lite.TFLiteConverter.from_saved_model(tf_model_path)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()
    # Save the .tflite model
    with open(tflite_model_path, "wb") as f:
        f.write(tflite_model)
    print("Successful conversion! .tflite model saved as:", tflite_model_path)
except Exception as e:
    print("Error during conversion:", str(e))

Resulting in:

INFO: 46 / 1727
INFO: onnx_op_type: Add onnx_op_name: /cnet/layer2/layer2.0/Add
INFO: input_name.1: /cnet/layer2/layer2.0/downsample/downsample.0/Conv_output_0 shape: [1, 96, 192, 384] dtype: float32
INFO: input_name.2: /cnet/layer2/layer2.0/relu_1/Relu_output_0 shape: [1, 96, 192, 384] dtype: float32
INFO: output_name.1: /cnet/layer2/layer2.0/Add_output_0 shape: [1, 96, 192, 384] dtype: float32
DEBUG: before_op_output_shape_trans = True
INFO: tf_op_type: add
INFO: input.1.x: name: tf.math.add_13/Add:0 shape: (1, 192, 384, 96) dtype: <dtype: 'float32'> 
INFO: input.2.y: name: tf.nn.relu_13/Relu:0 shape: (1, 192, 384, 96) dtype: <dtype: 'float32'> 
INFO: output.1.output: name: tf.math.add_16/Add:0 shape: (1, 192, 384, 96) dtype: <dtype: 'float32'> 
INFO: 47 / 1727
INFO: onnx_op_type: Add onnx_op_name: /fnet/layer1/layer1.1/Add
INFO: input_name.1: /fnet/layer1/layer1.0/relu_2/Relu_output_0 shape: [2, 64, 384, 768] dtype: float32
INFO: input_name.2: /fnet/layer1/layer1.1/relu_1/Relu_output_0 shape: [2, 64, 384, 768] dtype: float32
INFO: output_name.1: /fnet/layer1/layer1.1/Add_output_0 shape: [2, 64, 384, 768] dtype: float32
DEBUG: before_op_output_shape_trans = True
ERROR: The trace log is below.
Traceback (most recent call last):
 File "/home/user/Exporter/onnx2tflite/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 313, in print_wrapper_func
 result = func(*args, **kwargs)
 File "/home/user/Exporter/onnx2tflite/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 386, in inverted_operation_enable_disable_wrapper_func
 result = func(*args, **kwargs)
 File "/home/user/Exporter/onnx2tflite/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 56, in get_replacement_parameter_wrapper_func
 func(*args, **kwargs)
 File "/home/user/Exporter/onnx2tflite/lib/python3.10/site-packages/onnx2tf/ops/Add.py", line 283, in make_node
 merge_two_consecutive_identical_ops_into_one(
 File "/home/user/Exporter/onnx2tflite/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 5475, in merge_two_consecutive_identical_ops_into_one
 tf.math.add(
 File "/home/user/Exporter/onnx2tflite/lib/python3.10/site-packages/tensorflow/python/ops/weak_tensor_ops.py", line 142, in wrapper
 return op(*args, **kwargs)
 File "/home/user/Exporter/onnx2tflite/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
 raise e.with_traceback(filtered_tb) from None
 File "/home/user/Exporter/onnx2tflite/lib/python3.10/site-packages/tf_keras/src/layers/core/tf_op_layer.py", line 119, in handle
 return TFOpLambda(op)(*args, **kwargs)
 File "/home/user/Exporter/onnx2tflite/lib/python3.10/site-packages/tf_keras/src/utils/traceback_utils.py", line 70, in error_handler
 raise e.with_traceback(filtered_tb) from None
ValueError: Exception encountered when calling layer "tf.math.add_17" (type TFOpLambda).
Dimensions must be equal, but are 768 and 384 for '{{node tf.math.add_17/Add}} = AddV2[T=DT_FLOAT](Placeholder, Placeholder_1)' with input shapes: [2,768,384,64], [2,384,768,64].
Call arguments received by layer "tf.math.add_17" (type TFOpLambda):
 • x=tf.Tensor(shape=(2, 768, 384, 64), dtype=float32)
 • y=tf.Tensor(shape=(2, 384, 768, 64), dtype=float32)
 • name='/fnet/layer1/layer1.1/Add'
ERROR: input_onnx_file_path: raft_model_fixed.onnx
ERROR: onnx_op_name: /fnet/layer1/layer1.1/Add
ERROR: Read this and deal with it. https://github.com/PINTO0309/onnx2tf#parameter-replacement
ERROR: Alternatively, if the input OP has a dynamic dimension, use the -b or -ois option to rewrite it to a static shape and try again.
ERROR: If the input OP of ONNX before conversion is NHWC or an irregular channel arrangement other than NCHW, use the -kt or -kat option.
ERROR: Also, for models that include NonMaxSuppression in the post-processing, try the -onwdt option.

Some things might seem weird, but I already tried to adjust the mismatched dimensions that cannot be added in the Add node. As I understand it, the first of the two tensors is not correctly converted from NCHW to NHWC, which leaves the two inputs with mismatched dimensions. (That's why there is a .json file with an explicit permutation, to force NHWC for those specific tensors.)
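To make the layout reasoning concrete, here is a minimal sketch in plain Python that only permutes shape tuples, so no tensors are allocated. The helper name permute_shape is made up for illustration; the shapes are copied from the log above. My reading of the result is that the first input ended up with H and W swapped relative to NHWC, and that applying perm=[0, 2, 3, 1] to a tensor that is already NHWC would not produce a matching shape either, but this is an interpretation of the shapes, not a confirmed diagnosis.

```python
def permute_shape(shape, perm):
    """Return the shape that a transpose with `perm` would produce."""
    return tuple(shape[axis] for axis in perm)


# ONNX shape of both Add inputs, in (N, C, H, W) order, taken from the log.
onnx_nchw = (2, 64, 384, 768)

# Standard NCHW -> NHWC conversion.
nhwc = permute_shape(onnx_nchw, (0, 2, 3, 1))
print(nhwc)  # (2, 384, 768, 64), the shape reported for y

# The shape reported for x equals NHWC with H and W swapped.
x_reported = (2, 768, 384, 64)
print(permute_shape(nhwc, (0, 2, 1, 3)) == x_reported)  # True

# Applying the NCHW -> NHWC perm again to an already-NHWC shape
# moves the channel axis away from the last position.
print(permute_shape(nhwc, (0, 2, 3, 1)))  # (2, 768, 64, 384)
```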

In the error output I also included the step before the error, in which the Add node has no problem adding two tensors. Could it be because the first dimension increases from 1 to 2? Is my .onnx model faulty, resulting in this mess, or is it something else?
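As a quick check of that guess, the following sketch compares the input shapes of step 46 and step 47 axis by axis. The function mismatched_axes is a hypothetical helper written for this post, and it only applies the usual elementwise rule (sizes must be equal, or one of them must be 1). It suggests the first dimension itself is not what fails to match, although it cannot rule out that a first dimension of 2 sends the converter down a different code path.

```python
def mismatched_axes(shape_a, shape_b):
    """Return the axes on which two equal-rank shapes cannot be added.

    Two sizes are compatible when they are equal or one of them is 1.
    """
    if len(shape_a) != len(shape_b):
        raise ValueError("this sketch only handles shapes of equal rank")
    return [
        axis
        for axis, (a, b) in enumerate(zip(shape_a, shape_b))
        if a != b and 1 not in (a, b)
    ]


# Step 46 (/cnet/layer2/layer2.0/Add): both inputs agree on every axis.
print(mismatched_axes((1, 192, 384, 96), (1, 192, 384, 96)))  # []

# Step 47 (/fnet/layer1/layer1.1/Add): axis 0 agrees (2 == 2),
# while axes 1 and 2 (height and width in NHWC) do not.
print(mismatched_axes((2, 768, 384, 64), (2, 384, 768, 64)))  # [1, 2]
```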

Or are my installed versions of TensorFlow and onnx(-tf) incompatible?

TensorFlow: 2.19.0
onnx: 1.16.1
onnx-tf: 1.27.1
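Since the script imports onnx2tf, while onnx-tf is a separate package, it may be worth confirming which of the two is actually installed. A small sketch using only the standard library is below; installed_versions is a made-up helper name, and the package list is an assumption that can be adjusted.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names as published on PyPI; adjust the list as needed.
report = installed_versions(["tensorflow", "onnx", "onnx2tf", "onnx-tf"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```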

Question 2

I am still not sure why the dimension inversion happened, but I managed to counter it with a working .json file in the following format:

param_replacement = {
    "format_version": 1,
    "operations": [
        {
            "op_name": "/fnet/layer1/layer1.1/Add",
            "param_target": "inputs",
            "param_name": "/fnet/layer1/layer1.0/relu_2/Relu_output_0",
            "values": [2, 384, 768, 64]
        }
    ]
}

I just forced the problematic tensor to have matching dimensions (syntax taken from the link).
