Fix QAT model converting #2190
Open
Conversation
Converting a quantization-aware trained (QAT) model from TF to ONNX has several issues:

1. `QuantizeLinear` and `DequantizeLinear` are fused into the conv layer, but the downstream compiler (e.g., TensorRT) needs the Q/DQ layers to determine whether to use int8 or not. See issue #1972 (QDQ node for weight tensor of Conv2D undergoes constant folding, enabled for node using tf type=FakeQuantWithMinMaxVarsPerChannel). We need to keep the Q/DQ layers unfused. `QuantizeLinear` and `DequantizeLinear` correspond to `FakeQuantWithMinMaxVars` in TensorFlow, so excluding that op from `can_fold` in `tf_utils.py` solves it (see the first sketch after this list).
2. `narrow_range` in quantized nodes. TensorRT maps [min, max] to [-127, 127] (see page 12), which requires 0 in fp32 to map to 0 in int8 (see the second sketch below). Also see `narrow_range=True` in TensorRT/tools/tensorflow-quantization here.
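A minimal sketch of the first point, assuming `can_fold` is a predicate in `tf_utils.py` that takes a TF node and decides whether it may be constant-folded; the op set and signature below are assumptions for illustration, not the actual tf2onnx code:

```python
# Sketch only: keep FakeQuant* nodes out of constant folding so that the
# QuantizeLinear/DequantizeLinear pair they are rewritten into survives
# and reaches the downstream compiler (e.g., TensorRT).

# Hypothetical op set; the real tf_utils.py may organize this differently.
_FAKE_QUANT_OPS = {
    "FakeQuantWithMinMaxVars",
    "FakeQuantWithMinMaxVarsPerChannel",
}

def can_fold(node):
    """Return False for FakeQuant* nodes so they are excluded from folding."""
    if node.type in _FAKE_QUANT_OPS:
        return False
    # ... the remaining folding criteria would stay as they are today ...
    return True
```

And a short illustration of why `narrow_range` matters: a symmetric narrow-range mapping takes [-max_abs, max_abs] onto [-127, 127] with a zero point of 0, so 0.0 in fp32 quantizes exactly to 0 in int8. The helper name below is made up for the example:

```python
import numpy as np

def narrow_range_int8(x):
    """Symmetric (narrow-range) int8 quantization: [-max_abs, max_abs] -> [-127, 127]."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0  # zero_point is 0, so 0.0 maps exactly to int8 0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

x = np.array([-1.5, -0.25, 0.0, 0.75, 1.2], dtype=np.float32)
q, scale = narrow_range_int8(x)
print(q, scale)  # 0.0 quantizes to 0; -128 is never produced (narrow range)
```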
QuantizeLinearandDequantizeLinearare fused into conv layer, but the downstream compiler(e.g., TensorRT) needs the Q/DQ layers to determine whether to use int8 or not. See issue QDQ node for weight tensor of Con2D undergoes Constant folding (enabled for node using tf type=FakeQuantWithMinMaxVarsPerChannel) #1972 . We need to keep Q/DQ layer unfused. QuantizeLinear and DequantizeLinear are corresponding toFakeQuantWithMinMaxVarsin TensorFlow, so excluding it fromcan_foldintf_utils.pycan solve it.narrow_rangein quantized nodes. TensorRT maps [min, max] to [-127, 127](see Page 12) , which needs 0 in fp32 to be mapped to 0 in int8. Also see narrow_range=True in TensorRT/tools/tensorflow-quantization here.