Add quantized yolov4 model #521


Open

XinyuYe-Intel wants to merge 8 commits into onnx:main from XinyuYe-Intel:xinyuye/yolov4

Conversation


XinyuYe-Intel commented Apr 29, 2022

YOLOv4

Description

YOLOv4 optimizes the speed and accuracy of object detection. It is two times faster than EfficientDet. It improves YOLOv3's AP and FPS by 10% and 12%, respectively, with an mAP50 of 52.32 on the COCO 2017 dataset and 41.7 FPS on a Tesla V100.

Model

Model         Download   Download (with sample test data)   ONNX version   Opset version   Accuracy
YOLOv4        251 MB     236 MB                             1.6            11              mAP of 0.5733
YOLOv4-int8   63.0 MB    61.8 MB                            1.9.0          11              mAP of 0.570

Compared with YOLOv4, YOLOv4-int8 shows an mAP decline of 0.33% and a performance improvement of 1.59x.

Note that performance depends on the test hardware.

Performance data was collected on an Intel® Xeon® Platinum 8280 processor (1 socket, 4 cores per instance), CentOS Linux 8.3, with a batch size of 1.
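For reference, a minimal latency-measurement sketch with onnxruntime is shown below. This is not the script used to collect the numbers above; the local file name "yolov4-int8.onnx" and the random stand-in input are assumptions for illustration.

import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("yolov4-int8.onnx")            # assumed local file name
input_name = sess.get_inputs()[0].name
dummy = np.random.rand(1, 416, 416, 3).astype(np.float32)  # stand-in for a preprocessed image

for _ in range(5):                                         # warm-up runs
    sess.run(None, {input_name: dummy})

runs = 50
start = time.perf_counter()
for _ in range(runs):                                      # batch size 1, as in the setup above
    sess.run(None, {input_name: dummy})
print("average latency: %.1f ms" % ((time.perf_counter() - start) / runs * 1000))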

Source

TensorFlow YOLOv4 => ONNX YOLOv4

Inference

Conversion

A tutorial for the conversion process can be found in the conversion notebook.

Validation of the converted model and a graph representation of it can be found in the validation notebook.

Running inference

A tutorial for running inference using onnxruntime can be found in the inference notebook.

Input to model

This model expects an input shape of (1, 416, 416, 3). The dimensions represent (batch_size, height, width, channels).
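The input signature can also be confirmed directly from the ONNX file. A small sketch (assuming "yolov4.onnx" is the downloaded model):

import onnx

model = onnx.load("yolov4.onnx")
for inp in model.graph.input:
    dims = [d.dim_value or d.dim_param for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)   # expected: a single input with dims [1, 416, 416, 3]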

Preprocessing steps

The following code shows how preprocessing is done. For more detail and a complete example, please visit the inference notebook.

import numpy as np
import cv2

# this function is from tensorflow-yolov4-tflite/core/utils.py
def image_preprocess(image, target_size, gt_boxes=None):
    # letterbox resize: keep the aspect ratio, pad to target_size with gray (128),
    # then scale pixel values to [0, 1]
    ih, iw = target_size
    h, w, _ = image.shape

    scale = min(iw / w, ih / h)
    nw, nh = int(scale * w), int(scale * h)
    image_resized = cv2.resize(image, (nw, nh))

    image_padded = np.full(shape=[ih, iw, 3], fill_value=128.0)
    dw, dh = (iw - nw) // 2, (ih - nh) // 2
    image_padded[dh:nh + dh, dw:nw + dw, :] = image_resized
    image_padded = image_padded / 255.

    if gt_boxes is None:
        return image_padded
    else:
        # shift ground-truth boxes into the letterboxed coordinate system
        gt_boxes[:, [0, 2]] = gt_boxes[:, [0, 2]] * scale + dw
        gt_boxes[:, [1, 3]] = gt_boxes[:, [1, 3]] * scale + dh
        return image_padded, gt_boxes

# input
input_size = 416
original_image = cv2.imread("input.jpg")
original_image = cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)
original_image_size = original_image.shape[:2]

image_data = image_preprocess(np.copy(original_image), [input_size, input_size])
image_data = image_data[np.newaxis, ...].astype(np.float32)  # add the batch dimension
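As a quick sanity check (a sketch that assumes the preprocessing above has run), the resulting batch should match the input signature described earlier:

# image_data should now be a float32 NHWC batch of one 416x416 RGB image
assert image_data.shape == (1, input_size, input_size, 3)
assert image_data.dtype == np.float32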

Output of model

Output shape: (1, 52, 52, 3, 85)

There are 3 output layers; with the 416x416 input they have shapes (1, 52, 52, 3, 85), (1, 26, 26, 3, 85), and (1, 13, 13, 3, 85). For each layer there are 255 values per grid cell: 85 values per anchor, times 3 anchors.

The 85 values of each anchor consist of 4 box coordinates describing the predicted bounding box (x, y, w, h), 1 object confidence, and 80 class confidences. Here is the class list.
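To make this layout concrete, here is a sketch of how the last axis of one output layer splits; `pred` is a hypothetical array of shape (1, S, S, 3, 85) as described above:

box_xywh   = pred[..., 0:4]   # predicted box center and size (x, y, w, h)
objectness = pred[..., 4:5]   # object confidence
class_prob = pred[..., 5:]    # confidences for the 80 COCO classes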

Postprocessing steps

The following postprocessing steps are modified from the hunglc007/tensorflow-yolov4-tflite repository.

from scipy import special
import colorsys
import random

def get_anchors(anchors_path, tiny=False):
    '''loads the anchors from a file'''
    with open(anchors_path) as f:
        anchors = f.readline()
    anchors = np.array(anchors.split(','), dtype=np.float32)
    return anchors.reshape(3, 3, 2)

def postprocess_bbbox(pred_bbox, ANCHORS, STRIDES, XYSCALE=[1, 1, 1]):
    '''decode raw predictions into absolute (x, y, w, h) boxes using the anchor boxes'''
    for i, pred in enumerate(pred_bbox):
        conv_shape = pred.shape
        output_size = conv_shape[1]
        conv_raw_dxdy = pred[:, :, :, :, 0:2]
        conv_raw_dwdh = pred[:, :, :, :, 2:4]
        xy_grid = np.meshgrid(np.arange(output_size), np.arange(output_size))
        xy_grid = np.expand_dims(np.stack(xy_grid, axis=-1), axis=2)
        xy_grid = np.tile(np.expand_dims(xy_grid, axis=0), [1, 1, 1, 3, 1])
        xy_grid = xy_grid.astype(float)  # np.float is removed in recent NumPy versions

        pred_xy = ((special.expit(conv_raw_dxdy) * XYSCALE[i]) - 0.5 * (XYSCALE[i] - 1) + xy_grid) * STRIDES[i]
        pred_wh = (np.exp(conv_raw_dwdh) * ANCHORS[i])
        pred[:, :, :, :, 0:4] = np.concatenate([pred_xy, pred_wh], axis=-1)

    pred_bbox = [np.reshape(x, (-1, np.shape(x)[-1])) for x in pred_bbox]
    pred_bbox = np.concatenate(pred_bbox, axis=0)
    return pred_bbox

def postprocess_boxes(pred_bbox, org_img_shape, input_size, score_threshold):
    '''remove boundary boxes with a low detection probability'''
    valid_scale = [0, np.inf]
    pred_bbox = np.array(pred_bbox)

    pred_xywh = pred_bbox[:, 0:4]
    pred_conf = pred_bbox[:, 4]
    pred_prob = pred_bbox[:, 5:]

    # (1) (x, y, w, h) --> (xmin, ymin, xmax, ymax)
    pred_coor = np.concatenate([pred_xywh[:, :2] - pred_xywh[:, 2:] * 0.5,
                                pred_xywh[:, :2] + pred_xywh[:, 2:] * 0.5], axis=-1)
    # (2) (xmin, ymin, xmax, ymax) -> (xmin_org, ymin_org, xmax_org, ymax_org)
    org_h, org_w = org_img_shape
    resize_ratio = min(input_size / org_w, input_size / org_h)
    dw = (input_size - resize_ratio * org_w) / 2
    dh = (input_size - resize_ratio * org_h) / 2
    pred_coor[:, 0::2] = 1.0 * (pred_coor[:, 0::2] - dw) / resize_ratio
    pred_coor[:, 1::2] = 1.0 * (pred_coor[:, 1::2] - dh) / resize_ratio

    # (3) clip boxes that fall outside the image
    pred_coor = np.concatenate([np.maximum(pred_coor[:, :2], [0, 0]),
                                np.minimum(pred_coor[:, 2:], [org_w - 1, org_h - 1])], axis=-1)
    invalid_mask = np.logical_or((pred_coor[:, 0] > pred_coor[:, 2]), (pred_coor[:, 1] > pred_coor[:, 3]))
    pred_coor[invalid_mask] = 0

    # (4) discard invalid boxes
    bboxes_scale = np.sqrt(np.multiply.reduce(pred_coor[:, 2:4] - pred_coor[:, 0:2], axis=-1))
    scale_mask = np.logical_and((valid_scale[0] < bboxes_scale), (bboxes_scale < valid_scale[1]))

    # (5) discard boxes with low scores
    classes = np.argmax(pred_prob, axis=-1)
    scores = pred_conf * pred_prob[np.arange(len(pred_coor)), classes]
    score_mask = scores > score_threshold
    mask = np.logical_and(scale_mask, score_mask)
    coors, scores, classes = pred_coor[mask], scores[mask], classes[mask]
    return np.concatenate([coors, scores[:, np.newaxis], classes[:, np.newaxis]], axis=-1)

def bboxes_iou(boxes1, boxes2):
    '''calculate the Intersection over Union value'''
    boxes1 = np.array(boxes1)
    boxes2 = np.array(boxes2)

    boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
    boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])

    left_up = np.maximum(boxes1[..., :2], boxes2[..., :2])
    right_down = np.minimum(boxes1[..., 2:], boxes2[..., 2:])

    inter_section = np.maximum(right_down - left_up, 0.0)
    inter_area = inter_section[..., 0] * inter_section[..., 1]
    union_area = boxes1_area + boxes2_area - inter_area
    ious = np.maximum(1.0 * inter_area / union_area, np.finfo(np.float32).eps)
    return ious

def nms(bboxes, iou_threshold, sigma=0.3, method='nms'):
    """
    :param bboxes: (xmin, ymin, xmax, ymax, score, class)

    Note: soft-nms, https://arxiv.org/pdf/1704.04503.pdf
          https://github.com/bharatsingh430/soft-nms
    """
    classes_in_img = list(set(bboxes[:, 5]))
    best_bboxes = []

    for cls in classes_in_img:
        cls_mask = (bboxes[:, 5] == cls)
        cls_bboxes = bboxes[cls_mask]

        while len(cls_bboxes) > 0:
            max_ind = np.argmax(cls_bboxes[:, 4])
            best_bbox = cls_bboxes[max_ind]
            best_bboxes.append(best_bbox)
            cls_bboxes = np.concatenate([cls_bboxes[:max_ind], cls_bboxes[max_ind + 1:]])
            iou = bboxes_iou(best_bbox[np.newaxis, :4], cls_bboxes[:, :4])
            weight = np.ones((len(iou),), dtype=np.float32)

            assert method in ['nms', 'soft-nms']
            if method == 'nms':
                iou_mask = iou > iou_threshold
                weight[iou_mask] = 0.0
            if method == 'soft-nms':
                weight = np.exp(-(1.0 * iou ** 2 / sigma))

            cls_bboxes[:, 4] = cls_bboxes[:, 4] * weight
            score_mask = cls_bboxes[:, 4] > 0.
            cls_bboxes = cls_bboxes[score_mask]
    return best_bboxes

def read_class_names(class_file_name):
    '''loads class names from a file'''
    names = {}
    with open(class_file_name, 'r') as data:
        for ID, name in enumerate(data):
            names[ID] = name.strip('\n')
    return names

def draw_bbox(image, bboxes, classes=read_class_names("coco.names"), show_label=True):
    """
    bboxes: [x_min, y_min, x_max, y_max, probability, cls_id] format coordinates.
    """
    num_classes = len(classes)
    image_h, image_w, _ = image.shape
    hsv_tuples = [(1.0 * x / num_classes, 1., 1.) for x in range(num_classes)]
    colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
    colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), colors))

    random.seed(0)
    random.shuffle(colors)
    random.seed(None)

    for i, bbox in enumerate(bboxes):
        coor = np.array(bbox[:4], dtype=np.int32)
        fontScale = 0.5
        score = bbox[4]
        class_ind = int(bbox[5])
        bbox_color = colors[class_ind]
        bbox_thick = int(0.6 * (image_h + image_w) / 600)
        c1, c2 = (coor[0], coor[1]), (coor[2], coor[3])
        cv2.rectangle(image, c1, c2, bbox_color, bbox_thick)

        if show_label:
            bbox_mess = '%s: %.2f' % (classes[class_ind], score)
            t_size = cv2.getTextSize(bbox_mess, 0, fontScale, thickness=bbox_thick // 2)[0]
            cv2.rectangle(image, c1, (c1[0] + t_size[0], c1[1] - t_size[1] - 3), bbox_color, -1)  # filled label background
            cv2.putText(image, bbox_mess, (c1[0], c1[1] - 2), cv2.FONT_HERSHEY_SIMPLEX,
                        fontScale, (0, 0, 0), bbox_thick // 2, lineType=cv2.LINE_AA)
    return image
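For context, here is a hedged end-to-end sketch that ties the helpers above to onnxruntime. The anchors file name, the coco.names class file, and the STRIDES/XYSCALE/threshold values are the usual settings from the tensorflow-yolov4-tflite repository, not something defined in this PR; adjust them to match the files shipped with the model.

import onnxruntime as ort

ANCHORS = get_anchors("yolov4_anchors.txt")   # assumed anchors file
STRIDES = np.array([8, 16, 32])
XYSCALE = [1.2, 1.1, 1.05]

sess = ort.InferenceSession("yolov4.onnx")
input_name = sess.get_inputs()[0].name
detections = sess.run(None, {input_name: image_data})    # list of 3 output layers

pred_bbox = postprocess_bbbox(detections, ANCHORS, STRIDES, XYSCALE)
bboxes = postprocess_boxes(pred_bbox, original_image_size, input_size, score_threshold=0.25)
bboxes = nms(bboxes, iou_threshold=0.213, method='nms')

image = draw_bbox(original_image, bboxes)                # expects coco.names to be present
cv2.imwrite("output.jpg", cv2.cvtColor(image, cv2.COLOR_RGB2BGR))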

Dataset

Pretrained yolov4 weights can be downloaded here.

Validation accuracy

YOLOv4:
mAP50 on the COCO 2017 dataset is 0.5733, based on the original TensorFlow model.

YOLOv4-int8:
mAP50 on the COCO 2017 dataset is 0.570; the metric is COCO box mAP@[IoU=0.50:0.95 | area=large | maxDets=100].
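For reference, the COCO box mAP metric is typically computed with pycocotools along the lines of the sketch below; the annotation and detection file names are placeholders, not files from this PR.

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_val2017.json")   # COCO 2017 validation annotations
coco_dt = coco_gt.loadRes("detections.json")           # model detections in COCO result format
coco_eval = COCOeval(coco_gt, coco_dt, "bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()   # prints AP at IoU=0.50:0.95 broken down by area and maxDets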


Quantization

YOLOv4-int8 is obtained by quantizing the FP32 YOLOv4 model. We use Intel® Neural Compressor with the onnxruntime backend to perform quantization. View the instructions to understand how to use Intel® Neural Compressor for quantization.

Environment

onnx: 1.9.0
onnxruntime: 1.10.0

Prepare model

wget https://github.com/onnx/models/raw/main/vision/object_detection_segmentation/yolov4/model/yolov4.onnx

Model quantize

# model path as *.onnx
bash run_tuning.sh --input_model=path/to/model \
                   --config=yolov4.yaml \
                   --data_path=path/to/COCO2017 \
                   --output_model=path/to/save
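run_tuning.sh wraps Intel® Neural Compressor; with the 1.x yaml-driven API the same step looks roughly like the sketch below. This is an assumption based on the INC 1.x experimental interface, with calibration and evaluation dataloaders defined inside yolov4.yaml.

from neural_compressor.experimental import Quantization, common

quantizer = Quantization("yolov4.yaml")          # tuning, calibration and dataloader settings
quantizer.model = common.Model("yolov4.onnx")    # FP32 model prepared above
q_model = quantizer.fit()                        # post-training quantization with accuracy-aware tuning
q_model.save("yolov4-int8.onnx")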

Publication/Attribution

References


Contributors

License

MIT License

Signed-off-by: Xinyu Ye <xinyu.ye@intel.com>
XinyuYe-Intel (Author) commented

Hi @jcwchen, I have tested in my local Linux env with the command python workflow_scripts/test_models.py --target onnxruntime and passed all tests, but it failed here. Could you please help me with this?
[image: screenshot of the failing CI test]

jcwchen (Member) commented May 13, 2022

Hi @XinyuYe-Intel,
Thanks for letting me know about this issue. Does your Linux machine have VNNI (avx512) support?

XinyuYe-Intel (Author) commented

> Hi @XinyuYe-Intel, Thanks for letting me know about this issue. Does your Linux machine have VNNI (avx512) support?

No, I use a Xeon Gold 6248 processor. I checked /proc/cpuinfo and 'avx512_vnni' is absent.

jcwchen (Member) commented May 16, 2022 (edited)

> No, I use a Xeon Gold 6248 processor. I checked /proc/cpuinfo and 'avx512_vnni' is absent.

That is probably why the current CI fails: I believe most GitHub Actions machines do have VNNI support (although some of them do not). It's an existing issue (#522) that the CI in the ONNX Model Zoo sees different ORT behavior with and without VNNI support, so sometimes the CI will fail. I will try to prioritize solving it, since this inconsistent CI is really confusing.

Still, I believe all outputs of existing int8 models in the ONNX Model Zoo were produced by ORT with VNNI support. (@mengniwang95 please correct me if I am wrong. Thanks!) It seems to me that we should have all int8 model outputs produced with VNNI support for consistency. If I understand correctly, could you please regenerate this output on a machine with VNNI support? Thank you.

mengniwang95 (Contributor) commented

Hi @jcwchen , existing int8 models are all generated with VNNI support.


XinyuYe-Intel (Author) commented

> No, I use a Xeon Gold 6248 processor. I checked /proc/cpuinfo and 'avx512_vnni' is absent.
>
> That is probably why the current CI fails: I believe most GitHub Actions machines do have VNNI support (although some of them do not). It's an existing issue (#522) that the CI in the ONNX Model Zoo sees different ORT behavior with and without VNNI support, so sometimes the CI will fail. I will try to prioritize solving it, since this inconsistent CI is really confusing.
>
> Still, I believe all outputs of existing int8 models in the ONNX Model Zoo were produced by ORT with VNNI support. (@mengniwang95 please correct me if I am wrong. Thanks!) It seems to me that we should have all int8 model outputs produced with VNNI support for consistency. If I understand correctly, could you please regenerate this output on a machine with VNNI support? Thank you.

Sure, I'll reproduce it. Thanks for your help!


jcwchen (Member) left a comment


Sorry for getting back to you late. I just merged my PR to improve the CIs: #526. Ideally the CI should be consistent now (it skips the ORT test if the CI machine doesn't have VNNI support). I think the Windows CI failed because it has VNNI support and the inferred result is different from yours. To confirm: did you produce the output.pb on a machine with VNNI support? If so, there might be another issue causing this behavior difference...

XinyuYe-Intel (Author) commented

> Sorry for getting back to you late. I just merged my PR to improve the CIs: #526. Ideally the CI should be consistent now (it skips the ORT test if the CI machine doesn't have VNNI support). I think the Windows CI failed because it has VNNI support and the inferred result is different from yours. To confirm: did you produce the output.pb on a machine with VNNI support? If so, there might be another issue causing this behavior difference...

No problem. Following the advice of @mengniwang95, I produced yolov4-int8.onnx (with yolov4.onnx as input) on a VNNI-supported Linux machine, and produced the test_data_set on a Linux machine without VNNI support; this doesn't involve *.pb files.

jcwchen (Member) commented Jun 6, 2022

Thanks for the context! Could you please regenerate the test_data_set on a Linux machine with VNNI support? Then it should pass the CIs.

XinyuYe-Intel (Author) commented

> Thanks for the context! Could you please regenerate the test_data_set on a Linux machine with VNNI support? Then it should pass the CIs.

Sure, I'll try it.

jcwchen (Member) commented Jun 9, 2022

Thanks for updating the output.pb! But the updated one is still not reproducible on the CI machine, which has avx512 support, and the difference is not small... I am trying to figure out the root cause of this behavior difference -- did you produce the output.pb with the latest ONNX Runtime (1.11) on an avx512 machine?

The only reason I can think of is that the GitHub Actions machines only have avx512f support and do not have avx512_vnni support, but in the past the CI has not encountered this significant a result difference with int8 test data...

XinyuYe-Intel (Author) commented

> Thanks for updating the output.pb! But the updated one is still not reproducible on the CI machine, which has avx512 support, and the difference is not small... I am trying to figure out the root cause of this behavior difference -- did you produce the output.pb with the latest ONNX Runtime (1.11) on an avx512 machine?
>
> The only reason I can think of is that the GitHub Actions machines only have avx512f support and do not have avx512_vnni support, but in the past the CI has not encountered this significant a result difference with int8 test data...

Yes, on the avx512_vnni-supported machine, I produced the yolov4 int8 model with onnx 1.11.0 and onnxruntime 1.10.0.
