yolo_box
- paddle.vision.ops.yolo_box(x, img_size, anchors, class_num, conf_thresh, downsample_ratio, clip_bbox=True, name=None, scale_x_y=1.0, iou_aware=False, iou_aware_factor=0.5)
- 
         This operator generates YOLO detection boxes from the output of a YOLOv3 network. The output of the previous network has shape [N, C, H, W], where H and W should be equal and specify the grid size. Each grid point predicts a fixed number of boxes, S, given by the number of anchors. In the second (channel) dimension, C should equal S * (5 + class_num) if iou_aware is false, and S * (6 + class_num) otherwise. class_num is the number of object categories in the source dataset (such as 80 for the COCO dataset), so for each anchor box the channel dimension holds the 4 box location coordinates x, y, w, h, the confidence score of the box, and the class one-hot key.

         Assume the 4 location coordinates are \(t_x, t_y, t_w, t_h\). The box predictions are then:

         $$ b_x = \sigma(t_x) + c_x $$
         $$ b_y = \sigma(t_y) + c_y $$
         $$ b_w = p_w e^{t_w} $$
         $$ b_h = p_h e^{t_h} $$

         In the equations above, \(c_x, c_y\) is the top-left corner of the current grid cell and \(p_w, p_h\) are specified by the anchors.

         The logistic (sigmoid) value of the 5th channel of each anchor prediction is the confidence score of that box, and the logistic values of the last class_num channels are its classification scores. Boxes with confidence scores below conf_thresh are ignored, and the final score of a box is the product of its confidence score and classification scores:

         $$ score_{pred} = score_{conf} * score_{class} $$

         where the confidence score follows the formula below:

         \[score_{conf} = \begin{cases} obj, & \text{if iou\_aware is false} \\ obj^{1 - \text{iou\_aware\_factor}} * iou^{\text{iou\_aware\_factor}}, & \text{otherwise} \end{cases}\]
- Parameters
- 
           - x (Tensor) – The input tensor, a 4-D tensor with shape [N, C, H, W]. The second dimension (C) stores the box locations, confidence score, and classification one-hot keys of each anchor box. Generally, x should be the output of a YOLOv3 network. The data type is float32 or float64.
- img_size (Tensor) – The image size tensor, a 2-D tensor with shape [N, 2]. It holds the height and width of each input image and is used to rescale the output boxes to the input image scale. The data type is int32.
- anchors (list|tuple) – The anchor widths and heights, parsed pair by pair.
- class_num (int) – The number of classes. 
- conf_thresh (float) – The confidence score threshold for detection boxes. Boxes with confidence scores below the threshold are ignored.
- downsample_ratio (int) – The downsample ratio from the network input to the yolo_box operator input, so 32, 16, and 8 should be set for the first, second, and third yolo_box layers.
- clip_bbox (bool) – Whether to clip the output bounding boxes to the img_size boundary. Default: True.
- scale_x_y (float) – Scale applied to the center point of the decoded bounding box. Default: 1.0.
- name (string) – The default value is None. Normally users do not need to set this property. For more information, please refer to Name.
- iou_aware (bool) – Whether to use IoU-aware confidence. Default: False.
- iou_aware_factor (float) – The IoU-aware factor. Default: 0.5.
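
The relationship between anchors, class_num, iou_aware, and the channel dimension C can be checked before calling the operator. The following is a minimal sketch in plain Python. The helper names `expected_channels` and `anchor_pairs` are hypothetical and not part of Paddle; they only restate the S * (5 + class_num) and S * (6 + class_num) rules from the description.

```python
def anchor_pairs(anchors):
    """Group a flat anchor sequence into (width, height) pairs (hypothetical helper)."""
    if len(anchors) % 2 != 0:
        raise ValueError("anchors must contain an even number of values")
    return list(zip(anchors[0::2], anchors[1::2]))


def expected_channels(anchors, class_num, iou_aware=False):
    """Channel count C implied by the description (hypothetical helper).

    S is the number of anchor pairs; each anchor uses 5 + class_num channels,
    or 6 + class_num channels when iou_aware is enabled.
    """
    num_anchors = len(anchor_pairs(anchors))  # S
    per_anchor = (6 if iou_aware else 5) + class_num
    return num_anchors * per_anchor


# Two anchors and two classes give the 14 channels used in the example input.
print(expected_channels([10, 13, 16, 30], class_num=2))  # 14
```

A mismatch between this value and x.shape[1] usually means the anchors or class_num passed to yolo_box do not match the network head that produced x.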
 
- Returns
- 
           A 3-D tensor with shape [N, M, 4], the coordinates of boxes, and a 3-D tensor with shape [N, M, class_num], the classification scores of boxes.
- Return type
- 
           tuple of two Tensors (boxes, scores)
- Raises
- 
           - TypeError – Input x of yolo_box must be Tensor 
- TypeError – Attr anchors of yolo box must be list or tuple 
- TypeError – Attr class_num of yolo box must be an integer 
- TypeError – Attr conf_thresh of yolo box must be a float number 
 
 Examples:

    import paddle
    import numpy as np

    # Random network output: N=2, C=14 (2 anchors * (5 + 2 classes)), 8x8 grid.
    x = np.random.random([2, 14, 8, 8]).astype('float32')
    # Height and width of each input image, shape [N, 2].
    img_size = np.ones((2, 2)).astype('int32')

    x = paddle.to_tensor(x)
    img_size = paddle.to_tensor(img_size)

    boxes, scores = paddle.vision.ops.yolo_box(x,
                                               img_size=img_size,
                                               anchors=[10, 13, 16, 30],
                                               class_num=2,
                                               conf_thresh=0.01,
                                               downsample_ratio=8,
                                               clip_bbox=True,
                                               scale_x_y=1.)
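
To make the decoding equations in the description concrete, here is a minimal sketch in plain Python for a single anchor prediction. It is not the operator's implementation: the function names are hypothetical, `obj` and `iou` are assumed to be values already passed through the sigmoid, and setting the scores of below-threshold boxes to zero is an assumption about how "ignored" boxes are represented. Rescaling to the image size, clip_bbox, and scale_x_y are left out.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Apply b_x = sigmoid(t_x) + c_x, b_y = sigmoid(t_y) + c_y,
    b_w = p_w * exp(t_w), b_h = p_h * exp(t_h)."""
    b_x = sigmoid(t_x) + c_x
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * math.exp(t_w)
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h


def confidence(obj, iou=None, iou_aware=False, iou_aware_factor=0.5):
    """score_conf: obj alone, or obj^(1 - f) * iou^f when iou_aware is set."""
    if not iou_aware:
        return obj
    return obj ** (1.0 - iou_aware_factor) * iou ** iou_aware_factor


def final_scores(conf, class_scores, conf_thresh):
    """score_pred = score_conf * score_class for each class.

    Assumption: boxes below conf_thresh are reported with all-zero scores.
    """
    if conf < conf_thresh:
        return [0.0] * len(class_scores)
    return [conf * s for s in class_scores]


# Zero offsets place the center in the middle of grid cell (3, 4)
# and leave the anchor size (10, 13) unchanged.
box = decode_box(0.0, 0.0, 0.0, 0.0, c_x=3, c_y=4, p_w=10, p_h=13)
scores = final_scores(confidence(0.5), [0.2, 0.4], conf_thresh=0.01)
```

In this sketch the center is expressed in grid units and the size in anchor units, exactly as in the equations. The operator itself returns boxes in the scale of the input image given by img_size.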
