paddle.fluid.layers.detection.yolo_box(x, img_size, anchors, class_num, conf_thresh, downsample_ratio, clip_bbox=True, name=None, scale_x_y=1.0)

Warning: API “paddle.fluid.layers.detection.yolo_box” is deprecated since 2.0.0, and will be removed in future versions. Please use “paddle.vision.ops.yolo_box” instead.

This operator generates YOLO detection boxes from the output of a YOLOv3 network.

The output of the previous network has shape [N, C, H, W], where H and W should be equal and give the grid size. Each grid point predicts a fixed number of boxes, denoted S below, which equals the number of anchors (len(anchors) / 2, since anchors are given as width/height pairs). In the second (channel) dimension, C should equal S * (5 + class_num), where class_num is the number of object categories in the source dataset (e.g., 80 for COCO). For each anchor box, the channel dimension therefore holds the 4 box location coordinates x, y, w, h, the box confidence score, and the class_num classification scores.
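As a quick consistency check, the expected channel count can be computed from the anchor list. This is a minimal sketch that assumes, as the anchors parameter describes, a flat list of (width, height) pairs so that S = len(anchors) / 2; the helper name `expected_channels` is hypothetical and not part of Paddle.

```python
def expected_channels(anchors, class_num):
    """Return C = S * (5 + class_num) for a flat [w0, h0, w1, h1, ...] anchor list."""
    if len(anchors) % 2 != 0:
        raise ValueError("anchors must hold (width, height) pairs")
    num_anchors = len(anchors) // 2  # S
    # 4 box coordinates + 1 confidence score + class_num class scores per anchor
    return num_anchors * (5 + class_num)


# Three anchors with the 80 COCO classes: 3 * (5 + 80) = 255 channels
print(expected_channels([10, 13, 16, 30, 33, 23], class_num=80))  # 255
```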

Given the 4 predicted location coordinates \(t_x, t_y, t_w, t_h\) of a box, the decoded box is computed as follows:

$$ b_x = \sigma(t_x) + c_x $$
$$ b_y = \sigma(t_y) + c_y $$
$$ b_w = p_w e^{t_w} $$
$$ b_h = p_h e^{t_h} $$

In the equations above, \(\sigma\) is the sigmoid function, \(c_x, c_y\) are the coordinates of the top-left corner of the current grid cell, and \(p_w, p_h\) are the width and height of the corresponding anchor.
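The four equations can be applied to a single prediction as shown here. This is a minimal pure-Python sketch with hypothetical helper names; it works in grid-cell units exactly as the equations do and leaves out the operator's rescaling to img_size and the scale_x_y adjustment.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(t, c_x, c_y, p_w, p_h):
    """Apply b_x = sigma(t_x) + c_x, b_y = sigma(t_y) + c_y,
    b_w = p_w * exp(t_w), b_h = p_h * exp(t_h) to one prediction."""
    t_x, t_y, t_w, t_h = t
    b_x = sigmoid(t_x) + c_x   # center x: offset inside the cell
    b_y = sigmoid(t_y) + c_y   # center y: offset inside the cell
    b_w = p_w * math.exp(t_w)  # width scaled from the anchor width
    b_h = p_h * math.exp(t_h)  # height scaled from the anchor height
    return b_x, b_y, b_w, b_h


# Zero offsets place the center in the middle of the cell and keep the anchor size
print(decode_box((0.0, 0.0, 0.0, 0.0), c_x=3, c_y=5, p_w=1.5, p_h=2.0))
# (3.5, 5.5, 1.5, 2.0)
```

Because the sigmoid maps any \(t_x, t_y\) into (0, 1), the decoded center always stays inside the cell whose corner is \((c_x, c_y)\).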

For each anchor's prediction, the sigmoid (logistic) of the 5th channel is the confidence score of the box, and the sigmoid of the last class_num channels gives the classification scores. Boxes whose confidence score is below conf_thresh are ignored, and the final scores of a box are the product of its confidence score and its classification scores.

$$ score_{pred} = score_{conf} * score_{class} $$
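The scoring rule can be sketched for one anchor prediction as follows. The helper name `box_scores` is hypothetical, and returning all-zero scores for an ignored box is an assumption of this sketch; the text above only says such boxes are ignored.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def box_scores(conf_logit, class_logits, conf_thresh):
    """score_pred = score_conf * score_class for one anchor prediction."""
    score_conf = sigmoid(conf_logit)  # from the 5th channel
    if score_conf < conf_thresh:
        # Assumption of this sketch: an ignored box gets all-zero scores
        return [0.0] * len(class_logits)
    return [score_conf * sigmoid(c) for c in class_logits]


print(box_scores(0.0, [0.0, 2.0], conf_thresh=0.01))
print(box_scores(-5.0, [0.0, 2.0], conf_thresh=0.01))  # [0.0, 0.0]
```

In the second call, sigmoid(-5.0) is about 0.0067, which falls below conf_thresh = 0.01, so the box is dropped.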


Parameters:
    x (Variable): The input tensor of the yolo_box operator, a 4-D tensor with shape [N, C, H, W]. The second dimension (C) stores the box locations, confidence score and classification scores of each anchor box. Generally, x should be the output of a YOLOv3 network. The data type is float32 or float64.
    img_size (Variable): The image size tensor of the yolo_box operator, a 2-D tensor with shape [N, 2]. It holds the height and width of each input image and is used to rescale the output boxes to the input image scale. The data type is int32.
    anchors (list|tuple): The anchor widths and heights, parsed pair by pair, e.g. [w0, h0, w1, h1, ...].
    class_num (int): The number of classes to predict.
    conf_thresh (float): The confidence score threshold for detection boxes. Boxes with confidence scores below this threshold are ignored.
    downsample_ratio (int): The downsample ratio from the network input to the yolo_box operator input, so 32, 16 and 8 should be set for the first, second and third yolo_box operators, respectively.
    clip_bbox (bool): Whether to clip the output bounding boxes to the image boundary given by img_size. Default: True.
    scale_x_y (float): Scale factor applied to the center point of the decoded bounding box. Default: 1.0.
    name (str, optional): Normally there is no need for the user to set this property. For more information, please refer to Name. Default: None.
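Because downsample_ratio relates the network input to the yolo_box input, it also fixes the grid size each operator sees. The sketch below assumes a square 416 x 416 network input, a common YOLOv3 setting that yields the 13 x 13 grid used in the example; the helper name `grid_sizes` is hypothetical.

```python
def grid_sizes(input_size, downsample_ratios=(32, 16, 8)):
    """Grid size H = W seen by each yolo_box operator for a square network input."""
    sizes = []
    for ratio in downsample_ratios:
        if input_size % ratio != 0:
            raise ValueError(f"input size {input_size} is not divisible by {ratio}")
        sizes.append(input_size // ratio)
    return sizes


print(grid_sizes(416))  # [13, 26, 52]
```

The coarsest grid (ratio 32) pairs with the first operator and the finest grid (ratio 8) with the third, matching the order given for downsample_ratio.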


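Returns:
    boxes (Variable): A 3-D tensor with shape [N, M, 4] holding the box coordinates, where M = S * H * W is the number of predicted boxes per image.
    scores (Variable): A 3-D tensor with shape [N, M, class_num] holding the classification scores of the boxes.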
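For a concrete input, the output shapes can be computed directly, assuming one box per anchor per grid cell. The helper name `yolo_box_output_shapes` is hypothetical and only performs the shape arithmetic.

```python
def yolo_box_output_shapes(n, h, w, anchors, class_num):
    """Shapes of (boxes, scores) for an input of shape [n, C, h, w]."""
    num_anchors = len(anchors) // 2  # S: anchors are (width, height) pairs
    m = num_anchors * h * w          # one box per anchor per grid cell
    return (n, m, 4), (n, m, class_num)


boxes_shape, scores_shape = yolo_box_output_shapes(8, 13, 13, [10, 13, 16, 30, 33, 23], 80)
print(boxes_shape, scores_shape)  # (8, 507, 4) (8, 507, 80)
```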


Raises:
    TypeError: If x is not a Variable.
    TypeError: If anchors is not a list or tuple.
    TypeError: If class_num is not an integer.
    TypeError: If conf_thresh is not a float.
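The attribute checks listed above can be illustrated with plain isinstance tests. This is a hypothetical sketch, not Paddle's own validation code; it assumes the checks are strict type checks, so an integer threshold such as 0 is rejected, and it omits the Variable check on x.

```python
def check_yolo_box_attrs(anchors, class_num, conf_thresh):
    """Hypothetical mirror of the documented attribute TypeErrors."""
    if not isinstance(anchors, (list, tuple)):
        raise TypeError("anchors must be a list or tuple")
    if not isinstance(class_num, int):
        raise TypeError("class_num must be an integer")
    if not isinstance(conf_thresh, float):
        raise TypeError("conf_thresh must be a float")


check_yolo_box_attrs([10, 13, 16, 30, 33, 23], 80, 0.01)  # passes silently
try:
    check_yolo_box_attrs([10, 13], 80, 0)  # int threshold, not a float
except TypeError as err:
    print(err)  # conf_thresh must be a float
```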


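Examples:

import paddle
import paddle.fluid as fluid

# fluid.data builds static-graph placeholders, so Paddle 2.x must run in static mode
paddle.enable_static()
# YOLOv3 head output: 3 anchors * (5 + 80 classes) = 255 channels on a 13 x 13 grid
x = fluid.data(name='x', shape=[None, 255, 13, 13], dtype='float32')
# Original (height, width) of each image in the batch; img_size must be int32
img_size = fluid.data(name='img_size', shape=[None, 2], dtype='int32')
# Three anchors as (width, height) pairs: (10, 13), (16, 30), (33, 23)
anchors = [10, 13, 16, 30, 33, 23]
boxes, scores = fluid.layers.yolo_box(x=x, img_size=img_size, class_num=80, anchors=anchors,
                                      conf_thresh=0.01, downsample_ratio=32)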