retinanet_detection_output

paddle.fluid.layers.retinanet_detection_output(bboxes, scores, anchors, im_info, score_threshold=0.05, nms_top_k=1000, keep_top_k=100, nms_threshold=0.3, nms_eta=1.0)[source]

Detection Output Layer for the RetinaNet detector.

In the RetinaNet detector, multiple FPN levels output category and location predictions. This OP computes the final detection results by performing the following steps:

  1. For each FPN level, decode the box predictions of at most nms_top_k top-scoring anchors whose confidence exceeds score_threshold, using the corresponding anchor boxes.

  2. Merge the top predictions from all levels and apply multi-class non-maximum suppression (NMS) to them to get the final detections.
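The two steps above can be sketched in NumPy (a minimal illustration of the per-level selection and the greedy NMS logic, not the OP's actual implementation; box decoding against the anchors is omitted for brevity, and both helper names are hypothetical):

```python
import numpy as np

def topk_after_threshold(scores, score_threshold=0.05, nms_top_k=1000):
    """Step 1: keep indices of at most nms_top_k top-scoring boxes
    whose confidence exceeds score_threshold."""
    idx = np.flatnonzero(scores > score_threshold)
    order = idx[np.argsort(-scores[idx])]
    return order[:nms_top_k]

def greedy_nms(boxes, scores, iou_threshold=0.3):
    """Step 2: greedy NMS on [xmin, ymin, xmax, ymax] boxes merged from
    all levels; returns the indices of the kept boxes."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the current best box against the remaining candidates
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Drop candidates that overlap the kept box too much
        order = order[1:][iou <= iou_threshold]
    return keep
```

In the real OP, step 2 is applied per class and the survivors are further truncated to keep_top_k per image.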

Parameters
  • bboxes (List) – A list of Tensors from multiple FPN levels representing the location predictions for all anchor boxes. Each element is a 3-D Tensor with shape \([N, Mi, 4]\), where \(N\) is the batch size, \(Mi\) is the number of bounding boxes from the \(i\)-th FPN level, and each bounding box has four coordinate values in the layout [xmin, ymin, xmax, ymax]. The data type of each element is float32 or float64.

  • scores (List) – A list of Tensors from multiple FPN levels representing the category predictions for all anchor boxes. Each element is a 3-D Tensor with shape \([N, Mi, C]\), where \(N\) is the batch size, \(C\) is the number of classes (excluding background), and \(Mi\) is the number of bounding boxes from the \(i\)-th FPN level. The data type of each element is float32 or float64.

  • anchors (List) – A list of Tensors from multiple FPN levels representing the locations of all anchor boxes. Each element is a 2-D Tensor with shape \([Mi, 4]\), where \(Mi\) is the number of bounding boxes from the \(i\)-th FPN level, and each bounding box has four coordinate values in the layout [xmin, ymin, xmax, ymax]. The data type of each element is float32 or float64.

  • im_info (Variable) – A 2-D Tensor with shape \([N, 3]\) representing the size information of the input images, where \(N\) is the batch size. The size information of each image is a 3-vector: the height and width of the network input, along with the factor by which the original image was scaled to the network input. The data type of im_info is float32.

  • score_threshold (float) – Threshold used to filter out bounding boxes with low confidence scores before NMS. Default value is 0.05.

  • nms_top_k (int) – Maximum number of detections per FPN level to be kept according to the confidence scores before NMS. Default value is 1000.

  • keep_top_k (int) – Total number of bounding boxes to be kept per image after the NMS step. Default value is 100; -1 means keeping all bounding boxes after the NMS step.

  • nms_threshold (float) – The Intersection-over-Union (IoU) threshold used to filter out boxes in NMS. Default value is 0.3.

  • nms_eta (float) – The parameter for adaptively adjusting nms_threshold in NMS. Default value is 1.0, which means nms_threshold stays unchanged throughout NMS. If nms_eta is set below 1.0 and nms_threshold is set above 0.5, then every time a bounding box is filtered out, nms_threshold is updated as nms_threshold = nms_threshold * nms_eta, until nms_threshold drops to 0.5 or below.
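The nms_eta adjustment rule can be sketched as follows (adaptive_nms_threshold is a hypothetical helper illustrating the rule described above, not part of the fluid API):

```python
def adaptive_nms_threshold(nms_threshold, nms_eta, num_suppressed):
    """Return the effective NMS threshold after `num_suppressed`
    bounding boxes have been filtered out."""
    t = nms_threshold
    for _ in range(num_suppressed):
        # The decay applies only while nms_eta < 1.0 and t is above 0.5
        if nms_eta < 1.0 and t > 0.5:
            t *= nms_eta
    return t
```

For example, with nms_threshold=0.7 and nms_eta=0.9, the threshold decays to 0.63, 0.567, then 0.5103 as the first three boxes are suppressed, after which further suppressions stop adjusting it.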

Notice: In some cases where the image sizes are very small, it is possible that no detection survives if score_threshold is applied at all levels. Hence, this OP does not filter out anchors from the highest FPN level before NMS, and the last element of bboxes, scores and anchors is required to come from the highest FPN level.

Returns

The detection output is a 1-level LoDTensor with shape \([No, 6]\). Each row has six values: [label, confidence, xmin, ymin, xmax, ymax]. \(No\) is the total number of detections in this mini-batch. The \(i\)-th image has LoD[i + 1] - LoD[i] detected results; if LoD[i + 1] - LoD[i] is 0, the \(i\)-th image has no detected results. If no image has any detected results, LoD will be set to 0, and the output tensor is empty (None).

Return type

Variable (the data type is float32 or float64)

Examples

import paddle.fluid as fluid

# Location and category predictions from two FPN levels. The last element
# of each list must come from the highest FPN level.
bboxes_low = fluid.data(
    name='bboxes_low', shape=[1, 44, 4], dtype='float32')
bboxes_high = fluid.data(
    name='bboxes_high', shape=[1, 11, 4], dtype='float32')
scores_low = fluid.data(
    name='scores_low', shape=[1, 44, 10], dtype='float32')
scores_high = fluid.data(
    name='scores_high', shape=[1, 11, 10], dtype='float32')
# Anchor boxes for each FPN level, shape [Mi, 4].
anchors_low = fluid.data(
    name='anchors_low', shape=[44, 4], dtype='float32')
anchors_high = fluid.data(
    name='anchors_high', shape=[11, 4], dtype='float32')
# Per-image size information: network input height and width plus the
# scaling factor from the original image.
im_info = fluid.data(
    name="im_info", shape=[1, 3], dtype='float32')
nmsed_outs = fluid.layers.retinanet_detection_output(
    bboxes=[bboxes_low, bboxes_high],
    scores=[scores_low, scores_high],
    anchors=[anchors_low, anchors_high],
    im_info=im_info,
    score_threshold=0.05,
    nms_top_k=1000,
    keep_top_k=100,
    nms_threshold=0.45,
    nms_eta=1.)