retinanet_detection_output
- paddle.fluid.layers.detection.retinanet_detection_output(bboxes, scores, anchors, im_info, score_threshold=0.05, nms_top_k=1000, keep_top_k=100, nms_threshold=0.3, nms_eta=1.0) [source]
- 
         Detection Output Layer for the detector RetinaNet. In the detector RetinaNet, many FPN levels output the category and location predictions; this OP gets the detection results by performing the following steps:

         1. For each FPN level, decode box predictions according to the anchor boxes from at most nms_top_k top-scoring predictions after thresholding detector confidence at score_threshold.
         2. Merge the top predictions from all levels and apply multi-class non-maximum suppression (NMS) on them to get the final detections.
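The two steps above can be sketched in plain Python. This is an illustrative simplification, not Paddle's actual kernel: box decoding from anchors is omitted, boxes are assumed to already be in [xmin, ymin, xmax, ymax] form, a single class is handled, and the helper names `iou` and `detect` are hypothetical:

```python
def iou(a, b):
    """Intersection-over-Union of two [xmin, ymin, xmax, ymax] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def detect(levels, score_threshold=0.05, nms_top_k=1000,
           nms_threshold=0.3, keep_top_k=100):
    # Step 1: per FPN level, keep at most nms_top_k top-scoring
    # predictions whose confidence exceeds score_threshold.
    candidates = []
    for boxes, scores in levels:  # one (boxes, scores) pair per level
        kept = [(s, b) for b, s in zip(boxes, scores) if s > score_threshold]
        kept.sort(key=lambda t: -t[0])
        candidates.extend(kept[:nms_top_k])
    # Step 2: merge all levels and apply greedy NMS, keeping at most
    # keep_top_k detections overall.
    candidates.sort(key=lambda t: -t[0])
    final = []
    for score, box in candidates:
        if all(iou(box, kept_box) <= nms_threshold for _, kept_box in final):
            final.append((score, box))
        if len(final) == keep_top_k:
            break
    return final
```

For example, with one level containing two heavily overlapping boxes and one distant box, greedy NMS suppresses the lower-scoring overlapping box and keeps the other two.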
 - Parameters
- 
           - bboxes (List) – A list of Tensors from multiple FPN levels represents the location prediction for all anchor boxes. Each element is a 3-D Tensor with shape \([N, Mi, 4]\), \(N\) is the batch size, \(Mi\) is the number of bounding boxes from \(i\)-th FPN level and each bounding box has four coordinate values and the layout is [xmin, ymin, xmax, ymax]. The data type of each element is float32 or float64. 
- scores (List) – A list of Tensors from multiple FPN levels represents the category prediction for all anchor boxes. Each element is a 3-D Tensor with shape \([N, Mi, C]\), \(N\) is the batch size, \(C\) is the class number (excluding background), \(Mi\) is the number of bounding boxes from \(i\)-th FPN level. The data type of each element is float32 or float64. 
- anchors (List) – A list of Tensors from multiple FPN levels represents the locations of all anchor boxes. Each element is a 2-D Tensor with shape \([Mi, 4]\), \(Mi\) is the number of bounding boxes from \(i\)-th FPN level, and each bounding box has four coordinate values and the layout is [xmin, ymin, xmax, ymax]. The data type of each element is float32 or float64. 
- im_info (Variable) – A 2-D Tensor with shape \([N, 3]\) that represents the size information of the input images. \(N\) is the batch size, and the size information of each image is a 3-vector: the height and width of the network input, along with the factor scaling the original image to the network input. The data type of im_info is float32.
- score_threshold (float) – Threshold used to filter out bounding boxes with low confidence scores before NMS. Default value is set to 0.05. 
- nms_top_k (int) – Maximum number of detections per FPN layer to be kept according to the confidences before NMS, default value is set to 1000. 
- keep_top_k (int) – Number of total bounding boxes to be kept per image after NMS step. Default value is set to 100, -1 means keeping all bounding boxes after NMS step. 
- nms_threshold (float) – The Intersection-over-Union (IoU) threshold used to filter out boxes in NMS. Default value is set to 0.3. 
- nms_eta (float) – The parameter for adaptively adjusting nms_threshold in NMS. Default value is set to 1.0, which means nms_threshold stays unchanged during NMS. If nms_eta is set lower than 1.0 and nms_threshold is set higher than 0.5, then every time a bounding box is filtered out, nms_threshold is updated as nms_threshold = nms_threshold * nms_eta, until the actual value of nms_threshold is lower than or equal to 0.5.
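The nms_eta decay rule can be made concrete with a short sketch (the helper name `adaptive_threshold` is hypothetical, not part of the Paddle API): each time a box is suppressed, the threshold shrinks by the factor nms_eta, but only while it is still above 0.5:

```python
def adaptive_threshold(nms_threshold, nms_eta, num_suppressed):
    """Return the NMS threshold after num_suppressed boxes were
    filtered out, applying the nms_eta decay rule."""
    thr = nms_threshold
    for _ in range(num_suppressed):
        # Decay only applies while nms_eta < 1.0 and the threshold
        # is still above 0.5; afterwards it stays fixed.
        if nms_eta < 1.0 and thr > 0.5:
            thr *= nms_eta
    return thr
```

With nms_threshold=0.8 and nms_eta=0.9, the threshold shrinks step by step (0.8, 0.72, 0.648, ...) until it drops to 0.5 or below, after which it stays fixed; with nms_eta=1.0 it never changes.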
 
 Notice: In some cases where the image sizes are very small, it is possible that there is no detection at all if score_threshold is applied at all levels. Hence, this OP does not filter out anchors from the highest FPN level before NMS, and the last elements in bboxes, scores and anchors are required to be from the highest FPN level.

- Returns
- 
           The detection output is a 1-level LoDTensor with shape \([No, 6]\). Each row has six values: [label, confidence, xmin, ymin, xmax, ymax]. \(No\) is the total number of detections in this mini-batch. The \(i\)-th image has LoD[i + 1] - LoD[i] detected results, if LoD[i + 1] - LoD[i] is 0, the \(i\)-th image has no detected results. If all images have no detected results, LoD will be set to 0, and the output tensor is empty (None). 
- Return type
- 
           Variable (the data type is float32 or float64) 
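To make the LoD layout of the returned tensor concrete, here is a sketch of how the per-image results can be recovered from the flat \([No, 6]\) output given the level-1 LoD offsets (illustrative only; `split_by_lod` is a hypothetical helper, not a Paddle function):

```python
def split_by_lod(detections, lod):
    """Split a flat list of [label, confidence, xmin, ymin, xmax, ymax]
    rows into per-image lists using level-1 LoD offsets: image i owns
    rows lod[i] .. lod[i + 1] - 1."""
    return [detections[lod[i]:lod[i + 1]] for i in range(len(lod) - 1)]

# Batch of 3 images: image 0 has 2 detections, image 1 has none
# (lod[2] - lod[1] == 0), image 2 has 1.
rows = [[0, 0.9, 1, 1, 5, 5], [2, 0.6, 2, 2, 6, 6], [1, 0.8, 0, 0, 3, 3]]
per_image = split_by_lod(rows, [0, 2, 2, 3])
```

Here `per_image[1]` is empty, matching the "LoD[i + 1] - LoD[i] is 0" case described above.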
 Examples

    import paddle.fluid as fluid

    bboxes_low = fluid.data(
        name='bboxes_low', shape=[1, 44, 4], dtype='float32')
    bboxes_high = fluid.data(
        name='bboxes_high', shape=[1, 11, 4], dtype='float32')
    scores_low = fluid.data(
        name='scores_low', shape=[1, 44, 10], dtype='float32')
    scores_high = fluid.data(
        name='scores_high', shape=[1, 11, 10], dtype='float32')
    anchors_low = fluid.data(
        name='anchors_low', shape=[44, 4], dtype='float32')
    anchors_high = fluid.data(
        name='anchors_high', shape=[11, 4], dtype='float32')
    im_info = fluid.data(
        name="im_info", shape=[1, 3], dtype='float32')
    nmsed_outs = fluid.layers.retinanet_detection_output(
        bboxes=[bboxes_low, bboxes_high],
        scores=[scores_low, scores_high],
        anchors=[anchors_low, anchors_high],
        im_info=im_info,
        score_threshold=0.05,
        nms_top_k=1000,
        keep_top_k=100,
        nms_threshold=0.45,
        nms_eta=1.0)
