detection_output( loc, scores, prior_box, prior_box_var, background_label=0, nms_threshold=0.3, nms_top_k=400, keep_top_k=200, score_threshold=0.01, nms_eta=1.0, return_index=False )
Given the regression locations, classification confidences and prior boxes, this operator computes the detection outputs in the following steps:
1. Decode the predicted bounding boxes from the prior boxes and the regression locations.
2. Obtain the final detection results by applying multi-class non-maximum suppression (NMS).
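Step 1 is commonly implemented with SSD-style center-size decoding. In that scheme each loc row holds offsets relative to its prior box, and the prior-box variances scale those offsets. The NumPy sketch below assumes that encoding and normalized coordinates. `decode_center_size` is a hypothetical helper written for illustration; it is not a Paddle API.

```python
import numpy as np


def decode_center_size(loc, prior_box, prior_box_var):
    """Decode regression offsets against prior boxes (SSD-style, assumed).

    loc:           [N, M, 4] offsets (tx, ty, tw, th) per prior box
    prior_box:     [M, 4] boxes as [xmin, ymin, xmax, ymax], normalized
    prior_box_var: [M, 4] per-coordinate variances
    Returns decoded boxes of shape [N, M, 4] as [xmin, ymin, xmax, ymax].
    """
    # Prior width, height and center (normalized coordinates, no +1 term).
    pw = prior_box[:, 2] - prior_box[:, 0]
    ph = prior_box[:, 3] - prior_box[:, 1]
    pcx = prior_box[:, 0] + 0.5 * pw
    pcy = prior_box[:, 1] + 0.5 * ph

    # Variance-scaled offsets; [M] arrays broadcast over the batch axis N.
    cx = prior_box_var[:, 0] * loc[..., 0] * pw + pcx
    cy = prior_box_var[:, 1] * loc[..., 1] * ph + pcy
    w = np.exp(prior_box_var[:, 2] * loc[..., 2]) * pw
    h = np.exp(prior_box_var[:, 3] * loc[..., 3]) * ph

    # Convert from center/size back to the corner layout.
    return np.stack([cx - 0.5 * w, cy - 0.5 * h,
                     cx + 0.5 * w, cy + 0.5 * h], axis=-1)


# Zero offsets should reproduce the prior boxes (up to float rounding).
priors = np.array([[0.1, 0.1, 0.3, 0.5], [0.5, 0.5, 0.9, 0.7]])
variances = np.full((2, 4), 0.1)
zero_loc = np.zeros((1, 2, 4))
print(decode_center_size(zero_loc, priors, variances))
```

A quick sanity check for any decoder of this kind is that zero offsets return the priors unchanged, which is what the usage lines exercise.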
Note that this operator does not clip the output bounding boxes to the image window.
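If clipped boxes are needed, they can be post-processed after the detections are fetched. The sketch below assumes the [label, confidence, xmin, ymin, xmax, ymax] row layout described for Out and that the box coordinates share a scale with the given image width and height (use 1.0 for normalized coordinates). `clip_to_image` is a hypothetical helper, not part of Paddle.

```python
import numpy as np


def clip_to_image(out, im_width, im_height):
    """Clip the box columns of an [No, 6] detection array to the image window.

    Assumes rows are [label, confidence, xmin, ymin, xmax, ymax].
    Returns a new array; the input is left unchanged.
    """
    clipped = out.copy()
    # x coordinates (columns 2 and 4) are bounded by the image width.
    clipped[:, [2, 4]] = np.clip(clipped[:, [2, 4]], 0.0, im_width)
    # y coordinates (columns 3 and 5) are bounded by the image height.
    clipped[:, [3, 5]] = np.clip(clipped[:, [3, 5]], 0.0, im_height)
    return clipped


dets = np.array([[1.0, 0.9, -0.05, 0.2, 0.6, 1.1]])
print(clip_to_image(dets, 1.0, 1.0))
```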
loc (Variable) – A 3-D Tensor with shape [N, M, 4] holding the predicted locations of M bounding boxes. The data type should be float32 or float64. N is the batch size; each bounding box has four coordinate values with the layout [xmin, ymin, xmax, ymax].
scores (Variable) – A 3-D Tensor with shape [N, M, C] holding the predicted confidences. The data type should be float32 or float64. N is the batch size, M is the number of bounding boxes, and C is the number of classes.
prior_box (Variable) – A 2-D Tensor with shape [M, 4] holding M boxes, each represented as [xmin, ymin, xmax, ymax]. The data type should be float32 or float64.
prior_box_var (Variable) – A 2-D Tensor with shape [M, 4] holding M groups of variances. The data type should be float32 or float64.
background_label (int) – The index of the background label; boxes of this class are ignored. If set to -1, all categories are considered. Default: 0.
nms_threshold (float) – The threshold to be used in NMS. Default: 0.3.
nms_top_k (int) – Maximum number of highest-confidence detections to keep after filtering by score_threshold and before NMS. Default: 400.
keep_top_k (int) – Total number of boxes to keep per image after NMS; -1 keeps all boxes remaining after NMS. Default: 200.
score_threshold (float) – Threshold used to filter out bounding boxes with low confidence scores. Default: 0.01.
nms_eta (float) – The parameter for adaptive NMS; it takes effect only when its value is less than 1.0. Default: 1.0.
return_index (bool) – Whether to return the indices of the selected boxes. Default: False.
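The filtering parameters above can be read as the stages of one multi-class NMS pass per image: drop low scores, keep the nms_top_k best per class, run greedy NMS (shrinking the IoU threshold by nms_eta when it is below 1.0), then keep the keep_top_k best across classes. The NumPy sketch below is a plain re-implementation for a single image, written for illustration only. It is not Paddle's kernel, and details such as tie-breaking, the strict `>` score comparison and the output ordering (descending score) are assumptions.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one [4] box and a [K, 4] array (corner layout, normalized)."""
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = np.maximum(area + areas - inter, 1e-12)  # guard zero-area pairs
    return inter / union


def multiclass_nms(boxes, scores, background_label=0, nms_threshold=0.3,
                   nms_top_k=400, keep_top_k=200, score_threshold=0.01,
                   nms_eta=1.0):
    """boxes: [M, 4] decoded boxes of ONE image; scores: [M, C].

    Returns rows [label, confidence, xmin, ymin, xmax, ymax].
    """
    results = []
    for c in range(scores.shape[1]):
        if c == background_label:  # -1 never matches, so all classes run
            continue
        # Drop low-confidence boxes, then keep the nms_top_k best.
        cand = np.where(scores[:, c] > score_threshold)[0]
        cand = cand[np.argsort(-scores[cand, c], kind="stable")]
        if nms_top_k > -1:
            cand = cand[:nms_top_k]
        # Greedy NMS with an optionally shrinking (adaptive) threshold.
        threshold = nms_threshold
        kept = []
        for idx in cand:
            if not kept or np.all(iou(boxes[idx], boxes[kept]) <= threshold):
                kept.append(idx)
                if nms_eta < 1.0 and threshold > 0.5:
                    threshold *= nms_eta
        results += [[c, scores[i, c], *boxes[i]] for i in kept]
    # Keep the keep_top_k highest-scoring detections across all classes.
    results.sort(key=lambda r: -r[1])
    if keep_top_k > -1:
        results = results[:keep_top_k]
    return np.array(results, dtype=float).reshape(-1, 6)


boxes = np.array([[0.0, 0.0, 0.4, 0.4],
                  [0.02, 0.0, 0.42, 0.4],  # overlaps box 0 heavily
                  [0.6, 0.6, 0.9, 0.9]])
scores = np.array([[0.1, 0.8],
                   [0.1, 0.7],
                   [0.1, 0.6]])  # column 0 is the background class
dets = multiclass_nms(boxes, scores)
print(dets)
```

For a batch, such a routine would run once per image, with the per-image row counts recorded as offsets; that bookkeeping is what the LoD of Out describes.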
Returns: (Out, Index) if return_index is True; otherwise, a tuple containing only Out.
Out (Variable): The detection output, a LoDTensor with shape [No, 6] and the same data type as loc. Each row holds six values: [label, confidence, xmin, ymin, xmax, ymax]. No is the total number of detections in the mini-batch. The offsets along the first dimension that delimit each image's detections are called the LoD; there are N + 1 offsets, where N is the batch size. The i-th image has LoD[i + 1] - LoD[i] detections; if this value is 0, the i-th image has no detections.
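The LoD rule above maps directly to slicing. The sketch below splits a flat detection array into per-image arrays, assuming the detections have been fetched as a NumPy array and the N + 1 offsets are available as a plain list; the offsets and rows shown are made-up values, and `split_by_lod` is a hypothetical helper.

```python
import numpy as np


def split_by_lod(out, lod):
    """Split an [No, 6] detection array into one array per image.

    lod holds N + 1 offsets; image i owns rows lod[i]:lod[i + 1].
    """
    return [out[lod[i]:lod[i + 1]] for i in range(len(lod) - 1)]


# Hypothetical batch of N = 3 images with 2, 0 and 1 detections.
out = np.array([[1, 0.9, 0.1, 0.1, 0.4, 0.4],
                [2, 0.7, 0.5, 0.5, 0.8, 0.9],
                [1, 0.6, 0.2, 0.3, 0.6, 0.7]])
lod = [0, 2, 2, 3]
for i, dets in enumerate(split_by_lod(out, lod)):
    print(f"image {i}: {len(dets)} detection(s)")
```

An empty slice (image 1 here) is exactly the LoD[i + 1] - LoD[i] = 0 case.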
Index (Variable): Returned only when return_index is True. A 2-D LoDTensor with shape [No, 1] holding the integer indices of the selected boxes. Each index is absolute across the batch, that is, it addresses the flattened N * M boxes. No is the same as in Out. To use the index to gather another per-box attribute, such as age, first reshape that attribute from (N, M, 1) to (N * M, 1), where N is the batch size and M is the number of boxes.
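The reshape-then-gather step can be illustrated in NumPy. The attribute values and indices below are made up; the only assumption carried over from the text is that an index value addresses row i * M + j of the flattened (N * M, 1) array for box j of image i.

```python
import numpy as np

N, M = 2, 3
# Hypothetical per-box attribute (e.g. age) with shape (N, M, 1).
age = np.arange(N * M, dtype=np.float32).reshape(N, M, 1) * 10
# Example Index output: absolute positions in [0, N * M), shape [No, 1].
index = np.array([[1], [4]])  # image 0 box 1, image 1 box 1
# Flatten the batch and box axes so absolute indices address rows directly.
flat_age = age.reshape(N * M, 1)
selected_age = flat_age[index[:, 0]]
print(selected_age)
```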
Return type: A tuple of Variables (two when return_index is True, one otherwise).
```python
import paddle
import paddle.fluid as fluid

paddle.enable_static()

# M = 21 prior boxes: prior_box, prior_box_var, loc and scores must agree on M.
pb = fluid.data(name='prior_box', shape=[21, 4], dtype='float32')
pbv = fluid.data(name='prior_box_var', shape=[21, 4], dtype='float32')
# Batch size N = 2 and C = 10 classes.
loc = fluid.data(name='target_box', shape=[2, 21, 4], dtype='float32')
scores = fluid.data(name='scores', shape=[2, 21, 10], dtype='float32')
nmsed_outs, index = fluid.layers.detection_output(
    scores=scores,
    loc=loc,
    prior_box=pb,
    prior_box_var=pbv,
    return_index=True)
```