rpn_target_assign(bbox_pred, cls_logits, anchor_box, anchor_var, gt_boxes, is_crowd, im_info, rpn_batch_size_per_im=256, rpn_straddle_thresh=0.0, rpn_fg_fraction=0.5, rpn_positive_overlap=0.7, rpn_negative_overlap=0.3, use_random=True)
Target Assign Layer for region proposal network (RPN) in Faster-RCNN detection.
This layer can be, for given the Intersection-over-Union (IoU) overlap between anchors and ground truth boxes, to assign classification and regression targets to each each anchor, these target labels are used for train RPN. The classification targets is a binary class label (of being an object or not). Following the paper of Faster-RCNN, the positive labels are two kinds of anchors: (i) the anchor/anchors with the highest IoU overlap with a ground-truth box, or (ii) an anchor that has an IoU overlap higher than rpn_positive_overlap(0.7) with any ground-truth box. Note that a single ground-truth box may assign positive labels to multiple anchors. A non-positive anchor is when its IoU ratio is lower than rpn_negative_overlap (0.3) for all ground-truth boxes. Anchors that are neither positive nor negative do not contribute to the training objective. The regression targets are the encoded ground-truth boxes associated with the positive anchors.
bbox_pred (Variable) – A 3-D Tensor with shape [N, M, 4] represents the predicted locations of M bounding bboxes. N is the batch size, and each bounding box has four coordinate values and the layout is [xmin, ymin, xmax, ymax]. The data type can be float32 or float64.
cls_logits (Variable) – A 3-D Tensor with shape [N, M, 1] represents the predicted confidence predictions. N is the batch size, 1 is the frontground and background sigmoid, M is number of bounding boxes. The data type can be float32 or float64.
anchor_box (Variable) – A 2-D Tensor with shape [M, 4] holds M boxes, each box is represented as [xmin, ymin, xmax, ymax], [xmin, ymin] is the left top coordinate of the anchor box, if the input is image feature map, they are close to the origin of the coordinate system. [xmax, ymax] is the right bottom coordinate of the anchor box. The data type can be float32 or float64.
anchor_var (Variable) – A 2-D Tensor with shape [M,4] holds expanded variances of anchors. The data type can be float32 or float64.
gt_boxes (Variable) – The ground-truth bounding boxes (bboxes) are a 2D LoDTensor with shape [Ng, 4], Ng is the total number of ground-truth bboxes of mini-batch input. The data type can be float32 or float64.
is_crowd (Variable) – A 1-D LoDTensor which indicates groud-truth is crowd. The data type must be int32.
im_info (Variable) – A 2-D LoDTensor with shape [N, 3]. N is the batch size,
is the height, width and scale. (3) –
rpn_batch_size_per_im (int) – Total number of RPN examples per image. The data type must be int32.
rpn_straddle_thresh (float) – Remove RPN anchors that go outside the image by straddle_thresh pixels. The data type must be float32.
rpn_fg_fraction (float) – Target fraction of RoI minibatch that is labeled foreground (i.e. class > 0), 0-th class is background. The data type must be float32.
rpn_positive_overlap (float) – Minimum overlap required between an anchor and ground-truth box for the (anchor, gt box) pair to be a positive example. The data type must be float32.
rpn_negative_overlap (float) – Maximum overlap allowed between an anchor and ground-truth box for the (anchor, gt box) pair to be a negative examples. The data type must be float32.
A tuple(predicted_scores, predicted_location, target_label, target_bbox, bbox_inside_weight) is returned. The predicted_scores and predicted_location is the predicted result of the RPN. The target_label and target_bbox is the ground truth, respectively. The predicted_location is a 2D Tensor with shape [F, 4], and the shape of target_bbox is same as the shape of the predicted_location, F is the number of the foreground anchors. The predicted_scores is a 2D Tensor with shape [F + B, 1], and the shape of target_label is same as the shape of the predicted_scores, B is the number of the background anchors, the F and B is depends on the input of this operator. Bbox_inside_weight represents whether the predicted loc is fake_fg or not and the shape is [F, 4].
- Return type
import paddle.fluid as fluid bbox_pred = fluid.data(name='bbox_pred', shape=[None, 4], dtype='float32') cls_logits = fluid.data(name='cls_logits', shape=[None, 1], dtype='float32') anchor_box = fluid.data(name='anchor_box', shape=[None, 4], dtype='float32') anchor_var = fluid.data(name='anchor_var', shape=[None, 4], dtype='float32') gt_boxes = fluid.data(name='gt_boxes', shape=[None, 4], dtype='float32') is_crowd = fluid.data(name='is_crowd', shape=[None], dtype='float32') im_info = fluid.data(name='im_infoss', shape=[None, 3], dtype='float32') loc, score, loc_target, score_target, inside_weight = fluid.layers.rpn_target_assign( bbox_pred, cls_logits, anchor_box, anchor_var, gt_boxes, is_crowd, im_info)