generate_proposals

paddle.vision.ops.generate_proposals(scores, bbox_deltas, img_size, anchors, variances, pre_nms_top_n=6000, post_nms_top_n=1000, nms_thresh=0.5, min_size=0.1, eta=1.0, pixel_offset=False, return_rois_num=False, name=None)

This operation proposes RoIs (regions of interest) according to each box's probability of being a foreground object. The proposals output by the RPN are calculated from anchors, bbox_deltas, and scores. The final proposals can be used to train a detection network.

To generate proposals, this operation performs the following steps:

  1. Transpose and reshape scores and bbox_deltas to shapes (H * W * A, 1) and (H * W * A, 4), respectively.

  2. Calculate box locations as proposal candidates.

  3. Clip boxes to the image.

  4. Remove predicted boxes with a small area.

  5. Apply non-maximum suppression (NMS) to get the final proposals as output.
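The steps above can be sketched in plain Python for a single image. This is an illustrative sketch, not the operator's implementation: the helper names (flatten_hwa, decode, clip, iou, propose) are hypothetical, the decoding formula is the common Faster R-CNN center/size parameterization scaled by the variances (an assumption), the point at which the top-n cut is applied is an assumption, and pixel_offset and eta are ignored.

```python
import math

def flatten_hwa(per_anchor):
    """Step 1: reorder one image's [A, H, W] values into a flat list in (H, W, A) order."""
    A, H, W = len(per_anchor), len(per_anchor[0]), len(per_anchor[0][0])
    return [per_anchor[a][h][w] for h in range(H) for w in range(W) for a in range(A)]

def decode(anchor, delta, variance):
    """Step 2: apply (dx, dy, dw, dh) deltas to an (xmin, ymin, xmax, ymax) anchor.
    Assumed parameterization: center shift and log-scale size change."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    acx, acy = anchor[0] + 0.5 * aw, anchor[1] + 0.5 * ah
    cx = variance[0] * delta[0] * aw + acx
    cy = variance[1] * delta[1] * ah + acy
    w = math.exp(variance[2] * delta[2]) * aw
    h = math.exp(variance[3] * delta[3]) * ah
    return [cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h]

def clip(box, img_h, img_w):
    """Step 3: clamp the box corners to the image boundaries."""
    return [min(max(box[0], 0.0), img_w), min(max(box[1], 0.0), img_h),
            min(max(box[2], 0.0), img_w), min(max(box[3], 0.0), img_h)]

def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def propose(scores, deltas, anchors, variances, img_h, img_w,
            pre_nms_top_n=6000, post_nms_top_n=1000, nms_thresh=0.5, min_size=0.1):
    """Steps 2-5 on flat, index-aligned lists; returns a list of (score, box) pairs."""
    boxes = [clip(decode(a, d, v), img_h, img_w)
             for a, d, v in zip(anchors, deltas, variances)]
    # Step 4: drop boxes whose width or height is below min_size.
    cand = [(s, b) for s, b in zip(scores, boxes)
            if b[2] - b[0] >= min_size and b[3] - b[1] >= min_size]
    # Keep the highest-scoring candidates before NMS.
    cand.sort(key=lambda sb: sb[0], reverse=True)
    cand = cand[:pre_nms_top_n]
    # Step 5: greedy NMS, stopping once post_nms_top_n boxes are kept.
    keep = []
    for s, b in cand:
        if len(keep) == post_nms_top_n:
            break
        if all(iou(b, kb) <= nms_thresh for _, kb in keep):
            keep.append((s, b))
    return keep

# Toy inputs: four anchors, zero deltas, unit variances.
anchors = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60], [20, 20, 20.05, 30]]
deltas = [[0.0, 0.0, 0.0, 0.0]] * 4
variances = [[1.0, 1.0, 1.0, 1.0]] * 4
scores = [0.9, 0.8, 0.7, 0.95]
proposals = propose(scores, deltas, anchors, variances, img_h=100, img_w=100)
```

In the toy inputs, the fourth box is narrower than min_size and the second overlaps the first heavily, so both are expected to be dropped.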

Parameters
  • scores (Tensor) – A 4-D Tensor with shape [N, A, H, W] that represents the probability of each box being an object. N is the batch size, A is the number of anchors, and H and W are the height and width of the feature map. The data type must be float32.

  • bbox_deltas (Tensor) – A 4-D Tensor with shape [N, 4*A, H, W] that represents the difference between the predicted box location and the anchor location. The data type must be float32.

  • img_size (Tensor) – A 2-D Tensor with shape [N, 2] that holds the original image shape information for the N images in the batch, including the height and width of the input sizes. The data type can be float32 or float64.

  • anchors (Tensor) – A 4-D Tensor that represents the anchors with a layout of [H, W, A, 4]. H and W are the height and width of the feature map, and A is the number of anchor boxes at each position. Each anchor is in (xmin, ymin, xmax, ymax) format and unnormalized. The data type must be float32.

  • variances (Tensor) – A 4-D Tensor. The expanded variances of the anchors with a layout of [H, W, num_priors, 4], where num_priors is the number of anchors at each position. Each variance is in (xcenter, ycenter, w, h) format. The data type must be float32.

  • pre_nms_top_n (float, optional) – Number of total bboxes to be kept per image before NMS. 6000 by default.

  • post_nms_top_n (float, optional) – Number of total bboxes to be kept per image after NMS. 1000 by default.

  • nms_thresh (float, optional) – Threshold in NMS. The data type must be float32. 0.5 by default.

  • min_size (float, optional) – Remove predicted boxes with either height or width less than this value. 0.1 by default.

  • eta (float, optional) – Decay factor used in adaptive NMS. It only takes effect while the adaptive threshold is greater than 0.5; in each iteration, adaptive_threshold = adaptive_threshold * eta. 1.0 by default.

  • pixel_offset (bool, optional) – Whether there is a pixel offset. If True, the offset of img_size will be 1. False by default.

  • return_rois_num (bool, optional) – Whether to return rpn_rois_num. When set to True, the operation also returns a 1-D Tensor with shape [N] that contains the number of RoIs for each image in the batch. False by default.

  • name (str, optional) – For detailed information, please refer to Name. Usually there is no need to set name; it is None by default.
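The interaction between nms_thresh and eta described above can be illustrated with a small greedy NMS in plain Python. This is a sketch under assumptions, not the operator's code: the names iou and adaptive_nms are hypothetical, and the threshold is decayed after each kept box, which is one reading of "in each iteration".

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def adaptive_nms(boxes, scores, nms_thresh=0.5, eta=1.0):
    """Greedy NMS whose threshold shrinks by eta while it stays above 0.5.
    Returns the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    threshold = nms_thresh
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
            # Assumed decay rule: applied only after a box is kept.
            if eta < 1.0 and threshold > 0.5:
                threshold *= eta
    return keep

boxes = [[0, 0, 10, 10], [0, 0, 10, 15], [100, 100, 110, 110], [100, 100, 110, 116]]
scores = [0.9, 0.8, 0.7, 0.6]
kept_fixed = adaptive_nms(boxes, scores, nms_thresh=0.7, eta=1.0)
kept_decay = adaptive_nms(boxes, scores, nms_thresh=0.7, eta=0.8)
```

With eta=1.0 the threshold never changes. With eta below 1.0 and a starting threshold above 0.5, later boxes face a stricter overlap limit, so fewer overlapping boxes survive. With the default nms_thresh=0.5 the condition "greater than 0.5" is never met, so eta has no effect in this sketch.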

Returns

  • rpn_rois (Tensor): The generated RoIs. A 2-D Tensor with shape [N, 4], where N is the number of RoIs. The data type is the same as scores.

  • rpn_roi_probs (Tensor): The scores of the generated RoIs. A 2-D Tensor with shape [N, 1], where N is the number of RoIs. The data type is the same as scores.

  • rpn_rois_num (Tensor): The number of RoIs for each image in the batch. A 1-D Tensor with shape [B], where B is the batch size. Its sum equals the total number of RoIs, N.

Return type

  • tuple of Tensor: (rpn_rois, rpn_roi_probs, rpn_rois_num)
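Because the RoIs of all images share a single [N, 4] Tensor, rpn_rois_num is what allows them to be separated per image. The sketch below shows the bookkeeping on plain Python lists (for example, values obtained from the Tensors via tolist()). The helper name split_by_image is hypothetical, and the sketch assumes the RoIs are stored contiguously in batch order.

```python
def split_by_image(rois, rois_num):
    """Split a flat list of RoIs into one list per image.

    rois:     list of N boxes, each [xmin, ymin, xmax, ymax]
    rois_num: list of B counts whose sum is N
    Assumes the RoIs of image 0 come first, then image 1, and so on.
    """
    if sum(rois_num) != len(rois):
        raise ValueError("rois_num must sum to the number of RoIs")
    per_image, start = [], 0
    for count in rois_num:
        per_image.append(rois[start:start + count])
        start += count
    return per_image

rois = [[0, 0, 1, 1], [2, 2, 3, 3], [4, 4, 5, 5]]
rois_num = [1, 2]
grouped = split_by_image(rois, rois_num)
```

The same slicing applies to rpn_roi_probs, since its first dimension is also N.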

Examples

>>> import paddle
>>> paddle.seed(2023)

>>> scores = paddle.rand((2,4,5,5), dtype=paddle.float32)
>>> bbox_deltas = paddle.rand((2, 16, 5, 5), dtype=paddle.float32)
>>> img_size = paddle.to_tensor([[224.0, 224.0], [224.0, 224.0]])
>>> anchors = paddle.rand((2,5,4,4), dtype=paddle.float32)
>>> variances = paddle.rand((2,5,10,4), dtype=paddle.float32)
>>> rois, roi_probs, roi_nums = paddle.vision.ops.generate_proposals(scores, bbox_deltas,
...                 img_size, anchors, variances, return_rois_num=True)
>>> 
>>> print(rois, roi_probs, roi_nums)
Tensor(shape=[2, 4], dtype=float32, place=Place(cpu), stop_gradient=True,
[[0., 0., 0., 0.],
 [0., 0., 0., 0.]])
Tensor(shape=[2, 1], dtype=float32, place=Place(cpu), stop_gradient=True,
[[0.],
 [0.]])
Tensor(shape=[2], dtype=int32, place=Place(cpu), stop_gradient=True,
[1, 1])