- paddle.vision.ops.roi_pool(x, boxes, boxes_num, output_size, spatial_scale=1.0, name=None)
This operator implements the RoI pooling layer. Region of interest pooling (also known as RoI pooling) performs max pooling on inputs of nonuniform sizes to obtain fixed-size feature maps (e.g. 7*7). The operator has three steps:
1. Divide each region proposal into equal-sized sections, forming a grid of output_size (h, w).
2. Find the largest value in each section.
3. Copy these max values to the output buffer.
For more information, please refer to https://stackoverflow.com/questions/43430056/what-is-roi-layer-in-fast-rcnn.
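The three steps above can be sketched in plain Python for a single channel and a single box. This is a minimal illustration, not Paddle's kernel: the helper name `roi_max_pool_2d` is hypothetical, the box is assumed to be given in integer feature-map coordinates with inclusive corners, and the floor/ceil bin boundaries follow a common RoI pooling convention that may differ from the library's exact rounding. When the region size is not divisible by the output size, neighbouring sections overlap slightly, so "equal-sized" holds only approximately.

```python
import math


def roi_max_pool_2d(feature, box, output_size):
    """Max-pool one RoI of a 2D feature map into a fixed (h, w) grid.

    feature: list of rows (H x W) of numbers.
    box: (x1, y1, x2, y2), integer feature-map coordinates, inclusive (assumed).
    output_size: (pooled_h, pooled_w).
    """
    x1, y1, x2, y2 = box
    pooled_h, pooled_w = output_size
    height, width = len(feature), len(feature[0])

    # Size of the region and of each section (may be fractional).
    roi_h = max(y2 - y1 + 1, 1)
    roi_w = max(x2 - x1 + 1, 1)
    bin_h = roi_h / pooled_h
    bin_w = roi_w / pooled_w

    out = [[0.0] * pooled_w for _ in range(pooled_h)]
    for ph in range(pooled_h):
        # Step 1: row range of this section, clipped to the feature map.
        h_start = min(max(y1 + math.floor(ph * bin_h), 0), height)
        h_end = min(max(y1 + math.ceil((ph + 1) * bin_h), 0), height)
        for pw in range(pooled_w):
            # Step 1 (continued): column range of this section.
            w_start = min(max(x1 + math.floor(pw * bin_w), 0), width)
            w_end = min(max(x1 + math.ceil((pw + 1) * bin_w), 0), width)
            if h_end <= h_start or w_end <= w_start:
                continue  # empty section: leave the output at 0.0
            # Steps 2 and 3: take the largest value and write it to the output.
            out[ph][pw] = max(
                feature[h][w]
                for h in range(h_start, h_end)
                for w in range(w_start, w_end)
            )
    return out


# A 4 x 4 feature map holding the values 0.0 .. 15.0 in row-major order.
feature = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
print(roi_max_pool_2d(feature, (0, 0, 3, 3), (2, 2)))  # [[5.0, 7.0], [13.0, 15.0]]
```

Each output cell is the maximum of one 2 x 2 quadrant of the region, which is why a box of any size yields the same fixed output shape.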
x (Tensor) – input feature, 4D-Tensor with the shape of [N,C,H,W], where N is the batch size, C is the number of input channels, H is the height, and W is the width. The data type is float32 or float64.
boxes (Tensor) – boxes (Regions of Interest) to pool over. 2D-Tensor with the shape of [num_boxes,4], given as [[x1, y1, x2, y2], …], where (x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner of each box.
boxes_num (Tensor) – the number of RoIs in each image. The data type is int32.
output_size (int or tuple[int, int]) – the pooled output size (h, w). The data type is int32. If an int is given, h and w are both equal to output_size.
spatial_scale (float, optional) – multiplicative spatial scale factor that translates RoI coordinates from their input scale to the scale used when pooling. Default: 1.0
name (str, optional) – for detailed information, please refer to Name. Usually there is no need to set name; it is None by default.
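The roles of boxes_num and spatial_scale can be illustrated with a small sketch. Both helper names below are hypothetical, and the sketch assumes that the rows of boxes are stored grouped by image in batch order, so that boxes_num alone determines which image each box belongs to. The scaling step shows only the multiplication described for spatial_scale; any rounding the operator applies afterwards is not modelled here.

```python
def assign_boxes_to_images(boxes_num):
    """Return the image index for each row of a flat [num_boxes, 4] box list.

    Assumes boxes are grouped by image, in batch order.
    """
    batch_index = []
    for image_id, count in enumerate(boxes_num):
        batch_index.extend([image_id] * count)
    return batch_index


def scale_box(box, spatial_scale):
    """Map (x1, y1, x2, y2) from the input scale to the pooling scale."""
    return tuple(coord * spatial_scale for coord in box)


# Two images: the first owns two boxes, the second owns one.
print(assign_boxes_to_images([2, 1]))  # [0, 0, 1]

# A feature map downsampled 16x relative to the image uses spatial_scale = 1/16.
print(scale_box((32.0, 64.0, 160.0, 224.0), 1.0 / 16))  # (2.0, 4.0, 10.0, 14.0)
```

Under this reading, the entries of boxes_num should sum to num_boxes, and its length should match the batch size N of x.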
Returns the pooled feature, a 4D-Tensor with the shape of [num_boxes, C, h, w], where (h, w) is the pooled output size given by output_size (both equal to output_size when it is an int).
- Return type: Tensor
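    import paddle
    from paddle.vision.ops import roi_pool

    # One feature map: batch size 1, 256 channels, 32 x 32.
    data = paddle.rand([1, 256, 32, 32])
    # Three random boxes as [x1, y1, x2, y2]; the offsets ensure x2 > x1 and y2 > y1.
    boxes = paddle.rand([3, 4])
    boxes[:, 2] += boxes[:, 0] + 3
    boxes[:, 3] += boxes[:, 1] + 4
    # All three boxes belong to the single image in the batch.
    boxes_num = paddle.to_tensor([3]).astype('int32')
    pool_out = roi_pool(data, boxes, boxes_num=boxes_num, output_size=3)
    assert pool_out.shape == [3, 256, 3, 3], ''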
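The shape asserted in the example follows directly from the inputs: one output entry per box, the channel count of x, and the pooled size. The helper below is a hypothetical, pure-Python sketch of that rule (it does not call Paddle) and shows how an int and a tuple output_size are interpreted.

```python
def expected_roi_pool_shape(num_boxes, channels, output_size):
    """Expected output shape [num_boxes, C, h, w] for a given output_size."""
    if isinstance(output_size, int):
        # A single int is used for both height and width.
        pooled_h = pooled_w = output_size
    else:
        pooled_h, pooled_w = output_size
    return [num_boxes, channels, pooled_h, pooled_w]


print(expected_roi_pool_shape(3, 256, 3))       # [3, 256, 3, 3]
print(expected_roi_pool_shape(3, 256, (7, 5)))  # [3, 256, 7, 5]
```

Note that the spatial size of x (32 x 32 in the example) does not appear in the result: only the number of boxes, the channel count, and output_size determine the output shape.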