roi_pool(x, boxes, boxes_num, output_size, spatial_scale=1.0, name=None)

This operator implements the RoI pooling layer. Region of interest pooling (RoI pooling) performs max pooling on input regions of nonuniform size to obtain fixed-size feature maps (e.g. 7x7). The operator has three steps:

  1. Divide each region proposal into equal-sized sections according to output_size (h, w).
  2. Find the largest value in each section.
  3. Copy these max values to the output buffer.

For more information, please refer to

Parameters

  • x (Tensor) – input feature, a 4D-Tensor with shape [N, C, H, W], where N is the batch size, C is the number of input channels, H is the height, and W is the width. The data type is float32 or float64.

  • boxes (Tensor) – boxes (Regions of Interest) to pool over, a 2D-Tensor with shape [num_boxes, 4]. Boxes are given as [[x1, y1, x2, y2], …], where (x1, y1) are the top-left coordinates and (x2, y2) are the bottom-right coordinates.

  • boxes_num (Tensor) – the number of RoIs in each image of the batch. The data type is int32.

  • output_size (int or tuple[int, int]) – the pooled output size (h, w), data type is int32. If it is an int, h and w are both equal to output_size.

  • spatial_scale (float, optional) – multiplicative spatial scale factor that translates RoI coordinates from their input scale to the scale used when pooling. Default: 1.0.

  • name (str, optional) – for detailed information, please refer to Name. Usually there is no need to set name. Default: None.
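To make the roles of boxes_num and spatial_scale concrete, the sketch below assumes (as an illustration, not taken from Paddle's source) that the rows of boxes are grouped by image: the first boxes_num[0] rows belong to image 0, the next boxes_num[1] rows to image 1, and so on. The helpers `roi_batch_indices` and `scale_box` are hypothetical names.

```python
from typing import List, Sequence


def roi_batch_indices(boxes_num: Sequence[int], num_boxes: int) -> List[int]:
    """Expand per-image RoI counts into the image index of each row of boxes.

    Assumption: boxes are stored contiguously, image by image.
    """
    indices: List[int] = []
    for image_id, count in enumerate(boxes_num):
        indices.extend([image_id] * count)
    # The per-image counts must account for every box exactly once.
    if len(indices) != num_boxes:
        raise ValueError(
            f"sum(boxes_num)={len(indices)} does not match num_boxes={num_boxes}")
    return indices


def scale_box(box: Sequence[float], spatial_scale: float) -> List[float]:
    """Translate [x1, y1, x2, y2] from input coordinates to pooling (feature-map) coordinates."""
    return [coord * spatial_scale for coord in box]
```

For a backbone whose feature map is downsampled 16x relative to the input image, spatial_scale would typically be 1/16, so scale_box([32, 64, 160, 320], 1 / 16) gives [2.0, 4.0, 10.0, 20.0].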


Returns

the pooled feature, a 4D-Tensor with shape [num_boxes, C, output_size[0], output_size[1]].

Return type

pool_out (Tensor)
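The output shape depends only on the channel count of x, the number of boxes, and output_size. The hypothetical helper below (plain Python, not part of Paddle) computes it, expanding an int output_size into a square (h, w) grid as described for the output_size parameter.

```python
from typing import List, Sequence, Tuple, Union


def roi_pool_output_shape(x_shape: Sequence[int], num_boxes: int,
                          output_size: Union[int, Tuple[int, int]]) -> List[int]:
    """Shape of pool_out: [num_boxes, C, output_size[0], output_size[1]]."""
    _, channels, _, _ = x_shape  # x is [N, C, H, W]; only C carries over
    if isinstance(output_size, int):
        pooled_h = pooled_w = output_size  # an int means h == w
    else:
        pooled_h, pooled_w = output_size
    return [num_boxes, channels, pooled_h, pooled_w]
```

With the values from the example below (x of shape [1, 256, 32, 32], 3 boxes, output_size=3) it returns [3, 256, 3, 3].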


>>> import paddle
>>> from paddle.vision.ops import roi_pool

>>> data = paddle.rand([1, 256, 32, 32])
>>> boxes = paddle.rand([3, 4])
>>> boxes[:, 2] += boxes[:, 0] + 3
>>> boxes[:, 3] += boxes[:, 1] + 4
>>> boxes_num = paddle.to_tensor([3]).astype('int32')
>>> pool_out = roi_pool(data, boxes, boxes_num=boxes_num, output_size=3)
>>> print(pool_out.shape)
[3, 256, 3, 3]