roi_align roi_align ( x, boxes, boxes_num, output_size, spatial_scale=1.0, sampling_ratio=- 1, aligned=True, name=None ) [source]

Implementing the roi_align layer. Region of Interest (RoI) Align operator (also known as RoI Align) is to perform bilinear interpolation on inputs of nonuniform sizes to obtain fixed-size feature maps (e.g. 7*7), as described in Mask R-CNN.

Dividing each region proposal into equal-sized sections with the pooled_width and pooled_height. Location remains the origin result.

In each ROI bin, the value of the four regularly sampled locations are computed directly through bilinear interpolation. The output is the mean of four locations. Thus avoid the misaligned problem.

  • x (Tensor) – Input feature, 4D-Tensor with the shape of [N,C,H,W], where N is the batch size, C is the input channel, H is Height, W is weight. The data type is float32 or float64.

  • boxes (Tensor) – Boxes (RoIs, Regions of Interest) to pool over. It should be a 2-D Tensor of shape (num_boxes, 4). The data type is float32 or float64. Given as [[x1, y1, x2, y2], …], (x1, y1) is the top left coordinates, and (x2, y2) is the bottom right coordinates.

  • boxes_num (Tensor) – The number of boxes contained in each picture in the batch, the data type is int32.

  • output_size (int or Tuple[int, int]) – The pooled output size(h, w), data type is int32. If int, h and w are both equal to output_size.

  • spatial_scale (float32, optional) – Multiplicative spatial scale factor to translate ROI coords from their input scale to the scale used when pooling. Default: 1.0.

  • sampling_ratio (int32, optional) – number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, then exactly sampling_ratio x sampling_ratio sampling points per bin are used. If <= 0, then an adaptive number of grid points are used (computed as ceil(roi_width / output_width), and likewise for height). Default: -1.

  • aligned (bool, optional) – If False, use the legacy implementation. If True, pixel shift the box coordinates it by -0.5 for a better alignment with the two neighboring pixel indices. This version is used in Detectron2. Default: True.

  • name (str, optional) – For detailed information, please refer to : ref:api_guide_Name. Usually name is no need to set and None by default. Default: None.


The output of ROIAlignOp is a 4-D tensor with shape (num_boxes, channels, pooled_h, pooled_w). The data type is float32 or float64.


>>> import paddle
>>> from import roi_align

>>> data = paddle.rand([1, 256, 32, 32])
>>> boxes = paddle.rand([3, 4])
>>> boxes[:, 2] += boxes[:, 0] + 3
>>> boxes[:, 3] += boxes[:, 1] + 4
>>> boxes_num = paddle.to_tensor([3]).astype('int32')
>>> align_out = roi_align(data, boxes, boxes_num, output_size=3)
>>> print(align_out.shape)
[3, 256, 3, 3]