paddle.fluid.layers.detection.generate_mask_labels(im_info, gt_classes, is_crowd, gt_segms, rois, labels_int32, num_classes, resolution)

Generate Mask Labels for Mask-RCNN

Given the RoIs and their corresponding labels, this operator samples the foreground RoIs. For each foreground RoI, the mask branch has a K x M^2 dimensional output target, which encodes K binary masks of resolution M x M, one for each of the K classes. These mask targets are used to compute the loss of the mask branch.
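To make the layout of the K x M^2 target concrete, here is a minimal NumPy sketch for a single foreground RoI. It does not call Paddle. It assumes, following the common Mask R-CNN convention rather than anything stated on this page, that the slot of the RoI's own class holds the binary mask and every other class slot is filled with an ignore value of -1. The helper name `expand_mask_target` and the toy sizes are hypothetical.

```python
import numpy as np

def expand_mask_target(mask, label, num_classes, ignore_value=-1):
    """Place an M x M binary mask into a flat K * M * M target vector.

    Hypothetical helper: the slot of `label` holds the mask, all other
    class slots are set to `ignore_value` (an assumed convention).
    """
    m = mask.shape[0]
    target = np.full((num_classes, m, m), ignore_value, dtype=np.int32)
    target[label] = mask
    return target.reshape(-1)

# Toy sizes: K = 3 classes, M = 2 resolution.
mask = np.array([[1, 0],
                 [0, 1]], dtype=np.int32)
target = expand_mask_target(mask, label=1, num_classes=3)
print(target.shape)  # (12,)
print(target)        # [-1 -1 -1 -1  1  0  0  1 -1 -1 -1 -1]
```

With K = 3 and M = 2 the target has 3 * 2 * 2 = 12 entries, and only the four entries of class 1 carry mask values.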

Please note the data format of the ground-truth segmentations. Assume the segmentations are as follows: the first instance has two gt objects; the second instance has one gt object, and this object has two gt segmentations.

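# batch_semgs has the following nested structure:
# [
#  [[[229.14, 370.9, 229.14, 370.9, ...]],
#   [[343.7, 139.85, 349.01, 138.46, ...]]], # 0-th instance
#  [[[500.0, 390.62, ...],[115.48, 187.86, ...]]] # 1-th instance
# ]

import numpy as np
import paddle.fluid as fluid

# Convert every polygon from a flat [x0, y0, x1, y1, ...] list into an
# array of (x, y) points, keeping the instance/object nesting.
batch_masks = []
for semgs in batch_semgs:
    gt_masks = []
    for semg in semgs:
        gt_segm = []
        for polys in semg:
            gt_segm.append(np.array(polys).reshape(-1, 2))
        gt_masks.append(gt_segm)
    batch_masks.append(gt_masks)

# feeds is the list of input Variables defined in the user's program.
place = fluid.CPUPlace()
feeder = fluid.DataFeeder(place=place, feed_list=feeds)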

Parameters

  • im_info (Variable) – A 2-D Tensor with shape [N, 3] and float32 data type. N is the batch size; each row is the [height, width, scale] of an image. The scale is target_size / original_size, where target_size is the image size after resizing and original_size is the original image size.

  • gt_classes (Variable) – A 2-D LoDTensor with shape [M, 1] and int data type. M is the total number of ground-truth objects; each element is a class label.

  • is_crowd (Variable) – A 2-D LoDTensor with the same shape and data type as gt_classes; each element is a flag indicating whether a ground-truth object is a crowd.

  • gt_segms (Variable) – A 2-D LoDTensor with shape [S, 2] and float32 data type; its LoD level is 3. Users usually do not need to understand LoD, but the reader must return data in the correct format. LoD[0] is the number of ground-truth objects in each instance. LoD[1] is the number of segmentations of each object. LoD[2] is the number of polygons of each segmentation. S is the total number of polygon coordinate points. Each element is an (x, y) coordinate point.

  • rois (Variable) – A 2-D LoDTensor with shape [R, 4] and float32 data type. R is the total number of RoIs; each element is a bounding box in (xmin, ymin, xmax, ymax) format, within the range of the original image.

  • labels_int32 (Variable) – A 2-D LoDTensor with shape [R, 1] and int32 data type. R is the same as in rois. Each element is the class label of a RoI.

  • num_classes (int) – Number of classes.

  • resolution (int) – Resolution of mask predictions.
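The three LoD levels described for gt_segms can be derived from the nested list format shown earlier. The sketch below is plain Python and does not call Paddle. It assumes length-based LoD and reads LoD[2] as the number of (x, y) points in each polygon, which is consistent with S being the total number of coordinate points. The function name `segms_to_lod` and the coordinates are made up for illustration.

```python
def segms_to_lod(batch_segms):
    """Return (lod, points) for nested [instance][object][polygon] data.

    Each polygon is a flat list [x0, y0, x1, y1, ...].
    """
    objects_per_instance = []
    polys_per_object = []
    points_per_poly = []
    points = []
    for segms in batch_segms:
        objects_per_instance.append(len(segms))
        for segm in segms:
            polys_per_object.append(len(segm))
            for poly in segm:
                pairs = [(poly[i], poly[i + 1]) for i in range(0, len(poly), 2)]
                points_per_poly.append(len(pairs))
                points.extend(pairs)
    return [objects_per_instance, polys_per_object, points_per_poly], points

batch_segms = [
    # Instance 0: two objects, one polygon each.
    [[[0.0, 0.0, 4.0, 0.0, 4.0, 4.0]],
     [[1.0, 1.0, 2.0, 1.0, 2.0, 2.0, 1.0, 2.0]]],
    # Instance 1: one object made of two polygons.
    [[[5.0, 5.0, 6.0, 5.0, 6.0, 6.0],
      [7.0, 7.0, 8.0, 7.0, 8.0, 8.0]]],
]
lod, points = segms_to_lod(batch_segms)
print(lod)          # [[2, 1], [1, 1, 2], [3, 4, 3, 3]]
print(len(points))  # 13, i.e. S in the [S, 2] shape
```

The flat `points` list corresponds to the [S, 2] data, and the three length lists describe how to regroup it into polygons, objects and instances.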


Returns

mask_rois (Variable): A 2D LoDTensor with shape [P, 4] and the same data type as rois. P is the total number of sampled RoIs. Each element is a bounding box in [xmin, ymin, xmax, ymax] format, within the range of the original image size.

mask_rois_has_mask_int32 (Variable): A 2D LoDTensor with shape [P, 1] and int data type. Each element is the index of an output mask RoI with regard to the input RoIs.

mask_int32 (Variable): A 2D LoDTensor with shape [P, K * M * M] and int data type. K is the number of classes and M is the resolution of the mask predictions. Each element is a binary mask target.
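Because mask_int32 stores K * M * M values per row, a consumer would typically reshape it before use. The following NumPy sketch uses random placeholder data instead of real operator output and assumes a class-major layout (K, then M, then M), as the shape [P, K * M * M] suggests. The `labels` array is a hypothetical stand-in for the class labels of the sampled RoIs.

```python
import numpy as np

P, K, M = 4, 3, 2  # sampled RoIs, classes, mask resolution (toy values)

# Placeholder data standing in for the mask_int32 output.
rng = np.random.default_rng(0)
mask_int32 = rng.integers(0, 2, size=(P, K * M * M)).astype(np.int32)
labels = np.array([1, 2, 1, 0])  # hypothetical class label of each RoI

# Recover one M x M slot per class, then pick the slot of each RoI's class.
masks = mask_int32.reshape(P, K, M, M)
per_roi = masks[np.arange(P), labels]
print(per_roi.shape)  # (4, 2, 2)
```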

Return type

tuple(Variable): (mask_rois, mask_rois_has_mask_int32, mask_int32)


Examples

import paddle.fluid as fluid

im_info = fluid.data(name="im_info", shape=[None, 3],
    dtype="float32")
# gt_classes and is_crowd hold integer values (see the parameter list).
gt_classes = fluid.data(name="gt_classes", shape=[None, 1],
    dtype="int32", lod_level=1)
is_crowd = fluid.data(name="is_crowd", shape=[None, 1],
    dtype="int32", lod_level=1)
gt_masks = fluid.data(name="gt_masks", shape=[None, 2],
    dtype="float32", lod_level=3)
# rois, roi_labels can be the output of
# fluid.layers.generate_proposal_labels.
rois = fluid.data(name="rois", shape=[None, 4],
    dtype="float32", lod_level=1)
roi_labels = fluid.data(name="roi_labels", shape=[None, 1],
    dtype="int32", lod_level=1)
mask_rois, mask_index, mask_int32 = fluid.layers.generate_mask_labels(