paddle.fluid.layers.detection.generate_mask_labels(im_info, gt_classes, is_crowd, gt_segms, rois, labels_int32, num_classes, resolution)

Given the RoIs and their corresponding labels, this operator samples the foreground RoIs. For each foreground RoI, the mask branch has a :math:`K \times M^{2}`-dimensional output target, which encodes K binary masks of resolution M x M, one for each of the K classes. These mask targets are used to compute the loss of the mask branch.
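The target layout described above can be illustrated with a small NumPy sketch. It is not the operator's implementation: the helper name `expand_mask_target`, the class-major ordering of the K slots, and the use of -1 for the slots of the other classes are assumptions made for illustration.

```python
import numpy as np

def expand_mask_target(mask, label, num_classes):
    """Place one M x M binary mask into a K * M * M target vector (sketch)."""
    m2 = mask.size  # M * M
    # Assumption: slots belonging to other classes are filled with -1.
    target = np.full(num_classes * m2, -1, dtype="int32")
    start = label * m2  # assumption: class-major layout
    target[start:start + m2] = mask.reshape(-1)
    return target

# A 2 x 2 mask (M = 2) for class 2 out of K = 3 classes.
mask = np.array([[1, 0], [0, 1]], dtype="int32")
target = expand_mask_target(mask, label=2, num_classes=3)
```

Here `target` has K * M * M = 12 entries, and only the last four carry the binary mask.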

Please note the data format of the ground-truth segmentations. The segmentations are assumed to be organized as follows: the first instance has two gt objects; the second instance has one gt object, and this object has two gt segmentations.

#[
#  [[[229.14, 370.9, 229.14, 370.9, ...]],
#   [[343.7, 139.85, 349.01, 138.46, ...]]], # 0-th instance
#  [[[500.0, 390.62, ...],[115.48, 187.86, ...]]] # 1-th instance
#]

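import numpy as np

batch_masks = []
for semgs in batch_semgs:
    gt_masks = []
    for semg in semgs:
        gt_segm = []
        for polys in semg:
            # Each polygon becomes an array of (x, y) points.
            gt_segm.append(np.array(polys).reshape(-1, 2))
        gt_masks.append(gt_segm)
    batch_masks.append(gt_masks)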

place = fluid.CPUPlace()
feeder = fluid.DataFeeder(place=place, feed_list=feeds)

Parameters
• im_info (Variable) – A 2-D Tensor with shape [N, 3] and float32 data type. N is the batch size, and each element is the [height, width, scale] of an image. The image scale is target_size / original_size, where target_size is the size after resizing and original_size is the original image size.

• gt_classes (Variable) – A 2-D LoDTensor with shape [M, 1]. Data type should be int. M is the total number of ground-truth objects, and each element is a class label.

• is_crowd (Variable) – A 2-D LoDTensor with the same shape and data type as gt_classes. Each element is a flag indicating whether a ground-truth object is a crowd.

• gt_segms (Variable) – A 2-D LoDTensor with shape [S, 2] and float32 data type; its LoD level is 3. Users usually do not need to understand LoD, but the reader should return data in the correct format. LoD[0] represents the number of ground-truth objects in each instance. LoD[1] represents the number of segmentations of each object. LoD[2] represents the number of polygons of each segmentation. S is the total number of polygon coordinate points. Each element is an (x, y) coordinate point.

• rois (Variable) – A 2-D LoDTensor with shape [R, 4] and float32 data type. R is the total number of RoIs, and each element is a bounding box in (xmin, ymin, xmax, ymax) format, within the range of the original image.

• labels_int32 (Variable) – A 2-D LoDTensor with shape [R, 1] and int32 data type. R is the same as in rois. Each element represents the class label of an RoI.

• num_classes (int) – Number of classes.

• resolution (int) – Resolution of mask predictions.
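As a rough illustration of the gt_segms layout, the sketch below flattens a nested polygon list into an [S, 2] array plus three levels of length information. The sample coordinates and the helper `flatten_segms` are hypothetical, and the lengths are kept as plain Python lists rather than a LoDTensor. The innermost level is read here as the number of (x, y) points per polygon, which is an interpretation of the LoD description rather than something the text states outright.

```python
import numpy as np

# Hypothetical batch: instance -> gt object -> polygon -> [x0, y0, x1, y1, ...].
batch_segms = [
    [[[0.0, 0.0, 4.0, 0.0, 4.0, 4.0]],             # instance 0, object 0
     [[1.0, 1.0, 3.0, 1.0, 3.0, 3.0, 1.0, 3.0]]],  # instance 0, object 1
    [[[0.0, 0.0, 2.0, 0.0, 2.0, 2.0],
      [5.0, 5.0, 7.0, 5.0, 7.0, 7.0]]],            # instance 1, object 0
]

def flatten_segms(batch):
    """Return an [S, 2] point array and three levels of sequence lengths."""
    points = []
    objs_per_instance, polys_per_obj, pts_per_poly = [], [], []
    for objs in batch:
        objs_per_instance.append(len(objs))
        for polys in objs:
            polys_per_obj.append(len(polys))
            for poly in polys:
                xy = np.asarray(poly, dtype="float32").reshape(-1, 2)
                pts_per_poly.append(len(xy))
                points.append(xy)
    data = np.concatenate(points, axis=0)
    return data, [objs_per_instance, polys_per_obj, pts_per_poly]

data, lod = flatten_segms(batch_segms)
```

With this sample, `data` holds S = 13 points, and the three length lists describe 2 and 1 objects per instance, then 1, 1 and 2 polygons per object, then the point count of each polygon.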

Returns

mask_rois (Variable): A 2D LoDTensor with shape [P, 4] and the same data type as rois. P is the total number of sampled RoIs. Each element is a bounding box in [xmin, ymin, xmax, ymax] format, within the range of the original image size.

mask_rois_has_mask_int32 (Variable): A 2D LoDTensor with shape [P, 1] and int data type. Each element represents the index of an output mask RoI with respect to the input RoIs.

mask_int32 (Variable): A 2D LoDTensor with shape [P, K * M * M] and int data type. K is the number of classes and M is the resolution of mask predictions. Each element represents a binary mask target.
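The relationship between the outputs can be sketched with plain NumPy arrays standing in for fetched results. All values below are placeholders, not real masks, and the class-major [K, M, M] ordering of each row is an assumption. The sketch uses the RoI index output to look up each sampled RoI's label, then picks that class's M x M slice.

```python
import numpy as np

K, M = 3, 2  # hypothetical number of classes and mask resolution

# Stand-ins for the inputs and outputs, with R = 4 input RoIs and P = 2 sampled RoIs.
labels_int32 = np.array([[0], [2], [0], [1]], dtype="int32")
roi_index = np.array([[1], [3]], dtype="int32")  # plays the role of mask_rois_has_mask_int32
P = roi_index.shape[0]
# Placeholder values so that each slot is easy to identify.
mask_int32 = np.arange(P * K * M * M, dtype="int32").reshape(P, K * M * M)

# Label of each sampled RoI, looked up through the index output.
fg_labels = labels_int32[roi_index[:, 0], 0]

# Assumption: each row is laid out class-major as [K, M, M].
per_class = mask_int32.reshape(P, K, M, M)
fg_masks = per_class[np.arange(P), fg_labels]  # shape [P, M, M]
```

Under these assumptions, `fg_masks[i]` is the M x M target for the class of the i-th sampled RoI.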

Return type
tuple(Variable)

Examples

import paddle.fluid as fluid

im_info = fluid.data(name="im_info", shape=[None, 3],
                     dtype="float32")
gt_classes = fluid.data(name="gt_classes", shape=[None, 1],
                        dtype="int32", lod_level=1)
is_crowd = fluid.data(name="is_crowd", shape=[None, 1],
                      dtype="int32", lod_level=1)
gt_masks = fluid.data(name="gt_masks", shape=[None, 2],
                      dtype="float32", lod_level=3)
# rois, roi_labels can be the output of
# fluid.layers.generate_proposal_labels.
rois = fluid.data(name="rois", shape=[None, 4],
                  dtype="float32", lod_level=1)
roi_labels = fluid.data(name="roi_labels", shape=[None, 1],
                        dtype="int32", lod_level=1)