multi_box_head

paddle.fluid.layers.multi_box_head(inputs, image, base_size, num_classes, aspect_ratios, min_ratio=None, max_ratio=None, min_sizes=None, max_sizes=None, steps=None, step_w=None, step_h=None, offset=0.5, variance=[0.1, 0.1, 0.2, 0.2], flip=True, clip=False, kernel_size=1, pad=0, stride=1, name=None, min_max_aspect_ratios_order=False)

Based on the SSD (Single Shot MultiBox Detector) algorithm, this layer generates prior boxes, regression locations, and classification confidences on multiple input feature maps, then outputs the concatenated results. For details of the algorithm, refer to section 2.2 of the SSD paper, SSD: Single Shot MultiBox Detector.

Parameters
  • inputs (list(Variable)|tuple(Variable)) – The list of input variables. Each Variable is a 4-D Tensor with NCHW layout. Data type should be float32 or float64.

  • image (Variable) – The input image, layout is NCHW. Data type should be the same as inputs.

  • base_size (int) –

    base_size is the input image size. When len(inputs) > 2 and both min_sizes and max_sizes are None, min_sizes and max_sizes are calculated from base_size, min_ratio and max_ratio, with num_layer equal to len(inputs). The formula is as follows:

    min_sizes = []
    max_sizes = []
    step = int(math.floor(((max_ratio - min_ratio)) / (num_layer - 2)))
    for ratio in six.moves.range(min_ratio, max_ratio + 1, step):
        min_sizes.append(base_size * ratio / 100.)
        max_sizes.append(base_size * (ratio + step) / 100.)
    min_sizes = [base_size * .10] + min_sizes
    max_sizes = [base_size * .20] + max_sizes
    

  • num_classes (int) – The number of classes.

  • aspect_ratios (list(float) | tuple(float)) – The aspect ratios of the generated prior boxes. The lengths of inputs and aspect_ratios must be equal.

  • min_ratio (int) – The minimum ratio of the generated prior boxes.

  • max_ratio (int) – The maximum ratio of the generated prior boxes.

  • min_sizes (list|tuple|None) – If len(inputs) <= 2, min_sizes must be set, and the length of min_sizes should equal the length of inputs. Default: None.

  • max_sizes (list|tuple|None) – If len(inputs) <= 2, max_sizes must be set, and the length of max_sizes should equal the length of inputs. Default: None.

  • steps (list|tuple) – If step_w and step_h are the same, step_w and step_h can be replaced by steps.

  • step_w (list|tuple) – Prior boxes step across width. If step_w[i] == 0.0, the prior boxes step across width of the inputs[i] will be automatically calculated. Default: None.

  • step_h (list|tuple) – Prior boxes step across height. If step_h[i] == 0.0, the prior boxes step across height of the inputs[i] will be automatically calculated. Default: None.

  • offset (float) – Prior boxes center offset. Default: 0.5

  • variance (list|tuple) – The variances to be encoded in prior boxes. Default: [0.1, 0.1, 0.2, 0.2].

  • flip (bool) – Whether to flip aspect ratios. Default: True.

  • clip (bool) – Whether to clip out-of-boundary boxes. Default: False.

  • kernel_size (int) – The kernel size of conv2d. Default: 1.

  • pad (int|list|tuple) – The padding of conv2d. Default: 0.

  • stride (int|list|tuple) – The stride of conv2d. Default: 1.

  • name (str) – The default value is None. Normally there is no need for user to set this property. For more information, please refer to Name.

  • min_max_aspect_ratios_order (bool) – If set to True, the output prior boxes are in the order [min, max, aspect_ratios], which is consistent with Caffe. Note that this order affects the order of the weights in the convolution layers that follow, but does not affect the final detection results. Default: False.
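
The min_sizes and max_sizes formula given under base_size can be traced in plain Python. The sketch below is only an illustration: it uses the built-in range in place of six.moves.range, and the helper name compute_sizes is hypothetical, not part of Paddle. The arguments (base_size=300, min_ratio=20, max_ratio=90, six input layers) are taken from the first example on this page.

```python
import math


def compute_sizes(base_size, min_ratio, max_ratio, num_layer):
    """Hypothetical helper mirroring the documented formula (needs num_layer > 2)."""
    min_sizes = []
    max_sizes = []
    # Ratio increment between consecutive layers, excluding the first layer.
    step = int(math.floor((max_ratio - min_ratio) / (num_layer - 2)))
    for ratio in range(min_ratio, max_ratio + 1, step):
        min_sizes.append(base_size * ratio / 100.)
        max_sizes.append(base_size * (ratio + step) / 100.)
    # The first layer gets fixed 10% / 20% sizes, prepended once after the loop.
    min_sizes = [base_size * .10] + min_sizes
    max_sizes = [base_size * .20] + max_sizes
    return min_sizes, max_sizes


min_sizes, max_sizes = compute_sizes(300, 20, 90, 6)
print(min_sizes)
print(max_sizes)
```

With these arguments step is 17, so the loop visits ratios 20, 37, 54, 71 and 88. The resulting lists have one entry per input layer: min_sizes is approximately [30, 60, 111, 162, 213, 264] and max_sizes is approximately [60, 111, 162, 213, 264, 315].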

Returns

A tuple of four Variables: (mbox_loc, mbox_conf, boxes, variances).

mbox_loc (Variable): The predicted box locations for the inputs. The layout is [N, num_priors, 4], where N is the batch size and num_priors is the number of prior boxes. Data type is the same as input.

mbox_conf (Variable): The predicted box confidences for the inputs. The layout is [N, num_priors, C], where N and num_priors have the same meaning as above and C is the number of classes. Data type is the same as input.

boxes (Variable): The output prior boxes. The layout is [num_priors, 4], where num_priors has the same meaning as above. Data type is the same as input.

variances (Variable): The expanded variances for the prior boxes. The layout is [num_priors, 4]. Data type is the same as input.

Return type

tuple
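
As a rough illustration of what num_priors means, the sketch below counts prior boxes for the feature-map shapes used in the first example on this page. It rests on assumptions that this page does not state: that each feature-map location gets one box per expanded aspect ratio for each min size plus one box per max size, and that the aspect-ratio list starts from 1.0 and gains the reciprocal of each ratio when flip is True. It also assumes one min size and one max size per layer. The helper names are hypothetical.

```python
def expand_aspect_ratios(aspect_ratios, flip):
    """Assumed expansion: start from 1.0, add each new ratio and, if flip, its reciprocal."""
    expanded = [1.0]
    for ar in aspect_ratios:
        # Skip ratios that are already present (within a small tolerance).
        if any(abs(ar - e) < 1e-6 for e in expanded):
            continue
        expanded.append(ar)
        if flip:
            expanded.append(1.0 / ar)
    return expanded


def priors_per_location(aspect_ratios, flip, num_min_sizes=1, num_max_sizes=1):
    """Assumed count of prior boxes generated at a single feature-map location."""
    return len(expand_aspect_ratios(aspect_ratios, flip)) * num_min_sizes + num_max_sizes


# (height, width) of conv1..conv6 and their aspect ratios, as in the first example.
feature_map_sizes = [(19, 19), (10, 10), (5, 5), (3, 3), (2, 2), (1, 1)]
aspect_ratios = [[2.], [2., 3.], [2., 3.], [2., 3.], [2.], [2.]]

per_location = [priors_per_location(ar, flip=True) for ar in aspect_ratios]
num_priors = sum(h * w * n for (h, w), n in zip(feature_map_sizes, per_location))
print(per_location)
print(num_priors)
```

Under these assumptions the per-location counts are [4, 6, 6, 6, 4, 4] and num_priors is 2268, so mbox_loc would have shape [N, 2268, 4] and mbox_conf [N, 2268, 21].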

Example 1: set min_ratio and max_ratio:
import paddle.fluid as fluid

images = fluid.data(name='data', shape=[None, 3, 300, 300], dtype='float32')
conv1 = fluid.data(name='conv1', shape=[None, 512, 19, 19], dtype='float32')
conv2 = fluid.data(name='conv2', shape=[None, 1024, 10, 10], dtype='float32')
conv3 = fluid.data(name='conv3', shape=[None, 512, 5, 5], dtype='float32')
conv4 = fluid.data(name='conv4', shape=[None, 256, 3, 3], dtype='float32')
conv5 = fluid.data(name='conv5', shape=[None, 256, 2, 2], dtype='float32')
conv6 = fluid.data(name='conv6', shape=[None, 128, 1, 1], dtype='float32')

mbox_locs, mbox_confs, box, var = fluid.layers.multi_box_head(
  inputs=[conv1, conv2, conv3, conv4, conv5, conv6],
  image=images,
  num_classes=21,
  min_ratio=20,
  max_ratio=90,
  aspect_ratios=[[2.], [2., 3.], [2., 3.], [2., 3.], [2.], [2.]],
  base_size=300,
  offset=0.5,
  flip=True,
  clip=True)
Example 2: set min_sizes and max_sizes:
import paddle.fluid as fluid

images = fluid.data(name='data', shape=[None, 3, 300, 300], dtype='float32')
conv1 = fluid.data(name='conv1', shape=[None, 512, 19, 19], dtype='float32')
conv2 = fluid.data(name='conv2', shape=[None, 1024, 10, 10], dtype='float32')
conv3 = fluid.data(name='conv3', shape=[None, 512, 5, 5], dtype='float32')
conv4 = fluid.data(name='conv4', shape=[None, 256, 3, 3], dtype='float32')
conv5 = fluid.data(name='conv5', shape=[None, 256, 2, 2], dtype='float32')
conv6 = fluid.data(name='conv6', shape=[None, 128, 1, 1], dtype='float32')

mbox_locs, mbox_confs, box, var = fluid.layers.multi_box_head(
  inputs=[conv1, conv2, conv3, conv4, conv5, conv6],
  image=images,
  num_classes=21,
  min_sizes=[60.0, 105.0, 150.0, 195.0, 240.0, 285.0],
  max_sizes=[[], 150.0, 195.0, 240.0, 285.0, 300.0],
  aspect_ratios=[[2.], [2., 3.], [2., 3.], [2., 3.], [2.], [2.]],
  base_size=300,
  offset=0.5,
  flip=True,
  clip=True)