# deformable_conv

api_attr: declarative programming (static graph)

Deformable Convolution op

Computes a 2-D deformable convolution on a 4-D input. Given an input image x and an output feature map y, the deformable convolution operation can be expressed as follows:

Deformable Convolution v2:

$y(p) = \sum_{k=1}^{K}{w_k * x(p + p_k + \Delta p_k) * \Delta m_k}$

Deformable Convolution v1:

$y(p) = \sum_{k=1}^{K}{w_k * x(p + p_k + \Delta p_k)}$

where $$w_k$$ is the filter weight and $$p_k$$ the fixed sampling-grid position of the k-th location, and $$\Delta p_k$$ and $$\Delta m_k$$ are the learnable offset and modulation scalar for that location, respectively. In deformable convolution v1, $$\Delta m_k$$ is fixed to one. For details, please refer to Deformable ConvNets v2: More Deformable, Better Results and Deformable Convolutional Networks.
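To make the two formulas concrete, here is a minimal pure-Python sketch that evaluates $$y(p)$$ at a single output location for one channel. It is an illustration under stated assumptions, not the operator's implementation: the helper names `bilinear_sample` and `deformable_conv_at` are hypothetical, fractional positions are read with bilinear interpolation (as in the referenced papers), and samples that fall outside the image are assumed to contribute zero.

```python
import math

def bilinear_sample(x, h, w):
    """Sample the 2-D list x at fractional (h, w); out-of-range pixels count as zero (assumption)."""
    height, width = len(x), len(x[0])
    h0, w0 = math.floor(h), math.floor(w)
    value = 0.0
    for hi in (h0, h0 + 1):
        for wi in (w0, w0 + 1):
            if 0 <= hi < height and 0 <= wi < width:
                weight = (1.0 - abs(h - hi)) * (1.0 - abs(w - wi))
                value += weight * x[hi][wi]
    return value

def deformable_conv_at(x, weights, p, grid, offsets, masks=None):
    """Evaluate y(p) = sum_k w_k * x(p + p_k + dp_k) * dm_k for one channel.

    grid holds the regular positions p_k, offsets the learned dp_k,
    masks the modulation scalars dm_k (None means v1, i.e. all ones).
    """
    if masks is None:
        masks = [1.0] * len(weights)
    y = 0.0
    for w_k, p_k, dp_k, dm_k in zip(weights, grid, offsets, masks):
        h = p[0] + p_k[0] + dp_k[0]
        w = p[1] + p_k[1] + dp_k[1]
        y += w_k * bilinear_sample(x, h, w) * dm_k
    return y

# 5x5 ramp image and a 3x3 sampling grid centred on p = (2, 2).
x = [[float(5 * r + c) for c in range(5)] for r in range(5)]
grid = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
weights = [1.0] * 9
zero = [(0.0, 0.0)] * 9

y_plain = deformable_conv_at(x, weights, (2, 2), grid, zero)                # 108.0
y_shifted = deformable_conv_at(x, weights, (2, 2), grid, [(0.0, 0.5)] * 9)  # 112.5
y_modulated = deformable_conv_at(x, weights, (2, 2), grid, zero, [0.5] * 9) # 54.0
```

With zero offsets and no mask the result matches an ordinary 3x3 convolution; a half-pixel column offset shifts every sample, and a mask of 0.5 halves each contribution.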

Example

• Input:

Input shape: $$(N, C_{in}, H_{in}, W_{in})$$

Filter shape: $$(C_{out}, C_{in}, H_f, W_f)$$

Offset shape: $$(N, 2 * deformable\_groups * H_f * W_f, H_{in}, W_{in})$$

Mask shape: $$(N, deformable\_groups * H_f * W_f, H_{in}, W_{in})$$

• Output:

Output shape: $$(N, C_{out}, H_{out}, W_{out})$$

Where

$\begin{split}H_{out}&= \frac{(H_{in} + 2 * paddings[0] - (dilations[0] * (H_f - 1) + 1))}{strides[0]} + 1 \\ W_{out}&= \frac{(W_{in} + 2 * paddings[1] - (dilations[1] * (W_f - 1) + 1))}{strides[1]} + 1\end{split}$
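The shape rules above can be checked with a small helper. This is a sketch only: `deformable_conv_shapes` and `_pair` are hypothetical names, the division is assumed to round down (the formula shows a plain fraction), and the filter shape is the one listed above for the default single-group case.

```python
def _pair(value):
    """Expand an int to (value, value); pass a 2-tuple through."""
    return (value, value) if isinstance(value, int) else tuple(value)

def deformable_conv_shapes(input_shape, num_filters, filter_size,
                           stride=1, padding=0, dilation=1, deformable_groups=1):
    """Return the filter, offset, mask and output shapes listed in the Example section."""
    n, c_in, h_in, w_in = input_shape
    h_f, w_f = _pair(filter_size)
    s_h, s_w = _pair(stride)
    p_h, p_w = _pair(padding)
    d_h, d_w = _pair(dilation)

    # Floor division is an assumption; it matches the usual convolution convention.
    h_out = (h_in + 2 * p_h - (d_h * (h_f - 1) + 1)) // s_h + 1
    w_out = (w_in + 2 * p_w - (d_w * (w_f - 1) + 1)) // s_w + 1

    return {
        "filter": (num_filters, c_in, h_f, w_f),
        "offset": (n, 2 * deformable_groups * h_f * w_f, h_in, w_in),
        "mask": (n, deformable_groups * h_f * w_f, h_in, w_in),
        "output": (n, num_filters, h_out, w_out),
    }

shapes = deformable_conv_shapes((8, 3, 32, 32), num_filters=16, filter_size=3, padding=1)
```

For a 3x3 filter with one deformable group, the offset tensor carries 18 channels (an offset pair for each of the 9 sampling locations) and the mask carries 9.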
Parameters
• input (Variable) – The input image with [N, C, H, W] format. A Tensor with type float32, float64.

• offset (Variable) – The input coordinate offset of deformable convolution layer. A Tensor with type float32, float64.

• mask (Variable, Optional) – The input mask of the deformable convolution layer. A Tensor with type float32, float64. It should be None when using deformable convolution v1.

• num_filters (int) – The number of filters. It is the same as the number of output image channels.

• filter_size (int|tuple) – The filter size. If filter_size is a tuple, it must contain two integers, (filter_size_H, filter_size_W). Otherwise, the filter is square, with filter_size_H = filter_size_W = filter_size.

• stride (int|tuple) – The stride size. If stride is a tuple, it must contain two integers, (stride_H, stride_W). Otherwise, stride_H = stride_W = stride. Default: stride = 1.

• padding (int|tuple) – The padding size. If padding is a tuple, it must contain two integers, (padding_H, padding_W). Otherwise, padding_H = padding_W = padding. Default: padding = 0.

• dilation (int|tuple) – The dilation size. If dilation is a tuple, it must contain two integers, (dilation_H, dilation_W). Otherwise, dilation_H = dilation_W = dilation. Default: dilation = 1.

• groups (int) – The number of groups of the deformable conv layer. Following the grouped convolution in Alex Krizhevsky’s Deep CNN paper: when groups=2, the first half of the filters is connected only to the first half of the input channels, while the second half of the filters is connected only to the second half of the input channels. Default: groups=1.

• deformable_groups (int) – The number of deformable group partitions. Default: deformable_groups = 1.

• im2col_step (int) – Maximum number of images per im2col computation. The total batch size should be divisible by this value or smaller than this value. If you run out of memory, try a smaller value here. Default: im2col_step = 64.

• param_attr (ParamAttr, Optional) – The parameter attribute for learnable parameters/weights of deformable conv. If it is set to None or one attribute of ParamAttr, deformable conv will create ParamAttr as param_attr. If the Initializer of the param_attr is not set, the parameter is initialized with $$Normal(0.0, std)$$, and the $$std$$ is $$(\frac{2.0 }{filter\_elem\_num})^{0.5}$$. Default: None.

• bias_attr (ParamAttr|bool, Optional) – The parameter attribute for the bias of the deformable conv layer. If it is set to False, no bias will be added to the output units. If it is set to None or one attribute of ParamAttr, deformable conv will create ParamAttr as bias_attr. If the Initializer of the bias_attr is not set, the bias is initialized to zero. Default: None.

• modulated (bool) – Selects which version to use, v1 or v2. When True, v2 (modulated) is used; when False, v1 is used. Default: True.

• name (str, Optional) – For details, please refer to Name. Generally, no setting is required. Default: None.
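Two of the parameters above, groups and im2col_step, describe simple integer rules that can be sketched in a few lines. The helper names `group_input_channels` and `im2col_step_ok` are hypothetical, and the requirement that both channel counts divide evenly by groups is an assumption based on the usual grouped-convolution convention rather than something stated in this page.

```python
def group_input_channels(filter_index, num_filters, num_channels, groups=1):
    """Return the input channels that a given filter is connected to."""
    # Assumption: both counts must split evenly across the groups.
    if num_filters % groups != 0 or num_channels % groups != 0:
        raise ValueError("num_filters and num_channels must be divisible by groups")
    filters_per_group = num_filters // groups
    channels_per_group = num_channels // groups
    group = filter_index // filters_per_group
    return range(group * channels_per_group, (group + 1) * channels_per_group)

def im2col_step_ok(batch_size, im2col_step=64):
    """Check the stated rule: batch size is divisible by, or smaller than, im2col_step."""
    return batch_size % im2col_step == 0 or batch_size < im2col_step

# groups=2 with 4 filters and 6 input channels:
first_half = list(group_input_channels(0, num_filters=4, num_channels=6, groups=2))   # [0, 1, 2]
second_half = list(group_input_channels(3, num_filters=4, num_channels=6, groups=2))  # [3, 4, 5]
```

With groups=2, filters 0 and 1 see only input channels 0 to 2, and filters 2 and 3 see only channels 3 to 5, matching the description of the groups parameter.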

Returns

The tensor variable storing the deformable convolution result. A Tensor with type float32, float64.

Return type

Variable

Raises

ValueError – If the shapes of input, filter_size, stride, padding and groups mismatch.

Examples

# deformable conv v2:

import paddle.fluid as fluid

C_in, H_in, W_in = 3, 32, 32
filter_size, deformable_groups = 3, 1
data = fluid.data(name='data', shape=[None, C_in, H_in, W_in], dtype='float32')
offset = fluid.data(name='offset', shape=[None, 2*deformable_groups*filter_size**2, H_in, W_in], dtype='float32')