DeformConv2D

class paddle.vision.ops.DeformConv2D(in_channels: int, out_channels: int, kernel_size: Size2, stride: Size2 = 1, padding: Size2 = 0, dilation: Size2 = 1, deformable_groups: int = 1, groups: int = 1, weight_attr: ParamAttrLike | None = None, bias_attr: ParamAttrLike | None = None)

Compute 2-D deformable convolution on a 4-D input. Given an input image x and an output feature map y, the deformable convolution operation can be expressed as follows:

Deformable Convolution v2:

\[y(p) = \sum_{k=1}^{K}{w_k * x(p + p_k + \Delta p_k) * \Delta m_k}\]

Deformable Convolution v1:

\[y(p) = \sum_{k=1}^{K}{w_k * x(p + p_k + \Delta p_k)}\]

Where \(\Delta p_k\) and \(\Delta m_k\) are the learnable offset and modulation scalar for the k-th location, respectively. In deformable convolution v1, \(\Delta m_k\) is fixed to one. For details, refer to Deformable ConvNets v2: More Deformable, Better Results and Deformable Convolutional Networks.
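The two formulas above can be illustrated with a minimal pure-Python sketch that evaluates \(y(p)\) at a single location of a single-channel image. It is not Paddle's implementation. The helper names `bilinear_sample` and `deform_conv_at` are hypothetical, the kernel is assumed to be 3x3 and centred on p with stride 1 and dilation 1, and fractional positions are read with bilinear interpolation (the sampling scheme described in the cited papers), treating points outside the image as zero.

```python
import math


def bilinear_sample(image, y, x):
    """Sample a 2-D list `image` at fractional (y, x); points outside count as 0."""
    height, width = len(image), len(image[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    # Blend the four integer neighbours, weighted by their distance to (y, x).
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < height and 0 <= xx < width:
                weight = (1.0 - abs(y - yy)) * (1.0 - abs(x - xx))
                value += weight * image[yy][xx]
    return value


def deform_conv_at(image, weights, p, offsets, masks=None):
    """Evaluate y(p) for one channel and a 3x3 kernel (assumed layout).

    weights: 9 floats w_k, row-major over the kernel.
    offsets: 9 (dy, dx) pairs, the offsets delta p_k.
    masks:   9 floats delta m_k, or None for v1 (every delta m_k is 1).
    """
    # Regular grid p_k of a 3x3 kernel centred on p.
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    if masks is None:
        masks = [1.0] * len(grid)
    total = 0.0
    for w_k, (gy, gx), (oy, ox), m_k in zip(weights, grid, offsets, masks):
        sample = bilinear_sample(image, p[0] + gy + oy, p[1] + gx + ox)
        total += w_k * sample * m_k
    return total


image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
weights = [1.0] * 9

# v1 with zero offsets reduces to an ordinary 3x3 sum around p.
plain = deform_conv_at(image, weights, (1, 1), [(0.0, 0.0)] * 9)
# v1 with every sampling point shifted half a pixel to the right.
shifted = deform_conv_at(image, weights, (1, 1), [(0.0, 0.5)] * 9)
# v2: the same shift, with every sample scaled by a modulation scalar of 0.5.
modulated = deform_conv_at(image, weights, (1, 1), [(0.0, 0.5)] * 9, [0.5] * 9)
```

In the layer itself, the offsets and modulation scalars are supplied per output location through the offset and mask tensors rather than as Python lists.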

Example

Input:

x shape: \((N, C_{in}, H_{in}, W_{in})\)

weight shape: \((C_{out}, C_{in}, H_f, W_f)\)

offset shape: \((N, 2 * H_f * W_f, H_{out}, W_{out})\)

mask shape: \((N, H_f * W_f, H_{out}, W_{out})\)

Output:

Output shape: \((N, C_{out}, H_{out}, W_{out})\)

Where

\[\begin{split}H_{out}&= \frac{(H_{in} + 2 * paddings[0] - (dilations[0] * (H_f - 1) + 1))}{strides[0]} + 1 \\ W_{out}&= \frac{(W_{in} + 2 * paddings[1] - (dilations[1] * (W_f - 1) + 1))}{strides[1]} + 1\end{split}\]

Parameters

in_channels (int) – The number of input channels in the input image.
out_channels (int) – The number of output channels produced by the convolution.
kernel_size (int|list|tuple) – The size of the convolving kernel.
stride (int|list|tuple, optional) – The stride size. If stride is a list/tuple, it must contain two integers, (stride_H, stride_W). Otherwise, stride_H = stride_W = stride. The default value is 1.
padding (int|list|tuple, optional) – The padding size. If padding is a list/tuple, it must contain two integers, (padding_H, padding_W). Otherwise, the padding_H = padding_W = padding. Default: padding = 0.
dilation (int|list|tuple, optional) – The dilation size. If dilation is a list/tuple, it must contain two integers, (dilation_H, dilation_W). Otherwise, dilation_H = dilation_W = dilation. The default value is 1.
deformable_groups (int, optional) – The number of deformable group partitions. Default: deformable_groups = 1.
groups (int, optional) – The number of groups of this layer. According to grouped convolution in Alex Krizhevsky’s Deep CNN paper: when groups=2, the first half of the filters is only connected to the first half of the input channels, while the second half of the filters is only connected to the second half of the input channels. The default value is 1.
weight_attr (ParamAttr|None, optional) – The parameter attribute for the learnable weights of the layer. If it is set to None or one attribute of ParamAttr, the layer creates a ParamAttr for the weight. If it is set to None, the parameter is initialized with \(Normal(0.0, std)\), where \(std\) is \((\frac{2.0 }{filter\_elem\_num})^{0.5}\). The default value is None.
bias_attr (ParamAttr|bool|None, optional) – The parameter attribute for the bias of the layer. If it is set to False, no bias is added to the output units. If it is set to None or one attribute of ParamAttr, the layer creates a ParamAttr for the bias. If the Initializer of bias_attr is not set, the bias is initialized to zero. The default value is None.
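The grouped connectivity described for the groups parameter can be sketched in plain Python. The helper `group_connections` is hypothetical and only illustrates which input channels each filter reads. It assumes that in_channels and out_channels are both divisible by groups, a requirement this page does not state.

```python
def group_connections(in_channels, out_channels, groups):
    """Map each filter index to the range of input channels it reads."""
    # Assumption: both channel counts split evenly across the groups.
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    in_per_group = in_channels // groups
    out_per_group = out_channels // groups
    connections = {}
    for filter_idx in range(out_channels):
        group = filter_idx // out_per_group
        start = group * in_per_group
        connections[filter_idx] = range(start, start + in_per_group)
    return connections


conn = group_connections(in_channels=4, out_channels=6, groups=2)
for filter_idx, channels in conn.items():
    # Filters 0-2 read input channels [0, 1]; filters 3-5 read [2, 3].
    print(filter_idx, list(channels))
```

With groups=1, every filter reads all input channels, which is the default behaviour.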

Attribute:

weight (Parameter): the learnable weights of the filter of this layer.
bias (Parameter or None): the learnable bias of this layer.

Shape:

x: \((N, C_{in}, H_{in}, W_{in})\)
offset: \((N, 2 * H_f * W_f, H_{out}, W_{out})\)
mask: \((N, H_f * W_f, H_{out}, W_{out})\)
output: \((N, C_{out}, H_{out}, W_{out})\)

Where

\[\begin{split}H_{out}&= \frac{(H_{in} + 2 * paddings[0] - (dilations[0] * (kernel\_size[0] - 1) + 1))}{strides[0]} + 1 \\ W_{out}&= \frac{(W_{in} + 2 * paddings[1] - (dilations[1] * (kernel\_size[1] - 1) + 1))}{strides[1]} + 1\end{split}\]
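As a quick way to check tensor shapes before calling the layer, the formula above can be written as a small pure-Python helper. The names `_pair` and `deform_conv2d_shapes` are hypothetical and not part of Paddle. The sketch assumes the division rounds down (the usual convolution convention, which the formula does not state explicitly) and uses the offset and mask shapes exactly as listed in the Shape section.

```python
def _pair(value):
    """Expand an int to (value, value); pass 2-element sequences through."""
    if isinstance(value, int):
        return (value, value)
    pair = tuple(value)
    if len(pair) != 2:
        raise ValueError("expected an int or a sequence of two ints")
    return pair


def deform_conv2d_shapes(x_shape, out_channels, kernel_size,
                         stride=1, padding=0, dilation=1):
    """Return the expected offset, mask and output shapes for an input shape."""
    n, _, h_in, w_in = x_shape
    kh, kw = _pair(kernel_size)
    sh, sw = _pair(stride)
    ph, pw = _pair(padding)
    dh, dw = _pair(dilation)
    # Assumption: integer (floor) division, as in ordinary convolution.
    h_out = (h_in + 2 * ph - (dh * (kh - 1) + 1)) // sh + 1
    w_out = (w_in + 2 * pw - (dw * (kw - 1) + 1)) // sw + 1
    return {
        "offset": (n, 2 * kh * kw, h_out, w_out),
        "mask": (n, kh * kw, h_out, w_out),
        "output": (n, out_channels, h_out, w_out),
    }


shapes = deform_conv2d_shapes((8, 1, 28, 28), out_channels=16, kernel_size=3)
print(shapes["offset"])  # (8, 18, 26, 26)
print(shapes["mask"])    # (8, 9, 26, 26)
print(shapes["output"])  # (8, 16, 26, 26)
```

These values match the 28x28 input with a 3x3 kernel, stride 1 and no padding used in the Examples section.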

Examples

>>> # deformable conv v2:
>>> import paddle
>>> input = paddle.rand((8, 1, 28, 28))
>>> kh, kw = 3, 3
>>> # offset shape should be [bs, 2 * kh * kw, out_h, out_w]
>>> # mask shape should be [bs, kh * kw, out_h, out_w]
>>> # In this case, for an input of 28, stride of 1
>>> # and kernel size of 3, without padding, the output size is 26
>>> offset = paddle.rand((8, 2 * kh * kw, 26, 26))
>>> mask = paddle.rand((8, kh * kw, 26, 26))
>>> deform_conv = paddle.vision.ops.DeformConv2D(
...     in_channels=1,
...     out_channels=16,
...     kernel_size=[kh, kw])
>>> out = deform_conv(input, offset, mask)
>>> print(out.shape)
paddle.Size([8, 16, 26, 26])

>>> # deformable conv v1:
>>> import paddle
>>> input = paddle.rand((8, 1, 28, 28))
>>> kh, kw = 3, 3
>>> # offset shape should be [bs, 2 * kh * kw, out_h, out_w]
>>> # no mask is passed, so this computes deformable conv v1
>>> # In this case, for an input of 28, stride of 1
>>> # and kernel size of 3, without padding, the output size is 26
>>> offset = paddle.rand((8, 2 * kh * kw, 26, 26))
>>> deform_conv = paddle.vision.ops.DeformConv2D(
...     in_channels=1,
...     out_channels=16,
...     kernel_size=[kh, kw])
>>> out = deform_conv(input, offset)
>>> print(out.shape)
paddle.Size([8, 16, 26, 26])

          

forward(x: Tensor, offset: Tensor, mask: Tensor | None = None) → Tensor

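Defines the computation performed at every call.

Parameters

x (Tensor) – The input, with shape \((N, C_{in}, H_{in}, W_{in})\).
offset (Tensor) – The sampling offsets, with shape \((N, 2 * H_f * W_f, H_{out}, W_{out})\).
mask (Tensor|None, optional) – The modulation scalars, with shape \((N, H_f * W_f, H_{out}, W_{out})\). If it is None, deformable convolution v1 is computed. The default value is None.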