dropout

paddle.nn.functional.dropout(x, p=0.5, axis=None, training=True, mode='upscale_in_train', name=None) [source]

Dropout is a regularization technique for reducing overfitting by preventing neuron co-adaptation during training. The dropout operator randomly sets the outputs of some units to zero and scales up the remaining ones according to the given dropout probability.

Parameters
  • x (Tensor) – The input tensor. The data type is float16, float32 or float64.

  • p (float|int, optional) – Probability of setting units to zero. Default: 0.5.

  • axis (int|list|tuple, optional) – The axis along which the dropout is performed. Default: None.

  • training (bool, optional) – A flag indicating whether it is in the training phase or not. Default: True.

  • mode (str, optional) –

    [‘upscale_in_train’(default) | ‘downscale_in_infer’].

    1. upscale_in_train (default): scale up the output at training time

      • train: \(out = input \times \frac{mask}{(1.0 - dropout\_prob)}\)

      • inference: \(out = input\)

    2. downscale_in_infer: scale down the output at inference time

      • train: \(out = input \times mask\)

      • inference: \(out = input \times (1.0 - dropout\_prob)\)

  • name (str, optional) – Name for the operation. Default: None. For more information, please refer to Name.

Returns

A Tensor representing the result of dropout, which has the same shape and data type as x.
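
As a quick numeric illustration of the two modes (a sketch of the arithmetic the mode argument selects, not the library implementation), applied to a fixed, hypothetical mask:

>>> import numpy as np
>>> x = np.array([[1., 2., 3.], [4., 5., 6.]])
>>> p = 0.5
>>> mask = np.array([[0., 1., 0.], [1., 0., 1.]])  # hypothetical Bernoulli(1 - p) sample
>>> # mode='upscale_in_train': scale while training, pass the input through at inference
>>> train_up = x * mask / (1.0 - p)    # [[0., 4., 0.], [8., 0., 12.]]
>>> infer_up = x                       # [[1., 2., 3.], [4., 5., 6.]]
>>> # mode='downscale_in_infer': only mask while training, scale at inference
>>> train_down = x * mask              # [[0., 2., 0.], [4., 0., 6.]]
>>> infer_down = x * (1.0 - p)         # [[0.5, 1., 1.5], [2., 2.5, 3.]]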

Examples

We use p=0.5 in the following description for simplicity.

  1. When axis=None, this is the commonly used dropout, which drops each element of x randomly.

Let's see a simple case when x is a 2d tensor with shape 2*3:
[[1 2 3]
 [4 5 6]]
We generate a mask with the same shape as x, which is 2*3. The values of the mask are
randomly sampled from a Bernoulli distribution. For example, we may get such a mask:
[[0 1 0]
 [1 0 1]]
So the output is obtained by elementwise multiplication of x and the mask:
[[0 2 0]
 [4 0 6]]
Using the default setting, i.e. ``mode='upscale_in_train'``,
in the training phase the final upscaled output is:
[[0 4 0 ]
 [8 0 12]]
and in the test phase the output is the same as the input:
[[1 2 3]
 [4 5 6]]
We can also set ``mode='downscale_in_infer'``; then
in the training phase the final output is:
[[0 2 0]
 [4 0 6]]
and in the test phase the scaled output is:
[[0.5 1.  1.5]
 [2.  2.5 3. ]]
  2. When axis!=None, this is useful for dropping whole channels from an image or sequence (see the sketch after this walkthrough).

Let's see the simple case when x is a 2d tensor with shape 2*3 again:
[[1 2 3]
 [4 5 6]]
(1) If ``axis=0``, the dropout is only performed along axis `0`.
    We generate a mask with the shape 2*1, so the values are randomly selected only along axis `0`.
    For example, we may get such mask:
    [[1]
     [0]]
    The output is obtained by elementwise multiplication of x and the mask. For this the mask is
    broadcast from 2*1 to 2*3:
    [[1 1 1]
     [0 0 0]]
    and the result after elementwise multiplication is:
    [[1 2 3]
     [0 0 0]]
    Then we can upscale or downscale according to the settings of the other arguments.
(2) If ``axis=1``, the dropout is only performed along axis `1`.
    We generate a mask with the shape 1*3, so the values are randomly selected only along axis `1`.
    For example, we may get such mask:
    [[1 0 1]]
    For the elementwise multiplication the mask is broadcast from 1*3 to 2*3:
    [[1 0 1]
     [1 0 1]]
    and the result after elementwise multiplication is:
    [[1 0 3]
     [4 0 6]]
(3) What about ``axis=[0, 1]``? This means the dropout is performed over all axes of x,
    which is the same as the default setting ``axis=None``.
(4) You may note that, taken literally, `axis=None` could be read as performing the dropout along no axis of x.
    In that case we would generate a mask with the shape 1*1, so the whole input would be randomly kept or dropped.
    For example, we may get such a mask:
    [[0]]
    For the elementwise multiplication the mask is broadcast from 1*1 to 2*3:
    [[0 0 0]
     [0 0 0]]
    and the result after elementwise multiplication is:
    [[0 0 0]
     [0 0 0]]
    Actually this is not what we want, because all elements may be set to zero.
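
The following sketch mirrors the walkthrough above for axis-wise dropout, using explicit broadcast masks and the default upscale_in_train scaling; the masks are hypothetical samples, not values the operator will necessarily draw:

>>> import numpy as np
>>> x = np.array([[1., 2., 3.], [4., 5., 6.]])
>>> p = 0.5
>>> # axis=0: one Bernoulli draw per row, hypothetical mask of shape 2*1
>>> mask_axis0 = np.array([[1.], [0.]])
>>> out_axis0 = x * mask_axis0 / (1.0 - p)   # [[2., 4., 6.], [0., 0., 0.]]
>>> # axis=1: one Bernoulli draw per column, hypothetical mask of shape 1*3
>>> mask_axis1 = np.array([[1., 0., 1.]])
>>> out_axis1 = x * mask_axis1 / (1.0 - p)   # [[2., 0., 6.], [8., 0., 12.]]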

When x is a 4d tensor with shape NCHW, where N is the batch size, C is the number of channels, and H and W are the height and width of the feature map, we can set axis=[0,1] so that the dropout is performed over axes N and C while H and W are tied, i.e. paddle.nn.functional.dropout(x, p, axis=[0,1]). Please refer to paddle.nn.functional.dropout2d for more details. Similarly, when x is a 5d tensor with shape NCDHW, where D is the depth of the feature map, we can set axis=[0,1] to perform dropout3d. Please refer to paddle.nn.functional.dropout3d for more details.
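
As a sketch of the 4d case (shapes only; which channels are dropped depends on the random state, and dropout2d is the dedicated API for this pattern):

>>> import paddle
>>> paddle.seed(2023)
>>> x4d = paddle.rand([2, 3, 4, 4])                                  # NCHW
>>> y_axes = paddle.nn.functional.dropout(x4d, p=0.5, axis=[0, 1])   # one mask value per (N, C) pair
>>> y_2d = paddle.nn.functional.dropout2d(x4d, p=0.5)                # channel-wise dropout
>>> print(y_axes.shape, y_2d.shape)
[2, 3, 4, 4] [2, 3, 4, 4]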

>>> import paddle
>>> paddle.seed(2023)
>>> x = paddle.to_tensor([[1,2,3], [4,5,6]]).astype(paddle.float32)
>>> y_train = paddle.nn.functional.dropout(x, 0.5)
>>> y_test = paddle.nn.functional.dropout(x, 0.5, training=False)
>>> y_0 = paddle.nn.functional.dropout(x, axis=0)
>>> y_1 = paddle.nn.functional.dropout(x, axis=1)
>>> y_01 = paddle.nn.functional.dropout(x, axis=[0,1])
>>> print(x)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
[[1., 2., 3.],
 [4., 5., 6.]])
>>> print(y_train)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
[[2., 4., 0.],
 [8., 0., 0.]])
>>> print(y_test)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
[[1., 2., 3.],
 [4., 5., 6.]])
>>> print(y_0)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
[[2. , 4. , 6. ],
 [8. , 10., 12.]])
>>> print(y_1)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
[[2. , 4. , 6. ],
 [8. , 10., 12.]])
>>> print(y_01)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
[[0., 0., 6.],
 [0., 0., 0.]])
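
As a usage note (a sketch, not part of the example above): inside a model it is often more convenient to use the layer form paddle.nn.Dropout, which switches between training and inference behavior through the layer's train()/eval() state instead of an explicit training argument:

>>> import paddle
>>> drop = paddle.nn.Dropout(p=0.5)
>>> x = paddle.to_tensor([[1., 2., 3.], [4., 5., 6.]])
>>> y_train = drop(x)   # random elements zeroed, the rest upscaled by 1 / (1 - p)
>>> drop.eval()         # switch the layer to inference behavior
>>> y_eval = drop(x)    # identical to x under the default 'upscale_in_train' mode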