fused_bias_dropout_residual_layer_norm

paddle.incubate.nn.functional.fused_bias_dropout_residual_layer_norm(x, residual, bias=None, ln_scale=None, ln_bias=None, dropout_rate=0.5, ln_epsilon=1e-05, training=True, mode='upscale_in_train', name=None)

The fused_bias_dropout_residual_layer_norm operator. The pseudocode is as follows:

>>> y = layer_norm(residual + dropout(bias + x))
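
For reference, a minimal unfused sketch of the same computation, built from the standard paddle.nn.functional dropout and layer_norm ops. This is an illustrative approximation of what the fused kernel computes, not its implementation; the helper name unfused_reference and the alias nn_F are hypothetical:

>>> import paddle
>>> import paddle.nn.functional as nn_F
>>> def unfused_reference(x, residual, bias=None, ln_scale=None, ln_bias=None,
...                       dropout_rate=0.5, ln_epsilon=1e-05, training=True,
...                       mode='upscale_in_train'):
...     # add the linear bias (skipped when bias is None)
...     out = x if bias is None else x + bias
...     # apply dropout to the biased input
...     out = nn_F.dropout(out, p=dropout_rate, training=training, mode=mode)
...     # residual connection followed by layer normalization over embed_dim
...     return nn_F.layer_norm(residual + out, normalized_shape=x.shape[-1],
...                            weight=ln_scale, bias=ln_bias, epsilon=ln_epsilon)
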
Parameters
  • x (Tensor) – The input tensor. The shape is [*, embed_dim].

  • residual (Tensor) – The residual tensor. The shape is the same as that of x.

  • bias (Tensor, optional) – The bias of the linear layer. The shape is [embed_dim]. Default None.

  • ln_scale (Tensor, optional) – The weight tensor of layernorm. The shape is [embed_dim]. Default None.

  • ln_bias (Tensor, optional) – The bias tensor of layernorm. The shape is [embed_dim]. Default None.

  • dropout_rate (float, optional) – The dropout probability applied to the biased input (bias + x) before the residual add and layer normalization. 0 for no dropout. Default 0.5.

  • ln_epsilon (float, optional) – Small float value added to the denominator of layer_norm to avoid dividing by zero. Default is 1e-5.

  • training (bool, optional) – A flag indicating whether it is in the training phase or not. Default True.

  • mode (str, optional) –

    ['upscale_in_train' (default) | 'downscale_in_infer'], see the sketch after this parameter list.

    1. upscale_in_train (default), upscale the output at training time

      • train: out = input * mask / (1.0 - p)

      • inference: out = input

    2. downscale_in_infer, downscale the output at inference time

      • train: out = input * mask

      • inference: out = input * (1.0 - p)

  • name (str, optional) – Name for the operation (optional, default is None). For more information, please refer to Name.

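As an illustration of the two dropout scaling modes above, here is a minimal sketch using the standalone paddle.nn.functional.dropout, which follows the same convention; the tensor values are only illustrative:

>>> import paddle
>>> x = paddle.ones([4])
>>> # upscale_in_train: kept elements are scaled by 1/(1 - p) during training,
>>> # and the input passes through unchanged at inference
>>> paddle.nn.functional.dropout(x, p=0.5, training=True, mode='upscale_in_train')
>>> paddle.nn.functional.dropout(x, p=0.5, training=False, mode='upscale_in_train')
>>> # downscale_in_infer: elements are only masked during training,
>>> # and the whole output is scaled by (1 - p) at inference
>>> paddle.nn.functional.dropout(x, p=0.5, training=True, mode='downscale_in_infer')
>>> paddle.nn.functional.dropout(x, p=0.5, training=False, mode='downscale_in_infer')
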
Returns

Tensor, the output Tensor, whose data type and shape are the same as those of x.

Examples

>>> import paddle
>>> paddle.device.set_device('gpu')
>>> import paddle.incubate.nn.functional as F

>>> # input: [batch_size, seq_len, embed_dim]
>>> x = paddle.rand(shape=(2, 4, 128), dtype="float32")
>>> # residual: [batch_size, seq_len, embed_dim]
>>> residual = paddle.rand(shape=(2, 4, 128), dtype="float32")
>>> # linear bias: [embed_dim]
>>> bias = paddle.rand(shape=[128], dtype="float32")
>>> # output: [batch_size, seq_len, embed_dim]
>>> output = F.fused_bias_dropout_residual_layer_norm(
...     x, residual, bias)
>>> print(output.shape)
[2, 4, 128]
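
Dropout can be disabled for deterministic inference through the training flag. A minimal follow-up sketch reusing the inputs above (the dropout_rate value is only illustrative):

>>> # disable dropout at inference time; with the default mode the op then
>>> # reduces to layer_norm(residual + bias + x)
>>> eval_output = F.fused_bias_dropout_residual_layer_norm(
...     x, residual, bias, dropout_rate=0.1, training=False)
>>> print(eval_output.shape)
[2, 4, 128]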