fused_bias_dropout_residual_layer_norm

paddle.incubate.nn.functional.fused_bias_dropout_residual_layer_norm(x, residual, bias=None, ln_scale=None, ln_bias=None, dropout_rate=0.5, ln_epsilon=1e-05, training=True, mode='upscale_in_train', name=None) [source]

The fused_bias_dropout_residual_layer_norm operator fuses the bias add, dropout, residual add, and layer normalization into one operator. Its pseudocode is as follows:

y = layer_norm(residual + dropout(bias + x))
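
For reference, the unfused equivalent can be written with standard Paddle ops. A minimal sketch, assuming the default 'upscale_in_train' dropout mode (the helper name unfused_reference is hypothetical):

import paddle
import paddle.nn.functional as F

def unfused_reference(x, residual, bias=None, ln_scale=None, ln_bias=None,
                      dropout_rate=0.5, ln_epsilon=1e-05, training=True):
    # bias add (skipped when bias is None)
    h = x + bias if bias is not None else x
    # dropout on (bias + x)
    h = F.dropout(h, p=dropout_rate, training=training, mode='upscale_in_train')
    # residual add, then layer normalization over the last dimension
    return F.layer_norm(residual + h, normalized_shape=h.shape[-1:],
                        weight=ln_scale, bias=ln_bias, epsilon=ln_epsilon)
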
Parameters
  • x (Tensor) – The input tensor. The shape is [*, embed_dim].

  • residual (Tensor) – The residual tensor. Its shape is the same as that of x.

  • bias (Tensor, optional) – The bias of the preceding linear layer. The shape is [embed_dim]. Default None.

  • ln_scale (Tensor, optional) – The weight tensor of layer_norm. The shape is [embed_dim]. Default None.

  • ln_bias (Tensor, optional) – The bias tensor of layer_norm. The shape is [embed_dim]. Default None.

  • dropout_rate (float, optional) – The dropout probability applied to the result of the bias add (bias + x), before the residual connection. 0 for no dropout. Default 0.5.

  • ln_epsilon (float, optional) – A small float value added to the denominator of layer_norm to avoid division by zero. Default is 1e-5.

  • training (bool, optional) – A flag indicating whether it is in the training phase or not. Default True.

  • mode (str, optional) –

    ['upscale_in_train' (default) | 'downscale_in_infer']

    1. upscale_in_train (default): scale the kept values up at training time

      • train: out = input * mask / (1.0 - p)

      • inference: out = input

    2. downscale_in_infer: scale the output down at inference time

      • train: out = input * mask

      • inference: out = input * (1.0 - p)

    A worked illustration of the two modes follows this parameter list.

  • name (str, optional) – Name for the operation (optional, default is None). For more information, please refer to Name.
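
A minimal sketch illustrating the two dropout modes with plain paddle.nn.functional.dropout (the standalone tensor and the probability value here are illustrative assumptions, not part of this API):

import paddle
import paddle.nn.functional as F

x = paddle.ones([4])
p = 0.5

# upscale_in_train: kept values are scaled by 1 / (1.0 - p) at train time,
# so the expected value matches inference, where dropout is the identity.
train_up = F.dropout(x, p=p, training=True, mode='upscale_in_train')   # entries are 0.0 or 2.0
infer_up = F.dropout(x, p=p, training=False, mode='upscale_in_train')  # equals x

# downscale_in_infer: kept values pass through unscaled at train time,
# and the whole output is scaled by (1.0 - p) at inference.
train_down = F.dropout(x, p=p, training=True, mode='downscale_in_infer')   # entries are 0.0 or 1.0
infer_down = F.dropout(x, p=p, training=False, mode='downscale_in_infer')  # equals x * 0.5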

Returns

Tensor. The output tensor; its data type and shape are the same as those of x.

Examples

# required: gpu
import paddle
import paddle.incubate.nn.functional as F

# input: [batch_size, seq_len, embed_dim]
x = paddle.rand(shape=(2, 4, 128), dtype="float32")
# residual: [batch_size, seq_len, embed_dim]
residual = paddle.rand(shape=(2, 4, 128), dtype="float32")
# linear bias: [embed_dim]
bias = paddle.rand(shape=[128], dtype="float32")
# output: [batch_size, seq_len, embed_dim]
output = F.fused_bias_dropout_residual_layer_norm(
    x, residual, bias)
print(output.shape)
# [2, 4, 128]
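
For inference, pass training=False; with the default 'upscale_in_train' mode, dropout becomes the identity, so the result is layer_norm(residual + bias + x). A usage sketch based on the parameters documented above:

output_infer = F.fused_bias_dropout_residual_layer_norm(
    x, residual, bias, training=False)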