fused_bias_dropout_residual_layer_norm
paddle.incubate.nn.functional.fused_bias_dropout_residual_layer_norm(x, residual, bias=None, ln_scale=None, ln_bias=None, dropout_rate=0.5, ln_epsilon=1e-05, training=True, mode='upscale_in_train', name=None) [source]
The fused_bias_dropout_residual_layer_norm operator fuses a bias add, dropout, residual add, and layer normalization into a single operator. The pseudo code is as follows:
>>> y = layer_norm(residual + dropout(bias + x))
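For illustration only, here is a minimal unfused sketch of the same computation built from standard Paddle ops (paddle.nn.functional.dropout and paddle.nn.functional.layer_norm); the helper name unfused_reference is ours, and the fused operator computes the equivalent result in a single fused pass:

>>> import paddle
>>> import paddle.nn.functional as F
>>> def unfused_reference(x, residual, bias=None, ln_scale=None, ln_bias=None,
...                       dropout_rate=0.5, ln_epsilon=1e-05, training=True,
...                       mode='upscale_in_train'):
...     # bias add (skipped when bias is None)
...     out = x if bias is None else x + bias
...     # dropout with the same rate, phase and rescaling mode as the fused op
...     out = F.dropout(out, p=dropout_rate, training=training, mode=mode)
...     # residual add, then layer normalization over the last (embed_dim) axis
...     out = residual + out
...     return F.layer_norm(out, normalized_shape=out.shape[-1],
...                         weight=ln_scale, bias=ln_bias, epsilon=ln_epsilon)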
Parameters
x (Tensor) – The input tensor. The shape is [*, embed_dim].
residual (Tensor) – The residual tensor. The shape is the same as x.
bias (Tensor, optional) – The bias of the linear layer, added to x. The shape is [embed_dim]. Default None.
ln_scale (Tensor, optional) – The weight tensor of layernorm. The shape is [embed_dim]. Default None.
ln_bias (Tensor, optional) – The bias tensor of layernorm. The shape is [embed_dim]. Default None.
dropout_rate (float, optional) – The dropout probability applied to the result of the bias add (bias + x) before the residual add. 0 for no dropout. Default 0.5.
ln_epsilon (float, optional) – Small float value added to denominator of layer_norm to avoid dividing by zero. Default is 1e-5.
training (bool, optional) – A flag indicating whether it is in the training phase or not. Default True.
mode (str, optional) –
['upscale_in_train' (default) | 'downscale_in_infer'], controlling how the dropout output is rescaled; see the sketch after this parameter list.
upscale_in_train (default), upscale the output at training time
train: out = input * mask / (1.0 - p)
inference: out = input
downscale_in_infer, downscale the output at inference time
train: out = input * mask
inference: out = input * (1.0 - p)
name (str, optional) – Name for the operation (optional, default is None). For more information, please refer to Name.
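The mode semantics can be illustrated with the standalone paddle.nn.functional.dropout, which follows the same upscale/downscale convention as the formulas above; the inference-time behavior is deterministic, so it can be checked directly (a sketch, not specific to the fused kernel):

>>> import paddle
>>> import paddle.nn.functional as F
>>> x = paddle.ones([4], dtype="float32")
>>> # upscale_in_train: inference passes the input through unchanged
>>> out_up = F.dropout(x, p=0.5, training=False, mode='upscale_in_train')
>>> # downscale_in_infer: inference scales the input by (1.0 - p)
>>> out_down = F.dropout(x, p=0.5, training=False, mode='downscale_in_infer')
>>> print(paddle.allclose(out_up, x).item(), paddle.allclose(out_down, 0.5 * x).item())
True True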
Returns
Tensor, the output Tensor; the data type and shape are the same as x.
Examples
>>> import paddle
>>> paddle.device.set_device('gpu')
>>> import paddle.incubate.nn.functional as F
>>> # input: [batch_size, seq_len, embed_dim]
>>> x = paddle.rand(shape=(2, 4, 128), dtype="float32")
>>> # residual: [batch_size, seq_len, embed_dim]
>>> residual = paddle.rand(shape=(2, 4, 128), dtype="float32")
>>> # linear bias: [embed_dim]
>>> bias = paddle.rand(shape=[128], dtype="float32")
>>> # output: [batch_size, seq_len, embed_dim]
>>> output = F.fused_bias_dropout_residual_layer_norm(
...     x, residual, bias)
>>> print(output.shape)
[2, 4, 128]
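A slightly fuller call, reusing the tensors above and passing explicit layer-norm affine parameters and a custom dropout rate (a sketch based on the signature above; the specific values are only illustrative):

>>> ln_scale = paddle.ones([128], dtype="float32")
>>> ln_bias = paddle.zeros([128], dtype="float32")
>>> output = F.fused_bias_dropout_residual_layer_norm(
...     x, residual, bias,
...     ln_scale=ln_scale, ln_bias=ln_bias,
...     dropout_rate=0.1, ln_epsilon=1e-05,
...     training=True, mode='upscale_in_train')
>>> print(output.shape)
[2, 4, 128]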