LayerNorm

class paddle.fluid.dygraph.LayerNorm(normalized_shape, scale=True, shift=True, epsilon=1e-05, param_attr=None, bias_attr=None, act=None, dtype='float32')[source]

This interface is used to construct a callable object of the LayerNorm class. For more details, refer to code examples. It implements the function of the Layer Normalization Layer and can be applied to mini-batch input data. Refer to Layer Normalization

The formula is as follows:

\[
\begin{aligned}
\mu &= \frac{1}{H}\sum_{i=1}^{H} x_i \\
\sigma &= \sqrt{\frac{1}{H}\sum_{i=1}^{H}{(x_i - \mu)^2} + \epsilon} \\
y &= f\left(\frac{g}{\sigma}(x - \mu) + b\right)
\end{aligned}
\]
  • \(x\): the vector representation of the summed inputs to the neurons in that layer.

  • \(H\): the number of hidden units in a layer.

  • \(\epsilon\): the small value added to the variance to prevent division by zero.

  • \(g\): the trainable scale parameter.

  • \(b\): the trainable bias parameter.
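The formula above can be checked with a short NumPy sketch (the helper `layer_norm` below is hypothetical, written for illustration only — it is not part of the Paddle API). It computes \(\mu\) and \(\sigma\) over the last axis and applies the gain \(g\) and bias \(b\), with \(f\) taken as the identity (i.e. `act=None`):

```python
import numpy as np

def layer_norm(x, g, b, epsilon=1e-05):
    # mu and sigma are computed over the normalized (last) axis
    mu = x.mean(axis=-1, keepdims=True)
    sigma = np.sqrt(((x - mu) ** 2).mean(axis=-1, keepdims=True) + epsilon)
    # y = g * (x - mu) / sigma + b, with f = identity (act=None)
    return g * (x - mu) / sigma + b

x = np.random.random((3, 8)).astype('float32')
y = layer_norm(x, g=np.ones(8, 'float32'), b=np.zeros(8, 'float32'))
# with g = 1 and b = 0, each row of y has (approximately)
# zero mean and unit variance
```

With the default `g = 1` and `b = 0`, the output statistics per row are approximately zero mean and unit variance; `epsilon` keeps the denominator from vanishing for near-constant inputs.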

Parameters
  • normalized_shape (int or list or tuple) – Input shape from an expected input of size \([*, normalized_shape[0], normalized_shape[1], ..., normalized_shape[-1]]\). If it is a single integer, this module will normalize over the last dimension which is expected to be of that specific size.

  • scale (bool, optional) – Whether to learn the adaptive gain \(g\) after normalization. Default: True.

  • shift (bool, optional) – Whether to learn the adaptive bias \(b\) after normalization. Default: True.

  • epsilon (float, optional) – The small value added to the variance to prevent division by zero. Default: 1e-05.

  • param_attr (ParamAttr, optional) – The parameter attribute for the learnable gain \(g\). If scale is False, param_attr is ignored. If scale is True and param_attr is None, a default ParamAttr is created as the scale parameter and initialized to 1. Default: None.

  • bias_attr (ParamAttr, optional) – The parameter attribute for the learnable bias \(b\). If shift is False, bias_attr is ignored. If shift is True and bias_attr is None, a default ParamAttr is created as the bias parameter and initialized to 0. Default: None.

  • act (str, optional) – Activation to be applied to the output of layer normalization. Default: None.

  • dtype (str, optional) – Data type, it can be “float32” or “float64”. Default: “float32”.
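As a sketch of how normalized_shape selects the normalization axes (a NumPy analogue for illustration, not the Paddle call itself): for an input of shape `(3, 32, 32)` with `normalized_shape=[32, 32]`, statistics are computed over the trailing `32 * 32` elements of each sample, so each of the 3 samples gets its own mean and variance:

```python
import numpy as np

x = np.random.random((3, 32, 32)).astype('float32')
normalized_shape = [32, 32]

# the trailing axes named by normalized_shape are the ones normalized over
axes = tuple(range(x.ndim - len(normalized_shape), x.ndim))  # (1, 2)
mu = x.mean(axis=axes, keepdims=True)   # one mean per sample
var = x.var(axis=axes, keepdims=True)   # one variance per sample
y = (x - mu) / np.sqrt(var + 1e-05)
# each of the 3 samples is normalized over its own 32*32 entries
```

Passing a single integer instead, e.g. `normalized_shape=32`, would normalize over the last dimension only, giving one mean/variance per row rather than per sample.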

Returns

None

Examples

import paddle.fluid as fluid
from paddle.fluid.dygraph.base import to_variable
import numpy

x = numpy.random.random((3, 32, 32)).astype('float32')
with fluid.dygraph.guard():
    x = to_variable(x)
    # normalize each sample over its trailing [32, 32] dimensions
    layerNorm = fluid.dygraph.LayerNorm([32, 32])
    ret = layerNorm(x)

forward(input)

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) – unpacked tuple arguments

  • **kwargs (dict) – unpacked dict arguments