# LayerNorm

class paddle.fluid.dygraph.LayerNorm(normalized_shape, scale=True, shift=True, epsilon=1e-05, param_attr=None, bias_attr=None, act=None, dtype='float32')[source]

This interface is used to construct a callable object of the LayerNorm class. For more details, refer to the code examples. It implements the Layer Normalization layer and can be applied to mini-batch input data. Refer to Layer Normalization.

The formula is as follows:

$$
\begin{aligned}
\mu &= \frac{1}{H}\sum_{i=1}^{H} x_i \\
\sigma &= \sqrt{\frac{1}{H}\sum_{i=1}^{H}{(x_i - \mu)^2} + \epsilon} \\
y &= f\left(\frac{g}{\sigma}(x - \mu) + b\right)
\end{aligned}
$$
• $$x$$: the vector representation of the summed inputs to the neurons in that layer.

• $$H$$: the number of hidden units in a layer.

• $$\epsilon$$: the small value added to the variance to prevent division by zero.

• $$g$$: the trainable scale parameter.

• $$b$$: the trainable bias parameter.
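The formula above can be sketched in plain numpy for a single vector of $$H$$ hidden units. This is an illustrative sketch, not Paddle code: the names are hypothetical, $$g$$ and $$b$$ are fixed to their initial values of 1 and 0, and the activation $$f$$ is the identity.

```python
import numpy as np

# Illustrative sketch of the LayerNorm formula (not the Paddle API).
# g and b are the trainable gain/bias, fixed here to their defaults.
H = 8
x = np.random.RandomState(0).rand(H).astype('float32')

mu = x.mean()                                   # mu = (1/H) * sum(x_i)
sigma = np.sqrt(((x - mu) ** 2).mean() + 1e-5)  # sqrt(variance + epsilon)
g, b = 1.0, 0.0
y = g / sigma * (x - mu) + b                    # f is the identity here

# The normalized output has approximately zero mean and unit variance.
print(y.mean(), y.std())
```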

Parameters
• normalized_shape (int or list or tuple) – Input shape from an expected input of size $$[*, normalized_shape[0], normalized_shape[1], ..., normalized_shape[-1]]$$. If it is a single integer, this module will normalize over the last dimension which is expected to be of that specific size.

• scale (bool, optional) – Whether to learn the adaptive gain $$g$$ after normalization. Default: True.

• shift (bool, optional) – Whether to learn the adaptive bias $$b$$ after normalization. Default: True.

• epsilon (float, optional) – The small value added to the variance to prevent division by zero. Default: 1e-05.

• param_attr (ParamAttr, optional) – The parameter attribute for the learnable gain $$g$$. If scale is False, param_attr is ignored. If scale is True and param_attr is None, a default ParamAttr is added as the scale, initialized to 1. Default: None.

• bias_attr (ParamAttr, optional) – The parameter attribute for the learnable bias $$b$$. If shift is False, bias_attr is ignored. If shift is True and bias_attr is None, a default ParamAttr is added as the bias, initialized to 0. Default: None.

• act (str, optional) – Activation to be applied to the output of layer normalization. Default: None.

• dtype (str, optional) – Data type, it can be “float32” or “float64”. Default: “float32”.
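To illustrate which axes normalized_shape selects, here is a hedged numpy sketch (not Paddle code): for an input of shape (3, 32, 32) and normalized_shape=[32, 32], the statistics are computed per sample over the trailing 32 × 32 elements.

```python
import numpy as np

# Illustrative sketch of normalized_shape semantics (not the Paddle API).
x = np.random.RandomState(1).rand(3, 32, 32).astype('float32')
normalized_shape = [32, 32]

# The last len(normalized_shape) axes are normalized: here axes (1, 2).
axes = tuple(range(x.ndim - len(normalized_shape), x.ndim))

mu = x.mean(axis=axes, keepdims=True)
var = x.var(axis=axes, keepdims=True)
y = (x - mu) / np.sqrt(var + 1e-5)

# Each of the 3 samples is normalized independently to ~zero mean.
print(y.mean(axis=axes))
```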

Returns

None

Examples

```python
import paddle.fluid as fluid
import numpy

x = numpy.random.random((3, 32, 32)).astype('float32')
with fluid.dygraph.guard():
    x = fluid.dygraph.to_variable(x)
    layerNorm = fluid.dygraph.LayerNorm([32, 32])
    ret = layerNorm(x)
```

forward(input)

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
• *inputs (tuple) – unpacked tuple arguments

• **kwargs (dict) – unpacked dict arguments