batch_norm

Note: This API is only available in [Static Graph] mode

paddle.fluid.layers.batch_norm(input, act=None, is_test=False, momentum=0.9, epsilon=1e-05, param_attr=None, bias_attr=None, data_layout='NCHW', in_place=False, name=None, moving_mean_name=None, moving_variance_name=None, do_model_average_for_mean_and_var=True, use_global_stats=False)

Batch Normalization Layer

Can be used as a normalizer function for convolution or fully_connected operations. The required data format for this layer is one of the following:

  1. NHWC [batch, in_height, in_width, in_channels]

  2. NCHW [batch, in_channels, in_height, in_width]
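The two layouts differ only in where the channel axis sits, which in turn determines the axes over which per-channel statistics are reduced. The following is a minimal NumPy sketch of that idea, not Paddle code; the helper name `channel_stats` is hypothetical and exists only for illustration.

```python
import numpy as np

def channel_stats(x, data_layout="NCHW"):
    """Return per-channel mean and biased variance of a 4-D array.

    Hypothetical helper for illustration; it is not part of Paddle.
    """
    if data_layout == "NCHW":
        reduce_axes = (0, 2, 3)  # batch, height, width
    elif data_layout == "NHWC":
        reduce_axes = (0, 1, 2)  # batch, height, width
    else:
        raise ValueError("data_layout must be 'NCHW' or 'NHWC'")
    return x.mean(axis=reduce_axes), x.var(axis=reduce_axes)

rng = np.random.default_rng(0)
x_nchw = rng.standard_normal((3, 7, 3, 7))    # [batch, channels, height, width]
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))   # same data, channels moved last

mean_nchw, var_nchw = channel_stats(x_nchw, "NCHW")
mean_nhwc, var_nhwc = channel_stats(x_nhwc, "NHWC")
```

Both calls should yield one value per channel (7 here), and the statistics agree because only the memory layout changed, not the data.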

Refer to Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift for more details.

\(input\) is the input features over a mini-batch.

\[ \begin{aligned} \mu_{\beta} &\gets \frac{1}{m} \sum_{i=1}^{m} x_i &&\text{// mini-batch mean} \\ \sigma_{\beta}^{2} &\gets \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_{\beta})^2 &&\text{// mini-batch variance} \\ \hat{x_i} &\gets \frac{x_i - \mu_{\beta}}{\sqrt{\sigma_{\beta}^{2} + \epsilon}} &&\text{// normalize} \\ y_i &\gets \gamma \hat{x_i} + \beta &&\text{// scale and shift} \end{aligned} \]

\[ \begin{aligned} moving\_mean &= moving\_mean * momentum + mini\text{-}batch\_mean * (1. - momentum) \\ moving\_var &= moving\_var * momentum + mini\text{-}batch\_var * (1. - momentum) \end{aligned} \]

moving_mean is the global mean and moving_var is the global variance.
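The training-mode computation can be sketched in NumPy as follows. This is an illustrative re-implementation of the formulas for an NCHW input, not Paddle's actual kernel; the function name `batch_norm_train` and the sample values are assumptions made for the example.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, moving_mean, moving_var,
                     momentum=0.9, epsilon=1e-5):
    """Training-mode batch norm on an NCHW array (illustrative sketch only)."""
    axes = (0, 2, 3)                      # reduce over batch, height, width
    mu = x.mean(axis=axes)                # mini-batch mean, one per channel
    var = x.var(axis=axes)                # mini-batch variance (1/m form)
    shape = (1, -1, 1, 1)                 # broadcast per-channel values
    x_hat = (x - mu.reshape(shape)) / np.sqrt(var.reshape(shape) + epsilon)
    y = gamma.reshape(shape) * x_hat + beta.reshape(shape)
    # Moving statistics blend the old value with the new mini-batch value.
    new_moving_mean = moving_mean * momentum + mu * (1.0 - momentum)
    new_moving_var = moving_var * momentum + var * (1.0 - momentum)
    return y, new_moving_mean, new_moving_var

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3, 5, 5)) * 2.0 + 1.0
gamma, beta = np.ones(3), np.zeros(3)
y, moving_mean, moving_var = batch_norm_train(
    x, gamma, beta, moving_mean=np.zeros(3), moving_var=np.ones(3))
```

With gamma set to 1 and beta to 0, each output channel should have a mean close to 0 and a variance close to 1, while the moving statistics shift only 10% of the way toward the mini-batch values.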

When use_global_stats = True, \(\mu_{\beta}\) and \(\sigma_{\beta}^{2}\) are not the statistics of one mini-batch. They are global (or running) statistics, usually obtained from a pre-trained model. Training and testing (or inference) then behave identically:

\[\begin{split}\hat{x_i} &\gets \frac{x_i - \mu_\beta} {\sqrt{\ \sigma_{\beta}^{2} + \epsilon}} \\ y_i &\gets \gamma \hat{x_i} + \beta\end{split}\]
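A small NumPy sketch of this mode, again as an illustration rather than Paddle's implementation: the function name `batch_norm_global_stats` is hypothetical, and the stored statistics are made-up values standing in for those of a pre-trained model. Because the statistics are fixed, the result for one sample does not depend on what else is in the batch.

```python
import numpy as np

def batch_norm_global_stats(x, gamma, beta, moving_mean, moving_var,
                            epsilon=1e-5):
    """Normalize an NCHW array with fixed (global) statistics. Sketch only."""
    shape = (1, -1, 1, 1)
    x_hat = (x - moving_mean.reshape(shape)) / np.sqrt(
        moving_var.reshape(shape) + epsilon)
    return gamma.reshape(shape) * x_hat + beta.reshape(shape)

# Made-up stored statistics for a 2-channel input (assumption for the example).
moving_mean = np.array([0.5, -1.0])
moving_var = np.array([4.0, 0.25])
gamma, beta = np.ones(2), np.zeros(2)

x_small = np.full((1, 2, 1, 1), 2.5)                        # batch of 1
x_large = np.concatenate([x_small, np.zeros((3, 2, 1, 1))])  # batch of 4

y_small = batch_norm_global_stats(x_small, gamma, beta, moving_mean, moving_var)
y_large = batch_norm_global_stats(x_large, gamma, beta, moving_mean, moving_var)
```

The first sample gets the same output in both batches, which is what makes the training and inference behavior match in this mode.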

Note

If build_strategy.sync_batch_norm=True, batch_norm layers in the network automatically use sync_batch_norm. is_test = True can only be used in test and inference programs; it CANNOT be set to True in a train program. To use global statistics from a pre-trained model in a train program, set use_global_stats = True instead.

Parameters
  • input (Variable) – The rank of the input variable can be 2, 3, 4, or 5. The data type is float16, float32, or float64.

  • act (string, Default None) – Activation type, linear|relu|prelu|…

  • is_test (bool, Default False) – A flag indicating whether it is in the test phase or not.

  • momentum (float|Variable, Default 0.9) – The value used for the moving_mean and moving_var computation. It should be a float or a Variable with shape [1] and data type float32. The update formulas are \(moving\_mean = moving\_mean * momentum + new\_mean * (1. - momentum)\) and \(moving\_var = moving\_var * momentum + new\_var * (1. - momentum)\), where new_mean and new_var are the mini-batch mean and variance.

  • epsilon (float, Default 1e-05) – A value added to the variance in the denominator for numerical stability.

  • param_attr (ParamAttr|None) – The parameter attribute for the scale parameter of batch_norm. If it is set to None or one attribute of ParamAttr, batch_norm will create a ParamAttr as param_attr. The name of the scale can be set in ParamAttr. If the Initializer of param_attr is not set, the parameter is initialized with Xavier. Default: None.

  • bias_attr (ParamAttr|None) – The parameter attribute for the bias of batch_norm. If it is set to None or one attribute of ParamAttr, batch_norm will create a ParamAttr as bias_attr. The name of the bias can be set in ParamAttr. If the Initializer of bias_attr is not set, the bias is initialized to zero. Default: None.

  • data_layout (str, optional) – Specify the data format of the input, and the data format of the output will be consistent with that of the input. An optional string from: “NCHW”, “NHWC”. The default is “NCHW”. When it is “NCHW”, the data is stored in the order of: [batch_size, input_channels, input_height, input_width].

  • in_place (bool, Default False) – Whether the input and output of batch_norm reuse the same memory.

  • name (str|None) – For detailed information, please refer to Name. Usually there is no need to set name; it is None by default.

  • moving_mean_name (str, Default None) – The name of the moving_mean, which stores the global mean. If it is set to None, batch_norm saves the global mean under a random name; otherwise, batch_norm saves the global mean under the given string.

  • moving_variance_name (str, Default None) – The name of the moving_variance, which stores the global variance. If it is set to None, batch_norm saves the global variance under a random name; otherwise, batch_norm saves the global variance under the given string.

  • do_model_average_for_mean_and_var (bool, Default True) – Whether the mean and variance parameters should take part in model averaging when model average is enabled.

  • use_global_stats (bool, Default False) – Whether to use the global mean and variance. In inference or test mode, setting use_global_stats to True and setting is_test to True are equivalent. In train mode, when use_global_stats is True, the global mean and variance are also used during training.

Returns

A Variable holding a Tensor that is the result of applying batch normalization to the input. It has the same shape and data type as the input.

Examples

import paddle.fluid as fluid
x = fluid.data(name='x', shape=[3, 7, 3, 7], dtype='float32')
hidden1 = fluid.layers.fc(input=x, size=200, param_attr='fc1.w')
hidden2 = fluid.layers.batch_norm(input=hidden1)

# batch_norm with momentum as Variable
import paddle.fluid as fluid
import paddle.fluid.layers.learning_rate_scheduler as lr_scheduler

def get_decay_momentum(momentum_init, decay_steps, decay_rate):
    global_step = lr_scheduler._decay_step_counter()
    momentum = fluid.layers.create_global_var(
        shape=[1],
        value=float(momentum_init),
        dtype='float32',
        # persistable, so the value is saved in checkpoints and restored on resume
        persistable=True,
        name="momentum")
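    # exponential decay: momentum_init * decay_rate ** (global_step / decay_steps)
    div_res = global_step / decay_steps
    decayed_momentum = momentum_init * (decay_rate**div_res)
    # write the decayed value back into the persistable momentum variable
    fluid.layers.assign(decayed_momentum, momentum)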

    return momentum

x = fluid.data(name='x', shape=[3, 7, 3, 7], dtype='float32')
hidden1 = fluid.layers.fc(input=x, size=200, param_attr='fc1.w')
momentum = get_decay_momentum(0.9, 1e5, 0.9)
hidden2 = fluid.layers.batch_norm(input=hidden1, momentum=momentum)
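
To get a feel for the schedule that get_decay_momentum builds in the graph, the same expression can be evaluated in plain Python. This is only a sketch that mirrors the formula above outside of Paddle; the function name `decayed_momentum` and the chosen step values are assumptions for illustration.

```python
def decayed_momentum(momentum_init, global_step, decay_steps, decay_rate):
    """Plain-Python mirror of the decay expression in get_decay_momentum."""
    return momentum_init * decay_rate ** (global_step / decay_steps)

# Evaluate the schedule at a few hypothetical training steps, using the
# same arguments as the example: get_decay_momentum(0.9, 1e5, 0.9).
steps = [0, 50_000, 100_000, 200_000]
schedule = [decayed_momentum(0.9, step, 1e5, 0.9) for step in steps]

for step, value in zip(steps, schedule):
    print(f"step {step:>7}: momentum = {value:.4f}")
```

With these arguments the momentum starts at 0.9 and is multiplied by 0.9 every 1e5 steps, so the moving statistics gradually weight recent mini-batches more heavily.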