class paddle.fluid.optimizer.ExponentialMovingAverage(decay=0.999, thres_steps=None, name=None) [source]

Static Graph

Compute the moving average of parameters with exponential decay. Given a parameter \(\theta\), its exponential moving average (EMA) is computed as

\[\begin{aligned}\text{EMA}_0 & = 0\\\text{EMA}_t & = \text{decay} * \text{EMA}_{t-1} + (1 - \text{decay}) * \theta_t\end{aligned}\]

The averages computed by the update() method are stored in temporary variables, which are created and maintained by the object, and can be applied to the parameters of the current model by calling the apply() method. The restore() method is used to restore the original parameters.

Bias correction. All EMAs are initialized to \(0\) and are therefore zero-biased; this can be corrected by dividing by the factor \((1 - \text{decay}^t)\), i.e., the actual EMAs applied to the parameters when calling the apply() method are

\[\widehat{\text{EMA}}_t = \frac{\text{EMA}_t}{1 - \text{decay}^t}\]
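The update and bias-correction formulas above can be sketched in plain NumPy, independent of the fluid API. A constant parameter is used purely for illustration: for a constant \(\theta\), the biased EMA equals \((1 - \text{decay}^t)\,\theta\), so the corrected EMA recovers \(\theta\) exactly.

```python
import numpy as np

decay = 0.999
theta = np.full(5, 2.5, dtype=np.float32)  # a constant parameter, for illustration

# EMA_0 = 0, which introduces the zero bias discussed above
ema = np.zeros_like(theta)
for t in range(1, 11):
    # EMA_t = decay * EMA_{t-1} + (1 - decay) * theta_t
    ema = decay * ema + (1 - decay) * theta

# bias correction: divide by (1 - decay^t) with t = 10
corrected = ema / (1 - decay ** 10)
```

After 10 steps the raw `ema` is still close to zero, while `corrected` matches `theta`, which is why apply() uses the corrected values.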

Decay rate scheduling. A large decay rate very close to 1 causes the averages to move very slowly, so a better strategy is to use a relatively smaller decay rate at the beginning of training. The argument thres_steps allows users to pass a Variable to schedule the decay rate; in that case, the actual decay rate becomes

\[\min\left(\text{decay}, \frac{1 + \text{thres\_steps}}{10 + \text{thres\_steps}}\right)\]

Usually thres_steps can be the global training steps.
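The scheduled rate above can be written as a small helper to see how it behaves; `scheduled_decay` is a hypothetical illustration here, not part of the fluid API:

```python
def scheduled_decay(decay, thres_steps):
    """Actual decay rate used when thres_steps is passed:
    min(decay, (1 + thres_steps) / (10 + thres_steps))."""
    return min(decay, (1.0 + thres_steps) / (10.0 + thres_steps))

# Early in training the scheduled rate is small, so the averages track the
# parameters quickly; as thres_steps grows, the rate saturates at `decay`.
early = scheduled_decay(0.999, 0)        # 0.1
late = scheduled_decay(0.999, 10**6)     # 0.999
```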

  • decay (float, optional) – The exponential decay rate, usually close to 1, such as 0.999, 0.9999, … . Default 0.999.

  • thres_steps (Variable|None) – If not None, schedule the decay rate. Default None.

  • name (str|None) – For detailed information, please refer to Name. Usually name does not need to be set and is None by default.


import numpy
import paddle
import paddle.fluid as fluid

data = fluid.data(name='x', shape=[-1, 5], dtype='float32')
hidden = fluid.layers.fc(input=data, size=10)
cost = fluid.layers.mean(hidden)

test_program = fluid.default_main_program().clone(for_test=True)

optimizer = fluid.optimizer.Adam(learning_rate=0.001)
optimizer.minimize(cost)

global_steps = fluid.layers.autoincreased_step_counter()
ema = fluid.optimizer.ExponentialMovingAverage(0.999, thres_steps=global_steps)
ema.update()

place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

for pass_id in range(3):
    for batch_id in range(6):
        data = numpy.random.random(size=(10, 5)).astype('float32')
        exe.run(program=fluid.default_main_program(),
                feed={'x': data},
                fetch_list=[cost.name])

    # usage 1
    with ema.apply(exe):
        data = numpy.random.random(size=(10, 5)).astype('float32')
        exe.run(program=test_program,
                feed={'x': data},
                fetch_list=[hidden.name])

    # usage 2
    with ema.apply(exe, need_restore=False):
        data = numpy.random.random(size=(10, 5)).astype('float32')
        exe.run(program=test_program,
                feed={'x': data},
                fetch_list=[hidden.name])
    ema.restore(exe)

update()

Update the exponential moving averages. This method should only be called in the train program.

apply(executor, need_restore=True)

Apply moving average to parameters for evaluation.

  • executor (Executor) – The Executor to execute applying.

  • need_restore (bool, optional) – Whether to restore the original parameters after applying. Default True.

restore(executor)

Restore the original parameters.


  • executor (Executor) – The Executor to execute restoring.