A neural network is, in essence, an optimization problem. After forward computation and back propagation, an Optimizer uses the back-propagated gradients to update the parameters of the network.


1. SGD/SGDOptimizer

SGD is a subclass of Optimizer that implements Stochastic Gradient Descent, a variant of gradient descent. When training on a large number of samples, SGD is usually chosen so that the loss function converges more quickly.

API Reference: api_fluid_optimizer_SGDOptimizer
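
As a rough illustration of the idea (not the library's implementation), the sketch below runs mini-batch stochastic gradient descent on a one-parameter least-squares problem in plain Python. The function name sgd_fit, the toy data, and all hyperparameters are assumptions made for this example.

```python
import random


def sgd_fit(xs, ys, lr=0.05, epochs=200, batch_size=2, seed=0):
    """Fit y ~ w * x by mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of mean((w * x - y)^2), estimated on the mini-batch only.
            grad = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad  # plain SGD step
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]  # toy data generated by y = 3 * x, without noise
w = sgd_fit(xs, ys)
```

Each update looks at only two samples, yet on this toy data the estimate w approaches the true slope 3.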


2. Momentum/MomentumOptimizer

The Momentum optimizer adds momentum on top of SGD, which reduces the noise inherent in stochastic gradient descent. Setting use_nesterov to False or True selects the classical Momentum algorithm (Section 4.1 of the paper) or the Nesterov accelerated gradient algorithm (Section 4.2 of the paper), respectively.

API Reference: api_fluid_optimizer_MomentumOptimizer
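
The two variants can be sketched for a single scalar parameter as follows. This is one common formulation of the update, written in plain Python for illustration; the function names, the test function f(w) = w**2, and the default values of lr and mu are assumptions, and only the flag name use_nesterov comes from the text above.

```python
def momentum_step(param, grad, velocity, lr=0.1, mu=0.9, use_nesterov=False):
    """Return (new_param, new_velocity) for one momentum update."""
    velocity = mu * velocity + grad  # accumulate past gradients
    if use_nesterov:
        # Nesterov variant: combine the gradient with a look-ahead along the velocity.
        param = param - lr * (grad + mu * velocity)
    else:
        # Classical momentum: step along the accumulated velocity.
        param = param - lr * velocity
    return param, velocity


def minimize(use_nesterov, steps=500):
    """Minimize f(w) = w**2 starting from w = 5."""
    w, v = 5.0, 0.0
    for _ in range(steps):
        grad = 2.0 * w
        w, v = momentum_step(w, grad, v, use_nesterov=use_nesterov)
    return w


w_classic = minimize(use_nesterov=False)
w_nesterov = minimize(use_nesterov=True)
```

Both runs end close to the minimum at 0; the only difference between them is the branch selected by use_nesterov.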

3. Adagrad/AdagradOptimizer

The Adagrad optimizer adaptively assigns a different learning rate to each parameter, which addresses the problem that different parameters are updated by different numbers of samples.

API Reference: api_fluid_optimizer_AdagradOptimizer
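
A minimal sketch of the per-parameter behaviour, assuming the standard Adagrad rule (divide the step by the square root of the accumulated squared gradients). The function name adagrad_step, the two toy parameters, and the hyperparameter values are assumptions made for illustration.

```python
import math


def adagrad_step(param, grad, moment, lr=0.1, epsilon=1e-6):
    """Return (new_param, new_moment) for one Adagrad update on a scalar."""
    moment = moment + grad * grad  # running sum of squared gradients
    param = param - lr * grad / (math.sqrt(moment) + epsilon)
    return param, moment


# Two parameters: one receives a gradient at every step, the other only rarely.
frequent, rare = 0.0, 0.0
m_frequent, m_rare = 0.0, 0.0
for step in range(100):
    frequent, m_frequent = adagrad_step(frequent, 1.0, m_frequent)
    if step % 10 == 0:  # this parameter sees a sample only every 10th step
        rare, m_rare = adagrad_step(rare, 1.0, m_rare)

# Step size each parameter would use for its next unit gradient.
eff_frequent = 0.1 / (math.sqrt(m_frequent) + 1e-6)
eff_rare = 0.1 / (math.sqrt(m_rare) + 1e-6)
```

The rarely updated parameter keeps a larger effective step size than the frequently updated one, which is the adaptive allocation described above.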


4. RMSProp/RMSPropOptimizer

The RMSProp optimizer is a method that adaptively adjusts the learning rate. It mainly addresses the sharp drop in learning rate that Adagrad exhibits in the middle and late stages of training.

API Reference: api_fluid_optimizer_RMSPropOptimizer
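
To make the contrast with Adagrad concrete, the sketch below feeds a constant unit gradient to both accumulators and compares the resulting effective step sizes. It assumes the basic RMSProp rule (an exponential moving average of squared gradients, without momentum or centering); the function name rmsprop_step and the values of lr, rho, and epsilon are assumptions.

```python
import math


def rmsprop_step(param, grad, mean_square, lr=0.01, rho=0.9, epsilon=1e-6):
    """Return (new_param, new_mean_square) for one basic RMSProp update."""
    # Moving average of squared gradients: old history is gradually forgotten.
    mean_square = rho * mean_square + (1.0 - rho) * grad * grad
    param = param - lr * grad / math.sqrt(mean_square + epsilon)
    return param, mean_square


adagrad_sum, mean_square = 0.0, 0.0
w = 0.0
for _ in range(1000):
    adagrad_sum += 1.0  # Adagrad keeps summing squared gradients forever
    w, mean_square = rmsprop_step(w, 1.0, mean_square)

# Effective step sizes after 1000 steps with the same base learning rate 0.01.
adagrad_rate = 0.01 / math.sqrt(adagrad_sum)
rmsprop_rate = 0.01 / math.sqrt(mean_square + 1e-6)
```

After 1000 identical gradients the Adagrad step has shrunk by a factor of about 30, while the RMSProp step stays near the base learning rate.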


5. Adam/AdamOptimizer

The Adam optimizer is a method that adaptively adjusts the learning rate. It is suitable for most non-convex optimization problems, large data sets, and high-dimensional parameter spaces. Adam is one of the most commonly used optimization algorithms.

API Reference: api_fluid_optimizer_AdamOptimizer
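
The following is a minimal scalar sketch of the Adam update with bias correction, as commonly stated. The function name adam_minimize, the quadratic test problem, and the hyperparameter values are assumptions for illustration and are not taken from the API.

```python
import math


def adam_minimize(grad_fn, w0, lr=0.1, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, steps=500):
    """Minimize a scalar function with Adam, given its gradient grad_fn."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1.0 - beta1) * g      # first moment (mean of gradients)
        v = beta2 * v + (1.0 - beta2) * g * g  # second moment (mean of squares)
        m_hat = m / (1.0 - beta1 ** t)         # bias correction for the zero start
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + epsilon)
    return w


# Gradient of f(w) = (w - 3)**2, whose minimum is at w = 3.
w = adam_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
w_first = adam_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0, steps=1)
```

Because the update divides by the root of the second moment, the very first step has a size close to lr regardless of how large the gradient is.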


6. Adamax/AdamaxOptimizer

Adamax is a variant of the Adam algorithm that gives a simpler bound on the learning rate, in particular its upper bound.

API Reference: api_fluid_optimizer_AdamaxOptimizer
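
A hedged sketch of why the bound is simple: Adamax replaces Adam's second-moment estimate with a running maximum of gradient magnitudes, so each step is roughly capped by the learning rate. The code follows the update rule as usually given for Adamax; the small epsilon in the denominator, the function name adamax_step, and the gradient sequence are assumptions.

```python
def adamax_step(param, grad, moment, inf_norm, t,
                lr=0.002, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Return (new_param, new_moment, new_inf_norm) for step t (t starts at 1)."""
    moment = beta1 * moment + (1.0 - beta1) * grad
    inf_norm = max(beta2 * inf_norm, abs(grad))  # decayed running maximum of |grad|
    lr_t = lr / (1.0 - beta1 ** t)               # bias correction for the moment
    param = param - lr_t * moment / (inf_norm + epsilon)
    return param, moment, inf_norm


# Gradients whose magnitudes differ by many orders of magnitude.
grads = [1000.0, -0.5, 3.0, -1e6, 0.001]
w, m, u = 0.0, 0.0, 0.0
largest_step = 0.0
for t, g in enumerate(grads, start=1):
    new_w, m, u = adamax_step(w, g, m, u, t)
    largest_step = max(largest_step, abs(new_w - w))
    w = new_w
```

Even with gradients ranging from 0.001 to one million, no single update in this run moves the parameter by noticeably more than lr.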


7. DecayedAdagrad

The DecayedAdagrad optimizer can be regarded as Adagrad combined with a decay rate, which addresses the sharp drop in learning rate in the middle and late stages of training.

API Reference: api_fluid_optimizer_DecayedAdagrad

8. Ftrl/FtrlOptimizer

FtrlOptimizer combines the high accuracy of the FOBOS algorithm with the sparsity of the RDA algorithm. It is an online learning algorithm that gives very good results in practice.

API Reference: api_fluid_optimizer_FtrlOptimizer


9. ModelAverage

The ModelAverage optimizer accumulates historical parameters over a sliding window during training. The averaged parameters are then used at inference time to improve overall inference accuracy.

API Reference: api_fluid_optimizer_ModelAverage
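
The sliding-window idea can be sketched in a few lines of plain Python. The class below stores the most recent parameter snapshots and averages them on request; it is an illustrative assumption, not how ModelAverage is implemented (a real implementation would more likely keep running sums than full snapshots). The class name and window size are hypothetical.

```python
from collections import deque


class SlidingWindowAverage:
    """Keep the last `window` parameter snapshots and return their mean."""

    def __init__(self, window):
        self.history = deque(maxlen=window)  # old snapshots drop out automatically

    def update(self, params):
        self.history.append(list(params))    # store a copy of the current parameters

    def average(self):
        n = len(self.history)
        return [sum(values) / n for values in zip(*self.history)]


averager = SlidingWindowAverage(window=3)
for step_params in ([1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]):
    averager.update(step_params)  # called once after each training step

averaged = averager.average()     # mean of the last three snapshots
```

At inference time the averaged values would be used in place of the parameters from the final training step.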



10. Rprop

The Rprop optimizer is motivated by the observation that gradient magnitudes can differ greatly between weight parameters, which makes it hard to find a single global step size. It therefore speeds up optimization by adjusting the step size of each parameter dynamically, using only the sign of that parameter's gradient.

API Reference: api_fluid_optimizer_Rprop
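
A minimal sketch of a sign-based update, assuming a common Rprop variant that skips the update right after a sign flip. The function names, the growth and shrink factors, and the step-size limits are assumptions chosen for illustration.

```python
def rprop_step(param, grad, prev_grad, step,
               eta_minus=0.5, eta_plus=1.2, step_min=1e-6, step_max=1.0):
    """One Rprop update for a scalar; returns (param, grad_to_remember, step)."""
    agreement = grad * prev_grad
    if agreement > 0:    # same sign as last time: increase the step size
        step = min(step * eta_plus, step_max)
    elif agreement < 0:  # sign flipped: the minimum was overshot, shrink the step
        step = max(step * eta_minus, step_min)
        grad = 0.0       # skip this update and forget the gradient
    if grad > 0:
        param -= step    # only the sign of the gradient decides the direction
    elif grad < 0:
        param += step
    return param, grad, step


def minimize(scale, steps=200):
    """Minimize f(w) = scale * (w - 3)**2 starting from w = 0."""
    w, prev_grad, step = 0.0, 0.0, 0.1
    for _ in range(steps):
        grad = 2.0 * scale * (w - 3.0)
        w, prev_grad, step = rprop_step(w, grad, prev_grad, step)
    return w


w_steep = minimize(scale=1000.0)
w_flat = minimize(scale=0.001)
```

The two runs differ in gradient magnitude by a factor of a million, yet they follow the same trajectory, because the update never looks at the magnitude.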



11. ASGD

The ASGD optimizer is a variant of SGD that trades space for time: it is a stochastic optimization method with trajectory averaging. On top of SGD, ASGD keeps an average of historical parameters, so that the variance of the noise in the descent direction gradually decreases and the algorithm eventually converges to the optimum at a linear rate.

API Reference: api_fluid_optimizer_ASGD
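
The effect of trajectory averaging can be sketched with a toy problem. The code below runs plain SGD and, alongside it, keeps the running mean of all iterates. To keep the example deterministic, an alternating +1/-1 term stands in for sampling noise; that choice, the function names, and the hyperparameters are assumptions and do not reflect the API's actual averaging scheme.

```python
def asgd(grad_fn, w0, lr=0.1, steps=2000):
    """Run plain SGD and also track the running mean of the iterates."""
    w, w_avg = w0, 0.0
    for t in range(1, steps + 1):
        w -= lr * grad_fn(w, t)   # ordinary SGD step
        w_avg += (w - w_avg) / t  # incremental mean of w_1, ..., w_t
    return w, w_avg


def noisy_grad(w, t):
    """Gradient of 0.5 * (w - 3)**2 plus a deterministic stand-in for noise."""
    noise = 1.0 if t % 2 == 0 else -1.0
    return (w - 3.0) + noise


w_last, w_avg = asgd(noisy_grad, w0=0.0)
```

The last iterate keeps bouncing around the optimum at 3 because of the noise term, while the averaged parameter sits closer to it.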