Training a neural network is, in essence, an optimization problem. After forward computation and back-propagation, an Optimizer uses the back-propagated gradients to update the parameters of the neural network.


1. SGD/SGDOptimizer

SGD is a subclass of Optimizer that implements Stochastic Gradient Descent, a variant of gradient descent. When training on a large number of samples, SGD is usually chosen so that the loss function converges more quickly.
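As a rough illustration of the update rule described above, here is a minimal pure-Python sketch. It is not the SGDOptimizer implementation: the names sgd_step and fit_slope, the toy data, and the default learning rate are all assumptions made for this example.

```python
import random


def sgd_step(params, grads, learning_rate):
    """Apply one plain SGD update: p <- p - learning_rate * g."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


def fit_slope(samples, learning_rate=0.05, steps=200, seed=0):
    """Fit y = w * x, drawing ONE random sample per step (the stochastic part)."""
    rng = random.Random(seed)
    w = [0.0]
    for _ in range(steps):
        x, y = rng.choice(samples)
        grad = [2.0 * (w[0] * x - y) * x]  # derivative of (w*x - y)**2 w.r.t. w
        w = sgd_step(w, grad, learning_rate)
    return w[0]


samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # hypothetical points on y = 2x
slope = fit_slope(samples)  # approaches 2.0
```

Each step looks at a single sample rather than the whole data set, which is what keeps the per-step cost low on large training sets.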

API Reference: api_fluid_optimizer_SGDOptimizer


2. Momentum/MomentumOptimizer

The Momentum optimizer adds momentum on top of SGD, which reduces the noise inherent in stochastic gradient descent. Set use_nesterov to False or True to select, respectively, the classical Momentum algorithm (Section 4.1 in the paper) or the Nesterov accelerated gradient algorithm (Section 4.2 in the paper).
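The two variants can be sketched for a single scalar parameter as follows. This is one common formulation, written under the assumption that the velocity accumulates raw gradients; the function name momentum_step and the default mu are illustrative, and only the use_nesterov flag comes from the text above.

```python
def momentum_step(param, grad, velocity, learning_rate, mu=0.9, use_nesterov=False):
    """Return (new_param, new_velocity) after one momentum update."""
    velocity = mu * velocity + grad  # running, decayed sum of past gradients
    if use_nesterov:
        # Nesterov: step along the gradient plus a look-ahead on the velocity.
        param = param - learning_rate * (grad + mu * velocity)
    else:
        # Classical momentum: step along the velocity only.
        param = param - learning_rate * velocity
    return param, velocity


classic_param, classic_velocity = momentum_step(1.0, 0.5, 0.0, learning_rate=0.1)
nesterov_param, nesterov_velocity = momentum_step(
    1.0, 0.5, 0.0, learning_rate=0.1, use_nesterov=True
)
```

Because the velocity averages over recent gradients, a single noisy gradient changes the update direction less than it would in plain SGD.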

API Reference: api_fluid_optimizer_MomentumOptimizer

3. Adagrad/AdagradOptimizer

The Adagrad optimizer adaptively assigns a different learning rate to each parameter, which addresses the problem that different parameters are updated by different numbers of samples.
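The per-parameter behaviour can be seen in a small scalar sketch. The function adagrad_step and the toy "frequent" and "rare" parameters are assumptions for illustration, not part of the AdagradOptimizer API.

```python
import math


def adagrad_step(param, grad, moment, learning_rate, epsilon=1e-6):
    """Return (new_param, new_moment); moment is the running sum of squared gradients."""
    moment = moment + grad * grad
    param = param - learning_rate * grad / (math.sqrt(moment) + epsilon)
    return param, moment


# A parameter that receives a gradient on every one of 100 steps.
frequent, moment = 0.0, 0.0
for _ in range(100):
    previous = frequent
    frequent, moment = adagrad_step(frequent, 1.0, moment, learning_rate=0.1)
frequent_move = previous - frequent  # about 0.1 / sqrt(100) = 0.01

# A parameter that receives its first gradient only now.
rare, _ = adagrad_step(0.0, 1.0, 0.0, learning_rate=0.1)
rare_move = 0.0 - rare  # about 0.1 / sqrt(1) = 0.1
```

The rarely updated parameter still takes a large step, while the frequently updated one has already had its step size scaled down.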

API Reference: api_fluid_optimizer_AdagradOptimizer


4. RMSPropOptimizer

The RMSProp optimizer adaptively adjusts the learning rate. It mainly addresses the sharp drop in learning rate that Adagrad exhibits in the middle and late stages of model training.
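A minimal scalar sketch of the basic RMSProp rule, without momentum or centering, is given below. The name rmsprop_step and the defaults for rho and epsilon are assumptions chosen for the example; the real optimizer may expose more options.

```python
import math


def rmsprop_step(param, grad, mean_square, learning_rate, rho=0.95, epsilon=1e-6):
    """Return (new_param, new_mean_square) after one basic RMSProp update."""
    # Exponential moving average of squared gradients instead of a growing sum.
    mean_square = rho * mean_square + (1.0 - rho) * grad * grad
    param = param - learning_rate * grad / math.sqrt(mean_square + epsilon)
    return param, mean_square


param, mean_square = 0.0, 0.0
for _ in range(500):
    previous = param
    param, mean_square = rmsprop_step(param, 1.0, mean_square, learning_rate=0.01)
last_move = previous - param  # stays close to the learning rate, 0.01
```

Since old squared gradients are forgotten at rate rho, the denominator stops growing and the step size does not shrink toward zero as training continues.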

API Reference: api_fluid_optimizer_RMSPropOptimizer


5. Adam/AdamOptimizer

The Adam optimizer adaptively adjusts the learning rate and is suitable for most non-convex optimization problems, large data sets, and high-dimensional settings. Adam is one of the most commonly used optimization algorithms.
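For reference, the commonly published Adam update with bias correction can be sketched for one scalar parameter as below. The function name adam_step and its argument layout are assumptions for this example and are not taken from the AdamOptimizer API.

```python
import math


def adam_step(param, grad, m, v, t, learning_rate=0.001,
              beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Return (new_param, new_m, new_v); t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # correct the bias toward zero
    v_hat = v / (1.0 - beta2 ** t)
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v


# The first step moves by roughly learning_rate regardless of gradient scale.
param, m, v = adam_step(1.0, 5.0, 0.0, 0.0, t=1)
```

The step size depends on the ratio of the two moment estimates rather than on the raw gradient magnitude, which is why the same learning rate tends to work across parameters with very different gradient scales.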

API Reference: api_fluid_optimizer_AdamOptimizer


6. Adamax/AdamaxOptimizer

Adamax is a variant of the Adam algorithm. It provides a simpler bound on the learning rate, in particular on its upper limit.
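A scalar sketch of the Adamax rule as it is usually published is shown below: the second-moment average of Adam is replaced by a decayed running maximum of absolute gradients. The name adamax_step is hypothetical, and adding epsilon in the denominator is an assumption made here to avoid division by zero; the library may place it differently.

```python
def adamax_step(param, grad, m, u, t, learning_rate=0.002,
                beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Return (new_param, new_m, new_u); t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad   # first-moment estimate, as in Adam
    u = max(beta2 * u, abs(grad))          # decayed infinity norm of gradients
    step_size = learning_rate / (1.0 - beta1 ** t)
    param = param - step_size * m / (u + epsilon)
    return param, m, u


# On the first step the move is roughly learning_rate in the descent direction.
param, m, u = adamax_step(1.0, -4.0, 0.0, 0.0, t=1)
```

Dividing by a running maximum rather than a root mean square is what makes the size of each update easy to reason about.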

API Reference: api_fluid_optimizer_AdamaxOptimizer


7. DecayedAdagrad/DecayedAdagradOptimizer

The DecayedAdagrad optimizer can be regarded as Adagrad with an added decay rate, which addresses the sharp drop in learning rate in the middle and late stages of model training.
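The effect of the decay rate can be illustrated by comparing the two accumulators side by side. This is a sketch under the assumption that the decayed accumulator is an exponential moving average of squared gradients; the helper names effective_step and compare_accumulators are hypothetical.

```python
import math


def effective_step(accumulator, learning_rate=0.1, epsilon=1e-6):
    """Scale applied to the gradient for a given accumulator value."""
    return learning_rate / (math.sqrt(accumulator) + epsilon)


def compare_accumulators(grads, decay=0.95):
    """Return (adagrad_steps, decayed_steps), one step scale per iteration."""
    total, decayed = 0.0, 0.0
    adagrad_steps, decayed_steps = [], []
    for g in grads:
        total += g * g                                     # Adagrad: plain sum
        decayed = decay * decayed + (1.0 - decay) * g * g  # decayed average
        adagrad_steps.append(effective_step(total))
        decayed_steps.append(effective_step(decayed))
    return adagrad_steps, decayed_steps


adagrad_steps, decayed_steps = compare_accumulators([1.0] * 200)
```

With a constant gradient, the last entry of adagrad_steps has shrunk well below the base learning rate of 0.1, while the last entry of decayed_steps stays close to it.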

API Reference: api_fluid_optimizer_DecayedAdagrad

8. Ftrl/FtrlOptimizer

FtrlOptimizer combines the high accuracy of the FOBOS algorithm with the sparsity of the RDA algorithm. It is an online learning algorithm that performs very well in practice.
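To show where the sparsity comes from, here is a per-coordinate sketch of the widely cited FTRL-Proximal update. It is not necessarily the exact formula used by FtrlOptimizer; the function name ftrl_step and the parameters alpha, beta, l1, and l2 are assumptions for this example.

```python
import math


def ftrl_step(weight, grad, z, n, alpha=0.1, beta=1.0, l1=1.0, l2=0.0):
    """Return (new_weight, new_z, new_n) for one coordinate."""
    sigma = (math.sqrt(n + grad * grad) - math.sqrt(n)) / alpha
    z = z + grad - sigma * weight   # adjusted running sum of gradients
    n = n + grad * grad             # running sum of squared gradients
    if abs(z) <= l1:
        weight = 0.0                # L1 threshold: the weight is exactly zero
    else:
        sign = 1.0 if z > 0 else -1.0
        weight = -(z - sign * l1) / ((beta + math.sqrt(n)) / alpha + l2)
    return weight, z, n


small_w, small_z, small_n = ftrl_step(0.0, 0.5, 0.0, 0.0)  # weak signal: stays 0.0
large_w, large_z, large_n = ftrl_step(0.0, 3.0, 0.0, 0.0)  # strong signal: non-zero
```

A coordinate whose accumulated gradient never exceeds the l1 threshold keeps a weight of exactly zero, which is how the model ends up sparse.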

API Reference: api_fluid_optimizer_FtrlOptimizer


9. ModelAverage

The ModelAverage optimizer accumulates historical parameter values over a sliding window during model training. The averaged parameters are then used at inference time to improve the overall accuracy of inference.
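The idea can be sketched with a fixed-size window over parameter snapshots. The class SlidingAverage below is hypothetical and keeps every snapshot in memory for clarity; a real implementation would more likely keep running sums, and how ModelAverage sizes its window is not covered here.

```python
from collections import deque


class SlidingAverage:
    """Keep the last window_size parameter snapshots and average them."""

    def __init__(self, window_size):
        self.history = deque(maxlen=window_size)  # oldest snapshot drops out

    def accumulate(self, params):
        """Record a copy of the current parameters after a training step."""
        self.history.append(list(params))

    def averaged(self):
        """Return the element-wise mean over the snapshots in the window."""
        count = len(self.history)
        return [sum(column) / count for column in zip(*self.history)]


average = SlidingAverage(window_size=3)
for snapshot in ([1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]):
    average.accumulate(snapshot)
inference_params = average.averaged()  # mean of the last three snapshots
```

Training continues with the raw parameters; only inference swaps in the averaged values, which smooths out the fluctuations of the final few updates.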

API Reference: api_fluid_optimizer_ModelAverage