Optimizer¶
Training a neural network is in essence an optimization problem. Combining forward computation with back-propagation, an Optimizer uses the back-propagated gradients to update the parameters of the network.
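In Fluid the usual pattern is to build a forward network, define a loss, construct an optimizer, and call its minimize method on the loss. A minimal sketch of this pattern follows; the toy network, layer sizes, and learning rate are illustrative assumptions, not values prescribed by this guide:

    import paddle.fluid as fluid

    # A toy regression network; shapes and sizes are only illustrative.
    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    y = fluid.layers.data(name='y', shape=[1], dtype='float32')
    y_predict = fluid.layers.fc(input=x, size=1)
    cost = fluid.layers.square_error_cost(input=y_predict, label=y)
    avg_cost = fluid.layers.mean(cost)

    # Every optimizer below is used the same way: construct it, then call
    # minimize(loss) to append backward and parameter-update operators.
    optimizer = fluid.optimizer.SGD(learning_rate=0.01)
    optimizer.minimize(avg_cost)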
1.SGD/SGDOptimizer¶
SGD is a subclass of Optimizer that implements Stochastic Gradient Descent, a variant of gradient descent which updates parameters with the gradient of a mini-batch rather than the whole data set. When a large number of samples has to be trained, SGD is usually chosen so that the loss function converges more quickly.
API Reference: api_fluid_optimizer_SGDOptimizer
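A minimal construction sketch; SGD and SGDOptimizer name the same class, and the learning rate below is an illustrative assumption:

    import paddle.fluid as fluid

    # SGD is an alias of SGDOptimizer; the learning rate is illustrative.
    sgd = fluid.optimizer.SGD(learning_rate=0.01)
    # Attach it to a loss exactly as in the sketch above:
    # sgd.minimize(avg_cost)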
2.Momentum/MomentumOptimizer¶
The Momentum optimizer adds momentum on top of SGD, reducing the noise introduced by stochastic gradient descent. You can set use_nesterov to False or True, corresponding respectively to the traditional Momentum algorithm (Section 4.1 of the paper) and the Nesterov accelerated gradient algorithm (Section 4.2 of the paper).
API Reference: api_fluid_optimizer_MomentumOptimizer
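A construction sketch; the hyperparameter values below are illustrative assumptions:

    import paddle.fluid as fluid

    # Momentum is an alias of MomentumOptimizer; values are illustrative.
    # Setting use_nesterov=True selects Nesterov accelerated gradient
    # instead of classical momentum.
    momentum = fluid.optimizer.Momentum(
        learning_rate=0.01, momentum=0.9, use_nesterov=False)
    # momentum.minimize(avg_cost)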
3.Adagrad/AdagradOptimizer¶
The Adagrad optimizer adaptively assigns each parameter its own learning rate, which addresses the problem that different parameters are updated with uneven numbers of samples.
API Reference: api_fluid_optimizer_AdagradOptimizer
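A construction sketch with an illustrative learning rate:

    import paddle.fluid as fluid

    # Adagrad is an alias of AdagradOptimizer; the value is illustrative.
    adagrad = fluid.optimizer.Adagrad(learning_rate=0.01)
    # adagrad.minimize(avg_cost)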
4.RMSPropOptimizer¶
The RMSProp optimizer is a method that adaptively adjusts the learning rate. It mainly addresses the sharp decay of the learning rate in the middle and late stages of training that occurs when Adagrad is used.
API Reference: api_fluid_optimizer_RMSPropOptimizer
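A construction sketch with an illustrative learning rate; further hyperparameters are described in the API reference above:

    import paddle.fluid as fluid

    # RMSProp is an alias of RMSPropOptimizer; the value is illustrative.
    rmsprop = fluid.optimizer.RMSProp(learning_rate=0.01)
    # rmsprop.minimize(avg_cost)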
5.Adam/AdamOptimizer¶
The Adam optimizer is a method that adaptively adjusts the learning rate. It fits most non-convex optimization problems, large data sets, and high-dimensional scenarios, and is the most commonly used optimization algorithm.
API Reference: api_fluid_optimizer_AdamOptimizer
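A construction sketch; the hyperparameter values below are illustrative, commonly used settings:

    import paddle.fluid as fluid

    # Adam is an alias of AdamOptimizer; values are illustrative.
    adam = fluid.optimizer.Adam(learning_rate=0.001, beta1=0.9, beta2=0.999)
    # adam.minimize(avg_cost)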
6.Adamax/AdamaxOptimizer¶
Adamax is a variant of the Adam algorithm that simplifies the bound on the learning rate, in particular its upper bound.
API Reference: api_fluid_optimizer_AdamaxOptimizer
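A construction sketch with an illustrative learning rate:

    import paddle.fluid as fluid

    # Adamax is an alias of AdamaxOptimizer; the value is illustrative.
    adamax = fluid.optimizer.Adamax(learning_rate=0.002)
    # adamax.minimize(avg_cost)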
7.DecayedAdagrad/DecayedAdagradOptimizer¶
The DecayedAdagrad optimizer can be regarded as the Adagrad algorithm combined with a decay rate, which addresses the sharp decay of the learning rate in the middle and late stages of training.
API Reference: api_fluid_optimizer_DecayedAdagrad
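A construction sketch; the learning rate and decay rate below are illustrative assumptions:

    import paddle.fluid as fluid

    # DecayedAdagrad is an alias of DecayedAdagradOptimizer;
    # learning rate and decay rate are illustrative.
    decayed_adagrad = fluid.optimizer.DecayedAdagrad(
        learning_rate=0.01, decay=0.95)
    # decayed_adagrad.minimize(avg_cost)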
8.Ftrl/FtrlOptimizer¶
The FtrlOptimizer combines the high accuracy of the FOBOS algorithm with the sparsity of the RDA algorithm, making it an online learning algorithm with remarkably good performance.
API Reference: api_fluid_optimizer_FtrlOptimizer
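A construction sketch; the learning rate and the L1/L2 regularization strengths below are illustrative assumptions:

    import paddle.fluid as fluid

    # Ftrl is an alias of FtrlOptimizer; values are illustrative.
    ftrl = fluid.optimizer.Ftrl(learning_rate=0.01, l1=0.0, l2=0.0)
    # ftrl.minimize(avg_cost)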
9.ModelAverage¶
The ModelAverage optimizer accumulates a history of parameter values over a sliding window during training. The averaged parameters are used at inference time to improve the overall accuracy of prediction.
API Reference: api_fluid_optimizer_ModelAverage
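A usage sketch, assuming the training program has already been built with a regular optimizer and that exe is an existing fluid.Executor; the window sizes below are illustrative assumptions:

    import paddle.fluid as fluid

    # Built after the regular training optimizer; window sizes are illustrative.
    model_average = fluid.optimizer.ModelAverage(
        average_window_rate=0.15,
        min_average_window=10000,
        max_average_window=20000)

    # At inference time, temporarily swap in the averaged parameters
    # (exe is assumed to be an already-created fluid.Executor):
    # with model_average.apply(exe):
    #     run inference with the averaged parameters here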