SGD is a subclass of
Optimizer implementing Stochastic Gradient Descent, a variant of Gradient Descent. When training on a large number of samples, we usually choose
SGD to make the loss function converge more quickly.
API Reference: api_fluid_optimizer_SGDOptimizer
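The SGD update rule described above can be sketched in NumPy. This is an illustration of the algorithm only, not the Fluid API; the function name is ours:

```python
import numpy as np

def sgd_update(param, grad, lr=0.01):
    # Vanilla SGD: move the parameter against the gradient,
    # scaled by the learning rate.
    return param - lr * grad
```

Each minibatch produces one `grad`, and the parameter is updated once per minibatch, which is what makes convergence faster on large sample sets.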
Momentum optimizer adds momentum on the basis of
SGD, reducing the noise of stochastic gradient descent. You can set
use_nesterov to False or True, corresponding respectively to the traditional Momentum algorithm (Section 4.1 in the paper) and the Nesterov accelerated gradient algorithm (Section 4.2 in the paper).
API Reference: api_fluid_optimizer_MomentumOptimizer
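Both variants can be sketched in NumPy as follows. This is a sketch of the update rule, not the Fluid implementation; the `mu` parameter is the momentum coefficient:

```python
import numpy as np

def momentum_update(param, grad, velocity, lr=0.01, mu=0.9, use_nesterov=False):
    # Accumulate an exponentially weighted velocity of past gradients;
    # this smooths out the noise of individual stochastic gradients.
    velocity = mu * velocity + grad
    if use_nesterov:
        # Nesterov variant: take the step from a look-ahead position.
        param = param - lr * (grad + mu * velocity)
    else:
        # Traditional momentum: step along the accumulated velocity.
        param = param - lr * velocity
    return param, velocity
```

The `velocity` state must be kept between steps, one per parameter.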
Adagrad Optimizer adaptively assigns a different learning rate to each parameter, addressing the problem that different parameters are updated at different frequencies.
API Reference: api_fluid_optimizer_AdagradOptimizer
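The per-parameter adaptation works by accumulating squared gradients, as this NumPy sketch shows (an illustration of the rule, not the Fluid API):

```python
import numpy as np

def adagrad_update(param, grad, accum, lr=0.01, eps=1e-6):
    # Accumulate squared gradients per parameter; a frequently updated
    # parameter builds a large accumulator and thus receives a smaller
    # effective learning rate.
    accum = accum + grad ** 2
    param = param - lr * grad / (np.sqrt(accum) + eps)
    return param, accum
```

Note the accumulator only ever grows, which is the root of the decaying-learning-rate problem that RMSProp and DecayedAdagrad address below.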
RMSProp optimizer is a method to adaptively adjust the learning rate. It mainly solves the problem of the learning rate decreasing too sharply in the middle and late stages of training when Adagrad is used.
API Reference: api_fluid_optimizer_RMSPropOptimizer
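RMSProp replaces Adagrad's ever-growing sum with an exponential moving average, so the step size does not vanish. A NumPy sketch of the rule (not the Fluid API; `rho` is the decay rate):

```python
import numpy as np

def rmsprop_update(param, grad, mean_sq, lr=0.01, rho=0.95, eps=1e-6):
    # Exponential moving average of squared gradients: old gradients
    # fade out instead of accumulating forever.
    mean_sq = rho * mean_sq + (1 - rho) * grad ** 2
    param = param - lr * grad / np.sqrt(mean_sq + eps)
    return param, mean_sq
```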
Adam optimizer adjusts the learning rate of each parameter using estimates of the first and second moments of the gradients, combining the benefits of Momentum and RMSProp.
API Reference: api_fluid_optimizer_AdamOptimizer
Adamax is a variant of the
Adam algorithm. It replaces Adam's second-moment estimate with an infinity norm, which gives the learning rate a simpler bound, especially a simpler upper limit.
API Reference: api_fluid_optimizer_AdamaxOptimizer
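The infinity-norm trick can be seen in this NumPy sketch (an illustration of the update rule as described in the Adam paper's Adamax section, not the Fluid implementation; `t` is the 1-based step count):

```python
import numpy as np

def adamax_update(param, grad, m, u, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment (momentum) is estimated as in Adam.
    m = beta1 * m + (1 - beta1) * grad
    # Second moment is replaced by a decayed running max (infinity norm),
    # which directly bounds the per-parameter step size.
    u = np.maximum(beta2 * u, np.abs(grad))
    # Bias-correct the first moment; u needs no correction.
    step = lr / (1 - beta1 ** t)
    param = param - step * m / (u + eps)
    return param, m, u
```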
DecayedAdagrad Optimizer can be regarded as the
Adagrad algorithm with a decay rate incorporated, solving the problem of the learning rate dropping sharply in the middle and late stages of training.
API Reference: api_fluid_optimizer_DecayedAdagrad
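Why the decay rate prevents the learning-rate collapse can be demonstrated numerically: with a decay, the squared-gradient accumulator stays bounded instead of growing without limit (a sketch with illustrative function names, not the Fluid API):

```python
import numpy as np

def adagrad_accum(grads):
    # Plain Adagrad accumulator: a monotonically growing sum,
    # so lr / sqrt(accum) eventually collapses toward zero.
    a = 0.0
    for g in grads:
        a += g ** 2
    return a

def decayed_accum(grads, decay=0.95):
    # DecayedAdagrad accumulator: old squared gradients fade out,
    # so the accumulator stays bounded and the step size does not vanish.
    a = 0.0
    for g in grads:
        a = decay * a + (1 - decay) * g ** 2
    return a
```

For 1000 unit gradients, the plain accumulator reaches 1000 while the decayed one converges toward 1.0.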
FTRL (Follow The Regularized Leader) optimizer is an online learning method that, combined with L1 regularization, produces sparse models.
API Reference: api_fluid_optimizer_FtrlOptimizer
ModelAverage Optimizer accumulates historical parameter values in a sliding window during training. At inference time, the averaged parameters are used to improve overall inference accuracy.
API Reference: api_fluid_optimizer_ModelAverage
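The sliding-window averaging can be sketched as follows. This is an illustration of the idea only; the class name and `window` parameter are ours, not Fluid's:

```python
from collections import deque
import numpy as np

class SlidingAverage:
    """Keeps at most `window` recent parameter snapshots and
    returns their average, to be used at inference time."""

    def __init__(self, window=4):
        # deque with maxlen drops the oldest snapshot automatically.
        self.buf = deque(maxlen=window)

    def update(self, param):
        # Called after each training step with the current parameters.
        self.buf.append(np.asarray(param, dtype=float))

    def averaged(self):
        # Average over the window; training itself keeps using
        # the raw (non-averaged) parameters.
        return np.mean(self.buf, axis=0)
```

Training continues with the raw parameters; only the inference program swaps in the averaged copy.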