SGD is a subclass of Optimizer that implements Stochastic Gradient Descent, a variant of Gradient Descent.
When training on a large number of samples, SGD is usually chosen to make the loss function converge more quickly.
API Reference: SGDOptimizer
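A minimal sketch of the usual usage pattern, assuming a 1.x release of PaddlePaddle with the fluid static-graph API (the toy regression network below exists only to provide a loss to minimize; all values are illustrative):

    import paddle.fluid as fluid

    # Build a tiny regression network just to obtain a loss tensor.
    x = fluid.data(name="x", shape=[None, 13], dtype="float32")
    y = fluid.data(name="y", shape=[None, 1], dtype="float32")
    pred = fluid.layers.fc(input=x, size=1)
    loss = fluid.layers.mean(fluid.layers.square_error_cost(input=pred, label=y))

    # SGD updates each parameter with its plain gradient scaled by learning_rate.
    sgd = fluid.optimizer.SGDOptimizer(learning_rate=0.01)
    sgd.minimize(loss)

The optimizers below plug into the same minimize(loss) call; only the constructor differs.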
The Momentum optimizer adds momentum on top of SGD, reducing the noise introduced by stochastic gradient updates.
You can set
use_nesterov to False or True, corresponding respectively to the traditional Momentum algorithm (Section 4.1 of the paper) and the Nesterov accelerated gradient algorithm (Section 4.2 of the paper).
API Reference: MomentumOptimizer
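As a sketch (constructor arguments are illustrative; either object is applied with minimize(loss) exactly as in the SGD example above):

    import paddle.fluid as fluid

    # Classical momentum versus Nesterov accelerated gradient.
    momentum = fluid.optimizer.MomentumOptimizer(
        learning_rate=0.01, momentum=0.9, use_nesterov=False)
    nesterov = fluid.optimizer.MomentumOptimizer(
        learning_rate=0.01, momentum=0.9, use_nesterov=True)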
The Adagrad optimizer adaptively allocates a different learning rate to each parameter, addressing the problem that different parameters see unevenly distributed numbers of samples.
API Reference: AdagradOptimizer
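A construction sketch (epsilon is a small constant for numerical stability; the values shown are illustrative):

    import paddle.fluid as fluid

    # Adagrad accumulates squared gradients per parameter, so frequently
    # updated parameters receive smaller effective learning rates.
    adagrad = fluid.optimizer.AdagradOptimizer(learning_rate=0.1, epsilon=1e-6)
    # adagrad.minimize(loss) as in the SGD example above.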
The RMSProp optimizer is another method for adaptively adjusting the learning rate. It mainly addresses the problem that, with Adagrad, the learning rate drops too sharply in the middle and late stages of training.
API Reference: RMSPropOptimizer
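A construction sketch (rho is the decay rate of the moving average of squared gradients; values are illustrative):

    import paddle.fluid as fluid

    # RMSProp replaces Adagrad's ever-growing sum of squared gradients with an
    # exponential moving average, so the step size does not shrink monotonically.
    rmsprop = fluid.optimizer.RMSPropOptimizer(
        learning_rate=0.01, rho=0.95, epsilon=1e-6)
    # rmsprop.minimize(loss) as in the SGD example above.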
API Reference: AdamOptimizer
Adamax is a variant of the
Adam algorithm that imposes a simpler bound on the learning rate, especially its upper limit.
API Reference: AdamaxOptimizer
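A construction sketch (beta1 and beta2 are the moment decay rates, as in Adam; the values shown are illustrative):

    import paddle.fluid as fluid

    # Adamax keeps Adam's first-moment estimate but bounds updates using the
    # infinity norm of past gradients, giving a simpler learning-rate bound.
    adamax = fluid.optimizer.AdamaxOptimizer(
        learning_rate=0.002, beta1=0.9, beta2=0.999)
    # adamax.minimize(loss) as in the SGD example above.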
The DecayedAdagrad optimizer can be regarded as the
Adagrad algorithm with a decay rate incorporated, which mitigates the sharp drop of the learning rate in the middle and late stages of training.
API Reference: DecayedAdagrad
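A construction sketch (the decay and epsilon values are illustrative):

    import paddle.fluid as fluid

    # DecayedAdagrad multiplies the accumulated squared-gradient state by
    # `decay` at each step, so the learning rate does not collapse late in training.
    decayed_adagrad = fluid.optimizer.DecayedAdagrad(
        learning_rate=0.01, decay=0.95, epsilon=1e-6)
    # decayed_adagrad.minimize(loss) as in the SGD example above.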
API Reference: FtrlOptimizer