.. _api_guide_optimizer_en:

###########
Optimizer
###########

Training a neural network is in essence an `Optimization problem `_ . Through `forward computing and back propagation `_ , an :code:`Optimizer` uses the back-propagated gradients to update the parameters of the neural network.

1. SGD/SGDOptimizer
-------------------

:code:`SGD` is a subclass of :code:`Optimizer` that implements `Stochastic Gradient Descent `_ , a variant of `Gradient Descent `_ . When a large number of samples need to be trained, :code:`SGD` is usually chosen to make the loss function converge more quickly.

API Reference: :ref:`api_fluid_optimizer_SGDOptimizer`

2. Momentum/MomentumOptimizer
-----------------------------

The :code:`Momentum` optimizer adds momentum on top of :code:`SGD`, which reduces the noise inherent in stochastic gradient descent. Setting :code:`use_nesterov` to False or True selects the traditional `Momentum (Section 4.1 of the paper) `_ algorithm or the `Nesterov accelerated gradient (Section 4.2 of the paper) `_ algorithm respectively.

API Reference: :ref:`api_fluid_optimizer_MomentumOptimizer`

3. Adagrad/AdagradOptimizer
---------------------------

The `Adagrad `_ optimizer adaptively assigns a different learning rate to each parameter, addressing the problem that different parameters are updated from unevenly distributed numbers of samples.

API Reference: :ref:`api_fluid_optimizer_AdagradOptimizer`

4. RMSPropOptimizer
-------------------

The `RMSProp optimizer `_ also adjusts the learning rate adaptively. It mainly addresses the problem that, when Adagrad is used, the learning rate decays too sharply in the middle and late stages of training.

API Reference: :ref:`api_fluid_optimizer_RMSPropOptimizer`

5. Adam/AdamOptimizer
---------------------

The `Adam `_ optimizer adjusts the learning rate adaptively and is well suited to most non- `convex optimization `_ problems, as well as to large data sets and high-dimensional feature spaces. :code:`Adam` is the most commonly used optimization algorithm.

API Reference: :ref:`api_fluid_optimizer_AdamOptimizer`

6. Adamax/AdamaxOptimizer
-------------------------

`Adamax `_ is a variant of the :code:`Adam` algorithm that places a simpler bound on the learning rate, in particular on its upper limit.

API Reference: :ref:`api_fluid_optimizer_AdamaxOptimizer`

7. DecayedAdagrad/DecayedAdagradOptimizer
-----------------------------------------

The `DecayedAdagrad `_ optimizer can be regarded as :code:`Adagrad` combined with a decay rate, which alleviates the problem of the learning rate dropping sharply in the middle and late stages of training.

API Reference: :ref:`api_fluid_optimizer_DecayedAdagrad`

8. Ftrl/FtrlOptimizer
---------------------

The `FtrlOptimizer `_ combines the high accuracy of the `FOBOS algorithm `_ with the sparsity of the `RDA algorithm `_ , and is a very effective `Online Learning `_ algorithm.

API Reference: :ref:`api_fluid_optimizer_FtrlOptimizer`

9. ModelAverage
---------------

The :code:`ModelAverage` optimizer accumulates a sliding window of historical parameter values during training. Using the averaged parameters at inference time generally improves the accuracy of predictions.

API Reference: :ref:`api_fluid_optimizer_ModelAverage`
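
Below is a minimal sketch of how :code:`ModelAverage` is typically combined with a regular training optimizer, assuming the :code:`paddle.fluid` 1.x static-graph API; the toy network, the window sizes and the :code:`Momentum` hyper-parameters are illustrative only. :code:`ModelAverage` is constructed after the training optimizer so that its accumulation operators are added to the same program, and its :code:`apply` context manager temporarily swaps in the averaged parameters for inference.

.. code-block:: python

    import numpy
    import paddle.fluid as fluid

    place = fluid.CPUPlace()
    exe = fluid.Executor(place)

    train_program = fluid.Program()
    startup_program = fluid.Program()
    with fluid.program_guard(train_program, startup_program):
        # Toy network and training optimizer (values are illustrative).
        data = fluid.layers.data(name='X', shape=[1], dtype='float32')
        hidden = fluid.layers.fc(input=data, size=10)
        loss = fluid.layers.mean(hidden)
        fluid.optimizer.Momentum(learning_rate=0.2, momentum=0.1).minimize(loss)

        # Accumulate a sliding window of historical parameter values.
        model_average = fluid.optimizer.ModelAverage(
            0.15, min_average_window=10000, max_average_window=20000)

    exe.run(startup_program)
    x = numpy.random.random(size=(10, 1)).astype('float32')
    exe.run(train_program, feed={'X': x}, fetch_list=[loss.name])  # one training step

    # The averaged parameters are active only inside this block and are
    # restored automatically when the block exits.
    with model_average.apply(exe):
        exe.run(train_program, feed={'X': x}, fetch_list=[loss.name])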
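
More generally, every optimizer listed on this page (apart from :code:`ModelAverage`) is used in the same way: construct it with its hyper-parameters and call :code:`minimize` on the loss variable, which appends the backward pass and the parameter-update operators to the program. The following is a minimal sketch, assuming the :code:`paddle.fluid` 1.x static-graph API; the toy regression network and the hyper-parameter values are illustrative only.

.. code-block:: python

    import paddle.fluid as fluid

    # A toy regression network: 13 input features -> 1 prediction.
    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    y = fluid.layers.data(name='y', shape=[1], dtype='float32')
    y_predict = fluid.layers.fc(input=x, size=1)

    cost = fluid.layers.square_error_cost(input=y_predict, label=y)
    avg_cost = fluid.layers.mean(cost)

    # Any optimizer from this page can be substituted here, for example:
    #   fluid.optimizer.SGD(learning_rate=0.01)
    #   fluid.optimizer.Momentum(learning_rate=0.01, momentum=0.9, use_nesterov=True)
    optimizer = fluid.optimizer.AdamOptimizer(learning_rate=0.001)

    # minimize() appends the backward (gradient) operators and the
    # parameter-update operators to the default main program.
    optimizer.minimize(avg_cost)

After :code:`minimize` has been called, each run of the main program with an :code:`Executor` performs one training step that updates the parameters according to the chosen optimizer.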