LBFGS

paddle.incubate.optimizer.LBFGS(learning_rate=1.0, max_iter=20, max_eval=None, tolerance_grad=1e-07, tolerance_change=1e-09, history_size=100, line_search_fn=None, parameters=None, weight_decay=None, grad_clip=None, name=None) [source]

Warning

API “paddle.incubate.optimizer.lbfgs.LBFGS” is deprecated since 2.5.0, and will be removed in future versions. Please use “paddle.optimizer.LBFGS” instead.
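Since 2.5.0 the same functionality is available as paddle.optimizer.LBFGS; a minimal sketch of the recommended, non-deprecated usage (the Linear layer here is only an illustration):

>>> import paddle

>>> net = paddle.nn.Linear(10, 1)
>>> opt = paddle.optimizer.LBFGS(parameters=net.parameters())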

L-BFGS is a quasi-Newton method for unconstrained minimization of a differentiable function, closely related to Newton's method for minimization. Consider the iterate update formula:

\[x_{k+1} = x_{k} - H_k \nabla{f_k}\]

If \(H_k\) is the inverse Hessian of \(f\) at \(x_k\), this is Newton's method. If \(H_k\) is a symmetric positive definite approximation of the inverse Hessian, it is a quasi-Newton method. In practice, the approximation is built from gradients only, using either the whole search history (BFGS) or only a limited window of it (L-BFGS).
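
For reference, the standard BFGS update of the inverse Hessian approximation (see the reference below) is

\[H_{k+1} = (I - \rho_k s_k y_k^T) H_k (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T, \quad s_k = x_{k+1} - x_k, \quad y_k = \nabla{f_{k+1}} - \nabla{f_k}, \quad \rho_k = \frac{1}{y_k^T s_k}\]

L-BFGS never stores \(H_k\) explicitly; it reconstructs the product \(H_k \nabla{f_k}\) from the most recent history_size pairs \((s_i, y_i)\) using a two-loop recursion.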

Reference:

Jorge Nocedal, Stephen J. Wright, Numerical Optimization, Second Edition, 2006, pp. 179: Algorithm 7.5 (L-BFGS).

Parameters
  • learning_rate (float, optional) – learning rate. The default value is 1.0.

  • max_iter (int, optional) – maximal number of iterations per optimization step. The default value is 20.

  • max_eval (int, optional) – maximal number of function evaluations per optimization step. The default value is max_iter * 1.25.

  • tolerance_grad (float, optional) – termination tolerance on first-order optimality. The default value is 1e-07.

  • tolerance_change (float, optional) – termination tolerance on function value/parameter changes. The default value is 1e-9.

  • history_size (int, optional) – update history size. The default value is 100.

  • line_search_fn (string, optional) – either ‘strong_wolfe’ or None. The default value is None.

  • parameters (list|tuple, optional) – List/Tuple of Tensor to update to minimize loss. This parameter is required in dygraph mode. The default value is None.

  • weight_decay (float|WeightDecayRegularizer, optional) – The strategy of regularization. It can be a float value as the coefficient of L2 regularization, or a regularizer instance such as L1Decay or L2Decay. If a parameter has already set a regularizer using ParamAttr, the regularization setting here in the optimizer will be ignored for that parameter. Otherwise, the regularization setting here in the optimizer will take effect. Default None, meaning there is no regularization.

  • grad_clip (GradientClipBase, optional) – Gradient clipping strategy; an instance of some derived class of GradientClipBase. There are three clipping strategies ( ClipGradByGlobalNorm , ClipGradByNorm , ClipGradByValue ). Default None, meaning there is no gradient clipping. A construction sketch showing weight_decay and grad_clip follows this parameter list.

  • name (str, optional) – Normally there is no need for user to set this property. For more information, please refer to Name. The default value is None.
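
As noted above, weight_decay and grad_clip can be passed directly at construction. A minimal sketch (the coefficient and clip norm are illustrative values, not defaults):

>>> import paddle
>>> from paddle.incubate.optimizer import LBFGS

>>> linear = paddle.nn.Linear(10, 1)
>>> opt = LBFGS(
...     learning_rate=1.0,
...     parameters=linear.parameters(),
...     weight_decay=paddle.regularizer.L2Decay(coeff=0.01),  # illustrative L2 coefficient
...     grad_clip=paddle.nn.ClipGradByGlobalNorm(clip_norm=1.0),  # illustrative clip norm
... )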

Returns

the final loss returned by the closure.

Return type

loss (Tensor)
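
This is the value returned by step(closure). A minimal sketch of capturing it, assuming opt and closure are defined as in the example below:

>>> final_loss = opt.step(closure)  # Tensor holding the loss returned by the closure
>>> print(float(final_loss))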

Examples

>>> import paddle
>>> import numpy as np
>>> from paddle.incubate.optimizer import LBFGS

>>> paddle.disable_static()
>>> np.random.seed(0)
>>> np_w = np.random.rand(1).astype(np.float32)
>>> np_x = np.random.rand(1).astype(np.float32)

>>> inputs = [np.random.rand(1).astype(np.float32) for i in range(10)]
>>> # y = 2x
>>> targets = [2 * x for x in inputs]

>>> class Net(paddle.nn.Layer):
...     def __init__(self):
...         super().__init__()
...         w = paddle.to_tensor(np_w)
...         self.w = paddle.create_parameter(
...             shape=w.shape,
...             dtype=w.dtype,
...             default_initializer=paddle.nn.initializer.Assign(w),
...         )
...     def forward(self, x):
...         return self.w * x

>>> net = Net()
>>> opt = LBFGS(
...     learning_rate=1,
...     max_iter=1,
...     max_eval=None,
...     tolerance_grad=1e-07,
...     tolerance_change=1e-09,
...     history_size=100,
...     line_search_fn='strong_wolfe',
...     parameters=net.parameters(),
... )
>>> def train_step(inputs, targets):
...     # The closure re-evaluates the model and returns the loss;
...     # L-BFGS may call it several times within one step() call.
...     def closure():
...         outputs = net(inputs)
...         loss = paddle.nn.functional.mse_loss(outputs, targets)
...         print('loss: ', loss.item())
...         opt.clear_grad()
...         loss.backward()
...         return loss
...     opt.step(closure)

>>> for input, target in zip(inputs, targets):
...     input = paddle.to_tensor(input)
...     target = paddle.to_tensor(target)
...     train_step(input, target)