minimize_bfgs

paddle.incubate.optimizer.functional.minimize_bfgs(objective_func: Callable[[Tensor], Tensor], initial_position: Tensor, max_iters: int = 50, tolerance_grad: float = 1e-07, tolerance_change: float = 1e-09, initial_inverse_hessian_estimate: Tensor | None = None, line_search_fn: Literal['strong_wolfe'] = 'strong_wolfe', max_line_search_iters: int = 50, initial_step_length: float = 1.0, dtype: Literal['float32', 'float64'] = 'float32', name: str | None = None) → tuple[bool, int, Tensor, Tensor, Tensor, Tensor]

Minimizes a differentiable function objective_func using the BFGS method. BFGS is a quasi-Newton method for solving unconstrained optimization problems over a differentiable function. It is closely related to the Newton method for minimization. Consider the iterate update formula:

\[x_{k+1} = x_{k} - H_k \nabla{f_k}\]

If \(H_k\) is the inverse Hessian of \(f\) at \(x_k\), this is the Newton method. If \(H_k\) is instead a symmetric, positive definite approximation of the inverse Hessian, it is a quasi-Newton method. In practice, the approximate inverse Hessians are built from gradients alone, using either the whole search history (BFGS) or only the most recent part of it (L-BFGS).
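The quasi-Newton iteration described above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not Paddle's implementation: the function name bfgs_sketch, the quadratic test problem, and the tolerances are assumptions made for the example, and a simple backtracking (Armijo) line search stands in for the strong Wolfe search that minimize_bfgs uses. The inverse-Hessian update is the standard BFGS formula from Nocedal and Wright.

```python
import numpy as np

def bfgs_sketch(f, grad, x0, max_iters=50, tol=1e-7):
    """Illustrative BFGS loop (hypothetical helper, not Paddle's code)."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    identity = np.eye(n)
    H = np.eye(n)                       # initial inverse-Hessian estimate
    g = grad(x)
    for _ in range(max_iters):
        if np.max(np.abs(g)) < tol:     # infinity-norm gradient test
            break
        p = -H @ g                      # quasi-Newton search direction
        # Backtracking (Armijo) line search: a simplification, not strong Wolfe.
        alpha, fx, slope = 1.0, f(x), g @ p
        while f(x + alpha * p) > fx + 1e-4 * alpha * slope and alpha > 1e-12:
            alpha *= 0.5
        s = alpha * p                   # step taken
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g                   # change in gradient
        sy = s @ y
        # Update only when the curvature condition s^T y > 0 holds,
        # which keeps H symmetric and positive definite.
        if sy > 1e-10 * np.linalg.norm(s) * np.linalg.norm(y):
            rho = 1.0 / sy
            H = ((identity - rho * np.outer(s, y)) @ H
                 @ (identity - rho * np.outer(y, s))
                 + rho * np.outer(s, s))
        x, g = x_new, g_new
    return x, H

# Assumed test problem: f(x) = 0.5 x^T A x - b^T x with A positive definite.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

x_min, H_est = bfgs_sketch(f, grad, np.zeros(2))
print(x_min, np.linalg.solve(A, b))     # the two vectors should agree closely
```

Only gradients enter the update of H, which is the point of the quasi-Newton approach: no second derivatives are ever computed.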

Reference: Jorge Nocedal, Stephen J. Wright, Numerical Optimization, Second Edition, 2006, p. 140: Algorithm 6.1 (BFGS Method).

Parameters

objective_func – the objective function to minimize. objective_func accepts a 1D Tensor and returns a scalar.
initial_position (Tensor) – the starting point of the iterates. It has the same shape as the input of objective_func.
max_iters (int, optional) – the maximum number of minimization iterations. Default value: 50.
tolerance_grad (float, optional) – terminates if the gradient norm is smaller than this value. The gradient norm currently uses the infinity norm. Default value: 1e-7.
tolerance_change (float, optional) – terminates if the change of function value/position/parameter between two iterations is smaller than this value. Default value: 1e-9.
initial_inverse_hessian_estimate (Tensor, optional) – the initial inverse Hessian approximation at initial_position. It must be symmetric and positive definite. If not given, an identity matrix of order N is used, where N is the size of initial_position. Default value: None.
line_search_fn (str, optional) – indicates which line search method to use. Only 'strong_wolfe' is supported right now; 'Hager Zhang' may be supported in the future. Default value: 'strong_wolfe'.
max_line_search_iters (int, optional) – the maximum number of line search iterations. Default value: 50.
initial_step_length (float, optional) – step length used in the first iteration of the line search. Different values of initial_step_length may lead to different optimal results. For methods like Newton and quasi-Newton, the initial trial step length should always be 1.0. Default value: 1.0.
dtype ('float32' | 'float64', optional) – data type used in the algorithm. The data type of the input parameters must be consistent with dtype. Default value: 'float32'.
name (str, optional) – Name for the operation. For more information, please refer to Name. Default value: None.

Returns

is_converge (bool): indicates whether the minimum was found within tolerance.
num_func_calls (int): the number of times the objective function was called.
position (Tensor): the position at the last iteration. If the search converged, this value is the argmin of the objective function with respect to the initial position.
objective_value (Tensor): objective function value at the position.
objective_gradient (Tensor): objective function gradient at the position.
inverse_hessian_estimate (Tensor): the estimate of the inverse Hessian at the position.

Return type

output(tuple)

Examples

>>> # Example1: 1D Grid Parameters
>>> import paddle
>>> # Randomly simulate a batch of input data
>>> inputs = paddle.normal(shape=(100, 1))
>>> labels = inputs * 2.0
>>> # define the loss function
>>> def loss(w):
...     y = w * inputs
...     return paddle.nn.functional.square_error_cost(y, labels).mean()
>>> # Initialize weight parameters
>>> w = paddle.normal(shape=(1,))
>>> # Call the bfgs method to solve the weight that makes the loss the smallest, and update the parameters
>>> for epoch in range(0, 10):
...     # Call the bfgs method to optimize the loss, note that the third parameter returned represents the weight
...     w_update = paddle.incubate.optimizer.functional.minimize_bfgs(loss, w)[2]
...     # Use paddle.assign to update parameters in place
...     paddle.assign(w_update, w)

>>> # Example2: Multidimensional Grid Parameters
>>> import paddle
>>> def flatten(x):
...     return x.flatten()
>>> def unflatten(x):
...     return x.reshape((2,2))
>>> # Assume the network parameters are more than one dimension
>>> def net(x):
...     assert len(x.shape) > 1
...     return x.square().mean()
>>> # function to be optimized
>>> def bfgs_f(flatten_x):
...     return net(unflatten(flatten_x))
>>> x = paddle.rand([2,2])
>>> for i in range(0, 10):
...     # Flatten x before using minimize_bfgs
...     x_update = paddle.incubate.optimizer.functional.minimize_bfgs(bfgs_f, flatten(x))[2]
...     # unflatten x_update, then update parameters
...     paddle.assign(unflatten(x_update), x)