minimize_bfgs

paddle.incubate.optimizer.functional.minimize_bfgs(objective_func, initial_position, max_iters=50, tolerance_grad=1e-07, tolerance_change=1e-09, initial_inverse_hessian_estimate=None, line_search_fn='strong_wolfe', max_line_search_iters=50, initial_step_length=1.0, dtype='float32', name=None)

Minimizes a differentiable function objective_func using the BFGS method. BFGS is a quasi-Newton method for solving an unconstrained optimization problem over a differentiable function. Closely related is the Newton method for minimization. Consider the iterate update formula

    x_{k+1} = x_{k} - H \nabla{f},

If $H$ is the inverse Hessian of $f$ at $x_{k}$, this is the Newton method. If $H$ is symmetric and positive definite and used as an approximation of the inverse Hessian, it is a quasi-Newton method. In practice, the approximated inverse Hessians are obtained using only the gradients, over either the whole search history (which gives BFGS) or only a recent part of it (which gives the limited-memory variant, L-BFGS).
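To make the distinction concrete, here is a minimal NumPy sketch (an illustration only, not part of the Paddle API): for a quadratic objective the inverse Hessian is known in closed form, so a single Newton step lands exactly on the minimizer.

import numpy as np

# f(x) = 0.5 * x^T A x - b^T x with symmetric positive definite A
# has gradient A x - b and constant inverse Hessian A^{-1}.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad_f(x):
    return A @ x - b

x0 = np.array([5.0, -5.0])
H = np.linalg.inv(A)                 # exact inverse Hessian
x1 = x0 - H @ grad_f(x0)             # one Newton step

print(np.allclose(grad_f(x1), 0.0))  # True: the gradient vanishes at x1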

Reference:

Jorge Nocedal, Stephen J. Wright. Numerical Optimization, Second Edition, 2006, p. 140: Algorithm 6.1 (BFGS Method).

The following summarizes the main logic of the program based on BFGS. Note: _k denotes the value at the k-th iteration, and ^T denotes the transpose of a vector or matrix.

repeat
    p_k = -H_k * g_k
    alpha = strong_wolfe(f, x_k, p_k)
    x_{k+1} = x_k + alpha * p_k
    s_k = x_{k+1} - x_k
    y_k = g_{k+1} - g_k
    rho_k = 1 / (s_k^T * y_k)
    V_k^T = I - rho_k * s_k * y_k^T
    V_k = I - rho_k * y_k * s_k^T
    H_{k+1} = V_k^T * H_k * V_k + rho_k * s_k * s_k^T
    check_converge
end
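For reference, the inverse-Hessian update in the loop above translates directly to NumPy. This is an illustrative sketch of the rank-two update formula, with bfgs_update being a hypothetical helper, not the Paddle implementation:

import numpy as np

def bfgs_update(H_k, s_k, y_k):
    # H_{k+1} = V_k^T * H_k * V_k + rho_k * s_k * s_k^T
    rho_k = 1.0 / (s_k @ y_k)
    I = np.eye(H_k.shape[0])
    V_kT = I - rho_k * np.outer(s_k, y_k)  # V_k^T = I - rho_k * s_k * y_k^T
    V_k = I - rho_k * np.outer(y_k, s_k)   # V_k   = I - rho_k * y_k * s_k^T
    return V_kT @ H_k @ V_k + rho_k * np.outer(s_k, s_k)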

Parameters
  • objective_func – the objective function to minimize. It accepts a multivariate input and returns a scalar.

  • initial_position (Tensor) – the starting point of the iterates. For methods like Newton and quasi-Newton, the initial trial step length should always be 1.0.

  • max_iters (int) – the maximum number of minimization iterations.

  • tolerance_grad (float) – terminates if the gradient norm is smaller than this value. Currently the gradient norm is computed with the infinity norm.

  • tolerance_change (float) – terminates if the change of function value/position/parameter between two iterations is smaller than this value.

  • initial_inverse_hessian_estimate (Tensor) – the initial inverse Hessian approximation at initial_position. It must be symmetric and positive definite.

  • line_search_fn (str) – indicates which line search method to use; only 'strong_wolfe' is supported right now. 'Hager Zhang' may be supported in the future.

  • max_line_search_iters (int) – the maximum number of line search iterations.

  • initial_step_length (float) – step length used in the first iteration of line search. Different values of initial_step_length may lead to different optimal results.

  • dtype ('float32' | 'float64') – the data type used in the algorithm. In static graph mode, float64 will be converted to float32 due to a limitation of paddle.assign.
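The call below sketches how the defaults above can be overridden. Every keyword name comes from the signature at the top of this page; the particular values (and the identity matrix used as the initial estimate) are illustrative only.

import paddle

def func(x):
    return paddle.dot(x, x)

x0 = paddle.to_tensor([1.3, 2.7])
# a symmetric positive definite initial estimate (here, the identity)
H0 = paddle.to_tensor([[1.0, 0.0],
                       [0.0, 1.0]])

results = paddle.incubate.optimizer.functional.minimize_bfgs(
    func, x0,
    max_iters=100,
    tolerance_grad=1e-06,
    initial_inverse_hessian_estimate=H0,
    line_search_fn='strong_wolfe',
    initial_step_length=1.0,
    dtype='float32')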

Returns

  • is_converge (bool): indicates whether the minimum was found within tolerance.

  • num_func_calls (int): number of objective function calls.

  • position (Tensor): the position of the last iteration. If the search converged, this value is the argmin of the objective function with regard to the initial position.

  • objective_value (Tensor): objective function value at the position.

  • objective_gradient (Tensor): objective function gradient at the position.

  • inverse_hessian_estimate (Tensor): the estimate of the inverse Hessian at the position.

Return type

tuple
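Since these six values are returned together, they can be unpacked positionally in the order listed above (with func and x0 defined as in the Examples section below):

(is_converge, num_func_calls, position, objective_value,
 objective_gradient, inverse_hessian_estimate) = \
    paddle.incubate.optimizer.functional.minimize_bfgs(func, x0)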

Examples

import paddle

def func(x):
    return paddle.dot(x, x)

x0 = paddle.to_tensor([1.3, 2.7])
results = paddle.incubate.optimizer.functional.minimize_bfgs(func, x0)
print("is_converge: ", results[0])
print("the minimum of func is: ", results[2])
# is_converge:  is_converge:  Tensor(shape=[1], dtype=bool, place=Place(gpu:0), stop_gradient=True,
#        [True])
# the minimum of func is:  Tensor(shape=[2], dtype=float32, place=Place(gpu:0), stop_gradient=True,
#        [0., 0.])