paddle.incubate.optimizer.functional.minimize_bfgs(objective_func, initial_position, max_iters=50, tolerance_grad=1e-07, tolerance_change=1e-09, initial_inverse_hessian_estimate=None, line_search_fn='strong_wolfe', max_line_search_iters=50, initial_step_length=1.0, dtype='float32', name=None)

Minimizes a differentiable function objective_func using the BFGS method. BFGS is a quasi-Newton method for solving unconstrained optimization problems over a differentiable function. Closely related is the Newton method for minimization. Consider the iterate update formula

$$x_{k+1} = x_{k} - H \nabla{f}(x_{k}).$$

If $H$ is the inverse Hessian of $f$ at $x_{k}$, this is the Newton method. If $H$ is a symmetric positive definite matrix used as an approximation of the inverse Hessian, it is a quasi-Newton method. In practice, the inverse Hessian approximations are built from gradients alone, over either the whole search history or part of it. Using the whole history gives BFGS; using only part of it gives L-BFGS.
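As a small illustration of the update formula (written here with the descent sign, x_{k+1} = x_k - H ∇f), the sketch below applies one step to f(x) = x·x in plain Python. The helper names `grad`, `matvec`, and `step` are hypothetical and are not part of Paddle; the identity matrix stands in for a crude quasi-Newton approximation.

```python
# One update step on f(x) = x1^2 + x2^2.
# Its gradient is 2x and its Hessian is 2I, so the exact inverse Hessian is 0.5 * I.

def grad(x):
    return [2.0 * xi for xi in x]

def matvec(H, v):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in H]

def step(x, H):
    # x_{k+1} = x_k - H * grad f(x_k), with unit step length.
    d = matvec(H, grad(x))
    return [xi - di for xi, di in zip(x, d)]

H_exact = [[0.5, 0.0], [0.0, 0.5]]     # true inverse Hessian: Newton method
H_identity = [[1.0, 0.0], [0.0, 1.0]]  # SPD stand-in: quasi-Newton style step

x0 = [1.3, 2.7]
x_newton = step(x0, H_exact)     # lands on the minimizer of this quadratic
x_quasi = step(x0, H_identity)   # overshoots to the mirrored point
```

With the exact inverse Hessian, one step reaches the origin. With the identity, a unit step overshoots, which is one reason a quasi-Newton method pairs the direction with a line search for the step length.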


Reference: Jorge Nocedal and Stephen J. Wright, Numerical Optimization, Second Edition, 2006, p. 140: Algorithm 6.1 (BFGS Method).

The following summarizes the main logic of the program based on BFGS. Note: _k denotes the value at the k-th iteration, and ^T denotes the transpose of a vector or matrix.

    repeat
        p_k = -H_k * g_k
        alpha = strong_wolfe(f, x_k, p_k)
        x_k+1 = x_k + alpha * p_k
        s_k = x_k+1 - x_k
        y_k = g_k+1 - g_k
        rho_k = 1 / (s_k^T * y_k)
        V_k^T = I - rho_k * s_k * y_k^T
        V_k = I - rho_k * y_k * s_k^T
        H_k+1 = V_k^T * H_k * V_k + rho_k * s_k * s_k^T
        check_converge
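The loop above can be sketched in plain Python on a small convex quadratic. This is an illustration under stated assumptions, not Paddle's implementation: the search direction uses the descent sign p_k = -H_k g_k, a simple backtracking (Armijo) search stands in for the strong Wolfe line search, and the objective `f`, its gradient `grad_f`, and all helper names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def matmul(A, B):
    cols = list(zip(*B))
    return [[dot(row, col) for col in cols] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def f(x):
    # Hypothetical objective: convex quadratic with its minimum at the origin.
    return x[0] ** 2 + 3.0 * x[1] ** 2 + x[0] * x[1]

def grad_f(x):
    return [2.0 * x[0] + x[1], 6.0 * x[1] + x[0]]

def backtracking(x, p, g, alpha=1.0, c1=1e-4, shrink=0.5, max_iters=50):
    # Stand-in for strong Wolfe: shrink alpha until sufficient decrease holds.
    fx, slope = f(x), dot(g, p)
    for _ in range(max_iters):
        trial = [xi + alpha * pi for xi, pi in zip(x, p)]
        if f(trial) <= fx + c1 * alpha * slope:
            break
        alpha *= shrink
    return alpha

def bfgs(x, max_iters=50, tolerance_grad=1e-7):
    n = len(x)
    I = [[float(i == j) for j in range(n)] for i in range(n)]
    H = [row[:] for row in I]                      # H_0 = identity
    g = grad_f(x)
    for k in range(max_iters):
        if max(abs(gi) for gi in g) < tolerance_grad:   # inf-norm gradient test
            return x, k, True
        p = [-v for v in matvec(H, g)]             # p_k = -H_k g_k
        alpha = backtracking(x, p, g)
        x_new = [xi + alpha * pi for xi, pi in zip(x, p)]
        g_new = grad_f(x_new)
        s = [a - b for a, b in zip(x_new, x)]      # s_k = x_{k+1} - x_k
        y = [a - b for a, b in zip(g_new, g)]      # y_k = g_{k+1} - g_k
        sy = dot(s, y)
        if sy > 0.0:                               # keep H positive definite
            rho = 1.0 / sy
            V = [[I[i][j] - rho * y[i] * s[j] for j in range(n)] for i in range(n)]
            VtHV = matmul(matmul(transpose(V), H), V)
            H = [[VtHV[i][j] + rho * s[i] * s[j] for j in range(n)] for i in range(n)]
        x, g = x_new, g_new
    return x, max_iters, max(abs(gi) for gi in g) < tolerance_grad

x_min, num_iters, converged = bfgs([1.3, 2.7])
```

The update is written with explicit V_k matrices to mirror the summary; a production implementation would more likely expand the product to avoid the two matrix multiplications.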

  • objective_func – the objective function to minimize. It accepts a multivariate input and returns a scalar.

  • initial_position (Tensor) – the starting point of the iterates. For methods like Newton and quasi-Newton

  • max_iters (int) – the maximum number of minimization iterations.

  • tolerance_grad (float) – terminates if the gradient norm is smaller than this value. The gradient norm is currently the infinity norm.

  • tolerance_change (float) – terminates if the change of function value/position/parameter between two iterations is smaller than this value.

  • initial_inverse_hessian_estimate (Tensor) – the initial inverse Hessian approximation at initial_position.

  • line_search_fn (str) – the line search method to use. Only 'strong_wolfe' is supported at present; 'Hager Zhang' may be supported in the future.

  • max_line_search_iters (int) – the maximum number of line search iterations.

  • initial_step_length (float) – step length used in first iteration of line search. different initial_step_length

  • dtype ('float32' | 'float64') – In static graph mode, float64 will be converted to float32 due to a paddle.assign limitation.
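To make the two tolerances concrete, here is a minimal sketch of how such stopping tests could look. It is an assumption for illustration only: the helper names are hypothetical, and the change test below looks only at the objective value, whereas the description of tolerance_change mentions function value, position, and parameter without saying exactly how they are combined.

```python
def inf_norm(v):
    # Infinity norm: the largest absolute component.
    return max(abs(vi) for vi in v)

def should_stop(grad, f_prev, f_curr, tolerance_grad=1e-7, tolerance_change=1e-9):
    # Gradient test: stop once the inf-norm of the gradient is small enough.
    if inf_norm(grad) < tolerance_grad:
        return True
    # Change test (assumed form): stop once the objective value barely moves.
    return abs(f_curr - f_prev) < tolerance_change

stop_on_grad = should_stop([1e-8, -5e-8], f_prev=1.0, f_curr=0.5)
stop_on_change = should_stop([1e-3, 0.0], f_prev=1.0, f_curr=1.0 - 1e-12)
keep_going = should_stop([1e-3, 0.0], f_prev=1.0, f_curr=0.5)
```

Because the infinity norm looks at the single largest component, one large gradient entry is enough to keep the iteration running.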


Returns

The following values, in order:

  • is_converge (bool) – indicates whether the minimum was found within tolerance.

  • num_func_calls (int) – the number of objective function calls.

  • position (Tensor) – the position at the last iteration. If the search converged, this value is the argmin of the objective function with regard to the initial position.

  • objective_value (Tensor) – the objective function value at the position.

  • objective_gradient (Tensor) – the objective function gradient at the position.

  • inverse_hessian_estimate (Tensor) – the estimate of the inverse Hessian at the position.
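Since the outputs are accessed by position, a thin named wrapper can make call sites easier to read. The sketch below is a hypothetical convenience, not part of Paddle: `BFGSResult` and `as_result` are made-up names, and the demo values are plain Python stand-ins rather than real Tensors.

```python
from collections import namedtuple

# Field order follows the documented outputs.
BFGSResult = namedtuple(
    "BFGSResult",
    [
        "is_converge",
        "num_func_calls",
        "position",
        "objective_value",
        "objective_gradient",
        "inverse_hessian_estimate",
    ],
)

def as_result(results):
    # Accepts any 6-element sequence, such as the value returned by minimize_bfgs.
    return BFGSResult(*results)

# Stand-in values for illustration only.
demo = as_result((True, 12, [0.0, 0.0], 0.0, [0.0, 0.0], [[0.5, 0.0], [0.0, 0.5]]))
```

Index access still works on the wrapper, so `demo[2]` and `demo.position` refer to the same object.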


import paddle

def func(x):
    # Sum of squares; the minimum is at the origin.
    return paddle.dot(x, x)

x0 = paddle.to_tensor([1.3, 2.7])
results = paddle.incubate.optimizer.functional.minimize_bfgs(func, x0)
print("is_converge: ", results[0])
print("the minimum of func is: ", results[2])
# is_converge:  Tensor(shape=[1], dtype=bool, place=Place(gpu:0), stop_gradient=True,
#        [True])
# the minimum of func is:  Tensor(shape=[2], dtype=float32, place=Place(gpu:0), stop_gradient=True,
#        [0., 0.])