minimize_bfgs
- paddle.incubate.optimizer.functional.minimize_bfgs(objective_func, initial_position, max_iters=50, tolerance_grad=1e-07, tolerance_change=1e-09, initial_inverse_hessian_estimate=None, line_search_fn='strong_wolfe', max_line_search_iters=50, initial_step_length=1.0, dtype='float32', name=None)
Minimizes a differentiable function objective_func using the BFGS method. BFGS is a quasi-Newton method for solving an unconstrained optimization problem over a differentiable function. Closely related is the Newton method for minimization. Consider the iterate update formula:

\[x_{k+1} = x_{k} - H_k \nabla{f_k}\]

If \(H_k\) is the inverse Hessian of \(f\) at \(x_k\), then it is the Newton method. If \(H_k\) is a symmetric positive definite matrix used as an approximation of the inverse Hessian, then it is a quasi-Newton method. In practice, these approximations are built from gradients only, over either the whole search history (BFGS) or part of it (L-BFGS).

- Reference:
  Jorge Nocedal, Stephen J. Wright, Numerical Optimization, Second Edition, 2006. p. 140: Algorithm 6.1 (BFGS Method).
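The quasi-Newton iteration described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Paddle's implementation: the name `bfgs_sketch` and its helpers are hypothetical, the gradient is supplied by hand instead of by automatic differentiation, and a simple backtracking (Armijo) line search is swapped in for the strong Wolfe search. The search direction is \(p_k = -H_k \nabla{f_k}\), scaled by a step length that starts at 1.0.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    """Product of a square matrix (list of rows) and a vector."""
    return [dot(row, v) for row in M]


def bfgs_sketch(f, grad, x0, max_iters=50, tolerance_grad=1e-7,
                max_line_search_iters=50, initial_step_length=1.0):
    """Illustrative BFGS loop; returns (position, H, iterations, converged)."""
    n = len(x0)
    x = list(x0)
    # Identity matrix as the initial inverse Hessian estimate.
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = grad(x)
    for k in range(max_iters):
        # Stop when the infinity norm of the gradient is below the tolerance.
        if max(abs(gi) for gi in g) < tolerance_grad:
            return x, H, k, True
        # Quasi-Newton search direction p = -H g.
        p = [-v for v in matvec(H, g)]
        # Backtracking line search on the Armijo sufficient-decrease condition.
        alpha, fx, slope = initial_step_length, f(x), dot(g, p)
        for _ in range(max_line_search_iters):
            trial = [xi + alpha * pi for xi, pi in zip(x, p)]
            if f(trial) <= fx + 1e-4 * alpha * slope:
                break
            alpha *= 0.5
        x_new = [xi + alpha * pi for xi, pi in zip(x, p)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]   # change in position
        y = [a - b for a, b in zip(g_new, g)]   # change in gradient
        sy = dot(s, y)
        if sy > 1e-12:  # update only when the curvature condition holds
            rho = 1.0 / sy
            Hy = matvec(H, y)
            yHy = dot(y, Hy)
            # Expanded form of (I - rho s y^T) H (I - rho y s^T) + rho s s^T.
            H = [[H[i][j] - rho * (s[i] * Hy[j] + Hy[i] * s[j])
                  + (rho * rho * yHy + rho) * s[i] * s[j]
                  for j in range(n)] for i in range(n)]
        x, g = x_new, g_new
    return x, H, max_iters, max(abs(gi) for gi in g) < tolerance_grad


# Same objective as the Paddle example: f(x) = x . x, with gradient 2x.
position, H, iterations, converged = bfgs_sketch(
    lambda v: dot(v, v), lambda v: [2.0 * t for t in v], [1.3, 2.7])
print(converged, position)
```

Only gradients enter the update of `H`, which is what distinguishes a quasi-Newton method from the Newton method; the update is skipped when \(s^T y\) is not positive so that `H` stays positive definite.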
 - Parameters
- 
- objective_func – the objective function to minimize. objective_func accepts a 1D Tensor and returns a scalar.
- initial_position (Tensor) – the starting point of the iterates. It has the same shape as the input of objective_func.
- max_iters (int, optional) – the maximum number of minimization iterations. Default value: 50. 
- tolerance_grad (float, optional) – terminates if the gradient norm is smaller than this value. The gradient norm currently uses the infinity norm. Default value: 1e-7.
- tolerance_change (float, optional) – terminates if the change of function value/position/parameter between two iterations is smaller than this value. Default value: 1e-9. 
- initial_inverse_hessian_estimate (Tensor, optional) – the initial inverse Hessian approximation at initial_position. It must be symmetric and positive definite. If not given, an identity matrix of order N is used, where N is the size of initial_position. Default value: None.
- line_search_fn (str, optional) – indicates which line search method to use. Only ‘strong_wolfe’ is supported right now; ‘Hager Zhang’ may be supported in the future. Default value: ‘strong_wolfe’.
- max_line_search_iters (int, optional) – the maximum number of line search iterations. Default value: 50. 
- initial_step_length (float, optional) – the step length used in the first iteration of line search. Different values of initial_step_length may lead to different optimization results. For methods like Newton and quasi-Newton, the initial trial step length should always be 1.0. Default value: 1.0.
- dtype ('float32' | 'float64', optional) – the data type used in the algorithm. The data type of the input parameter must be consistent with dtype. Default value: ‘float32’.
- name (str, optional) – Name for the operation. For more information, please refer to Name. Default value: None. 
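Since initial_inverse_hessian_estimate must be symmetric and positive definite, it can help to validate a candidate matrix before passing it in. The following is a minimal sketch in plain Python, assuming the matrix is available as nested lists; `is_symmetric_positive_definite` is a hypothetical helper and not part of Paddle. It relies on the fact that a symmetric matrix is positive definite exactly when its Cholesky factorization succeeds with a strictly positive diagonal.

```python
import math


def is_symmetric_positive_definite(M, tol=1e-12):
    """Return True if the square matrix M (list of rows) is symmetric
    and positive definite, checked via a Cholesky factorization."""
    n = len(M)
    if any(len(row) != n for row in M):
        return False  # not square
    # Symmetry check on the strictly lower triangle.
    for i in range(n):
        for j in range(i):
            if abs(M[i][j] - M[j][i]) > tol:
                return False
    # Attempt M = L L^T; a non-positive pivot means M is not positive definite.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if acc <= tol:
                    return False
                L[i][i] = math.sqrt(acc)
            else:
                L[i][j] = acc / L[j][j]
    return True


# The default estimate (identity of order N) passes the check.
print(is_symmetric_positive_definite([[1.0, 0.0], [0.0, 1.0]]))
# A symmetric but indefinite matrix does not.
print(is_symmetric_positive_definite([[1.0, 2.0], [2.0, 1.0]]))
```

The tolerance `tol` is an assumption of this sketch; a matrix that is positive definite but very ill-conditioned may be rejected by it.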
 
- Returns
- 
           
- is_converge (bool): indicates whether the minimum was found within tolerance.
- num_func_calls (int): the number of times the objective function was called.
- position (Tensor): the position at the last iteration. If the search converged, this value is the argmin of the objective function with regard to the initial position.
- objective_value (Tensor): objective function value at the position. 
- objective_gradient (Tensor): objective function gradient at the position. 
- inverse_hessian_estimate (Tensor): the estimate of the inverse Hessian at the position.
 
- Return type
- 
           output(tuple) 
 Examples

    import paddle

    def func(x):
        return paddle.dot(x, x)

    x0 = paddle.to_tensor([1.3, 2.7])
    results = paddle.incubate.optimizer.functional.minimize_bfgs(func, x0)
    print("is_converge: ", results[0])
    print("the minimum of func is: ", results[2])
    # is_converge:  Tensor(shape=[1], dtype=bool, place=Place(gpu:0), stop_gradient=True,
    #        [True])
    # the minimum of func is:  Tensor(shape=[2], dtype=float32, place=Place(gpu:0), stop_gradient=True,
    #        [0., 0.])
