ShardingOptimizer

class paddle.distributed.fleet.meta_optimizers.sharding_optimizer.ShardingOptimizer(optimizer) [source]
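The following is a minimal sketch, not taken from the original documentation, of how sharding is typically enabled through fleet rather than by constructing ShardingOptimizer directly; the sharding_configs value is illustrative and the script is assumed to be started with a collective launcher such as paddle.distributed.launch:

    import paddle
    import paddle.distributed.fleet as fleet

    paddle.enable_static()
    fleet.init(is_collective=True)

    strategy = fleet.DistributedStrategy()
    strategy.sharding = True
    strategy.sharding_configs = {"segment_broadcast_MB": 32}  # illustrative setting

    optimizer = paddle.optimizer.SGD(learning_rate=0.01)
    # fleet wraps the user optimizer; with sharding enabled, the ShardingOptimizer
    # meta optimizer is applied when optimizer.minimize(loss) is called.
    optimizer = fleet.distributed_optimizer(optimizer, strategy=strategy)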
apply_gradients(params_grads)

Second part of minimize, appending optimization operators for the given (param, grad) pairs.

Parameters
    params_grads (list) – list of (param, grad) pairs to optimize.

Returns
    A list of operators appended to the current program.

Return type
    list

Examples
    import paddle.fluid as fluid

    loss = network()
    optimizer = fluid.optimizer.SGD(learning_rate=0.1)
    params_grads = optimizer.backward(loss)
    # you may append operations for params_grads here
    # ...
    optimizer.apply_gradients(params_grads)
apply_optimize(loss, startup_program, params_grads)

Second part of minimize, appending optimization operators for the given (param, grad) pairs.

Parameters
    loss (Variable) – loss variable to run optimizations.
    startup_program (Program) – startup program for initializing parameters in parameter_list.
    params_grads (list) – list of (param, grad) pairs to optimize.

Returns
    A list of operators appended to the current program.

Return type
    list
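Examples
    A minimal sketch (not from the original documentation), assuming a hypothetical network() helper that builds the model and returns the loss Variable:

    import paddle.fluid as fluid

    loss = network()  # hypothetical helper returning the loss Variable
    optimizer = fluid.optimizer.SGD(learning_rate=0.1)
    params_grads = optimizer.backward(loss)
    # append optimization ops; parameters are initialized via the default startup program
    optimizer.apply_optimize(loss,
                             startup_program=fluid.default_startup_program(),
                             params_grads=params_grads)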
backward(loss, startup_program=None, parameter_list=None, no_grad_set=None, callbacks=None)

The first part of minimize, doing auto-differentiation to append backward operations to the current program.

Parameters
    loss (Variable) – loss variable to run optimizations.
    startup_program (Program, optional) – Program for initializing parameters in parameter_list. The default value is None, in which case the default startup program will be used.
    parameter_list (Iterable, optional) – Iterable of Variable or Variable.name to update in order to minimize loss. The default value is None, in which case all parameters will be updated.
    no_grad_set (set, optional) – Set of Variable or Variable.name that do not need to be updated. The default value is None.
    callbacks (list, optional) – list of callable objects to run when appending the backward operator for one parameter. The default value is None.

Returns
    list of (param, grad) variable pairs; param is a Parameter and grad is the gradient value corresponding to that parameter.

Return type
    list

Examples
    See examples in apply_gradients.
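    As a supplementary sketch (not from the original documentation), assuming a hypothetical network() helper and a hypothetical parameter name, no_grad_set can be used to exclude a parameter from the backward pass:

    import paddle.fluid as fluid

    loss = network()  # hypothetical helper returning the loss Variable
    optimizer = fluid.optimizer.SGD(learning_rate=0.1)
    # "embedding_0.w_0" is a hypothetical parameter name, used only for illustration
    params_grads = optimizer.backward(loss, no_grad_set=set(["embedding_0.w_0"]))
    optimizer.apply_gradients(params_grads)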
clear_gradients()

Clear the gradients of all optimized parameters of the model. Otherwise, new gradients would accumulate on top of the previous ones.

Returns
    None

Examples
    import paddle.fluid as fluid
    import numpy as np

    with fluid.dygraph.guard():
        value = np.arange(26).reshape(2, 13).astype("float32")
        a = fluid.dygraph.to_variable(value)
        linear = fluid.Linear(13, 5, dtype="float32")
        # This can be any optimizer supported by dygraph.
        adam = fluid.optimizer.Adam(learning_rate=0.01,
                                    parameter_list=linear.parameters())
        out = linear(a)
        out.backward()
        adam.minimize(out)
        adam.clear_gradients()
current_step_lr()

Api_attr
    imperative

Get the learning rate of the current step. When LearningRateDecay is not used, the return value is always the same; otherwise the step learning rate is returned.

Returns
    The learning rate of the current step.

Return type
    float

Examples
    import paddle.fluid as fluid
    import numpy as np

    # example1: LearningRateDecay is not used, the return value is always the same
    with fluid.dygraph.guard():
        emb = fluid.dygraph.Embedding([10, 10])
        adam = fluid.optimizer.Adam(0.001, parameter_list=emb.parameters())
        lr = adam.current_step_lr()
        print(lr)  # 0.001

    # example2: PiecewiseDecay is used, return the step learning rate
    with fluid.dygraph.guard():
        inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32")
        linear = fluid.dygraph.nn.Linear(10, 10)
        inp = fluid.dygraph.to_variable(inp)
        out = linear(inp)
        loss = fluid.layers.reduce_mean(out)

        bd = [2, 4, 6, 8]
        value = [0.2, 0.4, 0.6, 0.8, 1.0]
        adam = fluid.optimizer.Adam(fluid.dygraph.PiecewiseDecay(bd, value, 0),
                                    parameter_list=linear.parameters())

        # first step: learning rate is 0.2
        np.allclose(adam.current_step_lr(), 0.2, rtol=1e-06, atol=0.0)  # True

        # learning rate for different steps
        ret = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8, 1.0, 1.0, 1.0, 1.0]
        for i in range(12):
            adam.minimize(loss)
            lr = adam.current_step_lr()
            np.allclose(lr, ret[i], rtol=1e-06, atol=0.0)  # True
minimize(loss, startup_program=None, parameter_list=None, no_grad_set=None)

Add operations to minimize loss by updating parameter_list.

Parameters
    loss (Variable) – A Variable containing the value to minimize.
    startup_program (Program, optional) – Program for initializing parameters in parameter_list. The default value is None, in which case the default startup program will be used.
    parameter_list (Iterable, optional) – Iterable of Variable or Variable.name to update in order to minimize loss. The default value is None, in which case all parameters will be updated.
    no_grad_set (set, optional) – Set of Variable or Variable.name that do not need to be updated. The default value is None.

Returns
    tuple (optimize_ops, params_grads), a list of operators appended by minimize and a list of (param, grad) variable pairs; param is a Parameter and grad is the gradient value corresponding to that parameter. The returned tuple can be passed to fetch_list in Executor.run() to indicate program pruning. If so, the program will be pruned by feed and fetch_list before running; see details in Executor.

Return type
    tuple

Examples
    Please refer to the example of the current Optimizer.
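    A minimal static-graph sketch (not from the original documentation), assuming a hypothetical network() helper that builds the model and returns the loss Variable:

    import paddle.fluid as fluid

    loss = network()  # hypothetical helper returning the loss Variable
    optimizer = fluid.optimizer.SGD(learning_rate=0.1)
    optimize_ops, params_grads = optimizer.minimize(loss)

    # run the startup program once to initialize parameters, then train
    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())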
set_dict(state_dict)

Load the optimizer state dict. For the Adam optimizer, it contains beta1, beta2, momentum, etc. If LearningRateDecay has been used, global_step will be changed.

Parameters
    state_dict (dict) – Dict containing all the Variables needed by the optimizer.

Returns
    None

Examples
    import paddle
    import paddle.fluid as fluid

    paddle.disable_static()

    emb = paddle.nn.Embedding(10, 10)
    state_dict = emb.state_dict()
    fluid.save_dygraph(state_dict, "paddle_dy")

    scheduler = paddle.optimizer.lr.NoamDecay(
        d_model=0.01, warmup_steps=100, verbose=True)
    adam = paddle.optimizer.Adam(
        learning_rate=scheduler,
        parameters=emb.parameters())
    state_dict = adam.state_dict()
    fluid.save_dygraph(state_dict, "paddle_dy")

    para_state_dict, opti_state_dict = fluid.load_dygraph("paddle_dy")
    # restore the optimizer state from the loaded dict
    adam.set_dict(opti_state_dict)
set_lr(value)

Api_attr
    imperative

Set the value of the learning rate manually in the optimizer. If the optimizer uses LearningRateDecay, this API cannot be invoked, because it would lead to a conflict.

Parameters
    value (float|Variable) – the value of the learning rate.

Returns
    None

Examples
    import paddle.fluid as fluid

    with fluid.dygraph.guard():
        linear = fluid.dygraph.nn.Linear(10, 10)
        adam = fluid.optimizer.Adam(0.1, parameter_list=linear.parameters())

        # set learning rate manually by python float value
        lr_list = [0.2, 0.3, 0.4, 0.5, 0.6]
        for i in range(5):
            adam.set_lr(lr_list[i])
            lr = adam.current_step_lr()
            print("current lr is {}".format(lr))
        # Print:
        #    current lr is 0.2
        #    current lr is 0.3
        #    current lr is 0.4
        #    current lr is 0.5
        #    current lr is 0.6

        # set learning rate manually by framework Variable
        lr_var = fluid.layers.create_global_var(
            shape=[1], value=0.7, dtype='float32')
        adam.set_lr(lr_var)
        lr = adam.current_step_lr()
        print("current lr is {}".format(lr))
        # Print:
        #    current lr is 0.7
set_state_dict(state_dict)

Load the optimizer state dict. For the Adam optimizer, it contains beta1, beta2, momentum, etc. If LearningRateDecay has been used, global_step will be changed.

Parameters
    state_dict (dict) – Dict containing all the Variables needed by the optimizer.

Returns
    None

Examples
    import paddle
    import paddle.fluid as fluid

    paddle.disable_static()

    emb = paddle.nn.Embedding(10, 10)
    state_dict = emb.state_dict()
    fluid.save_dygraph(state_dict, "paddle_dy")

    scheduler = paddle.optimizer.lr.NoamDecay(
        d_model=0.01, warmup_steps=100, verbose=True)
    adam = paddle.optimizer.Adam(
        learning_rate=scheduler,
        parameters=emb.parameters())
    state_dict = adam.state_dict()
    fluid.save_dygraph(state_dict, "paddle_dy")

    para_state_dict, opti_state_dict = fluid.load_dygraph("paddle_dy")
    # restore the optimizer state from the loaded dict
    adam.set_state_dict(opti_state_dict)
state_dict()

Get state dict information from the optimizer. It contains all the Variables used by the optimizer. For the Adam optimizer, it contains beta1, beta2, momentum, etc. If LearningRateDecay has been used, global_step will be included in the state dict. If the optimizer has never been called (via the minimize function), the state_dict is empty.

Parameters
    None

Returns
    dict containing all the Variables used by the optimizer.

Return type
    state_dict (dict)

Examples
    import paddle.fluid as fluid

    with fluid.dygraph.guard():
        emb = fluid.dygraph.Embedding([10, 10])
        adam = fluid.optimizer.Adam(0.001, parameter_list=emb.parameters())
        state_dict = adam.state_dict()