set_gradient_clip

paddle.fluid.clip.set_gradient_clip(clip, param_list=None, program=None) [source]

Api_attr: Static Graph
Warning: This API must be used after building the network and before minimize, and it may be removed in future releases, so it is not recommended. It is recommended to set grad_clip when initializing the optimizer instead, which is a better way to clip gradients; a minimal sketch of that pattern is shown below. There are three clipping strategies: GradientClipByGlobalNorm, GradientClipByNorm, and GradientClipByValue.

This API specifies the parameters whose gradients require clipping.
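As the warning recommends, the preferred pattern is to pass a clipping strategy through the optimizer's grad_clip argument. A minimal sketch, using only the APIs that appear in the examples below (the network itself is illustrative):

import paddle.fluid as fluid

with fluid.program_guard(fluid.Program(), fluid.Program()):
    image = fluid.data(name='image', shape=[None, 28], dtype='float32')
    loss = fluid.layers.reduce_mean(fluid.layers.fc(image, size=10))
    # Preferred method: attach the clipping strategy to the optimizer
    # instead of calling set_gradient_clip.
    clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)
    sgd = fluid.optimizer.SGD(learning_rate=1e-3, grad_clip=clip)
    sgd.minimize(loss)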
Parameters

- clip (GradientClipBase) – The gradient clipping strategy, an instance of some derived class of GradientClipBase. There are three clipping strategies (GradientClipByGlobalNorm, GradientClipByNorm, GradientClipByValue).
- param_list (list(Variable), optional) – Parameters that require gradient clipping. It can be a list of parameters or a list of parameter names. Default: None, meaning that all parameters in the program are included.
- program (Program, optional) – The program where the parameters are located. Default: None, meaning that fluid.default_main_program() is used. A combined sketch of param_list and program follows this list.
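A hedged sketch combining param_list and program (the parameter name "fc_w" and the shapes are illustrative, not part of the API):

import paddle.fluid as fluid

main_prog = fluid.Program()
startup_prog = fluid.Program()
with fluid.program_guard(main_prog, startup_prog):
    x = fluid.data(name='x', shape=[None, 8], dtype='float32')
    fc = fluid.layers.fc(x, size=4, param_attr=fluid.ParamAttr("fc_w"))
    loss = fluid.layers.reduce_mean(fc)
    # Clip only the 'fc_w' parameter, naming the program explicitly
    # instead of relying on the default main program.
    fluid.clip.set_gradient_clip(
        fluid.clip.GradientClipByNorm(clip_norm=1.0),
        param_list=["fc_w"],
        program=main_prog)
    sgd = fluid.optimizer.SGD(learning_rate=1e-3)
    sgd.minimize(loss)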
 
Returns

None
Examples

import paddle.fluid as fluid

def network():
    image = fluid.data(name='image', shape=[None, 28], dtype='float32')
    param_attr1 = fluid.ParamAttr("fc1_param")
    fc1 = fluid.layers.fc(image, size=10, param_attr=param_attr1)
    param_attr2 = fluid.ParamAttr("fc2_param")
    fc2 = fluid.layers.fc(fc1, size=10, param_attr=param_attr2)
    loss = fluid.layers.reduce_mean(fc2)
    return loss

# network 1: clip all parameter gradients
with fluid.program_guard(fluid.Program(), fluid.Program()):
    loss = network()
    fluid.clip.set_gradient_clip(
        fluid.clip.GradientClipByGlobalNorm(clip_norm=2.0))
    sgd = fluid.optimizer.SGD(learning_rate=1e-3)
    sgd.minimize(loss)

# network 2: clip parameter gradients by name
with fluid.program_guard(fluid.Program(), fluid.Program()):
    loss = network()
    fluid.clip.set_gradient_clip(
        fluid.clip.GradientClipByValue(min=-1.0, max=1.0),
        param_list=["fc1_param", "fc2_param"])
    sgd = fluid.optimizer.SGD(learning_rate=1e-3)
    sgd.minimize(loss)

# network 3: clip parameter gradients by variable
with fluid.program_guard(fluid.Program(), fluid.Program()):
    loss = network()
    param_var1 = fluid.default_main_program().global_block().var("fc1_param")
    param_var2 = fluid.default_main_program().global_block().var("fc2_param")
    fluid.clip.set_gradient_clip(
        fluid.clip.GradientClipByValue(min=-1.0, max=1.0),
        param_list=[param_var1, param_var2])
    sgd = fluid.optimizer.SGD(learning_rate=1e-3)
    sgd.minimize(loss)

# network 4: use 'set_gradient_clip' and 'optimize(grad_clip=clip)' together
with fluid.program_guard(fluid.Program(), fluid.Program()):
    loss = network()
    clip1 = fluid.clip.GradientClipByValue(min=-1.0, max=1.0)
    clip2 = fluid.clip.GradientClipByNorm(clip_norm=1.0)
    # Set the gradient clipping strategy: clip1
    fluid.clip.set_gradient_clip(clip1)
    # Set the gradient clipping strategy: clip2
    sgd = fluid.optimizer.SGD(learning_rate=1e-3, grad_clip=clip2)
    sgd.minimize(loss)
    # When the settings conflict, 'set_gradient_clip' does not take effect,
    # and the gradient clipping strategy will be 'clip2'.
