set_gradient_clip

api_attr

declarative programming (static graph)

paddle.fluid.clip.set_gradient_clip(clip, param_list=None, program=None)[source]

Warning

This API must be called after building the network and before minimize , and it may be removed in future releases, so it is not recommended. It is recommended to set grad_clip when initializing the optimizer instead, which is a better way to clip gradients. There are three clipping strategies: GradientClipByGlobalNorm , GradientClipByNorm and GradientClipByValue .


Specify the parameters that require gradient clipping.

Parameters
  • clip (GradientClipBase) – The gradient clipping strategy, an instance of a subclass of GradientClipBase . There are three clipping strategies: GradientClipByGlobalNorm , GradientClipByNorm and GradientClipByValue .

  • param_list (list(Variable), optional) – Parameters that require gradient clipping. It can be a list of parameter Variables or a list of parameter names (str). Default: None, meaning that all parameters in the program are clipped.

  • program (Program, optional) – The Program in which the parameters are located. Default: None, meaning that default_main_program is used.

Returns

None
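As a rough, framework-free illustration of what the three strategies compute (plain NumPy, not the Paddle API; the function names below are hypothetical), clipping by value clamps each element, clipping by norm rescales a single gradient tensor, and clipping by global norm rescales all gradients jointly:

```python
import numpy as np

def clip_by_value(grad, min_v=-1.0, max_v=1.0):
    # Elementwise clamp, analogous to GradientClipByValue
    return np.clip(grad, min_v, max_v)

def clip_by_norm(grad, clip_norm=1.0):
    # Per-tensor rescale, analogous to GradientClipByNorm:
    # if ||g|| > clip_norm, scale g so its norm equals clip_norm
    norm = np.linalg.norm(grad)
    return grad if norm <= clip_norm else grad * (clip_norm / norm)

def clip_by_global_norm(grads, clip_norm=1.0):
    # Joint rescale of a list of gradients, analogous to
    # GradientClipByGlobalNorm: all tensors share one scale factor
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm <= clip_norm:
        return grads
    scale = clip_norm / global_norm
    return [g * scale for g in grads]

g1 = np.array([3.0, 4.0])   # norm 5.0
g2 = np.array([0.5, -2.0])

print(clip_by_value(g1))               # [1. 1.]
print(clip_by_norm(g1, clip_norm=1.0)) # [0.6 0.8]
clipped = clip_by_global_norm([g1, g2], clip_norm=1.0)
```

After clipping by global norm, the combined norm of all gradients is at most clip_norm, while the ratios between individual gradients are preserved.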

Examples

import paddle.fluid as fluid

def network():
    image = fluid.data(name='image', shape=[None, 28], dtype='float32')
    param_attr1 = fluid.ParamAttr("fc1_param")
    fc1 = fluid.layers.fc(image, size=10, param_attr=param_attr1)
    param_attr2 = fluid.ParamAttr("fc2_param")
    fc2 = fluid.layers.fc(fc1, size=10, param_attr=param_attr2)
    loss = fluid.layers.reduce_mean(fc2)
    return loss


# network 1: clip the gradients of all parameters
with fluid.program_guard(fluid.Program(), fluid.Program()):
    loss = network()
    fluid.clip.set_gradient_clip(
        fluid.clip.GradientClipByGlobalNorm(clip_norm=2.0))
    sgd = fluid.optimizer.SGD(learning_rate=1e-3)
    sgd.minimize(loss)

# network 2: clip parameter gradients, selecting parameters by name
with fluid.program_guard(fluid.Program(), fluid.Program()):
    loss = network()
    fluid.clip.set_gradient_clip(
        fluid.clip.GradientClipByValue(min=-1.0, max=1.0),
        param_list=["fc1_param", "fc2_param"])
    sgd = fluid.optimizer.SGD(learning_rate=1e-3)
    sgd.minimize(loss)

# network 3: clip parameter gradients, selecting parameters by Variable
with fluid.program_guard(fluid.Program(), fluid.Program()):
    loss = network()
    param_var1 = fluid.default_main_program().global_block().var("fc1_param")
    param_var2 = fluid.default_main_program().global_block().var("fc2_param")
    fluid.clip.set_gradient_clip(
        fluid.clip.GradientClipByValue(min=-1.0, max=1.0),
        param_list=[param_var1, param_var2])
    sgd = fluid.optimizer.SGD(learning_rate=1e-3)
    sgd.minimize(loss)

# network 4: use 'set_gradient_clip' and 'optimizer(grad_clip=clip)' together
with fluid.program_guard(fluid.Program(), fluid.Program()):
    loss = network()
    clip1 = fluid.clip.GradientClipByValue(min=-1.0, max=1.0)
    clip2 = fluid.clip.GradientClipByNorm(clip_norm=1.0)
    # Set the gradient clipping strategy: clip1
    fluid.clip.set_gradient_clip(clip1)
    # Set the gradient clipping strategy: clip2
    sgd = fluid.optimizer.SGD(learning_rate=1e-3, grad_clip=clip2)
    sgd.minimize(loss)
    # When the two settings conflict, 'set_gradient_clip' does not take
    # effect, and the gradient clipping strategy will be 'clip2'