class paddle.fluid.clip.GradientClipByGlobalNorm(clip_norm, group_name='default_group', need_clip=None)[source]

Given a list of Tensors $$t\_list$$ , calculate the global norm over the elements of all tensors in $$t\_list$$ , and limit it to clip_norm .

• If the global norm is greater than clip_norm , all elements of $$t\_list$$ will be scaled down by the same ratio, so that the global norm equals clip_norm .

• If the global norm is less than or equal to clip_norm , nothing will be done.

The list of Tensors $$t\_list$$ is not passed to this class directly; it consists of the gradients of all parameters in the Program . If need_clip is not None, only the gradients selected by that function are clipped.

Gradient clipping takes effect after being set in the optimizer ; see the optimizer documentation (for example: SGDOptimizer).

The clipping formula is:

$t\_list[i] = t\_list[i] * \frac{clip\_norm}{\max(global\_norm, clip\_norm)}$

where:

$global\_norm = \sqrt{\sum_{i=0}^{N-1}(l2norm(t\_list[i]))^2}$
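
The two formulas above can be sketched in plain NumPy (a minimal illustration of the math, not the Paddle implementation itself):

```python
import numpy as np

def clip_by_global_norm(t_list, clip_norm):
    # global_norm = sqrt(sum of squared L2 norms of all tensors in t_list)
    global_norm = np.sqrt(sum(np.sum(t ** 2) for t in t_list))
    # Scale every tensor by clip_norm / max(global_norm, clip_norm):
    # a no-op when global_norm <= clip_norm, a proportional shrink otherwise.
    scale = clip_norm / max(global_norm, clip_norm)
    return [t * scale for t in t_list]

# Two gradient tensors with L2 norms 5 and 12, so global_norm = sqrt(25 + 144) = 13
grads = [np.array([3.0, 4.0]), np.array([0.0, 12.0])]
clipped = clip_by_global_norm(grads, clip_norm=1.0)
# After clipping, the global norm is exactly clip_norm = 1.0
print(np.sqrt(sum(np.sum(t ** 2) for t in clipped)))
```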
Parameters
• clip_norm (float) – The maximum norm value.

• group_name (str, optional) – The group name for this clip. Default value is default_group .

• need_clip (function, optional) – A function that accepts a Parameter and returns a bool (True: the gradient of this Parameter needs to be clipped; False: it does not). Default: None, in which case the gradients of all parameters in the network are clipped.

Examples

# use for Static mode
import paddle.fluid as fluid
import numpy as np

main_prog = fluid.Program()
startup_prog = fluid.Program()
with fluid.program_guard(
        main_program=main_prog, startup_program=startup_prog):
    image = fluid.data(
        name='x', shape=[-1, 2], dtype='float32')
    predict = fluid.layers.fc(input=image, size=3, act='relu')  # Trainable parameters: fc_0.w.0, fc_0.b.0
    loss = fluid.layers.mean(predict)

    # Clip all parameters in the network:
    clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)

    # Clip only some parameters in the network (e.g. fc_0.w_0):
    # pass a function (filter_func) to need_clip; filter_func receives a Parameter and returns a bool
    # def filter_func(param):
    #     # Parameters can easily be filtered by name (the name can be set in fluid.ParamAttr;
    #     # the default names here are fc_0.w_0 and fc_0.b_0)
    #     return param.name == "fc_0.w_0"
    # clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0, need_clip=filter_func)

    sgd_optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.1, grad_clip=clip)
    sgd_optimizer.minimize(loss)

place = fluid.CPUPlace()
exe = fluid.Executor(place)
x = np.random.uniform(-100, 100, (10, 2)).astype('float32')
exe.run(startup_prog)
out = exe.run(main_prog, feed={'x': x}, fetch_list=loss)

# use for Dygraph mode
import paddle.fluid as fluid

with fluid.dygraph.guard():
    linear = fluid.dygraph.Linear(10, 10)  # Trainable parameters: linear_0.w.0, linear_0.b.0
    inputs = fluid.layers.uniform_random([32, 10]).astype('float32')
    out = linear(fluid.dygraph.to_variable(inputs))
    loss = fluid.layers.reduce_mean(out)
    loss.backward()

    # Clip all parameters in the network:
    clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)

    # Clip only some parameters in the network (e.g. linear_0.w_0):
    # pass a function (filter_func) to need_clip; filter_func receives a ParamBase and returns a bool
    # def filter_func(param):
    #     # ParamBase can easily be filtered by name (the name can be set in fluid.ParamAttr;
    #     # the default names here are linear_0.w_0 and linear_0.b_0)
    #     return param.name == "linear_0.w_0"
    #     # Note: linear.weight and linear.bias return the weight and bias of dygraph.Linear,
    #     # so their names can also be used for filtering:
    #     return param.name == linear.weight.name
    # clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0, need_clip=filter_func)

    sgd_optimizer = fluid.optimizer.SGD(
        learning_rate=0.1, parameter_list=linear.parameters(), grad_clip=clip)
    sgd_optimizer.minimize(loss)