ClipGradByGlobalNorm

class paddle.nn.ClipGradByGlobalNorm(clip_norm, group_name='default_group', auto_skip_clip=False)

Given a list of Tensors \(t\_list\), calculate the global norm over the elements of all tensors in \(t\_list\), and limit it to clip_norm.

  • If the global norm is greater than clip_norm, all elements of \(t\_list\) are scaled down by the ratio clip_norm / global_norm.

  • If the global norm is less than or equal to clip_norm, nothing is done.

The list of Tensors \(t\_list\) is not passed to this class directly; it consists of the gradients of all parameters set in the optimizer. If need_clip of a specific parameter is False in its ParamAttr, the gradients of that parameter will not be clipped.

Gradient clipping takes effect after it is set in the optimizer; see the optimizer documentation (for example, SGD).

The clipping formula is:

\[t\_list[i] = t\_list[i] * \frac{clip\_norm}{\max(global\_norm, clip\_norm)}\]

where:

\[global\_norm = \sqrt{\sum_{i=0}^{N-1}(l2norm(t\_list[i]))^2}\]
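
As a minimal sketch of the two formulas above, the following plain-Python snippet applies them to gradients represented as flat lists of floats. The helper names global_norm and clip_by_global_norm are hypothetical and written for illustration only; they are not part of Paddle's API, and the sketch does not model need_clip or auto_skip_clip.

```python
import math


def global_norm(grads):
    """L2 norm taken over every element of every gradient in the list."""
    return math.sqrt(sum(v * v for g in grads for v in g))


def clip_by_global_norm(grads, clip_norm):
    """Scale every gradient by clip_norm / max(global_norm, clip_norm)."""
    norm = global_norm(grads)
    # When norm <= clip_norm the ratio is exactly 1.0, so nothing changes.
    scale = clip_norm / max(norm, clip_norm)
    return [[v * scale for v in g] for g in grads]


# Two flattened "gradients"; global norm is sqrt(3^2 + 4^2 + 12^2) = 13.
grads = [[3.0, 4.0], [12.0]]
clipped = clip_by_global_norm(grads, clip_norm=1.0)

print(global_norm(grads))    # 13.0
print(global_norm(clipped))  # approximately 1.0
```

Because a single ratio is applied to every tensor, the direction of the combined gradient is preserved and only its magnitude is limited.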

Note

need_clip of ClipGradByGlobalNorm HAS BEEN DEPRECATED since 2.0. Please use need_clip in ParamAttr to specify the clip scope.

Parameters
  • clip_norm (float) – The maximum norm value.

  • group_name (str, optional) – The group name for this clip. Default value is default_group.

  • auto_skip_clip (bool, optional) – Whether to automatically skip gradient clipping. Default value is False.

Examples

>>> import paddle
>>> x = paddle.uniform([10, 10], min=-1.0, max=1.0, dtype='float32')
>>> linear = paddle.nn.Linear(in_features=10, out_features=10,
...                           weight_attr=paddle.ParamAttr(need_clip=True),
...                           bias_attr=paddle.ParamAttr(need_clip=False))
>>> out = linear(x)
>>> loss = paddle.mean(out)
>>> loss.backward()

>>> clip = paddle.nn.ClipGradByGlobalNorm(clip_norm=1.0)
>>> sgd = paddle.optimizer.SGD(learning_rate=0.1, parameters=linear.parameters(), grad_clip=clip)
>>> sgd.step()