paddle.nn.functional.smooth_l1_loss(input, label, reduction='mean', delta=1.0, name=None)

Computes the smooth L1 loss. The loss uses a squared (L2) term when the absolute element-wise error is below delta (1.0 by default) and an L1 term otherwise. In some cases this can prevent exploding gradients, and it is more robust and less sensitive to outliers than a purely squared error. It is also known as the Huber loss:

\[loss(x,y) = \frac{1}{n}\sum_{i}z_i\]

where \(z_i\) is given by:

\[\begin{split}z_i = \left\{\begin{array}{ll} 0.5(x_i - y_i)^2 & \text{if } |x_i - y_i| < \delta \\ \delta \cdot |x_i - y_i| - 0.5 \cdot \delta^2 & \text{otherwise} \end{array} \right.\end{split}\]
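
As a sanity check on the piecewise definition above, the following is a minimal pure-Python sketch of the formula as written here. It is not Paddle's implementation; the names `huber_term` and `smooth_l1_reference` are hypothetical and exist only for this illustration.

```python
def huber_term(x, y, delta=1.0):
    """Element-wise term z_i from the piecewise formula (illustrative only)."""
    diff = abs(x - y)
    if diff < delta:
        # Small error: squared (L2) branch.
        return 0.5 * diff * diff
    # Large error: linear (L1) branch, shifted so the two branches meet at delta.
    return delta * diff - 0.5 * delta * delta


def smooth_l1_reference(xs, ys, delta=1.0):
    """Mean of z_i over all element pairs, matching loss(x, y) = (1/n) * sum(z_i)."""
    terms = [huber_term(x, y, delta) for x, y in zip(xs, ys)]
    return sum(terms) / len(terms)


# An error of 0.5 falls in the squared branch, an error of 3.0 in the linear branch.
print(huber_term(0.5, 0.0))                            # 0.125
print(huber_term(3.0, 0.0))                            # 2.5
print(smooth_l1_reference([0.5, 3.0], [0.0, 0.0]))     # 1.3125
```

The two branches agree at |x - y| = delta (both give 0.5 * delta^2), so the loss is continuous at the switch point.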
  • input (Tensor) – Input tensor. The data type is float32 or float64. The shape is (N, C), where C is the number of classes; if the input has more than 2 dimensions, the shape is (N, C, D1, D2, …, Dk), k >= 1.

  • label (Tensor) – Label tensor. The data type is float32 or float64, and the shape must be the same as the shape of input.

  • reduction (str, optional) – Specifies the reduction to apply to the output; the candidates are 'none' | 'mean' | 'sum'. If reduction is 'mean', the mean of the loss is returned; if reduction is 'sum', the sum of the loss is returned; if reduction is 'none', the unreduced loss is returned. Default is 'mean'.

  • delta (float, optional) – Specifies the hyperparameter \(\delta\), the threshold at which the loss switches from L2 to L1. Errors whose absolute value is smaller than delta use the squared (L2) term; larger errors use the L1 term. The parameter is ignored for negative or zero values. Default is 1.0.

  • name (str, optional) – For details, please refer to Name. Generally, no setting is required. Default: None.
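
The three reduction modes described above can be illustrated with a small pure-Python sketch operating on a list of already-computed element-wise loss terms. The helper name `reduce_loss` is hypothetical, and raising ValueError for an unknown mode is a choice made for this sketch rather than documented behavior.

```python
def reduce_loss(terms, reduction="mean"):
    """Apply a 'none' | 'mean' | 'sum' reduction to element-wise loss terms."""
    if reduction == "none":
        # Unreduced: one loss value per element.
        return list(terms)
    if reduction == "sum":
        return sum(terms)
    if reduction == "mean":
        return sum(terms) / len(terms)
    # Assumption of this sketch: reject anything outside the three candidates.
    raise ValueError("reduction must be 'none', 'mean' or 'sum'")


terms = [0.125, 2.5]  # example element-wise losses
print(reduce_loss(terms, "none"))  # [0.125, 2.5]
print(reduce_loss(terms, "sum"))   # 2.625
print(reduce_loss(terms, "mean"))  # 1.3125
```

With 'none' the result keeps one value per element, while 'mean' and 'sum' collapse it to a single number.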


Returns a Tensor storing the smooth_l1_loss of input and label.


>>> import paddle
>>> paddle.seed(2023)

>>> input = paddle.rand([3, 3]).astype('float32')
>>> label = paddle.rand([3, 3]).astype('float32')
>>> output = paddle.nn.functional.smooth_l1_loss(input, label)
>>> print(output)
Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=True,