paddle.nn.functional.sigmoid_focal_loss(logit, label, normalizer=None, alpha=0.25, gamma=2.0, reduction='sum', name=None)

Focal loss was proposed to address the foreground-background class imbalance in classification tasks. It down-weights easily classified examples and thus focuses training on hard examples. It is used, for example, in one-stage object detection, where the foreground-background class imbalance is extremely high.

This operator computes the focal loss as follows:

\[Out = -Labels * alpha * {(1 - \sigma(Logit))}^{gamma}\log(\sigma(Logit)) - (1 - Labels) * (1 - alpha) * {\sigma(Logit)}^{gamma}\log(1 - \sigma(Logit))\]

Here \(\sigma\) denotes the sigmoid function, \(\sigma(Logit) = \frac{1}{1 + \exp(-Logit)}\).
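The formula above can be checked term by term with a small scalar sketch. This uses only the Python standard library and is not Paddle's implementation; the helper names `sigmoid` and `focal_loss_element` are hypothetical, and the code assumes moderate logit values (it makes no attempt at numerical stability for very large magnitudes).

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def focal_loss_element(logit: float, label: float,
                       alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Per-element focal loss, mirroring the two terms of the formula."""
    p = sigmoid(logit)
    # Positive-class term: weighted by alpha, modulated by (1 - p)^gamma.
    pos_term = -label * alpha * (1.0 - p) ** gamma * math.log(p)
    # Negative-class term: weighted by (1 - alpha), modulated by p^gamma.
    neg_term = -(1.0 - label) * (1.0 - alpha) * p ** gamma * math.log(1.0 - p)
    return pos_term + neg_term

# A confidently correct positive (large logit) contributes far less
# than a badly misclassified positive (large negative logit).
easy = focal_loss_element(4.0, 1.0)
hard = focal_loss_element(-4.0, 1.0)
print(easy < hard)
```

The comparison at the end illustrates the down-weighting of easy examples: both inputs are positives, but the modulating factor \((1 - \sigma(Logit))^{gamma}\) is close to 0 for the first and close to 1 for the second.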

Then, if normalizer is not None, this operator divides the loss Out by the normalizer tensor:

\[Out = \frac{Out}{normalizer}\]

Finally, this operator applies a reduction to the loss. If reduction is set to 'none', the operator returns the unreduced loss Out. If reduction is set to 'mean', the result is the mean of the loss, \(Out = MEAN(Out)\). If reduction is set to 'sum', the result is the sum of the loss, \(Out = SUM(Out)\).
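The three steps (element-wise loss, optional normalization, reduction) can be sketched in plain Python as follows. This is an illustrative reference over flat Python lists, not Paddle's implementation; the function name `sigmoid_focal_loss_ref` is hypothetical, and the normalizer is assumed to be a plain float rather than a Tensor.

```python
import math

def sigmoid_focal_loss_ref(logits, labels, normalizer=None,
                           alpha=0.25, gamma=2.0, reduction="sum"):
    """Reference sketch: element-wise focal loss, then normalize, then reduce."""
    losses = []
    for x, y in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-x))  # sigmoid(logit)
        loss = (-y * alpha * (1.0 - p) ** gamma * math.log(p)
                - (1.0 - y) * (1.0 - alpha) * p ** gamma * math.log(1.0 - p))
        if normalizer is not None:
            loss /= normalizer  # Out = Out / normalizer
        losses.append(loss)

    if reduction == "none":
        return losses                      # unreduced, one value per element
    if reduction == "sum":
        return sum(losses)                 # Out = SUM(Out)
    if reduction == "mean":
        return sum(losses) / len(losses)   # Out = MEAN(Out)
    raise ValueError(f"unsupported reduction: {reduction!r}")

# Flattened inputs; the normalizer is the number of positive labels.
logits = [0.97, 0.91, 0.03, 0.55, 0.43, 0.71]
labels = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
fg_num = sum(1.0 for y in labels if y >= 1.0)

total = sigmoid_focal_loss_ref(logits, labels, normalizer=fg_num)
per_element = sigmoid_focal_loss_ref(logits, labels, normalizer=fg_num,
                                     reduction="none")
print(total, len(per_element))
```

Note that normalization happens before the reduction, so with reduction 'mean' the result is divided both by the normalizer and by the number of elements.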

Note that the target label is 0 for the negative class and is 1 for the positive class.

  • logit (Tensor) – The input logit tensor. The shape is [N, *], where N is the batch size and * means any number of additional dimensions. The logit is usually the output of a convolution layer. Supported dtypes are float32 and float64.

  • label (Tensor) – The target label tensor, with the same shape as logit. Its values should be between 0 and 1. Supported dtypes are float32 and float64.

  • normalizer (Tensor, optional) – The value by which the focal loss is divided. It must be a 1-D Tensor with shape [1] or a 0-D Tensor with shape []. Supported dtypes are float32 and float64. For object detection tasks, it is the number of positive samples. If set to None, the focal loss is not normalized. Default is None.

  • alpha (int|float, optional) – Hyper-parameter that balances positive and negative examples: positive examples are weighted by alpha and negative examples by 1 - alpha. It should be between 0 and 1. Default is 0.25.

  • gamma (int|float, optional) – Hyper-parameter that controls how strongly easy examples are down-weighted relative to hard examples. Default is 2.0.

  • reduction (str, optional) – Specifies the reduction applied to the loss. The candidates are 'none' | 'mean' | 'sum'. If reduction is 'none', the unreduced loss is returned; if reduction is 'mean', the mean of the loss is returned; if reduction is 'sum', the summed loss is returned. Default is 'sum'.

  • name (str, optional) – Name for the operation (optional, default is None). For more information, please refer to Name.
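The roles of alpha and gamma described above can be illustrated with a scalar sketch derived directly from the loss formula. It uses only the Python standard library and hypothetical helper names (`focal_loss_element`, `binary_cross_entropy`); it is not Paddle's implementation, and the logit values are arbitrary choices for illustration.

```python
import math

def focal_loss_element(logit, label, alpha=0.25, gamma=2.0):
    """Per-element focal loss, following the formula in this section."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return (-label * alpha * (1.0 - p) ** gamma * math.log(p)
            - (1.0 - label) * (1.0 - alpha) * p ** gamma * math.log(1.0 - p))

def binary_cross_entropy(logit, label):
    """Plain sigmoid cross-entropy for comparison."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -label * math.log(p) - (1.0 - label) * math.log(1.0 - p)

# gamma = 0 removes the modulating factor; with alpha = 0.5 the focal loss
# reduces to half of the ordinary binary cross-entropy.
half_bce = 0.5 * binary_cross_entropy(1.2, 1.0)
no_focus = focal_loss_element(1.2, 1.0, alpha=0.5, gamma=0.0)

# Increasing gamma shrinks the loss of an easy positive (logit 3.0, label 1).
easy_by_gamma = [focal_loss_element(3.0, 1.0, gamma=g)
                 for g in (0.0, 1.0, 2.0, 5.0)]

# alpha weights positives and (1 - alpha) weights negatives: at logit 0 the
# two terms differ only by that weight, so the ratio is (1 - alpha) / alpha.
pos_weighted = focal_loss_element(0.0, 1.0, alpha=0.25)
neg_weighted = focal_loss_element(0.0, 0.0, alpha=0.25)
print(neg_weighted / pos_weighted)
```

With the default alpha of 0.25, a negative example at the decision boundary is weighted three times as heavily as a positive one, which is worth keeping in mind when tuning alpha for a given positive-to-negative ratio.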


Tensor. If reduction is 'mean' or 'sum', the output shape is \([]\); otherwise the shape is the same as logit. The dtype is the same as that of logit.


>>> import paddle

>>> logit = paddle.to_tensor([[0.97, 0.91, 0.03], [0.55, 0.43, 0.71]], dtype='float32')
>>> label = paddle.to_tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], dtype='float32')
>>> one = paddle.to_tensor([1.], dtype='float32')
>>> fg_label = paddle.greater_equal(label, one)
>>> fg_num = paddle.sum(paddle.cast(fg_label, dtype='float32'))
>>> output = paddle.nn.functional.sigmoid_focal_loss(logit, label, normalizer=fg_num)
>>> print(output)
Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=True,