BCEWithLogitsLoss

class paddle.nn.BCEWithLogitsLoss(weight=None, reduction='mean', pos_weight=None, name=None)

Combines a sigmoid layer and the BCELoss layer in a single class.

This loss measures the element-wise probability error in classification tasks in which each class is independent. It can be thought of as predicting labels for a data point where the labels are not mutually exclusive. For example, a news article can be about politics, technology, and sports at the same time, or about none of these.

First, the element-wise loss is computed as follows:

\[Out = -Labels * \log(\sigma(Logit)) - (1 - Labels) * \log(1 - \sigma(Logit))\]

We know that \(\sigma(Logit) = \frac{1}{1 + e^{-Logit}}\). By substituting this we get:

\[Out = Logit - Logit * Labels + \log(1 + e^{-Logit})\]
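The substitution can be spelled out step by step, using only the definitions above. Since \(1 - \sigma(Logit) = \frac{e^{-Logit}}{1 + e^{-Logit}}\), the two log terms become:

\[-\log(\sigma(Logit)) = \log(1 + e^{-Logit}), \qquad -\log(1 - \sigma(Logit)) = Logit + \log(1 + e^{-Logit})\]

Weighting these by Labels and 1 - Labels respectively and collecting terms gives the same expression:

\[Out = Labels * \log(1 + e^{-Logit}) + (1 - Labels) * (Logit + \log(1 + e^{-Logit})) = Logit - Logit * Labels + \log(1 + e^{-Logit})\]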

For stability and to prevent overflow of \(e^{-Logit}\) when Logit < 0, we reformulate the loss as follows:

\[Out = \max(Logit, 0) - Logit * Labels + \log(1 + e^{-|Logit|})\]
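As an illustration only, the stable form can be checked numerically with a minimal pure-Python sketch (standard library only; this is not Paddle's implementation). The helper names `bce_with_logits` and `bce_naive` are hypothetical and exist only for this example.

```python
import math


def bce_with_logits(logit, label):
    """Stable per-element loss: max(x, 0) - x * y + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def bce_naive(logit, label):
    """Direct form of the first equation; only safe for moderate logits."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -label * math.log(p) - (1.0 - label) * math.log(1.0 - p)


# Same values as in the Examples section of this page.
logits = [5.0, 1.0, 3.0]
labels = [1.0, 0.0, 1.0]

losses = [bce_with_logits(x, y) for x, y in zip(logits, labels)]
mean_loss = sum(losses) / len(losses)
print(mean_loss)  # approximately 0.456188

# For a strongly negative logit the naive form overflows in exp(-x),
# while the stable form only ever evaluates exp of a non-positive number.
print(bce_with_logits(-1000.0, 1.0))  # 1000.0
```

For moderate logits both helpers agree to within floating-point error; they differ only in how they behave at the extremes.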

Then, if weight or pos_weight is not None, the loss Out is multiplied by the corresponding weight tensor. weight assigns a different weight to each item in the batch. pos_weight assigns a different weight to the positive label of each class.
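The sketch below shows one way the two weights can enter the per-element loss. It assumes that pos_weight scales only the positive-label term, i.e. \(-pos\_weight * Labels * \log(\sigma(Logit)) - (1 - Labels) * \log(1 - \sigma(Logit))\), and that weight then scales the whole element; this is a common formulation, not a statement about Paddle's internals. The function name `weighted_bce_with_logits` is hypothetical.

```python
import math


def weighted_bce_with_logits(logit, label, weight=1.0, pos_weight=1.0):
    # Assumption: pos_weight multiplies only the positive-label term,
    # and weight multiplies the resulting per-element loss.
    log_weight = (pos_weight - 1.0) * label + 1.0
    # Stable evaluation of log(1 + exp(-logit)).
    softplus_neg = max(-logit, 0.0) + math.log1p(math.exp(-abs(logit)))
    return weight * ((1.0 - label) * logit + log_weight * softplus_neg)


base = weighted_bce_with_logits(5.0, 1.0)
doubled = weighted_bce_with_logits(5.0, 1.0, pos_weight=2.0)
print(base, doubled)  # the positive example's loss is scaled by pos_weight

# A negative example (label 0) is unaffected by pos_weight.
print(weighted_bce_with_logits(1.0, 0.0, pos_weight=2.0))
```

With both weights left at 1.0 the function reduces to the unweighted stable loss.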

Finally, a reduction is applied to the loss. If reduction is 'none', the unreduced loss Out is returned. If reduction is 'mean', the result is \(Out = MEAN(Out)\). If reduction is 'sum', the result is \(Out = SUM(Out)\).

Note that the values in label should be numbers between 0 and 1.
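The three reduction modes can be sketched in plain Python as follows. `reduce_loss` is a hypothetical helper used only for illustration, operating on a flat list of per-element losses rather than on a Tensor.

```python
def reduce_loss(losses, reduction="mean"):
    """Apply 'none', 'mean', or 'sum' to a flat list of per-element losses."""
    if reduction == "none":
        return list(losses)  # unreduced, same length as the input
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)
    raise ValueError("reduction must be 'none', 'mean', or 'sum'")


per_element = [1.0, 2.0, 3.0]
print(reduce_loss(per_element, "none"))  # [1.0, 2.0, 3.0]
print(reduce_loss(per_element, "mean"))  # 2.0
print(reduce_loss(per_element, "sum"))   # 6.0
```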

Parameters
  • weight (Tensor, optional) – A manual rescaling weight applied to the loss of each batch element. If given, it must be a 1-D Tensor of shape [N]. The data type is float32 or float64. Default is None.

  • reduction (str, optional) – Specifies the reduction applied to the output. The candidates are 'none' | 'mean' | 'sum'. If reduction is 'none', the unreduced loss is returned; if reduction is 'mean', the mean of the loss is returned; if reduction is 'sum', the summed loss is returned. Default is 'mean'.

  • pos_weight (Tensor, optional) – A weight for positive examples. Must be a vector with length equal to the number of classes. The data type is float32 or float64. Default is None.

  • name (str, optional) – Name for the operation (optional, default is None). For more information, please refer to Name.

Shapes:
  • logit (Tensor): The input predictions tensor, with shape [N, *], where N is batch_size and * means any number of additional dimensions. The logit is usually the output of a Linear layer. Available dtypes are float32 and float64.

  • label (Tensor): The target labels tensor, with the same shape as logit. Its values should be numbers between 0 and 1. Available dtypes are float32 and float64.

  • output (Tensor): If reduction is 'none', the output has the same shape as logit; otherwise the output is a scalar.

Returns

A callable object of BCEWithLogitsLoss.

Examples

>>> import paddle

>>> logit = paddle.to_tensor([5.0, 1.0, 3.0], dtype="float32")
>>> label = paddle.to_tensor([1.0, 0.0, 1.0], dtype="float32")
>>> bce_logit_loss = paddle.nn.BCEWithLogitsLoss()
>>> output = bce_logit_loss(logit, label)
>>> print(output)
Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=True,
0.45618808)

forward(logit, label)

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) – unpacked tuple arguments

  • **kwargs (dict) – unpacked dict arguments