binary_cross_entropy_with_logits

paddle.nn.functional.binary_cross_entropy_with_logits(logit, label, weight=None, reduction='mean', pos_weight=None, name=None)

This operator combines the sigmoid layer and the BCELoss layer (api_nn_loss_BCELoss). It can also be viewed as the sigmoid_cross_entropy_with_logits layer followed by a reduce operation.

This measures the element-wise probability error in classification tasks in which each class is independent. It can be thought of as predicting labels for a data point, where the labels are not mutually exclusive. For example, a news article can be about politics, technology, and sports at the same time, or about none of them.
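To make the independence concrete, here is a small plain-Python sketch (not Paddle code) that scores one hypothetical article against three classes. The logits and the 0.5 decision threshold are illustrative assumptions. The point is that each class gets its own sigmoid probability, so the probabilities do not have to sum to 1 the way softmax outputs do.

```python
import math


def sigmoid(x):
    # Logistic function: maps a logit to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical logits for one news article over three independent classes.
classes = ["politics", "technology", "sports"]
logits = [2.0, 1.5, -1.0]

# Each class is scored on its own; there is no normalization across classes.
probs = [sigmoid(x) for x in logits]

# Illustrative decision rule: predict every class whose probability exceeds 0.5.
predicted = [name for name, p in zip(classes, probs) if p > 0.5]
print(predicted)  # ['politics', 'technology']
print(sum(probs))  # greater than 1, which softmax outputs could never be
```

The matching target for such an article would be a label vector like [1.0, 1.0, 0.0], one independent 0/1 entry per class.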

First, this operator calculates the loss as follows:

\[Out = -Labels * \log(\sigma(Logit)) - (1 - Labels) * \log(1 - \sigma(Logit))\]

We know that \(\sigma(Logit) = \frac{1}{1 + e^{-Logit}}\). By substituting this, we get:

\[Out = Logit - Logit * Labels + \log(1 + e^{-Logit})\]
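The substitution above relies on two identities that follow directly from the definition of \(\sigma\):

\[\log(\sigma(Logit)) = -\log(1 + e^{-Logit}), \qquad \log(1 - \sigma(Logit)) = \log\left(\frac{e^{-Logit}}{1 + e^{-Logit}}\right) = -Logit - \log(1 + e^{-Logit})\]

Plugging both into the first equation and collecting the \(\log(1 + e^{-Logit})\) terms gives:

\[Out = Labels * \log(1 + e^{-Logit}) + (1 - Labels) * (Logit + \log(1 + e^{-Logit})) = Logit - Logit * Labels + \log(1 + e^{-Logit})\]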

For stability and to prevent overflow of \(e^{-Logit}\) when Logit < 0, we reformulate the loss as follows:

\[Out = \max(Logit, 0) - Logit * Labels + \log(1 + e^{-|Logit|})\]
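The following plain-Python sketch, which uses only the math module rather than Paddle, compares the sigmoid-then-log form from the first equation with the reformulated one. The function names bce_naive and bce_stable are hypothetical. The two agree for moderate logits, but the first form overflows in \(e^{-Logit}\) for a very negative logit and hits \(\log(0)\) once \(\sigma(Logit)\) rounds to exactly 1.0 for a large positive logit. The reformulation avoids both because, for Logit < 0, \(Logit + \log(1 + e^{-Logit}) = \log(1 + e^{Logit})\), so only \(e^{-|Logit|}\) is ever evaluated.

```python
import math


def bce_naive(logit, label):
    """Sigmoid followed by binary cross-entropy, exactly as in the first equation."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -label * math.log(p) - (1.0 - label) * math.log(1.0 - p)


def bce_stable(logit, label):
    """Reformulated loss: max(x, 0) - x * y + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


# For moderate logits (including a soft label of 0.5) the two forms agree.
for x, y in [(5.0, 1.0), (1.0, 0.0), (-2.0, 1.0), (0.0, 0.5)]:
    assert math.isclose(bce_naive(x, y), bce_stable(x, y), rel_tol=1e-9)

# For extreme logits the naive form fails while the stable form stays finite.
for x, y in [(-1000.0, 1.0), (50.0, 0.0)]:
    try:
        bce_naive(x, y)
    except (OverflowError, ValueError) as err:
        print(f"naive  x={x}: {type(err).__name__}")
    print(f"stable x={x}: {bce_stable(x, y)}")
```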

Then, if weight or pos_weight is not None, this operator multiplies the loss Out by the corresponding weight tensor. The weight tensor assigns a separate weight to each item in the batch, and pos_weight assigns a separate weight to the positive label of each class.
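A minimal plain-Python sketch of the weighting step, applied to the unreduced loss of an [N, C] batch stored as nested lists. Following the Parameters section, weight holds one factor per batch element and pos_weight holds one factor per class. How pos_weight enters the loss is an assumption here: the sketch multiplies each element by \(1 + (pos\_weight_c - 1) * Labels\), which equals \(pos\_weight_c\) for a positive label and 1 for a negative one. The function name weighted_bce is hypothetical.

```python
import math


def bce_stable(x, y):
    # Numerically stable per-element loss from the reformulated equation.
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))


def weighted_bce(logit, label, weight=None, pos_weight=None):
    """Unreduced, optionally weighted loss for N x C nested lists.

    weight:     length-N list, one factor per batch element (row).
    pos_weight: length-C list, one factor per class (column).
    Assumed positive-label factor: 1 + (pos_weight[c] - 1) * y.
    """
    out = []
    for i, (row_x, row_y) in enumerate(zip(logit, label)):
        row = []
        for c, (x, y) in enumerate(zip(row_x, row_y)):
            loss = bce_stable(x, y)
            if pos_weight is not None:
                loss *= 1.0 + (pos_weight[c] - 1.0) * y  # only positives are rescaled
            if weight is not None:
                loss *= weight[i]  # the whole row shares one weight
            row.append(loss)
        out.append(row)
    return out


logit = [[5.0, 1.0, 3.0], [-1.0, 2.0, 0.5]]
label = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
plain = weighted_bce(logit, label)
weighted = weighted_bce(logit, label, weight=[1.0, 2.0], pos_weight=[3.0, 1.0, 1.0])
print(weighted)
```

With these values, the positive entry in class 0 of the first row is tripled, the negative entries of that row are unchanged, and every entry of the second row is doubled.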

Finally, this operator applies a reduction to the loss. If reduction is set to 'none', the operator returns the unreduced loss Out. If reduction is set to 'mean', the reduced loss is \(Out = MEAN(Out)\). If reduction is set to 'sum', the reduced loss is \(Out = SUM(Out)\).
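The three reduction modes can be sketched in plain Python as follows. reduce_loss is a hypothetical helper, not a Paddle function. The sketch reuses the inputs from the Examples section, so its 'mean' result should agree with the printed 0.45618808 up to float32 rounding, since this version computes in float64.

```python
import math


def reduce_loss(out, reduction="mean"):
    """Apply the 'none' | 'mean' | 'sum' reduction to a flat list of losses."""
    if reduction == "none":
        return out  # unreduced, same shape as the input
    if reduction == "mean":
        return sum(out) / len(out)
    if reduction == "sum":
        return sum(out)
    raise ValueError(f"unsupported reduction: {reduction!r}")


# Per-element losses (stable formula) for the inputs used in the Examples section.
logit = [5.0, 1.0, 3.0]
label = [1.0, 0.0, 1.0]
out = [max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x))) for x, y in zip(logit, label)]

for mode in ("none", "mean", "sum"):
    print(mode, reduce_loss(out, mode))  # mean is about 0.456188
```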

Note that the values of the target label should be between 0 and 1.

Parameters
  • logit (Tensor) – The input predictions tensor, with shape [N, *], where N is the batch size and * means any number of additional dimensions. The logit is usually the output of a Linear layer. Available dtype is float32, float64.

  • label (Tensor) – The target labels tensor, with the same shape as logit. Its values should be between 0 and 1. Available dtype is float32, float64.

  • weight (Tensor, optional) – A manual rescaling weight applied to the loss of each batch element. If given, it must be a 1-D Tensor whose shape is [N, ]. Available dtype is float32, float64. Default is None.

  • reduction (str, optional) – Indicates how to reduce the loss over the batch. The candidates are 'none' | 'mean' | 'sum'. If reduction is 'none', the unreduced loss is returned; if 'mean', the mean of the loss is returned; if 'sum', the sum of the loss is returned. Default is 'mean'.

  • pos_weight (Tensor, optional) – The weight of positive examples. Must be a vector whose length equals the number of classes. Available dtype is float32, float64. Default is None.

  • name (str, optional) – Name for the operation (optional, default is None). For more information, please refer to Name.

Returns

If reduction is 'none', the output has the same shape as logit; otherwise, the output is a scalar.

Return type

output (Tensor)

Examples

import paddle

logit = paddle.to_tensor([5.0, 1.0, 3.0])
label = paddle.to_tensor([1.0, 0.0, 1.0])
output = paddle.nn.functional.binary_cross_entropy_with_logits(logit, label)
print(output)  # [0.45618808]