BCEWithLogitsLoss
- class paddle.nn.BCEWithLogitsLoss(weight=None, reduction='mean', pos_weight=None, name=None) [source]
Combines the sigmoid layer and the BCELoss layer.
This measures the element-wise probability error in classification tasks in which each class is independent. It can be thought of as predicting labels for a data point, where the labels are not mutually exclusive. For example, a news article can be about politics, technology, or sports at the same time, or about none of these.
First, the loss is calculated as follows:
\[Out = -Labels * \log(\sigma(Logit)) - (1 - Labels) * \log(1 - \sigma(Logit))\]

We know that \(\sigma(Logit) = \frac{1}{1 + e^{-Logit}}\). By substituting this we get:

\[Out = Logit - Logit * Labels + \log(1 + e^{-Logit})\]

For stability, and to prevent overflow of \(e^{-Logit}\) when Logit < 0, the loss is reformulated as follows:

\[Out = \max(Logit, 0) - Logit * Labels + \log(1 + e^{-|Logit|})\]

Then, if weight or pos_weight is not None, the weight tensor is multiplied onto the loss Out. The weight tensor attaches a different weight to every item in the batch, and pos_weight attaches a different weight to the positive label of each class.

Finally, the reduce operation is applied to the loss. If reduction is set to 'none', the original loss Out is returned. If reduction is set to 'mean', the reduced mean loss is \(Out = MEAN(Out)\). If reduction is set to 'sum', the reduced sum loss is \(Out = SUM(Out)\).

Note that the values of the target labels (label) should be numbers between 0 and 1.
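As a quick sanity check of the stable form above, the default ('mean'-reduced) loss can be reproduced by hand. A minimal sketch; the tensor values are illustrative only:

>>> import paddle
>>> logit = paddle.to_tensor([5.0, -1.0, 3.0], dtype="float32")
>>> label = paddle.to_tensor([1.0, 0.0, 1.0], dtype="float32")
>>> # Stable form: max(Logit, 0) - Logit * Labels + log(1 + e^{-|Logit|})
>>> manual = (paddle.maximum(logit, paddle.zeros_like(logit))
...           - logit * label
...           + paddle.log1p(paddle.exp(-paddle.abs(logit)))).mean()
>>> module = paddle.nn.BCEWithLogitsLoss()(logit, label)  # reduction='mean'
>>> bool(paddle.allclose(manual, module))
True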
- Parameters
  - weight (Tensor, optional) – A manual rescaling weight given to the loss of each batch element. If given, it has to be a 1-D Tensor whose size is [N, ]. The data type is float32 or float64. Default is None.
  - reduction (str, optional) – Indicates how to reduce the loss over the batch; the candidates are 'none' | 'mean' | 'sum'. If reduction is 'none', the unreduced loss is returned; if reduction is 'mean', the reduced mean loss is returned; if reduction is 'sum', the summed loss is returned. Default is 'mean'.
  - pos_weight (Tensor, optional) – A weight of positive examples. Must be a vector with length equal to the number of classes. The data type is float32 or float64. Default is None.
  - name (str, optional) – Name for the operation (optional, default is None). For more information, please refer to Name.
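To make the interplay of these parameters concrete, here is a minimal sketch; the weight values are illustrative assumptions, not part of the original docs:

>>> import paddle
>>> logit = paddle.to_tensor([5.0, 1.0, 3.0], dtype="float32")
>>> label = paddle.to_tensor([1.0, 0.0, 1.0], dtype="float32")
>>> # weight rescales the loss of each element; pos_weight rescales the
>>> # positive-label term of each class before the reduction is applied.
>>> weight = paddle.to_tensor([1.0, 0.5, 1.0], dtype="float32")
>>> pos_weight = paddle.to_tensor([2.0, 1.0, 1.0], dtype="float32")
>>> loss = paddle.nn.BCEWithLogitsLoss(weight=weight,
...                                    pos_weight=pos_weight,
...                                    reduction='sum')
>>> output = loss(logit, label)
>>> output.shape   # 'sum' reduces to a scalar
[]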
- Shapes
  - logit (Tensor): The input predictions tensor. 2-D tensor with shape [N, *], where N is the batch size and * means any number of additional dimensions. The logit is usually the output of a Linear layer. Available dtypes are float32 and float64.
  - label (Tensor): The target labels tensor. 2-D tensor with the same shape as logit. The target label values should be numbers between 0 and 1. Available dtypes are float32 and float64.
  - output (Tensor): If reduction is 'none', the shape of the output is the same as that of logit; otherwise, the output is a scalar.
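A brief sketch of this shape behavior, using arbitrary random data:

>>> import paddle
>>> logit = paddle.rand([4, 3])                             # [N, *] = [4, 3]
>>> label = paddle.randint(0, 2, [4, 3]).astype("float32")  # 0/1 targets
>>> paddle.nn.BCEWithLogitsLoss(reduction='none')(logit, label).shape
[4, 3]
>>> paddle.nn.BCEWithLogitsLoss(reduction='mean')(logit, label).shape
[]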
- Returns
  - A callable object of BCEWithLogitsLoss.
Examples
>>> import paddle
>>> logit = paddle.to_tensor([5.0, 1.0, 3.0], dtype="float32")
>>> label = paddle.to_tensor([1.0, 0.0, 1.0], dtype="float32")
>>> bce_logit_loss = paddle.nn.BCEWithLogitsLoss()
>>> output = bce_logit_loss(logit, label)
>>> print(output)
Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=True,
       0.45618808)
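The same computation is also available in functional form; a minimal sketch assuming the companion API paddle.nn.functional.binary_cross_entropy_with_logits:

>>> import paddle
>>> import paddle.nn.functional as F
>>> logit = paddle.to_tensor([5.0, 1.0, 3.0], dtype="float32")
>>> label = paddle.to_tensor([1.0, 0.0, 1.0], dtype="float32")
>>> output = F.binary_cross_entropy_with_logits(logit, label)
>>> bool(paddle.allclose(output, paddle.nn.BCEWithLogitsLoss()(logit, label)))
True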
- forward(logit, label)
  Defines the computation performed at every call. Should be overridden by all subclasses.
  - Parameters
    - *inputs (tuple) – unpacked tuple arguments
    - **kwargs (dict) – unpacked dict arguments
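In practice forward is not called directly: invoking the layer object, as in the example above, dispatches to it. A minimal sketch of that equivalence:

>>> import paddle
>>> loss = paddle.nn.BCEWithLogitsLoss()
>>> logit = paddle.to_tensor([5.0, 1.0, 3.0], dtype="float32")
>>> label = paddle.to_tensor([1.0, 0.0, 1.0], dtype="float32")
>>> bool(paddle.allclose(loss(logit, label), loss.forward(logit, label)))
True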