BCEWithLogitsLoss
- class paddle.nn.BCEWithLogitsLoss(weight=None, reduction='mean', pos_weight=None, name=None)
This operator combines the sigmoid layer and the BCELoss layer. It can also be seen as the combination of the sigmoid_cross_entropy_with_logits layer and a reduce operation.

This measures the element-wise probability error in classification tasks in which each class is independent. This can be thought of as predicting labels for a data point, where the labels are not mutually exclusive. For example, a news article can be about politics, technology, or sports at the same time, or none of these.
First, this operator calculates the loss as follows:

\[Out = -Labels * \log(\sigma(Logit)) - (1 - Labels) * \log(1 - \sigma(Logit))\]

We know that \(\sigma(Logit) = \frac{1}{1 + e^{-Logit}}\). By substituting this we get:

\[Out = Logit - Logit * Labels + \log(1 + e^{-Logit})\]

For stability and to prevent overflow of \(e^{-Logit}\) when Logit < 0, we reformulate the loss as follows:

\[Out = \max(Logit, 0) - Logit * Labels + \log(1 + e^{-|Logit|})\]

Then, if weight or pos_weight is not None, this operator multiplies the loss Out by the weight tensor. The weight tensor assigns a different weight to every item in the batch. The pos_weight tensor assigns a different weight to the positive label of each class.

Finally, this operator applies a reduce operation to the loss. If reduction is set to 'none', the operator returns the original loss Out. If reduction is set to 'mean', the reduced mean loss is \(Out = MEAN(Out)\). If reduction is set to 'sum', the reduced sum loss is \(Out = SUM(Out)\).

Note that the target labels label should be numbers between 0 and 1.

- Parameters
weight (Tensor, optional) – A manual rescaling weight given to the loss of each batch element. If given, it has to be a 1-D Tensor of size [N]. The data type is float32 or float64. Default is None.
reduction (str, optional) – Indicates how to reduce the loss over the batch. The candidates are 'none' | 'mean' | 'sum'. If reduction is 'none', the unreduced loss is returned; if reduction is 'mean', the reduced mean loss is returned; if reduction is 'sum', the summed loss is returned. Default is 'mean'.
pos_weight (Tensor, optional) – A weight for positive examples. Must be a vector with length equal to the number of classes. The data type is float32 or float64. Default is None.
name (str, optional) – Name for the operation. Default is None. For more information, please refer to Name.
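To make the roles of weight and pos_weight concrete, here is a minimal plain-Python sketch (standard library only, not Paddle's implementation). It assumes, following the descriptions above, that pos_weight holds one value per class and scales only the positive-label term, and that weight holds one value per batch element. The function name bce_with_logits and the sample values are hypothetical, and how Paddle broadcasts these tensors internally is not shown.

```python
import math


def bce_with_logits(logit, label, pos_weight=1.0):
    """Per-element loss; pos_weight scales only the positive-label term (assumption)."""
    # Stable softplus(-x) = log(1 + exp(-x)) = max(-x, 0) + log(1 + exp(-|x|))
    softplus_neg = max(-logit, 0.0) + math.log1p(math.exp(-abs(logit)))
    # -log(sigmoid(x)) = softplus(-x);  -log(1 - sigmoid(x)) = x + softplus(-x)
    return pos_weight * label * softplus_neg + (1.0 - label) * (logit + softplus_neg)


# Hypothetical batch: N = 2 samples, 3 independent classes.
logits = [[2.0, -1.0, 0.5], [-0.5, 3.0, -2.0]]
labels = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
pos_weight = [1.0, 2.0, 3.0]  # one entry per class
weight = [1.0, 0.5]           # one entry per batch element

losses = [
    [weight[i] * bce_with_logits(x, y, pos_weight[j])
     for j, (x, y) in enumerate(zip(logits[i], labels[i]))]
    for i in range(len(logits))
]
mean_loss = sum(map(sum, losses)) / (len(logits) * len(logits[0]))
print(mean_loss)
```

With pos_weight left at 1.0, the function reduces to the stable formula given in the description; a pos_weight above 1 penalizes missed positives more heavily for that class.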
- Shapes:
  - logit (Tensor): The input predictions tensor. 2-D tensor with shape [N, *], where N is batch_size and * means any number of additional dimensions. The logit is usually the output of a Linear layer. Available dtypes are float32 and float64.
  - label (Tensor): The target labels tensor. 2-D tensor with the same shape as logit. The values of the target labels should be numbers between 0 and 1. Available dtypes are float32 and float64.
  - output (Tensor): If reduction is 'none', the shape of output is the same as logit; otherwise the output is a scalar.
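The relationship between reduction and the output shape can be sketched in plain Python, with nested lists standing in for tensors. This is an illustration under that assumption, not Paddle code; the helper names elementwise_loss and reduce_loss are hypothetical.

```python
import math


def elementwise_loss(logit, label):
    # Stable form from the description: max(x, 0) - x * y + log(1 + exp(-|x|))
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def reduce_loss(losses, reduction="mean"):
    """losses is a nested list of shape [N, C]; mirrors 'none' | 'mean' | 'sum'."""
    if reduction == "none":
        return losses                 # same shape as logit
    flat = [v for row in losses for v in row]
    if reduction == "sum":
        return sum(flat)              # scalar
    if reduction == "mean":
        return sum(flat) / len(flat)  # scalar
    raise ValueError(f"unknown reduction: {reduction!r}")


logits = [[5.0, 1.0, 3.0], [-2.0, 0.0, 4.0]]
labels = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
out = [[elementwise_loss(x, y) for x, y in zip(lr, yr)]
       for lr, yr in zip(logits, labels)]

unreduced = reduce_loss(out, "none")  # 2 x 3 nested list
total = reduce_loss(out, "sum")       # single float
mean = reduce_loss(out, "mean")       # single float
```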
- Returns
A callable object of BCEWithLogitsLoss.
Examples
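As a minimal sketch of the computation described above, the following plain-Python code (standard library only, not the Paddle implementation) evaluates both the direct and the reformulated loss on a small hypothetical batch and applies the default 'mean' reduction. The function names naive_loss and stable_loss are illustrative. Calling the layer on tensors holding the same values, as in BCEWithLogitsLoss()(logit, label), would be expected to give the same mean up to floating-point precision.

```python
import math


def naive_loss(logit, label):
    # Direct form: x - x * y + log(1 + exp(-x)); exp(-x) overflows for very negative x
    return logit - logit * label + math.log(1.0 + math.exp(-logit))


def stable_loss(logit, label):
    # Reformulated: max(x, 0) - x * y + log(1 + exp(-|x|)); the exponent is never positive
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


logit = [5.0, 1.0, 3.0]
label = [1.0, 0.0, 1.0]

per_element = [stable_loss(x, y) for x, y in zip(logit, label)]
mean_loss = sum(per_element) / len(per_element)
print(round(mean_loss, 6))  # 0.456188

# Largest disagreement between the two forms on these moderate inputs.
max_gap = max(abs(naive_loss(x, y) - stable_loss(x, y)) for x, y in zip(logit, label))

# Only the stable form survives a very negative logit.
try:
    naive_loss(-1000.0, 1.0)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True
print(naive_overflowed, stable_loss(-1000.0, 1.0))  # True 1000.0
```

The two forms are algebraically identical; the reformulated one only avoids evaluating the exponential at a large positive argument.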
forward(logit, label)
Defines the computation performed at every call. Should be overridden by all subclasses.
- Parameters
*inputs (tuple) – unpacked tuple arguments
**kwargs (dict) – unpacked dict arguments
