# cross_entropy

paddle.nn.functional.cross_entropy(input, label, weight=None, ignore_index=-100, reduction='mean', soft_label=False, axis=-1, use_softmax=True, label_smoothing=0.0, name=None)

By default, the cross entropy loss is computed together with softmax. This function fuses the softmax operation and the cross entropy loss into a single computation, which is more numerically stable than applying them separately.

When use_softmax=False, softmax is not applied and the cross entropy loss is computed directly on input.

By default, the mean of the per-sample losses is returned. Use the reduction parameter to change this behavior; see the parameter descriptions for details.

This function computes the softmax cross entropy loss for both hard and soft labels. Hard labels are the actual class indices, such as 0, 1, 2. Soft labels are the probabilities of the classes, such as 0.6, 0.8, 0.2.

The calculation includes the following two steps.

• 1. Softmax cross entropy

1. Hard labels (each sample is assigned to exactly one category)

1.1. When use_softmax=True

$loss_j = -\text{logits}_{label_j} + \log\left(\sum_{i=0}^{C-1}\exp(\text{logits}_i)\right), \quad j = 1, \dots, N$

where N is the number of samples and C is the number of categories.
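The hard-label formula above can be checked with a minimal plain-Python sketch for a single sample. It is not Paddle's implementation, and the helper names `log_sum_exp` and `hard_label_ce` are hypothetical. Subtracting the largest logit before exponentiating (the log-sum-exp trick) is one way to see why fusing softmax with the loss is more numerically stable than computing softmax first.

```python
import math


def log_sum_exp(logits):
    # Shift by the largest logit so exp() cannot overflow; the shift cancels out.
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def hard_label_ce(logits, label):
    # loss_j = -logits[label_j] + log(sum_{i=0}^{C-1} exp(logits_i))
    return -logits[label] + log_sum_exp(logits)


# One sample with C = 3 classes; its true class index is 2.
print(hard_label_ce([1.0, 2.0, 3.0], 2))
# A naive math.exp(1000.0) raises OverflowError, but the shifted form stays finite.
print(hard_label_ce([1000.0, 0.0, 0.0], 0))
```

Because the shift by the maximum cancels, adding the same constant to every logit leaves the loss unchanged.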

1.2. When use_softmax=False

$loss_j = -\log\left(P_{label_j}\right), \quad j = 1, \dots, N$

where N is the number of samples, C is the number of categories, and P is input (the output of softmax).
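A plain-Python sketch (not Paddle's code; all helper names are hypothetical) of how the two hard-label cases relate: applying softmax and then taking $-\log P_{label_j}$ gives the same loss as the use_softmax=True formula, while passing probabilities where logits are expected applies softmax twice and changes the result.

```python
import math


def softmax(logits):
    # Numerically stable softmax over one sample's C logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_label_ce_from_probs(probs, label):
    # use_softmax=False: loss_j = -log(P[label_j]); probs is already a softmax output.
    return -math.log(probs[label])


def hard_label_ce_from_logits(logits, label):
    # use_softmax=True: apply softmax first, then take -log of the true class.
    return hard_label_ce_from_probs(softmax(logits), label)


logits = [1.0, 2.0, 3.0]
probs = softmax(logits)

# Each route receives the input it expects, so the losses agree.
print(hard_label_ce_from_logits(logits, 2), hard_label_ce_from_probs(probs, 2))

# Feeding probabilities to the logits route applies softmax twice,
# which silently produces a different (incorrect) loss.
print(hard_label_ce_from_logits(probs, 2))
```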

2. Soft labels (each sample is assigned to multiple categories with certain probabilities, and the probabilities sum to 1)

2.1. When use_softmax=True

$loss_j = -\sum_{i=0}^{C-1}\text{label}_i\left(\text{logits}_i - \log\left(\sum_{k=0}^{C-1}\exp(\text{logits}_k)\right)\right), \quad j = 1, \dots, N$

where N is the number of samples and C is the number of categories.
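The soft-label formula can be sketched for one sample in plain Python (a reference sketch, not Paddle's implementation; `log_softmax` and `soft_label_ce` are hypothetical names). With a one-hot soft label it reduces to the hard-label loss for that class.

```python
import math


def log_softmax(logits):
    # log(softmax(x))_i = x_i - log(sum_k exp(x_k)), computed with a max shift.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def soft_label_ce(logits, soft_label):
    # loss_j = -sum_i label_i * (logits_i - log(sum_k exp(logits_k)))
    return -sum(p * lp for p, lp in zip(soft_label, log_softmax(logits)))


logits = [1.0, 2.0, 3.0]
# A one-hot soft label gives the same loss as hard label 2.
print(soft_label_ce(logits, [0.0, 0.0, 1.0]))
# A genuine distribution over the three classes (sums to 1).
print(soft_label_ce(logits, [0.2, 0.3, 0.5]))
```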

2.2. When use_softmax=False

$loss_j = -\sum_{i=0}^{C-1}\text{label}_i \log\left(P_i\right), \quad j = 1, \dots, N$

where N is the number of samples, C is the number of categories, and P is input (the output of softmax).
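A plain-Python sketch of the soft-label case without softmax (hypothetical helper names, not Paddle's code). It assumes every probability $P_i$ is strictly positive so the logarithm is defined. Passing softmax(logits) reproduces the use_softmax=True soft-label loss for the same logits.

```python
import math


def softmax(logits):
    # Numerically stable softmax over one sample's C logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_ce_from_probs(probs, soft_label):
    # use_softmax=False: loss_j = -sum_i label_i * log(P_i); assumes every P_i > 0.
    return -sum(p * math.log(q) for p, q in zip(soft_label, probs))


# With already-normalized probabilities and a one-hot label, this is -log(0.7).
print(soft_label_ce_from_probs([0.1, 0.2, 0.7], [0.0, 0.0, 1.0]))

# Feeding softmax(logits) matches the use_softmax=True soft-label loss.
print(soft_label_ce_from_probs(softmax([1.0, 2.0, 3.0]), [0.2, 0.3, 0.5]))
```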

• 2. Weight and reduction processing

1. Weight

If the weight parameter is None, go directly to the next step.

If the weight parameter is not None, the cross entropy of each sample is multiplied by a weight that depends on whether soft_label is False or True, as follows.

1.1. Hard labels (soft_label = False)

$loss_j = loss_j \cdot weight[label_j]$
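A plain-Python sketch of hard-label weighting (not Paddle's implementation; the helper names and the class weights below are hypothetical). Each sample's loss is scaled by the weight of its own true class.

```python
import math


def log_sum_exp(xs):
    # Stable log(sum(exp(xs))) via a max shift.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def weighted_hard_label_losses(batch_logits, labels, weight):
    # loss_j = (-logits[label_j] + logsumexp(logits)) * weight[label_j]
    losses = []
    for logits, label in zip(batch_logits, labels):
        loss = -logits[label] + log_sum_exp(logits)
        losses.append(loss * weight[label])
    return losses


batch_logits = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
labels = [2, 0]
weight = [2.0, 1.0, 0.5]  # hypothetical per-class weights, length C = 3
print(weighted_hard_label_losses(batch_logits, labels, weight))
```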

1.2. Soft labels (soft_label = True)

$loss_j = loss_j \cdot \sum_{i=0}^{C-1}\left(\text{label}_i \cdot weight[i]\right)$
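A plain-Python sketch of soft-label weighting. It assumes the per-sample factor is the label-weighted sum of the class weights, $\sum_i \text{label}_i \cdot weight[i]$, which reduces to the hard-label rule $weight[label_j]$ when the label is one-hot. The helper names and weights are hypothetical; this is not Paddle's code.

```python
import math


def log_sum_exp(xs):
    # Stable log(sum(exp(xs))) via a max shift.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def soft_label_weight(soft_label, weight):
    # Assumed per-sample factor: sum_i label_i * weight[i]
    return sum(p * w for p, w in zip(soft_label, weight))


def weighted_soft_label_loss(logits, soft_label, weight):
    # Unweighted soft-label loss, then scaled by the assumed factor.
    lse = log_sum_exp(logits)
    loss = -sum(p * (x - lse) for p, x in zip(soft_label, logits))
    return loss * soft_label_weight(soft_label, weight)


weight = [2.0, 1.0, 0.5]  # hypothetical class weights
logits = [1.0, 2.0, 3.0]
# One-hot soft label: the factor is weight[2] = 0.5, as in the hard-label rule.
print(weighted_soft_label_loss(logits, [0.0, 0.0, 1.0], weight))
# Mixed label: factor = 0.2*2.0 + 0.3*1.0 + 0.5*0.5 = 0.95
print(weighted_soft_label_loss(logits, [0.2, 0.3, 0.5], weight))
```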

2. Reduction

2.1. If the reduction parameter is 'none'

The per-sample losses are returned directly.

2.2. If the reduction parameter is 'sum'

The sum of the per-sample losses is returned:

$loss = \sum_{j} loss_j$

2.3. If the reduction parameter is 'mean', the result depends on the weight parameter as follows.

2.3.1. If the weight parameter is None

The average of the per-sample losses is returned:

$loss = \sum_{j} loss_j / N$

where N is the number of samples.
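The three reduction modes can be sketched in plain Python for the unweighted case. This is a hypothetical helper, not Paddle's code, and it does not model ignore_index.

```python
def reduce_losses(losses, reduction="mean"):
    # 'none': per-sample losses unchanged; 'sum': total; 'mean': total / N
    if reduction == "none":
        return list(losses)
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)
    raise ValueError(f"unsupported reduction: {reduction!r}")


losses = [0.5, 1.5, 1.0]  # hypothetical per-sample losses
print(reduce_losses(losses, "none"))  # [0.5, 1.5, 1.0]
print(reduce_losses(losses, "sum"))   # 3.0
print(reduce_losses(losses, "mean"))  # 1.0
```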

2.3.2. If the weight parameter is not None, the weighted average of the per-sample losses is returned:

1. Hard labels (soft_label = False)

$loss = \sum_{j} loss_j / \sum_{j} weight[label_j]$

2. Soft labels (soft_label = True)

$loss = \sum_{j} loss_j / \sum_{j}\left(\sum_{i=0}^{C-1}\text{label}_{j,i} \cdot weight[i]\right)$

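A plain-Python sketch of the weighted mean, where each $loss_j$ has already been multiplied by its sample weight $w_j$. For hard labels $w_j = weight[label_j]$; for soft labels the sketch assumes $w_j = \sum_i \text{label}_{j,i} \cdot weight[i]$. All names and numbers are hypothetical, not Paddle's code.

```python
def weighted_mean(weighted_losses, sample_weights):
    # loss = sum_j loss_j / sum_j w_j, where each loss_j was already scaled by w_j
    return sum(weighted_losses) / sum(sample_weights)


weight = [2.0, 1.0, 0.5]  # hypothetical class weights, C = 3
raw = [0.4, 1.2, 0.8]     # hypothetical unweighted per-sample losses

# Hard labels: w_j = weight[label_j]
labels = [2, 0, 0]
w_hard = [weight[c] for c in labels]  # [0.5, 2.0, 2.0]
hard = weighted_mean([loss * w for loss, w in zip(raw, w_hard)], w_hard)

# Soft labels (assumed): w_j = sum_i label_{j,i} * weight[i]
soft_labels = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
w_soft = [sum(p * c for p, c in zip(lbl, weight)) for lbl in soft_labels]  # [0.5, 2.0, 1.5]
soft = weighted_mean([loss * w for loss, w in zip(raw, w_soft)], w_soft)

print(hard, soft)
```

With all weights equal to 1, the weighted mean reduces to the plain mean $\sum_j loss_j / N$.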
Parameters
• input (Tensor) –

The data type is float32 or float64, and the shape is $[N_1, N_2, ..., N_k, C]$, where C is the number of classes and k >= 1.

Note

1. When use_softmax=True, input is expected to be unscaled logits. Do not pass the output of a softmax operator, as that produces incorrect results.

2. When use_softmax=False, input is expected to be the output of a softmax operator.

• label (Tensor) –

1. If soft_label=False, the shape is $[N_1, N_2, ..., N_k]$ or $[N_1, N_2, ..., N_k, 1]$ with k >= 1, the data type is int32, int64, float32, or float64, and each value must be in [0, C-1].

2. If soft_label=True and label_smoothing is not used, the shape and data type must be the same as those of input, and the labels of each sample must sum to 1.

3. If label smoothing is enabled (i.e. label_smoothing > 0.0), label may follow either case 1 or case 2, regardless of soft_label. In other words, when label_smoothing > 0.0, label can be either a one-hot label or an integer label.

• weight (Tensor, optional) – A manual rescaling weight given to each class. If given, it must be a Tensor of size C with data type float32 or float64. Default is None.

• ignore_index (int64, optional) – Specifies a target value that is ignored and does not contribute to the loss. A negative value means that no label value is ignored. Only valid when soft_label = False. Default is -100.

• reduction (str, optional) – Indicates how to reduce the loss over the batch. The candidates are 'none' | 'mean' | 'sum'. If reduction is 'mean', the reduced mean loss is returned; if reduction is 'sum', the reduced sum loss is returned; if reduction is 'none', the unreduced loss is returned. Default is 'mean'.

• soft_label (bool, optional) – Indicate whether label is soft. Default is False.

• label_smoothing (float, optional) – A float in [0.0, 1.0]. Specifies the amount of smoothing when computing the loss, where 0.0 means no smoothing. The targets become a mixture of the original ground truth and a uniform distribution, as described in the paper ‘Rethinking the Inception Architecture for Computer Vision’. Default is 0.0.

• axis (int, optional) – The index of the dimension along which softmax is computed. It should be in the range $[-1, rank - 1]$, where $rank$ is the number of dimensions of input. Default is -1.

• use_softmax (bool, optional) – Indicates whether to compute softmax before cross entropy. Default is True.

• name (str, optional) – The name of the operator. Default is None. For more information, please refer to Name.
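The label_smoothing parameter mixes the target with a uniform distribution. A minimal plain-Python sketch, assuming the common form $(1 - \epsilon) \cdot y + \epsilon / C$ from the Inception paper; `smooth_label` and `soft_label_ce` are hypothetical helpers, not Paddle's code. As described for label, both an integer label and its one-hot form are accepted and yield the same smoothed target.

```python
import math


def smooth_label(label, num_classes, epsilon):
    # Accept an integer class index or a one-hot/soft list of length C.
    if isinstance(label, int):
        label = [1.0 if i == label else 0.0 for i in range(num_classes)]
    # Assumed smoothing: (1 - eps) * y + eps / C
    return [(1.0 - epsilon) * y + epsilon / num_classes for y in label]


def soft_label_ce(logits, soft_label):
    # Soft-label cross entropy with softmax: -sum_i label_i * log_softmax(logits)_i
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(p * (x - lse) for p, x in zip(soft_label, logits))


logits = [1.0, 2.0, 3.0]
target = smooth_label(2, num_classes=3, epsilon=0.4)
print(target)  # approximately [0.1333, 0.1333, 0.7333]
print(soft_label_ce(logits, target))
# The integer and one-hot forms of the same label give the same smoothed target.
print(smooth_label([0.0, 0.0, 1.0], 3, 0.4) == target)
```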

Returns

Tensor. The softmax cross entropy loss of input and label. Its data type is the same as that of input.

If reduction is 'mean' or 'sum', the dimension of the return value is 1.

If reduction is 'none':

1. If soft_label = False, the return value has the same dimensions as label.

2. If soft_label = True, the dimension of the return value is $[N_1, N_2, ..., N_k, 1]$.

Examples

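>>> # hard labels
>>> import paddle
>>> N = 100
>>> C = 200
>>> reduction = 'mean'
>>> input = paddle.rand([N, C], dtype='float64')
>>> label = paddle.randint(0, C, shape=[N], dtype='int64')
>>> weight = paddle.rand([C], dtype='float64')

>>> cross_entropy_loss = paddle.nn.CrossEntropyLoss(
...     weight=weight, reduction=reduction)
>>> dy_ret = cross_entropy_loss(input, label)

>>> print(dy_ret)  # a single-value Tensor, e.g. 5.35419278; the value depends on the random inputs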

>>> # soft labels
>>> # case1: soft labels without label_smoothing
>>> axis = -1
>>> N = 4
>>> C = 3
>>> shape = [N, C]
>>> reduction = 'mean'
>>> weight = None
>>> logits = paddle.uniform(shape, dtype='float64', min=0.1, max=1.0)
>>> labels = paddle.uniform(shape, dtype='float64', min=0.1, max=1.0)
>>> labels /= paddle.sum(labels, axis=axis, keepdim=True)
>>> loss = paddle.nn.functional.cross_entropy(
...     logits,
...     labels,
...     soft_label=True,
...     axis=axis,
...     weight=weight,
...     reduction=reduction)
>>> print(loss)  # e.g. 1.12801195; the value depends on the random inputs

>>> # case2: soft labels with label_smoothing
>>> axis = -1
>>> N = 4
>>> C = 3
>>> shape = [N, C]
>>> label_smoothing = 0.4
>>> reduction = 'mean'
>>> weight = None
>>> logits = paddle.uniform(shape, dtype='float64', min=0.1, max=1.0)
>>> integer_labels = paddle.randint(low=0, high=C, shape=[N], dtype='int64')

>>> # integer labels
>>> loss = paddle.nn.functional.cross_entropy(
...     logits,
...     integer_labels,
...     axis=axis,
...     weight=weight,
...     label_smoothing=label_smoothing,
...     reduction=reduction)
>>> print(loss)  # e.g. 1.08317309; the value depends on the random inputs

>>> # one_hot labels