class paddle.nn.NLLLoss(weight=None, ignore_index=-100, reduction='mean', name=None)

This class accepts an input and a target label and returns the negative log likelihood loss. It is useful for training a classification problem with C classes.

The input for the loss is expected to contain the log-probabilities of each class. It has to be a Tensor of size either (batch_size, C) or (batch_size, C, d1, d2, …, dK) with K >= 1 for the K-dimensional case. The label for the loss should be a class index in the range [0, C-1], where C is the number of classes. If ignore_index is specified, targets equal to that value do not contribute to the input gradient.
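
To illustrate the log-probability requirement, the following pure-Python sketch applies a log-softmax to one row of raw scores and then reads off the loss term for a label. The helper name `log_softmax_row` is hypothetical and the code does not use Paddle; with Paddle, a layer such as paddle.nn.LogSoftmax would normally perform this step.

```python
import math


def log_softmax_row(scores):
    """Numerically stable log-softmax over one row of raw scores."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


raw_scores = [0.88103855, 0.9908683, 0.6226845]  # one sample, C = 3
log_probs = log_softmax_row(raw_scores)

# exp() of the log-probabilities gives probabilities that sum to 1.
probs = [math.exp(v) for v in log_probs]
print(sum(probs))

# With unit weights, the per-sample loss is minus the log-probability
# of the labelled class.
label = 0
print(-log_probs[label])
```

Passing raw scores instead of log-probabilities would still produce a number, but it would no longer be a negative log likelihood.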

If the optional argument weight is provided, it should be a 1D Tensor assigning a weight to each of the classes. This is particularly useful when you have an unbalanced training set.
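
As a sketch of how such weights might be chosen for an unbalanced training set, the snippet below uses inverse class frequency. This heuristic and the helper name `inverse_frequency_weights` are assumptions for illustration; this page does not prescribe any particular weighting scheme.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Return one weight per class, larger for rarer classes.

    weight[c] = total / (num_classes * count[c]); classes that never
    occur get weight 0.0 (an arbitrary choice made for this sketch).
    """
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


train_labels = [0, 0, 0, 0, 1, 1, 2, 2]  # class 0 is over-represented
class_weights = inverse_frequency_weights(train_labels, num_classes=3)
print(class_weights)
```

The resulting list has length C, so it could be converted with paddle.to_tensor(class_weights, "float32") and passed as the weight argument.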

The loss is calculated as follows. The unreduced (i.e. with reduction set to 'none') loss can be described as:

\[\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - w_{y_n} x_{n,y_n}, \quad w_{c} = \text{weight}[c] \cdot \mathbb{1}\{c \not= \text{ignore_index}\},\]

where \(N\) is the batch size. If reduction is not 'none' (default 'mean'), then

\[\begin{split}\ell(x, y) = \left\{ \begin{array}{lcl} \sum_{n=1}^N \frac{1}{\sum_{n=1}^N w_{y_n}} l_n, & \text{if reduction} = \text{'mean';}\\ \sum_{n=1}^N l_n, & \text{if reduction} = \text{'sum'.} \end{array} \right.\end{split}\]
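
The formulas above can be sketched in plain Python for the 2-D case. This is a reference written for illustration only, not Paddle's implementation; the function name `nll_loss_reference` and the use of nested lists instead of Tensors are assumptions.

```python
import math


def nll_loss_reference(log_probs, labels, weight=None,
                       ignore_index=-100, reduction="mean"):
    """Reference NLL loss for an [N, C] input given as nested lists."""
    num_classes = len(log_probs[0])
    if weight is None:
        weight = [1.0] * num_classes  # no weight behaves like all ones

    losses = []   # l_n for every sample
    weights = []  # w_{y_n} for every sample
    for row, y in zip(log_probs, labels):
        if y == ignore_index:
            # Ignored targets have w = 0, so they add nothing.
            losses.append(0.0)
            weights.append(0.0)
            continue
        losses.append(-weight[y] * row[y])
        weights.append(weight[y])

    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        # This sketch does not handle the case where every target is
        # ignored; the division then raises ZeroDivisionError.
        return sum(losses) / sum(weights)
    raise ValueError(f"unknown reduction: {reduction!r}")


log_probs = [[math.log(0.7), math.log(0.2), math.log(0.1)],
             [math.log(0.1), math.log(0.8), math.log(0.1)],
             [math.log(0.3), math.log(0.3), math.log(0.4)]]
labels = [0, 1, -100]  # the last sample is ignored

per_sample = nll_loss_reference(log_probs, labels, reduction="none")
weighted_mean = nll_loss_reference(log_probs, labels,
                                   weight=[1.0, 2.0, 1.0])
print(per_sample)
print(weighted_mean)
```

Note that 'mean' divides by the sum of the selected weights rather than by N, so ignored samples and class weights both change the denominator.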
Parameters

  • weight (Tensor, optional) – A manual rescaling weight given to each class. If given, it has to be a 1D Tensor of size [C, ]. Otherwise, it is treated as if all weights were one. The data type is float32 or float64. Default is None.

  • ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient. Default is -100.

  • reduction (str, optional) – Specifies how the loss is reduced; the candidates are 'none' | 'mean' | 'sum'. If reduction is 'mean', the reduced mean loss is returned; if reduction is 'sum', the reduced sum loss is returned; if reduction is 'none', no reduction is applied. Default is 'mean'.

  • name (str, optional) – For details, please refer to Name. Generally, no setting is required. Default is None.

Shape

  • input (Tensor): Input tensor of shape \([N, C]\), where C is the number of classes.

    In the K-dimensional case, the shape is \([N, C, d_1, d_2, ..., d_K]\). The data type is float32 or float64.

  • label (Tensor): Label tensor, the shape is \([N,]\) or \([N, d_1, d_2, ..., d_K]\).

    The data type is int64.

  • output (Tensor): The negative log likelihood loss between input and label.

    If reduction is 'none', the shape is [N, *], matching the label shape. If reduction is 'sum' or 'mean', the output is a scalar with shape [].
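
To make the K-dimensional shapes concrete, here is a pure-Python sketch for K = 1 with unit weights and no ignored targets. The helper name `nll_loss_none_k1` and the nested-list representation are assumptions for illustration; the code does not call Paddle.

```python
import math


def nll_loss_none_k1(log_probs, labels):
    """Unreduced NLL for input [N, C, d1] and labels [N, d1]."""
    out = []
    for sample, sample_labels in zip(log_probs, labels):
        # sample has shape [C, d1]; for each position j along d1,
        # select the log-probability of the labelled class y.
        out.append([-sample[y][j] for j, y in enumerate(sample_labels)])
    return out


# Probabilities of shape [C=3, d1=2]; each column sums to 1 over C.
probs = [[0.5, 0.2],
         [0.3, 0.7],
         [0.2, 0.1]]

# Input of shape [N=2, C=3, d1=2], labels of shape [N=2, d1=2].
log_probs = [[[math.log(p) for p in row] for row in probs]
             for _ in range(2)]
labels = [[0, 1],
          [2, 0]]

loss = nll_loss_none_k1(log_probs, labels)
print(len(loss), len(loss[0]))  # output shape follows the label shape
```

The class axis C is consumed by the lookup, which is why the unreduced output has the label's shape rather than the input's.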


>>> import paddle

>>> nll_loss = paddle.nn.loss.NLLLoss()
>>> log_softmax = paddle.nn.LogSoftmax(axis=1)

>>> input = paddle.to_tensor([[0.88103855, 0.9908683 , 0.6226845 ],
...                           [0.53331435, 0.07999352, 0.8549948 ],
...                           [0.25879037, 0.39530203, 0.698465  ],
...                           [0.73427284, 0.63575995, 0.18827209],
...                           [0.05689114, 0.0862954 , 0.6325046 ]], "float32")
>>> log_out = log_softmax(input)
>>> label = paddle.to_tensor([0, 2, 1, 1, 0], "int64")
>>> result = nll_loss(log_out, label)
>>> print(result)
Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=True,
forward(input, label)


Defines the computation performed at every call. Should be overridden by all subclasses.

  • *inputs (tuple) – unpacked tuple arguments

  • **kwargs (dict) – unpacked dict arguments