class paddle.fluid.dygraph.NLLLoss(weight=None, reduction='mean', ignore_index=-100)

This op accepts an input tensor and a target label and returns the negative log likelihood loss. It is useful for training a classification problem with C classes.

The input for the loss is expected to contain the log-probabilities of each class. It has to be a Tensor of size either (batch_size, C) or (batch_size, C, d1, d2, …, dK) with K >= 1 for the K-dimensional case. The label for the loss should be a class index in the range [0, C-1], where C is the number of classes. If ignore_index is specified, labels equal to that value do not contribute to the input gradient.
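Because the input must already hold log-probabilities, raw class scores are usually passed through a log-softmax first. The following is a minimal pure-Python sketch of that step for a single row; the helper name `log_softmax` is hypothetical and is not part of this API.

```python
import math

def log_softmax(scores):
    """Convert one row of raw class scores into log-probabilities."""
    m = max(scores)  # subtract the max before exponentiating, for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]

row = log_softmax([2.0, 1.0, 0.1])
# Exponentiating the log-probabilities should recover a distribution that sums to 1.
total = sum(math.exp(v) for v in row)
```

Every entry of `row` is non-positive, which is the form of input this loss expects.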

If the optional argument weight is provided, it should be a 1D Tensor assigning a weight to each of the classes. This is particularly useful when you have an unbalanced training set.
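This page does not prescribe how to choose the weights. One common heuristic, shown here only as an assumption, is inverse class frequency: weight[c] = N / (C * count[c]), so that rare classes receive larger weights. The helper name `balanced_class_weights` is hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels, num_classes):
    """Inverse-frequency weights: N / (C * count[c]) for each class c.

    Classes that never occur get weight 0.0 (a choice made for this sketch).
    """
    counts = Counter(labels)
    n = len(labels)
    return [
        n / (num_classes * counts[c]) if counts[c] > 0 else 0.0
        for c in range(num_classes)
    ]

# Class 0 dominates the sample, so it receives the smallest weight.
sample_labels = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
class_weights = balanced_class_weights(sample_labels, num_classes=3)
```

The resulting list would then be converted to a float32 or float64 tensor of size C before being passed as weight.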

The loss is calculated as follows. The unreduced (i.e. with reduction set to 'none') loss can be described as:

\[\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - w_{y_n} x_{n,y_n}, \quad w_{c} = \text{weight}[c] \cdot \mathbb{1}\{c \not= \text{ignore\_index}\},\]

where \(N\) is the batch size. If reduction is not 'none' (default 'mean'), then

\[\begin{split}\ell(x, y) = \begin{cases} \sum_{n=1}^N \frac{1}{\sum_{m=1}^N w_{y_m}} l_n, & \text{if reduction} = \text{'mean';}\\ \sum_{n=1}^N l_n, & \text{if reduction} = \text{'sum'.} \end{cases}\end{split}\]
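The formulas above can be checked against a small reference implementation. This is a pure-Python sketch for the 2-D case only (input of size (N, C)), not the library's actual kernel; the function name `nll_loss_reference` is hypothetical.

```python
import math

def nll_loss_reference(log_probs, labels, weight=None, reduction="mean", ignore_index=-100):
    """Reference NLL loss for log_probs of shape (N, C) and labels of shape (N,)."""
    num_classes = len(log_probs[0])
    if weight is None:
        weight = [1.0] * num_classes  # no weight given: treat every class as weight 1
    losses, used_weights = [], []
    for row, y in zip(log_probs, labels):
        if y == ignore_index:
            # Ignored targets get weight 0, so they add nothing to the loss.
            losses.append(0.0)
            used_weights.append(0.0)
        else:
            losses.append(-weight[y] * row[y])  # l_n = -w_{y_n} * x_{n, y_n}
            used_weights.append(weight[y])
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        # Weighted mean; undefined (division by zero) if every target is ignored.
        return sum(losses) / sum(used_weights)
    raise ValueError("reduction must be 'none', 'mean' or 'sum'")

log_probs = [
    [math.log(0.7), math.log(0.2), math.log(0.1)],
    [math.log(0.1), math.log(0.8), math.log(0.1)],
]
w = [1.0, 2.0, 1.0]
unreduced = nll_loss_reference(log_probs, [0, 1], weight=w, reduction="none")
mean_loss = nll_loss_reference(log_probs, [0, 1], weight=w, reduction="mean")
masked_mean = nll_loss_reference(log_probs, [0, -100], weight=w, reduction="mean")
```

Note that with reduction 'mean' the sum is divided by the total weight of the non-ignored targets, not by the batch size, so `masked_mean` depends only on the first sample.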
Parameters:

  • input (Variable) – Input tensor of log-probabilities. The data type is float32 or float64.

  • label (Variable) – Label tensor of class indices. The data type is int64.

  • weight (Variable, optional) – Weight tensor, a manual rescaling weight given to each class. If given, it has to be a Tensor of size C. Otherwise, it is treated as if having all ones. The data type is float32 or float64. Default is None.

  • reduction (str, optional) – Indicates how to reduce the loss; the candidates are 'none' | 'mean' | 'sum'. If reduction is 'none', the unreduced loss is returned; if 'mean', the weighted mean of the loss is returned; if 'sum', the summed loss is returned. Default is 'mean'.

  • ignore_index (int64, optional) – Specifies a target value that is ignored and does not contribute to the input gradient. Default is -100.


Returns: The tensor variable storing the nll_loss.

Return type: Variable.


# declarative mode
import paddle.fluid as fluid
import numpy as np

input_np = np.random.random(size=(10, 10)).astype(np.float32)
label_np = np.random.randint(0, 10, size=(10,)).astype(np.int64)
prog = fluid.Program()
startup_prog = fluid.Program()
place = fluid.CPUPlace()
with fluid.program_guard(prog, startup_prog):
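    # declare the graph inputs
    input = fluid.data(name='input', shape=[10, 10], dtype='float32')
    label = fluid.data(name='label', shape=[10], dtype='int64')
    nll_loss = fluid.dygraph.NLLLoss()
    res = nll_loss(input, label)

    # run the program and fetch the loss value
    exe = fluid.Executor(place)
    static_result = exe.run(
        prog,
        feed={"input": input_np,
              "label": label_np},
        fetch_list=[res])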

# imperative mode
import paddle.fluid.dygraph as dg
with dg.guard(place) as g:
    input = dg.to_variable(input_np)
    label = dg.to_variable(label_np)
    output = nll_loss(input, label)

forward(input, label)

Defines the computation performed at every call. Should be overridden by all subclasses.

  • *inputs (tuple) – unpacked tuple arguments

  • **kwargs (dict) – unpacked dict arguments