paddle.nn.functional.gaussian_nll_loss(input, label, variance, full=False, epsilon=1e-06, reduction='mean', name=None) [source]

Gaussian negative log likelihood loss.

Computes the Gaussian negative log likelihood loss between input, variance and label. The label is treated as a sample from a Gaussian distribution whose expectation (input) and variance are predicted by a neural network; this function is used to train such a network. This means input and variance should be functions (the outputs of the neural network) of some inputs.

For a label assumed to follow a Gaussian distribution with expectation input and variance variance predicted by the neural network, the loss is calculated as follows:

\[\text{loss} = \frac{1}{2}\left(\log\left(\text{max}\left(\text{var}, \ \text{epsilon}\right)\right) + \frac{\left(\text{input} - \text{label}\right)^2} {\text{max}\left(\text{var}, \ \text{epsilon}\right)}\right) + \text{const.}\]

where epsilon is used for stability. By default, the constant term of the loss function is omitted unless full is True. If variance is not the same size as input (due to a homoscedastic assumption), it must either have a final dimension of 1 or have one fewer dimension (with all other sizes being the same) for correct broadcasting.
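As an illustration of the formula, the per-element loss can be computed directly; the following is a minimal NumPy sketch (not Paddle's implementation) of the default full=False case, with the optional constant term shown for full=True:

```python
import numpy as np

def gaussian_nll(input, label, variance, full=False, epsilon=1e-6):
    # Clamp the variance for numerical stability, as in the formula above.
    var = np.maximum(variance, epsilon)
    loss = 0.5 * (np.log(var) + (input - label) ** 2 / var)
    if full:
        # Constant term of the Gaussian log likelihood: 0.5 * log(2 * pi)
        loss += 0.5 * np.log(2.0 * np.pi)
    return loss

pred = np.array([0.0, 1.0])
target = np.array([0.5, 1.0])
var = np.array([1.0, 0.25])
print(gaussian_nll(pred, target, var))  # per-element losses, shape (2,)
```

Note that individual loss values can be negative (when the clamped variance is less than 1 and the squared error is small), which is expected for a negative log likelihood of a continuous density.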

Parameters

  • input (Tensor) – input tensor, \((N, *)\) or \((*)\) where \(*\) means any number of additional dimensions. Expectation of the Gaussian distribution, available dtype is float32, float64.

  • label (Tensor) – target label tensor, \((N, *)\) or \((*)\), same shape as the input, or same shape as the input but with one dimension equal to 1 (to allow for broadcasting). Sample from the Gaussian distribution, available dtype is float32, float64.

  • variance (Tensor) – tensor of positive variance(s), \((N, *)\) or \((*)\), same shape as the input, or same shape as the input but with one dimension equal to 1, or same shape as the input but with one fewer dimension (to allow for broadcasting). One for each of the expectations in the input (heteroscedastic), or a single one (homoscedastic), available dtype is float32, float64.

  • full (bool, optional) – include the constant term in the loss calculation. Default: False.

  • epsilon (float, optional) – value used to clamp variance (see note below), for stability. Default: 1e-6.

  • reduction (str, optional) – specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction is applied; 'mean': the output is the mean of all element losses; 'sum': the output is the sum of all element losses. Default: 'mean'.

  • name (str, optional) – Name for the operation (optional, default is None). For more information, please refer to Name.


If reduction is 'none', the shape of the output is the same as that of input; otherwise the shape of the output is [].

Return type

output (Tensor)

Examples

>>> import paddle
>>> import paddle.nn.functional as F
>>> paddle.seed(2023)

>>> input = paddle.randn([5, 2], dtype=paddle.float32)
>>> label = paddle.randn([5, 2], dtype=paddle.float32)
>>> variance = paddle.ones([5, 2], dtype=paddle.float32)

>>> loss = F.gaussian_nll_loss(input, label, variance, reduction='none')
>>> print(loss)
Tensor(shape=[5, 2], dtype=float32, place=Place(cpu), stop_gradient=True,
       [[0.21808575, 1.43013096],
        [1.05245590, 0.00394560],
        [1.20861185, 0.00000062],
        [0.56946373, 0.73300570],
        [0.37142906, 0.12038800]])

>>> loss = F.gaussian_nll_loss(input, label, variance, reduction='mean')
>>> print(loss)
Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=True,
       0.57075172)


Note: The clamping of variance is ignored with respect to autograd, so the gradients are unaffected by it.
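To see the homoscedastic broadcasting rule in practice, the sketch below (plain NumPy, mirroring the formula above rather than calling Paddle) uses a per-row variance with a final dimension of 1, which broadcasts over an input of shape (5, 2):

```python
import numpy as np

np.random.seed(0)
input = np.random.randn(5, 2).astype(np.float32)
label = np.random.randn(5, 2).astype(np.float32)

# Homoscedastic case: a single variance per row, with a final
# dimension of 1, broadcast across the last axis of input.
variance = np.full((5, 1), 0.5, dtype=np.float32)

var = np.maximum(variance, 1e-6)          # clamp for stability
loss = 0.5 * (np.log(var) + (input - label) ** 2 / var)
print(loss.shape)  # broadcasts to (5, 2), matching input
```

A variance of shape (5,) (one fewer dimension than input, with all other sizes matching) would need to be reshaped to (5, 1) first under standard broadcasting; per the description above, Paddle accepts that one-fewer-dimension form directly.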