rnnt_loss

paddle.nn.functional.rnnt_loss(input: Tensor, label: Tensor, input_lengths: Tensor, label_lengths: Tensor, blank: int = 0, fastemit_lambda: float = 0.001, reduction: _ReduceMode = 'mean', name: str | None = None) → Tensor

Computes the Sequence Transduction with Recurrent Neural Networks (RNN-T) loss by integrating the open-source Warp-Transducer library (https://github.com/b-flo/warp-transducer.git).

Parameters

input (Tensor) – The padded logprobs sequences, which must be a 4-D Tensor. The tensor shape is [B, Tmax, Umax, D], where B is the batch size and Tmax is the length of the longest input logit sequence. The data type should be float32 or float64.
label (Tensor) – The padded ground truth sequences, which must be a 2-D Tensor. The tensor shape is [B, Umax], where Umax is the length of the longest label sequence. The data type must be int32.
input_lengths (Tensor) – The length of each input sequence. It should have shape [batch_size] and dtype int64.
label_lengths (Tensor) – The length of each label sequence. It should have shape [batch_size] and dtype int64.
blank (int, optional) – The blank label index of RNN-T loss, which is in the half-open interval [0, D), where D is the size of the last dimension of input. The data type must be int32. Default is 0.
fastemit_lambda (float, optional) – Regularization parameter for FastEmit (https://arxiv.org/pdf/2010.11148.pdf). Default is 0.001.
reduction (str, optional) – Specifies the reduction applied to the per-sample losses; the candidates are 'none' | 'mean' | 'sum'. If reduction is 'mean', the losses are summed and divided by batch_size; if reduction is 'sum', the sum of the losses is returned; if reduction is 'none', no reduction is applied. Default is 'mean'.
name (str|None, optional) – Name for the operation (optional, default is None). For more information, please refer to Name.
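
To illustrate how label and label_lengths relate, here is a minimal plain-Python sketch that pads variable-length label sequences to a common length Umax and records each original length. The helper name pad_labels and the pad value 0 are assumptions made for this illustration and are not part of the Paddle API; the resulting lists would still need to be converted with paddle.to_tensor.

```python
def pad_labels(sequences, pad_value=0):
    """Pad ragged label sequences to shape [B, Umax] and return their lengths.

    Hypothetical helper for illustration; pad_value=0 is an assumption.
    """
    lengths = [len(seq) for seq in sequences]
    u_max = max(lengths) if lengths else 0
    # Right-pad every sequence up to the longest one.
    padded = [list(seq) + [pad_value] * (u_max - len(seq)) for seq in sequences]
    return padded, lengths


labels, label_lengths = pad_labels([[1, 2], [3], [2, 4, 1]])
print(labels)         # [[1, 2, 0], [3, 0, 0], [2, 4, 1]]
print(label_lengths)  # [2, 1, 3]
```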

Returns

The RNN-T loss between logprobs and labels. If reduction is 'none', the shape of loss is [batch_size]; otherwise, the shape of loss is []. Data type is the same as logprobs.

Return type

Tensor
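
The effect of reduction on the returned shape can be sketched in plain Python. The helper reduce_loss below is hypothetical: it only mirrors the semantics documented for the reduction parameter on a list of per-sample losses and does not call Paddle.

```python
def reduce_loss(per_sample_losses, reduction="mean"):
    """Apply the documented reduction to a list of per-sample losses.

    Hypothetical helper for illustration only.
    """
    if reduction == "none":
        return list(per_sample_losses)         # one value per sample: [batch_size]
    total = sum(per_sample_losses)
    if reduction == "sum":
        return total                           # scalar: shape []
    if reduction == "mean":
        return total / len(per_sample_losses)  # sum divided by batch_size
    raise ValueError(f"unsupported reduction: {reduction!r}")


losses = [2.0, 4.0, 6.0]
print(reduce_loss(losses, "none"))  # [2.0, 4.0, 6.0]
print(reduce_loss(losses, "sum"))   # 12.0
print(reduce_loss(losses, "mean"))  # 4.0
```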

Examples

>>> # declarative mode
>>> import paddle.nn.functional as F
>>> import numpy as np
>>> import paddle
>>> import functools

>>> fn = functools.partial(F.rnnt_loss, reduction='sum', fastemit_lambda=0.0, blank=0)

>>> acts = np.array([[
...     [[0.1, 0.6, 0.1, 0.1, 0.1],
...      [0.1, 0.1, 0.6, 0.1, 0.1],
...      [0.1, 0.1, 0.2, 0.8, 0.1]],
...     [[0.1, 0.6, 0.1, 0.1, 0.1],
...      [0.1, 0.1, 0.2, 0.1, 0.1],
...      [0.7, 0.1, 0.2, 0.1, 0.1]]
... ]])
>>> labels = [[1, 2]]

>>> acts = paddle.to_tensor(acts, stop_gradient=False)

>>> lengths = [acts.shape[1]] * acts.shape[0]
>>> label_lengths = [len(l) for l in labels]
>>> labels = paddle.to_tensor(labels, paddle.int32)
>>> lengths = paddle.to_tensor(lengths, paddle.int32)
>>> label_lengths = paddle.to_tensor(label_lengths, paddle.int32)

>>> costs = fn(acts, labels, lengths, label_lengths)
>>> print(costs)
Tensor(shape=[], dtype=float64, place=Place(cpu), stop_gradient=False,
       -2.85042444)
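
The printed value can be cross-checked by hand. The sketch below is a plain-Python forward recursion over the (t, u) lattice for a single sequence, corresponding to fastemit_lambda=0.0. It assumes the entries of acts are used directly as log-probabilities with no normalization applied. That assumption is consistent with the output shown above but is not stated on this page. The function name rnnt_forward_nll is hypothetical, and this is not Paddle's implementation.

```python
import math


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def rnnt_forward_nll(log_probs, labels, blank=0):
    """Negative log-likelihood of one sequence via the RNN-T forward recursion.

    log_probs[t][u][k] is treated as a log-probability as-is (assumption).
    """
    T, U = len(log_probs), len(labels) + 1
    alpha = [[-math.inf] * U for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U):
            if t == 0 and u == 0:
                continue
            candidates = []
            if t > 0:  # arrive by emitting blank at (t - 1, u)
                candidates.append(alpha[t - 1][u] + log_probs[t - 1][u][blank])
            if u > 0:  # arrive by emitting labels[u - 1] at (t, u - 1)
                candidates.append(
                    alpha[t][u - 1] + log_probs[t][u - 1][labels[u - 1]]
                )
            alpha[t][u] = (
                candidates[0] if len(candidates) == 1 else logaddexp(*candidates)
            )
    # A final blank at the last lattice node terminates the sequence.
    return -(alpha[T - 1][U - 1] + log_probs[T - 1][U - 1][blank])


# Same values as the first (and only) batch element of acts above.
acts_0 = [
    [[0.1, 0.6, 0.1, 0.1, 0.1],
     [0.1, 0.1, 0.6, 0.1, 0.1],
     [0.1, 0.1, 0.2, 0.8, 0.1]],
    [[0.1, 0.6, 0.1, 0.1, 0.1],
     [0.1, 0.1, 0.2, 0.1, 0.1],
     [0.7, 0.1, 0.2, 0.1, 0.1]],
]
loss = rnnt_forward_nll(acts_0, labels=[1, 2], blank=0)
print(round(loss, 6))  # -2.850424
```

The loss is negative here because the example inputs are not normalized log-probabilities; under the stated assumption, the three alignment paths sum to a total "probability" greater than 1.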