paddle.static.nn.row_conv(input, future_context_size, param_attr=None, act=None)

Static Graph

Row-convolution operator

Row convolution is also called lookahead convolution. This operator was introduced in the DeepSpeech2 paper.

The main motivation is that a bidirectional RNN, useful in DeepSpeech-like speech models, learns a representation for a sequence by performing a forward and a backward pass through the entire sequence. However, unlike unidirectional RNNs, bidirectional RNNs are challenging to deploy in an online, low-latency setting, because the backward pass cannot start until the whole sequence is available. The lookahead convolution incorporates information from a limited number of future time steps in a computationally efficient manner to improve unidirectional recurrent neural networks. The row convolution operator differs from 1D sequence convolution and is computed as follows:

Given an input sequence $X$ of length $t$ and input dimension $D$, and a filter $W$ of size $context \times D$, the output sequence is computed as:

$$ Out_{i} = \sum_{j=i}^{i + context - 1} X_{j} \cdot W_{j-i} $$

In the above equation:

  • $Out_{i}$: The i-th row of output variable with shape [1, D].

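  • $context$: The number of rows in the filter, that is, the current time step plus the future context (future_context_size + 1, matching the kernel shape given in the parameter description).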

  • $X_{j}$: The j-th row of input variable with shape [1, D].

  • $W_{j-i}$: The (j-i)-th row of parameters with shape [1, D].

For more details about row_conv, please refer to the design document.
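The equation above can be sketched in plain Python for a single sequence. This is only an illustrative reference, not Paddle's implementation: the function name `row_conv_reference` is hypothetical, the product $X_{j} \cdot W_{j-i}$ is taken element-wise over the D columns, and terms whose index j runs past the end of the sequence are assumed to be dropped (treated as zero).

```python
def row_conv_reference(x, w):
    """Row (lookahead) convolution for one sequence.

    x: list of T rows, each a list of D floats (the input sequence).
    w: list of `context` rows, each a list of D floats (the filter).
    Returns a list of T rows of D floats.
    """
    t = len(x)
    d = len(x[0])
    context = len(w)
    out = [[0.0] * d for _ in range(t)]
    for i in range(t):
        for k in range(context):
            j = i + k  # j runs from i to i + context - 1
            if j >= t:
                # Assumption: rows past the end of the sequence contribute nothing.
                break
            for c in range(d):
                out[i][c] += x[j][c] * w[k][c]
    return out


# Hypothetical toy data: T = 4 time steps, D = 2 features, context = 3.
x = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
w = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
out = row_conv_reference(x, w)
```

With an all-ones filter, each output row is simply the sum of the current row and the rows that follow it within the context window, so the last rows of the sequence sum over fewer terms.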

  • input (Variable) – The input X, a LoDTensor or Tensor. A LoDTensor supports variable time-length input sequences; its underlying tensor is a matrix with shape (T x N), where T is the total number of time steps in the mini-batch and N is the input data dimension. A Tensor input has shape (B x T x N), where B is the batch size.

  • future_context_size (int) – Future context size. Please note that the shape of the convolution kernel is [future_context_size + 1, D], where D is the input data dimension.

  • param_attr (ParamAttr) – Attributes of the parameters, such as name and initializer.

  • act (str) – Non-linear activation to be applied to output variable.
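The shapes described in the parameters above can be illustrated with a small plain-Python sketch. It assumes a level-1 LoD given as offsets into the packed (T x N) matrix, for example [0, 4, 9] for two sequences of lengths 4 and 5; the helper names `split_by_lod` and `kernel_shape` are hypothetical and are not part of the Paddle API.

```python
def split_by_lod(rows, offsets):
    """Split a packed (T x N) matrix into per-sequence blocks.

    Assumption: `offsets` is a level-1 LoD in offset form, so sequence k
    occupies rows offsets[k] up to (but not including) offsets[k + 1].
    """
    return [rows[offsets[k]:offsets[k + 1]] for k in range(len(offsets) - 1)]


def kernel_shape(future_context_size, d):
    """Kernel shape as stated in the parameter description."""
    return (future_context_size + 1, d)


# Hypothetical packed input: T = 9 total time steps, N = 16 features.
rows = [[float(r)] * 16 for r in range(9)]
offsets = [0, 4, 9]

sequences = split_by_lod(rows, offsets)
lengths = [len(seq) for seq in sequences]
shape = kernel_shape(2, 16)
```

Here `lengths` holds the two sequence lengths that make up the 9 packed time steps, and `shape` is the kernel shape for future_context_size=2 with 16 input features.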


Returns the output (Out), a LoDTensor or Tensor with the same type and shape as X.


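import paddle

# row_conv is a static-graph API, so switch to static mode first
paddle.enable_static()

# for LodTensor inputs
x = paddle.static.data(name='x', shape=[9, 16],
                       dtype='float32', lod_level=1)
out = paddle.static.nn.row_conv(input=x, future_context_size=2)

# for Tensor inputs (a distinct variable name avoids a clash with 'x' above)
x_batch = paddle.static.data(name='x_batch', shape=[9, 4, 16], dtype='float32')
out_batch = paddle.static.nn.row_conv(input=x_batch, future_context_size=2)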