LSTM

class paddle.nn.LSTM(input_size: int, hidden_size: int, num_layers: int = 1, direction: _DirectionType | str = 'forward', time_major: bool = False, dropout: float = 0.0, weight_ih_attr: ParamAttrLike | None = None, weight_hh_attr: ParamAttrLike | None = None, bias_ih_attr: ParamAttrLike | None = None, bias_hh_attr: ParamAttrLike | None = None, proj_size: int = 0, name: str | None = None)

Multilayer LSTM. It takes a sequence and an initial state as inputs, and returns the output sequences and the final states.

Each layer of the LSTM maps the input sequence and initial states to the output sequence and final states as follows: at each step, it takes the step input (\(x_{t}\)) and the previous states (\(h_{t-1}, c_{t-1}\)), and returns the step output (\(y_{t}\)) and the new states (\(h_{t}, c_{t}\)).

\[
\begin{aligned}
i_{t} & = \sigma(W_{ii}x_{t} + b_{ii} + W_{hi}h_{t-1} + b_{hi})\\
f_{t} & = \sigma(W_{if}x_{t} + b_{if} + W_{hf}h_{t-1} + b_{hf})\\
o_{t} & = \sigma(W_{io}x_{t} + b_{io} + W_{ho}h_{t-1} + b_{ho})\\
\widetilde{c}_{t} & = \tanh(W_{ig}x_{t} + b_{ig} + W_{hg}h_{t-1} + b_{hg})\\
c_{t} & = f_{t} * c_{t-1} + i_{t} * \widetilde{c}_{t}\\
h_{t} & = o_{t} * \tanh(c_{t})\\
y_{t} & = h_{t}
\end{aligned}
\]

If proj_size is specified (greater than 0), the hidden state \(h_{t}\) is projected to dimension proj_size:

\[h_{t} = h_{t}W_{proj\_size}\]

where \(\sigma\) is the sigmoid function, and * is the elementwise multiplication operator.
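
As an illustration of the equations above, here is a minimal pure-Python sketch of a single LSTM step for one sample. It is not how paddle.nn.LSTM is implemented. The helper names (sigmoid, affine, lstm_step, project) and the dictionary layout of the parameters are assumptions made for this example only.

```python
import math


def sigmoid(v):
    """Logistic sigmoid, written as sigma in the equations."""
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, x, b):
    """Return W x + b, where W is given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step for a single sample.

    params (hypothetical layout) maps each gate name "i", "f", "o", "g"
    to a tuple (W_input, b_input, W_hidden, b_hidden).
    """
    pre = {}
    for gate, (W_x, b_x, W_h, b_h) in params.items():
        from_input = affine(W_x, x_t, b_x)
        from_hidden = affine(W_h, h_prev, b_h)
        pre[gate] = [a + r for a, r in zip(from_input, from_hidden)]

    i_t = [sigmoid(v) for v in pre["i"]]    # input gate
    f_t = [sigmoid(v) for v in pre["f"]]    # forget gate
    o_t = [sigmoid(v) for v in pre["o"]]    # output gate
    g_t = [math.tanh(v) for v in pre["g"]]  # candidate cell state

    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t  # the step output y_t equals h_t


def project(h_t, W_proj):
    """Projection h_t W_proj, with W_proj of shape [hidden_size, proj_size]."""
    proj_size = len(W_proj[0])
    return [sum(h * row[j] for h, row in zip(h_t, W_proj)) for j in range(proj_size)]


# Toy setup: input_size = 3, hidden_size = 2, all weights and biases zero,
# so every sigmoid gate is 0.5 and the candidate cell state is 0.
input_size, hidden_size = 3, 2
zero_gate = (
    [[0.0] * input_size for _ in range(hidden_size)],
    [0.0] * hidden_size,
    [[0.0] * hidden_size for _ in range(hidden_size)],
    [0.0] * hidden_size,
)
params = {gate: zero_gate for gate in "ifog"}

h_t, c_t = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [2.0, -2.0], params)
print(c_t)  # [1.0, -1.0], since c_t = 0.5 * c_prev + 0.5 * 0
```

The sketch keeps the four gates separate to mirror the equations term by term. When a projection is used, the hidden state fed back at the next step is the projected one, so the hidden-to-hidden weights in this sketch would need proj_size columns instead of hidden_size.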

Constructing the layer with keyword arguments is recommended.

Parameters

input_size (int) – The input size of \(x\) for the first layer’s cell.
hidden_size (int) – The hidden size of \(h\) for each layer’s cell.
num_layers (int, optional) – Number of recurrent layers. Defaults to 1.
direction (str, optional) – The direction of the network. It can be “forward” or “bidirect” (or “bidirectional”). When “bidirect”, the forward and backward outputs are merged by concatenation. Defaults to “forward”.
time_major (bool, optional) – Whether the first dimension of the input is the time-step dimension. If time_major is True, the shape of the input Tensor is [time_steps, batch_size, input_size]; otherwise, it is [batch_size, time_steps, input_size]. time_steps is the length of the input sequence. Defaults to False.
dropout (float, optional) – The dropout probability. Dropout is applied to the input of each layer except the first. The value must be in the range from 0 to 1. Defaults to 0.
weight_ih_attr (ParamAttr|None, optional) – The parameter attribute for weight_ih of each cell. Default: None.
weight_hh_attr (ParamAttr|None, optional) – The parameter attribute for weight_hh of each cell. Default: None.
bias_ih_attr (ParamAttr|None, optional) – The parameter attribute for the bias_ih of each cell. Default: None.
bias_hh_attr (ParamAttr|None, optional) – The parameter attribute for the bias_hh of each cell. Default: None.
proj_size (int, optional) – If specified, the output hidden state of each layer will be projected to proj_size. proj_size must be smaller than hidden_size. Default: 0.
name (str|None, optional) – Name for the operation (optional, default is None). For more information, please refer to Name.

Inputs:

inputs (Tensor): the input sequence. If time_major is True, the shape is [time_steps, batch_size, input_size], else, the shape is [batch_size, time_steps, input_size]. time_steps means the length of the input sequence.
initial_states (list|tuple, optional): the initial states, a list/tuple of (h, c), each of shape [num_layers * num_directions, batch_size, hidden_size]. If initial_states is not given, zero initial states are used.
sequence_length (Tensor, optional): shape [batch_size], dtype: int64 or int32. The valid lengths of the input sequences. Defaults to None. If sequence_length is not None, the inputs are treated as padded sequences. In each input sequence, elements whose time step index is not less than the valid length are treated as padding.
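
The padding rule for sequence_length can be sketched in plain Python as follows. The function name padding_mask is hypothetical and is not part of Paddle. It only spells out which time steps count as valid (index less than the valid length) and which count as padding.

```python
def padding_mask(sequence_length, time_steps):
    """Return mask[b][t], True where step t of sample b is valid (t < length)."""
    return [[t < n for t in range(time_steps)] for n in sequence_length]


# Three sequences padded to 4 time steps, with valid lengths 4, 2 and 1.
mask = padding_mask([4, 2, 1], time_steps=4)
for row in mask:
    print(row)
# [True, True, True, True]
# [True, True, False, False]
# [True, False, False, False]
```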

Returns

outputs (Tensor). The output sequence. If time_major is True, the shape is [time_steps, batch_size, num_directions * hidden_size]; if time_major is False, the shape is [batch_size, time_steps, num_directions * hidden_size]. If proj_size is specified, the last dimension is num_directions * proj_size instead. num_directions is 2 if direction is “bidirect” (or “bidirectional”), else 1. time_steps is the length of the output sequence.
final_states (tuple). The final state, a tuple of two tensors, h and c. The shape of each is [num_layers * num_directions, batch_size, hidden_size]. If proj_size is specified, the last dimension of h will be proj_size.

Note that num_directions is 2 if direction is “bidirect” (or “bidirectional”), in which case the forward states are at indices 0, 2, 4, 6, … and the backward states are at indices 1, 3, 5, 7, …; otherwise it is 1.
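
The shape rules above can be collected into a small helper. This is a sketch for checking expectations, not part of the Paddle API: the names num_directions, expected_shapes and state_index are hypothetical.

```python
def num_directions(direction):
    """2 for a bidirectional network, else 1."""
    return 2 if direction in ("bidirect", "bidirectional") else 1


def expected_shapes(batch_size, time_steps, hidden_size, num_layers=1,
                    direction="forward", time_major=False, proj_size=0):
    """Expected shapes of outputs, h and c as described in the Returns section."""
    dirs = num_directions(direction)
    # With a projection, outputs and h use proj_size; c keeps hidden_size.
    out_size = proj_size if proj_size > 0 else hidden_size
    if time_major:
        outputs = [time_steps, batch_size, dirs * out_size]
    else:
        outputs = [batch_size, time_steps, dirs * out_size]
    return {
        "outputs": outputs,
        "h": [num_layers * dirs, batch_size, out_size],
        "c": [num_layers * dirs, batch_size, hidden_size],
    }


def state_index(layer, backward=False):
    """First-axis index into h and c for a bidirectional network:
    forward states at even indices, backward states at odd indices."""
    return 2 * layer + (1 if backward else 0)


# Same configuration as paddle.nn.LSTM(16, 32, 2) with inputs of shape [4, 23, 16].
shapes = expected_shapes(batch_size=4, time_steps=23, hidden_size=32, num_layers=2)
print(shapes["outputs"])  # [4, 23, 32]
print(shapes["h"])        # [2, 4, 32]
print(state_index(1, backward=True))  # 3
```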

Variables:

weight_ih_l[k]: the learnable input-hidden weights of the k-th layer. If k = 0, the shape is [hidden_size, input_size]. Otherwise, the shape is [hidden_size, num_directions * hidden_size].
weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, with shape [hidden_size, hidden_size].
bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, with shape [hidden_size].
bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, with shape [hidden_size].

Examples

>>> import paddle

>>> rnn = paddle.nn.LSTM(16, 32, 2)

>>> x = paddle.randn((4, 23, 16))
>>> prev_h = paddle.randn((2, 4, 32))
>>> prev_c = paddle.randn((2, 4, 32))
>>> y, (h, c) = rnn(x, (prev_h, prev_c))

>>> print(y.shape)
[4, 23, 32]
>>> print(h.shape)
[2, 4, 32]
>>> print(c.shape)
[2, 4, 32]