# lstm

api_attr: declarative programming (static graph)

paddle.fluid.layers.lstm(input, init_h, init_c, max_len, hidden_size, num_layers, dropout_prob=0.0, is_bidirec=False, is_test=False, name=None, default_initializer=None, seed=-1)
Note:

This OP only supports running on GPU devices.

This OP implements the LSTM operation described in Hochreiter, S., & Schmidhuber, J. (1997).

The implementation of this OP does not include diagonal/peephole connections; for those, refer to Gers, F. A., & Schmidhuber, J. (2000). If you need peephole connections, use dynamic_lstm.

This OP computes each timestep as follows:

$i_t = \sigma(W_{ix}x_{t} + W_{ih}h_{t-1} + b_{x_i} + b_{h_i})$
$f_t = \sigma(W_{fx}x_{t} + W_{fh}h_{t-1} + b_{x_f} + b_{h_f})$
$o_t = \sigma(W_{ox}x_{t} + W_{oh}h_{t-1} + b_{x_o} + b_{h_o})$
$\widetilde{c_t} = tanh(W_{cx}x_t + W_{ch}h_{t-1} + b_{x_c} + b_{h_c})$
$c_t = f_t \odot c_{t-1} + i_t \odot \widetilde{c_t}$
$h_t = o_t \odot tanh(c_t)$

The symbols in the formulas have the following meanings:

• $$x_{t}$$ represents the input at timestep $$t$$

• $$h_{t}$$ represents the hidden state at timestep $$t$$

• $$h_{t-1}, c_{t-1}$$ represent the hidden state and cell state at timestep $$t-1$$ , respectively

• $$\widetilde{c_t}$$ represents the candidate cell state

• $$i_t$$ , $$f_t$$ and $$o_t$$ represent the input gate, forget gate, and output gate, respectively

• $$W$$ represents weight (e.g., $$W_{ix}$$ is the weight of a linear transformation of input $$x_{t}$$ when calculating input gate $$i_t$$ )

• $$b$$ represents bias (e.g., $$b_{x_i}$$ and $$b_{h_i}$$ are the input-side and hidden-side biases of the input gate)

• $$\sigma$$ represents the nonlinear activation function for the gates, sigmoid by default

• $$\odot$$ represents the Hadamard product of a matrix, i.e. multiplying the elements of the same position for two matrices with the same dimension to get another matrix with the same dimension
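As an illustration of the formulas above, one timestep can be sketched in NumPy. This is a minimal reference sketch, not the implementation used by this OP: the function names `sigmoid` and `lstm_step`, the dict-per-gate parameter layout, and the row-vector convention (`x_t @ W` instead of $$W x_t$$) are all assumptions made for readability.

```python
import numpy as np

def sigmoid(z):
    """Logistic function used for the gates."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_x, W_h, b_x, b_h):
    """One LSTM timestep without peephole connections (hypothetical helper).

    x_t:            [batch, input_dim]
    h_prev, c_prev: [batch, hidden]
    W_x[g]:         [input_dim, hidden], W_h[g]: [hidden, hidden]
    b_x[g], b_h[g]: [hidden]
    Each dict is keyed by gate name: 'i', 'f', 'o', 'c'.
    """
    def pre(g):
        # Shared pre-activation: W_{gx} x_t + W_{gh} h_{t-1} + b_{x_g} + b_{h_g}
        return x_t @ W_x[g] + h_prev @ W_h[g] + b_x[g] + b_h[g]

    i_t = sigmoid(pre('i'))          # input gate
    f_t = sigmoid(pre('f'))          # forget gate
    o_t = sigmoid(pre('o'))          # output gate
    c_tilde = np.tanh(pre('c'))      # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde   # element-wise (Hadamard) products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Usage with random parameters (sizes chosen arbitrarily for the example).
rng = np.random.default_rng(0)
batch, input_dim, hidden = 2, 3, 4
W_x = {g: rng.standard_normal((input_dim, hidden)) for g in "ifoc"}
W_h = {g: rng.standard_normal((hidden, hidden)) for g in "ifoc"}
b_x = {g: np.zeros(hidden) for g in "ifoc"}
b_h = {g: np.zeros(hidden) for g in "ifoc"}
h_t, c_t = lstm_step(rng.standard_normal((batch, input_dim)),
                     np.zeros((batch, hidden)), np.zeros((batch, hidden)),
                     W_x, W_h, b_x, b_h)
print(h_t.shape, c_t.shape)  # (2, 4) (2, 4)
```

Keeping the two bias terms separate mirrors the formulas; numerically they could be folded into a single bias per gate.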

Parameters
• input (Variable) – LSTM input tensor, 3-D Tensor of shape $$[batch\_size, seq\_len, input\_dim]$$ . Data type is float32 or float64.

• init_h (Variable) – The initial hidden state of the LSTM, 3-D Tensor of shape $$[num\_layers, batch\_size, hidden\_size]$$ . If is_bidirec = True, shape should be $$[num\_layers*2, batch\_size, hidden\_size]$$ . Data type is float32 or float64.

• init_c (Variable) – The initial cell state of the LSTM, 3-D Tensor of shape $$[num\_layers, batch\_size, hidden\_size]$$ . If is_bidirec = True, shape should be $$[num\_layers*2, batch\_size, hidden\_size]$$ . Data type is float32 or float64.

• max_len (int) – Maximum sequence length of the LSTM. The first dimension of the input tensor cannot be greater than max_len.

• hidden_size (int) – Hidden size of the LSTM.

• num_layers (int) – Total number of layers in the LSTM.

• dropout_prob (float, optional) – Dropout probability. Dropout is applied only between RNN layers, not between time steps, and it is not applied to the output of the last RNN layer. Default: 0.0.

• is_bidirec (bool, optional) – Whether the LSTM is bidirectional. Default: False.

• is_test (bool, optional) – Whether the layer runs in the test phase. Default: False.

• name (str, optional) – A name for this layer. If set to None, the layer will be named automatically. Default: None.

• default_initializer (Initializer, optional) – The initializer used to initialize the weights. If set to None, the default initializer will be used. Default: None.

• seed (int, optional) – Seed for dropout in the LSTM. If it is -1, dropout will use a random seed. Default: -1.

Returns

Three tensors: rnn_out, last_h, and last_c.

• rnn_out is the hidden-state output of the LSTM. Its shape is $$[seq\_len, batch\_size, hidden\_size]$$ ; if is_bidirec is set to True, the shape is $$[seq\_len, batch\_size, hidden\_size*2]$$ .

• last_h is the hidden state at the last step of the LSTM. Its shape is $$[num\_layers, batch\_size, hidden\_size]$$ ; if is_bidirec is set to True, the shape is $$[num\_layers*2, batch\_size, hidden\_size]$$ .

• last_c is the cell state at the last step of the LSTM. Its shape is $$[num\_layers, batch\_size, hidden\_size]$$ ; if is_bidirec is set to True, the shape is $$[num\_layers*2, batch\_size, hidden\_size]$$ .

Return type

tuple ( Variable , Variable , Variable )
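The shape rules stated above for the states and the output can be captured in two small helpers. These are hypothetical convenience functions written for illustration only; they are not part of paddle.fluid and simply restate the documented shapes, which can be handy when allocating init_h and init_c.

```python
def lstm_state_shape(num_layers, batch_size, hidden_size, is_bidirec=False):
    """Shape documented for init_h, init_c, last_h and last_c (hypothetical helper)."""
    directions = 2 if is_bidirec else 1
    return [num_layers * directions, batch_size, hidden_size]

def lstm_output_shape(seq_len, batch_size, hidden_size, is_bidirec=False):
    """Shape documented for rnn_out (hypothetical helper)."""
    directions = 2 if is_bidirec else 1
    return [seq_len, batch_size, hidden_size * directions]

print(lstm_state_shape(1, 20, 150))                     # [1, 20, 150]
print(lstm_state_shape(2, 20, 150, is_bidirec=True))    # [4, 20, 150]
print(lstm_output_shape(100, 20, 150, is_bidirec=True)) # [100, 20, 300]
```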

Examples

import paddle.fluid as fluid
import paddle.fluid.layers as layers

# Embed integer token ids of shape [batch, 100] into vectors of size emb_dim.
emb_dim = 256
vocab_size = 10000
data = fluid.data(name='x', shape=[None, 100], dtype='int64')
emb = fluid.embedding(input=data, size=[vocab_size, emb_dim], is_sparse=True)

batch_size = 20
max_len = 100
dropout_prob = 0.2
input_size = 100  # not used below
hidden_size = 150
num_layers = 1

# Zero-initialized hidden and cell states of shape [num_layers, batch_size, hidden_size].
init_h = layers.fill_constant([num_layers, batch_size, hidden_size], 'float32', 0.0)
init_c = layers.fill_constant([num_layers, batch_size, hidden_size], 'float32', 0.0)

# This OP only runs on GPU devices.
rnn_out, last_h, last_c = layers.lstm(emb, init_h, init_c,
                                      max_len, hidden_size, num_layers,
                                      dropout_prob=dropout_prob)
rnn_out.shape  # (-1, 100, 150)
last_h.shape  # (1, 20, 150)
last_c.shape  # (1, 20, 150)