LSTMCell

class paddle.fluid.dygraph.rnn.LSTMCell ( hidden_size, input_size, param_attr=None, bias_attr=None, gate_activation=None, activation=None, forget_bias=1.0, use_cudnn_impl=True, dtype='float64' )

LSTMCell implementation using basic operators. There are two LSTMCell versions; the default one is compatible with the CUDNN LSTM implementation. The algorithm can be described by the equations below.

\[ \begin{aligned}i_t &= sigmoid(W_{ix}x_{t} + W_{ih}h_{t-1} + bx_i + bh_i)\\f_t &= sigmoid(W_{fx}x_{t} + W_{fh}h_{t-1} + bx_f + bh_f)\\o_t &= sigmoid(W_{ox}x_{t} + W_{oh}h_{t-1} + bx_o + bh_o)\\\tilde{c_t} &= tanh(W_{cx}x_t + W_{ch}h_{t-1} + bx_c + bh_c)\\c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c_t}\\h_t &= o_t \odot tanh(c_t)\end{aligned} \]
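The equations above can be sketched in NumPy (a minimal illustration of the math, not Paddle's actual implementation; the weight layout, gate ordering, and placeholder parameters below are assumptions made for the sketch):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, W_x, W_h, bx, bh):
    """One step of the CUDNN-compatible LSTMCell equations.

    W_x: (4*hidden, input), W_h: (4*hidden, hidden),
    bx, bh: (4*hidden,). Assumed gate order: input, forget,
    output, candidate cell.
    """
    # All four gate pre-activations in one matrix product each.
    gates = W_x @ x + W_h @ h_prev + bx + bh
    i, f, o, c_tilde = np.split(gates, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c_tilde = np.tanh(c_tilde)
    c = f * c_prev + i * c_tilde      # c_t = f ⊙ c_{t-1} + i ⊙ c̃_t
    h = o * np.tanh(c)                # h_t = o ⊙ tanh(c_t)
    return h, c

hidden, inp = 3, 2
rng = np.random.default_rng(0)
x = rng.uniform(-0.1, 0.1, inp)
h0 = np.zeros(hidden)
c0 = np.zeros(hidden)
W_x = rng.uniform(-0.1, 0.1, (4 * hidden, inp))
W_h = rng.uniform(-0.1, 0.1, (4 * hidden, hidden))
bx = np.zeros(4 * hidden)
bh = np.zeros(4 * hidden)
h1, c1 = lstm_cell_step(x, h0, c0, W_x, W_h, bx, bh)
print(h1.shape, c1.shape)  # → (3,) (3,)
```

Note the two separate bias vectors `bx` and `bh`, which is what distinguishes this CUDNN-compatible form from the variant described next.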

The other LSTMCell version is compatible with the BasicLSTMUnit used in static graph mode. The algorithm can be described by the equations below.

\[ \begin{aligned}i_t &= sigmoid(W_{ix}x_{t} + W_{ih}h_{t-1} + b_i)\\f_t &= sigmoid(W_{fx}x_{t} + W_{fh}h_{t-1} + b_f + forget\_bias)\\o_t &= sigmoid(W_{ox}x_{t} + W_{oh}h_{t-1} + b_o)\\\tilde{c_t} &= tanh(W_{cx}x_t + W_{ch}h_{t-1} + b_c)\\c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c_t}\\h_t &= o_t \odot tanh(c_t)\end{aligned} \]
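This variant uses a single bias per gate and adds the constant forget_bias inside the forget-gate sigmoid. A NumPy sketch of just that gate (the parameter names here are illustrative placeholders, not Paddle internals):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

forget_bias = 1.0
rng = np.random.default_rng(1)
x = rng.uniform(-0.1, 0.1, 2)        # step input
h_prev = rng.uniform(-0.1, 0.1, 3)   # previous hidden state
W_fx = rng.uniform(-0.1, 0.1, (3, 2))
W_fh = rng.uniform(-0.1, 0.1, (3, 3))
b_f = np.zeros(3)

# f_t = sigmoid(W_fx x_t + W_fh h_{t-1} + b_f + forget_bias)
f_t = sigmoid(W_fx @ x + W_fh @ h_prev + b_f + forget_bias)
# With small pre-activations, f_t starts near sigmoid(1.0) ≈ 0.73,
# biasing the cell toward remembering its state early in training.
print(f_t)
```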
Parameters
  • hidden_size (integer) – The hidden size used in the Cell.

  • input_size (integer) – The input size used in the Cell.

  • param_attr (ParamAttr|None) – The parameter attribute for the learnable weight matrix. Note: If it is set to None or an attribute of ParamAttr, LSTMCell will create a ParamAttr as param_attr. If the Initializer of param_attr is not set, the parameter is initialized with Xavier. Default: None.

  • bias_attr (ParamAttr|None) – The parameter attribute for the bias of LSTMCell. If it is set to None or an attribute of ParamAttr, LSTMCell will create a ParamAttr as bias_attr. If the Initializer of bias_attr is not set, the bias is initialized to zero. Default: None.

  • gate_activation (function|None) – The activation function for gates (actGate). Default: ‘fluid.layers.sigmoid’

  • activation (function|None) – The activation function for cells (actNode). Default: ‘fluid.layers.tanh’

  • forget_bias (float|1.0) – The forget bias used when computing the forget gate. It is not used in the default LSTMCell implementation (CUDNN-compatible). Default: 1.0.

  • use_cudnn_impl (bool|True) – Whether to use the CUDNN-compatible LSTMCell implementation. Default: True.

  • dtype (string) – The data type used in this cell. Default: ‘float64’.

Returns

None

Examples

from paddle import fluid
import paddle.fluid.core as core
from paddle.fluid.dygraph import LSTMCell
import numpy as np

batch_size = 64
input_size = 128
hidden_size = 256

# Random step input and initial hidden/cell states.
step_input_np = np.random.uniform(-0.1, 0.1, (
    batch_size, input_size)).astype('float64')
pre_hidden_np = np.random.uniform(-0.1, 0.1, (
    batch_size, hidden_size)).astype('float64')
pre_cell_np = np.random.uniform(-0.1, 0.1, (
    batch_size, hidden_size)).astype('float64')

# Run on GPU if this build of Paddle has CUDA support.
if core.is_compiled_with_cuda():
    place = core.CUDAPlace(0)
else:
    place = core.CPUPlace()

with fluid.dygraph.guard(place):
    cudnn_lstm = LSTMCell(hidden_size, input_size)
    step_input_var = fluid.dygraph.to_variable(step_input_np)
    pre_hidden_var = fluid.dygraph.to_variable(pre_hidden_np)
    pre_cell_var = fluid.dygraph.to_variable(pre_cell_np)
    # One LSTM step: returns the new hidden and cell states.
    new_hidden, new_cell = cudnn_lstm(step_input_var, pre_hidden_var, pre_cell_var)
forward ( input, pre_hidden, pre_cell )

Performs one step of the LSTM computation.

Parameters
  • input (Variable) – The step input with shape [batch_size, input_size].

  • pre_hidden (Variable) – The previous hidden state with shape [batch_size, hidden_size].

  • pre_cell (Variable) – The previous cell state with shape [batch_size, hidden_size].

Returns

The tuple (new_hidden, new_cell): the updated hidden and cell states, each with shape [batch_size, hidden_size].