dynamic_lstmp
- paddle.fluid.layers.dynamic_lstmp(input, size, proj_size, param_attr=None, bias_attr=None, use_peepholes=True, is_reverse=False, gate_activation='sigmoid', cell_activation='tanh', candidate_activation='tanh', proj_activation='tanh', dtype='float32', name=None, h_0=None, c_0=None, cell_clip=None, proj_clip=None)
- api_attr: Static Graph
- Note: To improve efficiency, users must first map the input of dimension [T, hidden_size] to an input of dimension [T, 4 * hidden_size], and then pass it to this OP.
 
This OP implements the LSTMP (LSTM Projected) layer, which places a separate linear mapping layer after the LSTM layer (Sak, H., Senior, A., & Beaufays, F., 2014). Compared with the standard LSTM layer, LSTMP adds a linear mapping from the original hidden state \(h_t\) to the lower-dimensional state \(r_t\). This reduces the total number of parameters and the computational complexity, especially when the number of output units is relatively large.

By default, the OP includes diagonal/peephole connections; see Gers, F. A., & Schmidhuber, J. (2000). To disable the peephole connections, set use_peepholes to False.

This OP computes each timestep as follows:

\[i_t = \sigma(W_{ix}x_{t} + W_{ir}r_{t-1} + W_{ic}c_{t-1} + b_i)\]
\[f_t = \sigma(W_{fx}x_{t} + W_{fr}r_{t-1} + W_{fc}c_{t-1} + b_f)\]
\[o_t = \sigma(W_{ox}x_{t} + W_{or}r_{t-1} + W_{oc}c_{t-1} + b_o)\]
\[\widetilde{c_t} = act_g(W_{cx}x_t + W_{cr}r_{t-1} + b_c)\]
\[c_t = f_t \odot c_{t-1} + i_t \odot \widetilde{c_t}\]
\[h_t = o_t \odot act_h(c_t)\]
\[r_t = \overline{act_h}(W_{rh}h_t)\]

The symbols in the formulas have the following meanings:

- \(x_{t}\) represents the input at timestep \(t\)
- \(h_{t}\) represents the hidden state at timestep \(t\) 
- \(r_{t}\) represents the projected output of the hidden state \(h_{t}\) 
- \(h_{t-1}, c_{t-1}, r_{t-1}\) represent the hidden state, cell state and projected output at timestep \(t-1\) , respectively 
- \(\widetilde{c_t}\) represents the candidate cell state 
- \(i_t\) , \(f_t\) and \(o_t\) represent input gate, forget gate, output gate, respectively 
- \(W\) represents weight (e.g., \(W_{ix}\) is the weight of a linear transformation of input \(x_{t}\) when calculating input gate \(i_t\) ) 
- \(b\) represents bias (e.g., \(b_{i}\) is the bias of input gate) 
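- \(\sigma\) represents the nonlinear activation function for the gates (gate_activation), sigmoid by default 
- \(act_g\), \(act_h\) and \(\overline{act_h}\) represent the activation functions for the candidate cell state (candidate_activation), the cell output (cell_activation) and the projection output (proj_activation), respectively, all tanh by default 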
- \(\odot\) represents the Hadamard product of a matrix, i.e. multiplying the elements of the same position for two matrices with the same dimension to get another matrix with the same dimension 
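The timestep equations above can be traced with a small dependency-free sketch. It is an illustration only, not Paddle's implementation: the helper names (`lstmp_step`, `matvec`, and so on) are hypothetical, the input products \(W_{gx}x_t\) are assumed to be precomputed (as the note on the [T, 4 * hidden_size] input requires), vectors are treated as columns so each recurrent matrix is stored as D rows of length P, the peephole weights are assumed diagonal and applied elementwise, and the default activations (sigmoid, tanh) are hard-coded.

```python
import math

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def tanh(v):
    return [math.tanh(x) for x in v]

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def add(*vectors):
    """Elementwise sum of any number of equal-length vectors."""
    return [sum(terms) for terms in zip(*vectors)]

def mul(a, b):
    """Hadamard (elementwise) product of two vectors."""
    return [x * y for x, y in zip(a, b)]

def lstmp_step(xw, r_prev, c_prev, W_r, W_c, b, W_rh):
    """One LSTMP timestep (hypothetical helper, not a Paddle API).

    xw:   gate name -> precomputed W_{gx} x_t, length D
    W_r:  gate name -> D x P recurrent matrix applied to r_{t-1}
    W_c:  gate name ('i', 'f', 'o') -> length-D diagonal peephole weights
    b:    gate name -> length-D bias
    W_rh: P x D projection matrix
    """
    gates = {}
    for g in ("i", "f", "o"):
        # Gates see the input, the previous projection and c_{t-1}.
        gates[g] = sigmoid(add(xw[g], matvec(W_r[g], r_prev),
                               mul(W_c[g], c_prev), b[g]))
    c_tilde = tanh(add(xw["c"], matvec(W_r["c"], r_prev), b["c"]))
    c = add(mul(gates["f"], c_prev), mul(gates["i"], c_tilde))
    h = mul(gates["o"], tanh(c))
    r = tanh(matvec(W_rh, h))  # project h_t (size D) down to r_t (size P)
    return r, c

# Toy setup: hidden size D = 2, projection size P = 1, all weights zero
# except the projection, so every gate evaluates to sigmoid(0) = 0.5.
D, P = 2, 1
xw = {g: [0.0] * D for g in "ifoc"}
W_r = {g: [[0.0] * P for _ in range(D)] for g in "ifoc"}
W_c = {g: [0.0] * D for g in "ifo"}
b = {g: [0.0] * D for g in "ifoc"}
W_rh = [[1.0, 0.0]]
r, c = lstmp_step(xw, [0.0], [2.0, -2.0], W_r, W_c, b, W_rh)
print(len(r), len(c))
```

The returned pair mirrors the two recurrent quantities in the formulas: the projected output \(r_t\) of size P, which feeds the next timestep in place of \(h_t\), and the cell state \(c_t\) of size hidden_size.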
 - Parameters
- input (Variable) – The input of the dynamic_lstmp layer, which supports variable-length input sequences. It is a multi-dimensional LoDTensor of shape \([T, 4*hidden\_size]\). Data type is float32 or float64. 
- size (int) – must be 4 * hidden_size. 
- proj_size (int) – The size of projection output. 
- param_attr (ParamAttr, optional) – Parameter attribute of the weights. If it is None, the default weight parameter attribute is used. Please refer to ParamAttr. If the user sets this parameter, the weights must have the shapes listed below. Default: None. 
    - Weights = \(\{ W_{cr},W_{ir},W_{fr},W_{or} \}\), with shape [P, 4*hidden_size], where P is the projection size. 
    - Projection weight = \(\{ W_{rh} \}\), with shape [hidden_size, P]. 
 
- bias_attr (ParamAttr, optional) – The attribute for the learnable bias. The bias holds the input-hidden bias weights and, if use_peepholes is True, the peephole connection weights as well. Please refer to ParamAttr. Default: None. 
    - use_peepholes = False: Biases = \(\{ b_c, b_i, b_f, b_o \}\), with shape [1, 4*hidden_size]. 
    - use_peepholes = True: Biases = \(\{ b_c, b_i, b_f, b_o, W_{ic}, W_{fc}, W_{oc} \}\), with shape [1, 7*hidden_size]. 
 
 
- use_peepholes (bool, optional) – Whether to use peephole connection or not. Default True. 
- is_reverse (bool, optional) – Whether to calculate reverse LSTM. Default False. 
- gate_activation (str, optional) – The activation for input gate, forget gate and output gate. Default “sigmoid”. 
- cell_activation (str, optional) – The activation for cell output. Default “tanh”. 
- candidate_activation (str, optional) – The activation for candidate hidden state. Default “tanh”. 
- proj_activation (str, optional) – The activation for projection output. Default “tanh”. 
- dtype (str, optional) – Data type, can be “float32” or “float64”. Default “float32”. 
- name (str, optional) – A name for this layer. Please refer to Name . Default: None. 
- h_0 (Variable , optional) – The initial hidden state is an optional input, default is zero. This is a tensor with shape \([batch\_size, P]\) , where P is the projection size. Default: None. 
- c_0 (Variable, optional) – The initial cell state is an optional input, default is zero. This is a tensor with shape \([batch\_size, hidden\_size]\). h_0 and c_0 can be None, but only at the same time. Default: None. 
- cell_clip (float, optional) – If not None, the cell state is clipped by this value prior to the cell output activation. Default: None. 
- proj_clip (float, optional) – If proj_size > 0 and proj_clip is provided, the projected values are clipped elementwise to within [-proj_clip, proj_clip]. Default: None. 
 
- Returns
The hidden state and cell state of LSTMP:
- hidden: LoDTensor with shape \([T, P]\); its LoD and dtype are the same as those of the input. 
- cell: LoDTensor with shape \([T, hidden\_size]\); its LoD and dtype are the same as those of the input. 
 
- Return type: tuple (Variable, Variable)
Examples

```python
import paddle.fluid as fluid

dict_dim, emb_dim = 128, 64
# Variable-length sequence of word ids (lod_level=1).
data = fluid.data(name='sequence', shape=[None], dtype='int64', lod_level=1)
emb = fluid.embedding(input=data, size=[dict_dim, emb_dim])

hidden_dim, proj_dim = 512, 256
# Map the input to 4 * hidden_dim before passing it to dynamic_lstmp.
fc_out = fluid.layers.fc(input=emb, size=hidden_dim * 4,
                         act=None, bias_attr=None)
proj_out, last_c = fluid.layers.dynamic_lstmp(input=fc_out,
                                              size=hidden_dim * 4,
                                              proj_size=proj_dim,
                                              use_peepholes=False,
                                              is_reverse=True,
                                              cell_activation="tanh",
                                              proj_activation="tanh")
proj_out.shape  # (-1, 256)
last_c.shape    # (-1, 512)
```
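The shapes stated in the Parameters and Returns sections all follow from size, proj_size and use_peepholes, so a quick bookkeeping check can help when setting param_attr or bias_attr by hand. The sketch below only restates those documented shapes; `lstmp_shapes` is a hypothetical helper, not part of Paddle, and `seq_len` stands in for the total sequence length T.

```python
def lstmp_shapes(seq_len, size, proj_size, use_peepholes=True):
    """Return the shapes implied by the documented dynamic_lstmp parameters.

    Hypothetical helper for illustration; it does not call Paddle.
    """
    if size % 4 != 0:
        raise ValueError("size must be 4 * hidden_size")
    hidden_size = size // 4
    # The bias carries 3 extra peephole blocks when use_peepholes is True.
    bias_width = (7 if use_peepholes else 4) * hidden_size
    return {
        "input": (seq_len, size),
        "weight": (proj_size, size),              # [P, 4*hidden_size]
        "proj_weight": (hidden_size, proj_size),  # [hidden_size, P]
        "bias": (1, bias_width),
        "hidden_out": (seq_len, proj_size),       # [T, P]
        "cell_out": (seq_len, hidden_size),       # [T, hidden_size]
    }

# Same sizes as the example above: hidden_dim = 512, proj_dim = 256.
shapes = lstmp_shapes(seq_len=10, size=2048, proj_size=256,
                      use_peepholes=False)
for name, shape in shapes.items():
    print(name, shape)
```

With these sizes the projected output has 256 columns and the cell state 512, matching the `proj_out.shape` and `last_c.shape` comments in the example.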
