gru_unit
- paddle.fluid.layers.gru_unit(input, hidden, size, param_attr=None, bias_attr=None, activation='tanh', gate_activation='sigmoid', origin_mode=False) [source]
- 
          - api_attr: Static Graph
 Gated Recurrent Unit (GRU) RNN cell. This operator performs the GRU calculations for one time step and supports two modes.

If origin_mode is True, the formula used is from the paper Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation:

\[ \begin{aligned} u_t & = act_g(W_{ux}x_{t} + W_{uh}h_{t-1} + b_u) \\ r_t & = act_g(W_{rx}x_{t} + W_{rh}h_{t-1} + b_r) \\ \tilde{h_t} & = act_c(W_{cx}x_{t} + W_{ch}(r_t \odot h_{t-1}) + b_c) \\ h_t & = u_t \odot h_{t-1} + (1-u_t) \odot \tilde{h_t} \end{aligned} \]

If origin_mode is False, the formula used is from the paper Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling:

\[ \begin{aligned} u_t & = act_g(W_{ux}x_{t} + W_{uh}h_{t-1} + b_u) \\ r_t & = act_g(W_{rx}x_{t} + W_{rh}h_{t-1} + b_r) \\ \tilde{h_t} & = act_c(W_{cx}x_{t} + W_{ch}(r_t \odot h_{t-1}) + b_c) \\ h_t & = (1-u_t) \odot h_{t-1} + u_t \odot \tilde{h_t} \end{aligned} \]

\(x_t\) is the input of the current time step, but it is not the input of this operator: the operator does not include the calculations \(W_{ux}x_{t}, W_{rx}x_{t}, W_{cx}x_{t}\). Note that a fully-connected layer whose size is 3 times the GRU hidden size should therefore be used before this operator, and its output used as input here. \(h_{t-1}\) is the hidden state from the previous time step. \(u_t\), \(r_t\), \(\tilde{h_t}\) and \(h_t\) stand for the update gate, reset gate, candidate hidden state and hidden output, respectively.

\(W_{uh}, b_u\), \(W_{rh}, b_r\) and \(W_{ch}, b_c\) stand for the weight matrices and biases used in the update gate, reset gate and candidate hidden calculations. In the implementation, the three weight matrices are merged into one tensor shaped \([D, D \times 3]\), and the three biases are concatenated into one tensor shaped \([1, D \times 3]\), where \(D\) stands for the hidden size. The data layout of the weight tensor is: \(W_{uh}\) and \(W_{rh}\) are concatenated with shape \([D, D \times 2]\) on the first part, and \(W_{ch}\) lies on the latter part with shape \([D, D]\).
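To make the formulas and the merged weight layout concrete, here is a minimal NumPy sketch of one GRU step. It is a reference illustration rather than the operator's actual kernel; the helper names and the ordering of \(W_{uh}\) before \(W_{rh}\) inside the first \([D, D \times 2]\) part are assumptions.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gru_unit_ref(x_proj, h_prev, w, b, origin_mode=False):
        # x_proj: [N, D*3], input after the external FC projection
        #         (carries W_ux x_t, W_rx x_t, W_cx x_t; ordering assumed)
        # h_prev: [N, D], hidden state from the previous time step
        # w:      [D, D*3], merged weights: [W_uh | W_rh] first, W_ch last
        # b:      [1, D*3], concatenated biases b_u, b_r, b_c (ordering assumed)
        D = h_prev.shape[1]
        xu, xr, xc = x_proj[:, :D], x_proj[:, D:2 * D], x_proj[:, 2 * D:]
        w_uh, w_rh, w_ch = w[:, :D], w[:, D:2 * D], w[:, 2 * D:]
        b_u, b_r, b_c = b[:, :D], b[:, D:2 * D], b[:, 2 * D:]
        u = sigmoid(xu + h_prev @ w_uh + b_u)              # update gate u_t
        r = sigmoid(xr + h_prev @ w_rh + b_r)              # reset gate r_t
        h_tilde = np.tanh(xc + (r * h_prev) @ w_ch + b_c)  # candidate hidden
        if origin_mode:
            h = u * h_prev + (1 - u) * h_tilde
        else:
            h = (1 - u) * h_prev + u * h_tilde
        # The documented outputs: h_t, r_t (.) h_{t-1}, and concat(h_t, r_t, h_tilde)
        return h, r * h_prev, np.concatenate([h, r, h_tilde], axis=1)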
- Parameters
- 
            - input (Variable) – A 2D Tensor representing the input after linear projection. Its shape should be \([N, D \times 3]\), where \(N\) stands for the batch size and \(D\) for the hidden size. The data type should be float32 or float64. 
- hidden (Variable) – A 2D Tensor representing the hidden state from the previous step. Its shape should be \([N, D]\), where \(N\) stands for the batch size and \(D\) for the hidden size. The data type should be the same as that of input.
- size (int) – The hidden size multiplied by 3; it should match the last dimension of input, \(D \times 3\) (as in the example below, size=hidden_dim * 3). 
- param_attr (ParamAttr, optional) – To specify the weight parameter property. Default: None, which means the default weight parameter property is used. See ParamAttr for usage details. 
- bias_attr (ParamAttr, optional) – To specify the bias parameter property. Default: None, which means the default bias parameter property is used. See ParamAttr for usage details. 
- activation (str, optional) – The activation function corresponding to \(act_c\) in the formula. “sigmoid”, “tanh”, “relu” and “identity” are supported. Default “tanh”. 
- gate_activation (str, optional) – The activation function corresponding to \(act_g\) in the formula. “sigmoid”, “tanh”, “relu” and “identity” are supported. Default “sigmoid”. 
 
- Returns
- 
           
            - The tuple contains three Tensor variables with the same data type as input. They represent the hidden state for the next time step ( \(h_t\) ), the reset previous hidden state ( \(r_t \odot h_{t-1}\) ), and the concatenation of \(h_t, r_t, \tilde{h_t}\). Their shapes are \([N, D]\), \([N, D]\) and \([N, D \times 3]\), respectively. Usually only the hidden state for the next time step ( \(h_t\) ) is used as the output and state; the other two are intermediate results of the calculations (the NumPy sketch above mirrors all three).
 
- Return type
- 
           tuple 
 Examples

    import paddle.fluid as fluid

    dict_dim, emb_dim = 128, 64
    data = fluid.data(name='step_data', shape=[None], dtype='int64')
    emb = fluid.embedding(input=data, size=[dict_dim, emb_dim])
    hidden_dim = 512
    x = fluid.layers.fc(input=emb, size=hidden_dim * 3)
    pre_hidden = fluid.data(
        name='pre_hidden', shape=[None, hidden_dim], dtype='float32')
    # gru_unit returns a tuple; usually only the first element (h_t) is needed.
    hidden, reset_hidden_prev, gate = fluid.layers.gru_unit(
        input=x, hidden=pre_hidden, size=hidden_dim * 3)
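Since the operator computes only a single step, unrolling over time means feeding each step's \(h_t\) back in as hidden and sharing parameters across steps through named ParamAttr objects. Below is a minimal static-graph sketch under those assumptions; the parameter names and the two-step unroll are illustrative only.

    import paddle.fluid as fluid

    dict_dim, emb_dim, hidden_dim = 128, 64, 512
    steps = [fluid.data(name='step_%d' % i, shape=[None], dtype='int64')
             for i in range(2)]
    hidden = fluid.data(name='pre_hidden', shape=[None, hidden_dim], dtype='float32')

    for step in steps:
        # Named ParamAttr reuses the same embedding/projection/GRU weights each step.
        emb = fluid.embedding(input=step, size=[dict_dim, emb_dim],
                              param_attr=fluid.ParamAttr(name='emb_w'))
        x = fluid.layers.fc(input=emb, size=hidden_dim * 3,
                            param_attr=fluid.ParamAttr(name='proj_w'),
                            bias_attr=fluid.ParamAttr(name='proj_b'))
        hidden, _, _ = fluid.layers.gru_unit(
            input=x, hidden=hidden, size=hidden_dim * 3,
            param_attr=fluid.ParamAttr(name='gru_w'),
            bias_attr=fluid.ParamAttr(name='gru_b'))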
