# gru_unit

api_attr: declarative programming (static graph)

paddle.fluid.layers.gru_unit(input, hidden, size, param_attr=None, bias_attr=None, activation='tanh', gate_activation='sigmoid', origin_mode=False)

Gated Recurrent Unit (GRU) RNN cell. This operator performs GRU calculations for one time step and it supports these two modes:

If origin_mode is True, then the formula used is from the paper Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation.

$$
\begin{aligned}
u_t &= act_g(W_{ux}x_{t} + W_{uh}h_{t-1} + b_u)\\
r_t &= act_g(W_{rx}x_{t} + W_{rh}h_{t-1} + b_r)\\
\tilde{h_t} &= act_c(W_{cx}x_{t} + W_{ch}(r_t \odot h_{t-1}) + b_c)\\
h_t &= u_t \odot h_{t-1} + (1-u_t) \odot \tilde{h_t}
\end{aligned}
$$

If origin_mode is False, then the formula used is from the paper Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling.

$$
\begin{aligned}
u_t &= act_g(W_{ux}x_{t} + W_{uh}h_{t-1} + b_u)\\
r_t &= act_g(W_{rx}x_{t} + W_{rh}h_{t-1} + b_r)\\
\tilde{h_t} &= act_c(W_{cx}x_{t} + W_{ch}(r_t \odot h_{t-1}) + b_c)\\
h_t &= (1-u_t) \odot h_{t-1} + u_t \odot \tilde{h_t}
\end{aligned}
$$

$$x_t$$ is the input of the current time step, but note that it is not the input argument of this operator. This operator does not compute the projections $$W_{ux}x_{t}, W_{rx}x_{t}, W_{cx}x_{t}$$ ; a fully-connected layer whose size is 3 times the GRU hidden size should therefore be applied before this operator, and its output should be passed as input here. $$h_{t-1}$$ is the hidden state from the previous time step. $$u_t$$ , $$r_t$$ , $$\tilde{h_t}$$ and $$h_t$$ stand for the update gate, reset gate, candidate hidden state and hidden output respectively. $$W_{uh}, b_u$$ , $$W_{rh}, b_r$$ and $$W_{ch}, b_c$$ stand for the weight matrices and biases used in the update gate, reset gate and candidate hidden calculations.

In the implementation, the three weight matrices are merged into a single tensor of shape $$[D, D \times 3]$$ and the three biases are concatenated into a tensor of shape $$[1, D \times 3]$$ , where $$D$$ stands for the hidden size. The data layout of the weight tensor is: $$W_{uh}$$ and $$W_{rh}$$ are concatenated with shape $$[D, D \times 2]$$ in the first part, and $$W_{ch}$$ occupies the latter part with shape $$[D, D]$$ .
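To make the merged layout and the two update rules concrete, here is a minimal NumPy sketch of a single GRU step, not the operator's actual kernel. The weight layout follows the description above; the per-gate column order assumed for x_proj and bias, $$[u, r, c]$$ , is an illustrative assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_unit_step(x_proj, h_prev, weight, bias, origin_mode=False):
    """One GRU step over the merged parameter layout described above.

    x_proj : [N, 3D]  input after the external linear projection
    h_prev : [N, D]   hidden state from the previous time step
    weight : [D, 3D]  [W_uh | W_rh] in the first 2D columns, W_ch in the last D
    bias   : [1, 3D]  assumed order [b_u | b_r | b_c]
    """
    D = h_prev.shape[1]
    # Split the merged parameters back into their per-gate pieces.
    W_uh, W_rh, W_ch = weight[:, :D], weight[:, D:2 * D], weight[:, 2 * D:]
    b_u, b_r, b_c = bias[:, :D], bias[:, D:2 * D], bias[:, 2 * D:]
    x_u, x_r, x_c = x_proj[:, :D], x_proj[:, D:2 * D], x_proj[:, 2 * D:]

    u = sigmoid(x_u + h_prev @ W_uh + b_u)        # update gate
    r = sigmoid(x_r + h_prev @ W_rh + b_r)        # reset gate
    c = np.tanh(x_c + (r * h_prev) @ W_ch + b_c)  # candidate hidden
    if origin_mode:
        return u * h_prev + (1.0 - u) * c
    return (1.0 - u) * h_prev + u * c
```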

Parameters
• input (Variable) – A 2D Tensor representing the input after linear projection. Its shape should be $$[N, D \times 3]$$ , where $$N$$ stands for batch size, $$D$$ for the hidden size. The data type should be float32 or float64.

• hidden (Variable) – A 2D Tensor representing the hidden state from the previous step. Its shape should be $$[N, D]$$ , where $$N$$ stands for batch size, $$D$$ for the hidden size. The data type should be the same as that of input .

• size (int) – The hidden size multiplied by 3, i.e. $$D \times 3$$ , matching the last dimension of input (the example below passes hidden_dim * 3).

• param_attr (ParamAttr, optional) – To specify the weight parameter property. Default: None, which means the default weight parameter property is used. See usage for details in ParamAttr .

• bias_attr (ParamAttr, optional) – To specify the bias parameter property. Default: None, which means the default bias parameter property is used. See usage for details in ParamAttr .

• activation (str, optional) – The activation function corresponding to $$act_c$$ in the formula. “sigmoid”, “tanh”, “relu” and “identity” are supported. Default: “tanh”.

• gate_activation (str, optional) – The activation function corresponding to $$act_g$$ in the formula. “sigmoid”, “tanh”, “relu” and “identity” are supported. Default: “sigmoid”.
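As a usage sketch for param_attr and bias_attr, both can be built with ParamAttr; the parameter names and initializers below are illustrative choices, not values the operator requires:

```python
import paddle.fluid as fluid

# Hypothetical parameter names; any unique names (or None) work here.
w_attr = fluid.ParamAttr(
    name='gru_unit_w',
    initializer=fluid.initializer.Xavier())
b_attr = fluid.ParamAttr(
    name='gru_unit_b',
    initializer=fluid.initializer.Constant(value=0.0))
# Pass as: fluid.layers.gru_unit(..., param_attr=w_attr, bias_attr=b_attr)
```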

Returns

The tuple contains three Tensor variables with the same data type as input . They represent the hidden state for the next time step ( $$h_t$$ ), the reset previous hidden state ( $$r_t \odot h_{t-1}$$ ), and the concatenation of $$h_t, r_t, \tilde{h_t}$$ , with shapes $$[N, D]$$ , $$[N, D]$$ and $$[N, D \times 3]$$ respectively. Usually only the hidden state for the next time step ( $$h_t$$ ) is used as the output and state; the other two are intermediate results, as the unpacked example below shows.

Return type

tuple

Examples

```python
import paddle.fluid as fluid

dict_dim, emb_dim = 128, 64
hidden_dim = 512

# Token ids for one time step and their embeddings.
data = fluid.data(name='step_data', shape=[None], dtype='int64')
emb = fluid.embedding(input=data, size=[dict_dim, emb_dim])

# Project the input to 3 * hidden_dim before the GRU unit.
x = fluid.layers.fc(input=emb, size=hidden_dim * 3)

pre_hidden = fluid.data(
    name='pre_hidden', shape=[None, hidden_dim], dtype='float32')

# gru_unit returns a tuple; usually only the first element is used.
hidden, reset_hidden_pre, gate = fluid.layers.gru_unit(
    input=x, hidden=pre_hidden, size=hidden_dim * 3)
```
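A short follow-up sketch, assuming CPU execution and a batch size of 4, that runs the program above with random step data and a zero-initialized previous hidden state:

```python
import numpy as np

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())

# Random token ids and a zero previous hidden state for a batch of 4.
step = np.random.randint(0, dict_dim, size=[4]).astype('int64')
prev = np.zeros([4, hidden_dim], dtype='float32')

out, = exe.run(feed={'step_data': step, 'pre_hidden': prev},
               fetch_list=[hidden])
print(out.shape)  # (4, 512)
```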