dynamic_gru

paddle.fluid.layers.dynamic_gru(input, size, param_attr=None, bias_attr=None, is_reverse=False, gate_activation='sigmoid', candidate_activation='tanh', h_0=None, origin_mode=False) [source]
api_attr: Static Graph

Note: The input of this operator must be a LoDTensor. If the input to be processed is a Tensor, use api_fluid_layers_StaticRNN instead.

This operator is used to perform the calculations for a single layer of Gated Recurrent Unit (GRU) on full sequences step by step. The calculations in one time step support these two modes:

If origin_mode is True, the formula used is from the paper Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation:

u_t = act_g(W_{ux} x_t + W_{uh} h_{t-1} + b_u)
r_t = act_g(W_{rx} x_t + W_{rh} h_{t-1} + b_r)
\tilde{h}_t = act_c(W_{cx} x_t + W_{ch} (r_t \odot h_{t-1}) + b_c)
h_t = u_t \odot h_{t-1} + (1 - u_t) \odot \tilde{h}_t

If origin_mode is False, the formula used is from the paper Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling:

u_t = act_g(W_{ux} x_t + W_{uh} h_{t-1} + b_u)
r_t = act_g(W_{rx} x_t + W_{rh} h_{t-1} + b_r)
\tilde{h}_t = act_c(W_{cx} x_t + W_{ch} (r_t \odot h_{t-1}) + b_c)
h_t = (1 - u_t) \odot h_{t-1} + u_t \odot \tilde{h}_t

x_t is the input of the current time step, but it is not taken from input. This operator does not perform the calculations W_{ux} x_t, W_{rx} x_t and W_{cx} x_t; thus a fully-connected layer whose output size is 3 times size should be placed before this operator, and its output used as input here. h_{t-1} is the hidden state from the previous time step. u_t, r_t, \tilde{h}_t and h_t stand for the update gate, reset gate, candidate hidden state and hidden output respectively. W_{uh}, b_u, W_{rh}, b_r and W_{ch}, b_c stand for the weight matrices and biases used in the update gate, reset gate and candidate hidden state calculations. In the implementation, the three weight matrices are merged into one tensor of shape [D, D \times 3], and the three biases are concatenated into one tensor of shape [1, D \times 3], where D stands for the hidden size. The data layout of the weight tensor is: W_{uh} and W_{rh} are concatenated with shape [D, D \times 2] in the first part, and W_{ch} lies in the latter part with shape [D, D].
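To make the step computation concrete, here is a minimal NumPy sketch of a single time step under the merged weight layout described above, assuming the default sigmoid/tanh activations. It is an illustration, not the operator's actual kernel; the names x_proj, h_prev, w, b and gru_step are hypothetical:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_proj, h_prev, w, b, origin_mode=False):
    """One GRU time step.

    x_proj: [N, 3D], x_t already projected by the preceding FC layer.
    h_prev: [N, D], hidden state h_{t-1}.
    w:      [D, 3D], merged weight [W_uh | W_rh | W_ch].
    b:      [1, 3D], concatenated bias [b_u | b_r | b_c].
    """
    D = h_prev.shape[-1]
    # The first 2D columns hold the update/reset gate weights [W_uh, W_rh].
    gates = x_proj[:, :2 * D] + h_prev @ w[:, :2 * D] + b[:, :2 * D]
    u = sigmoid(gates[:, :D])   # update gate u_t
    r = sigmoid(gates[:, D:])   # reset gate r_t
    # The last D columns hold W_ch; the candidate uses r_t (.) h_{t-1}.
    c = np.tanh(x_proj[:, 2 * D:] + (r * h_prev) @ w[:, 2 * D:] + b[:, 2 * D:])
    if origin_mode:
        return u * h_prev + (1.0 - u) * c   # origin_mode=True formula
    return (1.0 - u) * h_prev + u * c       # origin_mode=False formula

Iterating gru_step over the time steps of each sequence (in reverse order when is_reverse=True) reproduces what this operator does across a whole mini-batch.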

Parameters
  • input (Variable) – A LoDTensor whose lod level is 1, representing the input after linear projection. Its shape should be [T, D \times 3], where T stands for the total sequence length in this mini-batch and D for the hidden size. The data type should be float32 or float64.

  • size (int) – Indicates the hidden size D.

  • param_attr (ParamAttr, optional) – To specify the weight parameter property. Default: None, which means the default weight parameter property is used. See usage details in api_fluid_ParamAttr, and the sketch after this parameter list.

  • bias_attr (ParamAttr, optional) – To specify the bias parameter property. Default: None, which means the default bias parameter property is used. See usage details in api_fluid_ParamAttr.

  • is_reverse (bool, optional) – Whether to compute in the reversed order of input sequences. Default False.

  • gate_activation (str, optional) – The activation function corresponding to actg in the formula. “sigmoid”, “tanh”, “relu” and “identity” are supported. Default “sigmoid”.

  • candidate_activation (str, optional) – The activation function corresponding to actc in the formula. “sigmoid”, “tanh”, “relu” and “identity” are supported. Default “tanh”.

  • h_0 (Variable, optional) – A Tensor representing the initial hidden state. If not provided, the default initial hidden state is 0. Its shape is [N, D], where N is the number of sequences in the mini-batch and D is the hidden size. The data type should be the same as that of input. Default None.

  • origin_mode (bool, optional) – Whether to use the formula from the first paper cited above. Default False.
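As an illustration of param_attr and bias_attr, here is a minimal sketch giving the merged weight and bias explicit names and a custom initializer; the parameter names gru_w and gru_b are arbitrary:

import paddle.fluid as fluid

hidden_dim = 512
# `input` must already be the 3*D-wide linear projection of x_t.
x = fluid.data(name='proj', shape=[None, hidden_dim * 3],
               dtype='float32', lod_level=1)
hidden = fluid.layers.dynamic_gru(
    input=x,
    size=hidden_dim,
    param_attr=fluid.ParamAttr(name='gru_w',
                               initializer=fluid.initializer.Xavier()),
    bias_attr=fluid.ParamAttr(name='gru_b'),
    is_reverse=True)  # traverse each sequence back to front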

Returns

A LoDTensor whose lod level is 1 and shape is [T, D], where T stands for the total sequence length in this mini-batch and D for the hidden size. It represents the GRU-transformed sequence output, and has the same lod and data type as input.

Return type

Variable

Examples

import paddle.fluid as fluid

dict_dim, emb_dim = 128, 64
# A batch of variable-length token-id sequences, as a LoDTensor.
data = fluid.data(name='sequence',
                  shape=[None],
                  dtype='int64',
                  lod_level=1)
emb = fluid.embedding(input=data, size=[dict_dim, emb_dim])
hidden_dim = 512
# Project to 3 * hidden_dim first; dynamic_gru expects the projected input.
x = fluid.layers.fc(input=emb, size=hidden_dim * 3)
hidden = fluid.layers.dynamic_gru(input=x, size=hidden_dim)
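Continuing from the example above, a sketch of one way to run the network, feeding two sequences of lengths 3 and 2 as a single LoDTensor (the token ids are made up for illustration):

import numpy as np

place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

# Two sequences of lengths 3 and 2, flattened into one [5] int64 array.
words = np.array([3, 7, 1, 4, 9], dtype='int64')
lod_data = fluid.create_lod_tensor(words, recursive_seq_lens=[[3, 2]],
                                   place=place)

out, = exe.run(fluid.default_main_program(),
               feed={'sequence': lod_data},
               fetch_list=[hidden],
               return_numpy=False)  # a LoDTensor of shape [5, 512]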