dynamic_gru
paddle.fluid.layers.dynamic_gru(input, size, param_attr=None, bias_attr=None, is_reverse=False, gate_activation='sigmoid', candidate_activation='tanh', h_0=None, origin_mode=False) [source]

api_attr: Static Graph
Note: The input of this operator must be a LoDTensor. If the input to be processed is a Tensor, use api_fluid_layers_StaticRNN instead.
This operator is used to perform the calculations for a single layer of Gated Recurrent Unit (GRU) on full sequences step by step. The calculations in one time step support these two modes:
If origin_mode is True, the formula used is from the paper Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation:

$$
\begin{aligned}
u_t &= act_g(W_{ux} x_t + W_{uh} h_{t-1} + b_u) \\
r_t &= act_g(W_{rx} x_t + W_{rh} h_{t-1} + b_r) \\
\tilde{h}_t &= act_c(W_{cx} x_t + W_{ch}(r_t \odot h_{t-1}) + b_c) \\
h_t &= u_t \odot h_{t-1} + (1 - u_t) \odot \tilde{h}_t
\end{aligned}
$$

If origin_mode is False, the formula used is from the paper Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling:

$$
\begin{aligned}
u_t &= act_g(W_{ux} x_t + W_{uh} h_{t-1} + b_u) \\
r_t &= act_g(W_{rx} x_t + W_{rh} h_{t-1} + b_r) \\
\tilde{h}_t &= act_c(W_{cx} x_t + W_{ch}(r_t \odot h_{t-1}) + b_c) \\
h_t &= (1 - u_t) \odot h_{t-1} + u_t \odot \tilde{h}_t
\end{aligned}
$$

$x_t$ is the input of the current time step, but it is not taken from input. This operator does not include the calculations $W_{ux} x_t$, $W_{rx} x_t$, $W_{cx} x_t$; thus a fully-connected layer whose size is 3 times size should be placed before this operator, and its output should be used as input here. $h_{t-1}$ is the hidden state from the previous time step. $u_t$, $r_t$, $\tilde{h}_t$ and $h_t$ stand for the update gate, reset gate, candidate hidden state and hidden output, respectively. $W_{uh}, b_u$, $W_{rh}, b_r$ and $W_{ch}, b_c$ stand for the weight matrices and biases used in the update gate, reset gate and candidate hidden state calculations. In the implementation, the three weight matrices are merged into one tensor of shape $[D, D \times 3]$ and the three biases are concatenated into one tensor of shape $[1, D \times 3]$, where $D$ stands for the hidden size. The data layout of the weight tensor is: $W_{uh}$ and $W_{rh}$ are concatenated with shape $[D, D \times 2]$ in the first part, and $W_{ch}$ occupies the latter part with shape $[D, D]$. A NumPy sketch of a single time step is given below.
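To make the merged weight layout and the two modes concrete, here is a minimal NumPy sketch of one GRU time step under the formulas above. It is an illustration, not the operator's actual kernel; the names x_proj, w, b and the helper gru_step are hypothetical, and act_g / act_c are fixed to their defaults (sigmoid and tanh):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_proj, h_prev, w, b, origin_mode=False):
    # x_proj: [N, D*3], the input after the 3*size linear projection
    #         (i.e. the concatenation of W_ux x_t, W_rx x_t, W_cx x_t).
    # h_prev: [N, D], hidden state of the previous time step.
    # w:      [D, D*3] merged weight, laid out as [W_uh | W_rh | W_ch].
    # b:      [1, D*3] concatenated bias, laid out as [b_u | b_r | b_c].
    d = h_prev.shape[1]
    w_ur, w_c = w[:, :2 * d], w[:, 2 * d:]
    # Update and reset gates (act_g = sigmoid by default).
    gates = sigmoid(x_proj[:, :2 * d] + h_prev @ w_ur + b[:, :2 * d])
    u, r = gates[:, :d], gates[:, d:]
    # Candidate hidden state (act_c = tanh by default).
    c = np.tanh(x_proj[:, 2 * d:] + (r * h_prev) @ w_c + b[:, 2 * d:])
    if origin_mode:
        return u * h_prev + (1.0 - u) * c
    return (1.0 - u) * h_prev + u * c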
Parameters
input (Variable) – A LoDTensor whose lod level is 1, representing the input after linear projection. Its shape should be $[T, D \times 3]$, where $T$ stands for the total sequence lengths in this mini-batch and $D$ for the hidden size. The data type should be float32 or float64.
size (int) – Indicates the hidden size $D$.
param_attr (ParamAttr, optional) – To specify the weight parameter property. Default: None, which means the default weight parameter property is used. See usage for details in api_fluid_ParamAttr.
bias_attr (ParamAttr, optional) – To specify the bias parameter property. Default: None, which means the default bias parameter property is used. See usage for details in api_fluid_ParamAttr.
is_reverse (bool, optional) – Whether to compute in the reversed order of input sequences (see the bidirectional sketch under Examples for a typical use). Default: False.
gate_activation (str, optional) – The activation function corresponding to $act_g$ in the formula. “sigmoid”, “tanh”, “relu” and “identity” are supported. Default: “sigmoid”.
candidate_activation (str, optional) – The activation function corresponding to $act_c$ in the formula. “sigmoid”, “tanh”, “relu” and “identity” are supported. Default: “tanh”.
h_0 (Variable, optional) – A Tensor representing the initial hidden state. If not provided, the default initial hidden state is 0. The shape is $[N, D]$, where $N$ is the number of sequences in the mini-batch and $D$ is the hidden size. The data type should be the same as input. Default: None.
Returns
A LoDTensor whose lod level is 1 and whose shape is $[T, D]$, where $T$ stands for the total sequence lengths in this mini-batch and $D$ for the hidden size. It represents the GRU transformed sequence output, and has the same lod and data type as input.

Return type
Variable
Examples
import paddle.fluid as fluid

dict_dim, emb_dim = 128, 64
data = fluid.data(name='sequence',
                  shape=[None],
                  dtype='int64',
                  lod_level=1)
emb = fluid.embedding(input=data, size=[dict_dim, emb_dim])
hidden_dim = 512
x = fluid.layers.fc(input=emb, size=hidden_dim * 3)
hidden = fluid.layers.dynamic_gru(input=x, size=hidden_dim)
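Because the input must be a LoDTensor, feeding the network above requires building one explicitly. The following is a hedged sketch of a Paddle 1.x executor run continuing the example; the sequence lengths and random word ids are illustrative:

import numpy as np

place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

# Two sequences of lengths 3 and 5; word ids drawn from [0, dict_dim).
ids = np.random.randint(0, dict_dim, size=(8,)).astype('int64')
words = fluid.create_lod_tensor(ids, recursive_seq_lens=[[3, 5]], place=place)

out = exe.run(fluid.default_main_program(),
              feed={'sequence': words},
              fetch_list=[hidden],
              return_numpy=False)[0]  # a LoDTensor of shape [8, hidden_dim]

The is_reverse flag makes a bidirectional encoder straightforward: run one dynamic_gru forward and another over the reversed sequences, each with its own 3 * hidden_dim projection, then concatenate the outputs. A minimal sketch continuing the example (layer names are illustrative):

fwd_proj = fluid.layers.fc(input=emb, size=hidden_dim * 3)
bwd_proj = fluid.layers.fc(input=emb, size=hidden_dim * 3)
fwd = fluid.layers.dynamic_gru(input=fwd_proj, size=hidden_dim)
bwd = fluid.layers.dynamic_gru(input=bwd_proj, size=hidden_dim, is_reverse=True)
bi = fluid.layers.concat([fwd, bwd], axis=1)  # shape [T, 2 * hidden_dim]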