GRUUnit

class paddle.fluid.dygraph.GRUUnit(name_scope, size, param_attr=None, bias_attr=None, activation='tanh', gate_activation='sigmoid', origin_mode=False, dtype='float32')[source]

GRU unit layer

It creates a callable object from the GRUUnit class. If origin_mode is True, the equations of a GRU step come from the paper Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

\[
\begin{aligned}
u_t &= actGate(xu_t + W_u h_{t-1} + b_u) \\
r_t &= actGate(xr_t + W_r h_{t-1} + b_r) \\
m_t &= actNode(xm_t + W_c \, dot(r_t, h_{t-1}) + b_m) \\
h_t &= dot(u_t, h_{t-1}) + dot((1 - u_t), m_t)
\end{aligned}
\]

If origin_mode is False, the equations of a GRU step come from the paper Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

\[
\begin{aligned}
u_t &= actGate(xu_t + W_u h_{t-1} + b_u) \\
r_t &= actGate(xr_t + W_r h_{t-1} + b_r) \\
m_t &= actNode(xm_t + W_c \, dot(r_t, h_{t-1}) + b_m) \\
h_t &= dot((1 - u_t), h_{t-1}) + dot(u_t, m_t)
\end{aligned}
\]

The inputs of the GRU unit include \(z_t\) and \(h_{t-1}\). In terms of the equations above, \(z_t\) is split into three parts: \(xu_t\), \(xr_t\) and \(xm_t\). This means that, in order to implement a full GRU unit for an input \(x_t\), a fully connected layer has to be applied first, such that \(z_t = W_{fc}x_t\).
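For instance, that projection and split can be sketched in NumPy (a minimal illustration; F, W_fc and the other names here are assumptions for this sketch, not part of the layer):

import numpy

F, D, T = 8, 5, 9                                     # hypothetical feature size, hidden size, steps
x = numpy.random.rand(T, F).astype('float32')         # raw input x_t
W_fc = numpy.random.rand(F, 3 * D).astype('float32')  # hypothetical fully connected weight

z = x.dot(W_fc)                          # z_t = W_fc x_t, shape [T, 3*D]
xu, xr, xm = numpy.split(z, 3, axis=1)   # the three parts fed to the gates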

The terms \(u_t\) and \(r_t\) represent the update and reset gates of the GRU cell. Unlike LSTM, GRU has one fewer gate. However, there is an intermediate candidate hidden output, denoted by \(m_t\). This layer has three outputs: \(h_t\), \(dot(r_t, h_{t-1})\), and the concatenation of \(u_t\), \(r_t\) and \(m_t\).
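As a rough illustration of the equations above, a single GRU step can be sketched in NumPy (a minimal sketch; the function and weight names are assumptions, not the layer's internals):

import numpy

def sigmoid(x):
    return 1.0 / (1.0 + numpy.exp(-x))

def gru_step(xu, xr, xm, h_prev, W_u, W_r, W_c, b_u, b_r, b_m, origin_mode=False):
    u = sigmoid(xu + h_prev.dot(W_u) + b_u)           # update gate u_t
    r = sigmoid(xr + h_prev.dot(W_r) + b_r)           # reset gate r_t
    m = numpy.tanh(xm + (r * h_prev).dot(W_c) + b_m)  # candidate hidden state m_t
    if origin_mode:
        return u * h_prev + (1.0 - u) * m             # origin_mode=True formula
    return (1.0 - u) * h_prev + u * m                 # origin_mode=False formula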

Parameters
  • name_scope (str) – The name of this class.

  • size (int) – The input dimension value, i.e. three times the hidden size D.

  • param_attr (ParamAttr, optional) –

    The parameter attribute for the learnable hidden-hidden weight matrix.

    Note:

    1. The shape of the weight matrix is \([D, 3*D]\), where D is the hidden size.

    2. All elements in the weight matrix can be divided into two parts: the first part contains the weights of the update gate and reset gate, with shape \([D, 2*D]\); the second part contains the weights for the candidate hidden state, with shape \([D, D]\). See the sketch after this parameter list.

    If it is set to None or an instance of ParamAttr, gru_unit will create a ParamAttr as param_attr. If the Initializer of param_attr is not set, the parameter is initialized with Xavier. The default value is None.

  • bias_attr (ParamAttr|bool, optional) – The parameter attribute for the bias of the GRU. Note that the bias with shape \([1, 3*D]\) concatenates the biases used in the update gate, reset gate and candidate calculations. If it is set to False, no bias will be applied to the update gate, reset gate and candidate calculations. If it is set to None or an instance of ParamAttr, gru_unit will create a ParamAttr as bias_attr. If the Initializer of bias_attr is not set, the bias is initialized to zero. The default value is None.

  • activation (str) – The activation type for cell (actNode). The default value is ‘tanh’.

  • gate_activation (str) – The activation type for gates (actGate). The default value is ‘sigmoid’.

  • origin_mode (bool, optional) – Whether to use the origin-mode GRU equations described above. The default value is False.

  • dtype (str, optional) – The data type of this layer. It can be set to ‘float32’ or ‘float64’. The default value is ‘float32’.
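The weight layout described in the param_attr note can be sketched as follows (a minimal NumPy illustration of the shapes only; the variable names are assumptions for this sketch):

import numpy

D = 5
weight = numpy.random.rand(D, 3 * D).astype('float32')  # hidden-hidden weight, [D, 3*D]

W_gates = weight[:, :2 * D]      # update and reset gate weights, [D, 2*D]
W_candidate = weight[:, 2 * D:]  # candidate hidden state weights, [D, D]

bias = numpy.zeros((1, 3 * D), dtype='float32')  # concatenated gate and candidate biases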

Attributes:

weight (Parameter): the learnable weights of this layer.

bias (Parameter): the learnable bias of this layer.

Returns

The hidden value, reset-hidden value and gate values. The hidden value is a 2-D tensor with shape \([T, D]\) . The reset-hidden value is a 2-D tensor with shape \([T, D]\) . The gate value is a 2-D tensor with shape \([T, 3*D]\).

Return type

tuple

Examples

import paddle.fluid as fluid
import paddle.fluid.dygraph.base as base
import numpy

lod = [[2, 4, 3]]
D = 5              # hidden size
T = sum(lod[0])    # total number of steps in the batch

input = numpy.random.rand(T, 3 * D).astype('float32')     # gate input, [T, 3*D]
hidden_input = numpy.random.rand(T, D).astype('float32')  # previous hidden state, [T, D]
with fluid.dygraph.guard():
    gru = fluid.dygraph.GRUUnit('gru', size=D * 3)
    dy_ret = gru(
        base.to_variable(input), base.to_variable(hidden_input))
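dy_ret is the tuple described under Returns; assuming the documented order (hidden, reset-hidden, gate), it can be unpacked as:

hidden, reset_hidden, gate = dy_ret
# hidden: [T, D], reset_hidden: [T, D], gate: [T, 3*D]
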
forward(input, hidden)

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • input (Variable) – The gate input \(z_t\), a 2-D tensor with shape \([T, 3*D]\).

  • hidden (Variable) – The previous hidden state \(h_{t-1}\), a 2-D tensor with shape \([T, D]\).