GRU

class paddle.nn.GRU(input_size: int, hidden_size: int, num_layers: int = 1, direction: _DirectionType | str = 'forward', time_major: bool = False, dropout: float = 0.0, weight_ih_attr: ParamAttrLike | None = None, weight_hh_attr: ParamAttrLike | None = None, bias_ih_attr: ParamAttrLike | None = None, bias_hh_attr: ParamAttrLike | None = None, name: str | None = None)

Multilayer GRU. It takes an input sequence and initial states as inputs, and returns the output sequence and the final states.

Each layer inside the GRU maps the input sequence and initial states to the output sequence and final states in the following manner: at each step, it takes the step input (\(x_{t}\)) and the previous state (\(h_{t-1}\)) as inputs, and returns the step output (\(y_{t}\)) and the new state (\(h_{t}\)).

\[ \begin{align}\begin{aligned}r_{t} & = \sigma(W_{ir}x_{t} + b_{ir} + W_{hr}h_{t-1} + b_{hr})\\z_{t} & = \sigma(W_{iz}x_{t} + b_{iz} + W_{hz}h_{t-1} + b_{hz})\\\widetilde{h}_{t} & = \tanh(W_{ic}x_{t} + b_{ic} + r_{t} * (W_{hc}h_{t-1} + b_{hc}))\\h_{t} & = z_{t} * h_{t-1} + (1 - z_{t}) * \widetilde{h}_{t}\\y_{t} & = h_{t}\end{aligned}\end{align} \]

where \(\sigma\) is the sigmoid function, and * is the elementwise multiplication operator.
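The equations above can be traced by hand with a small pure-Python sketch of a single step. This only illustrates the math and is not how Paddle implements the layer. The helper names (`sigmoid`, `matvec`, `gru_step`) and the parameter dictionary keys are hypothetical, chosen to mirror the symbols in the formulas.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W @ v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step(x_t, h_prev, p):
    """One GRU step following the equations above.

    p is a hypothetical dict with keys W_ir, W_hr, b_ir, b_hr (and the
    same for the z and c gates), named after the symbols in the formulas.
    """
    def gate_parts(g):
        i_part = [a + b for a, b in zip(matvec(p["W_i" + g], x_t), p["b_i" + g])]
        h_part = [a + b for a, b in zip(matvec(p["W_h" + g], h_prev), p["b_h" + g])]
        return i_part, h_part

    ir, hr = gate_parts("r")
    iz, hz = gate_parts("z")
    ic, hc = gate_parts("c")

    r = [sigmoid(a + b) for a, b in zip(ir, hr)]             # reset gate r_t
    z = [sigmoid(a + b) for a, b in zip(iz, hz)]             # update gate z_t
    # The reset gate scales only the hidden-side term of the candidate.
    c = [math.tanh(a + rt * b) for a, rt, b in zip(ic, r, hc)]
    h = [zt * hp + (1.0 - zt) * ct for zt, hp, ct in zip(z, h_prev, c)]
    return h, h  # y_t equals h_t


# With all-zero parameters both gates are 0.5 and the candidate is 0,
# so the new state is half of the previous state.
input_size, hidden_size = 2, 3
params = {}
for g in "rzc":
    params["W_i" + g] = [[0.0] * input_size for _ in range(hidden_size)]
    params["W_h" + g] = [[0.0] * hidden_size for _ in range(hidden_size)]
    params["b_i" + g] = [0.0] * hidden_size
    params["b_h" + g] = [0.0] * hidden_size

y_t, h_t = gru_step([1.0, 2.0], [1.0, -2.0, 0.5], params)
print(h_t)  # [0.5, -1.0, 0.25]
```

Note that \(z_{t}\) weights the previous state, so a value of \(z_{t}\) close to 1 keeps the old state and a value close to 0 replaces it with the candidate.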

Constructing the layer with keyword arguments is recommended.

Parameters

input_size (int) – The input size of \(x\) for the first layer’s cell.
hidden_size (int) – The hidden size of \(h\) for each layer’s cell.
num_layers (int, optional) – Number of recurrent layers. Defaults to 1.
direction (str, optional) – The direction of the network. It can be “forward” or “bidirect” (or “bidirectional”). When “bidirect”, the forward and backward outputs are merged by concatenation. Defaults to “forward”.
time_major (bool, optional) – Whether the first dimension of the input is the time-step dimension. If time_major is True, the shape of the input Tensor is [time_steps, batch_size, input_size]; otherwise it is [batch_size, time_steps, input_size]. time_steps is the length of the input sequence. Defaults to False.
dropout (float, optional) – The dropout probability. Dropout is applied to the input of each layer except the first. The value must be between 0 and 1. Defaults to 0.
weight_ih_attr (ParamAttr|None, optional) – The parameter attribute for weight_ih of each cell. Default: None.
weight_hh_attr (ParamAttr|None, optional) – The parameter attribute for weight_hh of each cell. Default: None.
bias_ih_attr (ParamAttr|None, optional) – The parameter attribute for bias_ih of each cell. Default: None.
bias_hh_attr (ParamAttr|None, optional) – The parameter attribute for bias_hh of each cell. Default: None.
name (str|None, optional) – Name for the operation (optional, default is None). For more information, please refer to Name.

Inputs:

inputs (Tensor): the input sequence. If time_major is True, the shape is [time_steps, batch_size, input_size], else, the shape is [batch_size, time_steps, input_size]. time_steps means the length of the input sequence.
initial_states (Tensor, optional): the initial state. The shape is [num_layers * num_directions, batch_size, hidden_size]. If initial_states is not given, zero initial states are used. Defaults to None.
sequence_length (Tensor, optional): shape [batch_size], dtype: int64 or int32. The valid lengths of the input sequences. Defaults to None. If sequence_length is not None, the inputs are treated as padded sequences. In each input sequence, elements whose time step index is not less than the valid length are treated as padding.
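The padding rule for sequence_length can be illustrated with a small pure-Python sketch. The helper `padding_mask` is hypothetical and only demonstrates the index rule stated above (time step `t` is valid when `t < valid_length`). It does not show how Paddle handles padding internally.

```python
def padding_mask(sequence_length, time_steps):
    """Return one row per sequence; True marks a valid step, False marks padding."""
    return [[t < n for t in range(time_steps)] for n in sequence_length]


# Two sequences padded to 4 time steps, with valid lengths 3 and 1.
mask = padding_mask([3, 1], 4)
print(mask)  # [[True, True, True, False], [True, False, False, False]]
```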

Returns

outputs (Tensor): the output sequence. If time_major is True, the shape is [time_steps, batch_size, num_directions * hidden_size], else, the shape is [batch_size, time_steps, num_directions * hidden_size]. Note that num_directions is 2 if direction is “bidirectional” else 1. time_steps means the length of the output sequence.

final_states (Tensor): final states. The shape is [num_layers * num_directions, batch_size, hidden_size]. Note that num_directions is 2 if direction is “bidirectional” (the forward states are at indices 0, 2, 4, 6… and the backward states are at indices 1, 3, 5, 7…), else 1.

Return type

tuple (outputs, final_states) of Tensors
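As a quick check on the shape rules above, the following pure-Python sketch computes the expected shapes of outputs and final_states from the constructor and input settings. The helper `gru_output_shapes` is hypothetical and not part of Paddle. It only restates the documented shape formulas.

```python
def gru_output_shapes(batch_size, time_steps, hidden_size,
                      num_layers=1, direction="forward", time_major=False):
    """Return (outputs_shape, final_states_shape) per the documented rules."""
    if direction == "forward":
        num_directions = 1
    elif direction in ("bidirect", "bidirectional"):
        num_directions = 2
    else:
        raise ValueError(f"unsupported direction: {direction!r}")

    feature = num_directions * hidden_size
    if time_major:
        outputs = [time_steps, batch_size, feature]
    else:
        outputs = [batch_size, time_steps, feature]
    final_states = [num_layers * num_directions, batch_size, hidden_size]
    return outputs, final_states


# Two forward layers, batch of 4, 23 time steps, hidden size 32.
print(gru_output_shapes(4, 23, 32, num_layers=2))
# ([4, 23, 32], [2, 4, 32])

# The same network made bidirectional doubles the feature and state-count dimensions.
print(gru_output_shapes(4, 23, 32, num_layers=2, direction="bidirectional"))
# ([4, 23, 64], [4, 4, 32])
```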

Variables:

weight_ih_l[k]: the learnable input-hidden weights of the k-th layer. If k = 0, the shape is [3 * hidden_size, input_size]. Otherwise, the shape is [3 * hidden_size, num_directions * hidden_size]. The factor 3 comes from stacking the weights of the three gates (\(W_{ir}\), \(W_{iz}\), \(W_{ic}\)).
weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, with shape [3 * hidden_size, hidden_size].
bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, with shape [3 * hidden_size].
bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, with shape [3 * hidden_size].

Examples

>>> import paddle

>>> rnn = paddle.nn.GRU(input_size=16, hidden_size=32, num_layers=2)

>>> x = paddle.randn((4, 23, 16))
>>> prev_h = paddle.randn((2, 4, 32))
>>> y, h = rnn(x, prev_h)

>>> print(y.shape)
[4, 23, 32]
>>> print(h.shape)
[2, 4, 32]