TransformerEncoder

class paddle.nn.TransformerEncoder(encoder_layer, num_layers, norm=None)

TransformerEncoder is a stack of N encoder layers.

Parameters
  • encoder_layer (Layer) – An instance of TransformerEncoderLayer. It is used as the first layer, and the remaining layers are created from its configuration.

  • num_layers (int) – The number of encoder layers to be stacked.

  • norm (LayerNorm, optional) – The layer normalization component. If provided, layer normalization is applied to the output of the last encoder layer. Default None.

Examples

import paddle
from paddle.nn import TransformerEncoderLayer, TransformerEncoder

# encoder input: [batch_size, src_len, d_model]
enc_input = paddle.rand((2, 4, 128))
# self attention mask: [batch_size, n_head, src_len, src_len]
attn_mask = paddle.rand((2, 2, 4, 4))
encoder_layer = TransformerEncoderLayer(128, 2, 512)
encoder = TransformerEncoder(encoder_layer, 2)
enc_output = encoder(enc_input, attn_mask)  # [2, 4, 128]

forward(src, src_mask=None, cache=None)

Applies a stack of N Transformer encoder layers to the input. If norm is provided, layer normalization is also applied to the output of the last encoder layer.

Parameters
  • src (Tensor) – The input of Transformer encoder. It is a tensor with shape [batch_size, sequence_length, d_model]. The data type should be float32 or float64.

  • src_mask (Tensor, optional) – A tensor used in multi-head attention to prevent attention to unwanted positions, usually paddings or subsequent positions. Its shape must be broadcastable to [batch_size, n_head, sequence_length, sequence_length]; unwanted positions have -INF values and all other positions have 0. The data type should be float32 or float64. It can be None when no positions need to be masked. Default None.

  • cache (list, optional) – It is a list, and each element in the list is incremental_cache produced by TransformerEncoderLayer.gen_cache. See TransformerEncoder.gen_cache for more details. It is only used for inference and should be None for training. Default None.

Returns

A tensor with the same shape and data type as src, representing the output of the Transformer encoder. If cache is not None, a tuple is returned instead: in addition to the encoder output, it contains the new cache, which is the same as the input cache argument except that each incremental_cache in it has grown in length. See MultiHeadAttention.gen_cache and MultiHeadAttention.forward for more details.

Return type

Tensor|tuple

gen_cache(src)

Generates the cache to be passed to forward. The generated cache is a list, and each element in it is an incremental_cache produced by TransformerEncoderLayer.gen_cache. See TransformerEncoderLayer.gen_cache for more details.

Parameters

src (Tensor) – The input of Transformer encoder. It is a tensor with shape [batch_size, source_length, d_model]. The data type should be float32 or float64.

Returns

It is a list, and each element in the list is incremental_cache produced by TransformerEncoderLayer.gen_cache. See TransformerEncoderLayer.gen_cache for more details.

Return type

list