paddle.fluid.layers.nn.add_position_encoding(input, alpha, beta, name=None)

This operator computes a weighted sum of the input feature at each position in the sequence and the corresponding position encoding.

For more details on position encoding, refer to Attention Is All You Need.

The formula is as follows:

\[\begin{split}PE(pos, 2i) &= \sin{(pos / 10000^{2i / P})} \\ PE(pos, 2i + 1) &= \cos{(pos / 10000^{2i / P})} \\ Out(:, pos, i) &= \alpha * input(:, pos, i) + \beta * PE(pos, i)\end{split}\]
  • \(PE(pos, 2i)\) : the value at even index 2i of the encoding for position pos.

  • \(PE(pos, 2i + 1)\) : the value at odd index 2i+1 of the encoding for position pos.
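
To make the formula concrete, here is a minimal pure-Python sketch of it for a dense input of shape [N, M, P], written with nested lists. The names `position_encoding` and `add_position_encoding_ref` are hypothetical and are not part of Paddle. The sketch follows the formula exactly as written above; the operator's actual kernel may arrange the sine and cosine terms differently, so treat it as an illustration rather than a bit-exact reference.

```python
import math


def position_encoding(pos, dim, feature_size):
    """PE(pos, dim) as written in the formula: sine at even dims, cosine at odd dims."""
    # Dims 2i and 2i + 1 share the same exponent 2i / P.
    two_i = dim - (dim % 2)
    angle = pos / (10000.0 ** (two_i / feature_size))
    return math.sin(angle) if dim % 2 == 0 else math.cos(angle)


def add_position_encoding_ref(inputs, alpha, beta):
    """Out(:, pos, i) = alpha * input(:, pos, i) + beta * PE(pos, i).

    `inputs` is a nested list of shape [N, M, P] (hypothetical helper, not Paddle API).
    """
    out = []
    for sequence in inputs:
        out_seq = []
        for pos, features in enumerate(sequence):
            p = len(features)
            out_seq.append([
                alpha * x + beta * position_encoding(pos, i, p)
                for i, x in enumerate(features)
            ])
        out.append(out_seq)
    return out


# N=1 sequence, M=2 positions, P=4 features, all zeros,
# so the output equals beta * PE.
zeros = [[[0.0] * 4 for _ in range(2)]]
result = add_position_encoding_ref(zeros, alpha=1.0, beta=1.0)
print(result[0][0])  # position 0: sin(0) at even dims, cos(0) at odd dims
```

With alpha=1.0 and beta=0.0 the function returns the input unchanged, and with alpha=0.0 and beta=1.0 it returns only the position encoding, which is a quick way to check each term in isolation.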

Parameters

  • input (Variable) – A Tensor or LoDTensor (LoD level 1). If it is a Tensor, the shape should be [N, M, P], where N is the batch size, M is the sequence length, and P is the size of the feature dimension. If it is a LoDTensor, the shape should be [N, P], where N is the total length of all sequences in the mini-batch and P is the size of the feature dimension. The data type should be float32 or float64.

  • alpha (float) – The weight coefficient applied to input in the weighted sum.

  • beta (float) – The weight coefficient applied to the position encoding in the weighted sum.

  • name (str, optional) – For detailed information, please refer to Name. Usually there is no need to set name; it is None by default.


Returns

A Tensor or LoDTensor with the same shape, data type, and LoD as input.

Return type

Variable

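Examples

import paddle

# Dense input of shape [N, M, P]: batch size 16, sequence length 32, feature size 64.
tensor = paddle.randn([16, 32, 64])
# alpha weights the input; beta weights the position encoding.
position_tensor = paddle.fluid.layers.add_position_encoding(
    input=tensor, alpha=1.0, beta=1.0)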
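
For the LoDTensor case, where the input is a flat [N, P] array holding several sequences back to back, the following pure-Python sketch may help illustrate how positions could be assigned. It assumes that the position index restarts at 0 for each sequence, which the text above does not state explicitly. The helper names `encode_row` and `add_position_encoding_lod` and the `offsets` argument are hypothetical and are not Paddle API.

```python
import math


def encode_row(features, pos, alpha, beta):
    """Apply alpha * x + beta * PE(pos, i) to one feature row of size P."""
    p = len(features)
    out = []
    for i, x in enumerate(features):
        # Dims 2i and 2i + 1 share the exponent 2i / P.
        angle = pos / (10000.0 ** ((i - i % 2) / p))
        pe = math.sin(angle) if i % 2 == 0 else math.cos(angle)
        out.append(alpha * x + beta * pe)
    return out


def add_position_encoding_lod(rows, offsets, alpha, beta):
    """rows: flat [N, P] nested list; offsets: level-1 offsets such as [0, 2, 5]."""
    out = []
    for start, end in zip(offsets[:-1], offsets[1:]):
        # Assumption: pos restarts at 0 at the start of every sequence.
        for pos, row in enumerate(rows[start:end]):
            out.append(encode_row(row, pos, alpha, beta))
    return out


# Two sequences of lengths 2 and 3 packed into N=5 rows with P=2 features.
rows = [[0.0, 0.0] for _ in range(5)]
out = add_position_encoding_lod(rows, offsets=[0, 2, 5], alpha=1.0, beta=1.0)
print(len(out))  # one output row per input row
```

Under this assumption, rows 0 and 2 receive the same encoding because each is the first position of its own sequence, and the output keeps the same [N, P] layout as the input.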