sequence_pad

api_attr: declarative programming (static graph)

paddle.fluid.layers.sequence_pad(x, pad_value, maxlen=None, name=None)

This layer pads the sequences in the same batch to a common length (determined by maxlen). The padding value is given by pad_value and is appended to the tail of each sequence. The result is a Python tuple (Out, Length): the LoDTensor Out holds the padded sequences, and the LoDTensor Length holds the original lengths of the input sequences. To remove the padding (the unpadding operation), see sequence_unpad.

Please note that the input x should be a LoDTensor.

Case 1:
Given input 1-level LoDTensor x:
    x.lod = [[0,  2,   5]]
    x.data = [[a],[b],[c],[d],[e]]
pad_value:
    pad_value.data = [0]
maxlen = 4

the output tuple (Out, Length):
    Out.data = [[[a],[b],[0],[0]],[[c],[d],[e],[0]]]
    Length.data = [2, 3]      # original sequence lengths

Case 2:
Given input 1-level LoDTensor x:
    x.lod =  [[0,             2,                     5]]
    x.data = [[a1,a2],[b1,b2],[c1,c2],[d1,d2],[e1,e2]]
pad_value:
    pad_value.data = [0]
default maxlen = None (the effective value is 3, the length of the longest sequence in x)

the output tuple (Out, Length):
    Out.data = [[[a1,a2],[b1,b2],[0,0]],[[c1,c2],[d1,d2],[e1,e2]]]
    Length.data = [2, 3]

Case 3:
Given input 1-level LoDTensor x:
    x.lod =  [[0,             2,                     5]]
    x.data = [[a1,a2],[b1,b2],[c1,c2],[d1,d2],[e1,e2]]
pad_value:
    pad_value.data = [p1,p2]
default maxlen = None (the effective value is 3, the length of the longest sequence in x)

the output tuple (Out, Length):
    Out.data = [[[a1,a2],[b1,b2],[p1,p2]],[[c1,c2],[d1,d2],[e1,e2]]]
    Length.data = [2, 3]
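
The three cases above can be reproduced with a small pure-Python sketch. It is only an illustration of the padding semantics, not Paddle's implementation: sequence_pad_reference is a hypothetical helper, nested lists stand in for LoDTensors, and the sketch assumes it is enough to reject a maxlen shorter than the longest sequence.

```python
def sequence_pad_reference(data, lod, pad_value, maxlen=None):
    """Pure-Python sketch of the padding semantics (hypothetical helper, not a Paddle API).

    data:      list of M rows, each a list of K values (all sequences concatenated)
    lod:       offsets of a 1-level LoD, e.g. [0, 2, 5]
    pad_value: scalar, or list of K values
    maxlen:    target length, or None to use the longest sequence
    """
    lengths = [end - start for start, end in zip(lod[:-1], lod[1:])]
    longest = max(lengths)
    if maxlen is None:
        maxlen = longest
    elif maxlen < longest:
        # Assumption of this sketch: only shorter values are rejected.
        raise ValueError("maxlen is shorter than the longest sequence")

    k = len(data[0])
    # A scalar (or one-element) pad value is broadcast to a full row of K values.
    if not isinstance(pad_value, (list, tuple)):
        pad_row = [pad_value] * k
    elif len(pad_value) == 1:
        pad_row = list(pad_value) * k
    else:
        pad_row = list(pad_value)

    out = []
    for start, end in zip(lod[:-1], lod[1:]):
        rows = [list(r) for r in data[start:end]]
        # Append pad rows at the tail until the sequence reaches maxlen.
        rows += [list(pad_row) for _ in range(maxlen - (end - start))]
        out.append(rows)
    return out, lengths


# Case 3 with concrete numbers: K = 2, pad_value = [p1, p2] = [-1, -2].
x = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
out, length = sequence_pad_reference(x, [0, 2, 5], pad_value=[-1, -2])
print(out)     # [[[1, 2], [3, 4], [-1, -2]], [[5, 6], [7, 8], [9, 10]]]
print(length)  # [2, 3]
```

Passing maxlen=4 with a scalar pad_value of 0 and K = 1 gives the layout of Case 1 in the same way.
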
Parameters
  • x (Variable) – Input 1-level LoDTensor with dims [M, K]. The batch size (the number of sequences) is described by the LoD information. The data type should be float32, float64, int8, int32 or int64.

  • pad_value (Variable) – Padding value. It can be a scalar or a 1D tensor with length K. If it’s a scalar, it will be automatically broadcast to a Tensor. The data type should be the same as that of x.

  • maxlen (int, optional) – The length of padded sequences, None by default. When it is None, all sequences will be padded up to the length of the longest one among them; when it is a positive value, it must be greater than the length of the longest original sequence.

  • name (str, optional) – For detailed information, please refer to Name. Usually there is no need to set name; it is None by default.

Returns: A Python tuple (Out, Length): the first element is a 0-level LoDTensor Out with the shape [batch_size, maxlen, K]; the second is Length, a 0-level 1D LoDTensor holding the original sequence lengths. The size of Length is equal to the batch size, and its data type is int64.

Return Type: tuple
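
The Length output is what makes the padding reversible: knowing how many rows of each padded sequence are real data is enough to recover the original rows and LoD offsets. The following pure-Python sketch illustrates that idea with nested lists. sequence_unpad_reference is a hypothetical helper written for this illustration; it is not the sequence_unpad layer itself and makes no claim about how that layer is implemented.

```python
def sequence_unpad_reference(padded, lengths):
    """Sketch of the inverse step (hypothetical helper, not a Paddle API).

    padded:  nested list of shape [batch_size, maxlen, K]
    lengths: original sequence lengths, one per batch entry
    """
    data, lod = [], [0]
    for rows, n in zip(padded, lengths):
        data.extend(rows[:n])     # keep only the first n real rows
        lod.append(lod[-1] + n)   # rebuild the LoD offsets
    return data, lod


padded = [[[1, 2], [3, 4], [0, 0]], [[5, 6], [7, 8], [9, 10]]]
data, lod = sequence_unpad_reference(padded, [2, 3])
print(data)  # [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
print(lod)   # [0, 2, 5]
```
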

Examples

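import paddle.fluid as fluid
import numpy

# x is a 1-level LoDTensor whose rows have 5 features each.
x = fluid.data(name='x', shape=[10, 5], dtype='float32', lod_level=1)
# A one-element pad value is broadcast across the feature dimension.
pad_value = fluid.layers.assign(
    input=numpy.array([0.0], dtype=numpy.float32))
# sequence_pad returns the tuple (Out, Length).
out, length = fluid.layers.sequence_pad(x=x, pad_value=pad_value)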