warpctc

paddle.fluid.layers.warpctc(input, label, blank=0, norm_by_times=False, input_length=None, label_length=None)

An operator that integrates the open-source Warp-CTC library (https://github.com/baidu-research/warp-ctc) to compute the Connectionist Temporal Classification (CTC) loss. It can be thought of as softmax with CTC, since a native softmax activation is integrated into the Warp-CTC library to normalize the values in each row of the input tensor.
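To make the definition concrete, here is a minimal NumPy sketch of what the loss means for a single sequence. Each row of the unscaled input is passed through softmax, and the CTC loss is the negative log of the total probability of all frame-level alignments that collapse to the label (merge repeated symbols, then drop blanks). The helper names are hypothetical. This is not how Warp-CTC computes the loss, since it uses a dynamic-programming forward-backward pass. The enumeration is exponential in the sequence length, so it is only suitable for tiny sanity checks.

```python
import itertools

import numpy as np


def softmax_rows(logits):
    """Normalize each row of unscaled logits into probabilities (numerically stable)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def collapse(path, blank=0):
    """CTC collapse rule: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym  # a blank between two equal symbols keeps them separate
    return out


def ctc_loss_brute_force(logits, label, blank=0):
    """-log p(label | logits), summing over every alignment of length T.

    logits: [T, num_classes + 1] unscaled scores for ONE sequence.
    label:  list of class ids (should not contain `blank`).
    """
    probs = softmax_rows(np.asarray(logits, dtype=np.float64))
    T, C = probs.shape
    total = 0.0
    for path in itertools.product(range(C), repeat=T):
        if collapse(path, blank) == list(label):
            # probability of this alignment = product of per-frame probabilities
            total += np.prod(probs[np.arange(T), np.asarray(path)])
    return -np.log(total)
```

For example, with all-zero logits of shape [2, 5] (uniform probabilities of 1/5) and label [1], the surviving alignments are (1, 1), (blank, 1) and (1, blank), so the loss is -log(3/25).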

Parameters
  • input (Variable) – The unscaled probabilities of variable-length sequences, which is either a 2-D Tensor with LoD information or a 3-D Tensor without LoD information. When it is a 2-D LoDTensor, its shape is [Lp, num_classes + 1], where Lp is the sum of the lengths of all input sequences and num_classes is the true number of classes (not including the blank label). When it is a 3-D Tensor, its shape is [max_logit_length, batch_size, num_classes + 1], where max_logit_length is the length of the longest input logit sequence. The data type must be float32.

  • label (Variable) – The ground truth of variable-length sequences, which is a 2-D Tensor either with or without LoD information. In both cases its shape is [Lg, 1], where Lg is the sum of the lengths of all labels. The data type must be int32.

  • blank (int, default 0) – The index of the blank label for the CTC loss, which must lie in the half-open interval [0, num_classes + 1). The data type must be int32.

  • norm_by_times (bool, default False) – Whether to normalize the gradients by the number of time steps, i.e. the sequence length. There is no need to normalize the gradients if the warpctc layer is followed by a mean_op.

  • input_length (Variable, default None) – The length of each input sequence, needed when input is a 3-D Tensor. It should have shape [batch_size] and dtype int64.

  • label_length (Variable, default None) – The length of each label sequence, needed when input is a 3-D Tensor. It should have shape [batch_size] and dtype int64.
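The two input layouts carry the same information: the level-0 LoD offsets of a LoDTensor (for example [[0, 4, 8]] in the first example below) determine the per-sequence lengths that input_length supplies in the 3-D layout. The following sketch, assuming plain NumPy arrays and a hypothetical helper lod_to_padded (not a Paddle API), converts a concatenated [Lp, num_classes + 1] batch into a zero-padded [max_logit_length, batch_size, num_classes + 1] array plus an int64 length vector.

```python
import numpy as np


def lod_to_padded(flat, offsets, pad_value=0.0):
    """Convert a LoD-style batch to the padded 3-D layout (hypothetical helper).

    flat:    [Lp, width] rows of all sequences concatenated.
    offsets: level-0 LoD offsets, e.g. [0, 4, 8] for two sequences of length 4.
    Returns (padded [max_len, batch_size, width], lengths [batch_size] int64).
    """
    offsets = np.asarray(offsets)
    lengths = np.diff(offsets).astype("int64")  # per-sequence lengths
    batch_size = len(lengths)
    max_len = int(lengths.max())
    width = flat.shape[1]
    padded = np.full((max_len, batch_size, width), pad_value, dtype=flat.dtype)
    for b in range(batch_size):
        start, end = offsets[b], offsets[b + 1]
        # time-major layout: time steps on axis 0, sequences on axis 1
        padded[: end - start, b, :] = flat[start:end]
    return padded, lengths
```

Positions past a sequence's length are filled with pad_value; the length vector is what tells the loss which time steps to use.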

Returns

The Connectionist Temporal Classification (CTC) loss, which is a 2-D Tensor of shape [batch_size, 1]. The data type is the same as that of input.
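Each row of the returned tensor holds the loss of one sequence. As a cross-check on small inputs, the sketch below computes the same quantity in NumPy with the standard CTC forward (alpha) recursion in log space. It assumes the 3-D input layout with input_length and label_length, and that the labels of all sequences are concatenated into one flat array, as described for label above. The helper functions are hypothetical and are not Warp-CTC's implementation; treat any agreement with the operator's float32 output as approximate.

```python
import numpy as np


def log_softmax_rows(x):
    """Log-softmax over the last axis (the class axis)."""
    shifted = x - x.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def ctc_loss_single(log_probs, label, blank=0):
    """CTC forward recursion in log space for one sequence.

    log_probs: [T, C] log-softmax outputs; label: 1-D array of class ids.
    """
    ext = [blank]
    for sym in label:
        ext += [int(sym), blank]  # blank-interleaved label, length 2L + 1
    S = len(ext)
    T = log_probs.shape[0]
    alpha = np.full(S, -np.inf)
    alpha[0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[1] = log_probs[0, ext[1]]
    for t in range(1, T):
        prev = alpha
        alpha = np.full(S, -np.inf)
        for s in range(S):
            terms = [prev[s]]                # stay on the same symbol
            if s >= 1:
                terms.append(prev[s - 1])    # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(prev[s - 2])    # skip a blank between distinct labels
            alpha[s] = np.logaddexp.reduce(terms) + log_probs[t, ext[s]]
    # Valid alignments end on the last label symbol or the trailing blank.
    tail = alpha[-2:] if S > 1 else alpha[-1:]
    return -np.logaddexp.reduce(tail)


def ctc_loss_padded(logits, labels_flat, input_length, label_length, blank=0):
    """Per-sequence CTC loss for padded logits, returned with shape [batch_size, 1].

    logits:      [max_logit_length, batch_size, num_classes + 1] unscaled scores.
    labels_flat: 1-D concatenation of all label sequences.
    """
    log_probs = log_softmax_rows(np.asarray(logits, dtype=np.float64))
    batch_size = log_probs.shape[1]
    losses = np.zeros((batch_size, 1))
    offset = 0
    for b in range(batch_size):
        T, L = int(input_length[b]), int(label_length[b])
        label = labels_flat[offset: offset + L]
        losses[b, 0] = ctc_loss_single(log_probs[:T, b, :], label, blank)
        offset += L
    return losses
```

With all-zero logits of shape [3, 2, 5], input_length [2, 3], label_length [1, 2] and concatenated labels [1, 1, 1], the two losses are -log(3/25) and 3 * log(5). The second sequence has a single valid alignment (1, blank, 1), because a blank must separate the repeated label.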

Return type

Variable

Examples

# using LoDTensor
import paddle.fluid as fluid
import numpy as np

predict = fluid.data(name='predict',
                     shape=[None, 5],
                     dtype='float32', lod_level=1)
label = fluid.data(name='label', shape=[None, 1],
                   dtype='int32', lod_level=1)
cost = fluid.layers.warpctc(input=predict, label=label)
place = fluid.CPUPlace()
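# two logit sequences of length 4 each: rows [0, 4) and [4, 8)
x = fluid.LoDTensor()
data = np.random.rand(8, 5).astype("float32")
x.set(data, place)
x.set_lod([[0, 4, 8]])
# two label sequences of length 2 each; labels are drawn from [1, 5)
# so that they never collide with the blank index 0
y = fluid.LoDTensor()
data = np.random.randint(1, 5, [4, 1]).astype("int32")
y.set(data, place)
y.set_lod([[0, 2, 4]])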
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
output = exe.run(feed={"predict": x, "label": y},
                 fetch_list=[cost.name])
print(output)

# using Tensor
import paddle.fluid as fluid
import numpy as np

# length of the longest logit sequence
max_seq_length = 5
# the number of logit sequences (batch size) is left variable with None
logits = fluid.data(name='logits',
                    shape=[max_seq_length, None, 5],
                    dtype='float32')
logits_length = fluid.data(name='logits_length', shape=[None],
                           dtype='int64')
label = fluid.data(name='label', shape=[None, 1],
                   dtype='int32')
label_length = fluid.data(name='labels_length', shape=[None],
                          dtype='int64')
cost = fluid.layers.warpctc(input=logits, label=label,
                            input_length=logits_length,
                            label_length=label_length)
place = fluid.CPUPlace()
batch_size = 2
x = np.random.rand(max_seq_length, batch_size, 5).astype("float32")
# labels are drawn from [1, 5) so that they never use the blank index 0
y = np.random.randint(1, 5, [max_seq_length * batch_size, 1]).astype("int32")
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
output = exe.run(feed={"logits": x,
                       "label": y,
                       "logits_length": np.array([5, 4]).astype("int64"),
                       "labels_length": np.array([3, 2]).astype("int64")},
                 fetch_list=[cost.name])
print(output)