EditDistance

class paddle.fluid.metrics.EditDistance(name)

This API manages edit distances. It accumulates the edit distances of sequence pairs over mini-batches and reports their average together with the ratio of pairs whose edit distance is not zero. Edit distance quantifies how dissimilar two strings (such as words) are by counting the minimum number of edit operations (insertion, deletion, or substitution) required to transform one string into the other. Refer to https://en.wikipedia.org/wiki/Edit_distance.
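As an illustration of the definition above, the distance itself can be computed with the classic Levenshtein dynamic program. This is a minimal pure-Python sketch assuming unit cost for every insertion, deletion, and substitution; the function name `levenshtein` is hypothetical and not part of Paddle. Note that EditDistance does not compute distances itself: update() expects distances that were already computed elsewhere.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions (each with cost 1) needed to turn `source` into `target`.
    Hypothetical helper for illustration, not a Paddle function."""
    # prev[j] holds the distance between the current prefix of `source`
    # and target[:j]; only two rows of the DP table are kept in memory.
    prev = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        curr = [i]  # turning source[:i] into "" needs i deletions
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            curr.append(min(
                prev[j] + 1,         # delete s_char
                curr[j - 1] + 1,     # insert t_char
                prev[j - 1] + cost,  # substitute (or keep a match for free)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only two rows reduces memory from O(len(source) * len(target)) to O(len(target)) without changing the result.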

Parameters

name (str, optional) – Metric name. For details, please refer to Name. Default is None.

Examples

import paddle.fluid as fluid
import numpy as np

# suppose that batch_size is 128
batch_size = 128

# create the edit distance metric
distance_evaluator = fluid.metrics.EditDistance("EditDistance")

# generate random edit distances for 128 sequence pairs; randint excludes `high`, so the values lie in [0, 9]
edit_distances_batch0 = np.random.randint(low=0, high=10, size=(batch_size, 1))
seq_num_batch0 = batch_size

distance_evaluator.update(edit_distances_batch0, seq_num_batch0)
avg_distance, wrong_instance_ratio = distance_evaluator.eval()
print("the average edit distance for batch0 is %.2f and the wrong instance ratio is %.2f " % (avg_distance, wrong_instance_ratio))

edit_distances_batch1 = np.random.randint(low=0, high=10, size=(batch_size, 1))
seq_num_batch1 = batch_size

distance_evaluator.update(edit_distances_batch1, seq_num_batch1)
avg_distance, wrong_instance_ratio = distance_evaluator.eval()
print("the average edit distance for batch0 and batch1 is %.2f and the wrong instance ratio is %.2f " % (avg_distance, wrong_instance_ratio))

# clear the accumulated state so that the next eval() covers batch2 only
distance_evaluator.reset()

edit_distances_batch2 = np.random.randint(low=0, high=10, size=(batch_size, 1))
seq_num_batch2 = batch_size

distance_evaluator.update(edit_distances_batch2, seq_num_batch2)
avg_distance, wrong_instance_ratio = distance_evaluator.eval()
print("the average edit distance for batch2 is %.2f and the wrong instance ratio is %.2f " % (avg_distance, wrong_instance_ratio))

update(distances, seq_num)

Accumulate the edit distances of one mini-batch into the overall statistics.

Parameters
  • distances (numpy.ndarray) – an array of shape (batch_size, 1); each element is the edit distance of one sequence pair.

  • seq_num (int|float) – the number of sequence pairs in this mini-batch.
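The bookkeeping that update() performs can be pictured as follows. This is a sketch under the assumption that the metric keeps three running totals (total distance, number of pairs, and number of pairs with non-zero distance), which is what the two values returned by eval() require; the helper `accumulate` and the `state` dict are hypothetical and not part of the Paddle API.

```python
import numpy as np


def accumulate(state, distances, seq_num):
    """Fold one mini-batch into running totals (illustrative sketch).

    `state` is a hypothetical dict with the keys total_distance, seq_num
    and instance_error; it is not part of the Paddle API.
    """
    distances = np.asarray(distances)
    correct = int(np.sum(distances == 0))  # pairs that match exactly
    state["total_distance"] += float(np.sum(distances))
    state["seq_num"] += seq_num
    state["instance_error"] += seq_num - correct  # pairs with distance != 0
    return state


state = {"total_distance": 0.0, "seq_num": 0, "instance_error": 0}
batch = np.array([[0], [2], [0], [3]])  # shape (batch_size, 1)
accumulate(state, batch, seq_num=batch.shape[0])
print(state)  # {'total_distance': 5.0, 'seq_num': 4, 'instance_error': 2}
```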

eval()

Compute the accumulated metrics. Returns two floats: avg_distance, the average edit distance over all sequence pairs passed to update() since the last reset(); and avg_instance_error, the ratio of those sequence pairs whose edit distance is not zero.
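Given running totals like those described for update(), the two values returned by eval() follow from two divisions. The hypothetical `evaluate` function below is a sketch, not the Paddle source; raising ValueError when nothing has been accumulated is this sketch's own choice for avoiding a division by zero.

```python
def evaluate(total_distance, seq_num, instance_error):
    """Turn accumulated totals into (avg_distance, avg_instance_error).
    Hypothetical helper for illustration, not a Paddle function."""
    if seq_num == 0:
        raise ValueError("no sequence pairs have been accumulated yet")
    avg_distance = total_distance / seq_num
    avg_instance_error = instance_error / float(seq_num)
    return avg_distance, avg_instance_error


# Totals for four pairs with distances 0, 2, 0, 3:
# total_distance = 5, and two of the four pairs are non-zero.
print(evaluate(total_distance=5.0, seq_num=4, instance_error=2))  # (1.25, 0.5)
```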

get_config()

Get the metric name and its current states. The states are the members whose names do not have a “_” prefix.

Parameters

None

Returns

A Python dict containing the internal states of the metric instance.

Return type:

dict
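The “_” prefix rule for deciding which members count as states can be illustrated with a hypothetical class. The sketch only shows the filtering; the dict returned by the real get_config() may be laid out differently (for example, it may also carry the metric name next to the states).

```python
class ToyMetric:
    """Hypothetical stand-in for a metric object; not a Paddle class."""

    def __init__(self, name):
        self._name = name          # "_" prefix: not treated as a state
        self.total_distance = 0.0  # public members: treated as states
        self.seq_num = 0
        self.instance_error = 0

    def states(self):
        # Keep only the members whose names do not start with "_".
        return {k: v for k, v in vars(self).items() if not k.startswith("_")}


print(ToyMetric("EditDistance").states())
# {'total_distance': 0.0, 'seq_num': 0, 'instance_error': 0}
```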

reset()

Reset the accumulated states, discarding the statistics collected from previous mini-batches.

Parameters

None

Returns

None

Return type:

None
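To see what reset() does to later eval() calls, here is a hypothetical pure-Python model of update(), eval(), and reset() that follows the behavior described on this page. It is an illustration under those assumptions, not the Paddle implementation; as in the Examples above, eval() after reset() reflects only the batches added afterwards.

```python
import numpy as np


class ToyEditDistance:
    """Hypothetical model of the documented update/eval/reset behavior;
    it is not the Paddle implementation."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Zero every accumulated state, discarding earlier mini-batches.
        self.total_distance = 0.0
        self.seq_num = 0
        self.instance_error = 0

    def update(self, distances, seq_num):
        distances = np.asarray(distances)
        self.total_distance += float(np.sum(distances))
        self.seq_num += seq_num
        self.instance_error += seq_num - int(np.sum(distances == 0))

    def eval(self):
        return (self.total_distance / self.seq_num,
                self.instance_error / float(self.seq_num))


metric = ToyEditDistance()
metric.update(np.array([[4], [4]]), 2)
metric.reset()                          # forget the first batch
metric.update(np.array([[0], [2]]), 2)
print(metric.eval())                    # (1.0, 0.5)
```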