wmt14

WMT14 dataset. The original WMT14 dataset is too large, so a reduced subset of the data is provided instead. This module downloads the dataset from http://paddlepaddle.bj.bcebos.com/demo/wmt_shrinked_data/wmt14.tgz and parses the training set and test set into paddle reader creators.

paddle.dataset.wmt14.train(dict_size)

WMT14 training set creator.

Returns a reader creator. Each sample produced by the reader consists of three sequences: the source language word IDs, the target language word IDs, and the next word IDs.

Returns

Training reader creator

Return type

callable
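
The reader-creator pattern described above can be sketched without downloading the dataset. In the minimal example below, make_toy_reader and the word IDs are hypothetical stand-ins invented for illustration; they are not part of paddle and do not reflect the real vocabulary or encoding. With the library installed, the callable returned by paddle.dataset.wmt14.train(dict_size) would presumably be used in place of train_reader.

```python
# Hypothetical stand-in for the callable returned by
# paddle.dataset.wmt14.train(dict_size). It mimics the reader-creator
# pattern only; no data is downloaded.
def make_toy_reader(samples):
    def reader():
        # Each sample is three word ID sequences:
        # source, target, and next word.
        for src_ids, trg_ids, trg_next_ids in samples:
            yield src_ids, trg_ids, trg_next_ids
    return reader


# Made-up word IDs, for illustration only.
toy_samples = [
    ([0, 5, 9, 1], [0, 7, 3], [7, 3, 1]),
    ([0, 4, 1], [0, 8], [8, 1]),
]

train_reader = make_toy_reader(toy_samples)

# Calling the reader creator yields a fresh pass over the samples,
# so the same creator can be reused for every epoch.
first_pass = list(train_reader())
second_pass = list(train_reader())

for src_ids, trg_ids, trg_next_ids in first_pass:
    print(len(src_ids), len(trg_ids), len(trg_next_ids))
```

Because the return value is a callable rather than an iterator, each call starts again from the first sample, which is what makes repeated passes over the data possible.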

paddle.dataset.wmt14.test(dict_size)

WMT14 test set creator.

Returns a reader creator. Each sample produced by the reader consists of three sequences: the source language word IDs, the target language word IDs, and the next word IDs.

Returns

Test reader creator

Return type

callable
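
When inspecting a test reader, it can be convenient to look at only the first few samples rather than iterating over the whole set. The sketch below shows one way to do that with the standard library. Both peek and toy_test_reader are hypothetical names introduced here, and the word IDs are made up; the callable returned by paddle.dataset.wmt14.test(dict_size) would presumably be passed to peek in the same way.

```python
from itertools import islice


def peek(reader_creator, n=3):
    """Return at most the first n samples from a reader creator as a list."""
    # reader_creator() starts a new pass; islice stops after n samples.
    return list(islice(reader_creator(), n))


# Hypothetical stand-in for a test reader creator, with made-up word IDs.
def toy_test_reader():
    yield [0, 6, 2, 1], [0, 9, 4], [9, 4, 1]
    yield [0, 3, 1], [0, 5], [5, 1]


preview = peek(toy_test_reader, n=1)
for src_ids, trg_ids, trg_next_ids in preview:
    print("source:", src_ids)
    print("target:", trg_ids)
    print("next word:", trg_next_ids)
```

Asking for more samples than the reader holds is safe: islice simply stops when the reader is exhausted.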