WMT14 dataset. The original WMT14 dataset is too large, so a smaller, shrunken subset of the data is provided instead. This module downloads the dataset from http://paddlepaddle.bj.bcebos.com/demo/wmt_shrinked_data/wmt14.tgz and parses the training set and test set into paddle reader creators.
WMT14 training set creator.
It returns a reader creator. Each sample in the reader consists of three sequences: the source language word ID sequence, the target language word ID sequence, and the next word ID sequence.
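To illustrate the sample layout described above, here is a minimal, self-contained sketch of the reader creator pattern. It does not use paddle or the real dataset. The function name `toy_train_reader_creator`, the constants `START_ID` and `END_ID`, and the word IDs are all hypothetical. The sketch also assumes that the next word sequence is the target sequence shifted by one position and terminated by an end marker, which is a common convention for sequence-to-sequence training but is not stated in the text above.

```python
# Hypothetical special-token IDs, chosen only for this illustration.
START_ID, END_ID = 0, 1


def toy_train_reader_creator(pairs):
    """Return a reader: a zero-argument callable that yields samples.

    `pairs` is a list of (source_ids, target_ids) lists of word IDs.
    """
    def reader():
        for src_ids, trg_ids in pairs:
            # Assumed convention: the target sequence fed to the model
            # starts with a start marker, and the next word sequence is
            # the same target shifted by one, ending with an end marker.
            trg_in = [START_ID] + trg_ids
            trg_next = trg_ids + [END_ID]
            yield src_ids, trg_in, trg_next
    return reader


# Made-up word IDs standing in for two parallel sentences.
pairs = [([5, 6, 7], [8, 9]), ([10, 11], [12, 13, 14])]
reader = toy_train_reader_creator(pairs)
samples = list(reader())

for src_ids, trg_in, trg_next in samples:
    print(src_ids, trg_in, trg_next)
```

Because the creator returns a callable rather than a generator object, the reader can be invoked again to iterate over the data a second time, for example once per training pass.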
Training reader creator
- Return type