This module downloads the IMDB dataset from http://ai.stanford.edu/%7Eamaas/data/sentiment/. The dataset contains 25,000 highly polar movie reviews for training and 25,000 for testing. The module also provides an API for building a word dictionary.
Build a word dictionary from the corpus. Keys of the dictionary are words, and values are zero-based IDs of these words.
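As an illustration of the word-to-ID mapping described above, here is a minimal sketch that builds such a dictionary from a small in-memory corpus. The helper name `build_word_dict`, the `cutoff` parameter, the whitespace tokenization, and the frequency-based ordering are all assumptions made for this example. They are not taken from this module's actual implementation.

```python
from collections import Counter


def build_word_dict(documents, cutoff=0):
    """Map each word to a zero-based ID (hypothetical helper, not this module's API)."""
    # Count lowercase, whitespace-separated tokens across all documents.
    counts = Counter(word for doc in documents for word in doc.lower().split())
    # Assumption: keep words seen more than `cutoff` times, most frequent first,
    # breaking ties alphabetically so the assigned IDs are deterministic.
    kept = sorted(
        (word for word, count in counts.items() if count > cutoff),
        key=lambda word: (-counts[word], word),
    )
    return {word: idx for idx, word in enumerate(kept)}


corpus = ["a great great movie", "a dull movie"]
word_idx = build_word_dict(corpus)
```

In this sketch the IDs run from 0 to `len(word_idx) - 1` with no gaps, so the dictionary size can be used directly as a vocabulary size.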
IMDB training set creator.
It returns a reader creator. Each sample in the reader is a zero-based ID sequence paired with a label that is either 0 or 1.
word_idx (dict) – word dictionary
Training reader creator
- Return type
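The reader-creator pattern described for the training set can be sketched as follows: a function takes a word dictionary and returns a zero-argument reader that yields one `(ID sequence, label)` pair per sample. The name `make_reader_creator`, the in-memory `samples` list, the `unk_id` handling for unknown words, and the label values chosen here are assumptions for illustration only. The real module reads the downloaded reviews instead.

```python
def make_reader_creator(samples, word_idx, unk_id):
    """Return a reader: a zero-argument function yielding (ids, label) pairs.

    Hypothetical helper for illustration; not this module's API.
    """
    def reader():
        for text, label in samples:
            # Convert each token to its zero-based ID; unknown words map to unk_id.
            ids = [word_idx.get(word, unk_id) for word in text.lower().split()]
            yield ids, label

    return reader


word_idx = {"a": 0, "great": 1, "movie": 2, "dull": 3}
unk_id = len(word_idx)  # assumption: all unknown words share one extra ID
samples = [("a great movie", 0), ("a dull film", 1)]  # labels are illustrative

train_reader = make_reader_creator(samples, word_idx, unk_id)
batch = list(train_reader())
```

Because the reader is a function rather than a one-shot iterator, calling `train_reader()` again starts a fresh pass over the samples, which is convenient when training for several epochs.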