imdb

IMDB dataset.

This module downloads the IMDB dataset from http://ai.stanford.edu/%7Eamaas/data/sentiment/. The dataset contains 25,000 highly polar movie reviews for training and 25,000 for testing. The module also provides an API for building a word dictionary from the corpus.

paddle.dataset.imdb.build_dict(pattern, cutoff)

Build a word dictionary from the corpus. Keys of the dictionary are words, and values are zero-based IDs of these words.
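The dictionary-building idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Paddle's implementation. The hypothetical `build_dict_sketch` skips reading the archive (which the real function's `pattern` argument presumably controls) and takes already-tokenized documents instead. It also assumes that `cutoff` is a frequency threshold (only words seen more than `cutoff` times are kept), that IDs go to words in descending order of frequency with ties broken alphabetically, and that one extra ID is reserved for an `<unk>` token.

```python
import collections
from typing import Dict, Iterable, List


def build_dict_sketch(docs: Iterable[List[str]], cutoff: int) -> Dict[str, int]:
    """Map each sufficiently frequent word to a zero-based ID.

    Hypothetical helper: `docs` is an iterable of already-tokenized
    documents, and `cutoff` is assumed to be a minimum-frequency threshold.
    """
    # Count how often each word appears across all documents.
    word_freq = collections.Counter()
    for doc in docs:
        word_freq.update(doc)

    # Keep only words whose count exceeds the cutoff (assumed semantics).
    kept = [(w, c) for w, c in word_freq.items() if c > cutoff]

    # Most frequent words get the smallest IDs; ties are broken alphabetically.
    kept.sort(key=lambda wc: (-wc[1], wc[0]))
    word_idx = {word: idx for idx, (word, _) in enumerate(kept)}

    # Assumption: reserve the next ID for out-of-vocabulary words.
    word_idx["<unk>"] = len(word_idx)
    return word_idx


docs = [
    ["a", "great", "movie"],
    ["a", "bad", "movie"],
    ["a", "great", "plot"],
]
print(build_dict_sketch(docs, cutoff=1))
# {'a': 0, 'great': 1, 'movie': 2, '<unk>': 3}
```

Raising `cutoff` shrinks the vocabulary, so rarer words fall back to the reserved `<unk>` ID when documents are later converted to ID sequences.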

paddle.dataset.imdb.train(word_idx)

IMDB training set creator.

It returns a reader creator. Each sample produced by the reader is a pair consisting of a sequence of zero-based word IDs and a label, which is either 0 or 1.

Parameters

word_idx (dict) – word dictionary mapping each word to its zero-based ID, such as the one returned by build_dict

Returns

Training reader creator

Return type

callable
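A reader creator is a callable that returns a fresh iterator of samples each time it is invoked. The sketch below illustrates that pattern with a tiny in-memory corpus instead of the downloaded archive. The helper name `make_reader_creator`, the sample documents and labels, and the fallback to an `<unk>` entry for unknown words are assumptions for illustration, not the documented behavior of `paddle.dataset.imdb.train`.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Sample = Tuple[List[int], int]


def make_reader_creator(
    corpus: List[Tuple[List[str], int]], word_idx: Dict[str, int]
) -> Callable[[], Iterator[Sample]]:
    """Hypothetical helper mimicking the reader-creator pattern.

    `corpus` holds (tokens, label) pairs; words missing from `word_idx`
    are mapped to the ID of the assumed "<unk>" entry.
    """
    unk = word_idx["<unk>"]
    # Convert every document to a list of zero-based word IDs once, up front.
    samples = [([word_idx.get(w, unk) for w in tokens], label)
               for tokens, label in corpus]

    def reader() -> Iterator[Sample]:
        # Each call to reader() starts a new pass over the samples.
        for ids, label in samples:
            yield ids, label

    return reader


word_idx = {"a": 0, "great": 1, "movie": 2, "<unk>": 3}
corpus = [(["a", "great", "movie"], 0), (["a", "dull", "movie"], 1)]
train_reader = make_reader_creator(corpus, word_idx)
for ids, label in train_reader():
    print(ids, label)
# [0, 1, 2] 0
# [0, 3, 2] 1
```

Because the creator returns a new generator on every call, a training loop can call it once per epoch to make another full pass over the data.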

paddle.dataset.imdb.test(word_idx)

IMDB test set creator.

It returns a reader creator. Each sample produced by the reader is a pair consisting of a sequence of zero-based word IDs and a label, which is either 0 or 1.

Parameters

word_idx (dict) – word dictionary mapping each word to its zero-based ID, such as the one returned by build_dict

Returns

Test reader creator

Return type

callable
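A test reader creator is typically consumed by calling it once to obtain an iterator and comparing a model's predictions against the labels. The sketch below assumes a hypothetical `predict` function that maps a word-ID sequence to 0 or 1, and substitutes a small fake reader creator (`fake_test_reader`, also hypothetical) for the real test set so that it runs without downloading anything.

```python
from typing import Callable, Iterator, List, Tuple

Sample = Tuple[List[int], int]


def evaluate_accuracy(
    reader_creator: Callable[[], Iterator[Sample]],
    predict: Callable[[List[int]], int],
) -> float:
    """Fraction of samples whose predicted label matches the true label.

    Hypothetical helper: `reader_creator` is any callable returning an
    iterator of (word-ID sequence, label) pairs, such as a test reader creator.
    """
    correct = 0
    total = 0
    for ids, label in reader_creator():  # one full pass over the test set
        correct += int(predict(ids) == label)
        total += 1
    return correct / total if total else 0.0


# Stand-in test reader creator with three fake samples (illustrative data only).
def fake_test_reader() -> Iterator[Sample]:
    yield [0, 1, 2], 0
    yield [0, 3, 2], 1
    yield [4, 5], 1


# Toy classifier, purely for illustration: predicts 1 whenever ID 3 appears.
def toy_predict(ids: List[int]) -> int:
    return 1 if 3 in ids else 0


print(evaluate_accuracy(fake_test_reader, toy_predict))
# 0.6666666666666666
```

The evaluation helper only depends on the reader-creator contract (a callable yielding ID-sequence and label pairs), so the same loop would apply unchanged to any reader creator with that shape.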