Dataset

class paddle.io.Dataset

An abstract class to encapsulate methods and behaviors of datasets.

All map-style datasets (datasets whose samples can be retrieved by a given key) should subclass paddle.io.Dataset. Every subclass should implement the following methods:

__getitem__: return the sample at a given index. paddle.io.DataLoader calls this method to read samples from the dataset.

__len__: return the number of samples in the dataset. Some implementations of paddle.io.BatchSampler require this method.

See also paddle.io.DataLoader.

Examples

>>> import numpy as np
>>> from paddle.io import Dataset

>>> # define a random dataset
>>> class RandomDataset(Dataset):
...     def __init__(self, num_samples):
...         self.num_samples = num_samples
...
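...     def __getitem__(self, idx):
...         # a random flat "image" with 784 float32 values
...         image = np.random.random([784]).astype('float32')
...         # randint's upper bound is exclusive, so labels fall in 0..8
...         label = np.random.randint(0, 9, (1, )).astype('int64')
...         return image, label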
...
...     def __len__(self):
...         return self.num_samples
...
>>> dataset = RandomDataset(10)
>>> for i in range(len(dataset)):
...     image, label = dataset[i]
...     # do something
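
The loop above reads one sample at a time. To show why both methods are needed, here is a rough pure-Python sketch of how a loader might use them for sequential batching. It is a simplified illustration, not Paddle's DataLoader or BatchSampler implementation. SquaresDataset and iter_batches are hypothetical names made up for this example, and the class is left as a plain Python class so the snippet runs without Paddle installed; in real code it would subclass paddle.io.Dataset.

```python
# Simplified illustration only; this is NOT Paddle's DataLoader/BatchSampler code.
class SquaresDataset:
    """Hypothetical map-style dataset: sample i is the pair (i, i * i).

    In real Paddle code this class would subclass paddle.io.Dataset.
    """

    def __init__(self, num_samples):
        self.num_samples = num_samples

    def __getitem__(self, idx):
        # Reject out-of-range indices, as a sequence would.
        if not 0 <= idx < self.num_samples:
            raise IndexError(idx)
        return idx, idx * idx

    def __len__(self):
        return self.num_samples


def iter_batches(dataset, batch_size):
    """Yield lists of samples in sequential order (hypothetical helper)."""
    # __len__ tells the sampler how many indices exist.
    indices = list(range(len(dataset)))
    for start in range(0, len(indices), batch_size):
        batch_indices = indices[start:start + batch_size]
        # __getitem__ fetches each sample by its index.
        yield [dataset[i] for i in batch_indices]


batches = list(iter_batches(SquaresDataset(5), batch_size=2))
```

Without __len__ the helper could not enumerate the valid indices, and without __getitem__ it could not turn an index into a sample. In this sketch the last batch is simply shorter when the dataset size is not a multiple of the batch size.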