Imikolov
- class paddle.text.Imikolov(data_file=None, data_type='NGRAM', window_size=-1, mode='train', min_word_freq=50, download=True)

Implementation of the imikolov dataset.

Parameters
- data_file (str) – path to the data tar file. Can be None if download is True. Default: None.
- data_type (str) – 'NGRAM' or 'SEQ'. Default: 'NGRAM'.
- window_size (int) – sliding window size for 'NGRAM' data. Default: -1.
- mode (str) – 'train' or 'test'. Default: 'train'.
- min_word_freq (int) – minimum word frequency for building the word dictionary. Default: 50.
- download (bool) – whether to download the dataset automatically if data_file is not set. Default: True.

Returns
- An instance of the imikolov dataset.

Return type
- Dataset

Examples

    import paddle
    from paddle.text.datasets import Imikolov

    class SimpleNet(paddle.nn.Layer):
        def __init__(self):
            super(SimpleNet, self).__init__()

        def forward(self, src, trg):
            return paddle.sum(src), paddle.sum(trg)

    imikolov = Imikolov(mode='train', data_type='SEQ', window_size=2)

    for i in range(10):
        src, trg = imikolov[i]
        src = paddle.to_tensor(src)
        trg = paddle.to_tensor(trg)

        model = SimpleNet()
        src, trg = model(src, trg)
        print(src.numpy().shape, trg.numpy().shape)
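
To give a rough idea of what min_word_freq and window_size control, the following is a minimal pure-Python sketch of frequency-filtered dictionary building and sliding-window n-gram extraction. It is an illustration only, not the Paddle implementation: the helper names build_word_dict and ngram_windows, the "<unk>" token, the alphabetical id ordering, and the strict "more than min_word_freq" threshold are all assumptions made for this example.

```python
from collections import Counter


def build_word_dict(lines, min_word_freq):
    """Map each sufficiently frequent word to an integer id (hypothetical helper)."""
    counts = Counter(word for line in lines for word in line.split())
    # Assumption: keep only words seen more than min_word_freq times.
    kept = sorted(w for w, c in counts.items() if c > min_word_freq)
    word_idx = {w: i for i, w in enumerate(kept)}
    # Assumption: all filtered-out words share one "<unk>" id.
    word_idx["<unk>"] = len(word_idx)
    return word_idx


def ngram_windows(line, word_idx, window_size):
    """Return every run of window_size consecutive word ids in the line."""
    unk = word_idx["<unk>"]
    ids = [word_idx.get(w, unk) for w in line.split()]
    # Lines shorter than window_size yield an empty list.
    return [tuple(ids[i:i + window_size])
            for i in range(len(ids) - window_size + 1)]


corpus = ["the cat sat", "the cat ran", "the dog sat"]
word_idx = build_word_dict(corpus, min_word_freq=1)
windows = ngram_windows("the cat sat down", word_idx, window_size=2)
print(word_idx)
print(windows)
```

In this sketch, raising min_word_freq shrinks the dictionary and sends more words to the shared "<unk>" id, while a larger window_size produces fewer, longer tuples per line.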
