QueueDataset

class paddle.fluid.dataset.QueueDataset

QueueDataset processes data in a streaming fashion.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
local_shuffle()

Local shuffle data.

Local shuffle is not supported in QueueDataset; NotImplementedError will be raised.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
dataset.local_shuffle()
Raises

NotImplementedError – QueueDataset does not support local shuffle

global_shuffle(fleet=None)

Global shuffle data.

Global shuffle is not supported in QueueDataset; NotImplementedError will be raised.

Parameters

fleet (Fleet) – fleet singleton. Default None.

Examples

import paddle.fluid as fluid
from paddle.fluid.incubate.fleet.parameter_server.pslib import fleet
dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
dataset.global_shuffle(fleet)
Raises

NotImplementedError – QueueDataset does not support global shuffle

desc()

Returns a protobuf message for this DataFeedDesc

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
print(dataset.desc())
Returns

A string message

set_batch_size(batch_size)

Set the batch size. It takes effect during training.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_batch_size(128)
Parameters

batch_size (int) – batch size

set_fea_eval(record_candidate_size, fea_eval=True)

Set fea eval mode for slots shuffle, which is used to debug the importance level of slots (features). fea_eval must be set to True for slots shuffle.

Parameters
  • record_candidate_size (int) – number of candidate instances used to shuffle one slot

  • fea_eval (bool) – whether to enable fea eval mode, which is required for slots shuffle. Default is True.

Examples


import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
dataset.set_fea_eval(1000000, True)

set_filelist(filelist)

Set file list in current worker.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_filelist(['a.txt', 'b.txt'])
Parameters

filelist (list) – file list
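
Because the file list applies to the current worker only, a distributed job may need to hand each worker a different subset of the files. A minimal sketch, assuming the worker index and worker count are already known; the names `shard_filelist`, `worker_id`, and `worker_num` are hypothetical and not part of the Paddle API:

```python
def shard_filelist(filelist, worker_id, worker_num):
    """Return every worker_num-th file starting at worker_id (round-robin split)."""
    if not 0 <= worker_id < worker_num:
        raise ValueError("worker_id must be in [0, worker_num)")
    return filelist[worker_id::worker_num]


all_files = ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"]
# Hypothetical setup: this process is worker 1 of 2.
my_files = shard_filelist(all_files, worker_id=1, worker_num=2)
```

The resulting `my_files` list could then be passed to `dataset.set_filelist(my_files)`.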

set_hdfs_config(fs_name, fs_ugi)

Set the HDFS config: fs name and ugi.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_hdfs_config("my_fs_name", "my_fs_ugi")
Parameters
  • fs_name (str) – fs name

  • fs_ugi (str) – fs ugi

set_pipe_command(pipe_command)

Set the pipe command of the current dataset. A pipe command is a UNIX pipeline command that can be used only

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_pipe_command("python my_script.py")
Parameters

pipe_command (str) – pipe command
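
The contents of `my_script.py` are not specified here. As a minimal sketch, assuming the pipe command receives raw input lines on standard input and writes processed lines to standard output, such a script might look like the following. The `clean_line` helper and the whitespace normalization it performs are hypothetical and chosen only for illustration.

```python
import sys


def clean_line(line):
    """Collapse runs of whitespace; return None for blank lines (hypothetical rule)."""
    fields = line.split()
    if not fields:
        return None
    return " ".join(fields)


def run(stdin=sys.stdin, stdout=sys.stdout):
    # Stream line by line so the whole input is never held in memory.
    for raw in stdin:
        cleaned = clean_line(raw)
        if cleaned is not None:
            stdout.write(cleaned + "\n")

# In an actual script, call run() at module level so that
# "python my_script.py" filters stdin to stdout.
```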

set_thread(thread_num)

Set the thread number, which is the number of readers.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_thread(12)
Parameters

thread_num (int) – thread num

set_use_var(var_list)

Set Variables which you will use.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
# data and label are assumed to be Variables defined earlier in the program
dataset.set_use_var([data, label])
Parameters

var_list (list) – variable list

slots_shuffle(slots)

Slots shuffle is a shuffle method at the slot level, usually used for sparse features with a large number of instances. It evaluates the importance level of slots (features) by comparing a metric, e.g. AUC, against the baseline while shuffling one or several slots.

Parameters

slots (list[string]) – the set of slots(string) to do slots shuffle.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
dataset.set_merge_by_lineid()
# suppose there is a slot 0
dataset.slots_shuffle(['0'])
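
To make the idea concrete, the following is a conceptual sketch in plain Python, not Paddle's implementation. It permutes the values of the chosen slots across instances while leaving every other slot untouched. The dict-based instance layout and the `shuffle_slots` helper are assumptions made for illustration.

```python
import random


def shuffle_slots(instances, slots, seed=None):
    """Return a copy of instances with each named slot's values permuted across instances."""
    rng = random.Random(seed)
    shuffled = [dict(ins) for ins in instances]  # shallow copy of each instance
    for slot in slots:
        values = [ins[slot] for ins in instances]
        rng.shuffle(values)  # break the link between this slot and the labels
        for ins, value in zip(shuffled, values):
            ins[slot] = value
    return shuffled


# Hypothetical instances with two sparse slots and a label.
instances = [
    {"0": [11], "1": [7], "label": 1},
    {"0": [12], "1": [8], "label": 0},
    {"0": [13], "1": [9], "label": 1},
]
shuffled = shuffle_slots(instances, ["0"], seed=42)
```

Evaluating a model on `shuffled` and comparing a metric such as AUC against the unshuffled baseline gives an indication of how much the model relies on slot `0`.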