DataFeedDesc

class paddle.fluid.DataFeedDesc(proto_file)

Data feed descriptor that describes the format of the input training data. This class is currently used only by AsyncExecutor (see the comments for class AsyncExecutor for a brief introduction).

A DataFeedDesc is initialized from a valid protobuf message stored in a file on disk.

See paddle/fluid/framework/data_feed.proto for the message definition. The following example writes a typical message to a file and loads it:

import paddle.fluid as fluid

# Write a data feed description with two sparse uint64 slots to disk.
with open("data.proto", "w") as f:
    f.write('name: "MultiSlotDataFeed"\n')
    f.write('batch_size: 2\n')
    f.write('multi_slot_desc {\n')
    f.write('    slots {\n')
    f.write('        name: "words"\n')
    f.write('        type: "uint64"\n')
    f.write('        is_dense: false\n')
    f.write('        is_used: true\n')
    f.write('    }\n')
    f.write('    slots {\n')
    f.write('        name: "label"\n')
    f.write('        type: "uint64"\n')
    f.write('        is_dense: false\n')
    f.write('        is_used: true\n')
    f.write('    }\n')
    f.write('}\n')

# Load the description from the file.
data_feed = fluid.DataFeedDesc('data.proto')
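
When a description has many slots, writing the message line by line becomes repetitive. The following is a minimal sketch of a helper that builds the same text from a list of slot names and types. The helper name build_data_feed_text and its parameters are hypothetical and not part of Paddle; it only produces a string in the layout shown above, with every slot marked sparse and used.

```python
def build_data_feed_text(slots, batch_size=2, name="MultiSlotDataFeed"):
    """Return a data feed description as text (hypothetical helper).

    `slots` is a list of (slot_name, slot_type) pairs. Every slot is
    written with is_dense: false and is_used: true, as in the example.
    """
    lines = [
        'name: "%s"' % name,
        'batch_size: %d' % batch_size,
        'multi_slot_desc {',
    ]
    for slot_name, slot_type in slots:
        lines.extend([
            '    slots {',
            '        name: "%s"' % slot_name,
            '        type: "%s"' % slot_type,
            '        is_dense: false',
            '        is_used: true',
            '    }',
        ])
    lines.append('}')
    return '\n'.join(lines) + '\n'


text = build_data_feed_text([("words", "uint64"), ("label", "uint64")])
print(text)
```

The returned string can be written to a file such as data.proto and that file name passed to fluid.DataFeedDesc.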

However, users usually do not need to care about the message format. Instead, they are encouraged to use Data Generator to produce a valid data description while converting their raw log files into training files acceptable to AsyncExecutor.

A DataFeedDesc can also be changed at runtime. Once you are familiar with what each field means, you can modify it to better suit your needs. For example:

import paddle.fluid as fluid
data_feed = fluid.DataFeedDesc('data.proto')
data_feed.set_batch_size(128)
data_feed.set_dense_slots(['words'])  # The slot named 'words' will be dense
data_feed.set_use_slots(['words'])    # The slot named 'words' will be used

Finally, the content can be dumped for debugging:

print(data_feed.desc())

Parameters

proto_file (string) – Disk file containing a data feed description.

set_batch_size(batch_size)

Set the batch size in this DataFeedDesc. The batch size can be changed during training.

Example

import paddle.fluid as fluid

# Write a data feed description with two sparse uint64 slots to disk.
with open("data.proto", "w") as f:
    f.write('name: "MultiSlotDataFeed"\n')
    f.write('batch_size: 2\n')
    f.write('multi_slot_desc {\n')
    f.write('    slots {\n')
    f.write('        name: "words"\n')
    f.write('        type: "uint64"\n')
    f.write('        is_dense: false\n')
    f.write('        is_used: true\n')
    f.write('    }\n')
    f.write('    slots {\n')
    f.write('        name: "label"\n')
    f.write('        type: "uint64"\n')
    f.write('        is_dense: false\n')
    f.write('        is_used: true\n')
    f.write('    }\n')
    f.write('}\n')

data_feed = fluid.DataFeedDesc('data.proto')
# Override the batch size of 2 read from the file.
data_feed.set_batch_size(128)

Parameters

batch_size (int) – The new batch size.

Returns

None.

set_dense_slots(dense_slots_name)

Mark the slots listed in dense_slots_name as dense slots. Note: By default, all slots are sparse slots.

Features for a dense slot will be fed into a Tensor, while those for a sparse slot will be fed into a LoDTensor.
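
To make the distinction concrete, here is a conceptual sketch in plain Python that does not use Paddle. It assumes that a sparse slot holds a variable number of features per instance, stored as a flat list of values plus offsets marking where each instance starts and ends, and that a dense slot holds the same number of features per instance, stored as a rectangular batch. The function names pack_sparse and pack_dense are hypothetical.

```python
def pack_sparse(instances):
    """Flatten variable-length feature lists and record their boundaries."""
    values, offsets = [], [0]
    for features in instances:
        values.extend(features)
        offsets.append(len(values))
    return values, offsets


def pack_dense(instances):
    """Stack equal-length feature lists into a rectangular batch."""
    width = len(instances[0])
    if any(len(features) != width for features in instances):
        raise ValueError("a dense slot needs the same feature count per instance")
    return [list(features) for features in instances]


# Two instances of a sparse slot, with 3 and 1 feature ids respectively.
values, offsets = pack_sparse([[11, 12, 13], [21]])
print(values)   # [11, 12, 13, 21]
print(offsets)  # [0, 3, 4]

# Two instances of a dense slot, with 2 features each.
print(pack_dense([[0.1, 0.2], [0.3, 0.4]]))  # [[0.1, 0.2], [0.3, 0.4]]
```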

Example

import paddle.fluid as fluid

# Write a data feed description with two sparse uint64 slots to disk.
with open("data.proto", "w") as f:
    f.write('name: "MultiSlotDataFeed"\n')
    f.write('batch_size: 2\n')
    f.write('multi_slot_desc {\n')
    f.write('    slots {\n')
    f.write('        name: "words"\n')
    f.write('        type: "uint64"\n')
    f.write('        is_dense: false\n')
    f.write('        is_used: true\n')
    f.write('    }\n')
    f.write('    slots {\n')
    f.write('        name: "label"\n')
    f.write('        type: "uint64"\n')
    f.write('        is_dense: false\n')
    f.write('        is_used: true\n')
    f.write('    }\n')
    f.write('}\n')

data_feed = fluid.DataFeedDesc('data.proto')
# Mark the slot named 'words' as dense; 'label' stays sparse.
data_feed.set_dense_slots(['words'])

Parameters

dense_slots_name (list(str)) – A list of names of the slots to mark as dense.

Returns

None.

set_use_slots(use_slots_name)

Set which slots will be used for training. A dataset may contain many features; this function selects the ones to use for a specific model.

Example

import paddle.fluid as fluid

# Write a data feed description with two sparse uint64 slots to disk.
with open("data.proto", "w") as f:
    f.write('name: "MultiSlotDataFeed"\n')
    f.write('batch_size: 2\n')
    f.write('multi_slot_desc {\n')
    f.write('    slots {\n')
    f.write('        name: "words"\n')
    f.write('        type: "uint64"\n')
    f.write('        is_dense: false\n')
    f.write('        is_used: true\n')
    f.write('    }\n')
    f.write('    slots {\n')
    f.write('        name: "label"\n')
    f.write('        type: "uint64"\n')
    f.write('        is_dense: false\n')
    f.write('        is_used: true\n')
    f.write('    }\n')
    f.write('}\n')

data_feed = fluid.DataFeedDesc('data.proto')
# Mark the slot named 'words' as used for training.
data_feed.set_use_slots(['words'])

Parameters

use_slots_name (list(str)) – A list of names of the slots to use in training.

Note

By default, slots are not used.

desc()

Returns the protobuf message for this DataFeedDesc as a string.

Example

import paddle.fluid as fluid

# Write a data feed description with two sparse uint64 slots to disk.
with open("data.proto", "w") as f:
    f.write('name: "MultiSlotDataFeed"\n')
    f.write('batch_size: 2\n')
    f.write('multi_slot_desc {\n')
    f.write('    slots {\n')
    f.write('        name: "words"\n')
    f.write('        type: "uint64"\n')
    f.write('        is_dense: false\n')
    f.write('        is_used: true\n')
    f.write('    }\n')
    f.write('    slots {\n')
    f.write('        name: "label"\n')
    f.write('        type: "uint64"\n')
    f.write('        is_dense: false\n')
    f.write('        is_used: true\n')
    f.write('    }\n')
    f.write('}\n')

data_feed = fluid.DataFeedDesc('data.proto')
# Dump the current description as text.
print(data_feed.desc())

Returns

A string message.
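
When debugging, it can help to check the dumped string programmatically instead of reading it by eye. The sketch below assumes the string uses the same layout as the input file shown in the examples (a slots { ... } block per slot with name, is_dense and is_used fields); the exact whitespace of the real output may differ. The function slot_flags and the sample text are hypothetical and only illustrate the idea.

```python
import re


def slot_flags(desc_text):
    """Map each slot name to its (is_dense, is_used) flags."""
    result = {}
    # Each slot block runs from "slots {" to the next closing brace.
    for block in re.findall(r'slots\s*\{(.*?)\}', desc_text, flags=re.S):
        name = re.search(r'name:\s*"([^"]*)"', block)
        if name is None:
            continue
        is_dense = re.search(r'is_dense:\s*true', block) is not None
        is_used = re.search(r'is_used:\s*true', block) is not None
        result[name.group(1)] = (is_dense, is_used)
    return result


# Sample text standing in for the string returned by desc().
sample = '''name: "MultiSlotDataFeed"
batch_size: 128
multi_slot_desc {
  slots {
    name: "words"
    type: "uint64"
    is_dense: true
    is_used: true
  }
  slots {
    name: "label"
    type: "uint64"
    is_dense: false
    is_used: true
  }
}
'''

print(slot_flags(sample))
```

With a real descriptor, the call would be slot_flags(data_feed.desc()).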