DataFeedDesc
- class paddle.fluid.data_feed_desc.DataFeedDesc(proto_file)
         - Api_attr
           Static Graph
 Data feed descriptor, describing the format of the input training data. This class is currently only used for AsyncExecutor (see the comments for class AsyncExecutor for a brief introduction).

 DataFeedDesc shall be initialized from a valid protobuf message on disk. See paddle/fluid/framework/data_feed.proto for the message definition. A typical message might look like:

     import paddle.fluid as fluid

     # Write a sample MultiSlotDataFeed description to disk.
     with open("data.proto", "w") as f:
         print('name: "MultiSlotDataFeed"', file=f)
         print('batch_size: 2', file=f)
         print('multi_slot_desc {', file=f)
         print('  slots {', file=f)
         print('    name: "words"', file=f)
         print('    type: "uint64"', file=f)
         print('    is_dense: false', file=f)
         print('    is_used: true', file=f)
         print('  }', file=f)
         print('  slots {', file=f)
         print('    name: "label"', file=f)
         print('    type: "uint64"', file=f)
         print('    is_dense: false', file=f)
         print('    is_used: true', file=f)
         print('  }', file=f)
         print('}', file=f)

     data_feed = fluid.DataFeedDesc('data.proto')

 However, users usually do not need to care about the message format; instead, they are encouraged to use Data Generator as a tool to generate a valid data description while converting their raw log files into training files acceptable to AsyncExecutor.

 DataFeedDesc can also be changed at runtime. Once you are familiar with what each field means, you can modify it to better suit your needs. E.g.:

     import paddle.fluid as fluid

     data_feed = fluid.DataFeedDesc('data.proto')
     data_feed.set_batch_size(128)
     data_feed.set_dense_slots(['words'])  # The slot named 'words' will be dense
     data_feed.set_use_slots(['words'])    # The slot named 'words' will be used

 Finally, the content can be dumped out for debugging purposes:

     print(data_feed.desc())

- Parameters
- 
           proto_file (string) – Disk file containing a data feed description. 
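
 The description file can also be produced from plain Python data instead of line-by-line prints. The following is a minimal sketch of that idea: `render_data_feed_proto` and its tuple layout are hypothetical names introduced here for illustration and are not part of paddle.fluid. It only emits the fields that appear in the sample message above.

```python
def render_data_feed_proto(batch_size, slots):
    """Render a MultiSlotDataFeed description in protobuf text format.

    `slots` is a list of (name, type, is_dense, is_used) tuples.
    Hypothetical helper for illustration; not a paddle.fluid function.
    """
    def as_bool(flag):
        # The protobuf text format spells booleans in lowercase.
        return "true" if flag else "false"

    lines = [
        'name: "MultiSlotDataFeed"',
        "batch_size: {}".format(batch_size),
        "multi_slot_desc {",
    ]
    for name, slot_type, is_dense, is_used in slots:
        lines += [
            "  slots {",
            '    name: "{}"'.format(name),
            '    type: "{}"'.format(slot_type),
            "    is_dense: {}".format(as_bool(is_dense)),
            "    is_used: {}".format(as_bool(is_used)),
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


proto_text = render_data_feed_proto(
    batch_size=2,
    slots=[
        ("words", "uint64", False, True),
        ("label", "uint64", False, True),
    ],
)
print(proto_text)
```

 Writing `proto_text` to a file yields a path that could then be passed as `proto_file`.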
 - set_batch_size(batch_size)
           Set batch_size in this DataFeedDesc. batch_size can be changed during training.

           Example

               import paddle.fluid as fluid

               # Write a sample MultiSlotDataFeed description to disk.
               with open("data.proto", "w") as f:
                   print('name: "MultiSlotDataFeed"', file=f)
                   print('batch_size: 2', file=f)
                   print('multi_slot_desc {', file=f)
                   print('  slots {', file=f)
                   print('    name: "words"', file=f)
                   print('    type: "uint64"', file=f)
                   print('    is_dense: false', file=f)
                   print('    is_used: true', file=f)
                   print('  }', file=f)
                   print('  slots {', file=f)
                   print('    name: "label"', file=f)
                   print('    type: "uint64"', file=f)
                   print('    is_dense: false', file=f)
                   print('    is_used: true', file=f)
                   print('  }', file=f)
                   print('}', file=f)

               data_feed = fluid.DataFeedDesc('data.proto')
               data_feed.set_batch_size(128)

- Parameters
- 
             batch_size (int) – The batch size, i.e. the number of instances per batch. 
- Returns
- 
             None. 
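
           To make the effect of batch_size concrete, the sketch below groups a list of instances into consecutive batches in plain Python. It is only an illustration of what a batch size means: `iter_batches` is a hypothetical helper, and keeping a smaller final batch is an assumption of this sketch, not a statement about how Paddle's reader behaves.

```python
def iter_batches(instances, batch_size):
    """Yield consecutive groups of at most `batch_size` instances."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(instances), batch_size):
        yield instances[start:start + batch_size]


instances = ["ins0", "ins1", "ins2", "ins3", "ins4"]

# The same data grouped with two different batch sizes.
small_batches = list(iter_batches(instances, 2))
large_batches = list(iter_batches(instances, 4))
print(len(small_batches), len(large_batches))
```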
 
 - set_dense_slots(dense_slots_name)
           Set the slots named in dense_slots_name as dense slots. Note: by default, all slots are sparse slots.

           Features for a dense slot will be fed into a Tensor, while those for a sparse slot will be fed into a LoDTensor.

           Example

               import paddle.fluid as fluid

               # Write a sample MultiSlotDataFeed description to disk.
               with open("data.proto", "w") as f:
                   print('name: "MultiSlotDataFeed"', file=f)
                   print('batch_size: 2', file=f)
                   print('multi_slot_desc {', file=f)
                   print('  slots {', file=f)
                   print('    name: "words"', file=f)
                   print('    type: "uint64"', file=f)
                   print('    is_dense: false', file=f)
                   print('    is_used: true', file=f)
                   print('  }', file=f)
                   print('  slots {', file=f)
                   print('    name: "label"', file=f)
                   print('    type: "uint64"', file=f)
                   print('    is_dense: false', file=f)
                   print('    is_used: true', file=f)
                   print('  }', file=f)
                   print('}', file=f)

               data_feed = fluid.DataFeedDesc('data.proto')
               data_feed.set_dense_slots(['words'])

- Parameters
- 
             dense_slots_name (list(str)) – A list of names of the slots to be set as dense. 
- Returns
- 
             None. 
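
           The difference between the two layouts can be sketched in plain Python. A dense slot has the same number of values for every instance, so a batch forms a regular table. A sparse slot has a variable number of values per instance, so the values are flattened and accompanied by offsets that mark where each instance starts, which is the general idea behind a LoDTensor. The helpers `pack_dense` and `pack_sparse` are hypothetical and do not reproduce Paddle's internal code.

```python
def pack_sparse(instances):
    """Flatten variable-length feature lists and record start offsets."""
    values, offsets = [], [0]
    for features in instances:
        values.extend(features)
        offsets.append(len(values))  # end of this instance, start of the next
    return values, offsets


def pack_dense(instances, width):
    """Stack fixed-width feature lists into a batch_size x width table."""
    for features in instances:
        if len(features) != width:
            raise ValueError("a dense slot needs exactly %d values per instance" % width)
    return [list(features) for features in instances]


# Sparse slot: three instances with 3, 1 and 2 feature ids.
values, offsets = pack_sparse([[11, 12, 13], [21], [31, 32]])

# Dense slot: two instances with exactly 2 values each.
table = pack_dense([[0.1, 0.2], [0.3, 0.4]], width=2)

# Instance i of the sparse slot is values[offsets[i]:offsets[i + 1]].
second_instance = values[offsets[1]:offsets[2]]
print(values, offsets, table, second_instance)
```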
 
 - set_use_slots(use_slots_name)
           Select which slots will be used for training. A dataset may contain many features; through this function one can select which ones will be used for a specific model.

           Example

               import paddle.fluid as fluid

               # Write a sample MultiSlotDataFeed description to disk.
               with open("data.proto", "w") as f:
                   print('name: "MultiSlotDataFeed"', file=f)
                   print('batch_size: 2', file=f)
                   print('multi_slot_desc {', file=f)
                   print('  slots {', file=f)
                   print('    name: "words"', file=f)
                   print('    type: "uint64"', file=f)
                   print('    is_dense: false', file=f)
                   print('    is_used: true', file=f)
                   print('  }', file=f)
                   print('  slots {', file=f)
                   print('    name: "label"', file=f)
                   print('    type: "uint64"', file=f)
                   print('    is_dense: false', file=f)
                   print('    is_used: true', file=f)
                   print('  }', file=f)
                   print('}', file=f)

               data_feed = fluid.DataFeedDesc('data.proto')
               data_feed.set_use_slots(['words'])

- Parameters
- 
             use_slots_name (list(str)) – A list of names of the slots which will be used in training. 
 Note: By default, no slot is used. 
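
           The selection semantics can be sketched with a small stand-in class. `SlotTable` is hypothetical and exists only for this illustration; it assumes, following the note above, that every slot starts out unused. The check that rejects a bare string is an addition of this sketch, included because a string would otherwise be iterated character by character rather than treated as one slot name.

```python
class SlotTable:
    """Hypothetical stand-in for the slot list of a data feed description."""

    def __init__(self, slot_names):
        # Assumed default, per the note above: no slot is used.
        self.is_used = {name: False for name in slot_names}

    def set_use_slots(self, use_slots_name):
        if isinstance(use_slots_name, str):
            raise TypeError("pass a list of slot names, e.g. ['words']")
        unknown = [name for name in use_slots_name if name not in self.is_used]
        if unknown:
            raise KeyError("unknown slots: %s" % ", ".join(unknown))
        for name in use_slots_name:
            self.is_used[name] = True


table = SlotTable(["words", "label"])
table.set_use_slots(["words"])
print(table.is_used)
```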
 - desc()
           Returns a protobuf message for this DataFeedDesc.

           Example

               import paddle.fluid as fluid

               # Write a sample MultiSlotDataFeed description to disk.
               with open("data.proto", "w") as f:
                   print('name: "MultiSlotDataFeed"', file=f)
                   print('batch_size: 2', file=f)
                   print('multi_slot_desc {', file=f)
                   print('  slots {', file=f)
                   print('    name: "words"', file=f)
                   print('    type: "uint64"', file=f)
                   print('    is_dense: false', file=f)
                   print('    is_used: true', file=f)
                   print('  }', file=f)
                   print('  slots {', file=f)
                   print('    name: "label"', file=f)
                   print('    type: "uint64"', file=f)
                   print('    is_dense: false', file=f)
                   print('    is_used: true', file=f)
                   print('  }', file=f)
                   print('}', file=f)

               data_feed = fluid.DataFeedDesc('data.proto')
               print(data_feed.desc())

- Returns
- 
             A string message 
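
           When debugging, it can help to pull a few fields out of the dumped string rather than read it by eye. The sketch below assumes the string uses the same text layout as the sample message in this page; `used_slot_names` and `sample_desc` are hypothetical names for this illustration. A regular expression is enough for a quick check, although parsing with the protobuf library would be sturdier.

```python
import re


def used_slot_names(desc_text):
    """Return the names of slots whose block contains `is_used: true`."""
    names = []
    # Slot blocks hold only scalar fields, so a non-greedy match up to the
    # first closing brace captures one block.
    for block in re.findall(r"slots\s*\{(.*?)\}", desc_text, flags=re.DOTALL):
        name = re.search(r'name:\s*"([^"]*)"', block)
        used = re.search(r"is_used:\s*(\w+)", block)
        if name and used and used.group(1) == "true":
            names.append(name.group(1))
    return names


# Assumed sample of a dumped description, written by hand for this sketch.
sample_desc = """name: "MultiSlotDataFeed"
batch_size: 128
multi_slot_desc {
  slots {
    name: "words"
    type: "uint64"
    is_dense: true
    is_used: true
  }
  slots {
    name: "label"
    type: "uint64"
    is_dense: false
    is_used: false
  }
}
"""

selected = used_slot_names(sample_desc)
print(selected)
```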
 
 
