MultiSlotStringDataGenerator

class paddle.fluid.incubate.data_generator.MultiSlotStringDataGenerator
generate_batch(samples)
The user overrides this function to process the samples generated by generate_sample(line). It is typically used for batch-level preprocessing, e.g. padding every sample to the length of the longest sample in the batch.
Parameters
samples (list|tuple) – the samples generated by generate_sample
Returns
a Python generator in the same format as the return value of generate_sample
Example
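A minimal sketch of a generate_batch override that pads each sample's "words" slot to the longest one in the batch. To keep it runnable without Paddle, `PaddingGenerator` is a hypothetical plain class; in real code it would subclass MultiSlotStringDataGenerator. The sketch assumes both methods return a zero-argument generator function (named `local_iter` here), the pattern commonly used in Paddle's examples, and uses a hypothetical padding id of 0.

```python
# Hypothetical stand-in class: mimics the generate_sample/generate_batch
# contract described above; it is not Paddle's implementation.
PAD_ID = 0  # hypothetical padding feasign


class PaddingGenerator:
    def generate_sample(self, line):
        def local_iter():
            words = [int(x) for x in line.split()]
            # One sample: a list of (slot name, [feasign, ...]) pairs
            yield [("words", words), ("label", [1])]
        return local_iter

    def generate_batch(self, samples):
        # Longest "words" slot in this batch decides the padded length
        max_len = max(len(dict(s)["words"]) for s in samples)

        def local_iter():
            for sample in samples:
                slots = dict(sample)
                words = slots["words"] + [PAD_ID] * (max_len - len(slots["words"]))
                yield [("words", words), ("label", slots["label"])]
        return local_iter


gen = PaddingGenerator()
lines = ["1 2 3", "4 5", "6"]
samples = [s for line in lines for s in gen.generate_sample(line)()]
padded = list(gen.generate_batch(samples)())
print(padded[1])  # [('words', [4, 5, 0]), ('label', [1])]
```

Padding inside generate_batch rather than generate_sample is what makes the batch maximum available: each call to generate_sample only sees one row.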
generate_sample(line)
This function needs to be overridden by the user to process the original data row into a list or tuple.
Parameters
line (str) – the original data row
Returns
The data processed by the user, as a list or tuple of (name, [feasign, …]) pairs:
[(name, [feasign, …]), …]
or ((name, [feasign, …]), …)
For example: [("words", [1926, 8, 17]), ("label", [1])]
or (("words", [1926, 8, 17]), ("label", [1]))
Note
Feasigns must be of type int or float. If any feasign in a slot is a float, the whole slot is processed as float.
Example
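A sketch of a user-defined generate_sample that turns one raw row into the (name, [feasign, …]) format described above. `MyData` is a plain stand-in class so the snippet runs without Paddle (real code would subclass MultiSlotStringDataGenerator), the tab-separated input row is a hypothetical format, and `slot_type` is a hypothetical helper that only illustrates the float rule from the Note. The sketch uses int feasigns as documented here; since this is the string variant of the generator, check whether your Paddle version instead expects string feasigns such as "1926".

```python
# Hypothetical input row: "<label>\t<space-separated word ids>"


class MyData:
    def generate_sample(self, line):
        def local_iter():
            label, ids = line.rstrip("\n").split("\t")
            # One sample: a list of (slot name, [feasign, ...]) pairs
            yield [("words", [int(x) for x in ids.split()]),
                   ("label", [int(label)])]
        return local_iter


def slot_type(feasigns):
    """Hypothetical helper mirroring the Note: one float makes the slot float."""
    return "float" if any(isinstance(f, float) for f in feasigns) else "int"


sample = next(MyData().generate_sample("1\t1926 8 17\n")())
print(sample)               # [('words', [1926, 8, 17]), ('label', [1])]
print(slot_type([1, 2]))    # int
print(slot_type([1, 2.5]))  # float
```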
run_from_memory()
This function generates data from memory; it is typically used for debugging and benchmarking.
Example
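The following is a hypothetical stand-in that mimics the flow described here, not Paddle's implementation. Samples come from memory (generate_sample receives None because there is no input row, an assumption of this sketch), are grouped into batches of `batch_size`, passed through generate_batch, and written out in an assumed MultiSlot text layout where each slot becomes its feasign count followed by its feasigns. The output goes to a `StringIO` instead of stdout so it can be inspected.

```python
import io
import sys


class MemoryGeneratorSketch:
    """Hypothetical stand-in mimicking the described contract; not Paddle code."""

    def __init__(self, batch_size=2):
        self.batch_size = batch_size

    def generate_sample(self, line):
        # line is unused: the data comes from memory, not from a file or stdin
        def local_iter():
            for words in ([1, 2, 3], [4, 5], [6]):
                yield [("words", words), ("label", [1])]
        return local_iter

    def generate_batch(self, samples):
        def local_iter():
            yield from samples  # pass samples through unchanged
        return local_iter

    def format_sample(self, sample):
        # Assumed layout: for each slot, its length followed by its feasigns
        parts = []
        for _name, feasigns in sample:
            parts.append(str(len(feasigns)))
            parts.extend(str(f) for f in feasigns)
        return " ".join(parts) + "\n"

    def run_from_memory(self, out=sys.stdout):
        batch = []
        for sample in self.generate_sample(None)():
            batch.append(sample)
            if len(batch) == self.batch_size:
                self._flush(batch, out)
                batch = []
        if batch:  # emit the final, possibly smaller, batch
            self._flush(batch, out)

    def _flush(self, batch, out):
        for sample in self.generate_batch(batch)():
            out.write(self.format_sample(sample))


buf = io.StringIO()
MemoryGeneratorSketch(batch_size=2).run_from_memory(out=buf)
print(buf.getvalue(), end="")
# 3 1 2 3 1 1
# 2 4 5 1 1
# 1 6 1 1
```

Because nothing is read from disk or stdin, timing a loop like this isolates the cost of the user's parsing and batching code, which is what makes the in-memory path convenient for debugging and benchmarking.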
run_from_stdin()
This function reads data rows from stdin, parses each row with the user-defined processing function (generate_sample), and converts the returned value with the _gen_str function. The converted data is written to stdout, and the corresponding protofile is generated.
Example
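In a real script you would subclass MultiSlotStringDataGenerator, implement generate_sample and call run_from_stdin(). The sketch below reproduces only the read, parse and write loop with hypothetical names (`generate_sample` as a free function, `format_multislot`, `run_from_stream`) so it runs without Paddle; it omits batching for brevity. The output layout (each slot as its feasign count followed by the feasigns) is an assumption standing in for what _gen_str produces, and the whitespace-separated input row is a hypothetical format. The stream is a parameter so the loop can be exercised without real stdin.

```python
import io
import sys


def generate_sample(line):
    # Hypothetical row format: "<label> <word id> <word id> ..."
    def local_iter():
        label, *words = line.split()
        yield [("words", [int(w) for w in words]), ("label", [int(label)])]
    return local_iter


def format_multislot(sample):
    # Assumed layout per slot: "<number of feasigns> <feasign> ..."
    parts = []
    for _name, feasigns in sample:
        parts.append(str(len(feasigns)))
        parts.extend(str(f) for f in feasigns)
    return " ".join(parts)


def run_from_stream(stream=sys.stdin, out=sys.stdout):
    # Same idea as run_from_stdin: parse each row, convert it, write it out
    for line in stream:
        if not line.strip():
            continue  # skip blank rows
        for sample in generate_sample(line)():
            out.write(format_multislot(sample) + "\n")


buf = io.StringIO()
run_from_stream(io.StringIO("1 1926 8 17\n0 42\n"), out=buf)
print(buf.getvalue(), end="")
# 3 1926 8 17 1 1
# 1 42 1 0
```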
set_batch(batch_size)
Sets the batch size of the current DataGenerator. This is necessary only if the user overrides generate_batch.
Example
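In real code the call is simply `set_batch(batch_size)` on your generator instance, e.g. `mydata.set_batch(128)`. Presumably the batch size controls how many samples are collected before each call to generate_batch; the sketch below illustrates that grouping with a hypothetical helper, `split_into_batches`, which is not part of Paddle. The last batch may be smaller than `batch_size`.

```python
from itertools import islice


def split_into_batches(samples, batch_size):
    """Hypothetical helper: group samples into lists of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(samples)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


samples = [[("words", [i])] for i in range(5)]
sizes = [len(b) for b in split_into_batches(samples, batch_size=2)]
print(sizes)  # [2, 2, 1]
```

This is why the batch size only matters when generate_batch is overridden: with the pass-through default, grouping samples has no visible effect on the output.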