MultiSlotStringDataGenerator

class paddle.fluid.incubate.data_generator.MultiSlotStringDataGenerator

generate_batch(samples)

This function needs to be overridden by the user to process the samples generated by generate_sample(self, str). It is typically used for batch-level preprocessing, for example padding every sample to the maximum length found in the batch.

Parameters

samples (list or tuple) – the samples generated by generate_sample

Returns

A Python generator, in the same format as the return value of generate_sample.

Example

.. code-block:: python

    import paddle.fluid.incubate.data_generator as dg

    class MyData(dg.DataGenerator):

        def generate_sample(self, line):
            def local_iter():
                int_words = [int(x) for x in line.split()]
                # One sample is a list of (name, [feasign, ...]) pairs.
                yield [("words", int_words)]
            return local_iter

        def generate_batch(self, samples):
            def local_iter():
                for s in samples:
                    name, words = s[0]
                    # list.extend() returns None, so build a new list that
                    # repeats the first word at the end instead.
                    yield [(name, words + words[:1])]
            return local_iter

    mydata = MyData()
    mydata.set_batch(128)

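
The padding use case mentioned in the description of generate_batch can be sketched in plain Python, without importing Paddle. The helper name `pad_batch` and the pad value `0` are assumptions made for illustration, and each sample is assumed to follow the `[(name, [feasign, ...]), ...]` format returned by generate_sample.

```python
def pad_batch(samples, pad_value=0):
    """Pad every slot in a batch to the longest length seen for that slot.

    `samples` is a list of samples, each shaped like
    [(name, [feasign, ...]), ...]. `pad_batch` and `pad_value` are
    hypothetical names, not part of the Paddle API.
    """
    # Longest feasign list per slot name across the whole batch.
    max_len = {}
    for sample in samples:
        for name, feasigns in sample:
            max_len[name] = max(max_len.get(name, 0), len(feasigns))

    def local_iter():
        for sample in samples:
            # Build new lists instead of mutating the input samples.
            yield [
                (name, list(feasigns) + [pad_value] * (max_len[name] - len(feasigns)))
                for name, feasigns in sample
            ]

    return local_iter


batch = [
    [("words", [1, 2, 3]), ("label", [1])],
    [("words", [4]), ("label", [0])],
]
padded = list(pad_batch(batch)())
```

Under these assumptions, a generate_batch override could simply return `pad_batch(samples)`, because the helper already returns a generator function of the same shape as the one returned by generate_sample.
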
generate_sample(line)

This function needs to be overridden by the user to process the original data row into a list or tuple.

Parameters

line (str) – the original data row

Returns

Returns the data processed by the user.

The data format is list or tuple:

[(name, [feasign, …]), …]

or ((name, [feasign, …]), …)

For example: [("words", [1926, 8, 17]), ("label", [1])]

or (("words", [1926, 8, 17]), ("label", [1]))

Note

Feasigns must be of type int or float. If any float element appears among the feasigns of a slot, the whole slot is processed as float.

Example

.. code-block:: python

    import paddle.fluid.incubate.data_generator as dg

    class MyData(dg.DataGenerator):

        def generate_sample(self, line):
            def local_iter():
                int_words = [int(x) for x in line.split()]
                # Yield a list of (name, [feasign, ...]) pairs, as described above.
                yield [("words", int_words)]
            return local_iter

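
The return format and the int/float rule from the note can be checked with a small standalone helper. This is only a sketch: `infer_slot_types` is a hypothetical name, and the labels "int" and "float" are illustrative rather than the type names Paddle uses internally.

```python
def infer_slot_types(sample):
    """Return {slot_name: "int" or "float"} for one parsed sample.

    A slot is reported as float as soon as any of its feasigns is a float,
    mirroring the rule in the note. Hypothetical helper, not Paddle API.
    """
    if not isinstance(sample, (list, tuple)):
        raise ValueError("sample must be a list or tuple of (name, feasigns)")
    types = {}
    for name, feasigns in sample:
        for value in feasigns:
            if not isinstance(value, (int, float)):
                raise ValueError(
                    f"feasign {value!r} in slot {name!r} is not int or float"
                )
        has_float = any(isinstance(v, float) for v in feasigns)
        types[name] = "float" if has_float else "int"
    return types


slot_types = infer_slot_types([("words", [1926, 8, 17]), ("score", [1, 0.5])])
```

Running a check like this on a few parsed rows before a full preprocessing job may catch malformed samples early.
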
run_from_memory()

This function generates data from memory. It is usually used for debugging and benchmarking.

Example

.. code-block:: python

    import paddle.fluid.incubate.data_generator as dg

    class MyData(dg.DataGenerator):

        def generate_sample(self, line):
            def local_iter():
                # The sample is built in memory, so `line` is not used here.
                yield [("words", [1, 2, 3, 4])]
            return local_iter

    mydata = MyData()
    mydata.run_from_memory()

run_from_stdin()

This function reads data rows from stdin, parses each row with the process function, and then parses the return value of the process function with the _gen_str function. The parsed data is written to stdout and the corresponding protofile is generated.

Example

.. code-block:: python

    import paddle.fluid.incubate.data_generator as dg

    class MyData(dg.DataGenerator):

        def generate_sample(self, line):
            def local_iter():
                int_words = [int(x) for x in line.split()]
                # Yield a list of (name, [feasign, ...]) pairs.
                yield [("words", int_words)]
            return local_iter

    mydata = MyData()
    mydata.run_from_stdin()

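
The read, parse, serialize, write flow described for run_from_stdin can be sketched without Paddle. Everything here is an assumption made for illustration: `run_pipeline` and `gen_str` are hypothetical stand-ins, and the "count followed by values" text layout is one plausible serialization, not necessarily what _gen_str emits. No protofile is produced.

```python
import io


def generate_sample(line):
    """Parse one whitespace-separated row into [(name, [feasign, ...])]."""
    def local_iter():
        int_words = [int(x) for x in line.split()]
        yield [("words", int_words)]
    return local_iter


def gen_str(sample):
    """Hypothetical serializer: '<count> <v1> <v2> ...' for each slot."""
    parts = []
    for _name, feasigns in sample:
        parts.append(str(len(feasigns)))
        parts.extend(str(v) for v in feasigns)
    return " ".join(parts) + "\n"


def run_pipeline(in_stream, out_stream):
    """Read rows, parse each one, and write the serialized result."""
    for line in in_stream:
        for sample in generate_sample(line)():
            out_stream.write(gen_str(sample))


# StringIO objects stand in for sys.stdin and sys.stdout.
fake_stdin = io.StringIO("1 2 3\n7 8\n")
fake_stdout = io.StringIO()
run_pipeline(fake_stdin, fake_stdout)
result = fake_stdout.getvalue()
```

Passing streams as arguments instead of reading `sys.stdin` directly keeps the sketch easy to test; a real script would pass `sys.stdin` and `sys.stdout`.
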
set_batch(batch_size)

Sets the batch size of the current DataGenerator. This is necessary only if a user wants to define generate_batch.

Example

.. code-block:: python

    import paddle.fluid.incubate.data_generator as dg

    class MyData(dg.DataGenerator):

        def generate_sample(self, line):
            def local_iter():
                int_words = [int(x) for x in line.split()]
                # One sample is a list of (name, [feasign, ...]) pairs.
                yield [("words", int_words)]
            return local_iter

        def generate_batch(self, samples):
            def local_iter():
                for s in samples:
                    name, words = s[0]
                    # list.extend() returns None, so build a new list that
                    # repeats the first word at the end instead.
                    yield [(name, words + words[:1])]
            return local_iter

    mydata = MyData()
    mydata.set_batch(128)
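
How a batch size might interact with generate_batch can be sketched as follows. The sketch assumes that samples are buffered until batch_size is reached and that a final partial batch is flushed at the end; `iter_batches` and `append_first_word` are hypothetical helpers, and the real DataGenerator may group samples differently.

```python
def iter_batches(samples, batch_size, generate_batch):
    """Group samples into chunks of `batch_size` and post-process each chunk."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    buffer = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) == batch_size:
            # generate_batch returns a generator function, so call the result.
            yield from generate_batch(buffer)()
            buffer = []
    if buffer:
        # Flush the last, possibly smaller, batch.
        yield from generate_batch(buffer)()


def append_first_word(samples):
    """Example generate_batch: repeat each slot's first feasign at the end."""
    def local_iter():
        for sample in samples:
            yield [(name, words + words[:1]) for name, words in sample]
    return local_iter


samples = [[("words", [i, i + 1])] for i in range(5)]
batched = list(iter_batches(samples, 2, append_first_word))
```

With five samples and a batch size of 2, the hypothetical `append_first_word` is called three times: twice on full batches and once on the single leftover sample.
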