io

batch

create_py_reader_by_data

paddle.fluid.layers.create_py_reader_by_data(capacity, feed_list, name=None, use_double_buffer=True)[source]

The OP creates a Python reader for data feeding in Python. It is similar to py_reader, except that it can read data from the list of feed variables.

Parameters
  • capacity (int) – The buffer capacity maintained by py_reader. Its unit is batch number. Set a larger capacity if the reader is fast.

  • feed_list (list(Variable)) – The feed variables, which are usually created by fluid.data().

  • name (str, optional) – Normally there is no need for user to set this property. For more information, please refer to api_guide_Name. Default: None.

  • use_double_buffer (bool, optional) – Whether to use double buffer. If it's True, the OP prefetches the next batch of data asynchronously. Default: True.

Returns

A Reader for data feeding. The data types of the read data are the same as the data types of the variables in feed_list.

Return type

Reader

Examples

import paddle
import paddle.fluid as fluid
import paddle.dataset.mnist as mnist

def network(img, label):
    # User defined network. Here a simple regression as example
    predict = fluid.layers.fc(input=img, size=10, act='softmax')
    loss = fluid.layers.cross_entropy(input=predict, label=label)
    return fluid.layers.mean(loss)

MEMORY_OPT = False
USE_CUDA = False

image = fluid.data(name='image', shape=[None, 1, 28, 28], dtype='float32')
label = fluid.data(name='label', shape=[None, 1], dtype='int64')
reader = fluid.layers.create_py_reader_by_data(capacity=64,
                                               feed_list=[image, label])
reader.decorate_paddle_reader(
    paddle.reader.shuffle(paddle.batch(mnist.train(), batch_size=5), buf_size=500))
img, label = fluid.layers.read_file(reader)
loss = network(img, label) # The definition of the custom network and the loss function

place = fluid.CUDAPlace(0) if USE_CUDA else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

build_strategy = fluid.BuildStrategy()
build_strategy.memory_optimize = True if MEMORY_OPT else False
exec_strategy = fluid.ExecutionStrategy()
compiled_prog = fluid.compiler.CompiledProgram(
    fluid.default_main_program()).with_data_parallel(
        loss_name=loss.name,
        build_strategy=build_strategy,
        exec_strategy=exec_strategy)

for epoch_id in range(2):
    reader.start()
    try:
        while True:
            exe.run(compiled_prog, fetch_list=[loss.name])
    except fluid.core.EOFException:
        reader.reset()

data

paddle.fluid.layers.data(name, shape, append_batch_size=True, dtype='float32', lod_level=0, type=VarType.LOD_TENSOR, stop_gradient=True)[source]

Data Layer

This operator creates a global variable. The global variable can be accessed by all the operators that follow it in the graph.

Note

paddle.fluid.layers.data is deprecated as it will be removed in a later version. Please use paddle.fluid.data.

paddle.fluid.layers.data sets the shape and dtype at compile time but does NOT check the shape or dtype of the fed data, whereas paddle.fluid.data checks the shape and dtype of the data fed by Executor or ParallelExecutor at run time.
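
For example, both declarations below create a float32 input variable, but only the fluid.data variant will reject mismatched feeds at run time. This is a minimal sketch; the variable names are illustrative only:

import paddle.fluid as fluid

# Deprecated style: shape and dtype are fixed at compile time, but the fed
# data is NOT checked against them.
x_old = fluid.layers.data(name='x_old', shape=[784], dtype='float32')

# Recommended style: Executor / ParallelExecutor checks the shape and dtype
# of the fed data at run time.
x_new = fluid.data(name='x_new', shape=[None, 784], dtype='float32')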

Parameters
  • name (str) – The name/alias of the variable, see api_guide_Name for more details.

  • shape (list) – List declaring the shape. If append_batch_size is True and there is no -1 inside shape, it should be considered as the shape of each sample. Otherwise, it should be considered as the shape of the batched data.

  • append_batch_size (bool) –

    1. If True, it prepends -1 to the shape. For example, if shape=[1], the resulting shape is [-1, 1]. This is useful for setting a different batch size at run time.

    2. If shape already contains -1, such as shape=[1, -1], append_batch_size will be enforced to be False (ineffective), because PaddlePaddle cannot set more than one unknown number in the shape.

  • dtype (np.dtype|VarType|str) – The type of the data. Supported dtype: bool, float16, float32, float64, int8, int16, int32, int64, uint8.

  • type (VarType) – The output type. Supported types: VarType.LOD_TENSOR, VarType.SELECTED_ROWS, VarType.NCCL_ID. Default: VarType.LOD_TENSOR.

  • lod_level (int) – The LoD Level. 0 means the input data is not a sequence. Default: 0.

  • stop_gradient (bool) – Whether to stop gradients from being computed for this variable. Default: True.

Returns

The global variable that gives access to the data.

Return type

Variable

Examples

import paddle.fluid as fluid
data = fluid.layers.data(name='x', shape=[784], dtype='float32')
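
The effect of append_batch_size can be seen directly on the compiled shape of the returned variable. The snippet below is a small sketch; the shapes in the comments are the expected values and the variable names are arbitrary:

import paddle.fluid as fluid

# append_batch_size=True (the default): -1 is prepended as the batch dimension.
x1 = fluid.layers.data(name='x1', shape=[784], dtype='float32')
print(x1.shape)  # expected: (-1, 784)

# shape already contains -1: append_batch_size is forced to False and the
# shape is used as the shape of the batched data.
x2 = fluid.layers.data(name='x2', shape=[1, -1], dtype='float32')
print(x2.shape)  # expected: (1, -1)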

double_buffer

paddle.fluid.layers.double_buffer(reader, place=None, name=None)[source]

Wrap the reader with a double buffer. The Reader class contains DecoratedReader and FileReader; DecoratedReader is in turn inherited by CustomReader and BufferedReader, and this function is related to BufferedReader. The data is copied to the target place through a double-buffered queue. If the target place is None, the place the executor runs on is used.

Parameters
  • reader (Variable) – The Reader Variable to be wrapped.

  • place (Place, optional) – The place where the target data is put, such as CPU or GPU; if a GPU is used, it is necessary to specify which card is involved. The default is the place the executor runs on.

  • name (str, optional) – Variable name. Normally there is no need for user to set this property. For more information, please refer to api_guide_Name. Default is None.

Returns

The wrapped reader with a double buffer.

Return type

Variable(Reader)

Examples

import paddle.fluid as fluid
reader = fluid.layers.py_reader(capacity=64,
                                shapes=[(-1, 1, 28, 28), (-1, 1)],
                                dtypes=['float32', 'int64'],
                                use_double_buffer=False)
reader = fluid.layers.double_buffer(reader)
image, label = fluid.layers.read_file(reader)
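
If the prefetched data should be placed on a specific device, the place argument can be passed explicitly. The following is a sketch only, assuming a GPU (card 0) is available and that place accepts a fluid.CUDAPlace instance:

import paddle.fluid as fluid

reader = fluid.layers.py_reader(capacity=64,
                                shapes=[(-1, 1, 28, 28), (-1, 1)],
                                dtypes=['float32', 'int64'],
                                use_double_buffer=False)
# Prefetch batches onto GPU card 0 through the double-buffered queue
# (assumes CUDA is available).
reader = fluid.layers.double_buffer(reader, place=fluid.CUDAPlace(0))
image, label = fluid.layers.read_file(reader)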

load

paddle.fluid.layers.load(out, file_path, load_as_fp16=None)[source]

The load operator loads a LoDTensor / SelectedRows variable from a disk file.

Parameters
  • out (Variable) – The LoDTensor / SelectedRows variable to be loaded.

  • file_path (str) – The path of the disk file from which the variable will be loaded.

  • load_as_fp16 (bool, optional) – If True, the tensor will first be loaded and then converted to float16. Otherwise, the tensor will be loaded directly without data type conversion. Default is False.

Returns

None

Examples

import paddle.fluid as fluid
tmp_tensor = fluid.layers.create_tensor(dtype='float32')
fluid.layers.load(tmp_tensor, "./tmp_tensor.bin")
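
If the tensor on disk was saved in float32 but should be used in half precision, load_as_fp16 converts it while loading. This is a sketch only, reusing the path from the example above:

import paddle.fluid as fluid

# Target variable declared as float16; load_as_fp16=True converts the stored
# float32 data to float16 while loading (per the parameter description above).
fp16_tensor = fluid.layers.create_tensor(dtype='float16')
fluid.layers.load(fp16_tensor, "./tmp_tensor.bin", load_as_fp16=True)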

open_files

Preprocessor

py_reader

paddle.fluid.layers.py_reader(capacity, shapes, dtypes, lod_levels=None, name=None, use_double_buffer=True)[source]

Create a Python reader for data feeding in Python.

This operator returns a Reader Variable. The Reader provides decorate_paddle_reader() and decorate_tensor_provider() to set a Python generator as the data source and feed the data from that source into the Reader Variable. When Executor::Run() is invoked on the C++ side, the data from the generator is read automatically. Unlike DataFeeder.feed(), with py_reader the data reading process and the Executor::Run() process can run in parallel. The start() method of the Reader should be called when each pass begins, and the reset() method should be called when the pass ends and fluid.core.EOFException is raised.

Note

The Program.clone() method cannot clone py_reader. You can refer to Program for more details.

The read_file call needs to be in the same program block as py_reader. You can refer to read_file for more details.

Parameters
  • capacity (int) – The buffer capacity maintained by py_reader.

  • shapes (list|tuple) – List of tuples declaring the data shapes. shapes[i] represents the shape of the i-th data.

  • dtypes (list|tuple) – List of strings declaring the data types. Supported dtype: bool, float16, float32, float64, int8, int16, int32, int64, uint8.

  • lod_levels (list|tuple) – List of ints declaring the lod_level of each data. Default: None.

  • name (basestring) – The default value is None. Normally there is no need for user to set this property. For more information, please refer to api_guide_Name.

  • use_double_buffer (bool) – Whether to use double buffer or not. The double buffer pre-reads the data of the next batch and copies it asynchronously from CPU to GPU. Default is True.

Returns

A Reader from which we can get feeding data.

Return type

Variable

Examples

  1. The basic usage of py_reader is as follows:

import paddle
import paddle.fluid as fluid
import paddle.dataset.mnist as mnist

def network(image, label):
    # user defined network, here a softmax regression example
    predict = fluid.layers.fc(input=image, size=10, act='softmax')
    return fluid.layers.cross_entropy(input=predict, label=label)

reader = fluid.layers.py_reader(capacity=64,
                                shapes=[(-1, 1, 28, 28), (-1, 1)],
                                dtypes=['float32', 'int64'])
reader.decorate_paddle_reader(
    paddle.reader.shuffle(paddle.batch(mnist.train(), batch_size=5),
                          buf_size=1000))

img, label = fluid.layers.read_file(reader)
loss = network(img, label)

fluid.Executor(fluid.CUDAPlace(0)).run(fluid.default_startup_program())
exe = fluid.ParallelExecutor(use_cuda=True)
for epoch_id in range(10):
    reader.start()
    try:
        while True:
            exe.run(fetch_list=[loss.name])
    except fluid.core.EOFException:
        reader.reset()

fluid.io.save_inference_model(dirname='./model',
                              feeded_var_names=[img.name, label.name],
                              target_vars=[loss],
                              executor=fluid.Executor(fluid.CUDAPlace(0)))

  2. When training and testing are both performed, two different py_reader objects should be created with different names, e.g.:

import paddle
import paddle.fluid as fluid
import paddle.dataset.mnist as mnist

def network(reader):
    img, label = fluid.layers.read_file(reader)
    # User defined network. Here a simple regression as example
    predict = fluid.layers.fc(input=img, size=10, act='softmax')
    loss = fluid.layers.cross_entropy(input=predict, label=label)
    return fluid.layers.mean(loss)

# Create train_main_prog and train_startup_prog
train_main_prog = fluid.Program()
train_startup_prog = fluid.Program()
with fluid.program_guard(train_main_prog, train_startup_prog):
    # Use fluid.unique_name.guard() to share parameters with test program
    with fluid.unique_name.guard():
        train_reader = fluid.layers.py_reader(capacity=64,
                                              shapes=[(-1, 1, 28, 28),
                                                      (-1, 1)],
                                              dtypes=['float32', 'int64'],
                                              name='train_reader')
        train_reader.decorate_paddle_reader(
            paddle.reader.shuffle(paddle.batch(mnist.train(), batch_size=5),
                                  buf_size=500))
        train_loss = network(train_reader)  # some network definition
        adam = fluid.optimizer.Adam(learning_rate=0.01)
        adam.minimize(train_loss)

# Create test_main_prog and test_startup_prog
test_main_prog = fluid.Program()
test_startup_prog = fluid.Program()
with fluid.program_guard(test_main_prog, test_startup_prog):
    # Use fluid.unique_name.guard() to share parameters with train program
    with fluid.unique_name.guard():
        test_reader = fluid.layers.py_reader(capacity=32,
                                             shapes=[(-1, 1, 28, 28), (-1, 1)],
                                             dtypes=['float32', 'int64'],
                                             name='test_reader')
        test_reader.decorate_paddle_reader(paddle.batch(mnist.test(), 512))
        test_loss = network(test_reader)

fluid.Executor(fluid.CUDAPlace(0)).run(train_startup_prog)
fluid.Executor(fluid.CUDAPlace(0)).run(test_startup_prog)

train_exe = fluid.ParallelExecutor(use_cuda=True,
                                   loss_name=train_loss.name,
                                   main_program=train_main_prog)
test_exe = fluid.ParallelExecutor(use_cuda=True,
                                  loss_name=test_loss.name,
                                  main_program=test_main_prog)
for epoch_id in range(10):
    train_reader.start()
    try:
        while True:
            train_exe.run(fetch_list=[train_loss.name])
    except fluid.core.EOFException:
        train_reader.reset()

    test_reader.start()
    try:
        while True:
            test_exe.run(fetch_list=[test_loss.name])
    except fluid.core.EOFException:
        test_reader.reset()
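
  3. Besides decorate_paddle_reader, the Reader also provides decorate_tensor_provider (mentioned above), which takes a callable returning a generator of ready-made batches. The snippet below is a minimal sketch, under the assumption that each yielded item is a list of numpy arrays matching shapes and dtypes:

import numpy as np
import paddle.fluid as fluid

reader = fluid.layers.py_reader(capacity=8,
                                shapes=[(-1, 1, 28, 28), (-1, 1)],
                                dtypes=['float32', 'int64'],
                                name='np_reader')

def random_batches():
    # Yield a few synthetic batches; each yielded item is a list whose
    # entries match the shapes/dtypes declared above (assumption).
    for _ in range(4):
        img = np.random.random(size=(2, 1, 28, 28)).astype('float32')
        lbl = np.random.randint(0, 10, size=(2, 1)).astype('int64')
        yield [img, lbl]

reader.decorate_tensor_provider(random_batches)
img, label = fluid.layers.read_file(reader)
# The start()/reset() pattern shown in the examples above applies unchanged.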

random_data_generator

read_file

paddle.fluid.layers.read_file(reader)[source]

Execute the given reader and get data from it.

A reader is also a Variable. It can be a raw reader generated by fluid.layers.open_files() or a decorated one generated by fluid.layers.double_buffer().

Parameters

reader (Variable) – The reader to execute.

Returns

Data read from the given reader.

Return type

Tuple[Variable]

Examples

import paddle.fluid as fluid
reader = fluid.layers.py_reader(capacity=64,
                                shapes=[(-1, 1, 28, 28), (-1, 1)],
                                dtypes=['float32', 'int64'])
image, label = fluid.layers.read_file(reader)

shuffle