ExecutionStrategy

class paddle.fluid.ExecutionStrategy

ExecutionStrategy allows the user to more preciously control how to run the program in ParallelExecutor by setting the property.

Examples

import paddle.fluid as fluid
x = fluid.layers.data(name='x', shape=[13], dtype='float32')
y = fluid.layers.data(name='y', shape=[1], dtype='float32')
y_predict = fluid.layers.fc(input=x, size=1, act=None)

cost = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_loss = fluid.layers.mean(cost)

sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
sgd_optimizer.minimize(avg_loss)

exec_strategy = fluid.ExecutionStrategy()
exec_strategy.num_threads = 4

train_exe = fluid.ParallelExecutor(use_cuda=False,
                                   loss_name=avg_loss.name,
                                   exec_strategy=exec_strategy)
allow_op_delay

The type is BOOL, allow_op_delay represents whether to delay the communication operators to run, it may make the execution faster. Note that this option is invalid now, and it will be removed in next version. Default False.

num_iteration_per_drop_scope

The type is INT, num_iteration_per_drop_scope indicates how many iterations to clean up the temp variables which is generated during execution. It may make the execution faster, because the temp variable’s shape maybe the same between two iterations. Default 1.

Notes

  1. If you fetch data when calling the ‘run’, the ParallelExecutor will clean up the temp variables at the end of the current iteration.

  2. In some NLP model, it may cause the GPU memory is insufficient, in this case, you should reduce num_iteration_per_drop_scope.

num_iteration_per_run

This config that how many iteration the executor will run when user call exe.run() in python

num_threads

The type is INT, num_threads represents the size of thread pool that used to run the operators of the current program in ParallelExecutor. If \(num\_threads=1\), all the operators will execute one by one, but the order maybe difference between iterations. If it is not set, it will be set in ParallelExecutor according to the device type and device count, for GPU, \(num\_threads=device\_count*4\), for CPU, \(num\_threads=CPU\_NUM*4\), the explanation of:math:CPU_NUM is in ParallelExecutor. if it is not set, ParallelExecutor will get the cpu count by calling multiprocessing.cpu_count(). Default 0.