- class paddle.static.ExecutionStrategy
ExecutionStrategy allows the user to control more precisely how the program is run in ParallelExecutor by setting its properties.
- Returns
An ExecutionStrategy object.
- Return type
ExecutionStrategy
```python
import paddle
import paddle.static as static
import paddle.nn.functional as F

paddle.enable_static()

x = static.data(name='x', shape=[None, 13], dtype='float32')
y = static.data(name='y', shape=[None, 1], dtype='float32')
y_predict = static.nn.fc(input=x, size=1, act=None)

cost = F.square_error_cost(input=y_predict, label=y)
avg_loss = paddle.mean(cost)

sgd_optimizer = paddle.optimizer.SGD(learning_rate=0.001)
sgd_optimizer.minimize(avg_loss)

exec_strategy = static.ExecutionStrategy()
exec_strategy.num_threads = 4

train_exe = static.ParallelExecutor(use_cuda=False,
                                    loss_name=avg_loss.name,
                                    exec_strategy=exec_strategy)
```
- property allow_op_delay
The type is BOOL. allow_op_delay indicates whether to delay running the communication operators, which may make execution faster. Note that this option is currently invalid and will be removed in the next version. Default: False.
- property num_iteration_per_drop_scope
The type is INT. num_iteration_per_drop_scope indicates how many iterations to run before cleaning up the temporary variables generated during execution. This may make execution faster, because the shapes of the temporary variables may stay the same between two iterations. Default: 100.
1. If you fetch data when calling run, the ParallelExecutor will clean up the temporary variables at the end of the current iteration.
2. In some NLP models, this setting may cause GPU memory to be insufficient; in that case, you should reduce num_iteration_per_drop_scope.
```python
import paddle
import paddle.static as static

paddle.enable_static()

exec_strategy = static.ExecutionStrategy()
exec_strategy.num_iteration_per_drop_scope = 10
```
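As a conceptual illustration only (plain Python, not Paddle's actual implementation), the periodic cleanup behaves like a loop that drops its scratch storage every N iterations; the names `run_with_drop_scope` and `temp_scope` below are hypothetical:

```python
# Conceptual sketch of periodic temp-variable cleanup (hypothetical names,
# not Paddle's internal mechanism).
def run_with_drop_scope(num_iterations, num_iteration_per_drop_scope):
    temp_scope = {}          # holds temporary variables between iterations
    drops = 0                # how many times the scope was cleaned up
    for i in range(1, num_iterations + 1):
        temp_scope[f"tmp_{i}"] = i * i   # stand-in for operator outputs
        if i % num_iteration_per_drop_scope == 0:
            temp_scope.clear()           # reclaim memory periodically
            drops += 1
    return drops, len(temp_scope)

# With 25 iterations and cleanup every 10, the scope is dropped twice
# and 5 temporaries remain from the last partial window.
print(run_with_drop_scope(25, 10))  # (2, 5)
```

A smaller num_iteration_per_drop_scope trades a little extra cleanup work for a lower peak number of live temporaries, which is why the note above suggests reducing it when GPU memory is tight.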
- property num_iteration_per_run
The type is INT. num_iteration_per_run specifies how many iterations the executor will run each time the user calls exe.run() in Python. Default: 1.
```python
import paddle
import paddle.static as static

paddle.enable_static()

exec_strategy = static.ExecutionStrategy()
exec_strategy.num_iteration_per_run = 10
```
- property num_threads
The type is INT. num_threads represents the size of the thread pool used to run the operators of the current program in ParallelExecutor. If \(num\_threads=1\), all the operators will execute one by one, but the order may differ between iterations. If it is not set, it will be set in ParallelExecutor according to the device type and device count: for GPU, \(num\_threads=device\_count*4\); for CPU, \(num\_threads=CPU\_NUM*4\). The explanation of \(CPU\_NUM\) is in ParallelExecutor; if \(CPU\_NUM\) is not set, ParallelExecutor will get the CPU count by calling multiprocessing.cpu_count(). Default: 0.
```python
import paddle
import paddle.static as static

paddle.enable_static()

exec_strategy = static.ExecutionStrategy()
exec_strategy.num_threads = 4
```
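The effect of the thread-pool size can be sketched with Python's standard concurrent.futures module (a conceptual analogy, not ParallelExecutor's actual scheduler; `run_operators` is a hypothetical helper):

```python
from concurrent.futures import ThreadPoolExecutor

# Conceptual analogy: independent "operators" submitted to a pool whose
# size plays the role of num_threads. With max_workers=1 the operators
# run one by one; with more workers their execution may overlap.
def run_operators(num_threads, inputs):
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map returns results in input order even when execution overlaps,
        # mirroring how a larger pool changes scheduling, not results
        return list(pool.map(lambda x: x * 2, inputs))

print(run_operators(4, [1, 2, 3]))  # [2, 4, 6]
print(run_operators(1, [1, 2, 3]))  # [2, 4, 6] (same result, serial execution)
```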
- property use_thread_barrier
This option indicates that this is distributed training with a parameter server; in that mode, the executor uses a thread barrier to synchronize worker threads.
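To illustrate what a thread barrier does (using plain Python threading, not Paddle's internal barrier): each worker waits at the barrier until all workers have arrived, so no thread starts the next phase early.

```python
import threading

NUM_WORKERS = 3
barrier = threading.Barrier(NUM_WORKERS)
order = []                 # records phase completions across threads
lock = threading.Lock()

def worker(worker_id):
    with lock:
        order.append(("phase1", worker_id))
    barrier.wait()         # no worker enters phase 2 until all finish phase 1
    with lock:
        order.append(("phase2", worker_id))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every phase1 record precedes every phase2 record, whatever the
# per-thread scheduling order was.
phases = [phase for phase, _ in order]
print(phases[:NUM_WORKERS])  # ['phase1', 'phase1', 'phase1']
print(phases[NUM_WORKERS:])  # ['phase2', 'phase2', 'phase2']
```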