BuildStrategy

class paddle.fluid.BuildStrategy

BuildStrategy lets the user control more precisely how the SSA Graph is built in ParallelExecutor by setting its properties.

Examples

import os
import numpy as np
import paddle.fluid as fluid

os.environ["CPU_NUM"] = '2'
places = fluid.cpu_places()

data = fluid.layers.data(name="x", shape=[1], dtype="float32")
hidden = fluid.layers.fc(input=data, size=10)
loss = fluid.layers.mean(hidden)
fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

build_strategy = fluid.BuildStrategy()
build_strategy.enable_inplace = True
build_strategy.memory_optimize = True
build_strategy.reduce_strategy = fluid.BuildStrategy.ReduceStrategy.Reduce
program = fluid.compiler.CompiledProgram(fluid.default_main_program())
program = program.with_data_parallel(loss_name=loss.name,
                                     build_strategy=build_strategy,
                                     places=places)
debug_graphviz_path

debug_graphviz_path is the path to which the SSA Graph is written as a file in graphviz format. It is useful for debugging. Default is the empty string, “”.

Examples

import paddle.fluid as fluid
build_strategy = fluid.BuildStrategy()
build_strategy.debug_graphviz_path = "./graph"
Type

(str, optional)

enable_sequential_execution

If set to True, ops are executed in the same order as they appear in the program. Default is False.

Examples

import paddle.fluid as fluid
build_strategy = fluid.BuildStrategy()
build_strategy.enable_sequential_execution = True
Type

(bool, optional)

fuse_broadcast_ops

fuse_broadcast_ops indicates whether to fuse the broadcast ops. In Reduce mode, fusing broadcast ops may make the program faster: fusing them is equivalent to delaying the execution of all broadcast ops, so for a period of time all NCCL streams are used only for NCCLReduce operations. Default is False.

Examples

import paddle.fluid as fluid
build_strategy = fluid.BuildStrategy()
build_strategy.fuse_broadcast_ops = True
Type

(bool, optional)

fuse_elewise_add_act_ops

fuse_elewise_add_act_ops indicates whether to fuse elementwise_add_op and activation_op. Fusing them may make execution faster. Default is False.
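As a rough illustration of what fusing these two ops means, the sketch below compares an unfused add followed by an activation with a single fused pass. It is plain Python, not Paddle code: the helper names `relu`, `unfused_add_act`, and `fused_add_act` are hypothetical, and ReLU is only an assumed example of an activation.

```python
def relu(value):
    """Example activation; any elementwise activation would do."""
    return value if value > 0.0 else 0.0


def unfused_add_act(x, y):
    """Two separate ops: the add materializes an intermediate list."""
    summed = [a + b for a, b in zip(x, y)]   # elementwise_add
    return [relu(s) for s in summed]         # activation


def fused_add_act(x, y):
    """One pass: add and activation applied per element, no intermediate."""
    return [relu(a + b) for a, b in zip(x, y)]


x = [1.0, -2.0, 3.0]
y = [-3.0, 5.0, 0.5]
unfused_result = unfused_add_act(x, y)
fused_result = fused_add_act(x, y)
```

Both paths produce the same values; the fused version simply avoids the intermediate buffer and the second traversal, which is where a speedup could come from.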

Examples

import paddle.fluid as fluid
build_strategy = fluid.BuildStrategy()
build_strategy.fuse_elewise_add_act_ops = True
Type

(bool, optional)

fuse_relu_depthwise_conv

fuse_relu_depthwise_conv indicates whether to fuse relu and depthwise_conv2d. Fusing them saves GPU memory and may make execution faster. This option is only available on GPU devices. Default is False.

Examples

import paddle.fluid as fluid
build_strategy = fluid.BuildStrategy()
build_strategy.fuse_relu_depthwise_conv = True
Type

(bool, optional)

gradient_scale_strategy

There are three ways of defining \(loss@grad\) in ParallelExecutor: CoeffNumDevice, One, and Customized. By default, ParallelExecutor sets \(loss@grad\) according to the number of devices. To set \(loss@grad\) yourself, choose Customized. Default is ‘CoeffNumDevice’.
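The three options can be pictured with a small plain-Python sketch. It is not Paddle code: the function `initial_loss_grad` is hypothetical, and it assumes that CoeffNumDevice seeds each device with 1 / (number of devices) and that One seeds each device with 1, which is how the option names read.

```python
def initial_loss_grad(strategy, num_devices, customized=None):
    """Return the per-device starting value of loss@GRAD (illustrative only)."""
    if strategy == "CoeffNumDevice":
        # Assumed behavior: scale by the device count so the per-device
        # gradients sum to the gradient of the mean loss.
        return [1.0 / num_devices] * num_devices
    if strategy == "One":
        # Assumed behavior: every device starts from 1.
        return [1.0] * num_devices
    if strategy == "Customized":
        # The user supplies one value per device.
        if customized is None or len(customized) != num_devices:
            raise ValueError("Customized needs one value per device")
        return list(customized)
    raise ValueError("unknown strategy: %s" % strategy)


coeff = initial_loss_grad("CoeffNumDevice", 4)
one = initial_loss_grad("One", 2)
custom = initial_loss_grad("Customized", 2, customized=[0.01, 0.01])
```

With Customized, the values play the role of the `loss_grad` array that the user feeds under the name `loss.name + "@GRAD"`.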

Examples

import paddle.fluid as fluid
import paddle.fluid.compiler as compiler
import numpy
import os

use_cuda = True
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)

# NOTE: When running on CPU, set CPU_NUM explicitly. Otherwise
# fluid uses the number of logical cores as CPU_NUM, and the
# batch size of the input must then be greater than CPU_NUM,
# or the run fails with an exception.
if not use_cuda:
    os.environ['CPU_NUM'] = str(2)
    places = fluid.cpu_places()
else:
    places = fluid.cuda_places()

data = fluid.layers.data(name='X', shape=[1], dtype='float32')
hidden = fluid.layers.fc(input=data, size=10)
loss = fluid.layers.mean(hidden)
fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

fluid.default_startup_program().random_seed=1
exe.run(fluid.default_startup_program())

build_strategy = fluid.BuildStrategy()
build_strategy.gradient_scale_strategy = \
    fluid.BuildStrategy.GradientScaleStrategy.Customized
compiled_prog = compiler.CompiledProgram(
    fluid.default_main_program()).with_data_parallel(
        loss_name=loss.name,
        build_strategy=build_strategy,
        places=places)

dev_count = len(places)
x = numpy.random.random(size=(10, 1)).astype('float32')
loss_grad = numpy.ones((dev_count)).astype("float32") * 0.01
loss_grad_name = loss.name+"@GRAD"
loss_data = exe.run(compiled_prog,
                     feed={"X": x, loss_grad_name : loss_grad},
                     fetch_list=[loss.name, loss_grad_name])
Type

(fluid.BuildStrategy.GradientScaleStrategy, optional)

memory_optimize

memory_optimize aims to reduce total memory consumption. Set it to True to enable it.

Default is None, which means the framework decides automatically whether to use this strategy. Currently, None means that it is enabled when GC is disabled, and disabled when GC is enabled. True enables it and False disables it.

Type

(bool, optional)

reduce_strategy

There are two reduce strategies in ParallelExecutor: AllReduce and Reduce. With AllReduce, every device optimizes all parameters independently. With Reduce, the optimization of the parameters is distributed evenly across the devices, and each device then broadcasts its optimized parameters to the other devices. Default is ‘AllReduce’.
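The difference between the two strategies can be sketched in plain Python with a toy SGD update. This is not Paddle code: `all_reduce_step` and `reduce_step` are hypothetical names, and the round-robin assignment of parameters to devices is an assumption made only for illustration.

```python
def all_reduce_step(params, per_device_grads, lr):
    """AllReduce: every device gets every summed gradient and updates all params."""
    num_devices = len(per_device_grads)
    summed = [sum(grads[i] for grads in per_device_grads)
              for i in range(len(params))]
    # Each device applies the identical update to its own full copy.
    return [[p - lr * g for p, g in zip(params, summed)]
            for _ in range(num_devices)]


def reduce_step(params, per_device_grads, lr):
    """Reduce: each parameter is updated on one owner device, then broadcast."""
    num_devices = len(per_device_grads)
    updated = list(params)
    owners = []
    for i in range(len(params)):
        owner = i % num_devices  # assumed round-robin ownership
        owners.append(owner)
        # Only the owner receives the reduced gradient and runs the optimizer.
        reduced = sum(grads[i] for grads in per_device_grads)
        updated[i] = params[i] - lr * reduced
    # Broadcast: every device ends up with all updated parameters.
    replicas = [list(updated) for _ in range(num_devices)]
    return replicas, owners


params = [1.0, 2.0, 3.0]
grads = [[0.5, 1.0, 1.5],   # gradients computed on device 0
         [0.5, 1.0, 0.5]]   # gradients computed on device 1
all_reduce_result = all_reduce_step(params, grads, lr=0.5)
reduce_result, owners = reduce_step(params, grads, lr=0.5)
```

In this toy setting both strategies leave every device with the same parameters. What differs is where the optimizer runs: on every device for every parameter, or once per parameter on its owner followed by a broadcast.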

Examples

import paddle.fluid as fluid
build_strategy = fluid.BuildStrategy()
build_strategy.reduce_strategy = fluid.BuildStrategy.ReduceStrategy.Reduce
Type

(fluid.BuildStrategy.ReduceStrategy, optional)

remove_unnecessary_lock

If set to True, some locks in GPU ops are released and ParallelExecutor can run faster. Default is True.

Examples

import paddle.fluid as fluid
build_strategy = fluid.BuildStrategy()
build_strategy.remove_unnecessary_lock = True
Type

(bool, optional)

sync_batch_norm

sync_batch_norm indicates whether to use synchronous batch normalization, which synchronizes the mean and variance across devices during the training phase. The current implementation does not support FP16 training or CPU, and it synchronizes only within one machine, not across machines. Default is False.
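To see why synchronizing the statistics matters, the plain-Python sketch below compares per-device mean and variance with the statistics computed over all devices together. It is not Paddle code: `local_stats` and `synced_stats` are hypothetical helpers, and combining per-device sums and sums of squares is just one assumed way to obtain the global values.

```python
def local_stats(batch):
    """Mean and (biased) variance of one device's mini-batch."""
    count = len(batch)
    mean = sum(batch) / count
    var = sum((x - mean) ** 2 for x in batch) / count
    return mean, var


def synced_stats(device_batches):
    """Mean and variance over all devices, from per-device partial sums."""
    count = sum(len(batch) for batch in device_batches)
    total = sum(sum(batch) for batch in device_batches)
    total_sq = sum(sum(x * x for x in batch) for batch in device_batches)
    mean = total / count
    var = total_sq / count - mean * mean
    return mean, var


device_batches = [[1.0, 3.0], [5.0, 7.0]]
per_device = [local_stats(batch) for batch in device_batches]
global_mean, global_var = synced_stats(device_batches)
```

Without synchronization each device normalizes with its own statistics, here (2.0, 1.0) and (6.0, 1.0). With synchronization every device uses the shared values (4.0, 5.0), as if the whole batch lived on one device.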

Examples

import paddle.fluid as fluid
build_strategy = fluid.BuildStrategy()
build_strategy.sync_batch_norm = True
Type

(bool, optional)