Strategy
- class paddle.distributed.Strategy(config=None) [source]
The Strategy object is used to configure the parallelization and optimization strategies for static graph. It currently supports configuring sharding, fused_passes, gradient_merge and pipeline; more strategies will be supported in the future.

sharding is used to configure the sharding states of the optimizer, to save GPU memory. fused_passes is used to configure the fusion of computation in the model. gradient_merge is used to configure the gradient merge strategy in training. pipeline is used to configure the pipeline parallelism strategy.

- Parameters
config (dict|None, optional) – If config is None, the default configurations will be set. If it is a dict, the items inside the dict will be used to set the configurations, and the others remain the default values. (A sketch of this dict form follows the example below.)
Examples
>>> import paddle
>>> import paddle.distributed as dist

>>> strategy = dist.Strategy()

>>> strategy.sharding.enable = True
>>> strategy.sharding.stage = 2
>>> strategy.sharding.degree = 2

>>> strategy.gradient_merge.enable = True
>>> strategy.gradient_merge.k_steps = 2
>>> strategy.gradient_merge.avg = False

>>> strategy.pipeline.enable = True
>>> strategy.pipeline.schedule_mode = "1F1B"  # default is "1F1B"
>>> strategy.pipeline.micro_batch_size = 2
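As noted under Parameters, the same settings can be passed up front as a dict. A minimal sketch, assuming the dict mirrors the nested attribute layout above (one key per strategy, one sub-key per config); this schema is shown for illustration, not confirmed by this page:

>>> import paddle.distributed as dist

>>> # hypothetical nested-dict schema mirroring the attribute names above
>>> config = {
...     "sharding": {"enable": True, "stage": 2, "degree": 2},
...     "gradient_merge": {"enable": True, "k_steps": 2, "avg": False},
... }
>>> strategy = dist.Strategy(config)
>>> # items not present in the dict (e.g. pipeline) keep their defaults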
- property sharding [source]
sharding is used to configure the sharding states of the optimizer, containing the following configs:
- enable (bool): whether to enable sharding. Default: False.
- stage (int): can be set to 1, 2 or 3. 1 indicates optimizer states segmentation, 2 indicates optimizer states and gradient segmentation, and 3 indicates the segmentation of optimizer states, gradients and parameters. Default: 1.
- degree (int): the number of segmentation pieces. Default: 8.
Examples
>>> import paddle
>>> import paddle.distributed as dist

>>> strategy = dist.Strategy()
>>> strategy.sharding.enable = True
>>> strategy.sharding.stage = 2
>>> strategy.sharding.degree = 2
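In practice, the sharding degree usually matches the number of processes that hold a shard. A minimal sketch, assuming sharding across all launched ranks (get_world_size() is Paddle's standard API for the process count; tying degree to it is a usage assumption, not a requirement stated here):

>>> import paddle.distributed as dist

>>> strategy = dist.Strategy()
>>> strategy.sharding.enable = True
>>> strategy.sharding.stage = 2
>>> # assumption: shard optimizer states across every launched process
>>> strategy.sharding.degree = dist.get_world_size()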
- property gradient_merge
gradient_merge is used to configure the gradient merge strategy in training, containing the following configs:
- enable (bool): whether to enable gradient merge. Default: False.
- k_steps (int): the number of steps for merging gradients. Default: 1.
- avg (bool): whether to average the gradients of each step. Default: True.
Examples
>>> import paddle
>>> import paddle.distributed as dist

>>> strategy = dist.Strategy()
>>> strategy.gradient_merge.enable = True
>>> strategy.gradient_merge.k_steps = 2
>>> strategy.gradient_merge.avg = True
- property fused_passes
fused_passes is used to configure the fusion of computation in the model, containing the following configs:
- enable (bool): whether to enable fused passes. Default: False.
- gemm_epilogue (bool): whether to fuse matmul and add computation in the Linear layer. Default: False.
- dropout_add (bool): whether to fuse dropout and add computation. Default: False.
Examples
>>> import paddle
>>> import paddle.distributed as dist

>>> strategy = dist.Strategy()
>>> strategy.fused_passes.enable = True
>>> strategy.fused_passes.gemm_epilogue = True
>>> strategy.fused_passes.dropout_add = True
- property pipeline
pipeline is used to configure the pipeline parallelism in training, containing the following configs:
- enable (bool): whether to enable pipeline parallelism. Default: False.
- schedule_mode (str): the scheduling mode of pipeline parallelism. Default: "1F1B".
- micro_batch_size (int): the size of each micro-batch inside a mini-batch. Default: 1.
- accumulate_steps (int): the number of steps for accumulating gradients. Default: 1.
Examples
>>> import paddle
>>> import paddle.distributed as dist

>>> strategy = dist.Strategy()
>>> strategy.pipeline.enable = True
>>> strategy.pipeline.micro_batch_size = 2
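schedule_mode and accumulate_steps can be set the same way. A short sketch combining all four pipeline configs (the specific values are illustrative only):

>>> import paddle.distributed as dist

>>> strategy = dist.Strategy()
>>> strategy.pipeline.enable = True
>>> strategy.pipeline.schedule_mode = "1F1B"   # default is "1F1B"
>>> strategy.pipeline.micro_batch_size = 2
>>> strategy.pipeline.accumulate_steps = 4     # illustrative value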
