DataParallel

class paddle.DataParallel(layers, strategy=None, comm_buffer_size=25, last_comm_buffer_size=1) [source]

Run the dygraph module with data parallelism.

Currently, the DataParallel class only supports running the dynamic graph in multi-process mode.

There are two ways to start training:

  1. Start training with the paddle.distributed.spawn method, for example:

    python demo.py (spawn needs to be called in the __main__ method)

  2. Start training with the paddle.distributed.launch module, for example:

    python -m paddle.distributed.launch --gpus=0,1 demo.py

The content of demo.py is the code given in the examples below.

Parameters
  • layers (Layer) – The module to be executed in data parallel.

  • strategy (ParallelStrategy, optional) – (deprecated) The strategy of data parallelism, which contains environment configuration related to parallel execution. Default: None.

  • comm_buffer_size (int, optional) – Limits the memory size (MB) of the gradient buffer that is passed to each communication call (e.g. NCCLAllReduce). Default: 25.

  • last_comm_buffer_size (float, optional) – Limits the memory size (MB) of the last communication buffer. Keeping the last buffer small can improve performance; see the sketch after this parameter list. Default: 1.
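
As a quick illustrative sketch (not part of the original example; the buffer sizes below are arbitrary rather than recommended values), the buffer-size arguments can be passed when constructing DataParallel:

import paddle
import paddle.distributed as dist

dist.init_parallel_env()

layer = paddle.nn.Linear(10, 10)
# Illustrative values: fuse up to ~50 MB of gradients per communication call,
# and keep the last buffer at the default 1 MB.
dp_layer = paddle.DataParallel(layer,
                               comm_buffer_size=50,
                               last_comm_buffer_size=1)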

Returns

The data-parallel module.

Return type

Layer

Examples

import paddle
import paddle.nn as nn
import paddle.optimizer as opt
import paddle.distributed as dist

class LinearNet(nn.Layer):
    def __init__(self):
        super(LinearNet, self).__init__()
        self._linear1 = nn.Linear(10, 10)
        self._linear2 = nn.Linear(10, 1)

    def forward(self, x):
        return self._linear2(self._linear1(x))

def train():
    # 1. initialize parallel environment
    dist.init_parallel_env()

    # 2. create data parallel layer & optimizer
    layer = LinearNet()
    dp_layer = paddle.DataParallel(layer)

    loss_fn = nn.MSELoss()
    adam = opt.Adam(
        learning_rate=0.001, parameters=dp_layer.parameters())

    # 3. run layer
    inputs = paddle.randn([10, 10], 'float32')
    outputs = dp_layer(inputs)
    labels = paddle.randn([10, 1], 'float32')
    loss = loss_fn(outputs, labels)

    loss.backward()

    adam.step()
    adam.clear_grad()

if __name__ == '__main__':
    # 1. start by ``paddle.distributed.spawn`` (default)
    dist.spawn(train, nprocs=2)
    # 2. start by ``paddle.distributed.launch``
    # train()

forward(*inputs, **kwargs)

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) – unpacked tuple arguments

  • **kwargs (dict) – unpacked dict arguments
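
forward is rarely called directly; a minimal sketch (assuming dist.init_parallel_env() has been called and a parallel GPU environment is available) of invoking it through the DataParallel instance:

import paddle
import paddle.distributed as dist

dist.init_parallel_env()

dp_layer = paddle.DataParallel(paddle.nn.Linear(4, 2))
x = paddle.randn([8, 4], 'float32')
out = dp_layer(x)  # calling the instance dispatches to forward(*inputs, **kwargs)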

scale_loss(loss) [source]
Warning: API “paddle.fluid.dygraph.parallel.scale_loss” is deprecated since 2.0.0, and will be removed in future versions.

reason: This method does not need to be called anymore.

Deprecated: scale_loss is now an empty (no-op) method and is kept only for compatibility.

apply_collective_grads()
Warning: API “paddle.fluid.dygraph.parallel.apply_collective_grads” is deprecated since 2.0.0, and will be removed in future versions.

reason: This method does not need to be called anymore.

Deprecated: apply_collective_grads is now an empty (no-op) method and is kept only for compatibility.

state_dict(destination=None, include_sublayers=True, structured_name_prefix='')

Get all parameters and persistable buffers of the current layer and its sub-layers, and set them into a dict.

Parameters
  • destination (dict, optional) – If provided, all the parameters and persistable buffers will be set into this dict. Default: None

  • include_sublayers (bool, optional) – If True, also include the parameters and persistable buffers from sublayers. Default: True

Returns

dict: a dict containing all the parameters and persistable buffers.

Examples

import paddle
import paddle.distributed as dist

dist.init_parallel_env()

emb = paddle.nn.Embedding(10, 10)
emb = paddle.DataParallel(emb)

state_dict = emb.state_dict()
paddle.save(state_dict, "paddle_dy.pdparams")

set_state_dict(state_dict, include_sublayers=True, use_structured_name=True)

Set parameters and persistable buffers from the state_dict. All the parameters and buffers will be reset by the tensors in the state_dict.

Parameters
  • state_dict (dict) – Dict contains all the parameters and persistable buffers.

  • include_sublayers (bool, optional) – If True, also include the parameters and persistable buffers from sublayers. Default: True

  • use_structured_name (bool, optional) – If True, use the structured name as the key; otherwise, use the parameter or buffer name as the key. Default: True

Returns

None

Examples

import paddle
import paddle.distributed as dist

dist.init_parallel_env()

emb = paddle.nn.Embedding(10, 10)
emb = paddle.DataParallel(emb)

state_dict = emb.state_dict()
paddle.save(state_dict, "paddle_dy.pdparams")

para_state_dict = paddle.load("paddle_dy.pdparams")
emb.set_state_dict(para_state_dict)

set_dict(state_dict, include_sublayers=True, use_structured_name=True)

Set parameters and persistable buffers from the state_dict. All the parameters and buffers will be reset by the tensors in the state_dict.

Parameters
  • state_dict (dict) – Dict contains all the parameters and persistable buffers.

  • include_sublayers (bool, optional) – If True, also include the parameters and persistable buffers from sublayers. Default: True

  • use_structured_name (bool, optional) – If True, use the structured name as the key; otherwise, use the parameter or buffer name as the key. Default: True

Returns

None

Examples

import paddle
import paddle.distributed as dist

dist.init_parallel_env()

emb = paddle.nn.Embedding(10, 10)
emb = paddle.DataParallel(emb)

state_dict = emb.state_dict()
paddle.save(state_dict, "paddle_dy.pdparams")

para_state_dict = paddle.load("paddle_dy.pdparams")
emb.set_state_dict(para_state_dict)

load_dict(state_dict, include_sublayers=True, use_structured_name=True)

Set parameters and persistable buffers from the state_dict. All the parameters and buffers will be reset by the tensors in the state_dict.

Parameters
  • state_dict (dict) – Dict contains all the parameters and persistable buffers.

  • include_sublayers (bool, optional) – If True, also include the parameters and persistable buffers from sublayers. Default: True

  • use_structured_name (bool, optional) – If True, use the structured name as the key; otherwise, use the parameter or buffer name as the key. Default: True

Returns

None

Examples

import paddle
import paddle.distributed as dist

dist.init_parallel_env()

emb = paddle.nn.Embedding(10, 10)
emb = paddle.DataParallel(emb)

state_dict = emb.state_dict()
paddle.save(state_dict, "paddle_dy.pdparams")

para_state_dict = paddle.load("paddle_dy.pdparams")
emb.set_state_dict(para_state_dict)