spawn

paddle.distributed.spawn(func, args=(), nprocs=-1, join=True, daemon=False, **options) [source]

Start multiple processes with the spawn method for parallel training.

Note

spawn currently only supports GPU or XPU collective mode. GPU and XPU collective modes cannot be started at the same time, so the gpus and xpus options cannot be configured together.

Parameters
  • func (function) – The target function called by the spawned processes. This function must be picklable, so it must be defined at the top level of a module.

  • args (list|tuple, optional) – Arguments passed to func.

  • nprocs (int, optional) – Number of processes to start. Default: -1. When nprocs is -1, the number of available devices is obtained from the environment at runtime: if GPUs are used, the available device IDs are read from the environment variable CUDA_VISIBLE_DEVICES; if XPUs are used, they are read from the environment variable XPU_VISIBLE_DEVICES.

  • join (bool, optional) – Perform a blocking join on all spawned processes. Default: True.

  • daemon (bool, optional) – The spawned processes’ daemon flag. Default: False.

  • **options (dict, optional) – Other configuration options for initializing the parallel execution environment. The following options are currently supported: (1) start_method (string): the way to start a process, one of spawn, fork, or forkserver. Because the CUDA runtime does not support the fork start method, processes that use CUDA must be started with the spawn or forkserver method. Default: "spawn"; (2) gpus (string): the training processes will run on the selected GPUs, such as "0,1,2,3". Default: None; (3) xpus (string): the training processes will run on the selected XPUs, such as "0,1,2,3". Default: None; (4) ips (string): the IPs of the Paddle cluster nodes, such as "192.168.0.16,192.168.0.17". Default: "127.0.0.1".

Returns

A MultiprocessContext object that holds the spawned processes; when join is False, it can be used to wait for the processes later (see the last example below).

Examples

>>> import paddle
>>> import paddle.nn as nn
>>> import paddle.optimizer as opt
>>> import paddle.distributed as dist

>>> class LinearNet(nn.Layer):
...     def __init__(self):
...         super().__init__()
...         self._linear1 = nn.Linear(10, 10)
...         self._linear2 = nn.Linear(10, 1)
...     def forward(self, x):
...         return self._linear2(self._linear1(x))

>>> def train(print_result=False):
...     # 1. initialize parallel environment
...     group = dist.init_parallel_env()
...     process_group = group.process_group if group else None
...     # 2. create data parallel layer & optimizer
...     layer = LinearNet()
...     dp_layer = paddle.DataParallel(layer, group=process_group)
...     loss_fn = nn.MSELoss()
...     adam = opt.Adam(
...         learning_rate=0.001, parameters=dp_layer.parameters())
...     # 3. run layer
...     inputs = paddle.randn([10, 10], 'float32')
...     outputs = dp_layer(inputs)
...     labels = paddle.randn([10, 1], 'float32')
...     loss = loss_fn(outputs, labels)
...     if print_result is True:
...         print("loss:", loss.numpy())
...     loss.backward()
...     adam.step()
...     adam.clear_grad()

>>> # Usage 1: only pass the function.
>>> # If your training method does not need any arguments, and
>>> # uses all visible devices for parallel training.
>>> if __name__ == '__main__':
...     dist.spawn(train)

>>> # Usage 2: pass the function and arguments.
>>> # If your training method needs some arguments, and
>>> # uses all visible devices for parallel training.
>>> if __name__ == '__main__':
...     dist.spawn(train, args=(True,))

>>> # Usage 3: pass the function, arguments and nprocs.
>>> # If your training method needs some arguments, and
>>> # only uses part of the visible devices for parallel training.
>>> # If your machine holds 8 cards {0,1,2,3,4,5,6,7},
>>> # this case will use cards {0,1}; if you set
>>> # CUDA_VISIBLE_DEVICES=4,5,6,7, this case will use
>>> # cards {4,5}.
>>> if __name__ == '__main__':
...     dist.spawn(train, args=(True,), nprocs=2)

>>> # Usage 4: pass the function, arguments, nprocs and gpus.
>>> # If your training method needs some arguments, and
>>> # only uses part of the visible devices for parallel training,
>>> # but you can't set your machine's environment variable
>>> # CUDA_VISIBLE_DEVICES, e.g. it is unset or contains all cards
>>> # {0,1,2,3,4,5,6,7}, you can pass `gpus` to
>>> # select the GPU cards you want to use. For example,
>>> # this case will use cards {4,5} if your machine holds 8 cards.
>>> if __name__ == '__main__':
...     dist.spawn(train, args=(True,), nprocs=2, gpus='4,5')
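
The keyword options documented above (start_method, gpus, xpus, ips) are passed through **options. A minimal sketch of selecting the process start method, reusing the train function defined above; adjust the values to your environment.

>>> # Usage 5 (sketch): pass extra **options such as start_method.
>>> # Based on the options documented above.
>>> if __name__ == '__main__':
...     dist.spawn(train, args=(True,), nprocs=2, start_method='forkserver')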
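
When join is False, spawn returns without blocking and the returned context can be used to wait for the child processes later. A minimal sketch, assuming the returned MultiprocessContext exposes a blocking join() method; verify this against your Paddle version.

>>> # Usage 6 (sketch): non-blocking spawn via join=False.
>>> # Assumes the returned MultiprocessContext provides a join() method.
>>> if __name__ == '__main__':
...     context = dist.spawn(train, args=(True,), nprocs=2, join=False)
...     # ... do other work in the main process here ...
...     context.join()  # wait for all spawned processes to finish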