reduce
paddle.distributed.reduce(tensor, dst, op=0, group=None, use_calc_stream=True)
Reduce a tensor to the destination rank from all other ranks. For example, suppose four GPUs each start one process, and the data on each GPU is represented by its GPU number. With GPU0 as the destination and sum as the reduce operation, after the reduce GPU0 owns the sum of the data from all GPUs.
Parameters
tensor (Tensor) – On the destination rank, the output Tensor; on all other ranks, the input Tensor. Its data type should be float16, float32, float64, int32 or int64.
dst (int) – The destination rank id.
op (ReduceOp.SUM|ReduceOp.MAX|ReduceOp.MIN|ReduceOp.PROD) – Optional. The reduce operation to apply (see the sketch after the example below). Default value is ReduceOp.SUM.
group (Group) – The group instance returned by new_group, or None for the global default group.
use_calc_stream (bool) – Whether to use the calculation stream (True) or the communication stream (False). Defaults to True.
Returns

None.
Examples
# required: distributed
import numpy as np
import paddle
from paddle.distributed import init_parallel_env

paddle.set_device('gpu:%d' % paddle.distributed.ParallelEnv().dev_id)
init_parallel_env()
if paddle.distributed.ParallelEnv().local_rank == 0:
    np_data = np.array([[4, 5, 6], [4, 5, 6]])
else:
    np_data = np.array([[1, 2, 3], [1, 2, 3]])
data = paddle.to_tensor(np_data)
paddle.distributed.reduce(data, 0)
out = data.numpy()
# [[5, 7, 9], [5, 7, 9]]
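The op argument selects the elementwise operation applied across ranks. The sketch below is a minimal variation on the example above, not an additional official sample; it swaps the default sum for ReduceOp.MAX under the same two-process setup.

# required: distributed
import numpy as np
import paddle
from paddle.distributed import init_parallel_env

paddle.set_device('gpu:%d' % paddle.distributed.ParallelEnv().dev_id)
init_parallel_env()
if paddle.distributed.ParallelEnv().local_rank == 0:
    np_data = np.array([[4, 5, 6], [4, 5, 6]])
else:
    np_data = np.array([[1, 2, 3], [1, 2, 3]])
data = paddle.to_tensor(np_data)
# Elementwise maximum across ranks; only the destination rank 0
# is guaranteed to hold the reduced result afterwards.
paddle.distributed.reduce(data, 0, op=paddle.distributed.ReduceOp.MAX)
out = data.numpy()
# On rank 0: [[4, 5, 6], [4, 5, 6]]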