reduce

paddle.distributed.reduce(tensor, dst, op=0, group=None, use_calc_stream=True) [source]

Reduce a tensor to the destination rank from all other ranks. As shown below, four processes are started, one per GPU, and the data on each GPU is represented by its GPU number. With GPU0 as the destination and sum as the reduction operation, after the reduce operator runs, GPU0 owns the sum of the data from all GPUs.

[Figure: reduce]
Parameters
  • tensor (Tensor) – The Tensor to reduce: it holds the reduced result on the destination rank and serves as an input on all other ranks. Its data type should be float16, float32, float64, int32 or int64.

  • dst (int) – The destination rank id.

  • op (ReduceOp.SUM|ReduceOp.MAX|ReduceOp.MIN|ReduceOp.PROD, optional) – The reduction operation to apply. Default value is ReduceOp.SUM. See the sketch after the example below.

  • group (Group, optional) – The group instance returned by new_group, or None for the global default group.

  • use_calc_stream (bool, optional) – Whether to use the calculation stream (True) or the communication stream (False). Default is True.

Returns

None.

Examples

# required: distributed
import numpy as np
import paddle
from paddle.distributed import init_parallel_env

# bind each process to its own GPU and initialize the parallel environment
paddle.set_device('gpu:%d' % paddle.distributed.ParallelEnv().dev_id)
init_parallel_env()
# each rank prepares different data
if paddle.distributed.ParallelEnv().local_rank == 0:
    np_data = np.array([[4, 5, 6], [4, 5, 6]])
else:
    np_data = np.array([[1, 2, 3], [1, 2, 3]])
data = paddle.to_tensor(np_data)
# sum the tensors from all ranks into `data` on rank 0 (dst=0)
paddle.distributed.reduce(data, 0)
out = data.numpy()
# on rank 0: [[5, 7, 9], [5, 7, 9]]
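
The op argument selects the reduction. As a minimal sketch under the same two-rank setup as above (and assuming ReduceOp is importable from paddle.distributed in this release), the call below collects the element-wise maximum on rank 0 instead of the sum:

# required: distributed
import numpy as np
import paddle
from paddle.distributed import ReduceOp, init_parallel_env

paddle.set_device('gpu:%d' % paddle.distributed.ParallelEnv().dev_id)
init_parallel_env()
if paddle.distributed.ParallelEnv().local_rank == 0:
    np_data = np.array([[4, 5, 6], [4, 5, 6]])
else:
    np_data = np.array([[1, 2, 3], [1, 2, 3]])
data = paddle.to_tensor(np_data)
# element-wise maximum across ranks, collected on rank 0 (dst=0)
paddle.distributed.reduce(data, 0, op=ReduceOp.MAX)
out = data.numpy()
# on rank 0: [[4, 5, 6], [4, 5, 6]]

Only the destination rank is guaranteed to hold the reduced result; on other ranks the contents of data after the call should not be relied upon.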