reduce_scatter
- paddle.distributed.reduce_scatter(tensor, tensor_list, op=ReduceOp.SUM, group=None, sync_op=True)

  Reduces a list of tensors element-wise across all processes in a group, then scatters the result so that the process with rank i receives the reduced i-th tensor.

- Parameters

  - tensor (Tensor) – Output tensor. Its data type should be float16, float32, float64, int32, int64, int8, uint8, bool or bfloat16.
  - tensor_list (list[Tensor]) – List of tensors to reduce and scatter. Every element in the list must be a Tensor whose data type is float16, float32, float64, int32, int64, int8, uint8, bool or bfloat16.
  - op (ReduceOp.SUM|ReduceOp.MAX|ReduceOp.MIN|ReduceOp.PROD, optional) – The reduction operation to apply. Default: ReduceOp.SUM.
  - group (Group, optional) – The group instance returned by new_group, or None for the global default group. Default: None.
  - sync_op (bool, optional) – Whether this op is a sync op. Default: True.
 
- Returns

  An async task handle if sync_op is set to False. None if sync_op is True or if the calling process is not part of the group.
Warning

  This API only supports the dygraph mode.

Examples

    # required: distributed
    import paddle
    import paddle.distributed as dist

    dist.init_parallel_env()
    if dist.get_rank() == 0:
        data1 = paddle.to_tensor([0, 1])
        data2 = paddle.to_tensor([2, 3])
    else:
        data1 = paddle.to_tensor([4, 5])
        data2 = paddle.to_tensor([6, 7])
    dist.reduce_scatter(data1, [data1, data2])
    print(data1)
    # [4, 6] (2 GPUs, out for rank 0)
    # [8, 10] (2 GPUs, out for rank 1)
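
The reduce-then-scatter semantics can also be checked without a distributed launch. The following is a minimal single-process sketch in plain Python, not Paddle's implementation. The names simulate_reduce_scatter and REDUCERS are hypothetical and introduced only for illustration, tensors are modelled as flat lists of numbers, and the sketch assumes that every rank passes exactly one tensor per rank in tensor_list.

```python
from functools import reduce
import operator

# Hypothetical stand-ins for ReduceOp.SUM / MAX / MIN / PROD.
# These are plain Python callables, not Paddle objects.
REDUCERS = {
    "sum": operator.add,
    "max": max,
    "min": min,
    "prod": operator.mul,
}


def simulate_reduce_scatter(inputs_per_rank, op="sum"):
    """Return the output tensor each rank would receive.

    inputs_per_rank[r][i] models the i-th entry of rank r's tensor_list
    as a flat list of numbers.
    """
    world_size = len(inputs_per_rank)
    # Assumption of this sketch: one input tensor per rank in the group.
    if any(len(tensor_list) != world_size for tensor_list in inputs_per_rank):
        raise ValueError("each rank must supply one tensor per rank")

    combine = REDUCERS[op]
    outputs = []
    for dst in range(world_size):
        # Collect the dst-th tensor from every rank, then reduce element-wise.
        chunks = [inputs_per_rank[src][dst] for src in range(world_size)]
        outputs.append([reduce(combine, column) for column in zip(*chunks)])
    return outputs


# Same inputs as the two-process example: rank 0 holds [0, 1] and [2, 3],
# rank 1 holds [4, 5] and [6, 7].
rank0 = [[0, 1], [2, 3]]
rank1 = [[4, 5], [6, 7]]
print(simulate_reduce_scatter([rank0, rank1]))  # [[4, 6], [8, 10]]
```

With the default sum, the first inner list matches the rank 0 output and the second matches the rank 1 output shown in the example. Passing op="max", op="min" or op="prod" to the sketch mirrors the other ReduceOp choices in the same way.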
