alltoall_single
- paddle.distributed.alltoall_single(in_tensor, out_tensor, in_split_sizes=None, out_split_sizes=None, group=None, sync_op=True)
- 
Scatter a single input tensor to all participators and gather the received tensors in out_tensor.

         Note: alltoall_single is only supported in eager mode.
- Parameters
- 
           - in_tensor (Tensor) – Input tensor. The data type should be float16, float32, float64, int32, int64, int8, uint8, bool or bfloat16. 
- out_tensor (Tensor) – Output Tensor. The data type should be the same as the data type of the input Tensor. 
- in_split_sizes (list[int], optional) – Split sizes of in_tensor for dim[0]. If not given, dim[0] of in_tensor must be divisible by the group size and in_tensor will be scattered evenly to all participators (see the sketch before the Examples below). Default: None.
- out_split_sizes (list[int], optional) – Split sizes of out_tensor for dim[0]. If not given, dim[0] of out_tensor must be divisible by the group size and out_tensor will be gathered evenly from all participators. Default: None.
- group (Group, optional) – The group instance returned by new_group, or None for the global default group. Default: None.
- sync_op (bool, optional) – Whether this op is a sync op. The default value is True. 
 
- Returns
- 
           None, if sync_op is set to True; Task of group, if sync_op is set to False.
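 The relationship between the split sizes and the tensor shapes can be made concrete with a short sketch. This is an illustrative assumption based on standard all-to-all semantics rather than additional documented behaviour: the split values below are hypothetical, and the sketch relies on sum(in_split_sizes) matching dim[0] of in_tensor and sum(out_split_sizes) matching dim[0] of out_tensor on each rank.

    # Minimal sketch (assumes 2 GPUs; the split values are hypothetical).
    # Every rank sends 1 row to rank 0 and 3 rows to rank 1, so rank 0
    # receives 1 row from each rank and rank 1 receives 3 rows from each rank.
    import paddle
    import paddle.distributed as dist

    dist.init_parallel_env()
    rank = dist.get_rank()

    in_split_sizes = [1, 3]                            # rows sent to rank 0 and rank 1
    out_split_sizes = [1, 1] if rank == 0 else [3, 3]  # rows received from rank 0 and rank 1

    data = paddle.ones([sum(in_split_sizes), 4], dtype='float32') * rank
    output = paddle.empty([sum(out_split_sizes), 4], dtype='float32')
    dist.alltoall_single(data, output, in_split_sizes, out_split_sizes)
    # output on rank 0 has shape [2, 4]; output on rank 1 has shape [6, 4]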
 Examples

    # required: distributed
    import paddle
    import paddle.distributed as dist

    dist.init_parallel_env()
    rank = dist.get_rank()
    size = dist.get_world_size()

    # case 1 (2 GPUs)
    data = paddle.arange(2, dtype='int64') + rank * 2
    # data for rank 0: [0, 1]
    # data for rank 1: [2, 3]
    output = paddle.empty([2], dtype='int64')
    dist.alltoall_single(data, output)
    print(output)
    # output for rank 0: [0, 2]
    # output for rank 1: [1, 3]

    # case 2 (2 GPUs)
    in_split_sizes = [i + 1 for i in range(size)]
    # in_split_sizes for rank 0: [1, 2]
    # in_split_sizes for rank 1: [1, 2]
    out_split_sizes = [rank + 1 for i in range(size)]
    # out_split_sizes for rank 0: [1, 1]
    # out_split_sizes for rank 1: [2, 2]
    data = paddle.ones([sum(in_split_sizes), size], dtype='float32') * rank
    # data for rank 0: [[0., 0.], [0., 0.], [0., 0.]]
    # data for rank 1: [[1., 1.], [1., 1.], [1., 1.]]
    output = paddle.empty([(rank + 1) * size, size], dtype='float32')
    group = dist.new_group([0, 1])
    task = dist.alltoall_single(data, output, in_split_sizes, out_split_sizes, sync_op=False, group=group)
    task.wait()
    print(output)
    # output for rank 0: [[0., 0.], [1., 1.]]
    # output for rank 1: [[0., 0.], [0., 0.], [1., 1.], [1., 1.]]
