alltoall_single
- paddle.distributed.alltoall_single(in_tensor, out_tensor, in_split_sizes=None, out_split_sizes=None, group=None, sync_op=True)
- 
Scatter a single input tensor to all participators and gather the received tensors in out_tensor.

         Note: alltoall_single is only supported in eager mode.
- Parameters
- 
           - in_tensor (Tensor) – Input tensor. The data type should be float16, float32, float64, int32, int64, int8, uint8, bool or bfloat16. 
- out_tensor (Tensor) – Output Tensor. The data type should be the same as the data type of the input Tensor. 
- in_split_sizes (list[int], optional) – Split sizes of in_tensor for dim[0]. If not given, dim[0] of in_tensor must be divisible by the group size and in_tensor will be scattered evenly to all participators (see the sketch before the Examples below). Default: None.
- out_split_sizes (list[int], optional) – Split sizes of out_tensor for dim[0]. If not given, dim[0] of out_tensor must be divisible by the group size and out_tensor will be gathered evenly from all participators. Default: None.
- group (Group, optional) – The group instance returned by new_group, or None for the global default group. Default: None.
- sync_op (bool, optional) – Whether this op is a sync op. The default value is True. 
 
- Returns
- 
           None, if sync_op is set to True; Task of group, if sync_op is set to False.
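 The relationship between the split sizes and the tensor shapes can be made concrete with a short sketch. This is an illustrative assumption based on standard all-to-all semantics rather than additional documented behaviour: the split values below are hypothetical, and the sketch relies on sum(in_split_sizes) matching dim[0] of in_tensor and sum(out_split_sizes) matching dim[0] of out_tensor on each rank.

    # Minimal sketch (assumes 2 GPUs; the split values are hypothetical).
    # Every rank sends 1 row to rank 0 and 3 rows to rank 1, so rank 0
    # receives 1 row from each rank and rank 1 receives 3 rows from each rank.
    import paddle
    import paddle.distributed as dist

    dist.init_parallel_env()
    rank = dist.get_rank()

    in_split_sizes = [1, 3]                            # rows sent to rank 0 and rank 1
    out_split_sizes = [1, 1] if rank == 0 else [3, 3]  # rows received from rank 0 and rank 1

    data = paddle.ones([sum(in_split_sizes), 4], dtype='float32') * rank
    output = paddle.empty([sum(out_split_sizes), 4], dtype='float32')
    dist.alltoall_single(data, output, in_split_sizes, out_split_sizes)
    # output on rank 0 has shape [2, 4]; output on rank 1 has shape [6, 4]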
 Examples

    # required: distributed
    import paddle
    import paddle.distributed as dist

    dist.init_parallel_env()
    rank = dist.get_rank()
    size = dist.get_world_size()

    # case 1 (2 GPUs)
    data = paddle.arange(2, dtype='int64') + rank * 2
    # data for rank 0: [0, 1]
    # data for rank 1: [2, 3]
    output = paddle.empty([2], dtype='int64')
    dist.alltoall_single(data, output)
    print(output)
    # output for rank 0: [0, 2]
    # output for rank 1: [1, 3]

    # case 2 (2 GPUs)
    in_split_sizes = [i + 1 for i in range(size)]
    # in_split_sizes for rank 0: [1, 2]
    # in_split_sizes for rank 1: [1, 2]
    out_split_sizes = [rank + 1 for i in range(size)]
    # out_split_sizes for rank 0: [1, 1]
    # out_split_sizes for rank 1: [2, 2]
    data = paddle.ones([sum(in_split_sizes), size], dtype='float32') * rank
    # data for rank 0: [[0., 0.], [0., 0.], [0., 0.]]
    # data for rank 1: [[1., 1.], [1., 1.], [1., 1.]]
    output = paddle.empty([(rank + 1) * size, size], dtype='float32')
    group = dist.new_group([0, 1])
    task = dist.alltoall_single(data, output, in_split_sizes, out_split_sizes, sync_op=False, group=group)
    task.wait()
    print(output)
    # output for rank 0: [[0., 0.], [1., 1.]]
    # output for rank 1: [[0., 0.], [0., 0.], [1., 1.], [1., 1.]]
