alltoall

paddle.distributed.alltoall(in_tensor_list, out_tensor_list, group=None, use_calc_stream=True)

Scatter the tensors in in_tensor_list evenly across all participants and gather the received tensors into out_tensor_list. As shown below, in_tensor_list on GPU0 contains 0_0 and 0_1, and in_tensor_list on GPU1 contains 1_0 and 1_1. The alltoall operator sends 0_0 from GPU0 to GPU0 and 0_1 from GPU0 to GPU1, and it sends 1_0 from GPU1 to GPU0 and 1_1 from GPU1 to GPU1. As a result, out_tensor_list on GPU0 contains 0_0 and 1_0, and out_tensor_list on GPU1 contains 0_1 and 1_1.

[Figure: alltoall]
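The exchange described above can be modelled in a single process without Paddle. The following is a minimal sketch, not the library's implementation: `simulate_alltoall` is a hypothetical helper, and it assumes each rank supplies exactly one entry per participant, as in the two-GPU case above.

```python
def simulate_alltoall(inputs_per_rank):
    """Single-process model of the alltoall exchange (hypothetical helper).

    inputs_per_rank[src][dst] is the entry that rank `src` sends to rank `dst`.
    The result satisfies outputs[dst][src] == inputs_per_rank[src][dst].
    """
    world_size = len(inputs_per_rank)
    # Assumption of this sketch: every rank provides one entry per participant.
    for rank, inputs in enumerate(inputs_per_rank):
        if len(inputs) != world_size:
            raise ValueError(
                f"rank {rank} has {len(inputs)} inputs, expected {world_size}"
            )
    # Each destination rank collects its entry from every source rank, in rank order.
    return [
        [inputs_per_rank[src][dst] for src in range(world_size)]
        for dst in range(world_size)
    ]


# Labels follow the description above: "i_j" is the j-th input on GPU i.
inputs = [["0_0", "0_1"], ["1_0", "1_1"]]
outputs = simulate_alltoall(inputs)
print(outputs)  # [['0_0', '1_0'], ['0_1', '1_1']]
```

Seen this way, alltoall acts as a transpose of the (source rank, destination rank) grid of tensors.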
Parameters
  • in_tensor_list (list) – A list of input Tensors. Each element must be a Tensor whose data type is float16, float32, float64, int32 or int64.

  • out_tensor_list (list) – A list of output Tensors. The data type of its elements is the same as the data type of the input Tensors.

  • group (Group, optional) – The group instance returned by new_group, or None for the global default group. Default: None.

  • use_calc_stream (bool, optional) – Whether to use the calculation stream (True) or the communication stream (False). Default: True.

Returns

None.

Examples

# required: distributed
import numpy as np
import paddle
from paddle.distributed import init_parallel_env

init_parallel_env()
# The received tensors are written into this list by alltoall.
out_tensor_list = []
# Each rank prepares two input tensors, one per participant (two ranks here).
if paddle.distributed.ParallelEnv().rank == 0:
    np_data1 = np.array([[1, 2, 3], [4, 5, 6]])
    np_data2 = np.array([[7, 8, 9], [10, 11, 12]])
else:
    np_data1 = np.array([[13, 14, 15], [16, 17, 18]])
    np_data2 = np.array([[19, 20, 21], [22, 23, 24]])
data1 = paddle.to_tensor(np_data1)
data2 = paddle.to_tensor(np_data2)
# data1 goes to rank 0 and data2 goes to rank 1.
paddle.distributed.alltoall([data1, data2], out_tensor_list)
# out_tensor_list on rank 0: [[[1, 2, 3], [4, 5, 6]], [[13, 14, 15], [16, 17, 18]]]
# out_tensor_list on rank 1: [[[7, 8, 9], [10, 11, 12]], [[19, 20, 21], [22, 23, 24]]]
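
The expected outputs in the example above can be checked in a single process with NumPy alone. This is only a sketch under the assumption of two ranks. The names `send` and `recv` are illustrative and not part of the Paddle API. It treats alltoall as swapping the source-rank and destination-rank axes of the stacked inputs.

```python
import numpy as np

# Inputs from the example above, one list per rank.
rank0_inputs = [np.array([[1, 2, 3], [4, 5, 6]]),
                np.array([[7, 8, 9], [10, 11, 12]])]
rank1_inputs = [np.array([[13, 14, 15], [16, 17, 18]]),
                np.array([[19, 20, 21], [22, 23, 24]])]

# send[src, dst] is the tensor that rank `src` sends to rank `dst`;
# the stacked array has shape (2, 2, 2, 3).
send = np.stack([np.stack(rank0_inputs), np.stack(rank1_inputs)])

# Swapping the first two axes gives recv[dst, src]: what each rank receives.
recv = send.swapaxes(0, 1)

# These match the "rank 0" and "rank 1" comments in the example.
assert recv[0].tolist() == [[[1, 2, 3], [4, 5, 6]], [[13, 14, 15], [16, 17, 18]]]
assert recv[1].tolist() == [[[7, 8, 9], [10, 11, 12]], [[19, 20, 21], [22, 23, 24]]]
```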