gather

paddle.distributed.communication.stream.gather(tensor, gather_list=None, dst=0, group=None, sync_op=True, use_calc_stream=False) [source]

Gather tensors from all participators.

Parameters
  • tensor (Tensor) – The input Tensor. Its data type should be float16, float32, float64, int32, int64, int8, uint8, bool or bfloat16.

  • gather_list (list) – A list of Tensors to hold the gathered tensors. Every element in the list must be a Tensor whose data type should be float16, float32, float64, int32, int64, int8, uint8, bool or bfloat16. Default value is None.

  • dst (int) – The destination rank id. Default value is 0.

  • group (Group, optional) – The group instance returned by new_group, or None for the global default group. Default value is None.

  • sync_op (bool, optional) – Whether this op is a sync op. The default value is True.

  • use_calc_stream (bool, optional) – Indicate whether the communication is done on the calculation stream. Default value is False. This option is designed for high-performance scenarios; do not turn it on unless you clearly understand what it does.

Returns

Async work handle, which can be waited on, if sync_op is set to False; otherwise, None.

Examples

>>> import paddle
>>> import paddle.distributed as dist

>>> dist.init_parallel_env()
>>> gather_list = []
>>> if dist.get_rank() == 0:
...     data = paddle.to_tensor([1, 2, 3])
...     dist.stream.gather(data, gather_list, dst=0)
... else:
...     data = paddle.to_tensor([4, 5, 6])
...     dist.stream.gather(data, gather_list, dst=0)
>>> print(gather_list)
>>> # [[1, 2, 3], [4, 5, 6]] (2 GPUs, out for rank 0)
>>> # [] (2 GPUs, out for rank 1)
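
The example above runs synchronously. The Returns section describes an async work handle when sync_op is set to False; the following is a minimal sketch of that pattern. The wait() call on the returned task follows the usual convention of Paddle's communication APIs and is an assumption here, not taken from this page.

>>> import paddle
>>> import paddle.distributed as dist

>>> dist.init_parallel_env()
>>> gather_list = []
>>> data = paddle.to_tensor([dist.get_rank()])
>>> # sync_op=False returns an async work handle instead of blocking (sketch).
>>> task = dist.stream.gather(data, gather_list, dst=0, sync_op=False)
>>> task.wait()  # assumed: block until the gather has finished
>>> # gather_list on rank 0 now holds one tensor per rank; it stays empty on other ranks.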