torch.cuda.comm.scatter(tensor, devices=None, chunk_sizes=None, dim=0, streams=None, *, out=None)

Scatters tensor across multiple GPUs.

  • tensor (Tensor) – tensor to scatter. Can be on CPU or GPU.

  • devices (Iterable[torch.device, str or int], optional) – an iterable of GPU devices, among which to scatter.

  • chunk_sizes (Iterable[int], optional) – sizes of the chunks to place on each device. It should match devices in length and sum to tensor.size(dim). If not specified, tensor is divided into equal chunks.

  • dim (int, optional) – A dimension along which to chunk tensor. Default: 0.

  • streams (Iterable[Stream], optional) – an iterable of Streams, among which to execute the scatter. If not specified, the default stream is used.

  • out (Sequence[Tensor], optional, keyword-only) – the GPU tensors to store output results. The sizes of these tensors must match those of tensor in every dimension except dim, where the sizes must sum to tensor.size(dim).


Exactly one of devices and out must be specified. When out is specified, chunk_sizes must not be specified and will be inferred from sizes of out.
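The argument rules above can be sketched in plain Python. The helper below is hypothetical (resolve_chunk_sizes and its parameters are not part of PyTorch); it only mirrors the documented constraints, and it assumes the default split is handled only when tensor.size(dim) divides evenly by the number of devices, since the text does not say how uneven sizes are treated.

```python
from typing import List, Optional, Sequence


def resolve_chunk_sizes(
    dim_size: int,
    num_devices: Optional[int] = None,
    chunk_sizes: Optional[Sequence[int]] = None,
    out_dim_sizes: Optional[Sequence[int]] = None,
) -> List[int]:
    """Hypothetical helper: returns the per-chunk sizes along `dim`.

    dim_size      stands in for tensor.size(dim)
    num_devices   stands in for len(devices)
    out_dim_sizes stands in for [o.size(dim) for o in out]
    """
    # Exactly one of `devices` and `out` must be specified.
    if (num_devices is None) == (out_dim_sizes is None):
        raise ValueError("exactly one of devices and out must be specified")

    if out_dim_sizes is not None:
        # With `out`, chunk_sizes must not be given; sizes are inferred from `out`.
        if chunk_sizes is not None:
            raise ValueError("chunk_sizes must not be specified together with out")
        sizes = list(out_dim_sizes)
    elif chunk_sizes is not None:
        # With `devices`, explicit chunk_sizes need one entry per device.
        if len(chunk_sizes) != num_devices:
            raise ValueError("chunk_sizes must match devices in length")
        sizes = list(chunk_sizes)
    else:
        # Default: equal chunks. Uneven splits are not modeled in this sketch.
        if num_devices <= 0 or dim_size % num_devices != 0:
            raise ValueError("this sketch only handles evenly divisible sizes")
        sizes = [dim_size // num_devices] * num_devices

    # In every case the chunk sizes must sum to tensor.size(dim).
    if sum(sizes) != dim_size:
        raise ValueError("chunk sizes must sum to tensor.size(dim)")
    return sizes


print(resolve_chunk_sizes(8, num_devices=2))                      # [4, 4]
print(resolve_chunk_sizes(8, num_devices=2, chunk_sizes=[3, 5]))  # [3, 5]
print(resolve_chunk_sizes(8, out_dim_sizes=[6, 2]))               # [6, 2]
```

Passing both num_devices and out_dim_sizes, or neither, raises ValueError in this sketch, matching the "exactly one" rule.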


  • If devices is specified, returns a tuple containing chunks of tensor, placed on devices.

  • If out is specified, returns a tuple containing the out tensors, each containing a chunk of tensor.
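A minimal usage sketch of both calling forms follows. It assumes a machine with at least two CUDA devices; the wrapper scatter_demo is a hypothetical name introduced here, and it returns None when fewer than two GPUs are visible so the snippet can be run anywhere PyTorch is installed.

```python
import torch
import torch.cuda.comm as comm


def scatter_demo():
    """Hypothetical demo: scatter a CPU tensor across GPUs 0 and 1."""
    if not torch.cuda.is_available() or torch.cuda.device_count() < 2:
        return None  # nothing to demonstrate without two GPUs

    x = torch.arange(12, dtype=torch.float32).reshape(6, 2)

    # Form 1: pass `devices`; without chunk_sizes, dim 0 is split equally.
    even = comm.scatter(x, devices=[0, 1])

    # Form 1 with explicit chunk_sizes: one entry per device, summing to x.size(0).
    uneven = comm.scatter(x, devices=[0, 1], chunk_sizes=[4, 2])

    # Form 2: pass preallocated `out` tensors; chunk sizes are inferred from them,
    # so neither devices nor chunk_sizes is given.
    out = [
        torch.empty(4, 2, device="cuda:0"),
        torch.empty(2, 2, device="cuda:1"),
    ]
    filled = comm.scatter(x, out=out)

    return even, uneven, filled


result = scatter_demo()
if result is not None:
    for name, chunks in zip(("even", "uneven", "out"), result):
        print(name, [(tuple(c.shape), str(c.device)) for c in chunks])
```

In the out form the tensors must already live on the target GPUs and share the dtype of the source tensor; here both are float32.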

