get_distributed_backend

torchtune.training.get_distributed_backend(device_type: str, offload_ops_to_cpu: bool = False) → str

Gets the PyTorch Distributed backend based on device type.

Parameters:
  • device_type (str) – Device type to get backend for.

  • offload_ops_to_cpu (bool, optional) – Whether any operations will be offloaded to CPU. Examples of such operations are CPU offload for FSDP and asynchronous save for distributed checkpointing. Defaults to False.

Example

>>> get_distributed_backend("cuda")
'nccl'
>>> get_distributed_backend("cpu")
'gloo'
>>> get_distributed_backend("cuda", offload_ops_to_cpu=True)
'cuda:nccl,cpu:gloo'
Returns:

Distributed backend for use in torch.distributed.init_process_group.

Return type:

str
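
The returned string can be passed directly to torch.distributed.init_process_group. Below is a minimal, hypothetical usage sketch (not taken from the torchtune docs); it assumes the process-group environment variables (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE) are set by a launcher such as torchrun.

import torch
import torch.distributed as dist
from torchtune.training import get_distributed_backend

# Pick a device type and resolve the matching backend string.
device_type = "cuda" if torch.cuda.is_available() else "cpu"
backend = get_distributed_backend(device_type, offload_ops_to_cpu=True)

# init_process_group accepts either a single backend name ("nccl") or a
# device-to-backend mapping string such as "cuda:nccl,cpu:gloo".
dist.init_process_group(backend=backend)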
