Parallel#
- class ignite.distributed.launcher.Parallel(backend=None, nproc_per_node=None, nnodes=None, node_rank=None, master_addr=None, master_port=None, init_method=None, **spawn_kwargs)[source]#
Distributed launcher context manager to simplify distributed configuration setup for multiple backends:

- backends from native torch distributed configuration: “nccl”, “gloo” and “mpi” (if available)
- XLA on TPUs via pytorch/xla (if installed)
- using Horovod distributed framework (if installed)
Namely, it can:

1) Spawn nproc_per_node child processes and initialize a process group according to the provided backend (useful for standalone scripts).

2) Only initialize a process group given the backend (useful with tools like torchrun, horovodrun, etc.).

- Parameters
  - backend (Optional[str]) – backend to use: nccl, gloo, xla-tpu, horovod. If None, no distributed configuration.
  - nproc_per_node (Optional[int]) – optional argument, number of processes per node to specify. If not None, run() will spawn nproc_per_node processes that run the input function with its arguments.
  - nnodes (Optional[int]) – optional argument, number of nodes participating in distributed configuration. If not None, run() will spawn nproc_per_node processes that run the input function with its arguments. Total world size is nproc_per_node * nnodes. This option is only supported by the native torch distributed module. For other modules, please set up spawn_kwargs with backend-specific arguments.
  - node_rank (Optional[int]) – optional argument, current machine index. Mandatory argument if nnodes is specified and larger than one. This option is only supported by the native torch distributed module. For other modules, please set up spawn_kwargs with backend-specific arguments.
  - master_addr (Optional[str]) – optional argument, master node TCP/IP address for torch native backends (nccl, gloo). Mandatory argument if nnodes is specified and larger than one.
  - master_port (Optional[int]) – optional argument, master node port for torch native backends (nccl, gloo). Mandatory argument if master_addr is specified.
  - init_method (Optional[str]) – optional argument to specify the process group initialization method for torch native backends (nccl, gloo). Default: “env://”. See more info: dist.init_process_group.
  - spawn_kwargs (Any) – kwargs passed to the idist.spawn function.
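For orientation, here is a minimal sketch combining the spawn-related parameters above for one node of a hypothetical 2-node, 8-GPU-per-node setup; all values are illustrative placeholders, and complete scripts follow in the Examples below.

import ignite.distributed as idist

def training(local_rank, config):
    # local_rank is the local process index on this node
    print(idist.get_rank(), ": backend =", idist.backend(), "- config:", config)

# Placeholder values for illustration only (see the Examples below).
with idist.Parallel(
    backend="nccl",        # or "gloo", "xla-tpu", "horovod"
    nproc_per_node=8,      # spawn 8 processes on this node
    nnodes=2,              # 2 nodes in total (native torch backends only)
    node_rank=0,           # index of this node; mandatory when nnodes > 1
    master_addr="master",  # master node address (native torch backends)
    master_port=15000,     # master node port
) as parallel:             # init_method defaults to "env://"
    parallel.run(training, {"key": "value"})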
Examples
1) Single node or Multi-node, Multi-GPU training launched with torchrun or horovodrun tools
Single node option with 4 GPUs
torchrun --nproc_per_node=4 main.py
# or if horovod is installed
horovodrun -np=4 python main.py
Multi-node option: 2 nodes with 8 GPUs each
## node 0
torchrun --nnodes=2 --node_rank=0 --master_addr=master --master_port=3344 --nproc_per_node=8 main.py
# or if horovod is installed
horovodrun -np 16 -H hostname1:8,hostname2:8 python main.py

## node 1
torchrun --nnodes=2 --node_rank=1 --master_addr=master --master_port=3344 --nproc_per_node=8 main.py
User code is the same for both options:
# main.py

import ignite.distributed as idist

def training(local_rank, config, **kwargs):
    # ...
    print(idist.get_rank(), ": run with config:", config, "- backend=", idist.backend())
    # ...

backend = "nccl"  # or "horovod" if package is installed

config = {"key": "value"}

with idist.Parallel(backend=backend) as parallel:
    parallel.run(training, config, a=1, b=2)
2) Single node, Multi-GPU training launched with python
python main.py
# main.py

import ignite.distributed as idist

def training(local_rank, config, **kwargs):
    # ...
    print(idist.get_rank(), ": run with config:", config, "- backend=", idist.backend())
    # ...

backend = "nccl"  # or "horovod" if package is installed

config = {"key": "value"}

# no "init_method" was specified, so "env://" will be used
with idist.Parallel(backend=backend, nproc_per_node=4) as parallel:
    parallel.run(training, config, a=1, b=2)
Initializing the process group using file://

with idist.Parallel(backend=backend, init_method='file:///d:/tmp/some_file', nproc_per_node=4) as parallel:
    parallel.run(training, config, a=1, b=2)
Initializing the process group using tcp://

with idist.Parallel(backend=backend, init_method='tcp://10.1.1.20:23456', nproc_per_node=4) as parallel:
    parallel.run(training, config, a=1, b=2)
3) Single node, Multi-TPU training launched with python
python main.py
# main.py

import ignite.distributed as idist

def training(local_rank, config, **kwargs):
    # ...
    print(idist.get_rank(), ": run with config:", config, "- backend=", idist.backend())
    # ...

config = {"key": "value"}

with idist.Parallel(backend="xla-tpu", nproc_per_node=8) as parallel:
    parallel.run(training, config, a=1, b=2)
4) Multi-node, Multi-GPU training launched with python. For example, 2 nodes with 8 GPUs each:
Using torch native distributed framework:
# node 0
python main.py --node_rank=0

# node 1
python main.py --node_rank=1
# main.py

import ignite.distributed as idist

def training(local_rank, config, **kwargs):
    # ...
    print(idist.get_rank(), ": run with config:", config, "- backend=", idist.backend())
    # ...

dist_config = {
    "nproc_per_node": 8,
    "nnodes": 2,
    "node_rank": args.node_rank,
    "master_addr": "master",
    "master_port": 15000
}

config = {"key": "value"}

with idist.Parallel(backend="nccl", **dist_config) as parallel:
    parallel.run(training, config, a=1, b=2)
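The script above reads args.node_rank without showing how it is parsed. A minimal sketch of that missing piece, assuming argparse and the --node_rank flag from the launch commands above, could be:

# Hypothetical argument parsing for the --node_rank flag used above.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--node_rank", type=int, required=True,
                    help="index of the current node, e.g. 0 or 1")
args = parser.parse_args()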
Changed in version 0.4.2: backend now accepts horovod distributed framework.

Changed in version 0.4.5: init_method added.

Methods
run: Execute func with provided arguments in distributed context.

- run(func, *args, **kwargs)[source]#
Execute func with provided arguments in distributed context.

- Parameters
  - func (Callable) – function to execute. As the examples show, its first argument is the local process index (local_rank), followed by the provided args and kwargs.
  - args (Any) – positional arguments passed to func.
  - kwargs (Any) – keyword arguments passed to func.
- Return type
None
Examples
def training(local_rank, config, **kwargs):
    # ...
    print(idist.get_rank(), ": run with config:", config, "- backend=", idist.backend())
    # ...

config = {"key": "value"}

with idist.Parallel(backend=backend) as parallel:
    parallel.run(training, config, a=1, b=2)