Compute World Size Example
This is a minimal "hello world"-style example application that uses PyTorch Distributed to compute the world size. It does not do any ML training, but it does initialize process groups and perform a single collective operation (all_reduce), which is enough to validate the infrastructure and scheduler setup.
As simple as this application is, the actual compute_world_size() function is split into a separate submodule (.module.util.compute_world_size) so that it doubles as an E2E test for workspace patching logic, which typically diff-patches a full project directory rather than a single file. This application also uses Hydra configs as an expository example of how to use Hydra in an application that launches with TorchX.
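The code below composes a config named defaults and reads cfg.main.throws. As a hypothetical sketch of what such a config file could contain (only the throws key is implied by the code; any other keys in the real config are not shown here):

# config/defaults.yaml -- hypothetical sketch, not the actual shipped config
main:
  throws: false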
Run it with the dist.ddp builtin component as a validation application to ensure that the stack has been set up properly for more serious distributed training jobs.
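For example, a local smoke test might look like the following. This is a sketch: the 1x2 job geometry (1 node, 2 processes), the local_cwd scheduler, and the script path are illustrative and may differ in your checkout or install:

torchx run -s local_cwd dist.ddp -j 1x2 \
    --script torchx/examples/apps/compute_world_size/main.py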
import hydra
from omegaconf import DictConfig, OmegaConf
from torch.distributed.elastic.multiprocessing.errors import record

from torchx.examples.apps.compute_world_size.module.util import compute_world_size


# @record surfaces exceptions to torchelastic's error handler
# so failures are reported with useful error summaries
@record
def run(cfg: DictConfig) -> None:
    print(OmegaConf.to_yaml(cfg))
    if cfg.main.throws:
        raise RuntimeError(f"raising error because cfg.main.throws={cfg.main.throws}")
    compute_world_size(cfg)


if __name__ == "__main__":
    # use compose API to make this compatible with ipython notebooks
    # need to initialize the config directory as a module to make it
    # not depend on rel path (PWD) or abs path (torchx install dir)
    # see: https://hydra.cc/docs/advanced/jupyter_notebooks/
    with hydra.initialize_config_module(
        config_module="torchx.examples.apps.compute_world_size.config"
    ):
        cfg: DictConfig = hydra.compose(config_name="defaults")
        run(cfg)
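For reference, here is a minimal sketch of what a compute_world_size-style helper could look like. It is illustrative only (not the actual .module.util source) and assumes the rendezvous environment variables (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE) are set by the launcher, as dist.ddp / torchelastic does:

import torch
import torch.distributed as dist


def compute_world_size_sketch() -> int:
    # Hypothetical helper: the gloo backend is assumed so this
    # also runs on CPU-only hosts.
    dist.init_process_group(backend="gloo")
    # Each rank contributes 1; the all-reduced sum equals the world size.
    t = torch.ones(1)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    world_size = int(t.item())
    print(f"rank {dist.get_rank()} computed world size: {world_size}")
    dist.destroy_process_group()
    return world_size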