Compute World Size Example

This is a minimal “hello world” style example application that uses PyTorch Distributed to compute the world size. It does not do any ML training, but it does initialize process groups and perform a single collective operation (all_reduce), which is enough to validate the infrastructure and scheduler setup.

As simple as this application is, the actual compute_world_size() function is split into a separate submodule (.module.util.compute_world_size) so that it doubles as an E2E test for workspace patching logic, which typically diff-patches a full project directory rather than a single file. This application also uses Hydra configs as an expository example of driving a Hydra-based application that launches with TorchX.

Run it with the dist.ddp builtin component as a validation application to ensure that the stack has been set up properly for more serious distributed training jobs.
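A local smoke-test launch might look like the following. This is a sketch, not a canonical command: the exact flags and the script path vary by TorchX version and project layout, so consult the TorchX documentation for your installation.

```shell
# Launch 1 node x 2 workers locally via the dist.ddp builtin component.
# The -j NODESxPROCS syntax and the script path here are illustrative.
torchx run -s local_cwd dist.ddp -j 1x2 --script main.py
```

If both workers print the same world size (2 in this case), the scheduler and rendezvous setup are working.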

import hydra
from omegaconf import DictConfig, OmegaConf
from torch.distributed.elastic.multiprocessing.errors import record
from torchx.examples.apps.compute_world_size.module.util import compute_world_size

@record
def run(cfg: DictConfig) -> None:
    if cfg.main.throws:
        raise RuntimeError(f"raising error because cfg.main.throws={cfg.main.throws}")

    compute_world_size(cfg)

if __name__ == "__main__":
    # use the compose API to make this compatible with ipython notebooks
    # need to initialize the config directory as a module to make it
    # not depend on rel path (PWD) or abs path (torchx install dir)
    # see:
    with hydra.initialize_config_module(
        config_module="torchx.examples.apps.compute_world_size.config"
    ):
        cfg: DictConfig = hydra.compose(config_name="defaults")
        run(cfg)
