.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_apps_compute_world_size_main.py:

Compute World Size Example
============================

This is a minimal "hello world" style example application that uses
PyTorch Distributed to compute the world size. It does no ML training,
but it does initialize a process group and perform a single collective operation (``all_reduce``),
which is enough to validate the infrastructure and scheduler setup.

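To see why a single ``all_reduce`` is enough to recover the world size, consider the
following torch-free toy sketch. It is not the code in ``module.util`` and it does not use
PyTorch: the helper ``simulated_world_size`` and its thread-based "ranks" are hypothetical
stand-ins. Each rank contributes the value 1, the contributions are summed, and every rank
reads back the same total, which equals the number of participating ranks.

.. code-block:: python

    import threading
    from typing import List


    def simulated_world_size(num_ranks: int) -> List[int]:
        """Simulate an all_reduce(SUM) of ones across ``num_ranks`` threads."""
        total = 0
        lock = threading.Lock()
        # every "rank" must finish contributing before any of them reads the sum
        barrier = threading.Barrier(num_ranks)
        results = [0] * num_ranks

        def worker(rank: int) -> None:
            nonlocal total
            with lock:
                total += 1  # this rank's contribution to the sum
            barrier.wait()
            results[rank] = total  # all ranks observe the same reduced value

        threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_ranks)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results


    if __name__ == "__main__":
        print(simulated_world_size(4))

In the real application the reduction runs across processes through ``torch.distributed``
rather than threads, so a correct result additionally indicates that the ranks were able to
rendezvous and communicate with each other.
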
As simple as this application is, the actual ``compute_world_size()`` function is
split into a separate submodule (``.module.util.compute_world_size``) so that the example doubles
as an E2E test for the workspace patching logic, which typically diff-patches a full project
directory rather than a single file. The application also uses `Hydra <https://hydra.cc/>`_
configs as an expository example of how to use Hydra in an application launched with TorchX.

Run it with the ``dist.ddp`` builtin component as a validation application
to ensure that the stack has been set up properly before launching more serious distributed training jobs.

.. code-block:: default

    import hydra
    from omegaconf import DictConfig, OmegaConf
    from torch.distributed.elastic.multiprocessing.errors import record
    from torchx.examples.apps.compute_world_size.module.util import compute_world_size


    @record
    def run(cfg: DictConfig) -> None:
        print(OmegaConf.to_yaml(cfg))

        if cfg.main.throws:
            raise RuntimeError(f"raising error because cfg.main.throws={cfg.main.throws}")
        compute_world_size(cfg)


    if __name__ == "__main__":
        # use the compose API to make this compatible with ipython notebooks
        # need to initialize the config directory as a module so that it does
        # not depend on a relative path (PWD) or an absolute path (torchx install dir)
        # see: https://hydra.cc/docs/advanced/jupyter_notebooks/
        with hydra.initialize_config_module(
            config_module="torchx.examples.apps.compute_world_size.config"
        ):
            cfg: DictConfig = hydra.compose(config_name="defaults")
            run(cfg)

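
The ``cfg.main.throws`` switch in ``run()`` can be tried without Hydra or PyTorch installed.
The sketch below is a stand-in built on assumptions: ``SimpleNamespace`` replaces ``DictConfig``,
``run_sketch`` and ``make_cfg`` are hypothetical helpers, and the call to ``compute_world_size``
is replaced by a placeholder return value. It only shows that a truthy flag makes the
application fail on purpose, which may be handy for checking how a scheduler reports errors.

.. code-block:: python

    from types import SimpleNamespace


    def run_sketch(cfg: SimpleNamespace) -> str:
        # mirrors the branch in run(): fail on purpose when the flag is set
        if cfg.main.throws:
            raise RuntimeError(f"raising error because cfg.main.throws={cfg.main.throws}")
        # placeholder for compute_world_size(cfg) in the real application
        return "ok"


    def make_cfg(throws: bool) -> SimpleNamespace:
        # hypothetical config shape; only main.throws is taken from the example
        return SimpleNamespace(main=SimpleNamespace(throws=throws))


    if __name__ == "__main__":
        print(run_sketch(make_cfg(False)))
        try:
            run_sketch(make_cfg(True))
        except RuntimeError as err:
            print(err)
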