This contains the example applications that demonstrate how to use TorchX for various styles of applications (e.g. single node, distributed, etc.). These apps can be launched by themselves or as part of a pipeline. It is important to note that TorchX’s job is to launch the apps; you’ll notice that the apps are implemented without any TorchX imports.
See the Pipelines Examples for how to use the components in a pipeline.
Before executing examples, install TorchX and dependencies necessary to run examples:
$ pip install torchx
$ git clone https://github.com/pytorch/torchx.git
$ cd torchx/examples/apps
$ TORCHX_VERSION=$(torchx --version | sed 's/torchx-//')
$ git checkout v$TORCHX_VERSION
$ pip install -r dev-requirements.txt
Compute World Size Example¶
This is a minimal “hello world” style example application that uses
PyTorch Distributed to compute the world size. It initializes the
torch.distributed process group and performs a single collective
operation (all_reduce), which is enough to validate the infrastructure
and scheduler setup.
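As a sketch of the idea (this is an illustration, not the repo’s `compute_world_size/main.py`), every rank can all_reduce a tensor of ones so that the resulting sum equals the number of participating ranks. The env-var defaults below are assumptions for a standalone single-process run; under `dist.ddp` they are set by the launcher:

```python
import os

import torch
import torch.distributed as dist


def compute_world_size() -> int:
    # Defaults allow a standalone single-process run; a launcher such as
    # torchrun or TorchX's dist.ddp sets these variables for real jobs.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")

    dist.init_process_group(backend="gloo")
    t = torch.ones(1)
    dist.all_reduce(t)  # sum of one "1" per rank == world size
    world_size = int(t.item())
    dist.destroy_process_group()
    return world_size


if __name__ == "__main__":
    print(f"world size: {compute_world_size()}")
```

Run standalone this prints a world size of 1; launched with `-j 1x2` each of the two ranks would compute 2.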
This example is compatible with the dist.ddp built-in. To run from the CLI:
$ cd $torchx-git-repo-root/torchx/examples/apps
$ torchx run dist.ddp --script compute_world_size/main.py -j 1x2
Data Preprocessing Example¶
This is a simple TorchX app that downloads some data via HTTP, normalizes the images via torchvision and then reuploads it via fsspec.
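The download/reupload side of this pattern can be sketched with fsspec alone (a simplified illustration, not the repo’s app; the `memory://` paths stand in for a remote object store, and the real app additionally normalizes the images with torchvision between the two steps):

```python
import fsspec


def copy_via_fsspec(src_url: str, dst_url: str) -> int:
    """Stream bytes from src_url to dst_url; returns the byte count copied."""
    with fsspec.open(src_url, "rb") as src, fsspec.open(dst_url, "wb") as dst:
        data = src.read()
        dst.write(data)
    return len(data)


if __name__ == "__main__":
    # Stage some fake "downloaded" data in the in-memory filesystem.
    with fsspec.open("memory://input/img.bin", "wb") as f:
        f.write(b"\x00" * 16)
    # In the real app, an http:// source and an s3:// (or similar)
    # destination would take the place of these memory:// URLs.
    print(copy_via_fsspec("memory://input/img.bin", "memory://output/img.bin"))
```

Because fsspec abstracts the filesystem behind the URL scheme, the same code path handles HTTP sources and object-store destinations.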
This example has two Python files: the app, which actually does the preprocessing, and the component definition, which can be used with TorchX to launch the app.
Lightning Trainer Example¶
This example consists of model training and interpretability apps that use PyTorch Lightning. The apps share logic, so they are split across several files.
The trainer and interpret apps do not have any TorchX-isms and are simply torchvision and Captum applications. TorchX helps you run these applications on localhost and on various schedulers. The trainer app is a distributed data parallel style application and is launched with the dist.ddp built-in. The interpret app is a single-node application and is launched as a regular Python process with the utils.python built-in.
For instructions on how to run these apps with TorchX, refer to the documentation in their respective main modules: train.py and interpret.py.