Utils

This module contains TorchX utility components that are ready to use out of the box. These components simply execute well-known binaries (e.g. cp) and are meant to be used as tutorial material or as glue operations between meaningful stages in a workflow.

torchx.components.utils.echo(msg: str = 'hello world', image: str = 'ghcr.io/pytorch/torchx:0.8.0dev0', num_replicas: int = 1) AppDef[source]

Echoes a message to stdout (calls echo)

Parameters:
  • msg – message to echo

  • image – image to use

  • num_replicas – number of replicas to run
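
For reference, a minimal sketch of building and launching this component programmatically (the runner calls follow the torchx.runner API and local_cwd is one of TorchX's built-in schedulers; adjust for your deployment):

from torchx.components.utils import echo
from torchx.runner import get_runner

# Build the AppDef; every argument has a default, so echo() alone also works.
app = echo(msg="hello from torchx", num_replicas=2)

# Submit to a scheduler and block until the replicas finish.
runner = get_runner()
app_handle = runner.run(app, scheduler="local_cwd")
runner.wait(app_handle)

The CLI equivalent is roughly: torchx run utils.echo --msg "hello from torchx" --num_replicas 2.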

torchx.components.utils.touch(file: str, image: str = 'ghcr.io/pytorch/torchx:0.8.0dev0') AppDef[source]

Touches a file (calls touch)

Parameters:
  • file – file to create

  • image – the image to use
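
A similar sketch for this component (the file path is just an illustrative choice):

from torchx.components.utils import touch

# AppDef whose single role runs `touch /tmp/torchx-smoke-test` in the given image
app = touch(file="/tmp/torchx-smoke-test")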

torchx.components.utils.sh(*args: str, image: str = 'ghcr.io/pytorch/torchx:0.8.0dev0', num_replicas: int = 1, cpu: int = 1, gpu: int = 0, memMB: int = 1024, h: Optional[str] = None, env: Optional[Dict[str, str]] = None, max_retries: int = 0, mounts: Optional[List[str]] = None) AppDef[source]

Runs the provided command via sh. Currently sh does not support environment variable substitution.

Parameters:
  • args – bash arguments

  • image – image to use

  • num_replicas – number of replicas to run

  • cpu – number of cpus per replica

  • gpu – number of gpus per replica

  • memMB – cpu memory in MB per replica

  • h – a registered named resource (if specified takes precedence over cpu, gpu, memMB)

  • env – environment variables to be passed to the run (e.g. ENV1=v1,ENV2=v2,ENV3=v3)

  • max_retries – the number of scheduler retries allowed

  • mounts – mounts to mount into the worker environment/container (ex. type=<bind/volume>,src=/host,dst=/job[,readonly]). See scheduler documentation for more info.
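
A minimal sketch (the command, env vars, and resource values are illustrative; note that h, if given, overrides cpu/gpu/memMB):

from torchx.components.utils import sh

# Each positional argument becomes part of the command run via sh.
# Since sh does not substitute env vars in args, use printenv to read
# the value of FOO that `env` injects into the container environment.
app = sh(
    "printenv", "FOO",
    env={"FOO": "bar"},
    num_replicas=1,
    cpu=2,
    memMB=2048,
    max_retries=1,
)

# With a named resource, cpu/gpu/memMB are ignored in favor of h
# ("gpu.small" is a hypothetical registered resource name):
# app = sh("nvidia-smi", h="gpu.small")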

torchx.components.utils.copy(src: str, dst: str, image: str = 'ghcr.io/pytorch/torchx:0.8.0dev0') AppDef[source]

copy copies the file from src to dst. src and dst can be any valid fsspec URL.

This does not support recursive copies or directories.

Parameters:
  • src – the source fsspec file location

  • dst – the destination fsspec file location

  • image – the image that contains the copy app
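
A minimal sketch (the bucket and paths below are placeholders; any fsspec-compatible URLs work):

from torchx.components.utils import copy

# Single-file copy between two fsspec URLs (no directories, no recursion).
app = copy(
    src="s3://my-bucket/data/input.csv",
    dst="file:///tmp/input.csv",
)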

torchx.components.utils.python(*args: str, m: Optional[str] = None, c: Optional[str] = None, script: Optional[str] = None, image: str = 'ghcr.io/pytorch/torchx:0.8.0dev0', name: str = 'torchx_utils_python', cpu: int = 1, gpu: int = 0, memMB: int = 1024, h: Optional[str] = None, num_replicas: int = 1) AppDef[source]

Runs python with the specified module, command or script on the specified image and host. Use -- to separate component args and program args (e.g. torchx run utils.python --m foo.main -- --args to --main)

Note: the (cpu, gpu, memMB) parameters are mutually exclusive with h (named resource), where h takes precedence if specified for setting resource requirements. See registering named resources.

Parameters:
  • args – arguments passed to the program in sys.argv[1:] (ignored with -c)

  • m – run library module as a script

  • c – program passed as string (may error if scheduler has a length limit on args)

  • script – .py script to run

  • image – image to run on

  • name – name of the job

  • cpu – number of cpus per replica

  • gpu – number of gpus per replica

  • memMB – cpu memory in MB per replica

  • h – a registered named resource (if specified takes precedence over cpu, gpu, memMB)

  • num_replicas – number of copies to run (each on its own container)
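
A programmatic sketch complementing the CLI examples below (my_project.train is a hypothetical module name):

from torchx.components.utils import python

# Run a module with program args; two replicas, each in its own container.
app = python("--lr", "0.01", m="my_project.train", num_replicas=2)

# Or run an inline command (keep it short; schedulers limit argument length).
app = python(c="import torch; print(torch.__version__)")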

Usage is very similar to regular python, except that this component supports remote launches. Example:

# locally (cmd)
$ torchx run utils.python --image $FBPKG -c "import torch; print(torch.__version__)"

# locally (module)
$ torchx run utils.python --image $FBPKG -m foo.bar.main

# remote (cmd)
$ torchx run -s mast utils.python --image $FBPKG -c "import torch; print(torch.__version__)"

# remote (module)
$ torchx run -s mast utils.python --image $FBPKG -m foo.bar.main

Notes:

  • torchx run patches the current working directory (CWD) on top of $FBPKG for faster remote iteration.

  • Patch contents will contain all changes to local fbcode; however, patch building is only triggered if the CWD is a subdirectory of fbcode. If you are running from the root of fbcode (e.g. ~/fbsource/fbcode), your job will NOT be patched!

  • Be careful not to abuse -c CMD. Schedulers have a length limit on arguments, so don't pass long commands; use this option sparingly.

  • In -m MODULE, the module needs to be rooted off of fbcode. Example: for ~/fbsource/fbcode/foo/bar/main.py the module is -m foo.bar.main.

  • DO NOT override base_module in the python_library Buck rule. If you do, you are on your own; patching won't work.

Inline Script in Component

Note

IMPORTANT: DO NOT ABUSE THIS FEATURE! This should be used sparingly, and we reserve the right to remove it in the future.

A nice side effect of how TorchX and penv python are built is that you can do pretty much anything that you would normally do with python, with the added benefit that it auto-patches your working directory and gives you the ability to run both locally and remotely. This means that python -c CMD will also work. Here's an example illustrating this:

$ cd ~/fbsource/fbcode/torchx/examples/apps

$ ls
component.py  config  main.py  module  README.md  TARGETS

# let's try getting the version of torch from a prebuilt fbpkg or bento kernel
$ torchx run utils.python --image bento_kernel_pytorch_lightning -c "import torch; print(torch.__version__)"
torchx 2021-10-27 11:27:28 INFO     loaded configs from /data/users/kiuk/fbsource/fbcode/torchx/fb/example/.torchxconfig
2021-10-27 11:27:44,633 fbpkg.fetch INFO: completed download of bento_kernel_pytorch_lightning:405
2021-10-27 11:27:44,634 fbpkg.fetch INFO: extracted bento_kernel_pytorch_lightning:405 to bento_kernel_pytorch_lightning
2021-10-27 11:27:48,591 fbpkg.util WARNING: removing old version /home/kiuk/.torchx/fbpkg/bento_kernel_pytorch_lightning/403
All packages downloaded successfully
local_penv://torchx/torchx_utils_python_6effc4e2
torchx 2021-10-27 11:27:49 INFO     Waiting for the app to finish...
1.11.0a0+fb
torchx 2021-10-27 11:27:58 INFO     Job finished: SUCCEEDED

Now for a more interesting example, let's run a simple all-reduce of a 1-D tensor on one worker:
$ torchx run utils.python --image torchx_fb_example \
-c "import torch; import torch.distributed as dist; dist.init_process_group(backend='gloo', init_method='tcp://localhost:29500', rank=0, world_size=1); t=torch.tensor(1); dist.all_reduce(t); print(f'all reduce result: {t.item()}')"

torchx 2021-10-27 10:23:05 INFO     loaded configs from /data/users/kiuk/fbsource/fbcode/torchx/fb/example/.torchxconfig
2021-10-27 10:23:09,339 fbpkg.fetch INFO: checksums verified: torchx_fb_example:11
All packages verified
local_penv://torchx/torchx_utils_python_08a41456
torchx 2021-10-27 10:23:09 INFO     Waiting for the app to finish...
all reduce result: 1
torchx 2021-10-27 10:23:13 INFO     Job finished: SUCCEEDED

WARNING: Long inlined scripts won't work, since schedulers typically have a character limit on the length of each argument.
torchx.components.utils.booth(x1: float, x2: float, trial_idx: int = 0, tracker_base: str = '/tmp/torchx-util-booth', image: str = 'ghcr.io/pytorch/torchx:0.8.0dev0') AppDef[source]

Evaluates the Booth function, f(x1, x2) = (x1 + 2*x2 - 7)^2 + (2*x1 + x2 - 5)^2. The output result is accessible via FsspecResultTracker(outdir)[trial_idx].

Parameters:
  • x1 – the x1 input to the Booth function

  • x2 – the x2 input to the Booth function

  • trial_idx – ignored if not running HPO

  • tracker_base – URI of the tracker’s base output directory (e.g. s3://foo/bar)

  • image – the image that contains the booth app
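
A minimal sketch; the global minimum of the Booth function is f(1, 3) = 0, and the import path below assumes FsspecResultTracker lives in torchx.runtime.tracking (as in recent TorchX releases):

from torchx.components.utils import booth
from torchx.runtime.tracking import FsspecResultTracker

outdir = "/tmp/torchx-util-booth"
app = booth(x1=1.0, x2=3.0, trial_idx=0, tracker_base=outdir)

# ... after the app finishes, read the result back:
tracker = FsspecResultTracker(outdir)
result = tracker[0]  # value recorded for trial 0; f(1, 3) = 0 at the global minimum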

torchx.components.utils.binary(*args: str, entrypoint: str, name: str = 'torchx_utils_binary', num_replicas: int = 1, cpu: int = 1, gpu: int = 0, memMB: int = 1024, h: Optional[str] = None) AppDef[source]

Test component that runs the binary at the given entrypoint with the provided arguments.

Parameters:
  • args – arguments passed to the program in sys.argv[1:]

  • entrypoint – path to the binary to run (inside the image)

  • name – name of the job

  • num_replicas – number of copies to run (each on its own container)

  • cpu – number of cpus per replica

  • gpu – number of gpus per replica

  • memMB – cpu memory in MB per replica

  • h – a registered named resource (if specified takes precedence over cpu, gpu, memMB)
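
A minimal sketch ("/bin/echo" is just an illustrative entrypoint):

from torchx.components.utils import binary

# AppDef that runs `/bin/echo hello` on one replica
app = binary("hello", entrypoint="/bin/echo", num_replicas=1)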
