torchx.tracker

Overview & Usage

Note

PROTOTYPE, USE AT YOUR OWN RISK, APIs SUBJECT TO CHANGE

Practitioners running ML jobs often need to track information such as:

  • Job inputs:
    • configuration
      • model configuration

      • HPO parameters

    • data
      • version

      • sources

  • Job results:
    • metrics

    • model location

  • Conceptual job groupings

AppRun provides a uniform interface for experiment and artifact tracking. It wraps pluggable tracking implementations, each of which is supplied as a TrackerBase adapter.

Example usage

Sample code using tracker API.

Tracker Setup

Enabling tracking requires:

  1. Defining tracker backends (entrypoints and configuration) on the launcher side using .torchxconfig

  2. Adding entrypoints within the user job using entry_points (specification)

1. Launcher side configuration

Users can define any number of tracker backends under the [torchx:tracker] section in .torchxconfig, where:
  • Key: an arbitrary name for the tracker. The name is used to configure the tracker's properties under [tracker:<TRACKER_NAME>]

  • Value: the entrypoint/factory method, which must be available within the user job. The value is injected into the user job and used to construct the tracker implementation.

[torchx:tracker]
tracker_name=<entry_point>

Each tracker can be additionally configured (currently limited to the config parameter) under its [tracker:<TRACKER_NAME>] section:

[tracker:<TRACKER_NAME>]
config=configvalue

For example, ~/.torchxconfig may be set up as:

[torchx:tracker]
tracker1=tracker1
tracker12=backend_2_entry_point

[tracker:tracker1]
config=s3://my_bucket/config.json
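
To make the relationship between the two kinds of sections concrete, the sketch below reads the example configuration with Python's standard configparser module and pairs each tracker name with its entrypoint and optional config value. The helper name read_tracker_settings is hypothetical, and the sketch only illustrates the file layout; it does not claim to reproduce how TorchX itself parses .torchxconfig.

```python
import configparser
from typing import Dict, Optional

# Contents mirror the ~/.torchxconfig example above.
TORCHXCONFIG = """
[torchx:tracker]
tracker1=tracker1
tracker12=backend_2_entry_point

[tracker:tracker1]
config=s3://my_bucket/config.json
"""


def read_tracker_settings(text: str) -> Dict[str, Dict[str, Optional[str]]]:
    """Map each tracker name to its entrypoint and optional config value.

    Hypothetical helper for illustration only (not part of TorchX).
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    settings: Dict[str, Dict[str, Optional[str]]] = {}
    for name, entry_point in parser.items("torchx:tracker"):
        # A tracker without its own [tracker:<name>] section has no config.
        config = parser.get(f"tracker:{name}", "config", fallback=None)
        settings[name] = {"entry_point": entry_point, "config": config}
    return settings


settings = read_tracker_settings(TORCHXCONFIG)
```

Here tracker1 picks up the config value from its [tracker:tracker1] section, while tracker12 has no matching section and therefore no config.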

2. User job configuration (Advanced)

The entrypoint value defined in the previous step must be discoverable under the [torchx.tracker] group and callable within the user job (depending on the packaging/distribution mechanism) to create an instance of TrackerBase.

To accomplish that, define the entrypoint in the distribution's entry_points.txt as:

[torchx.tracker]
entry_point_name=my_module:create_tracker_fn
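
The value on the right-hand side follows the usual "module:attribute" object-reference form used by Python entry points. As a rough illustration of what such a reference means, the sketch below resolves one by hand with importlib. The function resolve_entry_point is a hypothetical helper, and json:loads merely stands in for my_module:create_tracker_fn so that the example runs anywhere; this is not TorchX's own loading code.

```python
import importlib
from typing import Any


def resolve_entry_point(reference: str) -> Any:
    """Resolve a 'module:attribute' reference to the object it names.

    Hypothetical helper for illustration only (not part of TorchX).
    """
    module_name, _, attr_path = reference.partition(":")
    if not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {reference!r}")
    obj: Any = importlib.import_module(module_name)
    # Walk dotted attribute paths such as 'Class.method'.
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj


# 'json:loads' is a stand-in for 'my_module:create_tracker_fn'.
factory = resolve_entry_point("json:loads")
```

In a real job, the resolved object would be the factory that creates the TrackerBase instance, which is why the module must be importable inside the job's environment.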

Acquiring AppRun instance

Use app_run_from_env():

>>> import os; os.environ["TORCHX_JOB_ID"] = "scheduler://session/job_id" # Simulate running job first
>>> from torchx.tracker import app_run_from_env
>>> app_run = app_run_from_env()

Reference TrackerBase implementation

FsspecTracker provides a reference implementation of a tracker backend. The example directory on GitHub shows how to configure and use it in a user application.

Querying data

  • CmdTracker exposes operations available to users at the CLI level:
    • torchx tracker list jobs [--parent-run-id RUN_ID]

    • torchx tracker list metadata RUN_ID

    • torchx tracker list artifacts [--artifact ARTIFACT_NAME] RUN_ID

  • Alternatively, backend implementations may expose UI for user consumption.

class torchx.tracker.AppRun(id: str, backends: Iterable[TrackerBase])[source]

Exposes the tracker API at the job level and should be the only API that encapsulates the module implementation.

This API is still experimental and may change in the future.

Parameters:
  • id (str) – identity of the job used by tracker API

  • backends (Iterable[TrackerBase]) – list of TrackerBase implementations that will be used to persist the data.
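
The constructor signature suggests a facade that binds one job id to several backends. The sketch below shows that delegation pattern in isolation. InMemoryBackend and JobRun are hypothetical stand-ins written for this example, and the add_metadata method name is an assumption chosen for illustration; none of this is the TorchX implementation.

```python
from typing import Any, Dict, Iterable, List


class InMemoryBackend:
    """Hypothetical stand-in for a TrackerBase implementation."""

    def __init__(self) -> None:
        self.metadata: Dict[str, Dict[str, Any]] = {}

    def add_metadata(self, run_id: str, **kwargs: Any) -> None:
        # Merge new key/value pairs into whatever is stored for this run.
        self.metadata.setdefault(run_id, {}).update(kwargs)


class JobRun:
    """Hypothetical job-level facade: one job id, many backends."""

    def __init__(self, id: str, backends: Iterable[InMemoryBackend]) -> None:
        self.id = id
        self.backends: List[InMemoryBackend] = list(backends)

    def add_metadata(self, **kwargs: Any) -> None:
        # The caller never passes the job id; the facade supplies it
        # and forwards the call to every configured backend.
        for backend in self.backends:
            backend.add_metadata(self.id, **kwargs)


primary, mirror = InMemoryBackend(), InMemoryBackend()
run = JobRun("scheduler://session/job_id", [primary, mirror])
run.add_metadata(lr=0.01, epochs=3)
```

With this shape, job code records data once and every backend persists it under the same job identity.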

class torchx.tracker.api.TrackerBase[source]

Abstraction of tracking solution implementations/services.

This API is still experimental and may change to a large extent in the future.

class torchx.tracker.backend.fsspec.FsspecTracker(fs: AbstractFileSystem, root_dir: str)[source]

Implements TrackerBase using the fsspec abstraction, which has the advantage of supporting various storage options for persisting the data.

Important: the torchx.tracker.api API is still experimental, hence there are no backwards compatibility guarantees with future releases yet.

Each run will have a directory with subdirs for metadata, artifact, source and descendants data.

class torchx.cli.cmd_tracker.CmdTracker[source]

Prototype TorchX tracker subcommand that allows querying data by interacting with tracker implementations.

Important: commands and the arguments may be modified in the future.

Supported commands:
  • tracker list jobs [--parent-run-id RUN_ID]

  • tracker list metadata RUN_ID

  • tracker list artifacts [--artifact ARTIFACT_NAME] RUN_ID
