torchx.tracker¶

Overview & Usage¶

Note

PROTOTYPE, USE AT YOUR OWN RISK, APIs SUBJECT TO CHANGE

Practitioners running ML jobs often need to track information such as:

Job inputs:
- configuration
  - model configuration
  - HPO parameters
- data
  - version
  - sources
Job results:
- metrics
- model location
Conceptual job groupings

AppRun provides a uniform interface for experiment and artifact tracking. It wraps pluggable tracking implementations, each of which is exposed through a TrackerBase adapter implementation.

Example usage¶

Sample code using tracker API.

Tracker Setup¶

Enabling tracking requires:

- Defining tracker backends (entrypoints/modules and configuration) on the launcher side using .torchxconfig
- Adding entrypoints within the user job using entry_points (specification)

1. Launcher side configuration¶

Users can define any number of tracker backends under the [torchx:tracker] section in .torchxconfig, where:

Key: an arbitrary name for the tracker. This name is used to configure the tracker's properties
under [tracker:<TRACKER_NAME>]
Value: an entrypoint or module factory method that must be available within the user job. The value is injected into the
user job and used to construct the tracker implementation.

[torchx:tracker]
tracker_name=<entry_point_or_module_factory_method>

Each tracker can be configured further (currently limited to a single config parameter) under its [tracker:<TRACKER_NAME>] section:

[tracker:<TRACKER_NAME>]
config=configvalue

For example, ~/.torchxconfig may be set up as:

[torchx:tracker]
tracker1=tracker1
tracker2=backend_2_entry_point
tracker3=torchx.tracker.mlflow:create_tracker

[tracker:tracker1]
config=s3://my_bucket/config.json

[tracker:tracker3]
config=my_config.json
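
To make the relationship between the two kinds of sections concrete, the sketch below reads the example above with Python's standard configparser and pairs each tracker name with its factory reference and optional config value. It only illustrates the file layout; it is not how TorchX itself loads .torchxconfig, and the names read_tracker_settings and SAMPLE_TORCHXCONFIG are hypothetical.

```python
import configparser

# Same content as the ~/.torchxconfig example above, inlined so the sketch is self-contained.
SAMPLE_TORCHXCONFIG = """
[torchx:tracker]
tracker1=tracker1
tracker2=backend_2_entry_point
tracker3=torchx.tracker.mlflow:create_tracker

[tracker:tracker1]
config=s3://my_bucket/config.json

[tracker:tracker3]
config=my_config.json
"""


def read_tracker_settings(text):
    """Map each tracker name to its factory reference and optional config value.

    Hypothetical helper for illustration only; not part of TorchX.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    settings = {}
    for name, factory in parser.items("torchx:tracker"):
        section = f"tracker:{name}"
        # The per-tracker section is optional; tracker2 has none in this example.
        config = parser.get(section, "config", fallback=None)
        settings[name] = {"factory": factory, "config": config}
    return settings


tracker_settings = read_tracker_settings(SAMPLE_TORCHXCONFIG)
for name, entry in tracker_settings.items():
    print(name, entry["factory"], entry["config"])
```

Note that tracker2 is declared without a matching [tracker:tracker2] section, so its config value comes back as None in this sketch.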

2. User job configuration (Advanced)¶

The entrypoint value defined in the previous step must be discoverable under the [torchx.tracker] group and callable within the user job (depending on the packaging/distribution mechanism) to create an instance of TrackerBase.

To accomplish that, define the entrypoint in the distribution's entry_points.txt as:

[torchx.tracker]
entry_point_name=my_module:create_tracker_fn
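
One way to check that an entrypoint is discoverable from inside the job environment is to list the group with the standard importlib.metadata module. This is a minimal sketch of a generic entry-point lookup, not the lookup TorchX performs internally; find_tracker_entry_points is a hypothetical helper name.

```python
from importlib.metadata import entry_points


def find_tracker_entry_points(group="torchx.tracker"):
    """Return the entry points installed under the given group (possibly none).

    Hypothetical helper for illustration only; not part of TorchX.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable collection of entry points.
        return list(eps.select(group=group))
    # Older Python versions return a plain dict keyed by group name.
    return list(eps.get(group, []))


for ep in find_tracker_entry_points():
    # ep.value is the "module:function" reference; ep.load() would import
    # the module and return the factory callable.
    print(ep.name, "->", ep.value)
```

If nothing is printed, no distribution installed in the current environment declares an entry point in that group.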

Acquiring `AppRun` instance¶

Use app_run_from_env():

>>> import os; os.environ["TORCHX_JOB_ID"] = "scheduler://session/job_id" # Simulate running job first
>>> from torchx.tracker import app_run_from_env
>>> app_run = app_run_from_env()

Reference `TrackerBase` implementation¶

FsspecTracker provides a reference implementation of a tracker backend. The example directory on GitHub shows how to configure and use it in a user application.

Querying data¶

CmdTracker exposes operations available to users at the CLI level:
- torchx tracker list jobs [--parent-run-id RUN_ID]
- torchx tracker list metadata RUN_ID
- torchx tracker list artifacts [--artifact ARTIFACT_NAME] RUN_ID

Alternatively, backend implementations may expose a UI for user consumption.

class torchx.tracker.AppRun(id: str, backends: Iterable[TrackerBase])[source]¶

Exposes the tracker API at the job level and should be the only API that encapsulates the module implementation.

This API is still experimental and may change in the future.

Parameters:

id (str) – identity of the job used by tracker API
backends (Iterable[TrackerBase]) – list of TrackerBase implementations that will be used to persist the data.
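
The constructor signature suggests a fan-out design: one job id, with each tracking call forwarded to every configured backend. The sketch below illustrates that pattern with stand-in classes only. InMemoryBackend and FanOutRun are hypothetical names, and their method shapes are assumptions made for illustration rather than the actual TrackerBase or AppRun interfaces.

```python
from typing import Dict, Iterable, List


class InMemoryBackend:
    """Hypothetical stand-in for a TrackerBase implementation; keeps metadata in a dict."""

    def __init__(self) -> None:
        self.metadata: Dict[str, Dict[str, object]] = {}

    def add_metadata(self, run_id: str, **kwargs: object) -> None:
        # Merge new key/value pairs into whatever is already stored for this run.
        self.metadata.setdefault(run_id, {}).update(kwargs)


class FanOutRun:
    """Hypothetical AppRun-like wrapper: one job id, many backends."""

    def __init__(self, id: str, backends: Iterable[InMemoryBackend]) -> None:
        self.id = id
        self.backends: List[InMemoryBackend] = list(backends)

    def add_metadata(self, **kwargs: object) -> None:
        # Forward the same call to every backend, tagged with this job's id.
        for backend in self.backends:
            backend.add_metadata(self.id, **kwargs)


first, second = InMemoryBackend(), InMemoryBackend()
run = FanOutRun("scheduler://session/job_id", [first, second])
run.add_metadata(lr=0.01, epochs=3)
print(first.metadata)
```

With this shape, job code talks to a single object and never needs to know which backends, or how many, were configured on the launcher side.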

class torchx.tracker.api.TrackerBase[source]¶

Abstraction of tracking solution implementations/services.

This API is still experimental and may change to a large extent in the future.

class torchx.tracker.backend.fsspec.FsspecTracker(fs: AbstractFileSystem, root_dir: str)[source]¶

Implements TrackerBase using the fsspec abstraction, which has the advantage of supporting various storage options for persisting the data.

Important: the torchx.tracker.api API is still experimental, so there are no backwards compatibility guarantees with future releases yet.

Each run will have a directory with subdirs for metadata, artifact, source and descendants data.

class torchx.cli.cmd_tracker.CmdTracker[source]¶

Prototype TorchX tracker subcommand that allows querying data by interacting with tracker implementation.

Important: commands and the arguments may be modified in the future.

Supported commands:

tracker list jobs [--parent-run-id RUN_ID]
tracker list metadata RUN_ID
tracker list artifacts [--artifact ARTIFACT_NAME] RUN_ID
