For metrics, we recommend using TensorBoard to log directly to cloud storage alongside your model. As the model trains, you can launch a TensorBoard instance locally to monitor its progress:

$ tensorboard --logdir provider://path/to/logs

Or you can use the torchx.components.metrics.tensorboard() component as part of your pipeline.

See the Trainer Example for how to use the PyTorch Lightning TensorBoardLogger.


torchx.components.metrics.tensorboard(logdir: str, image: str = '', timeout: float = 3600, port: int = 6006, start_on_file: str = '', exit_on_file: str = '') → AppDef

This component runs a TensorBoard server that renders the logs specified by logdir.

Since TensorBoard runs as a service, you need to specify its termination conditions: a timeout, plus an optional exit_on_file that causes the service to quit when that path is created.

The start_on_file and exit_on_file paths are polled periodically for existence via fsspec, and the corresponding behavior is triggered once the path is created.

  • logdir – fsspec path to the TensorBoard logs

  • image – image to use

  • timeout – maximum time to run before exiting (seconds)

  • port – port the server listens on

  • start_on_file – start the server when the fsspec path is created

  • exit_on_file – shutdown the server when the fsspec path is created
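
The termination conditions described above can be sketched in a few lines of Python. This is only an illustration of the polling idea, not the component's actual implementation: the helper name wait_for_exit, the poll_interval parameter, and the returned strings are all hypothetical, and os.path.exists stands in for the fsspec existence check so the sketch runs with the standard library alone.

```python
import os
import time
from typing import Callable


def wait_for_exit(
    timeout: float = 3600,
    exit_on_file: str = "",
    poll_interval: float = 1.0,  # assumption: the real polling period is not documented here
    exists: Callable[[str], bool] = os.path.exists,  # local stand-in for an fsspec check
) -> str:
    """Block until exit_on_file appears or the timeout elapses.

    Returns "exit_on_file" or "timeout" to indicate which condition fired.
    """
    deadline = time.monotonic() + timeout
    while True:
        # An empty exit_on_file means only the timeout applies.
        if exit_on_file and exists(exit_on_file):
            return "exit_on_file"
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return "timeout"
        # Sleep for one polling period, but never past the deadline.
        time.sleep(min(poll_interval, remaining))
```

To poll a remote path, one could pass a different exists callable, for example one backed by an fsspec filesystem object, without changing the loop itself.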

