.. _wandb_logging:

===========================
Logging to Weights & Biases
===========================

This deep-dive will guide you through how to set up logging to Weights & Biases
(W&B) in torchtune.

.. grid:: 1

    .. grid-item-card:: :octicon:`mortar-board;1em;` What this deep-dive will cover

      * How to get started with W&B
      * How to use the :class:`~torchtune.training.metric_logging.WandBLogger`
      * How to log configs, metrics, and model checkpoints to W&B

torchtune supports logging your training runs to `Weights & Biases <https://wandb.ai>`_.
An example W&B workspace from a torchtune fine-tuning run is shown in the screenshot below.

.. image:: ../_static/img/torchtune_workspace.png
  :alt: torchtune workspace in W&B
  :width: 100%
  :align: center

.. note::

  You will need to install the :code:`wandb` package to use this feature.
  You can install it via pip:

  .. code-block:: bash

    pip install wandb

  Then you need to log in with your API key using the W&B CLI:

  .. code-block:: bash

    wandb login

Metric Logger
-------------

The only change you need to make is to add the metric logger to your config.
Weights & Biases will log the metrics and model checkpoints for you.

.. code-block:: yaml

  # enable logging to the built-in WandBLogger
  metric_logger:
    _component_: torchtune.training.metric_logging.WandBLogger
    # the W&B project to log to
    project: torchtune

We automatically grab the config from the recipe you are running and log it to W&B.
You can find it in the W&B overview tab, and the actual file in the :code:`Files` tab.

As a tip, you may see straggler ``wandb`` processes running in the background if your
job crashes or otherwise exits without cleaning up resources. To kill these straggler
processes, a command like ``ps aux | grep wandb | awk '{ print $2 }' | xargs kill``
can be used.

.. note::

  Click on this sample `project to see the W&B workspace <https://wandb.ai/capecape/torchtune>`_.
  The config used to train the models can be found `here <https://wandb.ai/capecape/torchtune/runs/6053ofw0/files/torchtune_config_j67sb73v.yaml>`_.

Logging Model Checkpoints to W&B
--------------------------------

You can also log the model checkpoints to W&B by modifying the desired recipe's
:code:`save_checkpoint` method. A suggested approach would be something like this:

.. code-block:: python

  from pathlib import Path

  import wandb

  from torchtune import training

  def save_checkpoint(self, epoch: int) -> None:
      ...
      # Save the checkpoint to W&B.
      # Depending on the Checkpointer class, the file will be named differently.
      # Here is an example for the full_finetune case.
      checkpoint_file = (
          self._checkpointer._output_dir / f"torchtune_model_{epoch}"
      ).with_suffix(".pt")
      wandb_at = wandb.Artifact(
          name=f"torchtune_model_{epoch}",
          type="model",
          # description of the model checkpoint
          description="Model checkpoint",
          # you can add whatever metadata you want as a dict
          metadata={
              training.SEED_KEY: self.seed,
              training.EPOCHS_KEY: self.epochs_run,
              training.TOTAL_EPOCHS_KEY: self.total_epochs,
              training.MAX_STEPS_KEY: self.max_steps_per_epoch,
          },
      )
      wandb_at.add_file(str(checkpoint_file))
      wandb.log_artifact(wandb_at)
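Note that in a distributed recipe every rank will execute :code:`save_checkpoint`, so you
will likely want to guard the artifact upload so it runs on a single rank only (many
torchtune distributed recipes keep a flag such as ``self._is_rank_zero`` for this purpose);
otherwise each rank uploads a duplicate copy of the checkpoint.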
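Once logged, the checkpoint can be pulled down in a later session through the W&B public
API. Below is a minimal sketch, assuming the artifact was logged with ``epoch`` equal to
``0`` to a project named ``torchtune`` under your default entity; adjust the artifact path
to match your setup.

.. code-block:: python

  import wandb

  # "torchtune/torchtune_model_0:latest" assumes project "torchtune" and
  # epoch 0; prefix the path with "<entity>/" if the project does not live
  # under your default entity.
  api = wandb.Api()
  artifact = api.artifact("torchtune/torchtune_model_0:latest", type="model")

  # Download the checkpoint file(s) to a local directory.
  local_dir = artifact.download()
  print(f"Checkpoint downloaded to {local_dir}")

Each call to :code:`wandb.log_artifact` with the same artifact name creates a new version
(``v0``, ``v1``, ...), so the ``:latest`` alias always resolves to the most recently logged
checkpoint under that name.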
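Finally, if you want to sanity-check your W&B setup without launching a full fine-tuning
run, you can construct the :class:`~torchtune.training.metric_logging.WandBLogger` directly
in Python. The snippet below is a minimal sketch; the project name and the logged values
are placeholders.

.. code-block:: python

  from torchtune.training.metric_logging import WandBLogger

  # Instantiate the logger; this starts a W&B run in the "torchtune" project.
  logger = WandBLogger(project="torchtune")

  # Log a placeholder scalar metric for a few steps.
  for step in range(5):
      logger.log("loss", 1.0 / (step + 1), step=step)

  # Finish the underlying W&B run.
  logger.close()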