FullModelMetaCheckpointer

class torchtune.utils.FullModelMetaCheckpointer(checkpoint_dir: str, checkpoint_files: List[str], model_type: ModelType, output_dir: str, adapter_checkpoint: Optional[str] = None, recipe_checkpoint: Optional[str] = None, resume_from_checkpoint: bool = False)

Checkpointer which reads and writes checkpoints in Meta’s format. One example is the Llama-2-7b model from the meta-llama repository (https://huggingface.co/meta-llama/Llama-2-7b).

Currently, reading is supported from a single checkpoint file only. Support for reading from sharded checkpoints is a work in progress.

Parameters:

checkpoint_dir (str) – Directory containing the checkpoint files
checkpoint_files (List[str]) – List of checkpoint files to load. Currently this checkpointer only supports loading a single checkpoint file.
model_type (ModelType) – Type of the model whose checkpoint is being loaded
output_dir (str) – Directory to save the checkpoint files
adapter_checkpoint (Optional[str]) – Path to the adapter weights. Default is None
recipe_checkpoint (Optional[str]) – Path to the recipe state checkpoint file. Default is None
resume_from_checkpoint (bool) – If True, the checkpointer will load the additional checkpoint files to resume training from a previous run. Default is False

Raises:

ValueError – If checkpoint_files is not a list of length 1
ValueError – If resume_from_checkpoint is True but recipe_checkpoint is None
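
The two error conditions above can be sketched in plain Python. This is a minimal illustration only, not the library's implementation: the helper name validate_checkpointer_args and the error messages are hypothetical, and only the two documented checks are reproduced.

```python
from typing import List, Optional


def validate_checkpointer_args(
    checkpoint_files: List[str],
    recipe_checkpoint: Optional[str] = None,
    resume_from_checkpoint: bool = False,
) -> None:
    """Hypothetical helper mirroring the two documented ValueError conditions."""
    # Documented: checkpoint_files must be a list of length 1.
    if not isinstance(checkpoint_files, list) or len(checkpoint_files) != 1:
        raise ValueError(
            "Expected a list with exactly one checkpoint file "
            f"(got {checkpoint_files!r})"  # message wording is an assumption
        )
    # Documented: resuming a run requires the recipe state checkpoint.
    if resume_from_checkpoint and recipe_checkpoint is None:
        raise ValueError(
            "resume_from_checkpoint is True but recipe_checkpoint is None"
        )


# A single file with default arguments passes both checks.
validate_checkpointer_args(["consolidated.00.pth"])
```

Passing two files, or setting resume_from_checkpoint=True without a recipe_checkpoint, would raise ValueError in this sketch, matching the conditions listed above.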

load_checkpoint() → Dict[str, Any]

Load TorchTune checkpoint from file. Currently only loading from a single file is supported.

save_checkpoint(state_dict: Dict[str, Any], epoch: int, intermediate_checkpoint: bool = False) → None

Save TorchTune checkpoint to file. If intermediate_checkpoint is True, an additional checkpoint file recipe_state.pt is created in _output_dir which contains the recipe state.

Parameters:

state_dict (Dict[str, Any]) – Checkpoint state dict to be written out to file
epoch (int) – Epoch number. Used to create the checkpoint file name
intermediate_checkpoint (bool) – If True, additional checkpoint files for recipe state and (if applicable) adapter weights are created. Default is False
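
To make the effect of epoch and intermediate_checkpoint concrete, here is a rough sketch of which files a save might produce. Only the recipe_state.pt name comes from the description above. The function planned_output_files, the has_adapter flag, and the model_{epoch}.pt and adapter_{epoch}.pt name patterns are assumptions made for illustration; the actual file names may differ.

```python
from pathlib import Path
from typing import List


def planned_output_files(
    output_dir: str,
    epoch: int,
    intermediate_checkpoint: bool = False,
    has_adapter: bool = False,  # hypothetical flag: adapter weights are present
) -> List[Path]:
    """Hypothetical helper listing the files a save call might write."""
    out = Path(output_dir)
    # ASSUMPTION: the docs only say the epoch is used in the file name;
    # this exact pattern is illustrative.
    files = [out / f"model_{epoch}.pt"]
    if intermediate_checkpoint:
        # Documented: recipe state is written to recipe_state.pt.
        files.append(out / "recipe_state.pt")
        if has_adapter:
            # ASSUMPTION: the adapter file name is illustrative.
            files.append(out / f"adapter_{epoch}.pt")
    return files


final = planned_output_files("/tmp/out", epoch=2)
mid_run = planned_output_files("/tmp/out", epoch=0, intermediate_checkpoint=True)
```

In this sketch, a final save yields only the model file, while an intermediate save also yields the recipe state file needed to resume training.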
