LoRA Single Device Finetuning
This recipe supports finetuning on next-token prediction tasks using parameter-efficient fine-tuning (PEFT) techniques such as LoRA and QLoRA. These techniques significantly reduce memory consumption during training while maintaining competitive performance.
We provide pre-tested, out-of-the-box configs that let you get up and running with the latest Llama models in just two steps:
Note
You may need to be granted access to the Llama model you’re interested in. See here for details on accessing gated repositories.
tune download meta-llama/Meta-Llama-3.1-8B-Instruct \
--output-dir /tmp/Meta-Llama-3.1-8B-Instruct \
--ignore-patterns "original/consolidated.00.pth"
tune run lora_finetune_single_device \
--config llama3_1/8B_lora_single_device
You can quickly customize this recipe through the torchtune CLI. For example, when fine-tuning with LoRA, you can adjust which layers LoRA is applied to and how strongly the LoRA update is scaled during training:
tune run lora_finetune_single_device \
--config llama3_1/8B_lora_single_device \
model.lora_attn_modules="[q_proj,k_proj,v_proj]" \
model.apply_lora_to_mlp=True \
model.lora_rank=64 \
model.lora_alpha=128
This configuration in particular results in an aggressive LoRA policy, which trades increased memory usage and slower training for higher training accuracy.
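To make the cost of these settings concrete, here is a rough back-of-the-envelope sketch; it is not torchtune code. It assumes Llama 3.1 8B shapes (32 layers, hidden size 4096, key/value projection size 1024, MLP size 14336), that each adapted linear layer of shape in_dim x out_dim adds rank * (in_dim + out_dim) trainable parameters, and that the LoRA update is scaled by lora_alpha / lora_rank as in the conventional LoRA formulation. The helper names and the w1/w2/w3 MLP labels are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Linear:
    in_dim: int
    out_dim: int


# Assumed Llama 3.1 8B dimensions (not read from any config file).
EMBED_DIM, KV_DIM, MLP_DIM, NUM_LAYERS = 4096, 1024, 14336, 32

ATTN = {
    "q_proj": Linear(EMBED_DIM, EMBED_DIM),
    "k_proj": Linear(EMBED_DIM, KV_DIM),
    "v_proj": Linear(EMBED_DIM, KV_DIM),
    "output_proj": Linear(EMBED_DIM, EMBED_DIM),
}
MLP = {
    "w1": Linear(EMBED_DIM, MLP_DIM),
    "w2": Linear(MLP_DIM, EMBED_DIM),
    "w3": Linear(EMBED_DIM, MLP_DIM),
}


def lora_params(layer: Linear, rank: int) -> int:
    """Parameters in the low-rank pair A (rank x in_dim) and B (out_dim x rank)."""
    return rank * (layer.in_dim + layer.out_dim)


def total_lora_params(attn_modules: list[str], apply_to_mlp: bool, rank: int) -> int:
    """Trainable LoRA parameters summed over every transformer layer."""
    per_layer = sum(lora_params(ATTN[name], rank) for name in attn_modules)
    if apply_to_mlp:
        per_layer += sum(lora_params(layer, rank) for layer in MLP.values())
    return per_layer * NUM_LAYERS


def lora_scale(alpha: float, rank: int) -> float:
    """Multiplier applied to the low-rank update before it is added to the output."""
    return alpha / rank


if __name__ == "__main__":
    aggressive = total_lora_params(["q_proj", "k_proj", "v_proj"], True, rank=64)
    light = total_lora_params(["q_proj", "v_proj"], False, rank=8)
    print(f"aggressive policy: {aggressive:,} trainable params, scale {lora_scale(128, 64)}")
    print(f"lighter policy:    {light:,} trainable params, scale {lora_scale(16, 8)}")
```

The lighter policy (rank 8 on q_proj and v_proj only) is a hypothetical point of comparison, not a setting taken from a shipped config. Under these assumptions, the parameter count grows linearly with the rank and with the number of adapted layers, which is where the extra memory and compute come from.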
For a deeper understanding of the different levers you can pull when using this recipe, see our documentation for the different PEFT training paradigms we support.
Many of our other memory optimization features can be used in this recipe, too:
Adjust model precision.
Enable gradient accumulation.
Use lower-precision optimizers. Note, however, that because LoRA already significantly reduces the memory taken up by gradient state, you will likely not need this feature.
You can learn more about all of our memory optimization features in our memory optimization overview.
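As a rough illustration of why these levers matter, the sketch below estimates weight memory at two precisions, optimizer state for a full finetune versus a LoRA run, and the effective batch size under gradient accumulation. It is plain arithmetic, not torchtune code. The parameter counts are rounded, hypothetical figures, and the optimizer model assumes an AdamW-style optimizer that keeps two moment estimates per trainable parameter.

```python
# Bytes used to store one element at each precision.
BYTES_PER_ELEMENT = {"fp32": 4, "bf16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Memory needed just to hold the model weights, in GiB."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 2**30


def adamw_state_gib(trainable_params: int, dtype: str = "fp32") -> float:
    """Memory for two moment estimates per trainable parameter, in GiB."""
    return 2 * trainable_params * BYTES_PER_ELEMENT[dtype] / 2**30


def effective_batch_size(batch_size: int, grad_accum_steps: int) -> int:
    """Samples contributing to each optimizer step on a single device."""
    return batch_size * grad_accum_steps


if __name__ == "__main__":
    base_params = 8_000_000_000  # assumed, rounded size of an 8B base model
    lora_params = 150_000_000    # hypothetical adapter size for illustration

    for dtype in ("fp32", "bf16"):
        print(f"weights in {dtype}: {weight_memory_gib(base_params, dtype):.1f} GiB")

    print(f"optimizer state, full finetune: {adamw_state_gib(base_params):.1f} GiB")
    print(f"optimizer state, LoRA only:     {adamw_state_gib(lora_params):.1f} GiB")
    print(f"effective batch size: {effective_batch_size(2, 8)}")
```

Because only the adapter parameters are trainable under LoRA, the optimizer state in this estimate shrinks in proportion to the trainable parameter count, which is why a lower-precision optimizer is rarely needed here.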
Interested in seeing this recipe in action? Check out some of our tutorials to see how it can be used.