torchtune.models

llama3.2

Text-only models from the 3.2 version of the Llama3 family.

Important: You need to request access on Hugging Face before downloading these models.

To download the Llama-3.2-1B-Instruct model:

tune download meta-llama/Llama-3.2-1B-Instruct --output-dir /tmp/Llama-3.2-1B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf-token <HF_TOKEN>

To download the Llama-3.2-3B-Instruct model:

tune download meta-llama/Llama-3.2-3B-Instruct --output-dir /tmp/Llama-3.2-3B-Instruct --ignore-patterns "original/consolidated*" --hf-token <HF_TOKEN>

llama3_2.llama3_2_1b

Builder for creating a Llama3.2 model initialized w/ the default 1b parameter values.

llama3_2.llama3_2_3b

Builder for creating a Llama3.2 model initialized w/ the default 3b parameter values.

llama3_2.lora_llama3_2_1b

Builder for creating a Llama3.2 1B model with LoRA enabled.

llama3_2.lora_llama3_2_3b

Builder for creating a Llama3.2 3B model with LoRA enabled.

llama3_2.qlora_llama3_2_1b

Builder for creating a Llama3.2 1B model with QLoRA enabled.

llama3_2.qlora_llama3_2_3b

Builder for creating a Llama3.2 3B model with QLoRA enabled.

Note

The Llama3.2 tokenizer reuses the llama3.llama3_tokenizer builder.
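
These builders can be used directly in Python. A minimal sketch, assuming torchtune is installed; the LoRA settings shown (adapted attention projections, rank, alpha) are illustrative choices rather than recommended values:

from torchtune.models.llama3_2 import llama3_2_1b, lora_llama3_2_1b

# Full 1B model with the default parameter values.
model = llama3_2_1b()

# Same architecture with LoRA adapters applied to the attention projections.
lora_model = lora_llama3_2_1b(
    lora_attn_modules=["q_proj", "v_proj"],  # illustrative choice of adapted layers
    lora_rank=8,
    lora_alpha=16,
)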

llama3.2 Vision

Vision-Language Models from the 3.2 version of the Llama3 family.

Important: You need to request access on Hugging Face before downloading it.

To download the Llama-3.2-11B-Vision-Instruct model:

tune download meta-llama/Llama-3.2-11B-Vision-Instruct --output-dir /tmp/Llama-3.2-11B-Vision-Instruct --hf-token <HF_TOKEN>

llama3_2_vision.llama3_2_vision_11b

Llama 3.2 Vision 11B model.

llama3_2_vision.llama3_2_vision_transform

Data Transforms (including Tokenizer) for Llama 3.2 Vision.

llama3_2_vision.lora_llama3_2_vision_11b

Return a version of Llama3.2 vision (an instance of DeepFusionModel()) with LoRA applied based on the passed in configuration.

llama3_2_vision.qlora_llama3_2_vision_11b

Builder for creating a Llama3.2 vision 11B model with QLoRA enabled.

llama3_2_vision.llama3_2_vision_decoder

Build the decoder associated with the Llama3 model with additional fused cross attention layers.

llama3_2_vision.llama3_2_vision_encoder

Build the Llama 3.2 vision encoder by combining the CLIP image model with an additional projection head fusion module.

llama3_2_vision.lora_llama3_2_vision_decoder

Build the decoder associated with the Llama3 model with additional fused cross attention layers, with LoRA applied based on the passed in configuration.

llama3_2_vision.lora_llama3_2_vision_encoder

Build the Llama 3.2 vision encoder by combining the CLIP image model with an additional projection head fusion module, with LoRA applied based on the passed in configuration.

llama3_2_vision.Llama3VisionEncoder

Vision encoder model for Llama 3.2 Vision.

llama3_2_vision.Llama3VisionProjectionHead

Projection transformer to adapt the output of a pretrained frozen encoder (CLIP) to a pretrained decoder model.

llama3_2_vision.Llama3VisionTransform

This transform combines the transforms for the different modalities of Llama 3.2 Vision.

Note

The Llama3.2 Vision tokenizer also reuses the llama3.llama3_tokenizer builder.
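
A minimal sketch of building the 11B vision model together with its data transform; the tokenizer path below is a placeholder pointing into the checkpoint downloaded above, and the exact keyword arguments accepted may vary by torchtune version:

from torchtune.models.llama3_2_vision import llama3_2_vision_11b, llama3_2_vision_transform

# DeepFusionModel combining the CLIP-based vision encoder with the Llama decoder.
model = llama3_2_vision_11b()

# Transform handling both tokenization and image preprocessing.
transform = llama3_2_vision_transform(
    path="/tmp/Llama-3.2-11B-Vision-Instruct/original/tokenizer.model",  # placeholder path
)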

llama3 & llama3.1

Models from versions 3 and 3.1 of the Llama3 family.

Important: You need to request access on Hugging Face before downloading these models.

To download the Llama3.1-8B-Instruct model:

tune download meta-llama/Meta-Llama-3.1-8B-Instruct --output-dir /tmp/Meta-Llama-3.1-8B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf-token <HF_TOKEN>

To download the Llama3.1-70B-Instruct model:

tune download meta-llama/Meta-Llama-3.1-70B-Instruct --output-dir /tmp/Meta-Llama-3.1-70B-Instruct --ignore-patterns "original/consolidated*" --hf-token <HF_TOKEN>

To download the Llama3.1-405B-Instruct model:

tune download meta-llama/Meta-Llama-3.1-405B-Instruct --ignore-patterns "original/consolidated*" --hf-token <HF_TOKEN>

To download the original Llama3 weights instead, use the same commands with the Meta-Llama-3-8B-Instruct and Meta-Llama-3-70B-Instruct repositories.

llama3.llama3

Build the decoder associated with the Llama3 model.

llama3.lora_llama3

Return a version of Llama3 (an instance of TransformerDecoder()) with LoRA applied based on the passed in configuration.

llama3.llama3_8b

Builder for creating a Llama3 model initialized w/ the default 8b parameter values.

llama3.lora_llama3_8b

Builder for creating a Llama3 8B model with LoRA enabled.

llama3.qlora_llama3_8b

Builder for creating a Llama3 8B model with QLoRA enabled.

llama3.llama3_70b

Builder for creating a Llama3 model initialized w/ the default 70B parameter values.

llama3.lora_llama3_70b

Builder for creating a Llama3 70B model with LoRA enabled.

llama3.qlora_llama3_70b

Builder for creating a Llama3 70B model with QLoRA enabled.

llama3.llama3_tokenizer

Tokenizer for Llama3.

llama3_1.llama3_1

Build the decoder associated with the Llama3.1 model.

llama3_1.lora_llama3_1

Return a version of Llama3.1 (an instance of TransformerDecoder()) with LoRA applied based on the passed in configuration.

llama3_1.llama3_1_8b

Builder for creating a Llama3.1 model initialized w/ the default 8b parameter values.

llama3_1.lora_llama3_1_8b

Builder for creating a Llama3.1 8B model with LoRA enabled.

llama3_1.qlora_llama3_1_8b

Builder for creating a Llama3.1 8B model with QLoRA enabled.

llama3_1.llama3_1_70b

Builder for creating a Llama3.1 model initialized w/ the default 70B parameter values.

llama3_1.lora_llama3_1_70b

Builder for creating a Llama3.1 70B model with LoRA enabled.

llama3_1.qlora_llama3_1_70b

Builder for creating a Llama3.1 70B model with QLoRA enabled.

llama3_1.llama3_1_405b

Builder for creating a Llama3.1 model initialized w/ the default 405B parameter values.

llama3_1.lora_llama3_1_405b

Builder for creating a Llama3.1 405B model with LoRA enabled.

llama3_1.qlora_llama3_1_405b

Builder for creating a Llama3.1 405B model with QLoRA enabled.

Note

The Llama3.1 tokenizer reuses the llama3.llama3_tokenizer builder.
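
As a sketch, the 8B builder and the shared tokenizer can be combined as follows; the tokenizer path is a placeholder for the tokenizer.model file inside the download above, and the encode arguments follow the common torchtune tokenizer interface:

from torchtune.models.llama3 import llama3_tokenizer
from torchtune.models.llama3_1 import llama3_1_8b

model = llama3_1_8b()
tokenizer = llama3_tokenizer("/tmp/Meta-Llama-3.1-8B-Instruct/original/tokenizer.model")  # placeholder path
token_ids = tokenizer.encode("Hello, Llama!", add_bos=True, add_eos=True)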

llama2

All models from the Llama2 family.

Important: You need to request access on Hugging Face before downloading these models.

To download the Llama2-7B model:

tune download meta-llama/Llama-2-7b-hf --output-dir /tmp/Llama-2-7b-hf --hf-token <HF_TOKEN>

To download the Llama2-13B model:

tune download meta-llama/Llama-2-13b-hf --output-dir /tmp/Llama-2-13b-hf --hf-token <HF_TOKEN>

To download the Llama2-70B model:

tune download meta-llama/Llama-2-70b-hf --output-dir /tmp/Llama-2-70b-hf --hf-token <HF_TOKEN>

llama2.llama2

Build the decoder associated with the Llama2 model.

llama2.lora_llama2

Return a version of Llama2 (an instance of TransformerDecoder()) with LoRA applied based on the passed in configuration.

llama2.llama2_7b

Builder for creating a Llama2 model initialized w/ the default 7B parameter values from https://arxiv.org/abs/2307.09288

llama2.lora_llama2_7b

Builder for creating a Llama2 7B model with LoRA enabled.

llama2.qlora_llama2_7b

Builder for creating a Llama2 7B model with QLoRA enabled.

llama2.llama2_13b

Builder for creating a Llama2 model initialized w/ the default 13B parameter values from https://arxiv.org/abs/2307.09288

llama2.lora_llama2_13b

Builder for creating a Llama2 13B model with LoRA enabled.

llama2.qlora_llama2_13b

Builder for creating a Llama2 13B model with QLoRA enabled.

llama2.llama2_70b

Builder for creating a Llama2 model initialized w/ the default 70B parameter values from https://arxiv.org/abs/2307.09288

llama2.lora_llama2_70b

Builder for creating a Llama2 70B model with LoRA enabled.

llama2.qlora_llama2_70b

Builder for creating a Llama2 70B model with QLoRA enabled.

llama2.llama2_tokenizer

Tokenizer for Llama2.

llama2.llama2_reward_7b

Builder for creating a Llama2 model initialized w/ the default 7B parameter values from https://arxiv.org/abs/2307.09288, where the output layer is a classification layer projecting to a single class for reward modelling.

llama2.lora_llama2_reward_7b

Builder for creating a Llama2 7B reward model with LoRA enabled.

llama2.qlora_llama2_reward_7b

Builder for creating a Llama2 reward 7b model with QLoRA enabled.

llama2.Llama2ChatTemplate

Prompt template that formats chat data of human and system prompts with appropriate tags used in Llama2 pre-training.
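
A sketch of instantiating the 7B model and tokenizer from the download above; the tokenizer path is a placeholder and the QLoRA arguments are illustrative:

from torchtune.models.llama2 import llama2_7b, llama2_tokenizer, qlora_llama2_7b

model = llama2_7b()
tokenizer = llama2_tokenizer("/tmp/Llama-2-7b-hf/tokenizer.model")  # placeholder path

# Memory-efficient variant: base weights quantized, LoRA adapters trainable.
qlora_model = qlora_llama2_7b(lora_attn_modules=["q_proj", "v_proj"])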

code llama

Models from the Code Llama family.

Important: You need to request access on Hugging Face before downloading it.

To download the CodeLlama-7B model:

tune download meta-llama/CodeLlama-7b-hf --output-dir /tmp/CodeLlama-7b-hf --hf-token <HF_TOKEN>

code_llama2.code_llama2_7b

Builder for creating a Code-Llama2 model initialized w/ the default 7B parameter values from https://arxiv.org/pdf/2308.12950.pdf

code_llama2.lora_code_llama2_7b

Builder for creating a Code-Llama2 7B model with LoRA enabled.

code_llama2.qlora_code_llama2_7b

Builder for creating a Code-Llama2 7B model with QLoRA enabled.

code_llama2.code_llama2_13b

Builder for creating a Code-Llama2 model initialized w/ the default 13B parameter values from https://arxiv.org/pdf/2308.12950.pdf

code_llama2.lora_code_llama2_13b

Builder for creating a Code-Llama2 13B model with LoRA enabled.

code_llama2.qlora_code_llama2_13b

Builder for creating a Code-Llama2 13B model with QLoRA enabled.

code_llama2.code_llama2_70b

Builder for creating a Code-Llama2 model initialized w/ the default 70B parameter values from https://arxiv.org/pdf/2308.12950.pdf

code_llama2.lora_code_llama2_70b

Builder for creating a Code-Llama2 70B model with LoRA enabled.

code_llama2.qlora_code_llama2_70b

Builder for creating a Code-Llama2 70B model with QLoRA enabled.
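
The Code Llama builders follow the same pattern as the Llama2 ones; a sketch with illustrative LoRA settings:

from torchtune.models.code_llama2 import code_llama2_7b, lora_code_llama2_7b

model = code_llama2_7b()
lora_model = lora_code_llama2_7b(
    lora_attn_modules=["q_proj", "k_proj", "v_proj", "output_proj"],  # illustrative choice
    apply_lora_to_mlp=True,  # also adapt the MLP layers
    lora_rank=16,
    lora_alpha=32,
)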

qwen-2.5

Models of size 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B from the Qwen2.5 family.

To download the Qwen2.5 1.5B model, for example:

tune download Qwen/Qwen2.5-1.5B-Instruct --output-dir /tmp/Qwen2_5-1_5B-Instruct --ignore-patterns None

qwen2_5.qwen2_5_0_5b

Builder for creating a Qwen2.5 model (base or instruct) initialized w/ the default 0.5B parameter values from https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct

qwen2_5.lora_qwen2_5_0_5b

Builder for creating a Qwen2.5 0.5B model (base or instruct) with LoRA enabled.

qwen2_5.qwen2_5_1_5b_base

Builder for creating a Qwen2.5 base model initialized w/ the default 1.5B parameter values from https://huggingface.co/Qwen/Qwen2.5-1.5B

qwen2_5.qwen2_5_1_5b_instruct

Builder for creating a Qwen2.5 instruct model initialized w/ the default 1.5B parameter values from https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct

qwen2_5.lora_qwen2_5_1_5b_base

Builder for creating a Qwen2.5 1.5B base model with LoRA enabled.

qwen2_5.lora_qwen2_5_1_5b_instruct

Builder for creating a Qwen2.5 1.5B instruct model with LoRA enabled.

qwen2_5.qwen2_5_3b

Builder for creating a Qwen2.5 model (base or instruct) initialized w/ the default 3B parameter values from https://huggingface.co/Qwen/Qwen2.5-3B-Instruct

qwen2_5.lora_qwen2_5_3b

Builder for creating a Qwen2.5 3B model (base or instruct) with LoRA enabled.

qwen2_5.qwen2_5_7b_base

Builder for creating a Qwen2.5 base model initialized w/ the default 7B parameter values from https://huggingface.co/Qwen/Qwen2.5-7B

qwen2_5.qwen2_5_7b_instruct

Builder for creating a Qwen2.5 instruct model initialized w/ the default 7B parameter values from https://huggingface.co/Qwen/Qwen2.5-7B-Instruct

qwen2_5.lora_qwen2_5_7b_base

Builder for creating a Qwen2.5 7B base model with LoRA enabled.

qwen2_5.lora_qwen2_5_7b_instruct

Builder for creating a Qwen2.5 7B instruct model with LoRA enabled.

qwen2_5.qwen2_5_14b_base

Builder for creating a Qwen2.5 base model initialized w/ the default 14B parameter values from https://huggingface.co/Qwen/Qwen2.5-14B

qwen2_5.qwen2_5_14b_instruct

Builder for creating a Qwen2.5 instruct model initialized w/ the default 14B parameter values from https://huggingface.co/Qwen/Qwen2.5-14B-Instruct

qwen2_5.lora_qwen2_5_14b_base

Builder for creating a Qwen2.5 14B base model with LoRA enabled.

qwen2_5.lora_qwen2_5_14b_instruct

Builder for creating a Qwen2.5 14B instruct model with LoRA enabled.

qwen2_5.qwen2_5_32b_base

Builder for creating a Qwen2.5 base model initialized w/ the default 32B parameter values from https://huggingface.co/Qwen/Qwen2.5-32B

qwen2_5.qwen2_5_32b_instruct

Builder for creating a Qwen2.5 instruct model initialized w/ the default 32B parameter values from https://huggingface.co/Qwen/Qwen2.5-32B-Instruct

qwen2_5.lora_qwen2_5_32b_base

Builder for creating a Qwen2.5 32B base model with LoRA enabled.

qwen2_5.lora_qwen2_5_32b_instruct

Builder for creating a Qwen2.5 32B instruct model with LoRA enabled.

qwen2_5.qwen2_5_72b_base

Builder for creating a Qwen2.5 base model initialized w/ the default 72B parameter values from https://huggingface.co/Qwen/Qwen2.5-72B

qwen2_5.qwen2_5_72b_instruct

Builder for creating a Qwen2.5 instruct model initialized w/ the default 72B parameter values from https://huggingface.co/Qwen/Qwen2.5-72B-Instruct

qwen2_5.lora_qwen2_5_72b_base

Builder for creating a Qwen2.5 72B base model with LoRA enabled.

qwen2_5.lora_qwen2_5_72b_instruct

Builder for creating a Qwen2.5 72B instruct model with LoRA enabled.

qwen2_5.qwen2_5_tokenizer

Tokenizer for Qwen2.5.
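
A sketch of the 1.5B instruct builder and tokenizer; the Qwen tokenizer is built from the vocab and merges files in the download, and the file names below are assumed from the standard Hugging Face layout:

from torchtune.models.qwen2_5 import qwen2_5_1_5b_instruct, qwen2_5_tokenizer

model = qwen2_5_1_5b_instruct()
tokenizer = qwen2_5_tokenizer(
    path="/tmp/Qwen2_5-1_5B-Instruct/vocab.json",         # assumed file name
    merges_file="/tmp/Qwen2_5-1_5B-Instruct/merges.txt",  # assumed file name
)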

qwen-2

Models of size 0.5B, 1.5B, and 7B from the Qwen2 family.

To download the Qwen2 1.5B model, for example:

tune download Qwen/Qwen2-1.5B-Instruct --output-dir /tmp/Qwen2-1.5B-Instruct --ignore-patterns None

qwen2.qwen2

Build the decoder associated with the Qwen2 model.

qwen2.lora_qwen2

Return a version of Qwen2 (an instance of Qwen2TransformerDecoder()) with LoRA applied based on the passed in configuration.

qwen2.qwen2_0_5b

Builder for creating a Qwen2 model initialized w/ the default 0.5B parameter values from https://huggingface.co/Qwen/Qwen2-0.5B-Instruct

qwen2.lora_qwen2_0_5b

Builder for creating a Qwen2 0.5B model with LoRA enabled.

qwen2.qwen2_1_5b

Builder for creating a Qwen2 model initialized w/ the default 1.5B parameter values from https://huggingface.co/Qwen/Qwen2-1.5B-Instruct

qwen2.lora_qwen2_1_5b

Builder for creating a Qwen2 1.5B model with LoRA enabled.

qwen2.qwen2_7b

Builder for creating a Qwen2 model initialized w/ the default 7B parameter values from https://huggingface.co/Qwen/Qwen2-7B-Instruct

qwen2.lora_qwen2_7b

Builder for creating a Qwen2 7B model with LoRA enabled.

qwen2.qwen2_tokenizer

Tokenizer for Qwen2.
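
The Qwen2 builders work the same way; a minimal sketch that instantiates the 0.5B model and counts its parameters:

from torchtune.models.qwen2 import qwen2_0_5b

model = qwen2_0_5b()
num_params = sum(p.numel() for p in model.parameters())
print(f"Qwen2 0.5B parameters: {num_params:,}")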

phi-3

Models from the Phi-3 mini family.

To download the Phi-3 Mini 4k instruct model:

tune download microsoft/Phi-3-mini-4k-instruct --output-dir /tmp/Phi-3-mini-4k-instruct --ignore-patterns None --hf-token <HF_TOKEN>

phi3.phi3

Build the decoder associated with the Phi3 model.

phi3.lora_phi3

Return a version of Phi3 (an instance of TransformerDecoder()) with LoRA applied based on the passed in configuration.

phi3.phi3_mini

Builder for creating the Phi3 Mini 4K Instruct Model.

phi3.lora_phi3_mini

Builder for creating a Phi3 Mini (3.8b) model with LoRA enabled.

phi3.qlora_phi3_mini

Builder for creating a Phi3 mini model with QLoRA enabled.

phi3.phi3_mini_tokenizer

Phi-3 Mini tokenizer.
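
A sketch of the Phi-3 Mini builders; the LoRA arguments are illustrative:

from torchtune.models.phi3 import phi3_mini, lora_phi3_mini

model = phi3_mini()
lora_model = lora_phi3_mini(
    lora_attn_modules=["q_proj", "v_proj"],  # illustrative choice of adapted layers
    lora_rank=8,
    lora_alpha=16,
)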

mistral

All models from the Mistral AI family.

Important: You need to request access on Hugging Face to download this model.

To download the Mistral 7B v0.1 model:

tune download mistralai/Mistral-7B-v0.1 --output-dir /tmp/Mistral-7B-v0.1 --hf-token <HF_TOKEN>

mistral.mistral

Build the decoder associated with the mistral model.

mistral.lora_mistral

Return a version of Mistral (an instance of TransformerDecoder()) with LoRA applied based on the passed in configuration.

mistral.mistral_classifier

Build a base mistral model with an added classification layer.

mistral.lora_mistral_classifier

Return a version of Mistral classifier (an instance of TransformerDecoder()) with LoRA applied to some of the linear layers in its self-attention modules.

mistral.mistral_7b

Builder for creating a Mistral 7B model initialized w/ the default 7b parameter values from https://mistral.ai/news/announcing-mistral-7b/

mistral.lora_mistral_7b

Builder for creating a Mistral 7B model with LoRA enabled.

mistral.qlora_mistral_7b

Builder for creating a Mistral model with QLoRA enabled.

mistral.mistral_reward_7b

Builder for creating a Mistral 7B model initialized w/ the default 7b parameter values from: https://huggingface.co/Ray2333/reward-model-Mistral-7B-instruct-Unified-Feedback where the output layer is a classification layer projecting to a single class for reward modelling.

mistral.lora_mistral_reward_7b

Builder for creating a Mistral reward 7B model with LoRA enabled.

mistral.qlora_mistral_reward_7b

Builder for creating a Mistral reward 7B model with QLoRA enabled.

mistral.mistral_tokenizer

Tokenizer for Mistral models.

mistral.MistralChatTemplate

Formats according to Mistral's instruct model.
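
A sketch combining the 7B builder with the tokenizer from the download above; the tokenizer path is a placeholder:

from torchtune.models.mistral import mistral_7b, mistral_tokenizer

model = mistral_7b()
tokenizer = mistral_tokenizer("/tmp/Mistral-7B-v0.1/tokenizer.model")  # placeholder path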

gemma

Models of size 2B and 7B from the Gemma family.

Important: You need to request access on Hugging Face to use these models.

To download the Gemma 2B model (not Gemma2):

tune download google/gemma-2b --ignore-patterns "gemma-2b.gguf"  --hf-token <HF_TOKEN>

To download the Gemma 7B model:

tune download google/gemma-7b --ignore-patterns "gemma-7b.gguf"  --hf-token <HF_TOKEN>

gemma.gemma

Build the decoder associated with the gemma model.

gemma.lora_gemma

Return a version of Gemma with LoRA applied based on the passed in configuration.

gemma.gemma_2b

Builder for creating a Gemma 2B model initialized w/ the default 2b parameter values from: https://blog.google/technology/developers/gemma-open-models/

gemma.lora_gemma_2b

Builder for creating a Gemma 2B model with LoRA enabled.

gemma.qlora_gemma_2b

Builder for creating a Gemma model with QLoRA enabled.

gemma.gemma_7b

Builder for creating a Gemma 7B model initialized w/ the default 7b parameter values from: https://blog.google/technology/developers/gemma-open-models/

gemma.lora_gemma_7b

Builder for creating a Gemma 7B model with LoRA enabled.

gemma.qlora_gemma_7b

Builder for creating a Gemma model with QLoRA enabled.

gemma.gemma_tokenizer

Tokenizer for Gemma.
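
A sketch of the 2B builders; the QLoRA settings shown are illustrative:

from torchtune.models.gemma import gemma_2b, qlora_gemma_2b

model = gemma_2b()
qlora_model = qlora_gemma_2b(
    lora_attn_modules=["q_proj", "v_proj"],  # illustrative choice of adapted layers
    lora_rank=8,
    lora_alpha=16,
)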

clip

Vision components to support multimodality using the CLIP encoder.

clip.clip_vision_encoder

Builds the vision encoder associated with the CLIP model.

clip.TokenPositionalEmbedding

Token positional embedding for images, different for every token in an image.

clip.TiledTokenPositionalEmbedding

Token positional embedding for tiled images, different for every tile, different for every token.

clip.TilePositionalEmbedding

Positional embedding for tiles, different for every tile, same for every token within a tile.
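
A sketch of building a small CLIP-style vision encoder with the builder above; every value below is an assumption for illustration and does not correspond to a released checkpoint:

from torchtune.models.clip import clip_vision_encoder

encoder = clip_vision_encoder(
    tile_size=224,   # input tile resolution (assumed value)
    patch_size=16,   # ViT patch size (assumed value)
    embed_dim=512,   # token embedding dimension (assumed value)
    num_layers=6,    # transformer depth (assumed value)
    num_heads=8,     # attention heads (assumed value)
)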
