lora_llama3_2_vision_11b

torchtune.models.llama3_2_vision.lora_llama3_2_vision_11b(lora_attn_modules: List[Literal['q_proj', 'k_proj', 'v_proj', 'output_proj']], decoder_trainable: str = 'frozen', encoder_trainable: str = 'lora', fusion_trainable: str = 'lora', apply_lora_to_mlp: bool = False, apply_lora_to_output: bool = False, lora_rank: int = 8, lora_alpha: float = 16, lora_dropout: float = 0.0, use_dora: bool = False, quantize_base: bool = False, image_size: int = 560) → DeepFusionModel

Return a Llama 3.2 Vision 11B model (an instance of DeepFusionModel) with LoRA applied according to the passed-in configuration.

Parameters:

lora_attn_modules (List[LORA_ATTN_MODULES]) – list of which linear layers LoRA should be applied to in each self-attention block. Options are {"q_proj", "k_proj", "v_proj", "output_proj"}.
decoder_trainable (str) – option to set decoder params as fully trainable (full), LoRA trainable (lora), or frozen (frozen). Default: “frozen”
encoder_trainable (str) – option to set encoder params as fully trainable (full), LoRA trainable (lora), or frozen (frozen). Default: “lora”
fusion_trainable (str) – option to set fusion params as fully trainable (full), LoRA trainable (lora), or frozen (frozen). Default: “lora”
apply_lora_to_mlp (bool) – whether to apply LoRA to the MLP in each transformer layer. Default: False
apply_lora_to_output (bool) – whether to apply LoRA to the model’s final output projection. Default: False
lora_rank (int) – rank of each low-rank approximation. Default: 8
lora_alpha (float) – scaling factor for the low-rank approximation. Default: 16
lora_dropout (float) – LoRA dropout probability. Default: 0.0
use_dora (bool) – whether to use DoRA layers instead of LoRA layers. Default: False
quantize_base (bool) – whether to quantize base model weights. Only applied to base weights within the linear layers that LoRA is applied to. Quantizing the final output linear projection is not currently supported. Default: False
image_size (int) – base image size that images will be tiled and resized to. Default: 560, which matches the Instruct weights; use 448 for the pre-trained weights.
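
To make lora_rank and lora_alpha concrete, the following is a minimal pure-Python sketch of the standard LoRA update, in which a frozen weight W is augmented by a low-rank product A·B scaled by lora_alpha / lora_rank. The helper names (matmul, lora_forward, lora_param_count) and the 4096-wide layer in the usage note are illustrative assumptions, not part of torchtune; the library's own layers operate on tensors rather than nested lists.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(
    x: Matrix, w: Matrix, a: Matrix, b: Matrix, rank: int, alpha: float
) -> Matrix:
    """Compute x @ W + (alpha / rank) * (x @ A) @ B.

    Shapes (illustrative convention): W is d_in x d_out and stays frozen,
    A is d_in x rank and B is rank x d_out; only A and B would be trained.
    """
    base = matmul(x, w)
    update = matmul(matmul(x, a), b)
    scale = alpha / rank
    return [
        [base_v + scale * upd_v for base_v, upd_v in zip(base_row, upd_row)]
        for base_row, upd_row in zip(base, update)
    ]


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Number of adapter parameters added to one d_in x d_out linear layer."""
    return rank * (d_in + d_out)


# Tiny worked example with rank 1 adapters on a 2 x 2 identity weight.
x = [[1.0, 2.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0], [1.0]]

# With B set to zero, the adapter contributes nothing.
y_zero = lora_forward(x, w, a, [[0.0, 0.0]], rank=8, alpha=16)

# With a non-zero B, the update is scaled by alpha / rank = 2.
y_adapted = lora_forward(x, w, a, [[1.0, 0.0]], rank=8, alpha=16)
```

Under these assumptions, a single 4096 x 4096 projection at the default rank of 8 would add lora_param_count(4096, 4096, 8) = 65536 adapter parameters, compared with 16777216 in the frozen weight itself. Raising lora_alpha while holding lora_rank fixed increases the weight given to the adapter output.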

Returns:

Instantiation of Llama3.2 vision model with LoRA applied to a subset of the attention projections in each layer.

Return type:

DeepFusionModel
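
As a usage sketch, the builder can be called with keyword arguments that mirror the signature above. The names LORA_KWARGS, validate_kwargs, and build_model are hypothetical helpers introduced here for illustration, and the choice of q_proj and v_proj is just one possible configuration. The validation is a local convenience based on the options listed in this page, not a check performed by torchtune. The import is deferred so the module loads without torchtune installed; calling build_model instantiates the full model and therefore needs torchtune and sufficient memory.

```python
from typing import Any, Dict

# Allowed values, taken from the parameter descriptions on this page.
TRAINABLE_OPTIONS = {"full", "lora", "frozen"}
ATTN_MODULES = {"q_proj", "k_proj", "v_proj", "output_proj"}

# Example configuration; every value other than lora_attn_modules is the documented default.
LORA_KWARGS: Dict[str, Any] = {
    "lora_attn_modules": ["q_proj", "v_proj"],
    "decoder_trainable": "frozen",
    "encoder_trainable": "lora",
    "fusion_trainable": "lora",
    "apply_lora_to_mlp": False,
    "apply_lora_to_output": False,
    "lora_rank": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "quantize_base": False,
    "image_size": 560,  # use 448 for the pre-trained weights
}


def validate_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical local check that option strings match the documented choices."""
    for key in ("decoder_trainable", "encoder_trainable", "fusion_trainable"):
        if kwargs[key] not in TRAINABLE_OPTIONS:
            raise ValueError(f"{key} must be one of {sorted(TRAINABLE_OPTIONS)}")
    unknown = set(kwargs["lora_attn_modules"]) - ATTN_MODULES
    if unknown:
        raise ValueError(f"unknown lora_attn_modules: {sorted(unknown)}")
    return kwargs


def build_model(**overrides: Any):
    """Build the LoRA model; requires torchtune to be installed."""
    # Deferred import: only needed when the model is actually constructed.
    from torchtune.models.llama3_2_vision import lora_llama3_2_vision_11b

    return lora_llama3_2_vision_11b(**validate_kwargs({**LORA_KWARGS, **overrides}))
```

For example, build_model(lora_rank=16, lora_alpha=32) would request a higher-rank adapter while keeping the other settings unchanged.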
