
lora_llama3_2_vision_decoder

torchtune.models.llama3_2_vision.lora_llama3_2_vision_decoder(decoder_lora: bool, fusion_lora: bool, lora_attn_modules: List[Literal['q_proj', 'k_proj', 'v_proj', 'output_proj']], apply_lora_to_mlp: bool = False, apply_lora_to_output: bool = False, *, vocab_size: int, num_layers: int, fusion_interval: int, num_special_tokens: int, num_heads: int, num_kv_heads: int, embed_dim: int, max_seq_len: int, encoder_max_seq_len: int, rope_base: int = 500000, intermediate_dim: Optional[int] = None, lora_rank: int = 8, lora_alpha: float = 16, lora_dropout: float = 0.0, use_dora: bool = False, quantize_base: bool = False) → TransformerDecoder[source]

Build the decoder associated with the Llama3 model with additional fused cross attention layers. This includes:

  • Token embeddings

  • num_layers number of CausalSelfAttention blocks

  • Fused cross attention layers every fusion_interval number of layers

  • RMS Norm layer applied to the output of the transformer

  • Final projection into token space
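The interleaving above can be sketched in a few lines. The values of num_layers and fusion_interval below are illustrative (not defaults of the builder), and the exact placement of fusion layers is an implementation detail of torchtune; this only shows how fusion_interval controls how many fused cross attention layers are added:

```python
# Illustrative layer schedule for a decoder with fused cross attention.
# num_layers and fusion_interval are example values, not builder defaults.
num_layers = 32
fusion_interval = 4

schedule = []
for i in range(1, num_layers + 1):
    if i % fusion_interval == 0:
        schedule.append("cross_attention")  # fused cross attention layer
    schedule.append("self_attention")       # standard CausalSelfAttention block

# One fusion layer is added every fusion_interval self-attention layers,
# so 32 layers with fusion_interval=4 yields 8 fusion layers.
num_fusion = schedule.count("cross_attention")
```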

Parameters:
  • decoder_lora (bool) – whether to apply LoRA to the language decoder

  • fusion_lora (bool) – whether to apply LoRA to the projection head

  • lora_attn_modules (List[LORA_ATTN_MODULES]) – list of which linear layers LoRA should be applied to in each self-attention block. Options are {"q_proj", "k_proj", "v_proj", "output_proj"}.

  • apply_lora_to_mlp (bool) – whether to apply LoRA to the MLP in each transformer layer. Default: False

  • apply_lora_to_output (bool) – whether to apply LoRA to the model’s final output projection. Default: False

  • vocab_size (int) – number of tokens in vocabulary.

  • num_layers (int) – number of layers in the transformer decoder.

  • fusion_interval (int) – interval number of layers between fusion layers.

  • num_special_tokens (int) – number of special tokens added for the fusion model.

  • num_heads (int) – number of query heads. For MHA this is also the number of heads for key and value.

  • num_kv_heads (int) – number of key and value heads. User should ensure num_heads % num_kv_heads == 0. For standard MHA set num_kv_heads == num_heads, for GQA num_kv_heads < num_heads, and for MQA set num_kv_heads == 1.

  • embed_dim (int) – embedding dimension for self-attention.

  • max_seq_len (int) – maximum sequence length the model will be run with, as used by KVCache().

  • encoder_max_seq_len (int) – maximum sequence length the encoder will be run with, as used by KVCache().

  • rope_base (int) – base for the rotary positional embeddings. Default: 500000

  • intermediate_dim (Optional[int]) – intermediate dimension for MLP. If not specified, this is computed using scale_hidden_dim_for_mlp().

  • lora_rank (int) – rank of each low-rank approximation

  • lora_alpha (float) – scaling factor for the low-rank approximation

  • lora_dropout (float) – LoRA dropout probability. Default: 0.0

  • use_dora (bool) – Whether to use DoRA layers instead of LoRA layers. Default is False.

  • quantize_base (bool) – Whether to quantize base model weights or not. Only applied to base weights within linear layers LoRA is applied to. The final output linear projection is currently not supported for quantization.
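The lora_rank and lora_alpha hyperparameters above combine in the standard LoRA update, where the adapted layer computes Wx + (lora_alpha / lora_rank) · BAx. A minimal NumPy sketch with illustrative shapes (not tied to torchtune internals):

```python
import numpy as np

rng = np.random.default_rng(0)

embed_dim = 64      # illustrative, not the model's real embed_dim
lora_rank = 8       # default lora_rank
lora_alpha = 16.0   # default lora_alpha

x = rng.normal(size=(embed_dim,))
W = rng.normal(size=(embed_dim, embed_dim))   # frozen base weight
A = rng.normal(size=(lora_rank, embed_dim))   # trainable down-projection
B = np.zeros((embed_dim, lora_rank))          # trainable up-projection, zero-init

# LoRA forward pass: base output plus scaled low-rank correction.
out = W @ x + (lora_alpha / lora_rank) * (B @ (A @ x))
```

Because B is initialized to zero, the adapted layer matches the frozen base layer exactly at the start of training; only as B is updated does the low-rank correction take effect.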

Returns:

Instantiation of Llama 3.2 vision decoder.

Return type:

TransformerDecoder
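
As a hedged usage sketch, the keyword arguments below approximate a Llama-3.2-11B-Vision-style configuration; the specific values are illustrative and should be checked against your checkpoint's config before use:

```python
# Illustrative keyword arguments for the builder. The hyperparameters
# approximate the Llama 3.2 11B Vision configuration but are not
# guaranteed to match any particular checkpoint.
decoder_kwargs = dict(
    decoder_lora=True,
    fusion_lora=True,
    lora_attn_modules=["q_proj", "v_proj"],
    apply_lora_to_mlp=False,
    vocab_size=128_256,
    num_layers=32,
    fusion_interval=4,
    num_special_tokens=8,
    num_heads=32,
    num_kv_heads=8,       # GQA: num_heads % num_kv_heads must equal 0
    embed_dim=4096,
    max_seq_len=131_072,
    encoder_max_seq_len=128_080,
    rope_base=500_000,
    lora_rank=8,
    lora_alpha=16.0,
)

# With torchtune installed, the decoder would be built as:
# from torchtune.models.llama3_2_vision import lora_llama3_2_vision_decoder
# decoder = lora_llama3_2_vision_decoder(**decoder_kwargs)
```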
