
lora_mistral_classifier

torchtune.models.mistral.lora_mistral_classifier(lora_attn_modules: List[Literal['q_proj', 'k_proj', 'v_proj', 'output_proj']], apply_lora_to_mlp: bool = False, apply_lora_to_output: bool = False, *, num_classes: int, vocab_size: int, num_layers: int, num_heads: int, num_kv_heads: int, embed_dim: int, max_seq_len: int, intermediate_dim: int, attn_dropout: float = 0.0, norm_eps: float = 1e-05, rope_base: int = 10000, lora_rank: int, lora_alpha: float, lora_dropout: float = 0.0, quantize_base: bool = False) → TransformerDecoder

Return a version of the Mistral classifier (an instance of TransformerDecoder()) with LoRA applied to some of the linear layers in its self-attention modules.
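To make the lora_rank and lora_alpha parameters below concrete, here is a minimal sketch of the LoRA idea applied to one linear layer: the base weight is frozen, and a trainable low-rank update, scaled by alpha / rank, is added to its output. This is illustrative only and is not torchtune's internal LoRALinear implementation:

    import torch
    import torch.nn as nn

    class LoRALinearSketch(nn.Module):
        # Frozen base linear layer plus a trainable low-rank update.
        def __init__(self, in_dim: int, out_dim: int, rank: int, alpha: float):
            super().__init__()
            self.base = nn.Linear(in_dim, out_dim, bias=False)
            self.base.weight.requires_grad = False  # base weights stay frozen
            self.lora_a = nn.Linear(in_dim, rank, bias=False)
            self.lora_b = nn.Linear(rank, out_dim, bias=False)
            self.scaling = alpha / rank

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Base projection plus scaled low-rank correction.
            return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))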

Parameters:
  • lora_attn_modules (List[LORA_ATTN_MODULES]) – list of which linear layers LoRA should be applied to in each self-attention block. Options are {"q_proj", "k_proj", "v_proj", "output_proj"}.

  • apply_lora_to_mlp (bool) – whether to apply LoRA to the MLP in each transformer layer. Default: False

  • apply_lora_to_output (bool) – whether to apply LoRA to the model’s final output projection. Default: False

  • num_classes (int) – number of classes for the classification head.

  • vocab_size (int) – number of tokens in vocabulary.

  • num_layers (int) – number of layers in the transformer decoder.

  • num_heads (int) – number of query heads. For MHA this is also the number of heads for key and value.

  • num_kv_heads (int) – number of key and value heads. User should ensure num_heads % num_kv_heads == 0. For standard MHA, set num_kv_heads == num_heads; for GQA, set num_kv_heads < num_heads.

  • embed_dim (int) – embedding dimension for self-attention.

  • max_seq_len (int) – maximum sequence length the model will be run with.

  • intermediate_dim (int) – intermediate dimension for MLP.

  • attn_dropout (float) – dropout value passed onto scaled_dot_product_attention. Default: 0.0

  • norm_eps (float) – epsilon in RMS norms. Default: 1e-05

  • rope_base (int) – base for the rotary positional embeddings. Default: 10_000

  • lora_rank (int) – rank of each low-rank approximation. See the parameter-count sketch after this list.

  • lora_alpha (float) – scaling factor for the low-rank approximation.

  • lora_dropout (float) – LoRA dropout probability. Default: 0.0

  • quantize_base (bool) – Whether to quantize base model weights or not. Only applied to base weights within linear layers that LoRA is applied to. Quantization of the final output linear projection is not currently supported.
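As a back-of-envelope illustration of why a small lora_rank keeps fine-tuning cheap (the numbers below are hypothetical, chosen to resemble a Mistral-7B-sized q_proj):

    # One square q_proj of shape (embed_dim, embed_dim) with embed_dim = 4096
    # and LoRA rank 8. LoRA adds rank * (in_dim + out_dim) trainable weights.
    embed_dim, rank = 4096, 8
    base_params = embed_dim * embed_dim           # 16,777,216 frozen weights
    lora_params = rank * (embed_dim + embed_dim)  # 65,536 trainable weights
    print(f"LoRA adds {lora_params / base_params:.2%} of the base layer's size")
    # -> LoRA adds 0.39% of the base layer's size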

Returns:

Instantiation of the Mistral classifier model with LoRA applied to a subset of the attention projections in each layer.

Return type:

TransformerDecoder
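
Example (a minimal end-to-end sketch; the hyperparameters are deliberately tiny and hypothetical so it runs on CPU, and it assumes the PEFT helpers get_adapter_params and set_trainable_params from torchtune.modules.peft):

    import torch
    from torchtune.models.mistral import lora_mistral_classifier
    from torchtune.modules.peft import get_adapter_params, set_trainable_params

    model = lora_mistral_classifier(
        lora_attn_modules=["q_proj", "v_proj"],
        num_classes=2,
        vocab_size=32_000,
        num_layers=2,
        num_heads=8,
        num_kv_heads=2,
        embed_dim=256,
        max_seq_len=2048,
        intermediate_dim=1024,
        lora_rank=8,
        lora_alpha=16.0,
    )

    # Freeze everything except the LoRA adapter weights before fine-tuning.
    set_trainable_params(model, get_adapter_params(model))

    tokens = torch.randint(0, 32_000, (1, 16))  # (batch_size, seq_len)
    logits = model(tokens)
    print(logits.shape)  # expected: torch.Size([1, 16, 2]), per-token class logits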
