llama3_2_vision_11b
- torchtune.models.llama3_2_vision.llama3_2_vision_11b(decoder_trainable: bool = False, encoder_trainable: bool = True, fusion_trainable: bool = True, image_size: int = 560) → DeepFusionModel
Llama 3.2 Vision 11B model
- Parameters:
decoder_trainable (bool) – Whether to make decoder params trainable. Default is False.
encoder_trainable (bool) – Whether to make encoder params trainable. Default is True.
fusion_trainable (bool) – Whether to make fusion params trainable. Default is True.
image_size (int) – Base image size that images will be tiled and resized to. Default is 560, which matches the Instruct weights; use 448 for the pre-trained weights.
- Returns:
Instantiation of the Llama 3.2 Vision 11B model
- Return type:
DeepFusionModel
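The parameters above can be put together as in the following minimal sketch. The helpers `builder_kwargs` and `build_model` are hypothetical names introduced here for illustration and are not part of torchtune; only `llama3_2_vision_11b` and its four keyword arguments come from the signature documented above. The sketch assumes torchtune is installed and that there is enough memory to hold an 11B-parameter model.

```python
from typing import Any, Dict


def builder_kwargs(instruct_weights: bool = True) -> Dict[str, Any]:
    """Keyword arguments for llama3_2_vision_11b (hypothetical helper)."""
    return {
        # Trainability flags, set to the documented defaults.
        "decoder_trainable": False,
        "encoder_trainable": True,
        "fusion_trainable": True,
        # 560 for the Instruct weights, 448 for the pre-trained weights.
        "image_size": 560 if instruct_weights else 448,
    }


def build_model(instruct_weights: bool = True):
    """Instantiate the model (hypothetical helper).

    torchtune is imported lazily so that builder_kwargs can be used
    without torchtune installed.
    """
    from torchtune.models.llama3_2_vision import llama3_2_vision_11b

    return llama3_2_vision_11b(**builder_kwargs(instruct_weights))
```

Calling `build_model(instruct_weights=False)` would request the 448-pixel configuration intended for the pre-trained weights. The builder only constructs the model; loading checkpoint weights into it is a separate step.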