llama3_2_vision_transform
- torchtune.models.llama3_2_vision.llama3_2_vision_transform(path: str, max_seq_len: int = 8192, image_size: int = 560, special_tokens_path: Optional[str] = None, prompt_template: Optional[Union[str, Dict[Literal['system', 'user', 'assistant', 'ipython'], Tuple[str, str]]]] = None) → Llama3VisionTransform
Data Transforms (including Tokenizer) for Llama3 Vision.
- Parameters:
path (str) – Path to the tokenizer.
max_seq_len (int) – Maximum sequence length for tokenizing a single list of messages, after which the input will be truncated.
image_size (int) – Base image size that images will be tiled and resized to. Default is 560 for Instruct weights; use 448 for pre-trained weights.
special_tokens_path (Optional[str]) – Path to tokenizer.json from Hugging Face model files that contains all registered special tokens, or a local json file structured similarly. Default is None to use the canonical Llama3 special tokens.
prompt_template (Optional[_TemplateType]) – Optional specified prompt template. If a string, it is assumed to be the dotpath of a PromptTemplateInterface class. If a dictionary, it is assumed to be a custom prompt template mapping role to the prepend/append tags.
- Returns:
Instantiation of the Llama 3.2 vision transform
- Return type:
Llama3VisionTransform
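
A minimal usage sketch, assuming a locally downloaded checkpoint (the tokenizer path below is hypothetical; point it at your own model files):

```python
from torchtune.models.llama3_2_vision import llama3_2_vision_transform

# Build the transform. The path is a hypothetical local location for
# illustration only; substitute the path to your downloaded tokenizer file.
transform = llama3_2_vision_transform(
    path="/tmp/Llama-3.2-11B-Vision-Instruct/original/tokenizer.model",
    max_seq_len=8192,
    image_size=560,  # Instruct weights; use 448 for pre-trained weights
    # Illustrative custom template: maps each role to (prepend, append) tags.
    prompt_template={"user": ("User: ", "\n")},
)
```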