Llama3VisionTransform

class torchtune.models.llama3_2_vision.Llama3VisionTransform(path: str, *, tile_size: int, patch_size: int, max_num_tiles: int = 4, special_tokens: Optional[Dict[str, int]] = None, max_seq_len: Optional[int] = None, image_mean: Optional[Tuple[float, float, float]] = None, image_std: Optional[Tuple[float, float, float]] = None, prompt_template: Optional[PromptTemplate] = None)

This transform combines the transforms for the different modalities of Llama 3.2 Vision. It is made up of the following transforms:
  • torchtune.models.llama3.Llama3Tokenizer

  • torchtune.models.clip.CLIPImageTransform

  • torchtune.modules.transforms.VisionCrossAttentionMask

This transform can be used as a drop-in replacement for a tokenizer in recipes and generation, but its __call__ method also applies the additional transformations.

Parameters:
  • path (str) – Path to pretrained tiktoken tokenizer file.

  • tile_size (int) – Size of the tiles to divide the image into.

  • patch_size (int) – Size of the patches used in the CLIP vision transformer model. This is used to calculate the number of image embeddings per image.

  • max_num_tiles (int) – Only used if possible_resolutions is NOT given. Maximum number of tiles to break an image into. This will be used to generate possible_resolutions, e.g. [(224, 224), (224, 448), (448, 224)] if max_num_tiles = 2 and tile_size = 224. Default 4.

  • special_tokens (Optional[Dict[str, int]]) – Mapping containing special text tokens and their registered token IDs. If left as None, this will be set to the canonical Llama3 special tokens.

  • max_seq_len (Optional[int]) – Maximum sequence length for tokenizing a single list of messages, after which the input will be truncated. Default is None.

  • image_mean (Optional[Tuple[float, float, float]]) – Mean values of each channel, used for normalization.

  • image_std (Optional[Tuple[float, float, float]]) – Standard deviations for each channel, used for normalization.

  • prompt_template (Optional[PromptTemplate]) –

    Template used to format the messages based on their role. It adds structured text around the actual messages. The structured text is used in three scenarios:

    • Task-specific templates to gear models for a particular task that they will expect after training

    • Model-specific templates that are required whenever the model is prompted, such as the [INST] tags in Llama2 and in Mistral

    • Community standardized templates, such as ChatMLTemplate

    The extra text will still get tokenized as normal text, not as special tokens. Default is None.
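The arithmetic behind tile_size, patch_size, and max_num_tiles can be sketched in plain Python. This is a minimal illustration, not torchtune's implementation: the helper names patches_per_tile and candidate_resolutions are hypothetical, the sketch assumes tile_size is a multiple of patch_size, and the real number of embeddings per image may include extra tokens (for example a class token) that are ignored here.

```python
from typing import List, Tuple


def patches_per_tile(tile_size: int, patch_size: int) -> int:
    """Number of patch_size x patch_size patches in one square tile."""
    if tile_size % patch_size != 0:
        # Assumption made by this sketch, not a documented requirement.
        raise ValueError("tile_size is assumed to be a multiple of patch_size")
    per_side = tile_size // patch_size
    return per_side * per_side


def candidate_resolutions(max_num_tiles: int, tile_size: int) -> List[Tuple[int, int]]:
    """All (height, width) canvases built from at most max_num_tiles tiles."""
    resolutions = []
    for tiles_high in range(1, max_num_tiles + 1):
        for tiles_wide in range(1, max_num_tiles + 1):
            if tiles_high * tiles_wide <= max_num_tiles:
                resolutions.append((tiles_high * tile_size, tiles_wide * tile_size))
    return resolutions


print(patches_per_tile(224, 14))      # 256
print(candidate_resolutions(2, 224))  # [(224, 224), (224, 448), (448, 224)]
```

With max_num_tiles = 2 and tile_size = 224, the sketch reproduces the resolutions listed in the max_num_tiles description.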

Examples

>>> from torchtune.models.llama3_2_vision import Llama3VisionTransform
>>> model_transform = Llama3VisionTransform("/path/to/tokenizer.model", tile_size=224, patch_size=14)
>>> transformed_data = model_transform({"messages": user_message, "images": [img1, img2]})
>>> print(transformed_data["tokens"])
[1, 31587, 29644, 102, 2]
>>> print(transformed_data["images"][0].shape)
torch.Size([4, 3, 224, 224])

decode(token_ids: List[int], truncate_at_eos: bool = True, skip_special_tokens: bool = True) → str

Decode a list of token ids into a string.

Parameters:
  • token_ids (List[int]) – The list of token ids.

  • truncate_at_eos (bool) – Whether to truncate the string at the end of sequence token. Default is True.

  • skip_special_tokens (bool) – Whether to show or skip special tokens in the decoded string. Default is True.

Returns:

The decoded string.

Return type:

str
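How the two flags interact can be illustrated on a toy vocabulary. The sketch below is not the library's decoder: TOY_VOCAB, TOY_SPECIAL_IDS, TOY_EOS_ID, and toy_decode are hypothetical names, and the token ids are made up for the example.

```python
from typing import List

# Hypothetical toy vocabulary; real ids come from the tokenizer file.
TOY_VOCAB = {0: "<bos>", 1: "<eos>", 2: "Hello", 3: " world", 4: "!"}
TOY_SPECIAL_IDS = {0, 1}
TOY_EOS_ID = 1


def toy_decode(
    token_ids: List[int],
    truncate_at_eos: bool = True,
    skip_special_tokens: bool = True,
) -> str:
    """Mimic the documented flags on a toy vocabulary."""
    if truncate_at_eos and TOY_EOS_ID in token_ids:
        # Drop the first end-of-sequence token and everything after it.
        token_ids = token_ids[: token_ids.index(TOY_EOS_ID)]
    if skip_special_tokens:
        token_ids = [t for t in token_ids if t not in TOY_SPECIAL_IDS]
    return "".join(TOY_VOCAB[t] for t in token_ids)


ids = [0, 2, 3, 1, 4]
print(toy_decode(ids))                             # Hello world
print(toy_decode(ids, truncate_at_eos=False))      # Hello world!
print(toy_decode(ids, skip_special_tokens=False))  # <bos>Hello world
```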

tokenize_message(message: Message, tokenize_header: bool = True, tokenize_end: bool = True) → List[int]

Tokenize a message into a list of token ids.

Parameters:
  • message (Message) – The message to tokenize.

  • tokenize_header (bool) – Whether to prepend a tokenized header to the message.

  • tokenize_end (bool) – Whether to append eot or eom id at the end of the message.

Returns:

The list of token ids.

Return type:

List[int]
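The effect of tokenize_header and tokenize_end can be pictured with a toy tokenizer. This is only a sketch under stated assumptions: the special-token ids, TOY_TEXT_VOCAB, and toy_tokenize_message are hypothetical, a message is reduced to a role and whitespace-separated text, and details of the real Llama3 format (such as the choice between eot and eom) are left out.

```python
from typing import List

# Hypothetical special-token ids, chosen only for this illustration.
START_HEADER, END_HEADER, EOT = 100, 101, 102
# Hypothetical word-level vocabulary standing in for the real tokenizer.
TOY_TEXT_VOCAB = {"user": 1, "hi": 2, "there": 3}


def toy_tokenize_message(
    role: str,
    text: str,
    tokenize_header: bool = True,
    tokenize_end: bool = True,
) -> List[int]:
    """Wrap the tokenized text in an optional header and end token."""
    tokens: List[int] = []
    if tokenize_header:
        # Header: start marker, role, end marker.
        tokens += [START_HEADER, TOY_TEXT_VOCAB[role], END_HEADER]
    tokens += [TOY_TEXT_VOCAB[word] for word in text.split()]
    if tokenize_end:
        tokens.append(EOT)
    return tokens


print(toy_tokenize_message("user", "hi there"))
# [100, 1, 101, 2, 3, 102]
print(toy_tokenize_message("user", "hi there", tokenize_header=False, tokenize_end=False))
# [2, 3]
```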

tokenize_messages(messages: List[Message], add_eos: bool = True) → Tuple[List[int], List[bool]]

Tokenize a list of messages into a list of token ids and masks.

Parameters:
  • messages (List[Message]) – The list of messages to tokenize.

  • add_eos (bool) – Whether to add the tokenizer’s eos_id. Default is True.

Returns:

The list of token ids and the list of masks.

Return type:

Tuple[List[int], List[bool]]
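The shape of the returned pair can be sketched with pre-tokenized toy messages. Everything here is an assumption made for illustration: toy_tokenize_messages and TOY_EOS_ID are hypothetical, each message is represented as a list of ids plus a masked flag, a True mask entry is taken to mean "excluded from the training loss", the eos token is marked as masked, and truncation to max_seq_len is a plain slice. The library's actual behavior may differ on each of these points.

```python
from typing import List, Optional, Tuple

TOY_EOS_ID = 2  # hypothetical end-of-sequence id


def toy_tokenize_messages(
    messages: List[Tuple[List[int], bool]],
    add_eos: bool = True,
    max_seq_len: Optional[int] = None,
) -> Tuple[List[int], List[bool]]:
    """Concatenate per-message ids and build one mask entry per token."""
    tokens: List[int] = []
    mask: List[bool] = []
    for ids, masked in messages:
        tokens.extend(ids)
        mask.extend([masked] * len(ids))
    if add_eos:
        tokens.append(TOY_EOS_ID)
        mask.append(True)  # assumption: the eos token is masked
    if max_seq_len is not None:
        # Assumption: simple slice; tokens and mask stay the same length.
        tokens, mask = tokens[:max_seq_len], mask[:max_seq_len]
    return tokens, mask


conversation = [([10, 11], True), ([12, 13, 14], False)]
print(toy_tokenize_messages(conversation))
# ([10, 11, 12, 13, 14, 2], [True, True, False, False, False, True])
print(toy_tokenize_messages(conversation, max_seq_len=4))
# ([10, 11, 12, 13], [True, True, False, False])
```

The two lists always have equal length, so a mask entry can be looked up by token position.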
