Shortcuts

ModelTokenizer

class torchtune.modules.tokenizers.ModelTokenizer(*args, **kwargs)[source]

Abstract tokenizer that implements model-specific special token logic in the tokenize_messages method. See Llama3Tokenizer for an example implementation of this protocol.

tokenize_messages(messages: List[Message], **kwargs: Dict[str, Any]) Tuple[List[int], List[bool]][source]

Given a list of messages, return a list of tokens and list of masks for the concatenated and formatted messages.

Parameters:
  • messages (List[Message]) – The list of messages to tokenize.

  • **kwargs (Dict[str, Any]) – kwargs.

Returns:

The list of token ids and the list of masks.

Return type:

Tuple[List[int], List[bool]]

Docs

Access comprehensive developer documentation for PyTorch

View Docs

Tutorials

Get in-depth tutorials for beginners and advanced developers

View Tutorials

Resources

Find development resources and get your questions answered

View Resources