Llama3Tokenizer¶
- class torchtune.models.llama3.Llama3Tokenizer(path: str, special_tokens: Optional[Dict[str, int]] = None)[source]¶
tiktoken tokenizer configured with Llama3 Instruct’s special tokens, as described in https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3
- Parameters:
  - path (str) – Path to pretrained tiktoken tokenizer file.
  - special_tokens (Optional[Dict[str, int]]) – mapping containing special text tokens and their registered token IDs. If left as None, this will be set to the canonical Llama3 special tokens.
Examples
>>> tokenizer = Llama3Tokenizer("/path/to/tt_model")
>>> tokenized_text = tokenizer.encode("Hello world!", add_bos=True, add_eos=True)
>>> print(tokenized_text)
[1, 31587, 29644, 102, 2]
- decode(token_ids: List[int], truncate_at_eos: bool = True) → str [source]¶
Decode a list of token ids into a string.
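A minimal sketch of what truncate_at_eos does, in plain Python. This is not the library's implementation, and the EOS id used below is a placeholder taken from the encode example above, not Llama3's actual EOS token id:

```python
# Illustrative sketch of truncate_at_eos: stop decoding at the first EOS token.
# EOS_ID is a placeholder value for illustration only.
EOS_ID = 2

def truncate_at_first_eos(token_ids):
    """Hypothetical helper: drop the first EOS token and everything after it."""
    if EOS_ID in token_ids:
        return token_ids[: token_ids.index(EOS_ID)]
    return token_ids

print(truncate_at_first_eos([1, 31587, 29644, 102, 2]))  # → [1, 31587, 29644, 102]
```

With truncate_at_eos=False, decode would instead render the full token list, EOS included.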
- tokenize_message(message: Message, tokenize_header: bool = False) → List[int] [source]¶
Tokenize a message into a list of token ids.
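The header referred to by tokenize_header is the role delimiter from the Llama3 Instruct prompt format (see the Meta prompt-format page linked above). A sketch of the text layout a single message is rendered into before tokenization — the format_message helper is hypothetical, only the special-token strings come from the documented format:

```python
# Sketch of the Llama3 Instruct layout for one message, per Meta's
# prompt-format docs. format_message is a hypothetical helper; the actual
# tokenizer works on token ids, not strings.
def format_message(role: str, content: str) -> str:
    header = f"<|start_header_id|>{role}<|end_header_id|>\n\n"
    return header + content + "<|eot_id|>"

print(format_message("user", "Hello world!"))
```

With tokenize_header=False, only the message content (and trailing end-of-turn token) is tokenized, without the role header.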
- tokenize_messages(messages: List[Message], max_seq_len: Optional[int] = None, tokenize_header: bool = True, add_eos: bool = True) → Tuple[List[int], List[bool]] [source]¶
Tokenize a list of messages into a list of token ids and masks.
- Parameters:
  - messages (List[Message]) – The list of messages to tokenize.
  - max_seq_len (Optional[int]) – The maximum sequence length to truncate tokens to. Default: None.
  - tokenize_header (bool) – Whether to prefix each message with its tokenized header. Default: True.
  - add_eos (bool) – Whether to append the EOS token after the final message. Default: True.
- Returns:
The list of token ids and the list of masks.
- Return type:
Tuple[List[int], List[bool]]
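The returned mask is position-aligned with the token ids and marks tokens to exclude from the training loss (typically prompt tokens). A sketch of applying such a mask to build loss labels — the token ids, mask values, and ignore index below are illustrative, not output of the actual tokenizer:

```python
# Sketch: turning (tokens, mask) from a tokenize_messages-style call into
# loss labels. All values here are illustrative.
CROSS_ENTROPY_IGNORE_IDX = -100  # common ignore index for cross-entropy losses

tokens = [1, 9906, 1917, 0, 2]
mask = [True, True, True, False, False]  # True = exclude from the loss

labels = [
    CROSS_ENTROPY_IGNORE_IDX if masked else tok
    for tok, masked in zip(tokens, mask)
]
print(labels)  # → [-100, -100, -100, 0, 2]
```

Positions where the mask is True are replaced with the ignore index so the loss is computed only on the assistant's tokens.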