phi3_mini_tokenizer¶
- torchtune.models.phi3.phi3_mini_tokenizer(path: str, special_tokens_path: Optional[str] = None) Phi3MiniTokenizer [source]¶
Phi-3 Mini tokenizer. Ref: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/blob/main/tokenizer_config.json
- Parameters:
Note
This tokenizer includes typical LM EOS and BOS tokens like <s>, </s>, and <unk>. However, to support chat completion, it is also augmented with special tokens like <endoftext> and <assistant>.
Warning
Microsoft currently opts to ignore system messages citing better performance. See https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/discussions/51 for more details.
- Returns:
Instantiation of the SPM tokenizer.
- Return type:
Phi3MiniSentencePieceBaseTokenizer