Shortcuts

phi3_mini_tokenizer

torchtune.models.phi3.phi3_mini_tokenizer(path: str, special_tokens_path: Optional[str] = None) Phi3MiniTokenizer[source]

Phi-3 Mini tokenizer. Ref: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/blob/main/tokenizer_config.json

Parameters:
  • path (str) – Path to the SPM tokenizer model.

  • special_tokens_path (Optional[str]) – Path to tokenizer.json from Hugging Face model files that contains all registered special tokens, or a local json file structured similarly. Default is None to use the canonical Phi3 special tokens.

Note

This tokenizer includes typical LM EOS and BOS tokens like <s>, </s>, and <unk>. However, to support chat completion, it is also augmented with special tokens like <endoftext> and <assistant>.

Warning

Microsoft currently opts to ignore system messages citing better performance. See https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/discussions/51 for more details.

Returns:

Instantiation of the SPM tokenizer.

Return type:

Phi3MiniSentencePieceBaseTokenizer

Docs

Access comprehensive developer documentation for PyTorch

View Docs

Tutorials

Get in-depth tutorials for beginners and advanced developers

View Tutorials

Resources

Find development resources and get your questions answered

View Resources