phi4_tokenizer

torchtune.models.phi4.phi4_tokenizer(vocab_path: str = None, merges_path: str = None, special_tokens_path: Optional[str] = None, max_seq_len: Optional[int] = None, prompt_template: Optional[Union[str, Dict[Literal['system', 'user', 'assistant', 'ipython', 'tool'], Tuple[str, str]]]] = None, truncation_type: str = 'right') → Phi4Tokenizer

Phi4 tokenizer.

Parameters:
  • vocab_path (str) – Path to vocab.json.

  • merges_path (str) – Path to merges.txt.

  • special_tokens_path (Optional[str]) – Path to tokenizer.json from Hugging Face model files that contains all registered special tokens, or a local json file structured similarly. Default is None to use the canonical Phi4 special tokens.

  • max_seq_len (Optional[int]) – Maximum sequence length for tokenizing a single list of messages; longer inputs are truncated to this length. Default is None.

  • prompt_template (Optional[_TemplateType]) – Optional prompt template. If a string, it is assumed to be the dotpath of a PromptTemplateInterface class. If a dictionary, it is assumed to be a custom prompt template mapping each role to its (prepend, append) tags. Default is None.

  • truncation_type (str) – Type of truncation to apply, either “left” or “right”. Default is “right”.
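
To make the interaction between max_seq_len and truncation_type concrete, here is a minimal standalone sketch. It is not torchtune's implementation: the helper truncate_sketch is hypothetical, and it assumes that "right" drops tokens from the end of the sequence while "left" drops them from the beginning. Any special-token handling the real tokenizer may apply after truncation is left out.

```python
from typing import List, Optional


def truncate_sketch(
    tokens: List[int],
    max_seq_len: Optional[int],
    truncation_type: str = "right",
) -> List[int]:
    """Hypothetical helper illustrating left vs. right truncation."""
    if truncation_type not in ("left", "right"):
        raise ValueError(f"unsupported truncation_type: {truncation_type!r}")
    # No limit set, or the sequence already fits: return a copy unchanged.
    if max_seq_len is None or len(tokens) <= max_seq_len:
        return list(tokens)
    if truncation_type == "left":
        # Assumption: "left" drops the oldest tokens and keeps the tail.
        return tokens[len(tokens) - max_seq_len:]
    # Assumption: "right" keeps the head and drops the overflow at the end.
    return tokens[:max_seq_len]


token_ids = [1, 2, 3, 4, 5]
kept_right = truncate_sketch(token_ids, max_seq_len=3, truncation_type="right")
kept_left = truncate_sketch(token_ids, max_seq_len=3, truncation_type="left")
```

Under these assumptions, right truncation preserves the start of a conversation (for example, a system prompt), while left truncation preserves the most recent turns.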

Returns:

Instantiation of the Phi-4 (14B) tokenizer.

Return type:

Phi4Tokenizer
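
A possible way to call the builder is sketched below, using only the parameters from the signature above. The wrapper build_phi4_tokenizer, the tags in CUSTOM_TEMPLATE, and the value 2048 for max_seq_len are illustrative assumptions, not canonical Phi-4 settings. The sketch also assumes that a dictionary template may list only a subset of the roles. The caller supplies the paths to vocab.json and merges.txt.

```python
from typing import Dict, Tuple

# Hypothetical (prepend, append) tags per role, for illustration only.
# These are NOT the canonical Phi-4 chat format.
CUSTOM_TEMPLATE: Dict[str, Tuple[str, str]] = {
    "system": ("System: ", "\n"),
    "user": ("User: ", "\n"),
    "assistant": ("Assistant: ", "\n"),
}


def build_phi4_tokenizer(vocab_path: str, merges_path: str):
    """Hypothetical wrapper around the phi4_tokenizer builder."""
    # Imported lazily so this module loads even where torchtune is absent.
    from torchtune.models.phi4 import phi4_tokenizer

    return phi4_tokenizer(
        vocab_path=vocab_path,            # path to vocab.json
        merges_path=merges_path,          # path to merges.txt
        special_tokens_path=None,         # None: canonical Phi4 special tokens
        max_seq_len=2048,                 # illustrative limit (assumption)
        prompt_template=CUSTOM_TEMPLATE,  # dict form: role -> (prepend, append)
        truncation_type="right",          # the documented default
    )
```

Leaving prompt_template and max_seq_len at their default of None skips the custom template and the length limit.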
