qwen2_tokenizer¶
- torchtune.models.qwen2.qwen2_tokenizer(path: str, merges_file: str = None, special_tokens_path: Optional[str] = None, max_seq_len: Optional[int] = None, prompt_template: Optional[Union[str, Dict[Literal['system', 'user', 'assistant', 'ipython'], Tuple[str, str]]]] = 'torchtune.data.ChatMLTemplate', **kwargs) Qwen2Tokenizer [source]¶
Tokenizer for Qwen2.
- Parameters:
path (str) – path to the vocab.json file.
merges_file (str) – path to the merges.txt file.
special_tokens_path (Optional[str]) – Path to
tokenizer.json
from Hugging Face model files that contains all registered special tokens, or a local json file structured similarly. Default is None to use the canonical Qwen2 special tokens.max_seq_len (Optional[int]) – A max sequence length to truncate tokens to. Default: None
prompt_template (Optional[_TemplateType]) – optional specified prompt template. If a string, it is assumed to be the dotpath of a
PromptTemplateInterface
class. If a dictionary, it is assumed to be a custom prompt template mapping role to the prepend/append tags. Default isLlama2ChatTemplate
.
- Returns:
Instantiation of the Qwen2 tokenizer
- Return type:
Qwen2Tokenizer