HuggingFaceBaseTokenizer
- class torchtune.modules.transforms.tokenizers.HuggingFaceBaseTokenizer(tokenizer_json_path: str, *, tokenizer_config_json_path: Optional[str] = None, generation_config_path: Optional[str] = None)
A wrapper around Hugging Face tokenizers; see https://github.com/huggingface/tokenizers for details. It can be used to load a Hugging Face tokenizer.json file into a torchtune BaseTokenizer.
This class loads the tokenizer.json file from tokenizer_json_path. It attempts to infer the BOS and EOS token IDs from the tokenizer config file (tokenizer_config_json_path) if possible; if it cannot, it falls back to inferring them from generation_config.json (generation_config_path).
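- Parameters:
tokenizer_json_path (str) – Path to the tokenizer.json file.
tokenizer_config_json_path (Optional[str]) – Path to the tokenizer config JSON file. Default: None
generation_config_path (Optional[str]) – Path to the generation_config.json file. Default: None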
- Raises:
ValueError – If neither tokenizer_config_json_path nor generation_config_path is specified.
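The BOS/EOS fallback described above can be illustrated with a small standard-library sketch. This is not torchtune's implementation: the helper names (`infer_bos_eos_ids`, `_load_json`, `_token_text`) and the plain `vocab` dict standing in for the tokenizer's token-to-ID lookup are hypothetical, and the JSON keys (`bos_token`, `eos_token`, `bos_token_id`, `eos_token_id`) are assumed to follow the usual Hugging Face config layout.

```python
import json
from typing import Dict, Optional, Tuple


def _load_json(path: Optional[str]) -> dict:
    """Return the parsed JSON file at `path`, or an empty dict if no path was given."""
    if path is None:
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def _token_text(config: dict, key: str) -> Optional[str]:
    """Read a special token stored either as a string or as {"content": "..."}."""
    value = config.get(key)
    if isinstance(value, dict):
        return value.get("content")
    return value


def infer_bos_eos_ids(
    vocab: Dict[str, int],
    tokenizer_config_json_path: Optional[str] = None,
    generation_config_path: Optional[str] = None,
) -> Tuple[Optional[int], Optional[int]]:
    """Hypothetical helper: prefer the tokenizer config, then fall back to the generation config."""
    if tokenizer_config_json_path is None and generation_config_path is None:
        raise ValueError(
            "At least one of tokenizer_config_json_path or "
            "generation_config_path must be specified."
        )

    tokenizer_config = _load_json(tokenizer_config_json_path)
    generation_config = _load_json(generation_config_path)

    ids = {}
    for name in ("bos", "eos"):
        # First choice: token text from the tokenizer config, mapped to an ID via the vocab.
        token = _token_text(tokenizer_config, f"{name}_token")
        token_id = vocab.get(token) if token is not None else None
        # Fallback: the explicit ID stored in the generation config, if any.
        if token_id is None:
            token_id = generation_config.get(f"{name}_token_id")
        ids[name] = token_id
    return ids["bos"], ids["eos"]
```

Calling the helper with neither path raises ValueError, mirroring the documented behaviour of the class. An ID that cannot be found in either file is returned as None in this sketch.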