torchtune.data
Text templates
Templates for instruct prompts and chat prompts. Includes some specific formatting for difference datasets and models.
Interface for instruction templates. |
|
Prompt template for Alpaca-style datasets. |
|
Prompt template for grammar correction datasets. |
|
Prompt template to format datasets for summarization tasks. |
|
Prompt template for preference datasets similar to StackExchangedPaired. |
|
Interface for chat formats. |
|
OpenAI's Chat Markup Language used by their chat models. |
|
Chat format that formats human and system prompts with appropriate tags used in Llama2 pre-training. |
|
Formats according to Mistral's instruct model. |
Types
This dataclass represents individual messages in an instruction or chat dataset. |
Converters
Converts data from common JSON formats into a torchtune Message
.
Helper funcs
Miscellaneous helper functions used in modifying data.
Given a list of messages, ensure that messages form a valid back-and-forth conversation. |
|
Truncate a list of tokens to a maximum length. |