torchtune.data¶
Text templates¶
Templates for instruct prompts and chat prompts. Includes some specific formatting for difference datasets and models.
Interface for instruction templates. |
|
Prompt template for Alpaca-style datasets. |
|
Prompt template for grammar correction datasets. |
|
Prompt template to format datasets for summarization tasks. |
|
Prompt template for preference datasets similar to StackExchangedPaired. |
|
Interface for chat formats. |
|
OpenAI's Chat Markup Language used by their chat models. |
|
Chat format that formats human and system prompts with appropriate tags used in Llama2 pre-training. |
|
Formats according to Mistral's instruct model. |
Types¶
This dataclass represents individual messages in an instruction or chat dataset. |
Converters¶
Converts data from common JSON formats into a torchtune Message
.
Convert a chat sample adhering to the ShareGPT json structure to torchtune's |
|
Convert a chat sample adhering to the OpenAI API json structure to torchtune's |
Helper funcs¶
Miscellaneous helper functions used in modifying data.
Given a list of messages, ensure that messages form a valid back-and-forth conversation. |
|
Truncate a list of tokens to a maximum length. |