torchtune.data

Text templates

Templates for instruct prompts and chat prompts. Includes some specific formatting for difference datasets and models.

`InstructTemplate`	Interface for instruction templates.
`AlpacaInstructTemplate`	Prompt template for Alpaca-style datasets.
`GrammarErrorCorrectionTemplate`	Prompt template for grammar correction datasets.
`SummarizeTemplate`	Prompt template to format datasets for summarization tasks.
`StackExchangedPairedTemplate`	Prompt template for preference datasets similar to StackExchangedPaired.
`ChatFormat`	Interface for chat formats.
`ChatMLFormat`	OpenAI's Chat Markup Language used by their chat models.
`Llama2ChatFormat`	Chat format that formats human and system prompts with appropriate tags used in Llama2 pre-training.
`MistralChatFormat`	Formats according to Mistral's instruct model.

This dataclass represents individual messages in an instruction or chat dataset.

Converts data from common JSON formats into a torchtune Message.

`get_sharegpt_messages`	Convert a chat sample adhering to the ShareGPT json structure to torchtune's `Message` structure.
`get_openai_messages`	Convert a chat sample adhering to the OpenAI API json structure to torchtune's `Message` structure.

Miscellaneous helper functions used in modifying data.

`validate_messages`	Given a list of messages, ensure that messages form a valid back-and-forth conversation.
`truncate`	Truncate a list of tokens to a maximum length.