torchtune.data¶
Text templates¶
Templates for instruct prompts and chat prompts. Includes some specific formatting for difference datasets and models.
A prompt template for grammar error correction tasks. |
|
A prompt template for summarization tasks. |
|
A prompt template for question answering tasks. |
|
Quickly define a custom prompt template by passing in a dictionary mapping role to the prepend and append tags. For example, to achieve the following prompt template::. |
|
Interface for prompt templates. |
|
OpenAI's Chat Markup Language used by their chat models. |
Types¶
This class represents individual messages in a fine-tuning dataset. |
|
alias of |
Message transforms¶
Converts data from common schema and conversation JSON formats into a list of torchtune Message
.
Message transform class that converts a single sample with "input" and "output" fields, (or equivalent fields specified in column_map) to user and assistant messages, respectively. This is useful for datasets that have two columns, one containing the user prompt string and the other containing the model response string::. |
|
Convert a single chat sample adhering to the ShareGPT JSON structure to torchtune's |
|
Convert a single chat sample adhering to the OpenAI chat completion JSON structure to torchtune's |
|
Transform for converting a single sample from datasets with "chosen" and "rejected" columns containing conversations to a list of chosen and rejected messages. For example::. |
|
Message transform class for Alpaca-style datasets with "instruction", "input", and "output" (or equivalent fields specified in column_map) columns. |
Collaters¶
Collaters used to collect samples into batches and handle any padding.
A generic padding collation function which pads |
|
Pad a batch of text sequences, tiled image tensors, aspect ratios, and cross attention masks. |
|
Pad a batch of sequences to the longest sequence length in the batch, and convert integer lists to tensors. |
|
Pad a batch of sequences for Direct Preference Optimization (DPO). |
|
This function is identical to |
Helper functions¶
Miscellaneous helper functions used in modifying data.
Given a list of messages, ensure that messages form a valid back-and-forth conversation. |
|
Truncate a list of tokens to a maximum length. |
|
Convenience method to load an image in PIL format from a local file path or remote source. |
|
Given a raw text string, split by the specified |