.. _data:

==============
torchtune.data
==============

.. currentmodule:: torchtune.data

Text templates
--------------

Templates for instruct prompts and chat prompts, including formatting specific to
particular datasets and models.

.. autosummary::
    :toctree: generated/
    :nosignatures:

    GrammarErrorCorrectionTemplate
    SummarizeTemplate
    QuestionAnswerTemplate
    PromptTemplate
    PromptTemplateInterface
    ChatMLTemplate

Types
-----

.. autosummary::
    :toctree: generated/
    :nosignatures:

    Message
    Role

.. _message_transforms_ref:

Message transforms
------------------

Convert data from common schemas and conversation JSON formats into a list of torchtune
:class:`Message` objects.

.. autosummary::
    :toctree: generated/
    :nosignatures:

    InputOutputToMessages
    ShareGPTToMessages
    OpenAIToMessages
    ChosenRejectedToMessages
    AlpacaToMessages

Collaters
---------

Collaters collect samples into batches and handle any padding.

.. autosummary::
    :toctree: generated/
    :nosignatures:

    padded_collate
    padded_collate_tiled_images_and_mask
    padded_collate_sft
    padded_collate_dpo
    left_pad_sequence

Helper functions
----------------

Miscellaneous helper functions for modifying data.

.. autosummary::
    :toctree: generated/
    :nosignatures:

    validate_messages
    truncate
    load_image
    format_content_with_images
    mask_messages
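
As an illustration of the padding step that the collaters listed under Collaters handle, the following is a minimal pure-Python sketch. It is not torchtune's implementation: the function name ``pad_batch_sketch``, the ``"tokens"`` and ``"labels"`` sample keys, and the ``padding_idx`` and ``ignore_idx`` parameters are assumptions made for this example, and it works on plain lists rather than tensors.

```python
from typing import Dict, List


def pad_batch_sketch(
    batch: List[Dict[str, List[int]]],
    padding_idx: int = 0,
    ignore_idx: int = -100,
) -> Dict[str, List[List[int]]]:
    """Right-pad every sample in ``batch`` to the longest sample's length.

    Hypothetical helper for illustration only. Each sample is assumed to be
    a dict with "tokens" and "labels" lists of equal length.
    """
    if not batch:
        raise ValueError("batch must contain at least one sample")

    # The longest sequence determines the padded length of the batch.
    max_len = max(len(sample["tokens"]) for sample in batch)

    tokens, labels = [], []
    for sample in batch:
        pad = max_len - len(sample["tokens"])
        # Tokens are padded with padding_idx; labels are padded with
        # ignore_idx so that a loss function can skip the padded positions.
        tokens.append(sample["tokens"] + [padding_idx] * pad)
        labels.append(sample["labels"] + [ignore_idx] * pad)

    return {"tokens": tokens, "labels": labels}


batch = [
    {"tokens": [1, 2, 3], "labels": [4, 5, 6]},
    {"tokens": [7], "labels": [10]},
]
padded = pad_batch_sketch(batch)
```

Padding labels with a separate ignore value, rather than the token padding value, keeps padded positions distinguishable from real targets. Consult the generated reference pages for the actual signatures and return types of the torchtune collaters.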