.. _datasets: ================== torchtune.datasets ================== .. currentmodule:: torchtune.datasets For a detailed general usage guide, please see :ref:`datasets_overview`. Text datasets ------------- torchtune supports several widely used text-only datasets to help quickly bootstrap your fine-tuning. .. autosummary:: :toctree: generated/ :nosignatures: alpaca_dataset alpaca_cleaned_dataset grammar_dataset hh_rlhf_helpful_dataset samsum_dataset slimorca_dataset stack_exchange_paired_dataset cnn_dailymail_articles_dataset wikitext_dataset Image + Text datasets --------------------- .. autosummary:: :toctree: generated/ :nosignatures: multimodal.llava_instruct_dataset multimodal.the_cauldron_dataset multimodal.vqa_dataset .. _dataset_builders: Generic dataset builders ------------------------ torchtune also supports generic dataset builders for common formats like chat models and instruct models. These are especially useful for specifying from a YAML config. .. autosummary:: :toctree: generated/ :nosignatures: instruct_dataset chat_dataset preference_dataset text_completion_dataset Generic dataset classes ----------------------- Class representations for the above dataset builders. .. autosummary:: :toctree: generated/ :nosignatures: TextCompletionDataset ConcatDataset PackedDataset PreferenceDataset SFTDataset