.. _datasets:

==================
torchtune.datasets
==================

.. currentmodule:: torchtune.datasets

For a detailed general usage guide, please see our :ref:`datasets tutorial <dataset_tutorial_label>`.


Example datasets
----------------

torchtune supports several widely used datasets to help quickly bootstrap your fine-tuning.

.. autosummary::
    :toctree: generated/
    :nosignatures:

    alpaca_dataset
    alpaca_cleaned_dataset
    grammar_dataset
    samsum_dataset
    slimorca_dataset
    stack_exchanged_paired_dataset
    cnn_dailymail_articles_dataset
    wikitext_dataset

Generic dataset builders
------------------------

torchtune also supports generic dataset builders for common formats like chat models and instruct models.
These are especially useful for specifying from a YAML config.

.. autosummary::
    :toctree: generated/
    :nosignatures:

    instruct_dataset
    chat_dataset
    text_completion_dataset

Generic dataset classes
-----------------------

Class representations for the above dataset builders.

.. autosummary::
    :toctree: generated/
    :nosignatures:

    InstructDataset
    ChatDataset
    TextCompletionDataset
    ConcatDataset
    PackedDataset
    PreferenceDataset