DataLoader2
================================

.. automodule:: torchdata.dataloader2

A light-weight :class:`DataLoader2` is introduced to decouple the overloaded data-manipulation functionalities from ``torch.utils.data.DataLoader`` to ``DataPipe`` operations. Besides, a certain features can only be achieved with :class:`DataLoader2` like snapshotting and switching backend services to perform high-performant operations.

DataLoader2
------------

.. autoclass:: DataLoader2
    :special-members: __iter__
    :members:

Note:
:class:`DataLoader2` doesn't support ``torch.utils.data.Dataset`` or ``torch.utils.data.IterableDataset``. Please wrap each of them with the corresponding ``DataPipe`` below:

- :class:`torchdata.datapipes.map.SequenceWrapper`: ``torch.utils.data.Dataset``
- :class:`torchdata.datapipes.iter.IterableWrapper`: ``torch.utils.data.IterableDataset``

ReadingService
---------------

``ReadingService`` specifies the execution backend for the data-processing graph. There are three types of ``ReadingServices`` in TorchData:

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_method_template.rst

    DistributedReadingService
    MultiProcessingReadingService
    PrototypeMultiProcessingReadingService

Each ``ReadingServices`` would take the ``DataPipe`` graph and modify it to achieve a few features like dynamic sharding, sharing random seeds and snapshoting for multi-/distributed processes.

This also allows easier transition of data-preprocessing pipeline from research to production. After the ``DataPipe`` graph is created and validated with the ``ReadingServices``, a different ``ReadingService`` that configures and connects to the production service/infra such as ``AIStore`` can be provided to :class:`DataLoader2` as a drop-in replacement. The ``ReadingService`` could potentially search the graph, and find ``DataPipe`` operations that can be delegated to the production service/infra, then modify the graph correspondingly to achieve higher-performant execution.

The followings are interfaces for custom ``ReadingService``.

.. autoclass:: ReadingServiceInterface
    :members:

The checkpoint/snapshotting feature is a work in progress. Here is the preliminary interface (small changes are likely):

.. autoclass:: CheckpointableReadingServiceInterface
    :members:

And, graph utility functions are provided in ``torchdata.dataloader.graph`` to help users to define their own ``ReadingService`` and modify the graph:

.. module:: torchdata.dataloader2.graph

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: function.rst

    traverse_dps
    find_dps
    replace_dp
    remove_dp

Adapter
--------

``Adapter`` is used to configure, modify and extend the ``DataPipe`` graph in :class:`DataLoader2`. It allows in-place
modification or replace the pre-assembled ``DataPipe`` graph provided by PyTorch domains. For example, ``Shuffle(False)`` can be
provided to :class:`DataLoader2`, which would disable any ``shuffle`` operations in the ``DataPipes`` graph.

.. module:: torchdata.dataloader2.adapter

.. autoclass:: Adapter
    :special-members: __call__

Here are the list of :class:`Adapter` provided by TorchData in ``torchdata.dataloader2.adapter``:

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    Shuffle
    CacheTimeout

And, we will provide more ``Adapters`` to cover data-processing options:

- ``PinMemory``: Attach a ``DataPipe`` at the end of the data-processing graph that coverts output data to ``torch.Tensor`` in pinned memory.
- ``FullSync``: Attach a ``DataPipe`` to make sure the data-processing graph synchronized between distributed processes to prevent hanging.
- ``ShardingPolicy``: Modify sharding policy if ``sharding_filter`` is presented in the ``DataPipe`` graph.
- ``PrefetchPolicy``, ``InvalidateCache``, etc.

If you have feature requests about the ``Adapters`` you'd like to be provided, please open a GitHub issue. For specific
needs, ``DataLoader2`` also accepts any custom ``Adapter`` as long as it inherits from the ``Adapter`` class.