A Map-style DataPipe is one that implements the
__getitem__() and __len__() protocols, and represents a map
from (possibly non-integral) indices/keys to data samples. It is a close equivalent of
Dataset from PyTorch.
For example, when accessed with
mapdatapipe[idx], such a DataPipe could read the
idx-th image and its
corresponding label from a folder on the disk.
- class torchdata.datapipes.map.MapDataPipe(*args, **kwds)
All datasets that represent a map from keys to data samples should subclass this. Subclasses should override
__getitem__(), which fetches a data sample for a given, unique key. Subclasses can also optionally override
__len__(), which many
Sampler implementations and the default options of
DataLoader expect to return the size of the dataset.
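As a minimal sketch of the subclassing contract described above, the following defines a map-style DataPipe backed by a dictionary with string keys. The class name InMemoryLabels and the sample data are hypothetical and used only for illustration. The try/except fallback to object is also an assumption, there only so the snippet runs where torchdata is not installed; in real use the class would subclass MapDataPipe directly.

```python
try:
    from torchdata.datapipes.map import MapDataPipe
except ImportError:
    # Fallback so the sketch still runs without torchdata; illustration only.
    MapDataPipe = object


class InMemoryLabels(MapDataPipe):
    """Hypothetical map-style DataPipe backed by a dict of key -> sample."""

    def __init__(self, samples):
        self._samples = dict(samples)

    def __getitem__(self, key):
        # Fetch the sample for one unique key (keys need not be integers).
        return self._samples[key]

    def __len__(self):
        # Size of the dataset, as samplers and DataLoader defaults expect.
        return len(self._samples)


dp = InMemoryLabels({"cat.png": 0, "dog.png": 1})
label = dp["dog.png"]  # look up a sample by its string key
size = len(dp)
```

Because the keys here are strings rather than integers, this DataPipe would need a custom sampler before it could be used with DataLoader.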
These DataPipes can be invoked in two ways: using the class constructor, or applying their functional form onto an existing MapDataPipe (recommended, available to most but not all DataPipes).
DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style DataPipe with non-integral indices/keys, a custom sampler must be provided.
>>> from torchdata.datapipes.map import SequenceWrapper, Mapper
>>> dp = SequenceWrapper(range(10))
>>> map_dp_1 = dp.map(lambda x: x + 1)  # Using functional form (recommended)
>>> list(map_dp_1)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> map_dp_2 = Mapper(dp, lambda x: x + 1)  # Using class constructor
>>> list(map_dp_2)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> batch_dp = map_dp_1.batch(batch_size=2)
>>> list(batch_dp)
[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
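For the non-integral key case mentioned in the note above, a custom sampler can be sketched as an iterable that yields the DataPipe's own keys instead of integer positions. The class name KeySampler, its seed parameter, and the sample data are hypothetical. A plain dict stands in for the map-style DataPipe so the snippet runs without any dependencies. Passing such an object through DataLoader's sampler argument is the assumed usage and is not executed here.

```python
import random


class KeySampler:
    """Hypothetical sampler that yields the dataset's own keys in shuffled order."""

    def __init__(self, keys, seed=0):
        self._keys = list(keys)
        self._seed = seed

    def __iter__(self):
        # Shuffle a copy so the stored key order is left untouched.
        keys = list(self._keys)
        random.Random(self._seed).shuffle(keys)
        return iter(keys)

    def __len__(self):
        return len(self._keys)


# Stand-in for a map-style DataPipe with string keys (assumption: anything
# supporting __getitem__ on these keys behaves the same way here).
samples = {"cat.png": 0, "dog.png": 1, "owl.png": 2}
sampler = KeySampler(samples.keys())

# Each key produced by the sampler is used to index the data source.
fetched = {key: samples[key] for key in sampler}
```

The sampler only decides which keys are visited and in what order; fetching stays in `__getitem__`, so the same DataPipe works with any key-yielding sampler.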
By design, there are fewer MapDataPipes than
IterDataPipes, to avoid duplicate implementations of the same
functionality as MapDataPipe. We encourage users to use the built-in
IterDataPipe for various functionalities,
and convert it to
MapDataPipe as needed using
If you have any questions about usage or best practices while using
MapDataPipe, feel free to ask on the PyTorch
forum under the ‘data’ category.
We are open to adding additional
MapDataPipes where the operations can be lazily executed and
__len__ can be
known in advance. Feel free to make suggestions, with a description of your use case, in
this GitHub issue. Feedback about our design choices is also
welcome in that GitHub issue.
Here is the list of available Map-style DataPipes:
List of MapDataPipes
Create mini-batches of data (functional name: batch).
Concatenate multiple Map DataPipes (functional name:
Stores elements from the source DataPipe in memory (functional name:
Lazily load data from
Apply the input function over each item from the source DataPipe (functional name: map).
Wraps a sequence object into a MapDataPipe.
Shuffle the input MapDataPipe via its indices (functional name:
Takes in a DataPipe of Sequences, unpacks each Sequence, and returns the elements in separate DataPipes based on their position in the Sequence (functional name:
Aggregates elements into a tuple from each of the input DataPipes (functional name: