Shuffler

class torchdata.datapipes.map.Shuffler(datapipe: MapDataPipe[T_co], *, indices: Optional[List] = None)

Shuffle the input MapDataPipe via its indices (functional name: shuffle).

When used with DataLoader, the way the random seed is set depends on num_workers.

In single-process mode (num_workers == 0), the random seed is set in the main process, before the DataLoader. In multi-process mode (num_workers > 0), worker_init_fn is used to set a random seed for each worker process.

Parameters:

datapipe – MapDataPipe being shuffled
indices – a list of the indices of the MapDataPipe. If not provided, the MapDataPipe is assumed to use 0-based indexing
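
To illustrate what shuffling "via its indices" means, here is a minimal pure-Python sketch. It is not the torchdata implementation: the function name shuffle_via_indices and its seed argument are hypothetical, chosen only for this illustration. It shows why an explicit indices list is needed when the keys are not 0, 1, ..., len - 1.

```python
import random


def shuffle_via_indices(data, indices=None, seed=None):
    """Yield the values of `data` in a shuffled order (illustrative only)."""
    # With no indices given, fall back to the documented assumption
    # of 0-based integer indexing: 0, 1, ..., len(data) - 1.
    keys = list(range(len(data))) if indices is None else list(indices)
    rng = random.Random(seed)
    rng.shuffle(keys)  # shuffle the indices, not the data itself
    for key in keys:
        yield data[key]


# A sequence works with the default 0-based indices.
shuffled_list = list(shuffle_via_indices(["x", "y", "z"], seed=0))

# A mapping with string keys needs its indices passed explicitly.
scores = {"a": 1, "b": 2, "c": 3}
shuffled_scores = list(shuffle_via_indices(scores, indices=list(scores), seed=0))
```

Only the list of indices is permuted; each element is then looked up by its index, which is why the input must support random access.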

Example

>>> # xdoctest: +SKIP
>>> from torchdata.datapipes.map import SequenceWrapper
>>> dp = SequenceWrapper(range(10))
>>> shuffle_dp = dp.shuffle().set_seed(0)
>>> list(shuffle_dp)
[7, 8, 1, 5, 3, 4, 2, 0, 9, 6]
>>> list(shuffle_dp)
[6, 1, 9, 5, 2, 4, 7, 3, 8, 0]
>>> # Reset seed for Shuffler
>>> shuffle_dp = shuffle_dp.set_seed(0)
>>> list(shuffle_dp)
[7, 8, 1, 5, 3, 4, 2, 0, 9, 6]

Note

Even though this shuffle operation takes a MapDataPipe as input, it returns an IterDataPipe rather than a MapDataPipe. A MapDataPipe should be insensitive to the order of its data so that it can support random reads, whereas an IterDataPipe depends on the order of data during processing.
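
The distinction in this note can be sketched in plain Python. The class ShuffledView below is hypothetical and is not part of torchdata. It wraps an indexable input but exposes iteration only, because the element found at a given position changes from one pass to the next.

```python
import random


class ShuffledView:
    """Hypothetical iterable-only wrapper over an indexable input."""

    def __init__(self, data, seed=None):
        self._data = data
        self._rng = random.Random(seed)

    def __len__(self):
        return len(self._data)

    def __iter__(self):
        keys = list(range(len(self._data)))
        self._rng.shuffle(keys)  # a fresh permutation on every pass
        return (self._data[k] for k in keys)

    # No __getitem__ is defined on purpose: "the item at index i" has no
    # stable meaning once the order changes between passes.


view = ShuffledView(["a", "b", "c", "d"], seed=0)
first_pass = list(view)
second_pass = list(view)  # same elements; the order may differ
```

Indexing the wrapper (for example view[0]) raises TypeError, mirroring the fact that the shuffled output is consumed by iteration rather than by key.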
