Shuffler
- class torchdata.datapipes.map.Shuffler(datapipe: MapDataPipe[T_co], *, indices: Optional[List] = None)
Shuffle the input MapDataPipe via its indices (functional name: shuffle).
When it is used with DataLoader, the way the random seed is set depends on num_workers.
For single-process mode (num_workers == 0), the random seed is set before the DataLoader in the main process. For multi-process mode (num_workers > 0), worker_init_fn is used to set a random seed for each worker process.
- Parameters:
datapipe – MapDataPipe being shuffled
indices – a list of indices of the MapDataPipe. If not provided, 0-based integer indexing is assumed
Example
>>> # xdoctest: +SKIP
>>> from torchdata.datapipes.map import SequenceWrapper
>>> dp = SequenceWrapper(range(10))
>>> shuffle_dp = dp.shuffle().set_seed(0)
>>> list(shuffle_dp)
[7, 8, 1, 5, 3, 4, 2, 0, 9, 6]
>>> list(shuffle_dp)
[6, 1, 9, 5, 2, 4, 7, 3, 8, 0]
>>> # Reset seed for Shuffler
>>> shuffle_dp = shuffle_dp.set_seed(0)
>>> list(shuffle_dp)
[7, 8, 1, 5, 3, 4, 2, 0, 9, 6]
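The role of the indices parameter can be illustrated with a small standard-library sketch. This is not the torchdata implementation: IndexShuffleSketch, its seed argument, and the records dictionary are hypothetical names introduced only for illustration. The sketch assumes that shuffling "via indices" means permuting a list of keys and looking each one up in the source, which is why a source that is not keyed by 0-based integers needs its keys passed explicitly.

```python
import random


class IndexShuffleSketch:
    """Hypothetical stand-in for a shuffler over a map-style source (not the torchdata class)."""

    def __init__(self, source, *, indices=None, seed=None):
        self.source = source
        # Fall back to 0-based integer positions when no indices are given.
        self.indices = list(range(len(source))) if indices is None else list(indices)
        self._rng = random.Random(seed)

    def __iter__(self):
        # Draw a fresh permutation of the indices on every pass over the data.
        for idx in self._rng.sample(self.indices, len(self.indices)):
            yield self.source[idx]


# A source keyed by strings cannot rely on the 0-based default,
# so its keys are supplied through `indices`.
records = {"a": 1, "b": 2, "c": 3}
shuffled = IndexShuffleSketch(records, indices=list(records.keys()), seed=0)
first_pass = list(shuffled)

# An integer-indexed source works with the default indices.
default_pass = list(IndexShuffleSketch(range(10), seed=0))
```

Every element appears exactly once per pass; only the order changes, and constructing the sketch again with the same seed reproduces the same order.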
Note
Even though this shuffle operation takes a MapDataPipe as the input, it returns an IterDataPipe rather than a MapDataPipe, because a MapDataPipe should be insensitive to the order of its data for the sake of random reads, whereas an IterDataPipe depends on the order of data during data processing.
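The distinction drawn in this note can be sketched in plain Python. MapStyle and ShuffledIterStyle below are hypothetical classes, not torchdata types. They assume only that a map-style object is defined by lookup (__getitem__ and __len__), while an iterable-style object is defined by the sequence it yields (__iter__).

```python
import random


class MapStyle:
    """Hypothetical map-style source: random access by index, no inherent order."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)


class ShuffledIterStyle:
    """Hypothetical iterable-style view: sequential access only, in shuffled order."""

    def __init__(self, source, seed=None):
        self._source = source
        self._rng = random.Random(seed)

    def __iter__(self):
        # The permutation exists only for the duration of one pass.
        order = self._rng.sample(range(len(self._source)), len(self._source))
        for index in order:
            yield self._source[index]


source = MapStyle("abcde")
view = ShuffledIterStyle(source, seed=0)

# A lookup on the map-style source gives the same answer regardless of ordering.
third_item = source[2]

# The shuffled view defines no __getitem__; the order it yields is its whole interface.
supports_lookup = hasattr(view, "__getitem__")
one_pass = list(view)
```

In this sketch a shuffled order is meaningful only to a consumer that reads items in sequence, which is consistent with the note's reasoning for why the result is an iterable-style pipe.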