Shuffler
class torchdata.datapipes.iter.Shuffler(datapipe: IterDataPipe[T_co], *, buffer_size: int = 10000, unbatch_level: int = 0)
Shuffles the input DataPipe with a buffer (functional name: shuffle). The buffer, which holds up to buffer_size elements, is first filled with elements from the datapipe. Each item is then yielded from the buffer by reservoir sampling as the datapipe is iterated.

buffer_size must be larger than 0. For buffer_size == 1, the datapipe is not shuffled. To fully shuffle all elements of the datapipe, buffer_size must be greater than or equal to the size of the datapipe.

When Shuffler is used with torch.utils.data.DataLoader, the way to set up the random seed depends on num_workers.

For single-process mode (num_workers == 0), the random seed is set before the DataLoader in the main process. For multi-process mode (num_workers > 0), worker_init_fn is used to set up a random seed for each worker process.

Parameters:
datapipe – The IterDataPipe being shuffled
buffer_size – The buffer size for shuffling (defaults to 10000)
unbatch_level – The number of levels to unbatch the source data before applying the shuffle (defaults to 0, meaning no unbatching)
Example
>>> # xdoctest: +SKIP
>>> from torchdata.datapipes.iter import IterableWrapper
>>> dp = IterableWrapper(range(10))
>>> shuffle_dp = dp.shuffle()
>>> list(shuffle_dp)
[0, 4, 1, 6, 3, 2, 9, 5, 7, 8]
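
The buffer behavior described above (fill the buffer first, then yield sampled items as new ones arrive) can be approximated in plain Python. This is a minimal sketch for illustration only, not the torchdata implementation: the function name buffered_shuffle and its seed argument are hypothetical, and the real Shuffler manages its random state differently.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def buffered_shuffle(
    source: Iterable[T],
    buffer_size: int = 10000,
    seed: Optional[int] = None,  # hypothetical argument, for repeatable demos
) -> Iterator[T]:
    """Yield items from `source` in a buffer-limited random order."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be larger than 0")

    rng = random.Random(seed)
    buffer: List[T] = []

    for item in source:
        if len(buffer) == buffer_size:
            # Buffer is full: emit a randomly chosen element and put the
            # incoming item in its slot.
            idx = rng.randrange(buffer_size)
            yield buffer[idx]
            buffer[idx] = item
        else:
            # Still filling the buffer; nothing is yielded yet.
            buffer.append(item)

    # Source exhausted: drain what is left in random order.
    while buffer:
        idx = rng.randrange(len(buffer))
        buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
        yield buffer.pop()


if __name__ == "__main__":
    # buffer_size == 1 leaves the order unchanged.
    print(list(buffered_shuffle(range(10), buffer_size=1)))
    # A buffer at least as large as the input allows any permutation.
    print(list(buffered_shuffle(range(10), buffer_size=10)))
```

In this sketch, an element can only move forward in the output by fewer than buffer_size positions, which is why a small buffer gives only a local shuffle and a buffer at least as large as the dataset is needed for a full one.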