Attention

June 2024 Status Update: Removing DataPipes and DataLoader V2

We are refocusing the torchdata repo to be an iterative enhancement of torch.utils.data.DataLoader. We do not plan to continue developing or maintaining the DataPipes and DataLoaderV2 solutions, and they will be removed from the torchdata repo. We will also be revisiting the DataPipes references in pytorch/pytorch. In release torchdata==0.8.0 (July 2024) they will be marked as deprecated, and in 0.9.0 (Oct 2024) they will be deleted. Existing users are advised to pin to torchdata==0.8.0 or an older version until they are able to migrate away. Subsequent releases will not include DataPipes or DataLoaderV2. Please reach out if you have suggestions or comments (please use this issue for feedback).

SequentialReadingService

class torchdata.dataloader2.SequentialReadingService(*reading_services)
checkpoint() → bytes

ReadingService serializes the internal states. Called in DataLoader2.state_dict.

finalize() → None

ReadingService cleans up internal states and fully shuts down the service. Called in DataLoader2’s shutdown and __del__.

finalize_iteration() → None

ReadingService ends service after an epoch is finished. Called when the iterator of DataLoader2 is depleted.
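The difference between the two teardown hooks can be sketched as follows. This is a minimal illustration that does not import torchdata: `RecordingService` and `ToyLoader` are hypothetical stand-ins, and the only behavior modeled is what the descriptions above state (finalize_iteration runs when an iterator is depleted, finalize runs at shutdown).

```python
from typing import Iterable, Iterator, List


class RecordingService:
    """Hypothetical service that only records which hooks were called."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def finalize_iteration(self) -> None:
        self.calls.append("finalize_iteration")

    def finalize(self) -> None:
        self.calls.append("finalize")


class ToyLoader:
    """Simplified stand-in for DataLoader2; not the real class."""

    def __init__(self, data: Iterable[int], reading_service: RecordingService) -> None:
        self.data = list(data)
        self.reading_service = reading_service

    def __iter__(self) -> Iterator[int]:
        yield from self.data
        # The iterator is depleted, so the epoch is over.
        self.reading_service.finalize_iteration()

    def shutdown(self) -> None:
        # Full teardown happens once, not once per epoch.
        self.reading_service.finalize()


service = RecordingService()
loader = ToyLoader([1, 2, 3], service)
for _epoch in range(2):
    for _item in loader:
        pass
loader.shutdown()
```

After two epochs and one shutdown, `service.calls` holds two `finalize_iteration` entries followed by a single `finalize`.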

initialize(datapipe: Union[IterDataPipe, MapDataPipe]) → Union[IterDataPipe, MapDataPipe]

ReadingService takes a DataPipe graph and adapts it into a new DataPipe graph according to its own needs. Called once, the first time a DataLoader2 iterator is created. Prior to calling this method, the ReadingService object must be picklable.

Parameters:

datapipe – Original DataPipe graph.

Returns:

An adapted or a new DataPipe graph.
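As a rough illustration of what chaining means for initialize, the sketch below passes a graph through each wrapped service in the order given. It assumes sequential application, which the class name suggests but this page does not spell out. The classes are hypothetical stand-ins rather than torchdata code, and a "graph" is reduced to a list of stage names.

```python
from typing import List

# A "graph" here is just a list of stage names; real DataPipe graphs are richer.
Graph = List[str]


class ToyReadingService:
    """Hypothetical stand-in for a ReadingService; not a torchdata class."""

    def __init__(self, name: str) -> None:
        self.name = name

    def initialize(self, datapipe: Graph) -> Graph:
        # Return a new graph rather than mutating the one passed in.
        return datapipe + [f"adapted-by-{self.name}"]


class ToySequentialReadingService:
    """Applies each wrapped service's initialize in the order given (assumed)."""

    def __init__(self, *reading_services: ToyReadingService) -> None:
        self.reading_services = reading_services

    def initialize(self, datapipe: Graph) -> Graph:
        for rs in self.reading_services:
            datapipe = rs.initialize(datapipe)
        return datapipe


original = ["source"]
adapted = ToySequentialReadingService(
    ToyReadingService("distributed"),
    ToyReadingService("multiprocessing"),
).initialize(original)
```

Each service sees the graph produced by the one before it, so the order of the constructor arguments determines the order of adaptation in this sketch.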

initialize_iteration(seed_generator: SeedGenerator, iter_reset_fn: Optional[Callable[[Union[IterDataPipe, MapDataPipe]], Union[IterDataPipe, MapDataPipe]]] = None) → Optional[Callable[[Union[IterDataPipe, MapDataPipe]], Union[IterDataPipe, MapDataPipe]]]

ReadingService spins up the service for an epoch. Called at the start of each epoch, every time a DataLoader2 iterator is obtained.

Parameters:
  • seed_generator – SeedGenerator object created and managed by DataLoader2. As the single source of randomness, it governs the determinism of all random operations within the graph of DataPipes.

  • iter_reset_fn – Optional reset function from the prior ReadingService when SequentialReadingService chains multiple ReadingServices.

Returns:

A new iter_reset_fn to be used by the subsequent ReadingService.

Example

MultiProcessingReadingService starts setting worker seeds per process and prefetching items from the graph.
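The hand-off of iter_reset_fn from one service to the next can be sketched as below. This is an assumption-laden illustration, not torchdata's implementation: the toy classes are hypothetical, the graph is a list of strings, and a plain `random.Random` stands in for the SeedGenerator that DataLoader2 would supply.

```python
import random
from typing import Callable, List, Optional

Graph = List[str]
ResetFn = Callable[[Graph], Graph]


class ToyReadingService:
    """Hypothetical stand-in for a ReadingService; not a torchdata class."""

    def __init__(self, name: str) -> None:
        self.name = name

    def initialize_iteration(
        self, seed_generator: object, iter_reset_fn: Optional[ResetFn] = None
    ) -> Optional[ResetFn]:
        prior = iter_reset_fn

        def reset(graph: Graph) -> Graph:
            # Run the prior service's reset first, then this service's own step.
            if prior is not None:
                graph = prior(graph)
            return graph + [f"reset-by-{self.name}"]

        return reset


class ToySequentialReadingService:
    """Threads the reset function through every wrapped service (assumed)."""

    def __init__(self, *reading_services: ToyReadingService) -> None:
        self.reading_services = reading_services

    def initialize_iteration(
        self, seed_generator: object, iter_reset_fn: Optional[ResetFn] = None
    ) -> Optional[ResetFn]:
        chained = iter_reset_fn
        for rs in self.reading_services:
            chained = rs.initialize_iteration(seed_generator, chained)
        return chained


seed_generator = random.Random(0)  # placeholder for a real SeedGenerator
sequential = ToySequentialReadingService(ToyReadingService("a"), ToyReadingService("b"))
reset_fn = sequential.initialize_iteration(seed_generator)
result = reset_fn(["source"])
```

The function returned by the last service wraps the ones before it, so calling it once replays every service's reset step in order.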

restore(datapipe, serialized_state: bytes) → Union[IterDataPipe, MapDataPipe]

ReadingService adapts the DataPipe graph based on the serialized state. Called once, the first time a DataLoader2 iterator is created. Counterpart of initialize, which adapts the DataPipe graph from scratch.

Parameters:
  • datapipe – Original DataPipe graph, before it is adapted by the ReadingService.

  • serialized_state – The serialized internal state, used to restore the adapted DataPipe graph.

Returns:

Adapted DataPipe generated from the serialized state.
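A checkpoint and restore round trip across several chained services might look like the sketch below. The encoding is an assumption made for illustration: each toy service pickles its own state, and the sequential wrapper pickles the list of those byte strings. Whether torchdata uses this exact layout is not stated on this page, and every class here is a hypothetical stand-in.

```python
import pickle
from typing import Any, Dict, List

Graph = List[str]


class ToyReadingService:
    """Hypothetical stand-in for a ReadingService; not a torchdata class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state: Dict[str, Any] = {}

    def checkpoint(self) -> bytes:
        return pickle.dumps(self.state)

    def restore(self, datapipe: Graph, serialized_state: bytes) -> Graph:
        self.state = pickle.loads(serialized_state)
        return datapipe + [f"restored-by-{self.name}"]


class ToySequentialReadingService:
    """Bundles per-service states into one bytes object (assumed layout)."""

    def __init__(self, *reading_services: ToyReadingService) -> None:
        self.reading_services = reading_services

    def checkpoint(self) -> bytes:
        return pickle.dumps([rs.checkpoint() for rs in self.reading_services])

    def restore(self, datapipe: Graph, serialized_state: bytes) -> Graph:
        states = pickle.loads(serialized_state)
        # States are matched to services by position, so order must be stable.
        for rs, state in zip(self.reading_services, states):
            datapipe = rs.restore(datapipe, state)
        return datapipe


# Save: two services with some state to preserve.
first, second = ToyReadingService("a"), ToyReadingService("b")
first.state = {"epoch": 3}
second.state = {"items_seen": 128}
blob = ToySequentialReadingService(first, second).checkpoint()

# Restore: fresh services rebuilt from the serialized bytes.
fresh_first, fresh_second = ToyReadingService("a"), ToyReadingService("b")
graph = ToySequentialReadingService(fresh_first, fresh_second).restore(["source"], blob)
```

Because states are matched by position in this sketch, the services must be passed in the same order when restoring as when the checkpoint was taken. As with any use of pickle, only load bytes from a trusted source.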
