ShardExpander

class torchdata.datapipes.iter.ShardExpander(source_datapipe: IterDataPipe[str])

Expands incoming shard specification strings into individual shard names (functional name: shard_expand).

Sharded data files are named using shell-like brace notation. For example, an ImageNet dataset sharded into 1200 shards and stored on a web server might be named imagenet-{000000..001199}.tar.

Note that shard names can be expanded without any server transactions; this makes shard_expand reproducible and independent of the storage system (unlike FileLister and similar DataPipes).
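Because expansion is plain string manipulation, it can be illustrated without torchdata or any storage backend. The following is a minimal sketch in standard-library Python, not the library's implementation: the helper name expand_shards is hypothetical, it handles only numeric {start..end} ranges, and it assumes the zero padding follows the width of the lower bound.

```python
import re
from typing import Iterator

# Matches one numeric range such as {000000..001199}.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_shards(spec: str) -> Iterator[str]:
    """Yield every name described by a brace-range spec (hypothetical helper)."""
    match = _RANGE.search(spec)
    if match is None:
        # No range left: the spec is already a plain name.
        yield spec
        return
    lo, hi = match.group(1), match.group(2)
    width = len(lo)  # assumption: padding width is taken from the lower bound
    prefix, suffix = spec[: match.start()], spec[match.end():]
    for i in range(int(lo), int(hi) + 1):
        # Recurse so that specs containing several ranges are fully expanded.
        yield from expand_shards(f"{prefix}{i:0{width}d}{suffix}")


names = list(expand_shards("imagenet-{000000..001199}.tar"))
print(len(names), names[0], names[-1])
```

No file listing or network request is involved, so the same input always produces the same names in the same order.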

Parameters:

source_datapipe – a DataPipe yielding a stream of shard specification strings

Returns:

a DataPipe yielding a stream of expanded pathnames.

Example

>>> from torchdata.datapipes.iter import IterableWrapper
>>> source_dp = IterableWrapper(["ds-{00..05}.tar"])
>>> expand_dp = source_dp.shard_expand()
>>> list(expand_dp)
['ds-00.tar', 'ds-01.tar', 'ds-02.tar', 'ds-03.tar', 'ds-04.tar', 'ds-05.tar']
>>> source_dp = IterableWrapper(["imgs_{00..05}.tar", "labels_{00..05}.tar"])
>>> expand_dp = source_dp.shard_expand()
>>> list(expand_dp)
['imgs_00.tar', 'imgs_01.tar', 'imgs_02.tar', 'imgs_03.tar', 'imgs_04.tar', 'imgs_05.tar', 'labels_00.tar', 'labels_01.tar', 'labels_02.tar', 'labels_03.tar', 'labels_04.tar', 'labels_05.tar']
