Iterable-style DataPipes
==========================

.. currentmodule:: torchdata.datapipes.iter

An iterable-style dataset is an instance of a subclass of ``IterableDataset`` that implements the ``__iter__()`` protocol,
and represents an iterable over data samples. This type of dataset is particularly suitable for cases where
random reads are expensive or even infeasible, and where the batch size depends on the fetched data.

For example, such a dataset, when ``iter(iterdatapipe)`` is called on it, could return a stream of data read from a database,
a remote server, or even logs generated in real time.

This is an updated version of ``IterableDataset`` in ``torch``.

.. autoclass:: IterDataPipe

We have different types of Iterable DataPipes:

1. Archive - open and decompress archive files of different formats.

2. Augmenting - augment your samples (e.g. adding an index, or cycling through them indefinitely).

3. Combinatorial - perform combinatorial operations (e.g. sampling, shuffling).

4. Combining/Splitting - interact with multiple DataPipes by combining them or splitting one into many.

5. Grouping - group samples within a DataPipe.

6. IO - interact with the file system or a remote server (e.g. downloading, opening,
   saving files, and listing the files in directories).

7. Mapping - apply a given function to each element in the DataPipe.

8. Others - perform a miscellaneous set of operations.

9. Selecting - select specific samples within a DataPipe.

10. Text - parse, read, and transform text files and data.

Archive DataPipes
-------------------------

These DataPipes help you open and decompress archive files of different formats.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    Bz2FileLoader
    Decompressor
    RarArchiveLoader
    TarArchiveLoader
    TFRecordLoader
    WebDataset
    XzFileLoader
    ZipArchiveLoader

Augmenting DataPipes
-----------------------------

These DataPipes help you augment your samples.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    Cycler
    Enumerator
    IndexAdder
    Repeater

Combinatorial DataPipes
-----------------------------

These DataPipes help you perform combinatorial operations.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    InBatchShuffler
    Sampler
    Shuffler

Combining/Splitting DataPipes
-----------------------------

These DataPipes tend to involve multiple DataPipes, combining them or splitting one into many.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    Concater
    Demultiplexer
    Forker
    IterKeyZipper
    MapKeyZipper
    Multiplexer
    MultiplexerLongest
    RoundRobinDemultiplexer
    SampleMultiplexer
    UnZipper
    Zipper
    ZipperLongest

Grouping DataPipes
-----------------------------

These DataPipes help you group samples within a DataPipe.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    Batcher
    BucketBatcher
    Collator
    Grouper
    MaxTokenBucketizer
    UnBatcher

IO DataPipes
-------------------------

These DataPipes help you interact with the file system or a remote server (e.g. downloading, opening,
saving files, and listing the files in directories).

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    AISFileLister
    AISFileLoader
    FSSpecFileLister
    FSSpecFileOpener
    FSSpecSaver
    FileLister
    FileOpener
    GDriveReader
    HttpReader
    HuggingFaceHubReader
    IoPathFileLister
    IoPathFileOpener
    IoPathSaver
    OnlineReader
    ParquetDataFrameLoader
    S3FileLister
    S3FileLoader
    Saver

Mapping DataPipes
-------------------------

These DataPipes apply a given function to each element in the DataPipe.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    BatchAsyncMapper
    BatchMapper
    FlatMapper
    Mapper
    ShuffledFlatMapper
    ThreadPoolMapper

Other DataPipes
-------------------------

A miscellaneous set of DataPipes with different functionalities.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    DataFrameMaker
    EndOnDiskCacheHolder
    FullSync
    HashChecker
    InMemoryCacheHolder
    IterableWrapper
    LengthSetter
    MapToIterConverter
    OnDiskCacheHolder
    PinMemory
    Prefetcher
    RandomSplitter
    ShardExpander
    ShardingFilter
    ShardingRoundRobinDispatcher

Selecting DataPipes
-------------------------

These DataPipes help you select specific samples within a DataPipe.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    Filter
    Header
    Dropper
    Slicer
    Flattener

Text DataPipes
-----------------------------

These DataPipes help you parse, read, and transform text files and data.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    CSVDictParser
    CSVParser
    JsonParser
    LineReader
    ParagraphAggregator
    RoutedDecoder
    Rows2Columnar
    StreamReader
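
As a rough illustration of the ``__iter__()`` protocol that every iterable-style DataPipe on this page builds on, the sketch below chains a few stages using only the Python standard library. It does not use ``torchdata``: the classes ``RangeSource``, ``FilterStage``, ``MapStage`` and ``BatchStage`` and the helper ``build_pipeline`` are hypothetical names invented for this example. They are only loosely analogous to ``IterableWrapper``, ``Filter``, ``Mapper`` and ``Batcher`` listed above and should not be read as their implementation.

```python
from typing import Any, Callable, Iterable, Iterator, List


class RangeSource:
    """Hypothetical source stage: yields 0..n-1 each time iter() is called."""

    def __init__(self, n: int) -> None:
        self.n = n

    def __iter__(self) -> Iterator[int]:
        return iter(range(self.n))


class FilterStage:
    """Hypothetical stage: keeps only the items for which predicate(item) is true."""

    def __init__(self, source: Iterable[Any], predicate: Callable[[Any], bool]) -> None:
        self.source = source
        self.predicate = predicate

    def __iter__(self) -> Iterator[Any]:
        for item in self.source:
            if self.predicate(item):
                yield item


class MapStage:
    """Hypothetical stage: applies fn to each item of the upstream iterable."""

    def __init__(self, source: Iterable[Any], fn: Callable[[Any], Any]) -> None:
        self.source = source
        self.fn = fn

    def __iter__(self) -> Iterator[Any]:
        for item in self.source:
            yield self.fn(item)


class BatchStage:
    """Hypothetical stage: groups consecutive items into lists of batch_size."""

    def __init__(self, source: Iterable[Any], batch_size: int) -> None:
        self.source = source
        self.batch_size = batch_size

    def __iter__(self) -> Iterator[List[Any]]:
        batch: List[Any] = []
        for item in self.source:
            batch.append(item)
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        if batch:
            # Emit the final, possibly shorter, batch.
            yield batch


def build_pipeline(n: int) -> BatchStage:
    """Chain the stages: source -> keep even numbers -> square -> batches of 2."""
    source = RangeSource(n)
    evens = FilterStage(source, lambda x: x % 2 == 0)
    squared = MapStage(evens, lambda x: x * x)
    return BatchStage(squared, batch_size=2)


if __name__ == "__main__":
    for batch in build_pipeline(10):
        print(batch)
```

Each stage pulls items lazily from the one before it, so nothing is read until the final stage is iterated, and no random access into the source is needed. This is the property that makes the iterable style a fit for streams such as database cursors or live logs.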