AISFileLister
- class torchdata.datapipes.iter.AISFileLister(source_datapipe: IterDataPipe[str], url: str, length: int = -1)
Iterable DataPipe that lists files from AIStore backends with the given URL prefixes (functional name: list_files_by_ais). Acceptable prefixes include, but are not limited to, ais://bucket-name and ais://bucket-name/.
Note
- This function also supports files from multiple backends (aws://.., gcp://.., azure://.., etc.).
- The input must be a list; direct URLs are not supported.
- length is -1 by default, and all calls to len() are invalid because not all items are iterated at the start.
- It internally uses the AIStore Python SDK.
- Parameters:
source_datapipe (IterDataPipe[str]) – a DataPipe that contains URLs/URL prefixes to objects on AIS
url (str) – AIStore endpoint
length (int) – length of the datapipe (default: -1, in which case calls to len() are invalid)
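The effect of the default length can be illustrated without an AIStore cluster. The following is a minimal sketch, not torchdata's implementation: PrefixLister and its in-memory listing dictionary are hypothetical stand-ins, and raising TypeError from len() is an assumption about how an invalid length is reported.

```python
from typing import Dict, Iterable, Iterator, List


class PrefixLister:
    """Hypothetical stand-in that mimics the documented `length` semantics."""

    def __init__(
        self,
        prefixes: Iterable[str],
        listing: Dict[str, List[str]],
        length: int = -1,
    ) -> None:
        self.prefixes = prefixes
        self.listing = listing  # fake backend: prefix -> object names
        self.length = length

    def __iter__(self) -> Iterator[str]:
        # Objects are discovered lazily, one prefix at a time.
        for prefix in self.prefixes:
            for name in self.listing.get(prefix, []):
                yield prefix + name

    def __len__(self) -> int:
        # Assumption: -1 means "unknown", so len() is rejected.
        if self.length == -1:
            raise TypeError(f"{type(self).__name__} has no valid length")
        return self.length


listing = {"ais://bucket-name/folder/": ["a.txt", "b.txt"]}
lister = PrefixLister(["ais://bucket-name/folder/"], listing)
urls = list(lister)  # iteration works even though len() does not

try:
    len(lister)
    length_is_valid = True
except TypeError:
    length_is_valid = False

# Passing an explicit length makes len() usable.
sized = PrefixLister(["ais://bucket-name/folder/"], listing, length=2)
```

In this sketch the caller is responsible for supplying a correct length, since the object count is only known after every prefix has been listed.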
Example
>>> from torchdata.datapipes.iter import IterableWrapper, AISFileLister
>>> ais_prefixes = IterableWrapper(['gcp://bucket-name/folder/', 'aws://bucket-name/folder/', 'ais://bucket-name/folder/', ...])
>>> dp_ais_urls = AISFileLister(url='localhost:8080', source_datapipe=ais_prefixes)
>>> for url in dp_ais_urls:
...     pass
>>> # Functional API
>>> dp_ais_urls = ais_prefixes.list_files_by_ais(url='localhost:8080')
>>> for url in dp_ais_urls:
...     pass
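Because every input is a backend-qualified prefix, it can help to check prefixes before building the pipeline. The sketch below uses only the standard library; split_prefix is a hypothetical helper written for illustration, and it is not necessarily how torchdata or the AIStore SDK parses prefixes.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_prefix(prefix: str) -> Tuple[str, str, str]:
    """Hypothetical helper: split '<backend>://<bucket>[/<path>]' into parts."""
    parsed = urlparse(prefix)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(
            f"expected '<backend>://<bucket>[/<path>]', got {prefix!r}"
        )
    # The netloc is the bucket; the path without its leading slash
    # is the object prefix inside that bucket.
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")


parts = [
    split_prefix(p)
    for p in ("ais://bucket-name", "ais://bucket-name/", "gcp://bucket-name/folder/")
]

try:
    split_prefix("bucket-name/folder/")  # no backend scheme
    malformed_ok = True
except ValueError:
    malformed_ok = False
```

Under these assumptions, ais://bucket-name and ais://bucket-name/ both resolve to the same bucket with an empty object prefix, while a string without a backend scheme is rejected early.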