OnlineReader
class torchdata.datapipes.iter.OnlineReader(source_datapipe: IterDataPipe[str], *, timeout: Optional[float] = None, skip_on_error: bool = False, **kwargs: Optional[Dict[str, Any]])
Takes file URLs (either HTTP URLs pointing to files or URLs to Google Drive files) and yields tuples of file URL and IO stream (functional name: read_from_remote).

Parameters:
source_datapipe – a DataPipe that contains URLs
timeout – timeout in seconds for the HTTP request
skip_on_error – whether to skip over URLs that cause problems; if False, an exception is raised instead
**kwargs – a dictionary of optional keyword arguments passed through to requests. For the full list, see https://docs.python-requests.org/en/master/api/
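The sketch below illustrates the contract described by the parameters: each URL is routed to an HTTP or Google Drive fetch, `timeout` and any extra keyword arguments are forwarded to that fetch, and `skip_on_error` decides whether a failing URL is skipped or re-raised. It is not torchdata's implementation. `read_from_remote_sketch`, `is_gdrive_url`, and the injected fetcher functions are hypothetical names, and the rule that a host of `drive.google.com` or `docs.google.com` marks a Google Drive URL is an assumption. Fake in-memory fetchers stand in for real `requests` calls so the example runs offline.

```python
import io
import re
import urllib.parse
import warnings
from typing import Any, BinaryIO, Callable, Iterable, Iterator, Optional, Tuple

# Hypothetical fetcher type: fetch(url, timeout=..., **kwargs) -> binary stream
Fetcher = Callable[..., BinaryIO]

# Assumption: URLs hosted on drive.google.com / docs.google.com count as GDrive links
_GDRIVE_HOST = re.compile(r"(drive|docs)\.google\.com")


def is_gdrive_url(url: str) -> bool:
    """Return True if the URL's host looks like a Google Drive host (assumed rule)."""
    return _GDRIVE_HOST.fullmatch(urllib.parse.urlparse(url).netloc) is not None


def read_from_remote_sketch(
    urls: Iterable[str],
    http_fetch: Fetcher,
    gdrive_fetch: Fetcher,
    *,
    timeout: Optional[float] = None,
    skip_on_error: bool = False,
    **kwargs: Any,
) -> Iterator[Tuple[str, BinaryIO]]:
    """Yield (url, stream) pairs, routing each URL to the matching fetcher."""
    for url in urls:
        fetch = gdrive_fetch if is_gdrive_url(url) else http_fetch
        try:
            # timeout and any extra keyword arguments are passed straight through
            stream = fetch(url, timeout=timeout, **kwargs)
        except Exception as err:
            if skip_on_error:
                warnings.warn(f"{err}, skipping {url}")
                continue
            raise  # without skip_on_error the original exception propagates
        yield url, stream


# Usage with in-memory fakes instead of real network calls
calls = []


def fake_http(url: str, timeout: Optional[float] = None, **kwargs: Any) -> BinaryIO:
    calls.append((url, timeout, kwargs))  # record what was forwarded
    if "broken" in url:
        raise ConnectionError("simulated network failure")
    return io.BytesIO(b"from http")


def fake_gdrive(url: str, timeout: Optional[float] = None, **kwargs: Any) -> BinaryIO:
    return io.BytesIO(b"from gdrive")


urls = [
    "https://example.com/a.txt",
    "https://broken.example.com/b.txt",
    "https://drive.google.com/uc?id=FILE_ID",  # placeholder file id
]
pairs = read_from_remote_sketch(
    urls, fake_http, fake_gdrive, timeout=5.0, skip_on_error=True,
    headers={"User-Agent": "demo"},  # an extra kwarg, forwarded to the fetcher
)
result = [(url, stream.read()) for url, stream in pairs]
print(result)
```

With `skip_on_error=True` the broken URL is dropped with a warning and the other two URLs still yield streams; with the default `False`, the same URL would raise `ConnectionError` and stop iteration.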
Example:

    from torchdata.datapipes.iter import IterableWrapper, OnlineReader

    file_url = "https://raw.githubusercontent.com/pytorch/data/main/LICENSE"
    # Wrap the URL in a DataPipe; OnlineReader yields (url, stream) tuples
    online_reader_dp = OnlineReader(IterableWrapper([file_url]))
    # readlines() splits each stream into (url, line) tuples
    reader_dp = online_reader_dp.readlines()
    it = iter(reader_dp)
    path, line = next(it)
    print((path, line))

Output:

    ('https://raw.githubusercontent.com/pytorch/data/main/LICENSE', b'BSD 3-Clause License')
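To show where the `(path, line)` tuple in the output comes from, here is a minimal offline sketch of what `readlines()` does with the `(url, stream)` tuples produced by OnlineReader, assuming its defaults of keeping lines as bytes and stripping the trailing newline. `readlines_sketch` is a hypothetical stand-in, not torchdata's LineReader, and the in-memory stream contents past the first line are made up.

```python
import io
from typing import BinaryIO, Iterable, Iterator, Tuple


def readlines_sketch(
    pairs: Iterable[Tuple[str, BinaryIO]], *, strip_newline: bool = True
) -> Iterator[Tuple[str, bytes]]:
    """Split each (path, stream) pair into (path, line) pairs, one per line."""
    for path, stream in pairs:
        for line in stream:  # a binary stream yields byte lines that keep their b"\n"
            if strip_newline:
                line = line.rstrip(b"\r\n")
            yield path, line


# Made-up in-memory stand-in for the downloaded LICENSE stream
url = "https://raw.githubusercontent.com/pytorch/data/main/LICENSE"
fake_stream = io.BytesIO(b"BSD 3-Clause License\n\nsecond paragraph\n")

it = readlines_sketch([(url, fake_stream)])
path, line = next(it)
print((path, line))  # ('https://raw.githubusercontent.com/pytorch/data/main/LICENSE', b'BSD 3-Clause License')
```

Because lines stay as bytes, the first line prints as `b'BSD 3-Clause License'`, matching the output above; an empty line in the file comes back as `b''`.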