class torchdata.datapipes.iter.OnDiskCacheHolder(source_datapipe: IterDataPipe, filepath_fn: Optional[Callable] = None, hash_dict: Optional[Dict[str, str]] = None, hash_type: str = 'sha256', extra_check_fn: Optional[Callable[[str], bool]] = None)

Caches the outputs of multiple DataPipe operations to local files. These operations are typically performance bottlenecks, such as downloading and decompressing (functional name: on_disk_cache).

You must call .end_caching() to stop tracing the sequence of DataPipe operations and save the results to local files.

  • source_datapipe – The source IterDataPipe.

  • filepath_fn – Given data from source_datapipe, returns file path(s) on local file system.

  • hash_dict – A dictionary mapping file names to their corresponding hashes. If hash_dict is specified, an extra hash check is performed before data is saved to the local file system. If the data does not match the expected hash, the pipeline raises an error.

  • hash_type – The type of hash function to apply

  • extra_check_fn – Optional function to carry out extra validation on the given file path from filepath_fn.
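
The values passed as hash_dict and extra_check_fn can be prepared with the standard library alone. The following is a minimal sketch, not part of torchdata: compute_file_hash and non_empty_file are hypothetical helper names, and it assumes that the keys of hash_dict are the paths returned by filepath_fn and that the hash values are hexadecimal digest strings.

```python
import hashlib
import os
import tempfile


def compute_file_hash(path, hash_type="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    hasher = hashlib.new(hash_type)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def non_empty_file(path):
    """Hypothetical extra_check_fn: accept only existing, non-empty files."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


# Demonstrate on a throwaway file; in practice, hash a known-good copy of the data.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"hello world")
    sample_path = tmp.name

# Assumed layout: {file path: expected hex digest}.
hash_dict = {sample_path: compute_file_hash(sample_path, "sha256")}
check_passed = non_empty_file(sample_path)

os.remove(sample_path)
```

The resulting dictionary and check function could then be passed as hash_dict=hash_dict, hash_type="sha256", and extra_check_fn=non_empty_file when creating the cache holder.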


>>> import os
>>> import tempfile
>>> from torchdata.datapipes.iter import IterableWrapper, HttpReader
>>> url = IterableWrapper(["https://path/to/filename", ])
>>> # Map each URL to a file path in the system temporary directory.
>>> def _filepath_fn(url):
...     temp_dir = tempfile.gettempdir()
...     return os.path.join(temp_dir, os.path.basename(url))
>>> # ``expected_MD5_hash`` is a placeholder for the known MD5 hash of the file.
>>> _hash_dict = {"expected_filepath": expected_MD5_hash}
>>> cache_dp = url.on_disk_cache(filepath_fn=_filepath_fn, hash_dict=_hash_dict, hash_type="md5")
>>> # You must call ``.end_caching`` at a later point to stop tracing and save the results to local files.
>>> cache_dp = HttpReader(cache_dp).end_caching(mode="wb", filepath_fn=_filepath_fn)


