EndOnDiskCacheHolder

class torchdata.datapipes.iter.EndOnDiskCacheHolder(datapipe, mode='wb', filepath_fn=None, *, same_filepath_fn=False, skip_read=False, timeout=300)
Indicates when the result of the prior DataPipe will be saved to local files specified by filepath_fn (functional name: end_caching). Moreover, the result of the source DataPipe is required to be a tuple of metadata and data, or a tuple of metadata and file handle.

Parameters:
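The required item shape can be illustrated without torchdata. The sketch below is a hypothetical, stdlib-only model and not part of the torchdata API. The helper name unpack_item and the sample names such as "a.bin" are illustrative assumptions. It accepts either of the two documented shapes, (metadata, data) or (metadata, file handle), and normalizes both to (metadata, data). It treats any object with a .read() method as a file handle, which is also an assumption made for the sketch.

```python
import io
from typing import Any, Tuple, Union


# Hypothetical helper (not part of torchdata) modelling the item contract:
# every item must be a 2-tuple of (metadata, data) or (metadata, file handle).
def unpack_item(item: Tuple[Any, Any]) -> Tuple[Any, Union[bytes, str]]:
    if not isinstance(item, tuple) or len(item) != 2:
        raise TypeError("expected a (metadata, data) or (metadata, file handle) tuple")
    metadata, payload = item
    # Anything with a .read() method is treated as a file handle and read fully;
    # otherwise the payload is assumed to already be the data (bytes or str).
    if hasattr(payload, "read"):
        return metadata, payload.read()
    return metadata, payload


print(unpack_item(("a.bin", b"payload")))              # ('a.bin', b'payload')
print(unpack_item(("a.bin", io.BytesIO(b"payload"))))  # ('a.bin', b'payload')
```

Both shapes end up as the same (metadata, data) pair, which is what gets written to disk.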
- datapipe – IterDataPipe with at least one OnDiskCacheHolder in the graph.
- mode – Mode in which the cached files are opened to write the data on disk. It must be aligned with the type of data or file handle from datapipe. "wb" is used by default.
- filepath_fn – Optional function to extract the file path from the metadata from datapipe. By default, the metadata is used directly as the file path.
- same_filepath_fn – Set to True to use the same filepath_fn from the OnDiskCacheHolder.
- skip_read – Boolean value to skip reading the file handle from datapipe. By default, reading is enabled and the reading function is created based on the mode.
- timeout – Integer number of seconds to wait for an uncached item to be written to disk.
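The following sketch shows how mode, filepath_fn and skip_read relate to one another. It is a simplified, hypothetical model and not torchdata's implementation. The function name write_to_cache and the sample file names are assumptions. The sketch makes these assumptions:

- Without filepath_fn, the metadata is used as the path.
- By default, the payload is treated as a file handle and read before writing.
- With skip_read=True, the payload is written as-is.
- mode must match the data type ("wb" for bytes, "w" for str).

Concurrency and the timeout parameter are not modelled.

```python
import io
import os
import tempfile
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple


# Hypothetical model (not torchdata's implementation) of how mode, filepath_fn
# and skip_read interact when (metadata, payload) items are written to disk.
def write_to_cache(
    items: Iterable[Tuple[Any, Any]],
    mode: str = "wb",
    filepath_fn: Optional[Callable[[Any], str]] = None,
    skip_read: bool = False,
) -> Iterator[str]:
    for metadata, payload in items:
        # Without filepath_fn, the metadata itself is used as the file path.
        path = filepath_fn(metadata) if filepath_fn is not None else metadata
        # By default the payload is a file handle that gets read;
        # with skip_read=True it is assumed to be the data and written as-is.
        data = payload if skip_read else payload.read()
        # mode must match the data: "wb" expects bytes, "w" expects str.
        with open(path, mode) as f:
            f.write(data)
        yield path


with tempfile.TemporaryDirectory() as tmp:
    items = [("a.bin", io.BytesIO(b"abc")), ("b.bin", io.BytesIO(b"xyz"))]
    # The generator is lazy: nothing is written until it is iterated.
    paths = list(write_to_cache(items, mode="wb", filepath_fn=lambda name: os.path.join(tmp, name)))
    with open(paths[1], "rb") as f:
        print(f.read())  # b'xyz'
```

Opening a text-mode payload (for example io.StringIO) with mode="wb" would fail in this sketch with a TypeError. This illustrates why mode needs to match the data coming from datapipe.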
Example
>>> import os
>>> import tempfile
>>> from torchdata.datapipes.iter import IterableWrapper, HttpReader
>>> url = IterableWrapper(["https://path/to/filename", ])
>>> def _filepath_fn(url):
...     temp_dir = tempfile.gettempdir()
...     return os.path.join(temp_dir, os.path.basename(url))
>>> hash_dict = {"expected_filepath": expected_MD5_hash}
>>> # You must call ``.on_disk_cache`` at some point before ``.end_caching``
>>> cache_dp = url.on_disk_cache(filepath_fn=_filepath_fn, hash_dict=hash_dict, hash_type="md5")
>>> # You must call ``.end_caching`` at a later point to stop tracing and save the results to local files.
>>> cache_dp = HttpReader(cache_dp).end_caching(mode="wb", filepath_fn=_filepath_fn)
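The example needs network access and torchdata. The cache-or-fetch pattern it sets up can be approximated with the standard library alone. The sketch below is hypothetical and does not reproduce torchdata's internals. The names md5_of, cache_or_fetch and fake_fetch are assumptions, and fake_fetch stands in for HttpReader.

- On a cache hit, the file at filepath_fn(url) is reused when it exists and, if hash_dict maps that path to an MD5 digest, the digest matches.
- On a cache miss, the data is fetched and written with mode "wb".

```python
import hashlib
import os
import tempfile
from typing import Callable, Dict, Optional


# Hypothetical helper: MD5 hex digest of a file, read in chunks.
def md5_of(path: str) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


# Hypothetical model (not torchdata's implementation) of the example's flow:
# reuse a valid cached file, otherwise fetch and write the data.
def cache_or_fetch(
    url: str,
    fetch: Callable[[str], bytes],
    filepath_fn: Callable[[str], str],
    hash_dict: Optional[Dict[str, str]] = None,
) -> str:
    path = filepath_fn(url)
    expected = (hash_dict or {}).get(path)
    if os.path.exists(path) and (expected is None or md5_of(path) == expected):
        return path  # cache hit: no fetch needed
    data = fetch(url)  # cache miss: the "uncached" branch
    with open(path, "wb") as f:
        f.write(data)
    return path


calls = []


def fake_fetch(url: str) -> bytes:
    # Hypothetical stand-in for HttpReader that records each "download".
    calls.append(url)
    return b"hello"


with tempfile.TemporaryDirectory() as tmp:
    def to_path(url: str) -> str:
        return os.path.join(tmp, os.path.basename(url))

    url = "https://path/to/filename"
    hashes = {to_path(url): hashlib.md5(b"hello").hexdigest()}
    cache_or_fetch(url, fake_fetch, to_path, hashes)  # miss: fetches and writes
    cache_or_fetch(url, fake_fetch, to_path, hashes)  # hit: served from disk
    print(len(calls))  # 1
```

As in the example, hash_dict is keyed by file path rather than by URL. In the sketch, a cached file whose digest no longer matches is treated as a miss and fetched again.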