HashChecker
class torchdata.datapipes.iter.HashChecker(source_datapipe: IterDataPipe[Tuple[str, IOBase]], hash_dict: Dict[str, str], hash_type: str = 'sha256', rewind: bool = True)
Computes and checks the hash of each file from an input DataPipe of tuples of file name and data/stream (functional name: check_hash). If the computed hash matches the expected hash for that file name in hash_dict, it yields the tuple of file name and data/stream. Otherwise, it raises an error.
Parameters:
source_datapipe – IterDataPipe with tuples of file name and data/stream
hash_dict – Dictionary that maps file names to their corresponding hashes
hash_type – The name of the hash function to apply, e.g. 'sha256' (the default) or 'md5'
rewind – If True, rewind the stream to its start after reading it to compute the hash, so that downstream DataPipes can read it again (this does not work with non-seekable streams, e.g. HTTP)
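To make the check-then-rewind behaviour concrete, here is a minimal plain-Python sketch of the logic described above. It is not torchdata's implementation. `check_hash_sketch` is a hypothetical name, and the sketch makes several simplifying assumptions:

- It handles only binary streams, whereas the documented DataPipe accepts data or a stream.
- It restricts `hash_type` to 'sha256' and 'md5'.
- It raises `RuntimeError` on a mismatch, but the error type is this sketch's own choice.

```python
import hashlib
import io
from typing import Dict, Iterable, Iterator, Tuple


def check_hash_sketch(
    source: Iterable[Tuple[str, io.IOBase]],
    hash_dict: Dict[str, str],
    hash_type: str = "sha256",
    rewind: bool = True,
) -> Iterator[Tuple[str, io.IOBase]]:
    """Hypothetical stand-in for HashChecker; handles binary streams only."""
    # Assumption: only the default ("sha256") and "md5" are accepted here.
    if hash_type not in ("sha256", "md5"):
        raise ValueError(f"unsupported hash_type: {hash_type!r}")
    for file_name, stream in source:
        if file_name not in hash_dict:
            raise RuntimeError(f"no expected hash for {file_name!r}")
        # Fail before consuming the stream if it cannot be rewound afterwards.
        if rewind and not stream.seekable():
            raise RuntimeError(f"cannot rewind non-seekable stream for {file_name!r}")
        hasher = hashlib.new(hash_type)
        # Read in 64 KiB chunks so a large file is never held in memory at once.
        for chunk in iter(lambda: stream.read(1 << 16), b""):
            hasher.update(chunk)
        if hasher.hexdigest() != hash_dict[file_name]:
            raise RuntimeError(f"hash of {file_name!r} does not match the expected value")
        if rewind:
            # Reset the position so downstream consumers can read from the start.
            stream.seek(0)
        yield file_name, stream
```

Take `[("a.bin", io.BytesIO(b"abc"))]` with the matching SHA-256 digest. The sketch yields the pair with the stream positioned at its start. A wrong digest raises only when the generator is iterated, mirroring how a DataPipe does its work lazily.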
Example
>>> from torchdata.datapipes.iter import IterableWrapper, FileOpener
>>> expected_MD5_hash = "bb9675028dd39d2dd2bf71002b93e66c"
>>> # File is from "https://raw.githubusercontent.com/pytorch/data/main/LICENSE"
>>> file_dp = FileOpener(IterableWrapper(["LICENSE.txt"]), mode='rb')
>>> # An exception is only raised when the hash doesn't match, otherwise (path, stream) is returned
>>> check_hash_dp = file_dp.check_hash({"LICENSE.txt": expected_MD5_hash}, "md5", rewind=True)
>>> reader_dp = check_hash_dp.readlines()
>>> it = iter(reader_dp)
>>> path, line = next(it)
>>> path
'LICENSE.txt'
>>> line
b'BSD 3-Clause License'
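A value like expected_MD5_hash has to be computed ahead of time from a copy of the file you trust. Below is a minimal standard-library sketch for doing that. `compute_hash` and `build_hash_dict` are hypothetical helper names, not part of torchdata. The dictionary keys must match the file names yielded by the source DataPipe exactly (here "LICENSE.txt"), because hash_dict is looked up by file name.

```python
import hashlib
import io
from typing import Dict, Iterable, Union


def compute_hash(data: Union[bytes, bytearray, io.IOBase], hash_type: str = "sha256") -> str:
    """Return the hex digest of raw bytes, or of a binary stream read to EOF."""
    hasher = hashlib.new(hash_type)
    if isinstance(data, (bytes, bytearray)):
        hasher.update(data)
    else:
        # Stream case: read in 64 KiB chunks until read() returns b"".
        for chunk in iter(lambda: data.read(1 << 16), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def build_hash_dict(paths: Iterable[str], hash_type: str = "sha256") -> Dict[str, str]:
    """Map each path (used verbatim as the key) to the digest of its contents."""
    result = {}
    for path in paths:
        with open(path, "rb") as f:
            result[path] = compute_hash(f, hash_type)
    return result
```

For instance, `build_hash_dict(["LICENSE.txt"], "md5")` would give a dictionary that can be passed to check_hash together with "md5". This assumes the local LICENSE.txt is the trusted copy.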