HashChecker

class torchdata.datapipes.iter.HashChecker(source_datapipe: IterDataPipe[Tuple[str, IOBase]], hash_dict: Dict[str, str], hash_type: str = 'sha256', rewind: bool = True)

Computes and checks the hash of each file from an input DataPipe of (file name, data/stream) tuples (functional name: check_hash). If a file's computed hash matches the expected hash given for it in the dictionary, the DataPipe yields the (file name, data/stream) tuple unchanged. Otherwise, it raises an error.

Parameters:
  • source_datapipe – IterDataPipe with tuples of file name and data/stream

  • hash_dict – Dictionary that maps file names to their corresponding hashes

  • hash_type – The type of hash function to apply (e.g. 'sha256' or 'md5')

  • rewind – Rewind the stream to its start after reading it to compute the hash (this does not work with non-seekable streams, e.g. HTTP)
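The behavior described by these parameters can be illustrated with a minimal standard-library sketch. This is not the torchdata implementation: the function name `check_hashes`, the `chunk_size` argument, and the choice of `RuntimeError` are assumptions made for illustration only.

```python
import hashlib
import io
from typing import Dict, IO, Iterable, Iterator, Tuple


def check_hashes(
    files: Iterable[Tuple[str, IO[bytes]]],
    hash_dict: Dict[str, str],
    hash_type: str = "sha256",
    rewind: bool = True,
    chunk_size: int = 1024 * 1024,  # hypothetical parameter, not part of HashChecker
) -> Iterator[Tuple[str, IO[bytes]]]:
    """Yield (name, stream) only if the stream's digest matches hash_dict[name]."""
    for name, stream in files:
        if name not in hash_dict:
            # Assumed behavior: a file without an expected hash is an error.
            raise RuntimeError(f"No expected hash given for {name!r}")

        hasher = hashlib.new(hash_type)
        # Read in fixed-size chunks so large files are never fully in memory.
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            hasher.update(chunk)

        if rewind:
            # Only possible for seekable streams; an HTTP response body,
            # for example, typically cannot be rewound.
            stream.seek(0)

        if hasher.hexdigest() != hash_dict[name]:
            raise RuntimeError(f"Hash mismatch for {name!r}")
        yield name, stream


data = b"hello world"
expected = {"a.txt": hashlib.sha256(data).hexdigest()}

# With rewind=True the stream can still be read downstream.
name, stream = next(check_hashes([("a.txt", io.BytesIO(data))], expected))
print(name, stream.read() == data)
```

With `rewind=False`, the yielded stream in this sketch is left positioned at its end, so a downstream reader would see no data. That is why rewinding is the sensible default whenever the stream is seekable.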

Example

>>> from torchdata.datapipes.iter import IterableWrapper, FileOpener
>>> expected_MD5_hash = "bb9675028dd39d2dd2bf71002b93e66c"
>>> # File is from "https://raw.githubusercontent.com/pytorch/data/main/LICENSE"
>>> file_dp = FileOpener(IterableWrapper(["LICENSE.txt"]), mode='rb')
>>> # An exception is raised only when the hash doesn't match; otherwise (path, stream) is yielded
>>> check_hash_dp = file_dp.check_hash({"LICENSE.txt": expected_MD5_hash}, "md5", rewind=True)
>>> reader_dp = check_hash_dp.readlines()
>>> it = iter(reader_dp)
>>> path, line = next(it)
>>> path
'LICENSE.txt'
>>> line
b'BSD 3-Clause License'
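The example above assumes the expected hash is already known. One possible way to produce a `hash_dict` from trusted local copies of the files is sketched below using only `hashlib`. The helper name `build_hash_dict` is hypothetical and not part of torchdata, and the sketch assumes that the dictionary keys must match the file names yielded by the source DataPipe, as they do in the example ("LICENSE.txt").

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def build_hash_dict(paths: Iterable[str], hash_type: str = "sha256") -> Dict[str, str]:
    """Map each path string to the hex digest of that file's contents.

    Hypothetical helper: reads each file fully, so it suits small files.
    """
    hashes = {}
    for path in paths:
        contents = Path(path).read_bytes()
        hashes[path] = hashlib.new(hash_type, contents).hexdigest()
    return hashes
```

A call such as `build_hash_dict(["LICENSE.txt"], "md5")` would then return a dictionary that could be passed as the first argument to `check_hash`, together with the same hash type.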
