HuggingFaceHubReader

class torchdata.datapipes.iter.HuggingFaceHubReader(dataset: str, **config_kwargs)

Takes in a dataset name and returns an iterable HuggingFace dataset. Please refer to https://huggingface.co/docs/datasets/loading for the meaning and type of each argument. Unlike the HuggingFace implementation, the following defaults differ (this will be changed in version 0.7):

  • split is set to "train"

  • revision is set to "main"

  • streaming is set to True
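The default handling described above can be pictured as a dictionary merge: the three overridden defaults are filled in first, and any keyword argument the caller passes explicitly takes precedence. The sketch below illustrates that idea under this assumption; `merge_hub_reader_defaults` and `HUB_READER_DEFAULTS` are hypothetical names, not part of torchdata.

```python
from typing import Any, Dict

# Hypothetical constant: the defaults that, per the description above,
# differ from the HuggingFace datasets library.
HUB_READER_DEFAULTS: Dict[str, Any] = {
    "split": "train",
    "revision": "main",
    "streaming": True,
}


def merge_hub_reader_defaults(**config_kwargs: Any) -> Dict[str, Any]:
    """Hypothetical helper: apply the reader's defaults, letting explicit kwargs override them."""
    # In a dict display, later keys win, so caller-supplied values replace the defaults.
    # A new dict is built, so HUB_READER_DEFAULTS itself is never mutated.
    return {**HUB_READER_DEFAULTS, **config_kwargs}


print(merge_hub_reader_defaults())
print(merge_hub_reader_defaults(split="validation", streaming=False))
```

With no arguments all three defaults apply; passing `split="validation", streaming=False` overrides those two while `revision` stays at "main".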

Parameters:

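dataset – the name of a dataset that will be accepted by the HuggingFace datasets library

config_kwargs – additional keyword arguments, with the meaning and type described in the HuggingFace loading documentation linked above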

Example:

from torchdata.datapipes.iter import HuggingFaceHubReader

huggingface_reader_dp = HuggingFaceHubReader("lhoestq/demo1", revision="main")
elem = next(iter(huggingface_reader_dp))
assert elem["package_name"] == "com.mantz_it.rfanalyzer"
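To show the overall pattern (a dataset name in, an iterable of records out) without network access, the sketch below wraps an injected loader function. It is a minimal illustration under stated assumptions, not torchdata's implementation: `HubReaderSketch` and `fake_loader` are hypothetical, and `fake_loader` stands in for the HuggingFace datasets library, returning made-up records.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

# Hypothetical loader signature: dataset name plus keyword arguments -> iterable of records.
Loader = Callable[..., Iterable[Dict[str, Any]]]


class HubReaderSketch:
    """Hypothetical reader: loads a named dataset via `loader` and yields its records."""

    def __init__(self, dataset: str, loader: Loader, **config_kwargs: Any) -> None:
        self.dataset = dataset
        self.loader = loader
        # Apply the documented defaults; explicit keyword arguments override them.
        self.config_kwargs = {"split": "train", "revision": "main", "streaming": True, **config_kwargs}

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        # In this sketch, every call to iter() invokes the loader again
        # and yields the records one at a time.
        yield from self.loader(self.dataset, **self.config_kwargs)


def fake_loader(name: str, **kwargs: Any) -> Iterable[Dict[str, Any]]:
    # Made-up stand-in records; a real loader would fetch `name` from the Hub.
    return [{"dataset": name, "split": kwargs["split"], "idx": i} for i in range(3)]


reader = HubReaderSketch("lhoestq/demo1", loader=fake_loader)
first = next(iter(reader))
print(first)
```

Injecting the loader keeps the sketch testable offline; in real use the records would come from the HuggingFace datasets library rather than from `fake_loader`.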