HuggingFaceHubReader
- class torchdata.datapipes.iter.HuggingFaceHubReader(dataset: str, **config_kwargs)
Takes in a dataset name and returns an iterable HuggingFace dataset. Please refer to https://huggingface.co/docs/datasets/loading for the meaning and type of each argument. Unlike the HuggingFace implementation, this DataPipe uses the following defaults (this will be changed in version 0.7):
- split is set to "train"
- revision is set to "main"
- streaming is set to True
- Parameters:
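dataset – the name of a dataset that will be accepted by the HuggingFace datasets library
**config_kwargs – additional keyword arguments; see the HuggingFace loading documentation linked above for their meaning and type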
Example:
from torchdata.datapipes.iter import HuggingFaceHubReader
huggingface_reader_dp = HuggingFaceHubReader("lhoestq/demo1", revision="main")
elem = next(iter(huggingface_reader_dp))
assert elem["package_name"] == "com.mantz_it.rfanalyzer"
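The default handling described above can be pictured with a small standalone sketch. It does not call torchdata or the HuggingFace library, and it is not the library's actual implementation; the helper name resolve_reader_kwargs and the DOCUMENTED_DEFAULTS table are hypothetical, introduced only to illustrate how explicit keyword arguments might take precedence over the three documented defaults.

```python
# Hypothetical illustration only: neither name below exists in torchdata.
# The values mirror the defaults listed in this reference entry.
DOCUMENTED_DEFAULTS = {"split": "train", "revision": "main", "streaming": True}


def resolve_reader_kwargs(**config_kwargs):
    """Return config_kwargs with the documented defaults filled in for missing keys."""
    resolved = dict(DOCUMENTED_DEFAULTS)  # start from the documented defaults
    resolved.update(config_kwargs)        # explicitly passed arguments win
    return resolved


# Only `split` is overridden; `revision` and `streaming` keep their defaults.
print(resolve_reader_kwargs(split="test"))
# {'split': 'test', 'revision': 'main', 'streaming': True}
```

Under this assumption, passing revision="main" as in the example above is redundant with the default but harmless.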