torchaudio.datasets

All datasets are subclasses of torch.utils.data.Dataset and have __getitem__ and __len__ methods implemented.

Hence, they can all be passed to a torch.utils.data.DataLoader, which can load multiple samples in parallel using torch.multiprocessing workers. For example:

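import torch
import torchaudio

# Download the YesNo dataset into the current directory if it is not already present.
yesno_data = torchaudio.datasets.YESNO('.', download=True)
data_loader = torch.utils.data.DataLoader(
    yesno_data,
    batch_size=1,
    shuffle=True,
    num_workers=2)  # number of worker processes (illustrative value); tune for your machine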
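To make the role of __getitem__ and __len__ concrete, the following is a minimal plain-Python sketch of the map-style protocol that a loader relies on. It is not how torch.utils.data.DataLoader is implemented (there are no worker processes and no collation), and the names ToyDataset and iter_batches, as well as the sample tuples, are hypothetical and exist only for this illustration.

```python
import random


class ToyDataset:
    """Hypothetical stand-in for a map-style dataset (not a torchaudio class)."""

    def __init__(self, samples):
        self._samples = list(samples)

    def __len__(self):
        # Number of samples available.
        return len(self._samples)

    def __getitem__(self, index):
        # Return the sample stored at the given integer index.
        return self._samples[index]


def iter_batches(dataset, batch_size=1, shuffle=False, seed=None):
    """Yield lists of samples using only len(dataset) and dataset[i]."""
    indices = list(range(len(dataset)))
    if shuffle:
        # A dedicated Random instance keeps the shuffle reproducible for a given seed.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


# Placeholder samples: (name, sample_rate) pairs invented for this sketch.
toy = ToyDataset([("clip0", 8000), ("clip1", 8000), ("clip2", 8000)])
batches = list(iter_batches(toy, batch_size=2))
shuffled = list(iter_batches(toy, batch_size=1, shuffle=True, seed=0))
```

Because the helper only calls len() and integer indexing, any object that implements those two methods could be passed to it, which is the same property that lets every dataset listed below be handed to a DataLoader.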

`CMUARCTIC`	CMU ARCTIC [Kominek et al., 2003] dataset.
`CMUDict`	CMU Pronouncing Dictionary [Weide, 1998] (CMUDict) dataset.
`COMMONVOICE`	CommonVoice [Ardila et al., 2020] dataset.
`DR_VCTK`	Device Recorded VCTK (Small subset version) [Sarfjoo and Yamagishi, 2018] dataset.
`FluentSpeechCommands`	Fluent Speech Commands [Lugosch et al., 2019] dataset.
`GTZAN`	GTZAN [Tzanetakis et al., 2001] dataset.
`IEMOCAP`	IEMOCAP [Busso et al., 2008] dataset.
`LibriMix`	LibriMix [Cosentino et al., 2020] dataset.
`LIBRISPEECH`	LibriSpeech [Panayotov et al., 2015] dataset.
`LibriLightLimited`	Subset of Libri-light [Kahn et al., 2020] dataset, which was used in HuBERT [Hsu et al., 2021] for supervised fine-tuning.
`LIBRITTS`	LibriTTS [Zen et al., 2019] dataset.
`LJSPEECH`	LJSpeech-1.1 [Ito and Johnson, 2017] dataset.
`MUSDB_HQ`	MUSDB_HQ [Rafii et al., 2019] dataset.
`QUESST14`	QUESST14 [Miro et al., 2015] dataset.
`Snips`	Snips [Coucke et al., 2018] dataset.
`SPEECHCOMMANDS`	Speech Commands [Warden, 2018] dataset.
`TEDLIUM`	Tedlium [Rousseau et al., 2012] dataset (releases 1, 2, and 3).
`VCTK_092`	VCTK 0.92 [Yamagishi et al., 2019] dataset.
`VoxCeleb1Identification`	VoxCeleb1 [Nagrani et al., 2017] dataset for speaker identification task.
`VoxCeleb1Verification`	VoxCeleb1 [Nagrani et al., 2017] dataset for speaker verification task.
`YESNO`	YesNo [YesNo, n.d.] dataset.
