
torchaudio.datasets

All datasets are subclasses of torch.utils.data.Dataset, i.e., they have __getitem__ and __len__ methods implemented. Hence, they can all be passed to a torch.utils.data.DataLoader, which can load multiple samples in parallel using torch.multiprocessing workers. For example:

import torch
import torchaudio

yesno_data = torchaudio.datasets.YESNO('.', download=True)
data_loader = torch.utils.data.DataLoader(yesno_data,
                                          batch_size=1,
                                          shuffle=True,
                                          num_workers=2)  # choose a worker count suited to your machine

The following datasets are available:

The datasets share a similar API. Where a dataset supports them (see the signatures below), the two common arguments transform and target_transform transform the input and the target, respectively.

CMUARCTIC

class torchaudio.datasets.CMUARCTIC(root: str, url: str = 'aew', folder_in_archive: str = 'ARCTIC', download: bool = False)[source]

Create a Dataset for CMU ARCTIC. Each item is a tuple of the form: (waveform, sample_rate, utterance, utterance_id)
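
For illustration, a minimal sketch of reading one item; the root directory './data' and the default 'aew' speaker are example values:

import torchaudio

cmu_data = torchaudio.datasets.CMUARCTIC('./data', url='aew', download=True)
waveform, sample_rate, utterance, utterance_id = cmu_data[0]
print(utterance_id, sample_rate, utterance)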

COMMONVOICE

class torchaudio.datasets.COMMONVOICE(root: str, tsv: str = 'train.tsv', url: str = 'english', folder_in_archive: str = 'CommonVoice', version: str = 'cv-corpus-4-2019-12-10', download: bool = False)[source]

Create a Dataset for CommonVoice. Each item is a tuple of the form: (waveform, sample_rate, dictionary) where dictionary is a dictionary built from the tsv file with the following keys: client_id, path, sentence, up_votes, down_votes, age, gender, accent.
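
A minimal sketch of reading one item and its metadata dictionary; './data' is an example root, and depending on the corpus version you may need to fetch the archive manually rather than relying on download=True:

import torchaudio

cv_data = torchaudio.datasets.COMMONVOICE('./data', tsv='train.tsv', download=True)
waveform, sample_rate, dictionary = cv_data[0]
print(dictionary['sentence'], dictionary['age'], dictionary['gender'])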

GTZAN

class torchaudio.datasets.GTZAN(root: str, url: str = 'http://opihi.cs.uvic.ca/sound/genres.tar.gz', folder_in_archive: str = 'genres', download: bool = False, subset: Any = None)[source]

Create a Dataset for GTZAN. Each item is a tuple of the form: (waveform, sample_rate, label)

Please see http://marsyas.info/downloads/datasets.html if you plan to publish results based on this dataset.
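
A minimal sketch using the subset argument; the subset names 'training', 'validation', and 'testing' follow torchaudio's split convention, and './data' is an example root:

import torchaudio

gtzan_train = torchaudio.datasets.GTZAN('./data', download=True, subset='training')
waveform, sample_rate, label = gtzan_train[0]
print(label, sample_rate, waveform.shape)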

LIBRISPEECH

class torchaudio.datasets.LIBRISPEECH(root: str, url: str = 'train-clean-100', folder_in_archive: str = 'LibriSpeech', download: bool = False)[source]

Create a Dataset for LibriSpeech. Each item is a tuple of the form: (waveform, sample_rate, utterance, speaker_id, chapter_id, utterance_id)
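
Because utterances vary in length, batching them with a DataLoader needs a custom collate function. The pad_collate helper below is a hypothetical sketch that zero-pads waveforms to the longest in the batch:

import torch
import torchaudio

def pad_collate(batch):
    # Each item is (waveform, sample_rate, utterance, speaker_id, chapter_id, utterance_id);
    # waveforms are (1, time) tensors of varying length.
    waveforms = [item[0].squeeze(0) for item in batch]
    lengths = torch.tensor([w.size(0) for w in waveforms])
    padded = torch.nn.utils.rnn.pad_sequence(waveforms, batch_first=True)
    utterances = [item[2] for item in batch]
    return padded, lengths, utterances

libri_data = torchaudio.datasets.LIBRISPEECH('./data', url='train-clean-100', download=True)
data_loader = torch.utils.data.DataLoader(libri_data, batch_size=4, collate_fn=pad_collate)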

LJSPEECH

class torchaudio.datasets.LJSPEECH(root: str, url: str = 'https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2', folder_in_archive: str = 'wavs', download: bool = False)[source]

Create a Dataset for LJSpeech-1.1. Each item is a tuple of the form: (waveform, sample_rate, transcript, normalized_transcript)
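
A minimal sketch comparing the raw and normalized transcripts; './data' is an example root:

import torchaudio

lj_data = torchaudio.datasets.LJSPEECH('./data', download=True)
waveform, sample_rate, transcript, normalized_transcript = lj_data[0]
print(transcript)              # raw transcript
print(normalized_transcript)   # numbers and abbreviations written out in full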

SPEECHCOMMANDS

class torchaudio.datasets.SPEECHCOMMANDS(root: str, url: str = 'speech_commands_v0.02', folder_in_archive: str = 'SpeechCommands', download: bool = False)[source]

Create a Dataset for Speech Commands. Each item is a tuple of the form: (waveform, sample_rate, label, speaker_id, utterance_number)
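
A minimal sketch of reading one item and collecting the keyword label set; iterating the whole dataset loads every waveform, so this is simple rather than fast:

import torchaudio

sc_data = torchaudio.datasets.SPEECHCOMMANDS('./data', download=True)
waveform, sample_rate, label, speaker_id, utterance_number = sc_data[0]
# Collect the set of keyword labels seen in the dataset.
labels = sorted(set(item[2] for item in sc_data))
print(labels)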

VCTK

class torchaudio.datasets.VCTK(root: str, url: str = 'http://homepages.inf.ed.ac.uk/jyamagis/release/VCTK-Corpus.tar.gz', folder_in_archive: str = 'VCTK-Corpus', download: bool = False, downsample: bool = False, transform: Any = None, target_transform: Any = None)[source]

Create a Dataset for VCTK. Each item is a tuple of the form: (waveform, sample_rate, utterance, speaker_id, utterance_id)

Folder p315 is skipped because its corresponding text files are missing. For more information about the dataset, visit: https://datashare.is.ed.ac.uk/handle/10283/3443
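
A minimal sketch of the transform argument, applying a MelSpectrogram to each waveform as it is loaded; the 48 kHz sample rate matches the original VCTK recordings, but verify it for your copy:

import torchaudio

vctk_data = torchaudio.datasets.VCTK(
    './data',
    download=True,
    transform=torchaudio.transforms.MelSpectrogram(sample_rate=48000))
mel_spec, sample_rate, utterance, speaker_id, utterance_id = vctk_data[0]
print(mel_spec.shape)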

YESNO

class torchaudio.datasets.YESNO(root: str, url: str = 'http://www.openslr.org/resources/1/waves_yesno.tar.gz', folder_in_archive: str = 'waves_yesno', download: bool = False, transform: Any = None, target_transform: Any = None)[source]

Create a Dataset for YesNo. Each item is a tuple of the form: (waveform, sample_rate, labels)
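
A minimal sketch of reading one item; each recording contains eight words, and labels encodes them as 0 ("no") or 1 ("yes"):

import torchaudio

yesno_data = torchaudio.datasets.YESNO('./data', download=True)
waveform, sample_rate, labels = yesno_data[0]
print(labels)  # a list of eight 0/1 values, one per spoken word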
