Shortcuts

torchaudio.datasets

All datasets are subclasses of torch.utils.data.Dataset and have __getitem__ and __len__ methods implemented. Hence, they can all be passed to a torch.utils.data.DataLoader which can load multiple samples parallelly using torch.multiprocessing workers. For example:

yesno_data = torchaudio.datasets.YESNO('.', download=True)
data_loader = torch.utils.data.DataLoader(yesno_data,
                                          batch_size=1,
                                          shuffle=True,
                                          num_workers=args.nThreads)

The following datasets are available:

All the datasets have almost similar API. They all have two common arguments: transform and target_transform to transform the input and target respectively.

CMUARCTIC

class torchaudio.datasets.CMUARCTIC(root: Union[str, pathlib.Path], url: str = 'aew', folder_in_archive: str = 'ARCTIC', download: bool = False)[source]

Create a Dataset for CMU_ARCTIC.

Parameters
  • root (str or Path) – Path to the directory where the dataset is found or downloaded.

  • url (str, optional) – The URL to download the dataset from or the type of the dataset to dowload. (default: "aew") Allowed type values are "aew", "ahw", "aup", "awb", "axb", "bdl", "clb", "eey", "fem", "gka", "jmk", "ksp", "ljm", "lnh", "rms", "rxr", "slp" or "slt".

  • folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "ARCTIC")

  • download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False).

__getitem__(n: int)Tuple[torch.Tensor, int, str, str][source]

Load the n-th sample from the dataset.

Parameters

n (int) – The index of the sample to be loaded

Returns

(waveform, sample_rate, transcript, utterance_id)

Return type

(Tensor, int, str, str)

CMUDict

class torchaudio.datasets.CMUDict(root: Union[str, pathlib.Path], exclude_punctuations: bool = True, *, download: bool = False, url: str = 'http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b', url_symbols: str = 'http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b.symbols')[source]

Create a Dataset for CMU Pronouncing Dictionary (CMUDict).

Parameters
  • root (str or Path) – Path to the directory where the dataset is found or downloaded.

  • exclude_punctuations (bool, optional) – When enabled, exclude the pronounciation of punctuations, such as !EXCLAMATION-POINT and #HASH-MARK.

  • download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False).

  • url (str, optional) – The URL to download the dictionary from. (default: "http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b")

  • url_symbols (str, optional) – The URL to download the list of symbols from. (default: "http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b.symbols")

__getitem__(n: int)Tuple[str, List[str]][source]

Load the n-th sample from the dataset.

Parameters

n (int) – The index of the sample to be loaded.

Returns

The corresponding word and phonemes (word, [phonemes]).

Return type

(str, List[str])

property symbols

A list of phonemes symbols, such as AA, AE, AH.

Type

list[str]

COMMONVOICE

class torchaudio.datasets.COMMONVOICE(root: Union[str, pathlib.Path], tsv: str = 'train.tsv')[source]

Create a Dataset for CommonVoice.

Parameters
  • root (str or Path) – Path to the directory where the dataset is located. (Where the tsv file is present.)

  • tsv (str, optional) – The name of the tsv file used to construct the metadata, such as "train.tsv", "test.tsv", "dev.tsv", "invalidated.tsv", "validated.tsv" and "other.tsv". (default: "train.tsv")

__getitem__(n: int)Tuple[torch.Tensor, int, Dict[str, str]][source]

Load the n-th sample from the dataset.

Parameters

n (int) – The index of the sample to be loaded

Returns

(waveform, sample_rate, dictionary), where dictionary is built from the TSV file with the following keys: client_id, path, sentence, up_votes, down_votes, age, gender and accent.

Return type

(Tensor, int, Dict[str, str])

GTZAN

class torchaudio.datasets.GTZAN(root: Union[str, pathlib.Path], url: str = 'http://opihi.cs.uvic.ca/sound/genres.tar.gz', folder_in_archive: str = 'genres', download: bool = False, subset: Optional[str] = None)[source]

Create a Dataset for GTZAN.

Note

Please see http://marsyas.info/downloads/datasets.html if you are planning to use this dataset to publish results.

Parameters
  • root (str or Path) – Path to the directory where the dataset is found or downloaded.

  • url (str, optional) – The URL to download the dataset from. (default: "http://opihi.cs.uvic.ca/sound/genres.tar.gz")

  • folder_in_archive (str, optional) – The top-level directory of the dataset.

  • download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False).

  • subset (str or None, optional) – Which subset of the dataset to use. One of "training", "validation", "testing" or None. If None, the entire dataset is used. (default: None).

__getitem__(n: int)Tuple[torch.Tensor, int, str][source]

Load the n-th sample from the dataset.

Parameters

n (int) – The index of the sample to be loaded

Returns

(waveform, sample_rate, label)

Return type

(Tensor, int, str)

LIBRISPEECH

class torchaudio.datasets.LIBRISPEECH(root: Union[str, pathlib.Path], url: str = 'train-clean-100', folder_in_archive: str = 'LibriSpeech', download: bool = False)[source]

Create a Dataset for LibriSpeech.

Parameters
  • root (str or Path) – Path to the directory where the dataset is found or downloaded.

  • url (str, optional) – The URL to download the dataset from, or the type of the dataset to dowload. Allowed type values are "dev-clean", "dev-other", "test-clean", "test-other", "train-clean-100", "train-clean-360" and "train-other-500". (default: "train-clean-100")

  • folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "LibriSpeech")

  • download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False).

__getitem__(n: int)Tuple[torch.Tensor, int, str, int, int, int][source]

Load the n-th sample from the dataset.

Parameters

n (int) – The index of the sample to be loaded

Returns

(waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id)

Return type

(Tensor, int, str, int, int, int)

LIBRITTS

class torchaudio.datasets.LIBRITTS(root: Union[str, pathlib.Path], url: str = 'train-clean-100', folder_in_archive: str = 'LibriTTS', download: bool = False)[source]

Create a Dataset for LibriTTS.

Parameters
  • root (str or Path) – Path to the directory where the dataset is found or downloaded.

  • url (str, optional) – The URL to download the dataset from, or the type of the dataset to dowload. Allowed type values are "dev-clean", "dev-other", "test-clean", "test-other", "train-clean-100", "train-clean-360" and "train-other-500". (default: "train-clean-100")

  • folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "LibriTTS")

  • download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False).

__getitem__(n: int)Tuple[torch.Tensor, int, str, str, int, int, str][source]

Load the n-th sample from the dataset.

Parameters

n (int) – The index of the sample to be loaded

Returns

(waveform, sample_rate, original_text, normalized_text, speaker_id, chapter_id, utterance_id)

Return type

(Tensor, int, str, str, str, int, int, str)

LJSPEECH

class torchaudio.datasets.LJSPEECH(root: Union[str, pathlib.Path], url: str = 'https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2', folder_in_archive: str = 'wavs', download: bool = False)[source]

Create a Dataset for LJSpeech-1.1.

Parameters
  • root (str or Path) – Path to the directory where the dataset is found or downloaded.

  • url (str, optional) – The URL to download the dataset from. (default: "https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2")

  • folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "wavs")

  • download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False).

__getitem__(n: int)Tuple[torch.Tensor, int, str, str][source]

Load the n-th sample from the dataset.

Parameters

n (int) – The index of the sample to be loaded

Returns

(waveform, sample_rate, transcript, normalized_transcript)

Return type

(Tensor, int, str, str)

SPEECHCOMMANDS

class torchaudio.datasets.SPEECHCOMMANDS(root: Union[str, pathlib.Path], url: str = 'speech_commands_v0.02', folder_in_archive: str = 'SpeechCommands', download: bool = False, subset: Optional[str] = None)[source]

Create a Dataset for Speech Commands.

Parameters
  • root (str or Path) – Path to the directory where the dataset is found or downloaded.

  • url (str, optional) – The URL to download the dataset from, or the type of the dataset to dowload. Allowed type values are "speech_commands_v0.01" and "speech_commands_v0.02" (default: "speech_commands_v0.02")

  • folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "SpeechCommands")

  • download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False).

  • subset (str or None, optional) – Select a subset of the dataset [None, “training”, “validation”, “testing”]. None means the whole dataset. “validation” and “testing” are defined in “validation_list.txt” and “testing_list.txt”, respectively, and “training” is the rest. Details for the files “validation_list.txt” and “testing_list.txt” are explained in the README of the dataset and in the introduction of Section 7 of the original paper and its reference 12. The original paper can be found here. (Default: None)

__getitem__(n: int)Tuple[torch.Tensor, int, str, str, int][source]

Load the n-th sample from the dataset.

Parameters

n (int) – The index of the sample to be loaded

Returns

(waveform, sample_rate, label, speaker_id, utterance_number)

Return type

(Tensor, int, str, str, int)

TEDLIUM

class torchaudio.datasets.TEDLIUM(root: Union[str, pathlib.Path], release: str = 'release1', subset: Optional[str] = None, download: bool = False, audio_ext: str = '.sph')[source]

Create a Dataset for Tedlium. It supports releases 1,2 and 3.

Parameters
  • root (str or Path) – Path to the directory where the dataset is found or downloaded.

  • release (str, optional) – Release version. Allowed values are "release1", "release2" or "release3". (default: "release1").

  • subset (str, optional) – The subset of dataset to use. Valid options are "train", "dev", and "test" for releases 1&2, None for release3. Defaults to "train" or None.

  • download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False).

  • audio_ext (str, optional) – extension for audio file (default: "audio_ext")

__getitem__(n: int)Tuple[torch.Tensor, int, str, int, int, int][source]

Load the n-th sample from the dataset.

Parameters

n (int) – The index of the sample to be loaded

Returns

(waveform, sample_rate, transcript, talk_id, speaker_id, identifier)

Return type

tuple

property phoneme_dict

Phonemes. Mapping from word to tuple of phonemes. Note that some words have empty phonemes.

Type

dict[str, tuple[str]]

VCTK_092

class torchaudio.datasets.VCTK_092(root: str, mic_id: str = 'mic2', download: bool = False, url: str = 'https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip', audio_ext='.flac')[source]

Create VCTK 0.92 Dataset

Parameters
  • root (str) – Root directory where the dataset’s top level directory is found.

  • mic_id (str, optional) – Microphone ID. Either "mic1" or "mic2". (default: "mic2")

  • download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False).

  • url (str, optional) – The URL to download the dataset from. (default: "https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip")

  • audio_ext (str, optional) – Custom audio extension if dataset is converted to non-default audio format.

Note

  • All the speeches from speaker p315 will be skipped due to the lack of the corresponding text files.

  • All the speeches from p280 will be skipped for mic_id="mic2" due to the lack of the audio files.

  • Some of the speeches from speaker p362 will be skipped due to the lack of the audio files.

  • See Also: https://datashare.is.ed.ac.uk/handle/10283/3443

__getitem__(n: int)Tuple[torch.Tensor, int, str, str, str][source]

Load the n-th sample from the dataset.

Parameters

n (int) – The index of the sample to be loaded

Returns

(waveform, sample_rate, transcript, speaker_id, utterance_id)

Return type

(Tensor, int, str, str, str)

DR_VCTK

class torchaudio.datasets.DR_VCTK(root: Union[str, pathlib.Path], subset: str = 'train', *, download: bool = False, url: str = 'https://datashare.ed.ac.uk/bitstream/handle/10283/3038/DR-VCTK.zip')[source]

Create a dataset for Device Recorded VCTK (Small subset version).

Parameters
  • root (str or Path) – Root directory where the dataset’s top level directory is found.

  • subset (str) – The subset to use. Can be one of "train" and "test". (default: "train").

  • download (bool) – Whether to download the dataset if it is not found at root path. (default: False).

  • url (str) – The URL to download the dataset from. (default: "https://datashare.ed.ac.uk/bitstream/handle/10283/3038/DR-VCTK.zip")

__getitem__(n: int)Tuple[torch.Tensor, int, torch.Tensor, int, str, str, str, int][source]

Load the n-th sample from the dataset.

Parameters

n (int) – The index of the sample to be loaded

Returns

(waveform_clean, sample_rate_clean, waveform_noisy, sample_rate_noisy, speaker_id,                utterance_id, source, channel_id)

Return type

(Tensor, int, Tensor, int, str, str, str, int)

YESNO

class torchaudio.datasets.YESNO(root: Union[str, pathlib.Path], url: str = 'http://www.openslr.org/resources/1/waves_yesno.tar.gz', folder_in_archive: str = 'waves_yesno', download: bool = False)[source]

Create a Dataset for YesNo.

Parameters
  • root (str or Path) – Path to the directory where the dataset is found or downloaded.

  • url (str, optional) – The URL to download the dataset from. (default: "http://www.openslr.org/resources/1/waves_yesno.tar.gz")

  • folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "waves_yesno")

  • download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False).

__getitem__(n: int)Tuple[torch.Tensor, int, List[int]][source]

Load the n-th sample from the dataset.

Parameters

n (int) – The index of the sample to be loaded

Returns

(waveform, sample_rate, labels)

Return type

(Tensor, int, List[int])

Docs

Access comprehensive developer documentation for PyTorch

View Docs

Tutorials

Get in-depth tutorials for beginners and advanced developers

View Tutorials

Resources

Find development resources and get your questions answered

View Resources