torchaudio.datasets
All datasets are subclasses of torch.utils.data.Dataset, i.e., they have __getitem__ and __len__ methods implemented. Hence, they can all be passed to a torch.utils.data.DataLoader, which can load multiple samples in parallel using torch.multiprocessing workers.
For example:
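import torch
import torchaudio

# Download YESNO into the current directory and wrap it in a DataLoader.
yesno_data = torchaudio.datasets.YESNO('.', download=True)
data_loader = torch.utils.data.DataLoader(yesno_data,
                                          batch_size=1,
                                          shuffle=True,
                                          num_workers=4)  # worker processes; set to suit your machine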
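With batch_size=1, as in the example above, the default collation works. The waveforms in these datasets generally differ in length, though, and the default collate function stacks tensors, which fails when their shapes differ. The sketch below pads each batch to its longest waveform. It assumes every item is a (waveform, sample_rate, labels) tuple with a (channel, time) waveform tensor, which is what YESNO returns; pad_collate is a hypothetical helper, not part of torchaudio, and the unpacking has to be adapted for datasets that return differently shaped tuples.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def pad_collate(batch):
    """Hypothetical collate_fn: zero-pad (channel, time) waveforms to a common length.

    Assumes each item is (waveform, sample_rate, labels), as YESNO returns.
    """
    waveforms, sample_rates, labels = zip(*batch)
    # Keep the original lengths so padded frames can be masked later.
    lengths = torch.tensor([w.shape[-1] for w in waveforms])
    # pad_sequence pads along the first dimension, so put time first: (time, channel).
    padded = pad_sequence([w.t() for w in waveforms], batch_first=True)  # (batch, time, channel)
    padded = padded.transpose(1, 2)  # back to (batch, channel, time)
    return padded, lengths, list(sample_rates), list(labels)
```

It would be passed as DataLoader(yesno_data, batch_size=4, shuffle=True, collate_fn=pad_collate). Returning the lengths lets a model ignore the padded frames.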
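The following datasets are available: CMUARCTIC, COMMONVOICE, GTZAN, LIBRISPEECH, LIBRITTS, LJSPEECH, SPEECHCOMMANDS, TEDLIUM, VCTK, VCTK_092 and YESNO.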
All the datasets have a similar API. Some of them, such as VCTK and YESNO, also accept two common arguments, transform and target_transform, to transform the input and the target respectively.
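Only some of the classes below take transform and target_transform in their signatures. For the others, a thin wrapper can apply the same idea. The class below is a hypothetical sketch, not a torchaudio API; it assumes the waveform is the first element of each item and that you supply the position of the target (for example, 2 for the labels in YESNO's (waveform, sample_rate, labels) items). Because it implements __getitem__ and __len__, it can be passed to DataLoader as a map-style dataset.

```python
from typing import Any, Callable, Optional, Sequence


class TransformedDataset:
    """Hypothetical wrapper applying transform/target_transform to dataset items."""

    def __init__(
        self,
        dataset: Sequence[Any],
        transform: Optional[Callable[[Any], Any]] = None,
        target_transform: Optional[Callable[[Any], Any]] = None,
        target_index: int = -1,
    ):
        self.dataset = dataset
        self.transform = transform
        self.target_transform = target_transform
        self.target_index = target_index  # position of the target inside each item tuple

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, index: int) -> tuple:
        item = list(self.dataset[index])
        if self.transform is not None:
            item[0] = self.transform(item[0])  # assumed: first element is the waveform
        if self.target_transform is not None:
            item[self.target_index] = self.target_transform(item[self.target_index])
        return tuple(item)
```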
CMUARCTIC
class torchaudio.datasets.CMUARCTIC(root: str, url: str = 'aew', folder_in_archive: str = 'ARCTIC', download: bool = False)
Create a Dataset for CMU_ARCTIC.
Parameters
root (str) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "aew", "ahw", "aup", "awb", "axb", "bdl", "clb", "eey", "fem", "gka", "jmk", "ksp", "ljm", "lnh", "rms", "rxr", "slp" or "slt". (default: "aew")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "ARCTIC")
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
COMMONVOICE
class torchaudio.datasets.COMMONVOICE(root: str, tsv: str = 'train.tsv', url: str = 'english', folder_in_archive: str = 'CommonVoice', version: str = 'cv-corpus-4-2019-12-10', download: bool = False)
Create a Dataset for CommonVoice.
Parameters
root (str) – Path to the directory where the dataset is found or downloaded.
tsv (str, optional) – The name of the TSV file used to construct the metadata. (default: "train.tsv")
url (str, optional) – The URL to download the dataset from, or the language of the dataset to download. Allowed language values are "tatar", "english", "german", "french", "welsh", "breton", "chuvash", "turkish", "kyrgyz", "irish", "kabyle", "catalan", "taiwanese", "slovenian", "italian", "dutch", "hakha chin", "esperanto", "estonian", "persian", "portuguese", "basque", "spanish", "chinese", "mongolian", "sakha", "dhivehi", "kinyarwanda", "swedish", "russian", "indonesian", "arabic", "tamil", "interlingua", "latvian", "japanese", "votic", "abkhaz", "cantonese" and "romansh sursilvan". (default: "english")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "CommonVoice")
version (str, optional) – Version string. For other allowed values, please check https://commonvoice.mozilla.org/en/datasets. (default: "cv-corpus-4-2019-12-10")
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
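The tsv argument selects which metadata file defines the examples. To inspect a split before loading any audio, the file can be read with Python's csv module. This is a minimal sketch, assuming the file is tab-separated with a header row that includes columns such as path and sentence; parse_clips and read_clips are hypothetical helpers, not torchaudio APIs.

```python
import csv
from typing import Dict, Iterable, List


def parse_clips(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse Common Voice-style TSV lines into one dict per clip, keyed by the header."""
    # QUOTE_NONE keeps quote characters that appear inside sentences verbatim.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def read_clips(path: str) -> List[Dict[str, str]]:
    """Read a TSV file such as train.tsv from an extracted dataset directory."""
    with open(path, newline="", encoding="utf-8") as f:
        return parse_clips(f)
```

Counting rows or looking at the sentence column this way is a cheap check that the chosen tsv matches the split you expect.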
GTZAN
class torchaudio.datasets.GTZAN(root: str, url: str = 'http://opihi.cs.uvic.ca/sound/genres.tar.gz', folder_in_archive: str = 'genres', download: bool = False, subset: Optional[str] = None)
Create a Dataset for GTZAN.
Note
Please see http://marsyas.info/downloads/datasets.html if you are planning to use this dataset to publish results.
Parameters
root (str) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from. (default: "http://opihi.cs.uvic.ca/sound/genres.tar.gz")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "genres")
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
subset (str, optional) – Which subset of the dataset to use. One of "training", "validation", "testing" or None. If None, the entire dataset is used. (default: None)
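For genre classification, the string label of each item usually has to be mapped to a class index. A minimal sketch, assuming items carry the genre name as their label and using the ten genre names of the GTZAN collection; GTZAN_GENRES, encode_genre and decode_genre are hypothetical names, not torchaudio APIs.

```python
from typing import Dict, Tuple

# The ten GTZAN genres, in a fixed order so indices are stable across runs.
GTZAN_GENRES: Tuple[str, ...] = (
    "blues", "classical", "country", "disco", "hiphop",
    "jazz", "metal", "pop", "reggae", "rock",
)
GENRE_TO_INDEX: Dict[str, int] = {genre: i for i, genre in enumerate(GTZAN_GENRES)}


def encode_genre(label: str) -> int:
    """Map a genre name to its class index (raises KeyError for unknown names)."""
    return GENRE_TO_INDEX[label]


def decode_genre(index: int) -> str:
    """Map a class index back to its genre name."""
    return GTZAN_GENRES[index]
```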
LIBRISPEECH
class torchaudio.datasets.LIBRISPEECH(root: str, url: str = 'train-clean-100', folder_in_archive: str = 'LibriSpeech', download: bool = False)
Create a Dataset for LibriSpeech.
Parameters
root (str) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "dev-clean", "dev-other", "test-clean", "test-other", "train-clean-100", "train-clean-360" and "train-other-500". (default: "train-clean-100")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "LibriSpeech")
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
LIBRITTS
class torchaudio.datasets.LIBRITTS(root: str, url: str = 'train-clean-100', folder_in_archive: str = 'LibriTTS', download: bool = False)
Create a Dataset for LibriTTS.
Parameters
root (str) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "dev-clean", "dev-other", "test-clean", "test-other", "train-clean-100", "train-clean-360" and "train-other-500". (default: "train-clean-100")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "LibriTTS")
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
LJSPEECH
class torchaudio.datasets.LJSPEECH(root: str, url: str = 'https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2', folder_in_archive: str = 'wavs', download: bool = False)
Create a Dataset for LJSpeech-1.1.
Parameters
root (str) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from. (default: "https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "wavs")
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
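LJSPEECH has no subset argument, so holding out validation data is left to the user. A minimal sketch using a seeded shuffle of item indices; split_indices is a hypothetical helper, and the 10% validation fraction is an arbitrary choice.

```python
import random
from typing import List, Tuple


def split_indices(num_items: int, val_fraction: float = 0.1,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, val_indices) from a reproducible shuffle."""
    if not 0.0 <= val_fraction < 1.0:
        raise ValueError("val_fraction must be in [0, 1)")
    indices = list(range(num_items))
    # A dedicated Random instance keeps the split independent of the global random state.
    random.Random(seed).shuffle(indices)
    num_val = int(num_items * val_fraction)
    return indices[num_val:], indices[:num_val]
```

The two index lists can then be wrapped with torch.utils.data.Subset(dataset, indices), using len(dataset) as num_items.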
SPEECHCOMMANDS
class torchaudio.datasets.SPEECHCOMMANDS(root: str, url: str = 'speech_commands_v0.02', folder_in_archive: str = 'SpeechCommands', download: bool = False)
Create a Dataset for Speech Commands.
Parameters
root (str) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "speech_commands_v0.01" and "speech_commands_v0.02". (default: "speech_commands_v0.02")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "SpeechCommands")
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
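The signature above has no argument for choosing a split. The Speech Commands archive ships validation_list.txt and testing_list.txt, which list the relative paths of held-out clips; the remaining clips are conventionally used for training. A minimal sketch, assuming items expose label, speaker_id and utterance_number and that files are laid out as <label>/<speaker_id>_nohash_<utterance_number>.wav; load_split_list and which_split are hypothetical helpers.

```python
from typing import Set


def load_split_list(text: str) -> Set[str]:
    """Parse the contents of validation_list.txt or testing_list.txt."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def which_split(label: str, speaker_id: str, utterance_number: int,
                validation: Set[str], testing: Set[str]) -> str:
    """Return "validation", "testing" or "training" for one clip."""
    # Assumed file layout: <label>/<speaker_id>_nohash_<utterance_number>.wav
    relative_path = f"{label}/{speaker_id}_nohash_{utterance_number}.wav"
    if relative_path in validation:
        return "validation"
    if relative_path in testing:
        return "testing"
    return "training"
```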
TEDLIUM
class torchaudio.datasets.TEDLIUM(root: str, release: str = 'release1', subset: str = None, download: bool = False, audio_ext='.sph')
Create a Dataset for Tedlium. It supports releases 1, 2 and 3.
Parameters
root (str) – Path to the directory where the dataset is found or downloaded.
release (str, optional) – Release version. Allowed values are "release1", "release2" or "release3". (default: "release1")
subset (str, optional) – The subset of the dataset to use. Valid options are "train", "dev" and "test" for releases 1 and 2, and None for release 3. Defaults to "train" for releases 1 and 2, and to None for release 3.
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
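The default for subset depends on release. The hypothetical helper below restates that documented rule as code, which can be handy for validating a configuration before constructing the dataset; it is a sketch of the described behavior, not torchaudio's implementation.

```python
from typing import Dict, Optional, Tuple

# Valid subsets per release, first entry = default (mirrors the parameter description).
_TEDLIUM_VALID_SUBSETS: Dict[str, Tuple[Optional[str], ...]] = {
    "release1": ("train", "dev", "test"),
    "release2": ("train", "dev", "test"),
    "release3": (None,),
}


def resolve_tedlium_subset(release: str, subset: Optional[str] = None) -> Optional[str]:
    """Return the subset that applies to `release`, raising ValueError if invalid."""
    if release not in _TEDLIUM_VALID_SUBSETS:
        raise ValueError(f"unknown release: {release!r}")
    valid = _TEDLIUM_VALID_SUBSETS[release]
    if subset is None:
        # Releases 1 and 2 default to "train"; release 3 has no subsets.
        return valid[0]
    if subset not in valid:
        raise ValueError(f"subset {subset!r} is not valid for {release}")
    return subset
```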
VCTK
class torchaudio.datasets.VCTK(root: str, url: str = 'https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip', folder_in_archive: str = 'VCTK-Corpus', download: bool = False, downsample: bool = False, transform: Any = None, target_transform: Any = None)
Create a Dataset for VCTK.
Note
This dataset is no longer publicly available. Please use VCTK_092.
Directory p315 is ignored because there are no corresponding text files. For more information about the dataset, visit https://datashare.is.ed.ac.uk/handle/10283/3443
Parameters
root (str) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – Not used, as the dataset is no longer publicly available.
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "VCTK-Corpus")
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False). Passing download=True will result in an error, as the dataset is no longer publicly available.
downsample (bool, optional) – Not used.
transform (callable, optional) – Optional transform applied on waveform. (default: None)
target_transform (callable, optional) – Optional transform applied on utterance. (default: None)
VCTK_092
class torchaudio.datasets.VCTK_092(root: str, mic_id: str = 'mic2', download: bool = False, url: str = 'https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip', audio_ext='.flac')
Create a Dataset for VCTK 0.92.
Parameters
root (str) – Root directory where the dataset’s top-level directory is found.
mic_id (str, optional) – Microphone ID. Either "mic1" or "mic2". (default: "mic2")
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
url (str, optional) – The URL to download the dataset from. (default: "https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip")
audio_ext (str, optional) – Custom audio extension if the dataset is converted to a non-default audio format. (default: ".flac")
Note
All the utterances from speaker p315 will be skipped due to the lack of corresponding text files.
All the utterances from speaker p280 will be skipped for mic_id="mic2" due to the lack of audio files.
Some of the utterances from speaker p362 will be skipped due to the lack of audio files.
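mic_id and audio_ext together determine which audio file is read for an utterance. A minimal sketch, assuming the 0.92 release names audio files <speaker>_<utterance>_<mic><ext> (for example p225_001_mic2.flac); vctk_092_audio_name is a hypothetical helper. Under that assumption, a corpus converted to another format should keep its file stems unchanged and pass the new extension as audio_ext.

```python
def vctk_092_audio_name(speaker_id: str, utterance_id: str,
                        mic_id: str = "mic2", audio_ext: str = ".flac") -> str:
    """Build the assumed audio file name for one VCTK 0.92 utterance."""
    if mic_id not in ("mic1", "mic2"):
        raise ValueError('mic_id must be "mic1" or "mic2"')
    # Assumed naming convention: <speaker>_<utterance>_<mic><ext>
    return f"{speaker_id}_{utterance_id}_{mic_id}{audio_ext}"
```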
YESNO
class torchaudio.datasets.YESNO(root: str, url: str = 'http://www.openslr.org/resources/1/waves_yesno.tar.gz', folder_in_archive: str = 'waves_yesno', download: bool = False, transform: Any = None, target_transform: Any = None)
Create a Dataset for YesNo.
Parameters
root (str) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from. (default: "http://www.openslr.org/resources/1/waves_yesno.tar.gz")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "waves_yesno")
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
transform (callable, optional) – Optional transform applied on waveform. (default: None)
target_transform (callable, optional) – Optional transform applied on utterance. (default: None)
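Each YESNO recording is eight words long, each word being either "yes" or "no". A minimal sketch for turning a recording's labels into readable words, assuming the target is a sequence of 0/1 integers where 1 stands for "yes" and 0 for "no"; yesno_labels_to_words is a hypothetical helper. A function like this could also serve as a target_transform.

```python
from typing import List, Sequence


def yesno_labels_to_words(labels: Sequence[int]) -> List[str]:
    """Map YESNO labels to words, assuming 1 means "yes" and 0 means "no"."""
    words = {0: "no", 1: "yes"}
    # Raises KeyError for any label other than 0 or 1.
    return [words[label] for label in labels]
```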