VCTK_092

class torchaudio.datasets.VCTK_092(root: str, mic_id: str = 'mic2', download: bool = False, url: str = 'https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip', audio_ext='.flac')

VCTK 0.92 [Yamagishi et al., 2019] dataset

Parameters:

root (str) – Root directory in which the dataset’s top-level directory is located.
mic_id (str, optional) – Microphone ID, either "mic1" or "mic2". (default: "mic2")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)
url (str, optional) – The URL to download the dataset from. (default: "https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip")
audio_ext (str, optional) – Custom audio file extension, for use when the dataset has been converted to a non-default audio format. (default: ".flac")
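
The root, mic_id, and audio_ext parameters together determine which audio file is read for each sample. The sketch below shows one way they might combine into a path. It assumes the extracted archive sits in root/VCTK-Corpus-0.92 with audio stored as wav48_silence_trimmed/<speaker>/<speaker>_<utterance>_<mic><ext>. That layout and the helper name vctk_audio_path are assumptions for illustration, not part of the documented API.

```python
import os


def vctk_audio_path(root: str, speaker_id: str, utterance_id: str,
                    mic_id: str = "mic2", audio_ext: str = ".flac") -> str:
    """Build the expected audio file path for one utterance.

    Hypothetical helper: the directory names below are an ASSUMED layout of
    the extracted VCTK 0.92 archive, not something VCTK_092 exposes.
    """
    if mic_id not in ("mic1", "mic2"):
        raise ValueError(f'mic_id must be "mic1" or "mic2", got {mic_id!r}')
    # e.g. p225_001_mic2.flac, or p225_001_mic2.wav after conversion
    filename = f"{speaker_id}_{utterance_id}_{mic_id}{audio_ext}"
    return os.path.join(root, "VCTK-Corpus-0.92", "wav48_silence_trimmed",
                        speaker_id, filename)


# A copy of the corpus re-encoded to WAV would be read with audio_ext=".wav".
print(vctk_audio_path("./data", "p225", "001", mic_id="mic1", audio_ext=".wav"))
```

Under this assumed layout, converting the corpus to another format only changes the file suffix, which is why a single audio_ext string is enough to describe it.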

Note

All utterances from speaker p315 are skipped because the corresponding transcript files are missing.
All utterances from speaker p280 are skipped when mic_id="mic2" because the corresponding audio files are missing.
Some utterances from speaker p362 are skipped because the corresponding audio files are missing.
See Also: https://datashare.is.ed.ac.uk/handle/10283/3443
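
The skip rules above can be summarized as follows: enumerate utterances from the transcript files (so a speaker with no transcripts, such as p315, never appears), drop speaker p280 entirely for mic2, and drop any utterance whose audio file is missing (as with some of p362). The sketch below applies those rules to in-memory data. The function select_samples and its inputs are hypothetical, and this is not the torchaudio implementation; it also generalizes the missing-audio check to every speaker rather than only p362.

```python
from typing import Dict, List, Set, Tuple


def select_samples(
    transcripts: Dict[str, List[str]],
    audio_available: Set[Tuple[str, str]],
    mic_id: str = "mic2",
) -> List[Tuple[str, str]]:
    """Return (speaker_id, utterance_id) pairs that pass the documented skip rules.

    transcripts: speaker ID -> utterance IDs that have a transcript file.
    audio_available: (speaker ID, utterance ID) pairs whose audio exists for mic_id.
    """
    samples = []
    # Driving the loop from transcripts means a speaker with no text files is never visited.
    for speaker_id in sorted(transcripts):
        # Documented rule: p280 has no mic2 audio, so the whole speaker is dropped for mic2.
        if speaker_id == "p280" and mic_id == "mic2":
            continue
        for utterance_id in sorted(transcripts[speaker_id]):
            # An utterance with a transcript but no audio file is dropped.
            if (speaker_id, utterance_id) not in audio_available:
                continue
            samples.append((speaker_id, utterance_id))
    return samples


transcripts = {"p225": ["001", "002"], "p280": ["001"], "p362": ["001", "002"]}
audio = {("p225", "001"), ("p225", "002"), ("p280", "001"), ("p362", "001")}
print(select_samples(transcripts, audio, mic_id="mic2"))
# → [('p225', '001'), ('p225', '002'), ('p362', '001')]
```

One consequence of these rules is that the number of samples differs between mic_id="mic1" and mic_id="mic2", so indices are not interchangeable across the two microphone settings.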

__getitem__

VCTK_092.__getitem__(n: int) → Tuple[Tensor, int, str, str, str]

Load the n-th sample from the dataset.

Parameters:

n (int) – The index of the sample to be loaded

Returns:

Tuple of the following items:

Tensor: Waveform
int: Sample rate
str: Transcript
str: Speaker ID
str: Utterance ID
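
Because each item is a plain 5-tuple, it can be unpacked directly in the documented order. The sketch below builds a lookup from (speaker ID, utterance ID) to transcript. It accepts any indexable object that supports len() and yields tuples in that order, so it can be exercised with stand-in data instead of the downloaded corpus. The helper name build_transcript_index and the placeholder values are hypothetical.

```python
from typing import Any, Dict, Sequence, Tuple

# Documented item layout: (waveform, sample_rate, transcript, speaker_id, utterance_id)
Sample = Tuple[Any, int, str, str, str]


def build_transcript_index(dataset: Sequence[Sample]) -> Dict[Tuple[str, str], str]:
    """Map (speaker_id, utterance_id) to transcript for every item in `dataset`."""
    index = {}
    for n in range(len(dataset)):
        # Unpack in the order documented for VCTK_092.__getitem__.
        _waveform, _sample_rate, transcript, speaker_id, utterance_id = dataset[n]
        index[(speaker_id, utterance_id)] = transcript
    return index


# Placeholder items with the same tuple shape (None stands in for the waveform tensor).
fake = [
    (None, 48000, "transcript one", "p225", "001"),
    (None, 48000, "transcript two", "p225", "002"),
]
print(build_transcript_index(fake)[("p225", "002")])
# → transcript two
```

With the real dataset the call would be build_transcript_index(torchaudio.datasets.VCTK_092("./data", download=True)). Because __getitem__ returns the waveform, every index loads an audio file, so a full pass over the corpus just to collect transcripts is slow.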
