VCTK_092

class torchaudio.datasets.VCTK_092(root: str, mic_id: str = 'mic2', download: bool = False, url: str = 'https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip', audio_ext='.flac')

VCTK 0.92 [Yamagishi et al., 2019] dataset

Parameters:
  • root (str) – Root directory where the dataset’s top level directory is found.

  • mic_id (str, optional) – Microphone ID. Either "mic1" or "mic2". (default: "mic2")

  • download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)

  • url (str, optional) – The URL to download the dataset from. (default: "https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip")

  • audio_ext (str, optional) – Custom audio extension, for use when the dataset has been converted to a non-default audio format. (default: ".flac")

Note

  • All utterances from speaker p315 are skipped because the corresponding text files are missing.

  • All utterances from speaker p280 are skipped for mic_id="mic2" because the audio files are missing.

  • Some utterances from speaker p362 are skipped because the audio files are missing.

  • See Also: https://datashare.is.ed.ac.uk/handle/10283/3443
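
As an illustration of the constructor parameters above, the following is a minimal sketch rather than an official recipe. The helper name load_vctk and its ValueError check are assumptions made for this example and are not part of torchaudio; only the VCTK_092 constructor arguments come from the signature documented here.

```python
def load_vctk(root, mic_id="mic2", download=False):
    """Hypothetical helper: build a VCTK_092 dataset after checking mic_id."""
    # The documented values for mic_id are "mic1" and "mic2".
    if mic_id not in ("mic1", "mic2"):
        raise ValueError(f'mic_id must be "mic1" or "mic2", got {mic_id!r}')

    # Imported lazily so the argument check works even without torchaudio.
    from torchaudio.datasets import VCTK_092

    # url and audio_ext are left at their documented defaults.
    return VCTK_092(root=str(root), mic_id=mic_id, download=download)
```

Calling load_vctk("./data", download=True) would be expected to fetch the corpus from the default url if it is not already present under the given root; the "./data" path is only a placeholder.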

__getitem__

VCTK_092.__getitem__(n: int) → Tuple[Tensor, int, str, str, str]

Load the n-th sample from the dataset.

Parameters:

n (int) – The index of the sample to be loaded

Returns:

Tuple of the following items:

Tensor:

Waveform

int:

Sample rate

str:

Transcript

str:

Speaker ID

str:

Utterance ID
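
Because the sample is returned as a plain 5-tuple, it can help to unpack it into named fields. The sketch below assumes only the tuple order documented above; the function name sample_to_dict and the dictionary keys are hypothetical and chosen for this example.

```python
from typing import Any, Dict, Tuple


def sample_to_dict(sample: Tuple[Any, int, str, str, str]) -> Dict[str, Any]:
    """Hypothetical helper: name the fields of one VCTK_092 sample."""
    # Order follows the documented return value of __getitem__.
    waveform, sample_rate, transcript, speaker_id, utterance_id = sample
    return {
        "waveform": waveform,
        "sample_rate": sample_rate,
        "transcript": transcript,
        "speaker_id": speaker_id,
        "utterance_id": utterance_id,
    }
```

With a constructed dataset, sample_to_dict(dataset[0]) would give the first sample in this named form.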
