Audio Datasets

Author: Moto Hira

torchaudio provides easy access to common, publicly available datasets through the torchaudio.datasets module. Refer to the official documentation for the list of available datasets.

# When running this tutorial in Google Colab, install the required packages
# with the following.
# !pip install torchaudio

import torch
import torchaudio

print(torch.__version__)
print(torchaudio.__version__)
1.14.0.dev20221207
0.14.0.dev20221207

Preparing data and utility functions (skip this section)

# @title Prepare data and utility functions. {display-mode: "form"}
# @markdown
# @markdown You do not need to look into this cell.
# @markdown Just execute once and you are good to go.

# -------------------------------------------------------------------------------
# Preparation of data and helper functions.
# -------------------------------------------------------------------------------
import os

import matplotlib.pyplot as plt
from IPython.display import Audio, display


_SAMPLE_DIR = "_assets"
YESNO_DATASET_PATH = os.path.join(_SAMPLE_DIR, "yes_no")
os.makedirs(YESNO_DATASET_PATH, exist_ok=True)


def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
    waveform = waveform.numpy()

    num_channels, _ = waveform.shape

    figure, axes = plt.subplots(num_channels, 1)
    if num_channels == 1:
        axes = [axes]
    for c in range(num_channels):
        axes[c].specgram(waveform[c], Fs=sample_rate)
        if num_channels > 1:
            axes[c].set_ylabel(f"Channel {c+1}")
        if xlim:
            axes[c].set_xlim(xlim)
    figure.suptitle(title)
    plt.show(block=False)


def play_audio(waveform, sample_rate):
    waveform = waveform.numpy()

    num_channels, _ = waveform.shape
    if num_channels == 1:
        display(Audio(waveform[0], rate=sample_rate))
    elif num_channels == 2:
        display(Audio((waveform[0], waveform[1]), rate=sample_rate))
    else:
        raise ValueError("Waveforms with more than 2 channels are not supported.")

Here, we show how to use the torchaudio.datasets.YESNO dataset. Indexing the dataset returns a tuple of the waveform, its sample rate, and the label list.

dataset = torchaudio.datasets.YESNO(YESNO_DATASET_PATH, download=True)

for i in [1, 3, 5]:
    waveform, sample_rate, label = dataset[i]
    plot_specgram(waveform, sample_rate, title=f"Sample {i}: {label}")
    play_audio(waveform, sample_rate)
  • Sample 1: [0, 0, 0, 1, 0, 0, 0, 1]
  • Sample 3: [0, 0, 1, 0, 0, 0, 1, 0]
  • Sample 5: [0, 0, 1, 0, 0, 1, 1, 1]
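
Each title above pairs a sample index with its label, a list of eight integers. The following is a minimal sketch of turning such a list into words, assuming the dataset's convention that 1 marks "yes" and 0 marks "no". The helper name decode_yesno_labels is hypothetical and not part of torchaudio.

```python
def decode_yesno_labels(labels):
    """Map a YESNO-style label list to words.

    Assumption: 1 stands for "yes" and 0 stands for "no".
    """
    words = []
    for bit in labels:
        if bit not in (0, 1):
            raise ValueError(f"Unexpected label value: {bit!r}")
        words.append("yes" if bit == 1 else "no")
    return words


# Label of "Sample 1" as shown in the titles above.
print(" ".join(decode_yesno_labels([0, 0, 0, 1, 0, 0, 0, 1])))
```

Rejecting values other than 0 and 1 keeps a mislabelled item from being silently read as "no".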
100%|##########| 4.49M/4.49M [00:06<00:00, 709kB/s]
<IPython.lib.display.Audio object>
<IPython.lib.display.Audio object>
<IPython.lib.display.Audio object>
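
Recordings in a dataset like this may differ in length, so batching them for training usually requires padding. The following is a sketch under stated assumptions rather than part of the original tutorial: FakeYesNo is a hypothetical stand-in that yields synthetic (waveform, sample_rate, labels) items so the example runs without a download, and pad_collate is an illustrative helper name. With the real data, the dataset object created above could be passed to DataLoader in its place.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset


class FakeYesNo(Dataset):
    """Hypothetical stand-in yielding (waveform, sample_rate, labels) items."""

    def __init__(self, lengths, sample_rate=8000):
        # sample_rate is a placeholder value chosen for illustration.
        self.lengths = lengths
        self.sample_rate = sample_rate

    def __len__(self):
        return len(self.lengths)

    def __getitem__(self, index):
        waveform = torch.zeros(1, self.lengths[index])  # (channel, time)
        labels = [index % 2] * 8  # synthetic list of eight 0/1 labels
        return waveform, self.sample_rate, labels


def pad_collate(batch):
    """Zero-pad waveforms to the longest item and stack them into one tensor."""
    waveforms, sample_rates, labels = zip(*batch)
    lengths = torch.tensor([w.shape[-1] for w in waveforms])
    # pad_sequence pads along the first dimension, so move time to the front.
    padded = pad_sequence([w.t() for w in waveforms], batch_first=True)
    padded = padded.transpose(1, 2)  # back to (batch, channel, time)
    return padded, lengths, torch.tensor(labels), sample_rates[0]


loader = DataLoader(FakeYesNo([100, 250, 175]), batch_size=3, collate_fn=pad_collate)
padded, lengths, labels, sample_rate = next(iter(loader))
print(padded.shape, lengths.tolist(), labels.shape)
```

Returning the original lengths alongside the padded batch lets downstream code mask out the padding.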

Total running time of the script: ( 0 minutes 7.909 seconds)
