.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/audio_io_tutorial.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_tutorials_audio_io_tutorial.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_audio_io_tutorial.py:

Audio I/O
=========

This tutorial shows how to use TorchAudio's basic I/O API to load audio
files into PyTorch's Tensor object, and to save Tensor objects to audio
files.

.. GENERATED FROM PYTHON SOURCE LINES 9-16

.. code-block:: default

    import torch
    import torchaudio

    print(torch.__version__)
    print(torchaudio.__version__)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    1.12.0
    0.12.0

.. GENERATED FROM PYTHON SOURCE LINES 17-29

Preparation
-----------

First, we import the modules and download the audio assets we use in this
tutorial.

.. note::
   When running this tutorial in Google Colab, install the required packages
   with the following:

   .. code::

      !pip install boto3

.. GENERATED FROM PYTHON SOURCE LINES 29-49

.. code-block:: default

    import io
    import os
    import tarfile
    import tempfile

    import boto3
    import matplotlib.pyplot as plt
    import requests
    from botocore import UNSIGNED
    from botocore.config import Config
    from IPython.display import Audio
    from torchaudio.utils import download_asset

    SAMPLE_GSM = download_asset("tutorial-assets/steam-train-whistle-daniel_simon.gsm")
    SAMPLE_WAV = download_asset("tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav")
    SAMPLE_WAV_8000 = download_asset("tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042-8000hz.wav")

Querying audio metadata
-----------------------

Function :py:func:`torchaudio.info` fetches audio metadata.
You can provide a path-like object or file-like object.

.. code-block:: default

    metadata = torchaudio.info(SAMPLE_WAV)
    print(metadata)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    AudioMetaData(sample_rate=16000, num_frames=54400, num_channels=1, bits_per_sample=16, encoding=PCM_S)

Where

- ``sample_rate`` is the sampling rate of the audio
- ``num_channels`` is the number of channels
- ``num_frames`` is the number of frames per channel
- ``bits_per_sample`` is the bit depth
- ``encoding`` is the sample coding format

``encoding`` can take on one of the following values:

- ``"PCM_S"``: Signed integer linear PCM
- ``"PCM_U"``: Unsigned integer linear PCM
- ``"PCM_F"``: Floating point linear PCM
- ``"FLAC"``: Flac, `Free Lossless Audio Codec <https://xiph.org/flac/>`__
- ``"ULAW"``: Mu-law [`wikipedia <https://en.wikipedia.org/wiki/%CE%9C-law_algorithm>`__]
- ``"ALAW"``: A-law [`wikipedia <https://en.wikipedia.org/wiki/A-law_algorithm>`__]
- ``"MP3"``: MP3, MPEG-1 Audio Layer III
- ``"VORBIS"``: OGG Vorbis [`xiph.org <https://xiph.org/vorbis/>`__]
- ``"AMR_NB"``: Adaptive Multi-Rate [`wikipedia <https://en.wikipedia.org/wiki/Adaptive_Multi-Rate_audio_codec>`__]
- ``"AMR_WB"``: Adaptive Multi-Rate Wideband [`wikipedia <https://en.wikipedia.org/wiki/Adaptive_Multi-Rate_Wideband>`__]
- ``"OPUS"``: Opus [`opus-codec.org <https://opus-codec.org/>`__]
- ``"GSM"``: GSM-FR [`wikipedia <https://en.wikipedia.org/wiki/Full_Rate>`__]
- ``"HTK"``: Single channel 16-bit PCM
- ``"UNKNOWN"``: None of the above

.. GENERATED FROM PYTHON SOURCE LINES 94-100

**Note**

- ``bits_per_sample`` can be ``0`` for formats with compression and/or
  variable bit rate (such as MP3).
- ``num_frames`` can be ``0`` for the GSM-FR format.

.. GENERATED FROM PYTHON SOURCE LINES 100-105

.. code-block:: default

    metadata = torchaudio.info(SAMPLE_GSM)
    print(metadata)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    AudioMetaData(sample_rate=8000, num_frames=0, num_channels=1, bits_per_sample=0, encoding=GSM)

.. GENERATED FROM PYTHON SOURCE LINES 106-111

Querying file-like object
-------------------------

:py:func:`torchaudio.info` works on file-like objects.

.. GENERATED FROM PYTHON SOURCE LINES 111-117

.. code-block:: default

    url = "https://download.pytorch.org/torchaudio/tutorial-assets/steam-train-whistle-daniel_simon.wav"
    with requests.get(url, stream=True) as response:
        metadata = torchaudio.info(response.raw)
    print(metadata)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    AudioMetaData(sample_rate=44100, num_frames=109368, num_channels=2, bits_per_sample=16, encoding=PCM_S)

.. GENERATED FROM PYTHON SOURCE LINES 118-126

.. note::

   When passing a file-like object, ``info`` does not read all of the
   underlying data; rather, it reads only a portion of the data from the
   beginning. Therefore, for a given audio format, it may not be able to
   retrieve the correct metadata, including the format itself. In such
   cases, you can pass the ``format`` argument to specify the format of
   the audio.
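The fields of ``AudioMetaData`` are plain numbers, so derived quantities such as clip duration and raw PCM payload size follow from simple arithmetic. A stdlib-only sketch using the values printed for ``SAMPLE_WAV`` earlier (this is an illustration, not torchaudio code):

```python
# Values copied from the AudioMetaData output for SAMPLE_WAV above.
sample_rate = 16000
num_frames = 54400
num_channels = 1
bits_per_sample = 16

# Frames per channel divided by frames per second gives the clip length.
duration_seconds = num_frames / sample_rate
# Uncompressed PCM payload: frames * channels * bytes per sample.
pcm_bytes = num_frames * num_channels * bits_per_sample // 8

print(duration_seconds)  # 3.4
print(pcm_bytes)         # 108800
```

For compressed formats where ``bits_per_sample`` or ``num_frames`` is reported as ``0`` (as with the GSM-FR example below), these derivations do not apply.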
.. GENERATED FROM PYTHON SOURCE LINES 128-144

Loading audio data
------------------

To load audio data, you can use :py:func:`torchaudio.load`.

This function accepts a path-like object or file-like object as input.

The returned value is a tuple of waveform (``Tensor``) and sample rate
(``int``).

By default, the resulting tensor object has ``dtype=torch.float32`` and
its value range is ``[-1.0, 1.0]``.

For the list of supported formats, please refer to the torchaudio
documentation.

.. GENERATED FROM PYTHON SOURCE LINES 144-148

.. code-block:: default

    waveform, sample_rate = torchaudio.load(SAMPLE_WAV)

.. GENERATED FROM PYTHON SOURCE LINES 150-168

.. code-block:: default

    def plot_waveform(waveform, sample_rate):
        waveform = waveform.numpy()

        num_channels, num_frames = waveform.shape
        time_axis = torch.arange(0, num_frames) / sample_rate

        figure, axes = plt.subplots(num_channels, 1)
        if num_channels == 1:
            axes = [axes]
        for c in range(num_channels):
            axes[c].plot(time_axis, waveform[c], linewidth=1)
            axes[c].grid(True)
            if num_channels > 1:
                axes[c].set_ylabel(f"Channel {c+1}")
        figure.suptitle("waveform")
        plt.show(block=False)

.. GENERATED FROM PYTHON SOURCE LINES 170-173

.. code-block:: default

    plot_waveform(waveform, sample_rate)

.. image-sg:: /tutorials/images/sphx_glr_audio_io_tutorial_001.png
   :alt: waveform
   :srcset: /tutorials/images/sphx_glr_audio_io_tutorial_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 175-191

.. code-block:: default

    def plot_specgram(waveform, sample_rate, title="Spectrogram"):
        waveform = waveform.numpy()

        num_channels, num_frames = waveform.shape

        figure, axes = plt.subplots(num_channels, 1)
        if num_channels == 1:
            axes = [axes]
        for c in range(num_channels):
            axes[c].specgram(waveform[c], Fs=sample_rate)
            if num_channels > 1:
                axes[c].set_ylabel(f"Channel {c+1}")
        figure.suptitle(title)
        plt.show(block=False)

.. GENERATED FROM PYTHON SOURCE LINES 193-196

.. code-block:: default

    plot_specgram(waveform, sample_rate)

.. image-sg:: /tutorials/images/sphx_glr_audio_io_tutorial_002.png
   :alt: Spectrogram
   :srcset: /tutorials/images/sphx_glr_audio_io_tutorial_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 198-200

.. code-block:: default

    Audio(waveform.numpy()[0], rate=sample_rate)

.. (The interactive audio player rendered here in the HTML version is omitted.)
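Since the loaded tensor is ``float32`` with values in ``[-1.0, 1.0]``, 16-bit integer PCM samples map onto that range by dividing by ``2**15``. A stdlib-only sketch of this common normalization convention (an illustration, not torchaudio's actual decoding code):

```python
# Map 16-bit PCM sample values onto the [-1.0, 1.0] float range that
# torchaudio.load produces by default. This mirrors the usual convention
# (divide by 2**15 = 32768); it is a sketch, not torchaudio's implementation.
int16_samples = [-32768, -16384, 0, 16384, 32767]

float_samples = [s / 32768.0 for s in int16_samples]

print(float_samples)
# All values fall inside the documented [-1.0, 1.0] range.
assert all(-1.0 <= s <= 1.0 for s in float_samples)
assert float_samples[0] == -1.0 and float_samples[2] == 0.0
```

Note the slight asymmetry: the most negative sample reaches exactly ``-1.0``, while the most positive falls just short of ``1.0``.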


.. GENERATED FROM PYTHON SOURCE LINES 201-209

Loading from file-like object
-----------------------------

The I/O functions support file-like objects. This allows for fetching and
decoding audio data from locations within and beyond the local file
system. The following examples illustrate this.

.. GENERATED FROM PYTHON SOURCE LINES 212-219

.. code-block:: default

    # Load audio data as HTTP request
    url = "https://download.pytorch.org/torchaudio/tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav"
    with requests.get(url, stream=True) as response:
        waveform, sample_rate = torchaudio.load(response.raw)
    plot_specgram(waveform, sample_rate, title="HTTP datasource")

.. image-sg:: /tutorials/images/sphx_glr_audio_io_tutorial_003.png
   :alt: HTTP datasource
   :srcset: /tutorials/images/sphx_glr_audio_io_tutorial_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 221-230

.. code-block:: default

    # Load audio from tar file
    tar_path = download_asset("tutorial-assets/VOiCES_devkit.tar.gz")
    tar_item = "VOiCES_devkit/source-16k/train/sp0307/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav"
    with tarfile.open(tar_path, mode="r") as tarfile_:
        fileobj = tarfile_.extractfile(tar_item)
        waveform, sample_rate = torchaudio.load(fileobj)
    plot_specgram(waveform, sample_rate, title="TAR file")

.. image-sg:: /tutorials/images/sphx_glr_audio_io_tutorial_004.png
   :alt: TAR file
   :srcset: /tutorials/images/sphx_glr_audio_io_tutorial_004.png
   :class: sphx-glr-single-img
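The HTTP response and tar member above both work because the loader only needs an object exposing ``read`` (and, for some formats, ``seek``). A stdlib-only sketch of that duck typing using ``io.BytesIO``, with placeholder bytes rather than real audio data:

```python
import io

# Any object with read()/seek() can stand in for a file; this is the
# duck typing that lets the I/O functions accept HTTP responses, tar
# members, and in-memory buffers alike. The payload here is placeholder
# bytes, not real audio data.
payload = bytes(range(16))
fileobj = io.BytesIO(payload)

print(fileobj.read(4))      # sequential reads, just like an on-disk file
print(fileobj.read(4))      # the read position advances each time
fileobj.seek(0)             # rewind, as a decoder may do while probing a format
print(len(fileobj.read()))  # 16
```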