.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/audio_io_tutorial.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

   .. note::
      :class: sphx-glr-download-link-note

      Click :ref:`here ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_audio_io_tutorial.py:


Audio I/O
=========

**Author**: `Moto Hira `__

This tutorial shows how to use TorchAudio's basic I/O API to inspect audio data,
load it into PyTorch Tensors, and save PyTorch Tensors to audio files.

.. warning::

   Multiple changes to audio I/O have been planned or made in recent releases.
   For the details of these changes, please refer to
   :ref:`Introduction of Dispatcher `.

.. GENERATED FROM PYTHON SOURCE LINES 18-25

.. code-block:: default

    import torch
    import torchaudio

    print(torch.__version__)
    print(torchaudio.__version__)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    2.4.0.dev20240328
    2.2.0.dev20240329

.. GENERATED FROM PYTHON SOURCE LINES 26-38

Preparation
-----------

First, we import the modules and download the audio assets we use in this
tutorial.

.. note::

   When running this tutorial in Google Colab, install the required packages
   with the following:

   .. code::

      !pip install boto3

.. GENERATED FROM PYTHON SOURCE LINES 38-68

.. code-block:: default

    import io
    import os
    import tarfile
    import tempfile

    import boto3
    import matplotlib.pyplot as plt
    import requests
    from botocore import UNSIGNED
    from botocore.config import Config
    from IPython.display import Audio
    from torchaudio.utils import download_asset

    SAMPLE_GSM = download_asset("tutorial-assets/steam-train-whistle-daniel_simon.gsm")
    SAMPLE_WAV = download_asset("tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav")
    SAMPLE_WAV_8000 = download_asset("tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042-8000hz.wav")


    def _hide_seek(obj):
        # Wrap the object so that only ``read`` is exposed; this keeps
        # downstream code from seeking on the underlying stream.
        class _wrapper:
            def __init__(self, obj):
                self.obj = obj

            def read(self, n):
                return self.obj.read(n)

        return _wrapper(obj)


Querying audio metadata
-----------------------

Function :py:func:`torchaudio.info` fetches audio metadata.
You can provide a path-like object or a file-like object.

.. code-block:: default

    metadata = torchaudio.info(SAMPLE_WAV)
    print(metadata)

The result is an ``AudioMetaData`` object, where

-  ``sample_rate`` is the sampling rate of the audio
-  ``num_channels`` is the number of channels
-  ``num_frames`` is the number of frames per channel
-  ``bits_per_sample`` is bit depth
-  ``encoding`` is the sample coding format

``encoding`` can take one of the following values:

-  ``"PCM_S"``: Signed integer linear PCM
-  ``"PCM_U"``: Unsigned integer linear PCM
-  ``"PCM_F"``: Floating point linear PCM
-  ``"FLAC"``: Flac, `Free Lossless Audio Codec `__
-  ``"ULAW"``: Mu-law [`wikipedia `__]
-  ``"ALAW"``: A-law [`wikipedia `__]
-  ``"MP3"``: MP3, MPEG-1 Audio Layer III
-  ``"VORBIS"``: OGG Vorbis [`xiph.org `__]
-  ``"AMR_NB"``: Adaptive Multi-Rate [`wikipedia `__]
-  ``"AMR_WB"``: Adaptive Multi-Rate Wideband [`wikipedia `__]
-  ``"OPUS"``: Opus [`opus-codec.org `__]
-  ``"GSM"``: GSM-FR [`wikipedia `__]
-  ``"HTK"``: Single channel 16-bit PCM
-  ``"UNKNOWN"``: None of the above

.. GENERATED FROM PYTHON SOURCE LINES 113-119

**Note**

-  ``bits_per_sample`` can be ``0`` for formats with compression and/or
   variable bit rate (such as MP3).
-  ``num_frames`` can be ``0`` for GSM-FR format.

.. GENERATED FROM PYTHON SOURCE LINES 119-124

.. code-block:: default

    metadata = torchaudio.info(SAMPLE_GSM)
    print(metadata)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    AudioMetaData(sample_rate=8000, num_frames=39680, num_channels=1, bits_per_sample=0, encoding=GSM)

.. GENERATED FROM PYTHON SOURCE LINES 125-130

Querying file-like object
-------------------------

:py:func:`torchaudio.info` works on file-like objects.

.. GENERATED FROM PYTHON SOURCE LINES 130-136

.. code-block:: default

    url = "https://download.pytorch.org/torchaudio/tutorial-assets/steam-train-whistle-daniel_simon.wav"
    with requests.get(url, stream=True) as response:
        metadata = torchaudio.info(_hide_seek(response.raw))
    print(metadata)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    AudioMetaData(sample_rate=44100, num_frames=109368, num_channels=2, bits_per_sample=16, encoding=PCM_S)

.. GENERATED FROM PYTHON SOURCE LINES 137-145

.. note::

   When passing a file-like object, ``info`` does not read all of the
   underlying data; rather, it reads only a portion of the data from the
   beginning. Therefore, for a given audio format, it may not be able to
   retrieve the correct metadata, including the format itself. In such a
   case, you can pass the ``format`` argument to specify the format of the
   audio.
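For illustration, here is a minimal sketch of that workaround. It reuses the
``url`` and the ``_hide_seek`` helper defined above; the ``format="wav"``
value is an assumption that matches this particular asset, and the snippet is
an addition to the tutorial rather than generated output.

.. code-block:: default

    # Illustrative sketch (not generated output): state the container format
    # explicitly when automatic detection on a non-seekable stream fails.
    with requests.get(url, stream=True) as response:
        metadata = torchaudio.info(_hide_seek(response.raw), format="wav")
    print(metadata)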
.. GENERATED FROM PYTHON SOURCE LINES 147-163

Loading audio data
------------------

To load audio data, you can use :py:func:`torchaudio.load`. This function
accepts a path-like object or file-like object as input. The returned value
is a tuple of waveform (``Tensor``) and sample rate (``int``).

By default, the resulting tensor object has ``dtype=torch.float32`` and its
value range is ``[-1.0, 1.0]``. For the list of supported formats, please
refer to `the torchaudio documentation `__.

.. GENERATED FROM PYTHON SOURCE LINES 163-167

.. code-block:: default

    waveform, sample_rate = torchaudio.load(SAMPLE_WAV)

.. GENERATED FROM PYTHON SOURCE LINES 169-186

.. code-block:: default

    def plot_waveform(waveform, sample_rate):
        waveform = waveform.numpy()

        num_channels, num_frames = waveform.shape
        time_axis = torch.arange(0, num_frames) / sample_rate

        figure, axes = plt.subplots(num_channels, 1)
        if num_channels == 1:
            axes = [axes]
        for c in range(num_channels):
            axes[c].plot(time_axis, waveform[c], linewidth=1)
            axes[c].grid(True)
            if num_channels > 1:
                axes[c].set_ylabel(f"Channel {c+1}")
        figure.suptitle("waveform")

.. GENERATED FROM PYTHON SOURCE LINES 188-191

.. code-block:: default

    plot_waveform(waveform, sample_rate)

.. image-sg:: /tutorials/images/sphx_glr_audio_io_tutorial_001.png
   :alt: waveform
   :srcset: /tutorials/images/sphx_glr_audio_io_tutorial_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 193-208

.. code-block:: default

    def plot_specgram(waveform, sample_rate, title="Spectrogram"):
        waveform = waveform.numpy()

        num_channels, num_frames = waveform.shape

        figure, axes = plt.subplots(num_channels, 1)
        if num_channels == 1:
            axes = [axes]
        for c in range(num_channels):
            axes[c].specgram(waveform[c], Fs=sample_rate)
            if num_channels > 1:
                axes[c].set_ylabel(f"Channel {c+1}")
        figure.suptitle(title)

.. GENERATED FROM PYTHON SOURCE LINES 210-213

.. code-block:: default

    plot_specgram(waveform, sample_rate)

.. image-sg:: /tutorials/images/sphx_glr_audio_io_tutorial_002.png
   :alt: Spectrogram
   :srcset: /tutorials/images/sphx_glr_audio_io_tutorial_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 215-217

.. code-block:: default

    Audio(waveform.numpy()[0], rate=sample_rate)

.. GENERATED FROM PYTHON SOURCE LINES 218-226

Loading from file-like object
-----------------------------

The I/O functions support file-like objects. This allows for fetching and
decoding audio data from locations within and beyond the local file system.
The following examples illustrate this.
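Before the remote examples below, here is a minimal sketch of the simplest
case: decoding from an in-memory buffer. This snippet is an illustrative
addition (not generated tutorial output) and relies only on ``io`` and
``SAMPLE_WAV`` from the Preparation section; any object exposing ``read``
works the same way.

.. code-block:: default

    # Illustrative sketch (not generated output): load audio whose raw bytes
    # are already held in memory, using an in-memory file-like buffer.
    with open(SAMPLE_WAV, "rb") as src:
        buffer_ = io.BytesIO(src.read())
    waveform, sample_rate = torchaudio.load(buffer_)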


.. GENERATED FROM PYTHON SOURCE LINES 229-236

.. code-block:: default

    # Load audio data as HTTP request
    url = "https://download.pytorch.org/torchaudio/tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav"
    with requests.get(url, stream=True) as response:
        waveform, sample_rate = torchaudio.load(_hide_seek(response.raw))
    plot_specgram(waveform, sample_rate, title="HTTP datasource")

.. image-sg:: /tutorials/images/sphx_glr_audio_io_tutorial_003.png
   :alt: HTTP datasource
   :srcset: /tutorials/images/sphx_glr_audio_io_tutorial_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 238-247

.. code-block:: default

    # Load audio from tar file
    tar_path = download_asset("tutorial-assets/VOiCES_devkit.tar.gz")
    tar_item = "VOiCES_devkit/source-16k/train/sp0307/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav"
    with tarfile.open(tar_path, mode="r") as tarfile_:
        fileobj = tarfile_.extractfile(tar_item)
        waveform, sample_rate = torchaudio.load(fileobj)
    plot_specgram(waveform, sample_rate, title="TAR file")

.. image-sg:: /tutorials/images/sphx_glr_audio_io_tutorial_004.png
   :alt: TAR file
   :srcset: /tutorials/images/sphx_glr_audio_io_tutorial_004.png
   :class: sphx-glr-single-img


.. container:: sphx-glr-download sphx-glr-download-jupyter

   :download:`Download Jupyter notebook: audio_io_tutorial.ipynb `

.. only:: html

   .. rst-class:: sphx-glr-signature

   `Gallery generated by Sphinx-Gallery `_