.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/audio_feature_augmentation_tutorial.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here `
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_audio_feature_augmentation_tutorial.py:


Audio Feature Augmentation
==========================

**Author**: `Moto Hira `__

.. GENERATED FROM PYTHON SOURCE LINES 9-21

.. code-block:: default


    # When running this tutorial in Google Colab, install the required packages
    # with the following.
    # !pip install torchaudio librosa

    import torch
    import torchaudio
    import torchaudio.transforms as T

    print(torch.__version__)
    print(torchaudio.__version__)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2.4.0.dev20240416
    2.2.0.dev20240418


.. GENERATED FROM PYTHON SOURCE LINES 22-25

Preparation
-----------

.. GENERATED FROM PYTHON SOURCE LINES 25-31

.. code-block:: default


    import librosa
    import matplotlib.pyplot as plt
    from IPython.display import Audio
    from torchaudio.utils import download_asset


.. GENERATED FROM PYTHON SOURCE LINES 32-35

In this tutorial, we use speech data from the
`VOiCES dataset `__,
which is licensed under Creative Commons BY 4.0.

.. GENERATED FROM PYTHON SOURCE LINES 35-73

.. code-block:: default


    SAMPLE_WAV_SPEECH_PATH = download_asset("tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav")


    def _get_sample(path, resample=None):
        # "remix 1" keeps only the first channel.
        effects = [["remix", "1"]]
        if resample:
            # Low-pass at the new Nyquist frequency before changing the rate.
            effects.extend(
                [
                    ["lowpass", f"{resample // 2}"],
                    ["rate", f"{resample}"],
                ]
            )
        return torchaudio.sox_effects.apply_effects_file(path, effects=effects)


    def get_speech_sample(*, resample=None):
        return _get_sample(SAMPLE_WAV_SPEECH_PATH, resample=resample)


    def get_spectrogram(
        n_fft=400,
        win_len=None,
        hop_len=None,
        power=2.0,
    ):
        waveform, _ = get_speech_sample()
        spectrogram = T.Spectrogram(
            n_fft=n_fft,
            win_length=win_len,
            hop_length=hop_len,
            center=True,
            pad_mode="reflect",
            power=power,
        )
        return spectrogram(waveform)


.. GENERATED FROM PYTHON SOURCE LINES 74-84

SpecAugment
-----------

`SpecAugment `__
is a popular spectrogram augmentation technique.

``torchaudio`` implements :py:func:`torchaudio.transforms.TimeStretch`,
:py:func:`torchaudio.transforms.TimeMasking` and
:py:func:`torchaudio.transforms.FrequencyMasking`.

.. GENERATED FROM PYTHON SOURCE LINES 86-89

TimeStretch
-----------

.. GENERATED FROM PYTHON SOURCE LINES 89-98

.. code-block:: default


    # power=None returns a complex spectrogram, which TimeStretch operates on.
    spec = get_spectrogram(power=None)
    stretch = T.TimeStretch()

    spec_12 = stretch(spec, overriding_rate=1.2)
    spec_09 = stretch(spec, overriding_rate=0.9)


.. GENERATED FROM PYTHON SOURCE LINES 99-101

Visualization
~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 101-116

.. code-block:: default


    def plot():
        def plot_spec(ax, spec, title):
            ax.set_title(title)
            ax.imshow(librosa.amplitude_to_db(spec), origin="lower", aspect="auto")

        fig, axes = plt.subplots(3, 1, sharex=True, sharey=True)
        plot_spec(axes[0], torch.abs(spec_12[0]), title="Stretched x1.2")
        plot_spec(axes[1], torch.abs(spec[0]), title="Original")
        plot_spec(axes[2], torch.abs(spec_09[0]), title="Stretched x0.9")
        fig.tight_layout()


    plot()


.. image-sg:: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_001.png
   :alt: Stretched x1.2, Original, Stretched x0.9
   :srcset: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 117-119

Audio Samples
~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 119-129

.. code-block:: default


    def preview(spec, rate=16000):
        # Convert the complex spectrogram back to a waveform for playback.
        ispec = T.InverseSpectrogram()
        waveform = ispec(spec)

        return Audio(waveform[0].numpy().T, rate=rate)


    preview(spec)


.. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 131-134

.. code-block:: default


    preview(spec_12)


.. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 136-139

.. code-block:: default


    preview(spec_09)


.. raw:: html


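As a rough check on the stretched results above, the number of output frames
can be estimated ahead of time. The sketch below assumes the behaviour
documented for ``torchaudio.functional.phase_vocoder``, where an input of
``num_frames`` frames becomes ``ceil(num_frames / rate)`` frames. The helper
name ``stretched_num_frames`` is hypothetical and is not part of
``torchaudio``.

```python
import math


def stretched_num_frames(num_frames: int, rate: float) -> int:
    """Estimate the frame count after time stretching (hypothetical helper).

    Assumes the output has ceil(num_frames / rate) frames, so rate > 1
    shortens the spectrogram and rate < 1 lengthens it.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return math.ceil(num_frames / rate)


# Frame counts for the two rates used in this tutorial, given 100 input frames.
for rate in (1.2, 0.9):
    print(rate, stretched_num_frames(100, rate))
```

Under that assumption, a 100-frame input gives 84 frames at rate 1.2 and
112 frames at rate 0.9, consistent with the shorter and longer audio samples
above.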
.. GENERATED FROM PYTHON SOURCE LINES 140-143

Time and Frequency Masking
--------------------------

.. GENERATED FROM PYTHON SOURCE LINES 143-153

.. code-block:: default


    torch.random.manual_seed(4)

    time_masking = T.TimeMasking(time_mask_param=80)
    freq_masking = T.FrequencyMasking(freq_mask_param=80)

    spec = get_spectrogram()
    time_masked = time_masking(spec)
    freq_masked = freq_masking(spec)


.. GENERATED FROM PYTHON SOURCE LINES 155-170

.. code-block:: default


    def plot():
        def plot_spec(ax, spec, title):
            ax.set_title(title)
            ax.imshow(librosa.power_to_db(spec), origin="lower", aspect="auto")

        fig, axes = plt.subplots(3, 1, sharex=True, sharey=True)
        plot_spec(axes[0], spec[0], title="Original")
        plot_spec(axes[1], time_masked[0], title="Masked along time axis")
        plot_spec(axes[2], freq_masked[0], title="Masked along frequency axis")
        fig.tight_layout()


    plot()


.. image-sg:: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_002.png
   :alt: Original, Masked along time axis, Masked along frequency axis
   :srcset: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 2.253 seconds)


.. _sphx_glr_download_tutorials_audio_feature_augmentation_tutorial.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: audio_feature_augmentation_tutorial.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: audio_feature_augmentation_tutorial.ipynb `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_