.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "tutorials/audio_feature_augmentation_tutorial.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note Click :ref:`here ` to download the full example code .. rst-class:: sphx-glr-example-title .. _sphx_glr_tutorials_audio_feature_augmentation_tutorial.py: Audio Feature Augmentation ========================== .. GENERATED FROM PYTHON SOURCE LINES 6-18 .. code-block:: default # When running this tutorial in Google Colab, install the required packages # with the following. # !pip install torchaudio librosa import torch import torchaudio import torchaudio.transforms as T print(torch.__version__) print(torchaudio.__version__) .. rst-class:: sphx-glr-script-out Out: .. code-block:: none 1.10.0+cpu 0.10.0+cpu .. GENERATED FROM PYTHON SOURCE LINES 19-22 Preparing data and utility functions (skip this section) -------------------------------------------------------- .. GENERATED FROM PYTHON SOURCE LINES 22-100 .. code-block:: default #@title Prepare data and utility functions. {display-mode: "form"} #@markdown #@markdown You do not need to look into this cell. #@markdown Just execute once and you are good to go. #@markdown #@markdown In this tutorial, we will use a speech data from [VOiCES dataset](https://iqtlabs.github.io/voices/), which is licensed under Creative Commos BY 4.0. #------------------------------------------------------------------------------- # Preparation of data and helper functions. #------------------------------------------------------------------------------- import os import requests import librosa import matplotlib.pyplot as plt _SAMPLE_DIR = "_assets" SAMPLE_WAV_SPEECH_URL = "https://pytorch-tutorial-assets.s3.amazonaws.com/VOiCES_devkit/source-16k/train/sp0307/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav" SAMPLE_WAV_SPEECH_PATH = os.path.join(_SAMPLE_DIR, "speech.wav") os.makedirs(_SAMPLE_DIR, exist_ok=True) def _fetch_data(): uri = [ (SAMPLE_WAV_SPEECH_URL, SAMPLE_WAV_SPEECH_PATH), ] for url, path in uri: with open(path, 'wb') as file_: file_.write(requests.get(url).content) _fetch_data() def _get_sample(path, resample=None): effects = [ ["remix", "1"] ] if resample: effects.extend([ ["lowpass", f"{resample // 2}"], ["rate", f'{resample}'], ]) return torchaudio.sox_effects.apply_effects_file(path, effects=effects) def get_speech_sample(*, resample=None): return _get_sample(SAMPLE_WAV_SPEECH_PATH, resample=resample) def get_spectrogram( n_fft = 400, win_len = None, hop_len = None, power = 2.0, ): waveform, _ = get_speech_sample() spectrogram = T.Spectrogram( n_fft=n_fft, win_length=win_len, hop_length=hop_len, center=True, pad_mode="reflect", power=power, ) return spectrogram(waveform) def plot_spectrogram(spec, title=None, ylabel='freq_bin', aspect='auto', xmax=None): fig, axs = plt.subplots(1, 1) axs.set_title(title or 'Spectrogram (db)') axs.set_ylabel(ylabel) axs.set_xlabel('frame') im = axs.imshow(librosa.power_to_db(spec), origin='lower', aspect=aspect) if xmax: axs.set_xlim((0, xmax)) fig.colorbar(im, ax=axs) plt.show(block=False) .. GENERATED FROM PYTHON SOURCE LINES 101-113 SpecAugment ----------- `SpecAugment `__ is a popular spectrogram augmentation technique. ``torchaudio`` implements ``TimeStretch``, ``TimeMasking`` and ``FrequencyMasking``. TimeStretch ~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 113-127 .. code-block:: default spec = get_spectrogram(power=None) stretch = T.TimeStretch() rate = 1.2 spec_ = stretch(spec, rate) plot_spectrogram(torch.abs(spec_[0]), title=f"Stretched x{rate}", aspect='equal', xmax=304) plot_spectrogram(torch.abs(spec[0]), title="Original", aspect='equal', xmax=304) rate = 0.9 spec_ = stretch(spec, rate) plot_spectrogram(torch.abs(spec_[0]), title=f"Stretched x{rate}", aspect='equal', xmax=304) .. rst-class:: sphx-glr-horizontal * .. image-sg:: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_001.png :alt: Stretched x1.2 :srcset: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_001.png :class: sphx-glr-multi-img * .. image-sg:: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_002.png :alt: Original :srcset: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_002.png :class: sphx-glr-multi-img * .. image-sg:: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_003.png :alt: Stretched x0.9 :srcset: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_003.png :class: sphx-glr-multi-img .. GENERATED FROM PYTHON SOURCE LINES 128-131 TimeMasking ~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 131-142 .. code-block:: default torch.random.manual_seed(4) spec = get_spectrogram() plot_spectrogram(spec[0], title="Original") masking = T.TimeMasking(time_mask_param=80) spec = masking(spec) plot_spectrogram(spec[0], title="Masked along time axis") .. rst-class:: sphx-glr-horizontal * .. image-sg:: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_004.png :alt: Original :srcset: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_004.png :class: sphx-glr-multi-img * .. image-sg:: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_005.png :alt: Masked along time axis :srcset: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_005.png :class: sphx-glr-multi-img .. GENERATED FROM PYTHON SOURCE LINES 143-146 FrequencyMasking ~~~~~~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 146-157 .. code-block:: default torch.random.manual_seed(4) spec = get_spectrogram() plot_spectrogram(spec[0], title="Original") masking = T.FrequencyMasking(freq_mask_param=80) spec = masking(spec) plot_spectrogram(spec[0], title="Masked along frequency axis") .. rst-class:: sphx-glr-horizontal * .. image-sg:: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_006.png :alt: Original :srcset: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_006.png :class: sphx-glr-multi-img * .. image-sg:: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_007.png :alt: Masked along frequency axis :srcset: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_007.png :class: sphx-glr-multi-img .. rst-class:: sphx-glr-timing **Total running time of the script:** ( 0 minutes 2.130 seconds) .. _sphx_glr_download_tutorials_audio_feature_augmentation_tutorial.py: .. only :: html .. container:: sphx-glr-footer :class: sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: audio_feature_augmentation_tutorial.py ` .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: audio_feature_augmentation_tutorial.ipynb ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_