.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/audio_data_augmentation_tutorial.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_tutorials_audio_data_augmentation_tutorial.py>` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_audio_data_augmentation_tutorial.py:

Audio Data Augmentation
=======================

**Author**: `Moto Hira `__

``torchaudio`` provides a variety of ways to augment audio data. In this
tutorial, we look at how to apply effects, filters, RIR (room impulse
response), and codecs. At the end, we synthesize noisy, phone-quality speech
from clean speech.

.. GENERATED FROM PYTHON SOURCE LINES 15-23

.. code-block:: default

    import torch
    import torchaudio
    import torchaudio.functional as F

    print(torch.__version__)
    print(torchaudio.__version__)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    2.0.0
    2.0.1

.. GENERATED FROM PYTHON SOURCE LINES 24-29

Preparation
-----------

First, we import the modules and download the audio assets we use in this
tutorial.

.. GENERATED FROM PYTHON SOURCE LINES 29-43

.. code-block:: default

    import math

    from IPython.display import Audio
    import matplotlib.pyplot as plt

    from torchaudio.utils import download_asset

    SAMPLE_WAV = download_asset("tutorial-assets/steam-train-whistle-daniel_simon.wav")
    SAMPLE_RIR = download_asset("tutorial-assets/Lab41-SRI-VOiCES-rm1-impulse-mc01-stu-clo-8000hz.wav")
    SAMPLE_SPEECH = download_asset("tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042-8000hz.wav")
    SAMPLE_NOISE = download_asset("tutorial-assets/Lab41-SRI-VOiCES-rm1-babb-mc01-stu-clo-8000hz.wav")

Applying effects and filtering
------------------------------

:py:func:`torchaudio.sox_effects.apply_effects_tensor` applies effects and
filtering to a Tensor object, and
:py:func:`torchaudio.sox_effects.apply_effects_file` does the same for other
audio sources. For the complete list of available effects, please refer to
the sox documentation.

**Tip** If you need to load and resample your audio data on the fly, then
you can use :py:func:`torchaudio.sox_effects.apply_effects_file` with effect
``"rate"``.
**Note** :py:func:`torchaudio.sox_effects.apply_effects_file` accepts a
file-like object or path-like object. Similar to :py:func:`torchaudio.load`,
when the audio format cannot be inferred from either the file extension or
header, you can provide the ``format`` argument to specify the format of the
audio source.

**Note** This process is not differentiable.

.. GENERATED FROM PYTHON SOURCE LINES 78-98

.. code-block:: default

    # Load the data
    waveform1, sample_rate1 = torchaudio.load(SAMPLE_WAV)

    # Define effects
    effects = [
        ["lowpass", "-1", "300"],  # apply single-pole lowpass filter
        ["speed", "0.8"],  # reduce the speed
        # This only changes sample rate, so it is necessary to
        # add `rate` effect with original sample rate after this.
        ["rate", f"{sample_rate1}"],
        ["reverb", "-w"],  # Reverberation gives some dramatic feeling
    ]

    # Apply effects
    waveform2, sample_rate2 = torchaudio.sox_effects.apply_effects_tensor(waveform1, sample_rate1, effects)

    print(waveform1.shape, sample_rate1)
    print(waveform2.shape, sample_rate2)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    torch.Size([2, 109368]) 44100
    torch.Size([2, 136710]) 44100

.. GENERATED FROM PYTHON SOURCE LINES 99-103

Note that the number of frames changes after the effects are applied (the
``speed`` effect lengthens the clip), while the number of channels stays the
same. Let’s listen to the audio.

.. GENERATED FROM PYTHON SOURCE LINES 103-123

.. code-block:: default

    def plot_waveform(waveform, sample_rate, title="Waveform", xlim=None):
        waveform = waveform.numpy()

        num_channels, num_frames = waveform.shape
        time_axis = torch.arange(0, num_frames) / sample_rate

        figure, axes = plt.subplots(num_channels, 1)
        if num_channels == 1:
            axes = [axes]
        for c in range(num_channels):
            axes[c].plot(time_axis, waveform[c], linewidth=1)
            axes[c].grid(True)
            if num_channels > 1:
                axes[c].set_ylabel(f"Channel {c+1}")
            if xlim:
                axes[c].set_xlim(xlim)
        figure.suptitle(title)
        plt.show(block=False)

.. GENERATED FROM PYTHON SOURCE LINES 125-143

.. code-block:: default

    def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
        waveform = waveform.numpy()

        num_channels, _ = waveform.shape

        figure, axes = plt.subplots(num_channels, 1)
        if num_channels == 1:
            axes = [axes]
        for c in range(num_channels):
            axes[c].specgram(waveform[c], Fs=sample_rate)
            if num_channels > 1:
                axes[c].set_ylabel(f"Channel {c+1}")
            if xlim:
                axes[c].set_xlim(xlim)
        figure.suptitle(title)
        plt.show(block=False)

.. GENERATED FROM PYTHON SOURCE LINES 144-147

Original:
~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 147-152

.. code-block:: default

    plot_waveform(waveform1, sample_rate1, title="Original", xlim=(-0.1, 3.2))
    plot_specgram(waveform1, sample_rate1, title="Original", xlim=(0, 3.04))
    Audio(waveform1, rate=sample_rate1)

.. rst-class:: sphx-glr-horizontal

    * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_001.png
          :alt: Original
          :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_001.png
          :class: sphx-glr-multi-img

    * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_002.png
          :alt: Original
          :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_002.png
          :class: sphx-glr-multi-img

.. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 153-156 Effects applied: ~~~~~~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 156-161 .. code-block:: default plot_waveform(waveform2, sample_rate2, title="Effects Applied", xlim=(-0.1, 3.2)) plot_specgram(waveform2, sample_rate2, title="Effects Applied", xlim=(0, 3.04)) Audio(waveform2, rate=sample_rate2) .. rst-class:: sphx-glr-horizontal * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_003.png :alt: Effects Applied :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_003.png :class: sphx-glr-multi-img * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_004.png :alt: Effects Applied :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_004.png :class: sphx-glr-multi-img .. raw:: html


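As a quick sanity check on the shapes printed earlier, the ``speed`` effect's frame-count change is plain arithmetic; the following sketch uses no torchaudio, and the constants are taken from the output above:

```python
# The "speed" effect resamples playback, so the number of output frames
# scales by 1/speed. Constants below are from the run shown earlier.
num_frames_in = 109368  # frames in the original clip
speed = 0.8             # the value passed to the "speed" effect
num_frames_out = round(num_frames_in / speed)
print(num_frames_out)  # 136710, matching torch.Size([2, 136710])
```

Slowing the clip down (speed < 1) therefore always produces more frames, which is why the ``rate`` effect is needed afterwards to restore the declared sample rate.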
.. GENERATED FROM PYTHON SOURCE LINES 162-164 Doesn’t it sound more dramatic? .. GENERATED FROM PYTHON SOURCE LINES 166-181 Simulating room reverberation ----------------------------- `Convolution reverb `__ is a technique that's used to make clean audio sound as though it has been produced in a different environment. Using Room Impulse Response (RIR), for instance, we can make clean speech sound as though it has been uttered in a conference room. For this process, we need RIR data. The following data are from the VOiCES dataset, but you can record your own — just turn on your microphone and clap your hands. .. GENERATED FROM PYTHON SOURCE LINES 181-187 .. code-block:: default rir_raw, sample_rate = torchaudio.load(SAMPLE_RIR) plot_waveform(rir_raw, sample_rate, title="Room Impulse Response (raw)") plot_specgram(rir_raw, sample_rate, title="Room Impulse Response (raw)") Audio(rir_raw, rate=sample_rate) .. rst-class:: sphx-glr-horizontal * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_005.png :alt: Room Impulse Response (raw) :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_005.png :class: sphx-glr-multi-img * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_006.png :alt: Room Impulse Response (raw) :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_006.png :class: sphx-glr-multi-img .. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 188-191 First, we need to clean up the RIR. We extract the main impulse and normalize it by its power. .. GENERATED FROM PYTHON SOURCE LINES 191-197 .. code-block:: default rir = rir_raw[:, int(sample_rate * 1.01) : int(sample_rate * 1.3)] rir = rir / torch.norm(rir, p=2) plot_waveform(rir, sample_rate, title="Room Impulse Response") .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_007.png :alt: Room Impulse Response :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_007.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 198-201 Then, using :py:func:`torchaudio.functional.fftconvolve`, we convolve the speech signal with the RIR. .. GENERATED FROM PYTHON SOURCE LINES 201-205 .. code-block:: default speech, _ = torchaudio.load(SAMPLE_SPEECH) augmented = F.fftconvolve(speech, rir) .. GENERATED FROM PYTHON SOURCE LINES 206-209 Original: ~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 209-214 .. code-block:: default plot_waveform(speech, sample_rate, title="Original") plot_specgram(speech, sample_rate, title="Original") Audio(speech, rate=sample_rate) .. rst-class:: sphx-glr-horizontal * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_008.png :alt: Original :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_008.png :class: sphx-glr-multi-img * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_009.png :alt: Original :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_009.png :class: sphx-glr-multi-img .. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 215-218 RIR applied: ~~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 218-224 .. code-block:: default plot_waveform(augmented, sample_rate, title="RIR Applied") plot_specgram(augmented, sample_rate, title="RIR Applied") Audio(augmented, rate=sample_rate) .. rst-class:: sphx-glr-horizontal * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_010.png :alt: RIR Applied :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_010.png :class: sphx-glr-multi-img * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_011.png :alt: RIR Applied :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_011.png :class: sphx-glr-multi-img .. raw:: html


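Under the hood, :py:func:`torchaudio.functional.fftconvolve` computes a "full" convolution, so the output has ``len(signal) + len(rir) - 1`` frames. A minimal pure-Python sketch of that operation, on toy numbers rather than real audio:

```python
# Direct "full" convolution: every input sample is smeared by the whole
# impulse response, which is exactly how the RIR adds reverberation.
def convolve_full(x, h):
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

signal = [1.0, 2.0, 3.0]  # toy "speech"
rir = [0.5, 0.25]         # toy impulse response
print(convolve_full(signal, rir))  # [0.5, 1.25, 2.0, 0.75]
```

``fftconvolve`` produces the same result via the FFT, which is much faster for an RIR thousands of samples long.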
.. GENERATED FROM PYTHON SOURCE LINES 225-241

Adding background noise
-----------------------

To introduce background noise to audio data, we can add a noise Tensor to
the Tensor representing the audio data according to some desired
signal-to-noise ratio (SNR)
[`wikipedia `__],
which determines the intensity of the audio data relative to that of the
noise in the output.

.. math::

    \mathrm{SNR} = \frac{P_{signal}}{P_{noise}}

    \mathrm{SNR_{dB}} = 10 \log_{10} \mathrm{SNR}

To add noise to audio data at specific SNRs, we use
:py:func:`torchaudio.functional.add_noise`.

.. GENERATED FROM PYTHON SOURCE LINES 241-250

.. code-block:: default

    speech, _ = torchaudio.load(SAMPLE_SPEECH)
    noise, _ = torchaudio.load(SAMPLE_NOISE)
    noise = noise[:, : speech.shape[1]]

    snr_dbs = torch.tensor([20, 10, 3])
    noisy_speeches = F.add_noise(speech, noise, snr_dbs)

.. GENERATED FROM PYTHON SOURCE LINES 251-254

Background noise:
~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 254-259

.. code-block:: default

    plot_waveform(noise, sample_rate, title="Background noise")
    plot_specgram(noise, sample_rate, title="Background noise")
    Audio(noise, rate=sample_rate)

.. rst-class:: sphx-glr-horizontal

    * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_012.png
          :alt: Background noise
          :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_012.png
          :class: sphx-glr-multi-img

    * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_013.png
          :alt: Background noise
          :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_013.png
          :class: sphx-glr-multi-img

.. raw:: html


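To make the SNR formulas concrete, here is a pure-Python sketch of how a noise signal can be scaled so that adding it yields a target SNR. This illustrates the math, not the exact internals of ``add_noise``; ``power`` here means the mean of squared samples:

```python
import math

# Power as the mean of squared samples, matching the SNR formulas above.
def power(x):
    return sum(v * v for v in x) / len(x)

def scale_noise(signal, noise, snr_db):
    # Choose a gain so that P_signal / P_scaled_noise equals the target SNR.
    snr = 10 ** (snr_db / 10)
    gain = math.sqrt(power(signal) / (snr * power(noise)))
    return [gain * v for v in noise]

signal = [0.5, -0.5, 0.5, -0.5]  # power 0.25
noise = [0.1, -0.1, 0.1, -0.1]   # power 0.01
scaled = scale_noise(signal, noise, 10)
snr_db = 10 * math.log10(power(signal) / power(scaled))
print(round(snr_db, 6))  # 10.0
```
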
.. GENERATED FROM PYTHON SOURCE LINES 260-263 SNR 20 dB: ~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 263-269 .. code-block:: default snr_db, noisy_speech = snr_dbs[0], noisy_speeches[0:1] plot_waveform(noisy_speech, sample_rate, title=f"SNR: {snr_db} [dB]") plot_specgram(noisy_speech, sample_rate, title=f"SNR: {snr_db} [dB]") Audio(noisy_speech, rate=sample_rate) .. rst-class:: sphx-glr-horizontal * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_014.png :alt: SNR: 20 [dB] :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_014.png :class: sphx-glr-multi-img * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_015.png :alt: SNR: 20 [dB] :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_015.png :class: sphx-glr-multi-img .. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 270-273 SNR 10 dB: ~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 273-279 .. code-block:: default snr_db, noisy_speech = snr_dbs[1], noisy_speeches[1:2] plot_waveform(noisy_speech, sample_rate, title=f"SNR: {snr_db} [dB]") plot_specgram(noisy_speech, sample_rate, title=f"SNR: {snr_db} [dB]") Audio(noisy_speech, rate=sample_rate) .. rst-class:: sphx-glr-horizontal * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_016.png :alt: SNR: 10 [dB] :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_016.png :class: sphx-glr-multi-img * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_017.png :alt: SNR: 10 [dB] :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_017.png :class: sphx-glr-multi-img .. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 280-283 SNR 3 dB: ~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 283-290 .. code-block:: default snr_db, noisy_speech = snr_dbs[2], noisy_speeches[2:3] plot_waveform(noisy_speech, sample_rate, title=f"SNR: {snr_db} [dB]") plot_specgram(noisy_speech, sample_rate, title=f"SNR: {snr_db} [dB]") Audio(noisy_speech, rate=sample_rate) .. rst-class:: sphx-glr-horizontal * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_018.png :alt: SNR: 3 [dB] :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_018.png :class: sphx-glr-multi-img * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_019.png :alt: SNR: 3 [dB] :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_019.png :class: sphx-glr-multi-img .. raw:: html


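The three SNR settings above are easier to interpret as raw power ratios; a quick check in plain Python, no audio involved:

```python
# Convert the SNR values used above from dB back to power ratios:
# at 20 dB the signal carries 100x the noise power, at 3 dB only about 2x.
ratios = {db: 10 ** (db / 10) for db in (20, 10, 3)}
for db, ratio in ratios.items():
    print(f"{db} dB -> signal/noise power ratio {ratio:.2f}")
```

This is why the 3 dB sample sounds dramatically noisier than the 20 dB one: the noise power is fifty times higher relative to the speech.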
.. GENERATED FROM PYTHON SOURCE LINES 291-299 Applying codec to Tensor object ------------------------------- :py:func:`torchaudio.functional.apply_codec` can apply codecs to a Tensor object. **Note** This process is not differentiable. .. GENERATED FROM PYTHON SOURCE LINES 299-313 .. code-block:: default waveform, sample_rate = torchaudio.load(SAMPLE_SPEECH) configs = [ {"format": "wav", "encoding": "ULAW", "bits_per_sample": 8}, {"format": "gsm"}, {"format": "vorbis", "compression": -1}, ] waveforms = [] for param in configs: augmented = F.apply_codec(waveform, sample_rate, **param) waveforms.append(augmented) .. rst-class:: sphx-glr-script-out .. code-block:: none /usr/local/envs/python3.8/lib/python3.8/site-packages/torchaudio/backend/sox_io_backend.py:416: UserWarning: File-like object support in sox_io backend is deprecated, and will be removed in v2.1. See https://github.com/pytorch/audio/issues/2950 for the detail.Please migrate to the new dispatcher, or use soundfile backend. warnings.warn(_deprecation_message) /usr/local/envs/python3.8/lib/python3.8/site-packages/torchaudio/backend/sox_io_backend.py:235: UserWarning: File-like object support in sox_io backend is deprecated, and will be removed in v2.1. See https://github.com/pytorch/audio/issues/2950 for the detail.Please migrate to the new dispatcher, or use soundfile backend. warnings.warn(_deprecation_message) .. GENERATED FROM PYTHON SOURCE LINES 314-317 Original: ~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 317-322 .. code-block:: default plot_waveform(waveform, sample_rate, title="Original") plot_specgram(waveform, sample_rate, title="Original") Audio(waveform, rate=sample_rate) .. rst-class:: sphx-glr-horizontal * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_020.png :alt: Original :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_020.png :class: sphx-glr-multi-img * .. 
image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_021.png :alt: Original :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_021.png :class: sphx-glr-multi-img .. raw:: html


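Before comparing the codecs below, it helps to know roughly how much data each one spends. A back-of-the-envelope sketch, assuming the 8 kHz mono VOiCES sample used here; the 13 kbit/s figure is GSM full-rate's standard bit rate:

```python
# Rough bit rates for the formats compared in this section (8 kHz mono).
sample_rate = 8000
pcm16_bps = sample_rate * 16  # 16-bit PCM: 128 kbit/s
mulaw_bps = sample_rate * 8   # 8-bit mu-law: 64 kbit/s
gsm_fr_bps = 13_000           # GSM full-rate: fixed 13 kbit/s
print(pcm16_bps, mulaw_bps, gsm_fr_bps)
```

The audible degradation from mu-law to GSM-FR tracks this drop in bit rate.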
.. GENERATED FROM PYTHON SOURCE LINES 323-326 8 bit mu-law: ~~~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 326-331 .. code-block:: default plot_waveform(waveforms[0], sample_rate, title="8 bit mu-law") plot_specgram(waveforms[0], sample_rate, title="8 bit mu-law") Audio(waveforms[0], rate=sample_rate) .. rst-class:: sphx-glr-horizontal * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_022.png :alt: 8 bit mu-law :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_022.png :class: sphx-glr-multi-img * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_023.png :alt: 8 bit mu-law :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_023.png :class: sphx-glr-multi-img .. raw:: html


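The ``ULAW`` encoding above quantizes each sample to 8 bits after mu-law companding. A minimal sketch of the companding curve itself (mu = 255, the value used for 8-bit mu-law; the quantization step is omitted for brevity):

```python
import math

MU = 255.0  # standard mu for 8-bit mu-law

def mulaw_compress(x):
    # Map x in [-1, 1] through the mu-law curve; small amplitudes get
    # proportionally more resolution than large ones.
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y):
    # Inverse of the curve above.
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

x = 0.5
y = mulaw_compress(x)
assert abs(mulaw_expand(y) - x) < 1e-12  # round trip (before quantization)
```

The real codec's loss comes from rounding ``y`` to one of 256 levels, which is why quiet speech survives mu-law better than loud transients.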
.. GENERATED FROM PYTHON SOURCE LINES 332-335 GSM-FR: ~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 335-340 .. code-block:: default plot_waveform(waveforms[1], sample_rate, title="GSM-FR") plot_specgram(waveforms[1], sample_rate, title="GSM-FR") Audio(waveforms[1], rate=sample_rate) .. rst-class:: sphx-glr-horizontal * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_024.png :alt: GSM-FR :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_024.png :class: sphx-glr-multi-img * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_025.png :alt: GSM-FR :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_025.png :class: sphx-glr-multi-img .. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 341-344 Vorbis: ~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 344-349 .. code-block:: default plot_waveform(waveforms[2], sample_rate, title="Vorbis") plot_specgram(waveforms[2], sample_rate, title="Vorbis") Audio(waveforms[2], rate=sample_rate) .. rst-class:: sphx-glr-horizontal * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_026.png :alt: Vorbis :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_026.png :class: sphx-glr-multi-img * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_027.png :alt: Vorbis :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_027.png :class: sphx-glr-multi-img .. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 350-357

Simulating a phone recording
----------------------------

Combining the previous techniques, we can simulate audio that sounds like a
person talking over a phone in an echoey room with people talking in the
background.

.. GENERATED FROM PYTHON SOURCE LINES 357-406

.. code-block:: default

    original_speech, sample_rate = torchaudio.load(SAMPLE_SPEECH)

    plot_specgram(original_speech, sample_rate, title="Original")

    # Apply RIR
    rir_applied = F.fftconvolve(original_speech, rir)

    plot_specgram(rir_applied, sample_rate, title="RIR Applied")

    # Add background noise
    # Because the noise is recorded in the actual environment, we consider that
    # the noise contains the acoustic features of the environment. Therefore, we add
    # the noise after RIR application.
    noise, _ = torchaudio.load(SAMPLE_NOISE)
    noise = noise[:, : rir_applied.shape[1]]

    snr_db = torch.tensor([8])
    bg_added = F.add_noise(rir_applied, noise, snr_db)

    plot_specgram(bg_added, sample_rate, title="BG noise added")

    # Apply filtering and change sample rate
    filtered, sample_rate2 = torchaudio.sox_effects.apply_effects_tensor(
        bg_added,
        sample_rate,
        effects=[
            ["lowpass", "4000"],
            [
                "compand",
                "0.02,0.05",
                "-60,-60,-30,-10,-20,-8,-5,-8,-2,-8",
                "-8",
                "-7",
                "0.05",
            ],
            ["rate", "8000"],
        ],
    )

    plot_specgram(filtered, sample_rate2, title="Filtered")

    # Apply telephony codec
    codec_applied = F.apply_codec(filtered, sample_rate2, format="gsm")
    plot_specgram(codec_applied, sample_rate2, title="GSM Codec Applied")

.. rst-class:: sphx-glr-horizontal

    * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_028.png
          :alt: Original
          :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_028.png
          :class: sphx-glr-multi-img

    * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_029.png
          :alt: RIR Applied
          :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_029.png
          :class: sphx-glr-multi-img

    * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_030.png
          :alt: BG noise added
          :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_030.png
          :class: sphx-glr-multi-img

    * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_031.png
          :alt: Filtered
          :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_031.png
          :class: sphx-glr-multi-img

    * .. image-sg:: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_032.png
          :alt: GSM Codec Applied
          :srcset: /tutorials/images/sphx_glr_audio_data_augmentation_tutorial_032.png
          :class: sphx-glr-multi-img

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    /usr/local/envs/python3.8/lib/python3.8/site-packages/torchaudio/backend/sox_io_backend.py:416: UserWarning: File-like object support in sox_io backend is deprecated, and will be removed in v2.1. See https://github.com/pytorch/audio/issues/2950 for the detail.Please migrate to the new dispatcher, or use soundfile backend.
      warnings.warn(_deprecation_message)
    /usr/local/envs/python3.8/lib/python3.8/site-packages/torchaudio/backend/sox_io_backend.py:235: UserWarning: File-like object support in sox_io backend is deprecated, and will be removed in v2.1. See https://github.com/pytorch/audio/issues/2950 for the detail.Please migrate to the new dispatcher, or use soundfile backend.
      warnings.warn(_deprecation_message)

.. GENERATED FROM PYTHON SOURCE LINES 407-410

Original speech:
~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 410-413

.. code-block:: default

    Audio(original_speech, rate=sample_rate)

.. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 414-417 RIR applied: ~~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 417-420 .. code-block:: default Audio(rir_applied, rate=sample_rate) .. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 421-424 Background noise added: ~~~~~~~~~~~~~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 424-427 .. code-block:: default Audio(bg_added, rate=sample_rate) .. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 428-431 Filtered: ~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 431-434 .. code-block:: default Audio(filtered, rate=sample_rate2) .. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 435-438 Codec applied: ~~~~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 438-440 .. code-block:: default Audio(codec_applied, rate=sample_rate2) .. raw:: html


.. rst-class:: sphx-glr-timing **Total running time of the script:** ( 0 minutes 16.050 seconds) .. _sphx_glr_download_tutorials_audio_data_augmentation_tutorial.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: audio_data_augmentation_tutorial.py ` .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: audio_data_augmentation_tutorial.ipynb ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_