.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/squim_tutorial.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_squim_tutorial.py:

TorchAudio-Squim: Non-intrusive Speech Assessment in TorchAudio
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 9-12

Author: Anurag Kumar, Zhaoheng Ni

.. GENERATED FROM PYTHON SOURCE LINES 15-18

1. Overview
^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 21-62

This tutorial shows how to use TorchAudio-Squim to estimate objective and
subjective metrics for assessing speech quality and intelligibility.

TorchAudio-Squim enables speech assessment in TorchAudio. It provides an
interface and pre-trained models to estimate various speech quality and
intelligibility metrics. Currently, TorchAudio-Squim [1] supports
reference-free estimation of three widely used objective metrics:

- Wideband Perceptual Evaluation of Speech Quality (PESQ) [2]

- Short-Time Objective Intelligibility (STOI) [3]

- Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) [4]

It also supports estimation of the subjective Mean Opinion Score (MOS) for a
given audio waveform using Non-Matching References [1, 5].

**References**

[1] Kumar, Anurag, et al. “TorchAudio-Squim: Reference-less Speech Quality
and Intelligibility measures in TorchAudio.” ICASSP 2023-2023 IEEE
International Conference on Acoustics, Speech and Signal Processing
(ICASSP). IEEE, 2023.

[2] ITU-T Rec., “P.862.2: Wideband extension to recommendation P.862 for the
assessment of wideband telephone networks and speech codecs,” International
Telecommunication Union, CH–Geneva, 2005.

[3] Taal, C. H., Hendriks, R. C., Heusdens, R., & Jensen, J. (2010, March).
A short-time objective intelligibility measure for time-frequency weighted
noisy speech. In 2010 IEEE international conference on acoustics, speech and
signal processing (pp. 4214-4217). IEEE.

[4] Le Roux, Jonathan, et al. “SDR–half-baked or well done?” ICASSP 2019-2019
IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP). IEEE, 2019.

[5] Manocha, Pranay, and Anurag Kumar. “Speech quality assessment through MOS
using non-matching references.” Interspeech, 2022.

.. GENERATED FROM PYTHON SOURCE LINES 62-70

.. code-block:: default

    import torch
    import torchaudio

    print(torch.__version__)
    print(torchaudio.__version__)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2.4.0.dev20240503
    2.2.0.dev20240504

.. GENERATED FROM PYTHON SOURCE LINES 71-79

2. Preparation
^^^^^^^^^^^^^^

First, import the modules and define the helper functions.

We need ``torch`` and ``torchaudio`` to use TorchAudio-Squim, Matplotlib to
plot data, and ``pystoi`` and ``pesq`` to compute reference metrics.

.. GENERATED FROM PYTHON SOURCE LINES 79-106

.. code-block:: default

    try:
        from pesq import pesq
        from pystoi import stoi
        from torchaudio.pipelines import SQUIM_OBJECTIVE, SQUIM_SUBJECTIVE
    except ImportError:
        try:
            import google.colab  # noqa: F401

            print(
                """
                To enable running this notebook in Google Colab, install nightly
                torch and torchaudio builds by adding the following code block to the top
                of the notebook before running it:
                !pip3 uninstall -y torch torchvision torchaudio
                !pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
                !pip3 install pesq
                !pip3 install pystoi
                """
            )
        except Exception:
            pass
        raise

    import matplotlib.pyplot as plt

.. GENERATED FROM PYTHON SOURCE LINES 109-148

.. code-block:: default

    import torchaudio.functional as F
    from IPython.display import Audio
    from torchaudio.utils import download_asset


    def si_snr(estimate, reference, epsilon=1e-8):
        # Remove the DC offset from both signals.
        estimate = estimate - estimate.mean()
        reference = reference - reference.mean()
        # Project the estimate onto the reference to find the scale factor.
        reference_pow = reference.pow(2).mean(axis=1, keepdim=True)
        mix_pow = (estimate * reference).mean(axis=1, keepdim=True)
        scale = mix_pow / (reference_pow + epsilon)

        # Split the estimate into a scaled target and a residual error.
        reference = scale * reference
        error = estimate - reference

        reference_pow = reference.pow(2)
        error_pow = error.pow(2)

        reference_pow = reference_pow.mean(axis=1)
        error_pow = error_pow.mean(axis=1)

        # Ratio of target power to error power, in dB.
        si_snr = 10 * torch.log10(reference_pow) - 10 * torch.log10(error_pow)
        return si_snr.item()


    def plot(waveform, title, sample_rate=16000):
        wav_numpy = waveform.numpy()

        sample_size = waveform.shape[1]
        time_axis = torch.arange(0, sample_size) / sample_rate

        # Top panel: waveform; bottom panel: spectrogram of the first channel.
        figure, axes = plt.subplots(2, 1)
        axes[0].plot(time_axis, wav_numpy[0], linewidth=1)
        axes[0].grid(True)
        axes[1].specgram(wav_numpy[0], Fs=sample_rate)
        figure.suptitle(title)

.. GENERATED FROM PYTHON SOURCE LINES 149-152

3. Load Speech and Noise Sample
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 152-157

.. code-block:: default

    SAMPLE_SPEECH = download_asset("tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav")
    SAMPLE_NOISE = download_asset("tutorial-assets/Lab41-SRI-VOiCES-rm1-babb-mc01-stu-clo.wav")

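The remaining steps refer to ``WAVEFORM_SPEECH`` and ``WAVEFORM_NOISE`` and
pass a rate of 16000 Hz to ``Audio`` and ``plot``. As a minimal sketch of how
two downloaded recordings might be brought into that shape, the helper below
keeps one noise channel, resamples both signals to a common rate, and trims
them to the same number of frames. ``prepare_pair`` and ``TARGET_RATE`` are
hypothetical names introduced here for illustration; they are not part of
torchaudio.

.. code-block:: python

    import torch
    import torchaudio.functional as F

    TARGET_RATE = 16000  # assumption: the rate used by Audio(...) and plot(...)


    def prepare_pair(speech, speech_rate, noise, noise_rate, target_rate=TARGET_RATE):
        """Return (speech, noise) as equal-length tensors at ``target_rate``.

        Hypothetical helper for illustration only.
        """
        # Keep only the first noise channel so the noise has shape (1, time).
        noise = noise[0:1, :]
        # Resample only when the source rate differs from the target rate.
        if speech_rate != target_rate:
            speech = F.resample(speech, speech_rate, target_rate)
        if noise_rate != target_rate:
            noise = F.resample(noise, noise_rate, target_rate)
        # Trim both signals to the shorter one so they can be mixed frame by frame.
        num_frames = min(speech.shape[1], noise.shape[1])
        return speech[:, :num_frames], noise[:, :num_frames]

With files on disk, the inputs would typically come from
``torchaudio.load``, which returns a ``(waveform, sample_rate)`` pair, for
example ``speech, speech_rate = torchaudio.load(SAMPLE_SPEECH)``.
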
.. GENERATED FROM PYTHON SOURCE LINES 196-198

Play noise sample

.. GENERATED FROM PYTHON SOURCE LINES 198-202

.. code-block:: default

    Audio(WAVEFORM_NOISE.numpy()[0], rate=16000)


.. GENERATED FROM PYTHON SOURCE LINES 203-206

4. Create distorted (noisy) speech samples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 206-211

.. code-block:: default

    # One mixture per SNR value: index 0 is 20 dB, index 1 is -5 dB.
    snr_dbs = torch.tensor([20, -5])
    WAVEFORM_DISTORTED = F.add_noise(WAVEFORM_SPEECH, WAVEFORM_NOISE, snr_dbs)

.. GENERATED FROM PYTHON SOURCE LINES 212-214

Play distorted speech with 20dB SNR

.. GENERATED FROM PYTHON SOURCE LINES 214-218

.. code-block:: default

    Audio(WAVEFORM_DISTORTED.numpy()[0], rate=16000)


.. GENERATED FROM PYTHON SOURCE LINES 219-221

Play distorted speech with -5dB SNR

.. GENERATED FROM PYTHON SOURCE LINES 221-225

.. code-block:: default

    Audio(WAVEFORM_DISTORTED.numpy()[1], rate=16000)

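The distorted samples in this section are mixtures of speech and noise at a
target signal-to-noise ratio. As a rough illustration of the arithmetic
behind such a mixture, the plain-Python sketch below scales the noise so that
the ratio of speech power to scaled-noise power matches the requested SNR in
dB. It shows the standard scaling rule on toy signals and is not
torchaudio's implementation; ``power`` and ``mix_at_snr`` are hypothetical
helpers.

.. code-block:: python

    import math


    def power(signal):
        """Mean squared amplitude of a list of samples."""
        return sum(x * x for x in signal) / len(signal)


    def mix_at_snr(speech, noise, snr_db):
        """Scale ``noise`` to reach ``snr_db`` relative to ``speech``, then add."""
        # Solve 10 * log10(P_speech / (gain**2 * P_noise)) == snr_db for gain.
        gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
        scaled_noise = [gain * n for n in noise]
        mixture = [s + n for s, n in zip(speech, scaled_noise)]
        return mixture, scaled_noise


    # Toy sinusoids standing in for the speech and noise waveforms.
    speech = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
    noise = [0.3 * math.cos(2 * math.pi * 17 * t / 100) for t in range(100)]

    for snr_db in (20, -5):
        _, scaled = mix_at_snr(speech, noise, snr_db)
        measured = 10 * math.log10(power(speech) / power(scaled))
        print(f"target {snr_db} dB, measured {measured:.2f} dB")

At 20 dB the scaled noise carries one hundredth of the speech power, while at
-5 dB it carries roughly three times the speech power, which is why the
second sample sounds far more degraded.
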

.. GENERATED FROM PYTHON SOURCE LINES 226-229

5. Visualize the waveforms
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 232-234

Visualize speech sample

.. GENERATED FROM PYTHON SOURCE LINES 234-238

.. code-block:: default

    plot(WAVEFORM_SPEECH, "Clean Speech")

.. image-sg:: /tutorials/images/sphx_glr_squim_tutorial_001.png
   :alt: Clean Speech
   :srcset: /tutorials/images/sphx_glr_squim_tutorial_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 239-241

Visualize noise sample

.. GENERATED FROM PYTHON SOURCE LINES 241-245

.. code-block:: default

    plot(WAVEFORM_NOISE, "Noise")

.. image-sg:: /tutorials/images/sphx_glr_squim_tutorial_002.png
   :alt: Noise
   :srcset: /tutorials/images/sphx_glr_squim_tutorial_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 246-248

Visualize distorted speech with 20dB SNR

.. GENERATED FROM PYTHON SOURCE LINES 248-252

.. code-block:: default

    plot(WAVEFORM_DISTORTED[0:1], f"Distorted Speech with {snr_dbs[0]}dB SNR")

.. image-sg:: /tutorials/images/sphx_glr_squim_tutorial_003.png
   :alt: Distorted Speech with 20dB SNR
   :srcset: /tutorials/images/sphx_glr_squim_tutorial_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 253-255

Visualize distorted speech with -5dB SNR

.. GENERATED FROM PYTHON SOURCE LINES 255-259

.. code-block:: default

    plot(WAVEFORM_DISTORTED[1:2], f"Distorted Speech with {snr_dbs[1]}dB SNR")

.. image-sg:: /tutorials/images/sphx_glr_squim_tutorial_004.png
   :alt: Distorted Speech with -5dB SNR
   :srcset: /tutorials/images/sphx_glr_squim_tutorial_004.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 260-263

6. Predict Objective Metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 266-268

Get the pre-trained ``SquimObjective`` model.

.. GENERATED FROM PYTHON SOURCE LINES 268-272

.. code-block:: default

    objective_model = SQUIM_OBJECTIVE.get_model()

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Downloading: "https://download.pytorch.org/torchaudio/models/squim_objective_dns2020.pth" to /root/.cache/torch/hub/checkpoints/squim_objective_dns2020.pth