.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/mvdr_tutorial.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_mvdr_tutorial.py:


MVDR with torchaudio
====================

**Author** `Zhaoheng Ni `__

.. GENERATED FROM PYTHON SOURCE LINES 10-24

Overview
--------

This tutorial shows how to apply MVDR (minimum variance distortionless
response) beamforming with `torchaudio `__.

Steps

- Ideal Ratio Masks (IRMs) for speech and noise are generated by dividing the
  clean or noise power spectrogram by the sum of the two, which stands in for
  the mixture power.
- We test all three solutions (``ref_channel``, ``stv_evd``, ``stv_power``)
  of torchaudio's MVDR module.
- We test both single-channel and multi-channel masks for MVDR beamforming.
  The multi-channel mask is averaged along the channel dimension when computing
  the covariance matrices of speech and noise, respectively.

.. GENERATED FROM PYTHON SOURCE LINES 27-45

Preparation
-----------

First, we import the necessary packages and retrieve the data.

The multi-channel audio example is selected from the
`ConferencingSpeech `__ dataset.

The original filename is

   ``SSB07200001\#noise-sound-bible-0038\#7.86_6.16_3.00_3.14_4.84_134.5285_191.7899_0.4735\#15217\#25.16333303751458\#0.2101221178590021.wav``

which was generated with:

- ``SSB07200001.wav`` from `AISHELL-3 `__ (Apache License v.2.0)
- ``noise-sound-bible-0038.wav`` from `MUSAN `__ (Attribution 4.0 International, CC BY 4.0)

.. GENERATED FROM PYTHON SOURCE LINES 45-72

.. code-block:: default

    import os
    import requests

    import torch
    import torchaudio
    import IPython.display as ipd

    torch.random.manual_seed(0)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    print(torch.__version__)
    print(torchaudio.__version__)
    print(device)

    filenames = [
        'mix.wav',
        'reverb_clean.wav',
        'clean.wav',
    ]
    base_url = 'https://download.pytorch.org/torchaudio/tutorial-assets/mvdr'

    # Download each asset into ``_assets`` unless it is already there.
    os.makedirs('_assets', exist_ok=True)
    for filename in filenames:
        path = f'_assets/{filename}'
        if not os.path.exists(path):
            with open(path, 'wb') as file:
                file.write(requests.get(f'{base_url}/{filename}').content)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    1.10.0+cpu
    0.10.0+cpu
    cpu

.. GENERATED FROM PYTHON SOURCE LINES 73-76

Generate the Ideal Ratio Mask (IRM)
-----------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 78-81

Loading audio data
~~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 81-89

.. code-block:: default

    mix, sr = torchaudio.load('_assets/mix.wav')
    reverb_clean, sr2 = torchaudio.load('_assets/reverb_clean.wav')
    clean, sr3 = torchaudio.load('_assets/clean.wav')
    assert sr == sr2

    # The noise is whatever remains after removing the reverberant speech.
    noise = mix - reverb_clean

.. GENERATED FROM PYTHON SOURCE LINES 90-94

.. note::
   The MVDR module requires the ``torch.cdouble`` dtype for the noisy STFT,
   so we convert the waveforms to ``torch.double`` before computing it.

.. GENERATED FROM PYTHON SOURCE LINES 95-101

.. code-block:: default

    mix = mix.to(torch.double)
    noise = noise.to(torch.double)
    clean = clean.to(torch.double)
    reverb_clean = reverb_clean.to(torch.double)

.. GENERATED FROM PYTHON SOURCE LINES 102-105

Compute STFT
~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 105-118

.. code-block:: default

    # power=None keeps the complex-valued STFT instead of a magnitude or power.
    stft = torchaudio.transforms.Spectrogram(
        n_fft=1024,
        hop_length=256,
        power=None,
    )
    istft = torchaudio.transforms.InverseSpectrogram(n_fft=1024, hop_length=256)

    spec_mix = stft(mix)
    spec_clean = stft(clean)
    spec_reverb_clean = stft(reverb_clean)
    spec_noise = stft(noise)

.. 
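
As a quick sanity check on the STFT settings above, the expected spectrogram
shape can be worked out by hand. The sketch below assumes the default
``center=True`` padding and one-sided output of
``torchaudio.transforms.Spectrogram``, under which there are
``n_fft // 2 + 1`` frequency bins and ``1 + num_samples // hop_length`` frames.
The helper name ``stft_shape`` and the 4-second, 16 kHz input are illustrative
assumptions, not values taken from the tutorial.

```python
def stft_shape(num_samples, n_fft=1024, hop_length=256):
    """Return (freq_bins, frames) for a one-sided, centered STFT."""
    # A one-sided transform keeps the non-negative frequencies only.
    freq_bins = n_fft // 2 + 1
    # With centered frames the signal is padded by n_fft // 2 on each side,
    # so one frame starts at every multiple of hop_length, plus the first.
    frames = 1 + num_samples // hop_length
    return freq_bins, frames


# Hypothetical 4-second clip sampled at 16 kHz.
freq_bins, frames = stft_shape(4 * 16000)
print(freq_bins, frames)
```

Under these assumptions, each channel of the waveform maps to a complex tensor
of shape ``(freq_bins, frames)``, which is also the shape the masks take later.

.. 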
GENERATED FROM PYTHON SOURCE LINES 119-126

Generate the Ideal Ratio Mask (IRM)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. note::
   We found that using the mask directly performs better than using its
   square root. This differs slightly from the usual definition of the IRM.

.. GENERATED FROM PYTHON SOURCE LINES 126-137

.. code-block:: default

    def get_irms(spec_clean, spec_noise, spec_mix):
        # ``spec_mix`` is kept in the signature but is not needed: the masks
        # are computed from the clean and noise power spectrograms only.
        power_clean = spec_clean.abs() ** 2
        power_noise = spec_noise.abs() ** 2
        irm_speech = power_clean / (power_clean + power_noise)
        irm_noise = power_noise / (power_clean + power_noise)

        return irm_speech, irm_noise

.. GENERATED FROM PYTHON SOURCE LINES 138-141

.. note::
   We use reverberant clean speech as the target here; you can also use
   dry clean speech instead.

.. GENERATED FROM PYTHON SOURCE LINES 141-144

.. code-block:: default

    irm_speech, irm_noise = get_irms(spec_reverb_clean, spec_noise, spec_mix)

.. GENERATED FROM PYTHON SOURCE LINES 145-148

Apply MVDR
----------

.. GENERATED FROM PYTHON SOURCE LINES 150-153

Apply MVDR beamforming by using multi-channel masks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 153-161

.. code-block:: default

    results_multi = {}
    for solution in ['ref_channel', 'stv_evd', 'stv_power']:
        # multi_mask=True: the masks carry one time-frequency mask per channel.
        mvdr = torchaudio.transforms.MVDR(ref_channel=0, solution=solution, multi_mask=True)
        stft_est = mvdr(spec_mix, irm_speech, irm_noise)
        est = istft(stft_est, length=mix.shape[-1])
        results_multi[solution] = est

.. GENERATED FROM PYTHON SOURCE LINES 162-167

Apply MVDR beamforming by using single-channel masks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We use the first channel as an example. The best channel to select may depend
on the design of the microphone array.

.. GENERATED FROM PYTHON SOURCE LINES 167-175

.. code-block:: default

    results_single = {}
    for solution in ['ref_channel', 'stv_evd', 'stv_power']:
        # multi_mask=False: pass the masks of a single channel (here channel 0).
        mvdr = torchaudio.transforms.MVDR(ref_channel=0, solution=solution, multi_mask=False)
        stft_est = mvdr(spec_mix, irm_speech[0], irm_noise[0])
        est = istft(stft_est, length=mix.shape[-1])
        results_single[solution] = est

.. GENERATED FROM PYTHON SOURCE LINES 176-179

Compute Si-SDR scores
~~~~~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 179-199

.. code-block:: default

    def si_sdr(estimate, reference, epsilon=1e-8):
        # Remove the DC offset from both signals.
        estimate = estimate - estimate.mean()
        reference = reference - reference.mean()
        # Project the estimate onto the reference to get the scaled target.
        reference_pow = reference.pow(2).mean(axis=1, keepdim=True)
        mix_pow = (estimate * reference).mean(axis=1, keepdim=True)
        scale = mix_pow / (reference_pow + epsilon)

        reference = scale * reference
        error = estimate - reference

        reference_pow = reference.pow(2)
        error_pow = error.pow(2)

        reference_pow = reference_pow.mean(axis=1)
        error_pow = error_pow.mean(axis=1)

        # Ratio of target power to residual power, in dB.
        sisdr = 10 * torch.log10(reference_pow) - 10 * torch.log10(error_pow)
        return sisdr.item()

.. GENERATED FROM PYTHON SOURCE LINES 200-203

Results
-------

.. GENERATED FROM PYTHON SOURCE LINES 205-208

Single-channel mask results
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 208-212

.. code-block:: default

    for solution in results_single:
        print(solution+": ", si_sdr(results_single[solution][None,...], reverb_clean[0:1]))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    ref_channel:  15.035907456985868
    stv_evd:  16.563734673832553
    stv_power:  17.820481909929907

.. GENERATED FROM PYTHON SOURCE LINES 213-216

Multi-channel mask results
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 216-220

.. code-block:: default

    for solution in results_multi:
        print(solution+": ", si_sdr(results_multi[solution][None,...], reverb_clean[0:1]))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    ref_channel:  13.177373866143256
    stv_evd:  12.433610809532858
    stv_power:  12.897505397104673

.. 
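
The Si-SDR scores above do not depend on the gain of the estimate, which makes
the metric convenient for comparing beamformer outputs whose levels differ.
The following is a minimal pure-Python sketch of the same formula on short
lists. The function name ``si_sdr_list`` and the toy sine and cosine signals
are assumptions for illustration and are not part of the tutorial.

```python
import math


def si_sdr_list(estimate, reference, epsilon=1e-8):
    """Si-SDR in dB for two equal-length lists of floats."""
    n = len(reference)
    # Remove the mean of each signal.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]
    # Scale the reference so that it best matches the estimate.
    ref_pow = sum(r * r for r in ref) / n
    cross = sum(e * r for e, r in zip(est, ref)) / n
    scale = cross / (ref_pow + epsilon)
    target = [scale * r for r in ref]
    error = [e - t for e, t in zip(est, target)]
    target_pow = sum(t * t for t in target) / n
    error_pow = sum(x * x for x in error) / n
    return 10 * math.log10(target_pow) - 10 * math.log10(error_pow)


# Toy signals: a sine "reference" plus a small cosine disturbance.
reference = [math.sin(0.1 * i) for i in range(200)]
disturbance = [0.1 * math.cos(0.37 * i) for i in range(200)]
estimate = [r + d for r, d in zip(reference, disturbance)]
louder = [3.0 * e for e in estimate]  # same estimate with a different gain

score = si_sdr_list(estimate, reference)
score_louder = si_sdr_list(louder, reference)
print(score, score_louder)
```

The two printed values agree up to floating-point rounding, because scaling the
estimate scales the projected target and the residual by the same factor.

.. 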
GENERATED FROM PYTHON SOURCE LINES 221-224

Original audio
--------------

.. GENERATED FROM PYTHON SOURCE LINES 226-229

Mixture speech
~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 229-232

.. code-block:: default

    ipd.Audio(mix[0], rate=16000)


.. GENERATED FROM PYTHON SOURCE LINES 233-236

Noise
~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 236-239

.. code-block:: default

    ipd.Audio(noise[0], rate=16000)


.. GENERATED FROM PYTHON SOURCE LINES 240-243

Clean speech
~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 243-246

.. code-block:: default

    ipd.Audio(clean[0], rate=16000)


.. GENERATED FROM PYTHON SOURCE LINES 247-250

Enhanced audio
--------------

.. GENERATED FROM PYTHON SOURCE LINES 252-255

Multi-channel mask, ref_channel solution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 255-258

.. code-block:: default

    ipd.Audio(results_multi['ref_channel'], rate=16000)


.. GENERATED FROM PYTHON SOURCE LINES 259-262

Multi-channel mask, stv_evd solution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 262-265

.. code-block:: default

    ipd.Audio(results_multi['stv_evd'], rate=16000)


.. GENERATED FROM PYTHON SOURCE LINES 266-269

Multi-channel mask, stv_power solution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 269-272

.. code-block:: default

    ipd.Audio(results_multi['stv_power'], rate=16000)


.. GENERATED FROM PYTHON SOURCE LINES 273-276

Single-channel mask, ref_channel solution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 276-279

.. code-block:: default

    ipd.Audio(results_single['ref_channel'], rate=16000)


.. GENERATED FROM PYTHON SOURCE LINES 280-283

Single-channel mask, stv_evd solution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 283-286

.. code-block:: default

    ipd.Audio(results_single['stv_evd'], rate=16000)


.. GENERATED FROM PYTHON SOURCE LINES 287-290

Single-channel mask, stv_power solution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 290-292

.. code-block:: default

    ipd.Audio(results_single['stv_power'], rate=16000)


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.841 seconds)


.. _sphx_glr_download_tutorials_mvdr_tutorial.py:


.. only :: html

    .. container:: sphx-glr-footer
        :class: sphx-glr-footer-example

        .. container:: sphx-glr-download sphx-glr-download-python

            :download:`Download Python source code: mvdr_tutorial.py `

        .. container:: sphx-glr-download sphx-glr-download-jupyter

            :download:`Download Jupyter notebook: mvdr_tutorial.ipynb `


.. only:: html

    .. rst-class:: sphx-glr-signature

        `Gallery generated by Sphinx-Gallery `_