.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/hybrid_demucs_tutorial.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_tutorials_hybrid_demucs_tutorial.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_hybrid_demucs_tutorial.py:

Music Source Separation with Hybrid Demucs
==========================================

**Author**: Sean Kim

This tutorial shows how to use the Hybrid Demucs model to perform music
source separation.

.. GENERATED FROM PYTHON SOURCE LINES 13-32

1. Overview
-----------

Performing music separation consists of the following steps:

1. Build the Hybrid Demucs pipeline.
2. Format the waveform into chunks of the expected size, then loop
   through the chunks (with overlap) and feed them into the pipeline.
3. Collect the output chunks and combine them according to the way they
   overlap.

The Hybrid Demucs model [`Défossez, 2021
<https://arxiv.org/abs/2111.03600>`__] is an evolution of the
`Demucs <https://github.com/facebookresearch/demucs>`__ model, a
waveform-based model that separates music into its constituent sources,
such as vocals, bass, and drums. Hybrid Demucs adds a spectrogram branch
that learns in the frequency domain, alongside the original time-domain
convolutions.

.. GENERATED FROM PYTHON SOURCE LINES 35-41

2. Preparation
--------------

First, we install the necessary dependencies. The first requirements are
``torch`` and ``torchaudio``.

.. GENERATED FROM PYTHON SOURCE LINES 41-50

.. code-block:: default

    import torch
    import torchaudio

    print(torch.__version__)
    print(torchaudio.__version__)

    import matplotlib.pyplot as plt

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    2.2.0
    2.2.0

.. GENERATED FROM PYTHON SOURCE LINES 51-55

In addition to ``torchaudio``, ``mir_eval`` is required to perform
signal-to-distortion ratio (SDR) calculations. To install ``mir_eval``,
please use ``pip3 install mir_eval``.

.. GENERATED FROM PYTHON SOURCE LINES 55-61

.. code-block:: default

    from IPython.display import Audio
    from mir_eval import separation
    from torchaudio.pipelines import HDEMUCS_HIGH_MUSDB_PLUS
    from torchaudio.utils import download_asset

.. GENERATED FROM PYTHON SOURCE LINES 62-72

3. Construct the pipeline
-------------------------

Pre-trained model weights and related pipeline components are bundled as
:py:func:`torchaudio.pipelines.HDEMUCS_HIGH_MUSDB_PLUS`. This is a
:py:class:`torchaudio.models.HDemucs` model trained on
`MUSDB18-HQ <https://zenodo.org/record/3338373>`__ and additional
internal extra training data. This specific model is suited for higher
sample rates, around 44.1 kHz, and uses an ``nfft`` value of 4096 with a
depth of 6 in the model implementation.

.. GENERATED FROM PYTHON SOURCE LINES 72-85

.. code-block:: default

    bundle = HDEMUCS_HIGH_MUSDB_PLUS

    model = bundle.get_model()

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    model.to(device)

    sample_rate = bundle.sample_rate

    print(f"Sample rate: {sample_rate}")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Sample rate: 44100

4. Configure the application function
-------------------------------------

Because the model requires a large amount of memory, it is difficult to
apply it to an entire song at once. Instead, we split the mixture into
overlapping chunks, run the model on each chunk, and add the results
back together. Since the model can produce inaccurate or undesired
sounds at the edges of a chunk, consecutive chunks overlap and are
blended with a linear fade-in and fade-out, which keeps the volume
constant throughout. The chunked separation helper is sketched after
the plotting utility below.

We also define a small utility that plots the spectrogram of a source:

.. code-block:: default

    def plot_spectrogram(stft, title="Spectrogram"):
        magnitude = stft.abs()
        spectrogram = 20 * torch.log10(magnitude + 1e-8).numpy()
        _, axis = plt.subplots(1, 1)
        axis.imshow(spectrogram, cmap="viridis", vmin=-60, vmax=0, origin="lower", aspect="auto")
        axis.set_title(title)
        plt.tight_layout()
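The helper below is a minimal sketch of this overlap-add scheme, built on
:py:class:`torchaudio.transforms.Fade`. It matches the
``separate_sources`` call used in the next section, but the exact chunk
bookkeeping should be treated as illustrative rather than canonical.

.. code-block:: default

    from torchaudio.transforms import Fade


    def separate_sources(model, mix, segment=10.0, overlap=0.1, device=None):
        """Apply ``model`` to ``mix`` chunk by chunk, cross-fading the overlaps.

        Args:
            mix (torch.Tensor): mixture of shape (batch, channels, frames)
            segment (float): chunk length, in seconds
            overlap (float): overlap between consecutive chunks, in seconds
            device (torch.device or str, optional): device to run the model on
        """
        if device is None:
            device = mix.device
        else:
            device = torch.device(device)

        batch, channels, length = mix.shape

        chunk_len = int(sample_rate * segment * (1 + overlap))
        start = 0
        end = chunk_len
        overlap_frames = overlap * sample_rate
        fade = Fade(fade_in_len=0, fade_out_len=int(overlap_frames), fade_shape="linear")

        # One output track per source, same length as the input mixture.
        final = torch.zeros(batch, len(model.sources), channels, length, device=device)

        while start < length - overlap_frames:
            chunk = mix[:, :, start:end]
            with torch.no_grad():
                out = model.forward(chunk)
            out = fade(out)
            final[:, :, :, start:end] += out
            if start == 0:
                # The first chunk has no fade-in; enable it for all later chunks.
                fade.fade_in_len = int(overlap_frames)
                start += int(chunk_len - overlap_frames)
            else:
                start += chunk_len
            end += chunk_len
            if end >= length:
                # The last chunk should not fade out.
                fade.fade_out_len = 0
        return final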
.. GENERATED FROM PYTHON SOURCE LINES 172-187

5. Run Model
------------

Finally, we run the model and collect the separated source tracks.

As a test song, we will use "A Classic Education" by NightOwl from
MedleyDB (Creative Commons BY-NC-SA 4.0). It is also located in the
`MUSDB18-HQ <https://zenodo.org/record/3338373>`__ dataset, within the
``train`` sources.

To test with a different song, change the variable names and URLs below,
along with the parameters, to exercise the song separator in different
ways.

.. GENERATED FROM PYTHON SOURCE LINES 187-217

.. code-block:: default

    # We download the audio file from our storage. Feel free to download another file and use audio from a specific path
    SAMPLE_SONG = download_asset("tutorial-assets/hdemucs_mix.wav")
    waveform, sample_rate = torchaudio.load(SAMPLE_SONG)  # replace SAMPLE_SONG with desired path for different song
    waveform = waveform.to(device)
    mixture = waveform

    # parameters
    segment: int = 10
    overlap = 0.1

    print("Separating track")

    ref = waveform.mean(0)
    waveform = (waveform - ref.mean()) / ref.std()  # normalization

    sources = separate_sources(
        model,
        waveform[None],
        device=device,
        segment=segment,
        overlap=overlap,
    )[0]
    sources = sources * ref.std() + ref.mean()

    sources_list = model.sources
    sources = list(sources)

    audios = dict(zip(sources_list, sources))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Separating track
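6. Audio Segmenting and Processing
----------------------------------

With the full-length sources in hand, we evaluate each stem by comparing
a short excerpt of the prediction against the matching ground-truth stem,
reporting the SDR computed by ``mir_eval`` and plotting a spectrogram.
The setup below is a minimal sketch: the ``N_FFT``/``N_HOP`` values, the
150-155 second window, and the ``tutorial-assets/hdemucs_*_segment.wav``
asset names are assumptions chosen to line up with the reference clips
scored in the following cells.

.. code-block:: default

    # Spectrogram transform used to visualize each source
    # (N_FFT and N_HOP are assumed values).
    N_FFT = 4096
    N_HOP = 4
    stft = torchaudio.transforms.Spectrogram(
        n_fft=N_FFT,
        hop_length=N_HOP,
        power=None,
    )


    def output_results(original_source: torch.Tensor, predicted_source: torch.Tensor, source: str):
        # Report the mean SDR against the reference stem and plot the
        # spectrogram of the predicted excerpt.
        print(
            "SDR score is:",
            separation.bss_eval_sources(
                original_source.detach().numpy(),
                predicted_source.detach().numpy(),
            )[0].mean(),
        )
        plot_spectrogram(stft(predicted_source)[0], f"Spectrogram - {source}")
        return Audio(predicted_source, rate=sample_rate)


    # A 5-second window of the song, expressed in frames.
    segment_start = 150
    segment_end = 155
    frame_start = segment_start * sample_rate
    frame_end = segment_end * sample_rate

    # Ground-truth stems for the same window (assumed asset names).
    drums_original = download_asset("tutorial-assets/hdemucs_drums_segment.wav")
    bass_original = download_asset("tutorial-assets/hdemucs_bass_segment.wav")
    vocals_original = download_asset("tutorial-assets/hdemucs_vocals_segment.wav")
    other_original = download_asset("tutorial-assets/hdemucs_other_segment.wav")

    drums, _ = torchaudio.load(drums_original)
    bass, _ = torchaudio.load(bass_original)
    vocals, _ = torchaudio.load(vocals_original)
    other, _ = torchaudio.load(other_original)

    # Matching excerpts of the model's estimates, moved to the CPU.
    drums_spec = audios["drums"][:, frame_start:frame_end].cpu()
    bass_spec = audios["bass"][:, frame_start:frame_end].cpu()
    vocals_spec = audios["vocals"][:, frame_start:frame_end].cpu()
    other_spec = audios["other"][:, frame_start:frame_end].cpu()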

.. GENERATED FROM PYTHON SOURCE LINES 300-302

Drums SDR, Spectrogram, and Audio

.. GENERATED FROM PYTHON SOURCE LINES 302-306

.. code-block:: default

    # Drums Clip
    output_results(drums, drums_spec, "drums")

.. image-sg:: /tutorials/images/sphx_glr_hybrid_demucs_tutorial_002.png
   :alt: Spectrogram - drums
   :srcset: /tutorials/images/sphx_glr_hybrid_demucs_tutorial_002.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    SDR score is: 4.964103512281138
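A note on interpreting these scores: ``mir_eval``'s ``bss_eval_sources``
follows the BSS Eval definition of SDR, which decomposes the estimate
into a target component plus interference, noise, and artifact error
terms:

.. math::

   \text{SDR} = 10 \log_{10}
   \frac{\lVert s_{\text{target}} \rVert^2}
        {\lVert e_{\text{interf}} + e_{\text{noise}} + e_{\text{artif}} \rVert^2}

It is a log-ratio in decibels, so higher is better, and a few dB of
difference between stems is significant.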


.. GENERATED FROM PYTHON SOURCE LINES 307-309

Bass SDR, Spectrogram, and Audio

.. GENERATED FROM PYTHON SOURCE LINES 309-313

.. code-block:: default

    # Bass Clip
    output_results(bass, bass_spec, "bass")

.. image-sg:: /tutorials/images/sphx_glr_hybrid_demucs_tutorial_003.png
   :alt: Spectrogram - bass
   :srcset: /tutorials/images/sphx_glr_hybrid_demucs_tutorial_003.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    SDR score is: 18.905954431001057


.. GENERATED FROM PYTHON SOURCE LINES 314-316

Vocals SDR, Spectrogram, and Audio

.. GENERATED FROM PYTHON SOURCE LINES 316-320

.. code-block:: default

    # Vocals Audio
    output_results(vocals, vocals_spec, "vocals")

.. image-sg:: /tutorials/images/sphx_glr_hybrid_demucs_tutorial_004.png
   :alt: Spectrogram - vocals
   :srcset: /tutorials/images/sphx_glr_hybrid_demucs_tutorial_004.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    SDR score is: 8.792216836345062


.. GENERATED FROM PYTHON SOURCE LINES 321-323

Other SDR, Spectrogram, and Audio

.. GENERATED FROM PYTHON SOURCE LINES 323-327

.. code-block:: default

    # Other Clip
    output_results(other, other_spec, "other")

.. image-sg:: /tutorials/images/sphx_glr_hybrid_demucs_tutorial_005.png
   :alt: Spectrogram - other
   :srcset: /tutorials/images/sphx_glr_hybrid_demucs_tutorial_005.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    SDR score is: 8.866916703002428
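So far the separated tracks live only in the ``audios`` dictionary. If
you want to store them as files, here is a minimal sketch using
:py:func:`torchaudio.save`; the ``separated_sources`` directory name is
an arbitrary choice.

.. code-block:: default

    import os

    output_dir = "separated_sources"  # hypothetical output directory
    os.makedirs(output_dir, exist_ok=True)

    for name, source in audios.items():
        # torchaudio.save expects a (channels, frames) tensor on the CPU.
        torchaudio.save(
            os.path.join(output_dir, f"{name}.wav"),
            source.cpu(),
            sample_rate=sample_rate,
        )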


.. GENERATED FROM PYTHON SOURCE LINES 328-349

.. code-block:: default

    # Optionally, the full audios can be heard by running the next 5
    # cells. They will take a bit longer to load; to run them, simply
    # uncomment the ``Audio`` cell for the respective track to produce
    # the audio for the full song.

    # Full Audio
    # Audio(mixture, rate=sample_rate)

    # Drums Audio
    # Audio(audios["drums"], rate=sample_rate)

    # Bass Audio
    # Audio(audios["bass"], rate=sample_rate)

    # Vocals Audio
    # Audio(audios["vocals"], rate=sample_rate)

    # Other Audio
    # Audio(audios["other"], rate=sample_rate)

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 22.977 seconds)

.. _sphx_glr_download_tutorials_hybrid_demucs_tutorial.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: hybrid_demucs_tutorial.py <hybrid_demucs_tutorial.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: hybrid_demucs_tutorial.ipynb <hybrid_demucs_tutorial.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_