HiFiGAN Vocoder pipeline, trained on The LJ Speech Dataset [Ito and Johnson, 2017].
This pipeline can be used with an external component that generates mel spectrograms from text, such as Tacotron2; see examples in
HiFiGANVocoderBundle. Although this works with the existing Tacotron2 bundles, the best results require retraining Tacotron2 with the same data preprocessing pipeline that was used to train HiFiGAN. In particular, the original HiFiGAN implementation generates mel spectrograms from waveforms with a custom method that differs from
torchaudio.transforms.MelSpectrogram. We reimplemented this transform as
HiFiGANVocoderBundle.get_mel_transform() and made sure it is equivalent to the original HiFiGAN code.
The underlying vocoder is constructed by
torchaudio.prototype.models.hifigan_vocoder(). The weights are converted from the ones published with the original paper [Kong et al., 2020] under the MIT License. Links to the pre-trained models are available on GitHub.
Please refer to HiFiGANVocoderBundle for usage instructions.
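
As a rough illustration of how the mel transform and the vocoder described above fit together, the sketch below round-trips a waveform through both. The helper names `resynthesize` and `demo` are hypothetical and not part of torchaudio. The import path `torchaudio.prototype.pipelines.HIFIGAN_VOCODER_V3_LJSPEECH` and the `(batch, time)` input shape are assumptions about the installed release; the prototype module may be absent in some versions, and `get_vocoder()` is expected to download the pre-trained weights on first use.

```python
def resynthesize(bundle, waveform):
    """Encode a waveform to a mel spectrogram, then decode it with the vocoder.

    `bundle` is assumed to expose get_mel_transform() and get_vocoder(),
    as HiFiGANVocoderBundle does. `waveform` is assumed to be already
    resampled to bundle.sample_rate.
    """
    mel_transform = bundle.get_mel_transform()  # HiFiGAN-compatible mel transform
    vocoder = bundle.get_vocoder()              # pre-trained HiFiGAN generator
    mel_spec = mel_transform(waveform)
    return vocoder(mel_spec)


def demo():
    """Hypothetical usage; needs torch and a torchaudio release with the prototype module."""
    import torch
    from torchaudio.prototype.pipelines import HIFIGAN_VOCODER_V3_LJSPEECH as bundle

    # One second of noise in [-1, 1) as a stand-in for real speech,
    # assumed input shape: (batch, time).
    waveform = torch.rand(1, bundle.sample_rate) * 2 - 1
    with torch.inference_mode():
        restored = resynthesize(bundle, waveform)
    print(restored.shape)
```

When the mel spectrogram comes from a text-to-spectrogram model such as Tacotron2 instead, only the `vocoder(mel_spec)` step applies, and the quality caveat above about matching preprocessing still holds.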