HiFiGAN Vocoder pipeline, trained on The LJ Speech Dataset [Ito and Johnson, 2017].

This pipeline can be used with an external component that generates mel spectrograms from text, for example Tacotron2 (see the examples in HiFiGANVocoderBundle). Although it works with the existing Tacotron2 bundles, for best results Tacotron2 should be retrained using the same data preprocessing pipeline that was used to train HiFiGAN. In particular, the original HiFiGAN implementation generates mel spectrograms from waveforms with a custom method that differs from torchaudio.transforms.MelSpectrogram. We reimplemented this transform as HiFiGANVocoderBundle.get_mel_transform(), ensuring that it is equivalent to the original HiFiGAN code.

The underlying vocoder is constructed by torchaudio.prototype.models.hifigan_vocoder(). The weights are converted from those published with the original paper [Kong et al., 2020] under the MIT License. See links to pre-trained models on GitHub.

Please refer to HiFiGANVocoderBundle for usage instructions.
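
As a rough illustration of the preprocessing point above, the sketch below converts a waveform to a mel spectrogram with the bundle's own transform and passes it back through the vocoder. It assumes a torchaudio release that still ships the torchaudio.prototype module and that the bundle is exposed as torchaudio.prototype.pipelines.HIFIGAN_VOCODER_V3_LJSPEECH; the helper names resynthesize and demo, and the 440 Hz test tone, are hypothetical and exist only for this example.

```python
def resynthesize(waveform, vocoder, mel_transform):
    """Waveform -> mel spectrogram -> waveform.

    Hypothetical helper: it only chains the two callables, so the mel
    spectrogram fed to the vocoder comes from the matching transform.
    """
    mel_spec = mel_transform(waveform)
    return vocoder(mel_spec)


def demo():
    """Round-trip a synthetic tone through the pretrained vocoder.

    Imports are local so that defining this function does not require
    torchaudio; calling it does, and downloads the weights on first use.
    """
    import math

    import torch
    import torchaudio.prototype.pipelines as pipelines

    # Assumed bundle name, see the lead-in above.
    bundle = pipelines.HIFIGAN_VOCODER_V3_LJSPEECH
    vocoder = bundle.get_vocoder()
    mel_transform = bundle.get_mel_transform()

    # One second of a 440 Hz sine at the bundle's sample rate, shaped (1, time).
    t = torch.arange(bundle.sample_rate) / bundle.sample_rate
    waveform = 0.5 * torch.sin(2 * math.pi * 440.0 * t).unsqueeze(0)

    # Inference only, so gradient tracking is disabled.
    with torch.no_grad():
        return resynthesize(waveform, vocoder, mel_transform)
```

Calling demo() returns the regenerated waveform as a tensor. For real recordings, resample the audio to bundle.sample_rate before applying the mel transform; for text-to-speech, replace the mel transform step with the spectrogram produced by Tacotron2.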

