TACOTRON2_GRIFFINLIM_PHONE_LJSPEECH

torchaudio.pipelines.TACOTRON2_GRIFFINLIM_PHONE_LJSPEECH

Phoneme-based TTS pipeline with Tacotron2 trained on LJSpeech [Ito and Johnson, 2017] for 1,500 epochs, with GriffinLim as the vocoder.

The text processor encodes the input text as phonemes. It uses DeepPhonemizer to convert graphemes to phonemes. The DeepPhonemizer model (en_us_cmudict_forward) was trained on CMUDict.

You can find the training script here. The text processor is set to “english_phonemes”.

Please refer to torchaudio.pipelines.Tacotron2TTSBundle for usage.
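As a rough illustration of how the three stages of this bundle fit together (text processor, Tacotron2, vocoder), the following is a minimal sketch rather than the canonical usage. The helper names `synthesize` and `main` and the output filename are hypothetical. The sketch assumes torchaudio is installed with DeepPhonemizer available, and that the pretrained weights can be downloaded on first use.

```python
def synthesize(bundle, text):
    """Run text -> phoneme IDs -> mel spectrogram -> waveform.

    `bundle` is expected to behave like a Tacotron2TTSBundle.
    This helper is illustrative and not part of torchaudio.
    """
    # For this bundle, the text processor relies on DeepPhonemizer.
    processor = bundle.get_text_processor()
    tacotron2 = bundle.get_tacotron2()
    # For this bundle, the vocoder is GriffinLim.
    vocoder = bundle.get_vocoder()

    # Encode the text as a batch of phoneme token IDs plus their lengths.
    tokens, lengths = processor(text)
    # Predict the spectrogram; the third return value (alignments) is unused here.
    spec, spec_lengths, _ = tacotron2.infer(tokens, lengths)
    # Convert the spectrogram to waveforms.
    waveforms, wave_lengths = vocoder(spec, spec_lengths)
    return waveforms, wave_lengths, vocoder.sample_rate


def main():
    # Imported here so that defining synthesize() does not require torchaudio.
    import torch
    import torchaudio

    bundle = torchaudio.pipelines.TACOTRON2_GRIFFINLIM_PHONE_LJSPEECH
    with torch.inference_mode():
        waveforms, _, sample_rate = synthesize(
            bundle, "Hello world! T T S stands for Text to Speech!"
        )
    # waveforms has shape (batch, time); save the first utterance.
    # "output.wav" is a placeholder filename.
    torchaudio.save("output.wav", waveforms[0:1].cpu(), sample_rate)
```

Calling `main()` would download the weights and write a WAV file. `synthesize` takes the bundle as an argument, so another `Tacotron2TTSBundle` could be swapped in without changing the helper.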

Example - “Hello world! T T S stands for Text to Speech!”

Example - “The examination and testimony of the experts enabled the Commission to conclude that five shots may have been fired,”
