Character-based TTS pipeline with Tacotron2 trained on LJSpeech [Ito and Johnson, 2017] for 1,500 epochs and a WaveRNN vocoder trained on 8-bit-depth waveforms of LJSpeech [Ito and Johnson, 2017] for 10,000 epochs.
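A waveform with 8-bit depth means each target sample takes one of 2^8 = 256 discrete values. One common way to produce such targets is μ-law companding followed by uniform quantization, which torchaudio exposes as torchaudio.functional.mu_law_encoding and mu_law_decoding. Whether this vocoder's training used exactly this scheme is an assumption here. The pure-Python sketch below mirrors that formula for a single sample in [-1, 1]; the function names are hypothetical.

```python
import math


def mu_law_encode(x: float, quantization_channels: int = 256) -> int:
    """Compand a sample in [-1, 1] with mu-law, then quantize it to an int in [0, channels - 1]."""
    mu = quantization_channels - 1
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    companded = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Map [-1, 1] to [0, mu] and round to the nearest level (value is always >= 0.5).
    return int((companded + 1) / 2 * mu + 0.5)


def mu_law_decode(q: int, quantization_channels: int = 256) -> float:
    """Invert mu_law_encode, up to quantization error."""
    mu = quantization_channels - 1
    y = q / mu * 2 - 1  # back to the companded range [-1, 1]
    return math.copysign((math.exp(abs(y) * math.log1p(mu)) - 1) / mu, y)
```

Companding spends more of the 256 levels on quiet samples near zero, where speech spends most of its time, so 8 bits lose less perceptual detail than uniform quantization would.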

The text processor encodes the input text character by character.
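As an illustration of character-by-character encoding, the sketch below lowercases the text, maps each supported character to an integer index, drops anything outside the symbol table, and right-pads a batch. The symbol table `SYMBOLS` (with `_` reserved as padding at index 0) and the helper names are hypothetical; the bundle's actual table may differ.

```python
# Hypothetical symbol table: index 0 ("_") is reserved for padding.
SYMBOLS = "_-!'(),.:;? abcdefghijklmnopqrstuvwxyz"
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}


def encode(text: str) -> list[int]:
    """Map each supported character of the lowercased text to its index; drop the rest."""
    return [SYMBOL_TO_ID[c] for c in text.lower() if c in SYMBOL_TO_ID]


def encode_batch(texts: list[str]) -> tuple[list[list[int]], list[int]]:
    """Encode several texts and right-pad them with the pad index 0."""
    encoded = [encode(t) for t in texts]
    lengths = [len(e) for e in encoded]
    max_len = max(lengths, default=0)
    padded = [e + [0] * (max_len - len(e)) for e in encoded]
    return padded, lengths
```

For example, `encode_batch(["Hi", "Hello"])` returns `([[19, 20, 0, 0, 0], [19, 16, 23, 23, 26]], [2, 5])`. With a table like this one, digits are silently dropped, so numbers would need to be spelled out before encoding.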

You can find the Tacotron2 training script here. The following parameters were used: win_length=1100, hop_length=275, n_fft=2048, mel_fmin=40, and mel_fmax=11025.
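To make these sample-domain settings concrete, the sketch below converts them into milliseconds and frequency bins. It assumes LJSpeech's 22,050 Hz sample rate, at which mel_fmax=11025 sits exactly at the Nyquist frequency. The helper names are hypothetical, and the frame count assumes a centered STFT (the default padding of torch.stft).

```python
# Assumption: LJSpeech audio is sampled at 22,050 Hz.
SAMPLE_RATE = 22050

# STFT / mel parameters listed for the Tacotron2 training run.
PARAMS = {
    "win_length": 1100,
    "hop_length": 275,
    "n_fft": 2048,
    "mel_fmin": 40,
    "mel_fmax": 11025,
}


def describe_stft(params: dict, sample_rate: int = SAMPLE_RATE) -> dict:
    """Express sample-domain STFT settings in time and frequency units."""
    nyquist = sample_rate / 2
    if params["win_length"] > params["n_fft"]:
        raise ValueError("win_length must not exceed n_fft")
    if not 0 <= params["mel_fmin"] < params["mel_fmax"] <= nyquist:
        raise ValueError("mel band must lie within [0, Nyquist]")
    return {
        "hop_ms": 1000 * params["hop_length"] / sample_rate,
        "win_ms": 1000 * params["win_length"] / sample_rate,
        "n_freq_bins": params["n_fft"] // 2 + 1,  # one-sided spectrum
        "nyquist_hz": nyquist,
    }


def num_frames(num_samples: int, hop_length: int) -> int:
    """Frame count for a centered STFT with an even n_fft (n_fft // 2 padding on each side)."""
    return 1 + num_samples // hop_length
```

With these values each hop is about 12.5 ms and each analysis window about 49.9 ms, the zero-padded 2048-point FFT yields 1025 frequency bins, and one second of audio (`num_frames(22050, 275)`) gives 81 spectrogram frames.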

You can find the WaveRNN training script here.

Refer to torchaudio.pipelines.Tacotron2TTSBundle for usage.

Example - “Hello world! T T S stands for Text to Speech!”

Spectrogram generated by Tacotron2

Example - “The examination and testimony of the experts enabled the Commission to conclude that five shots may have been fired,”

Spectrogram generated by Tacotron2
