Character-based TTS pipeline with Tacotron2 trained on LJSpeech [Ito and Johnson, 2017] for 1,500 epochs and a WaveRNN vocoder trained on 8-bit depth waveforms of LJSpeech [Ito and Johnson, 2017] for 10,000 epochs.

The text processor encodes the input text character by character.
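As a rough illustration of character-by-character encoding, the sketch below maps each character of the input to an integer ID. The symbol table `SYMBOLS` and the function `encode_text` are hypothetical names chosen for this example; the symbols and IDs actually used by the pipeline's text processor may differ.

```python
# Hypothetical symbol table for illustration only; the table used by the
# real text processor may contain different symbols in a different order.
SYMBOLS = "_-!'(),.:;? abcdefghijklmnopqrstuvwxyz"
SYMBOL_TO_ID = {symbol: index for index, symbol in enumerate(SYMBOLS)}


def encode_text(text):
    """Lowercase the text and map each known character to its integer ID.

    Characters that are not in the symbol table are silently dropped
    (an assumption made for this sketch).
    """
    return [SYMBOL_TO_ID[char] for char in text.lower() if char in SYMBOL_TO_ID]


if __name__ == "__main__":
    ids = encode_text("Hello world!")
    print(ids)       # one ID per character
    print(len(ids))  # same length as the input, since every character is known
```

Each character, including spaces and punctuation, becomes one token, so the encoded sequence is as long as the input text minus any unsupported characters.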

You can find the training script here. The following parameters were used: win_length=1100, hop_length=275, n_fft=2048, mel_fmin=40, and mel_fmax=11025.
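To make these parameters more concrete, the following sketch converts them into durations and bin counts. It assumes a sample rate of 22,050 Hz, which is the rate of the LJSpeech recordings but is not stated in the text above; the variable names are chosen for this example only.

```python
# Assumption: LJSpeech audio is sampled at 22,050 Hz (not stated above).
SAMPLE_RATE = 22050

# Parameters quoted in the text.
WIN_LENGTH = 1100
HOP_LENGTH = 275
N_FFT = 2048
MEL_FMIN = 40
MEL_FMAX = 11025

# Duration of one analysis window and of one hop, in milliseconds.
window_ms = 1000 * WIN_LENGTH / SAMPLE_RATE
hop_ms = 1000 * HOP_LENGTH / SAMPLE_RATE

# Number of spectrogram frames produced per second of audio.
frames_per_second = SAMPLE_RATE / HOP_LENGTH

# Number of linear-frequency bins of a one-sided FFT of size N_FFT.
freq_bins = N_FFT // 2 + 1

# Highest representable frequency at this sample rate.
nyquist = SAMPLE_RATE / 2

print(f"window: {window_ms:.2f} ms, hop: {hop_ms:.2f} ms")
print(f"frames per second: {frames_per_second:.2f}")
print(f"FFT bins: {freq_bins}, Nyquist: {nyquist:.0f} Hz")
```

Under that assumption, the window is four hops long, so consecutive frames overlap by 75%, and mel_fmax coincides with the Nyquist frequency.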

You can find the training script here.

Please refer to torchaudio.pipelines.Tacotron2TTSBundle for usage.
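A minimal usage sketch follows, assuming this section describes the bundle exposed as `torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH`. The function name `synthesize` and the output path are hypothetical. Running it requires torch and torchaudio, and the first call downloads the pretrained weights.

```python
def synthesize(text, output_path="output.wav"):
    """Generate speech for `text` and save it as a WAV file.

    Sketch only: assumes the bundle described here is
    torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH.
    """
    # Imported here so that defining the function does not load any models.
    import torch
    import torchaudio

    bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH

    processor = bundle.get_text_processor()  # text -> character IDs
    tacotron2 = bundle.get_tacotron2()       # character IDs -> mel spectrogram
    vocoder = bundle.get_vocoder()           # mel spectrogram -> waveform

    with torch.inference_mode():
        tokens, lengths = processor(text)
        spectrogram, spec_lengths, _ = tacotron2.infer(tokens, lengths)
        waveforms, _ = vocoder(spectrogram, spec_lengths)

    # `waveforms` has shape (batch, time); keep the first item as one channel.
    torchaudio.save(output_path, waveforms[0:1].cpu(), vocoder.sample_rate)
    return output_path


if __name__ == "__main__":
    synthesize("Hello world! T T S stands for Text to Speech!")
```

The three stages mirror the pipeline description: the text processor, Tacotron2, and the WaveRNN vocoder are fetched from the same bundle so that their symbol table and spectrogram parameters match.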

Example - “Hello world! T T S stands for Text to Speech!”

Spectrogram generated by Tacotron2

Example - “The examination and testimony of the experts enabled the Commission to conclude that five shots may have been fired,”

Spectrogram generated by Tacotron2


