TACOTRON2_WAVERNN_PHONE_LJSPEECH

torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH

Phoneme-based TTS pipeline with Tacotron2 trained on LJSpeech [Ito and Johnson, 2017] for 1,500 epochs, and a WaveRNN vocoder trained on 8-bit depth waveforms of LJSpeech [Ito and Johnson, 2017] for 10,000 epochs.
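To illustrate what an 8-bit depth waveform means, here is a minimal sketch that maps a normalized sample in [-1, 1] to one of 256 integer levels and back. The helper names `to_bits` and `from_bits` are hypothetical, and plain linear quantization is an assumption: this page does not say whether the training recipe also applies mu-law companding.

```python
def to_bits(sample: float, bits: int = 8) -> int:
    """Linearly quantize a normalized sample in [-1, 1] to an integer level."""
    max_level = 2 ** bits - 1                  # 255 for 8 bits
    clipped = max(-1.0, min(1.0, sample))      # guard against out-of-range input
    return int(round((clipped + 1.0) * max_level / 2))


def from_bits(level: int, bits: int = 8) -> float:
    """Map an integer level back to a normalized sample in [-1, 1]."""
    max_level = 2 ** bits - 1
    return 2.0 * level / max_level - 1.0


# Round trip of one sample; the reconstruction error stays within one step.
level = to_bits(0.3)
restored = from_bits(level)
```

With 8 bits the vocoder only has to predict one of 256 classes per sample, at the cost of a small quantization error.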

The text processor encodes input text as phonemes. It uses DeepPhonemizer to convert graphemes to phonemes; the model (en_us_cmudict_forward) was trained on CMUDict.

You can find the training script for Tacotron2 here. The following parameters were used: win_length=1100, hop_length=275, n_fft=2048, mel_fmin=40, and mel_fmax=11025.
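As a rough guide to what these spectrogram parameters imply, the sketch below converts them to time and frequency units. The sample rate of 22,050 Hz is an assumption based on the LJSpeech recordings (it is not stated above), the frame count assumes a centered STFT, and `frame_stats` is a hypothetical helper name.

```python
SAMPLE_RATE = 22050   # assumed LJSpeech sample rate; not stated on this page
WIN_LENGTH = 1100     # analysis window, in samples
HOP_LENGTH = 275      # step between consecutive frames, in samples
N_FFT = 2048          # FFT size
MEL_FMAX = 11025      # upper edge of the mel filter bank, in Hz


def frame_stats(num_samples: int) -> dict:
    """Derive frame timing and sizes from the parameters above."""
    return {
        "hop_ms": 1000 * HOP_LENGTH / SAMPLE_RATE,
        "win_ms": 1000 * WIN_LENGTH / SAMPLE_RATE,
        "overlap": 1 - HOP_LENGTH / WIN_LENGTH,
        "freq_bins": N_FFT // 2 + 1,
        # Frame count under the assumption of a centered (padded) STFT.
        "n_frames": 1 + num_samples // HOP_LENGTH,
    }


stats = frame_stats(SAMPLE_RATE)  # one second of audio
```

Under the assumed sample rate, mel_fmax=11025 coincides with the Nyquist frequency, so the mel filter bank spans the full usable band above mel_fmin.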

You can find the training script for WaveRNN here.

Please refer to torchaudio.pipelines.Tacotron2TTSBundle for usage.
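For orientation, a minimal sketch of the usual bundle workflow follows. It assumes a torchaudio release that ships this bundle, that the DeepPhonemizer package is installed, and that the pretrained weights can be downloaded on first use. The wrapper name `synthesize` is hypothetical, and imports are deferred into the function so that defining it does not require torchaudio.

```python
def synthesize(text: str):
    """Return (waveforms, sample_rate) for the given text using this bundle."""
    # Deferred imports: only needed when synthesis is actually requested.
    import torch
    import torchaudio

    bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH

    processor = bundle.get_text_processor()   # text -> phoneme token IDs
    tacotron2 = bundle.get_tacotron2()        # token IDs -> mel spectrogram
    vocoder = bundle.get_vocoder()            # mel spectrogram -> waveform

    with torch.inference_mode():
        tokens, token_lengths = processor(text)
        spec, spec_lengths, _ = tacotron2.infer(tokens, token_lengths)
        waveforms, _ = vocoder(spec, spec_lengths)

    return waveforms, vocoder.sample_rate
```

Calling `synthesize("Hello world! T T S stands for Text to Speech!")` would return a batch of waveforms and their sample rate, which can then be written out with `torchaudio.save`. WaveRNN generates audio sample by sample, so synthesis can be slow on CPU.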

Example - “Hello world! T T S stands for Text to Speech!”

Spectrogram generated by Tacotron2

Example - “The examination and testimony of the experts enabled the Commission to conclude that five shots may have been fired,”

Spectrogram generated by Tacotron2
