Silero Text-To-Speech Models

# Assumes a suitable version of PyTorch is already installed.
# Run this line in a shell, not in the Python interpreter:
pip install -q torchaudio omegaconf

import torch

language = 'en'
speaker = 'lj_16khz'
device = torch.device('cpu')
model, symbols, sample_rate, example_text, apply_tts = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                                                      model='silero_tts',
                                                                      language=language,
                                                                      speaker=speaker)
model = model.to(device)  # gpu or cpu
audio = apply_tts(texts=[example_text],
                  model=model,
                  sample_rate=sample_rate,
                  symbols=symbols,
                  device=device)
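
The call above leaves the synthesized speech in memory. As a minimal sketch of writing one result to disk using only the Python standard library, the hypothetical helper below (save_wav is not part of silero-models) stores mono samples as 16-bit PCM. It assumes each element of audio is a 1-D sequence of floats in the range [-1.0, 1.0]; this page does not state that, so check it before relying on it.

```python
import struct
import wave


def save_wav(samples, sample_rate, path):
    """Write mono float samples to a 16-bit PCM WAV file.

    Hypothetical helper for illustration; assumes samples lie in [-1.0, 1.0].
    Values outside that range are clipped.
    """
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))          # clip to the valid range
        frames += struct.pack('<h', int(s * 32767))  # little-endian int16
    with wave.open(path, 'wb') as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes per sample = 16 bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))


# Possible usage with the snippet above (assumes audio[0] is a 1-D float tensor):
# save_wav(audio[0].tolist(), sample_rate, 'example.wav')
```

torchaudio, installed above, also offers its own saving routine; the standard-library version is shown only because it has no extra dependencies.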

Model Description

Silero Text-To-Speech models provide enterprise-grade TTS in a compact form factor for several commonly spoken languages. Key features:

One-line usage
Natural-sounding speech
No GPU or training required
Minimal dependencies
A library of voices in many languages
Support for 16 kHz and 8 kHz out of the box
High throughput on slow hardware. Decent performance on one CPU thread
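
One way to check the throughput claim on your own hardware is to measure the real-time factor: synthesis time divided by the duration of the audio produced, where values below 1.0 mean faster than real time. The sketch below is an illustration, not the benchmark used by the authors. The helper real_time_factor and its parameters are hypothetical, and it assumes the synthesis callable returns a sequence whose length is the number of audio samples.

```python
import time


def real_time_factor(synthesize, sample_rate, repeats=3):
    """Return best-of-N synthesis time divided by audio duration.

    Hypothetical helper: `synthesize` is any zero-argument callable that
    returns a sequence of audio samples (len() gives the sample count).
    """
    best = float('inf')
    n_samples = 0
    for _ in range(repeats):
        start = time.perf_counter()
        samples = synthesize()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
        n_samples = len(samples)
    if n_samples == 0:
        raise ValueError('synthesize() returned no samples')
    audio_seconds = n_samples / sample_rate
    return best / audio_seconds


# Possible usage with the quick-start snippet, restricted to one CPU thread
# (assumes apply_tts returns one 1-D tensor per input text):
# torch.set_num_threads(1)
# rtf = real_time_factor(
#     lambda: apply_tts(texts=[example_text], model=model,
#                       sample_rate=sample_rate, symbols=symbols,
#                       device=device)[0],
#     sample_rate)
```

Taking the best of several runs reduces the effect of one-off costs such as warm-up on the first call.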

Supported Languages and Formats

As of this page update, speakers for the following languages are supported at both 8 kHz and 16 kHz:

Russian (6 speakers)
English (1 speaker)
German (1 speaker)
Spanish (1 speaker)
French (1 speaker)

For the up-to-date language list, please visit our repo and see the yml file, which lists all available checkpoints.

Additional Examples and Benchmarks

For additional examples and other model formats please visit this link. For quality and performance benchmarks please see the wiki. These resources will be updated from time to time.
