# This assumes a compatible version of PyTorch is already installed.
# Install the extra dependencies from a shell first:
#   pip install -q torchaudio omegaconf
import torch

language = 'en'
speaker = 'lj_16khz'
device = torch.device('cpu')
model, symbols, sample_rate, example_text, apply_tts = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                                                      model='silero_tts',
                                                                      language=language,
                                                                      speaker=speaker)
model = model.to(device)  # gpu or cpu
audio = apply_tts(texts=[example_text],
                  model=model,
                  sample_rate=sample_rate,
                  symbols=symbols,
                  device=device)
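
The snippet above stops once `audio` is produced. As a minimal sketch of a possible next step, the code below writes one utterance to a 16-bit mono WAV file using only the Python standard library. It assumes each item in `audio` is a 1-D sequence of float samples in the range [-1, 1]; that layout is an assumption, not something this page states. The `save_wav` helper and the output file name are hypothetical, and a generated sine tone stands in for the model output so the example runs without downloading a model.

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(samples, path, sample_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to the valid range
        frames += struct.pack('<h', int(round(s * 32767)))
    with wave.open(path, 'wb') as f:
        f.setnchannels(1)            # mono
        f.setsampwidth(2)            # 2 bytes = 16 bits per sample
        f.setframerate(sample_rate)
        f.writeframes(bytes(frames))
    return len(samples) / sample_rate  # duration in seconds


# Stand-in for one synthesized utterance: 0.5 s of a 440 Hz tone at 16 kHz.
# With the real model this might be audio[0].tolist() (assumption).
wav_rate = 16000
samples = [0.3 * math.sin(2 * math.pi * 440 * n / wav_rate)
           for n in range(wav_rate // 2)]

# Hypothetical output location in the system temp directory.
wav_path = os.path.join(tempfile.gettempdir(), 'silero_example.wav')
duration = save_wav(samples, wav_path, wav_rate)
print(f'wrote {duration:.2f} s of audio to {wav_path}')
```

If torchaudio is installed, its own save function is probably the more convenient route; the standard-library version is used here only to keep the sketch dependency-free.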

Model Description

Silero Text-To-Speech models provide enterprise-grade TTS in a compact form factor for several commonly spoken languages:

  • One-line usage
  • Natural-sounding speech
  • No GPU or training required
  • Minimalism and lack of dependencies
  • A library of voices in many languages
  • Support for 16 kHz and 8 kHz out of the box
  • High throughput on slow hardware, with decent performance on a single CPU thread

Supported Languages and Formats

As of this page's last update, speakers for the following languages are supported at both 8 kHz and 16 kHz:

  • Russian (6 speakers)
  • English (1 speaker)
  • German (1 speaker)
  • Spanish (1 speaker)
  • French (1 speaker)

For the always up-to-date language list, see the yml file in our repo, which lists all available checkpoints.

Additional Examples and Benchmarks

For additional examples and other model formats please visit this link. For quality and performance benchmarks please see the wiki. These resources will be updated from time to time.
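
For a rough local measurement before consulting the published benchmarks, one common way to express TTS throughput is the real-time factor (RTF): synthesis time divided by the duration of the audio produced, where values below 1 mean faster than real time. Whether the wiki reports this exact metric is not stated here. The sketch below is a minimal illustration; `real_time_factor` and `fake_synthesize` are hypothetical names, and the stand-in synthesizer only returns silence so the example runs without a model.

```python
import time


def real_time_factor(synthesize, text, sample_rate):
    """Time one call of synthesize(text) and return (rtf, audio_seconds)."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    return elapsed / audio_seconds, audio_seconds


def fake_synthesize(text):
    # Hypothetical stand-in: 0.1 s of silence per character at 16 kHz.
    # Replace with a wrapper around the real model call to measure it.
    return [0.0] * (1600 * len(text))


rtf, audio_seconds = real_time_factor(fake_synthesize, 'Hello world', 16000)
print(f'{audio_seconds:.1f} s of audio, RTF = {rtf:.6f}')
```

A single call gives a noisy number; averaging over several texts of different lengths would give a steadier estimate.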

References