torchaudio.prototype.pipelines

The pipelines subpackage provides APIs for accessing models with pretrained weights, along with related utilities.

RNN-T Streaming/Non-Streaming ASR

EMFORMER_RNNT_BASE_MUSTC

torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_MUSTC

Pre-trained Emformer-RNNT-based ASR pipeline capable of performing both streaming and non-streaming inference. The underlying model is constructed by torchaudio.models.emformer_rnnt_base() and uses weights trained on the MuST-C release v2.0 dataset [Cattoni et al., 2021] with the training script train.py and num_symbols=501. Please refer to torchaudio.pipelines.RNNTBundle for usage instructions.
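As a rough illustration of the usage that torchaudio.pipelines.RNNTBundle documents, the sketch below runs non-streaming inference with whatever bundle object is passed in. The helper name transcribe and the default beam width of 10 are assumptions made for this example, not part of torchaudio; get_feature_extractor(), get_decoder(), and get_token_processor() are the accessor methods that RNNTBundle provides. The waveform is assumed to be a 1-D tensor sampled at bundle.sample_rate.

```python
def transcribe(bundle, waveform, beam_width=10):
    """Non-streaming transcription with an RNNTBundle-style object.

    `transcribe` is a hypothetical helper, not a torchaudio function.
    """
    feature_extractor = bundle.get_feature_extractor()
    decoder = bundle.get_decoder()
    token_processor = bundle.get_token_processor()

    # Convert the whole waveform into features plus their length.
    features, length = feature_extractor(waveform)
    # Run beam search over the full utterance.
    hypotheses = decoder(features, length, beam_width)
    # Take the first hypothesis; its first element holds the token IDs.
    return token_processor(hypotheses[0][0])
```

With torchaudio installed, a call might look like transcribe(torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_MUSTC, waveform), ideally inside a torch.no_grad() context since no gradients are needed for inference.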

EMFORMER_RNNT_BASE_TEDLIUM3

torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3

Pre-trained Emformer-RNNT-based ASR pipeline capable of performing both streaming and non-streaming inference.

The underlying model is constructed by torchaudio.models.emformer_rnnt_base() and uses weights trained on the TED-LIUM Release 3 dataset with the training script train.py and num_symbols=501.

Please refer to torchaudio.pipelines.RNNTBundle for usage instructions.
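For streaming inference, the RNNTBundle usage notes feed the audio in fixed-size chunks, each covering one segment plus its right context, with sizes derived from the bundle's segment_length, right_context_length, and hop_length attributes. The sketch below shows only that chunking arithmetic. The helper name segment_bounds and the numeric values in the example call are assumptions chosen for illustration; the real values should be read from the bundle.

```python
def segment_bounds(num_samples, segment_length, right_context_length, hop_length):
    """Return (start, end) sample indices for each streaming chunk.

    Hypothetical helper. Each chunk spans one segment plus its right
    context, and consecutive chunks start one segment apart. `end` may
    exceed `num_samples` near the end of the audio, in which case the
    caller is expected to zero-pad the chunk.
    """
    samples_per_segment = segment_length * hop_length
    samples_with_context = samples_per_segment + right_context_length * hop_length
    return [
        (start, start + samples_with_context)
        for start in range(0, num_samples, samples_per_segment)
    ]


# Illustrative values only; read the real ones from the bundle attributes.
bounds = segment_bounds(
    num_samples=6000, segment_length=16, right_context_length=4, hop_length=160
)
print(bounds)  # [(0, 3200), (2560, 5760), (5120, 8320)]
```

Each chunk would then go through the feature extractor returned by get_streaming_feature_extractor() and on to the decoder's infer method, carrying the decoder state and previous hypothesis from one chunk to the next.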

HiFiGAN Vocoder

Interface

HiFiGANVocoderBundle defines the HiFiGAN vocoder pipeline, which transforms mel spectrograms into waveforms.

HiFiGANVocoderBundle

Data class that bundles the information needed to use a pretrained HiFiGANVocoder.
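A minimal sketch of how such a bundle might be used, assuming it exposes get_mel_transform() and get_vocoder() as described on the HiFiGANVocoderBundle reference page. The helper name resynthesize is hypothetical, and the input waveform is assumed to already be sampled at bundle.sample_rate.

```python
def resynthesize(bundle, waveform):
    """Round-trip a waveform through a mel spectrogram and the vocoder.

    `resynthesize` is a hypothetical helper; `bundle` is expected to
    behave like a HiFiGANVocoderBundle.
    """
    mel_transform = bundle.get_mel_transform()
    vocoder = bundle.get_vocoder()

    # Waveform -> mel spectrogram, using the transform matched to the weights.
    mel_spec = mel_transform(waveform)
    # Mel spectrogram -> waveform.
    return vocoder(mel_spec)
```

Using the transform returned by the bundle, rather than a hand-configured one, keeps the spectrogram parameters consistent with those the pretrained vocoder expects.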

Pretrained Models

HIFIGAN_VOCODER_V3_LJSPEECH

HiFiGAN vocoder pipeline, trained on the LJ Speech Dataset [Ito and Johnson, 2017].
