torchaudio.prototype.pipelines
The pipelines subpackage contains APIs for models with pretrained weights, along with related utilities.
RNN-T Streaming/Non-Streaming ASR
EMFORMER_RNNT_BASE_MUSTC
- torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_MUSTC
Pre-trained Emformer-RNNT-based ASR pipeline capable of performing both streaming and non-streaming inference. The underlying model is constructed by torchaudio.models.emformer_rnnt_base() and utilizes weights trained on the MuST-C release v2.0 dataset [Cattoni et al., 2021] using the training script train.py here with num_symbols=501. Please refer to torchaudio.pipelines.RNNTBundle for usage instructions.
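As a rough illustration of the non-streaming path, the sketch below wires together the three components that RNNTBundle is understood to provide: a feature extractor, a beam-search decoder, and a token processor. The helper name transcribe_non_streaming and the default beam width of 10 are illustrative choices, not part of torchaudio. The helper takes the components as arguments, so it assumes nothing about how the bundle or the audio is loaded.

```python
def transcribe_non_streaming(feature_extractor, decoder, token_processor,
                             waveform, beam_width=10):
    """Transcribe a whole utterance in a single pass.

    Assumed call conventions (check the RNNTBundle usage instructions):
      feature_extractor(waveform)           -> (features, length)
      decoder(features, length, beam_width) -> hypotheses, best first
      token_processor(tokens)               -> str, with tokens = hypothesis[0]
    """
    features, length = feature_extractor(waveform)
    hypotheses = decoder(features, length, beam_width)
    best_tokens = hypotheses[0][0]  # token IDs of the top-ranked hypothesis
    return token_processor(best_tokens)


# Hypothetical usage with a real bundle (requires torch and torchaudio):
#   import torch, torchaudio
#   bundle = torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_MUSTC
#   waveform, sample_rate = torchaudio.load("speech.wav")  # placeholder path
#   # The audio is assumed to already be at bundle.sample_rate.
#   with torch.no_grad():
#       text = transcribe_non_streaming(
#           bundle.get_feature_extractor(),
#           bundle.get_decoder(),
#           bundle.get_token_processor(),
#           waveform.squeeze(),
#       )
```

Passing the components in, rather than the bundle itself, keeps the helper easy to exercise with stand-in callables before any weights are downloaded.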
EMFORMER_RNNT_BASE_TEDLIUM3
- torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3
Pre-trained Emformer-RNNT-based ASR pipeline capable of performing both streaming and non-streaming inference. The underlying model is constructed by torchaudio.models.emformer_rnnt_base() and utilizes weights trained on the TED-LIUM Release 3 dataset using the training script train.py here with num_symbols=501. Please refer to torchaudio.pipelines.RNNTBundle for usage instructions.
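For streaming inference, audio is usually fed to the model in fixed-size segments that overlap by a short right-context lookahead. The sketch below computes the sample boundaries of such segments. It assumes the bundle reports segment_length and right_context_length in feature frames and hop_length in samples per frame, and that the window advances by one segment at a time. The helper name streaming_segment_bounds is hypothetical.

```python
def streaming_segment_bounds(num_samples, segment_length,
                             right_context_length, hop_length):
    """Yield (start, end, pad) sample indices for overlapping segments.

    Each full segment spans (segment_length + right_context_length) * hop_length
    samples, and the window advances by segment_length * hop_length samples.
    `pad` is the number of zero samples needed to bring a short trailing
    segment up to the full window size.
    """
    step = segment_length * hop_length
    window = step + right_context_length * hop_length
    for start in range(0, num_samples, step):
        end = min(start + window, num_samples)
        yield start, end, window - (end - start)


# Hypothetical streaming loop (requires torch and torchaudio; the call
# conventions are assumptions and may differ between torchaudio versions):
#   extractor = bundle.get_streaming_feature_extractor()
#   decoder = bundle.get_decoder()
#   token_processor = bundle.get_token_processor()
#   state, hypothesis = None, None
#   for start, end, pad in streaming_segment_bounds(
#           len(waveform), bundle.segment_length,
#           bundle.right_context_length, bundle.hop_length):
#       segment = torch.nn.functional.pad(waveform[start:end], (0, pad))
#       with torch.no_grad():
#           features, length = extractor(segment)
#           hypotheses, state = decoder.infer(
#               features, length, 10, state=state, hypothesis=hypothesis)
#       hypothesis = hypotheses[0]
#       print(token_processor(hypothesis[0]))
```

Carrying state and hypothesis from one call to the next is what lets the decoder continue the transcript across segments instead of starting over.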
HiFiGAN Vocoder
Interface
HiFiGANVocoderBundle defines a HiFiGAN vocoder pipeline capable of transforming mel spectrograms into waveforms.
Data class that bundles the information needed to use a pretrained HiFiGAN vocoder.
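To make the mel-spectrogram-to-waveform flow concrete, the following minimal sketch round-trips audio through a bundle. It assumes the bundle exposes get_mel_transform() and get_vocoder(), as HiFiGANVocoderBundle is understood to do, and that the input is already sampled at the bundle's sample rate. The helper name resynthesize is hypothetical.

```python
def resynthesize(bundle, waveform):
    """Convert a waveform to a mel spectrogram and back with the vocoder.

    Assumes `bundle` provides get_mel_transform() and get_vocoder(); both
    method names are taken on trust from the HiFiGANVocoderBundle interface.
    """
    mel_transform = bundle.get_mel_transform()  # waveform -> mel spectrogram
    vocoder = bundle.get_vocoder()              # mel spectrogram -> waveform
    mel_spec = mel_transform(waveform)
    return vocoder(mel_spec)


# Hypothetical usage (requires torch and torchaudio):
#   import torch, torchaudio
#   bundle = ...  # a HiFiGANVocoderBundle from torchaudio.prototype.pipelines
#   waveform, sample_rate = torchaudio.load("speech.wav")  # placeholder path
#   with torch.no_grad():
#       regenerated = resynthesize(bundle, waveform)
```

Using the mel transform supplied by the bundle, rather than a hand-configured one, helps keep the spectrogram parameters consistent with those the vocoder was trained on.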
Pretrained Models
HiFiGAN Vocoder pipeline, trained on The LJ Speech Dataset [Ito and Johnson, 2017].