torchaudio.prototype.models
The torchaudio.prototype.models subpackage contains definitions of models for addressing common audio tasks.
Note
For models with pre-trained parameters, please refer to the torchaudio.prototype.pipelines module.
Model definitions are responsible for constructing computation graphs and executing them.
Some models have complex structures and many variations. For such models, factory functions are provided.
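The relationship between a model definition and its factory functions can be illustrated with a minimal, dependency-free sketch. All names below (`ToyEncoder`, `ToyEncoderConfig`, `toy_encoder`, `toy_encoder_base`, `toy_encoder_large`) are hypothetical and are not part of torchaudio. Real model definitions are `torch.nn.Module` subclasses, and plain Python is used here only so that the pattern is easy to run and inspect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyEncoderConfig:
    """Hypothetical stand-in for the many constructor arguments of a model."""

    input_dim: int
    hidden_dim: int
    num_layers: int
    dropout: float = 0.1


class ToyEncoder:
    """Hypothetical model definition: it owns the structure of the model.

    A real torchaudio model would subclass torch.nn.Module and build
    actual layers; here each layer is represented only by its shape.
    """

    def __init__(self, config: ToyEncoderConfig) -> None:
        self.config = config
        # One (in_dim, out_dim) pair per layer.
        dims = [config.input_dim] + [config.hidden_dim] * config.num_layers
        self.layer_shapes = list(zip(dims[:-1], dims[1:]))

    def output_dim(self) -> int:
        """Return the feature dimension produced by the last layer."""
        return self.layer_shapes[-1][1]


def toy_encoder(
    input_dim: int, hidden_dim: int, num_layers: int, dropout: float = 0.1
) -> ToyEncoder:
    """General factory function: exposes every option."""
    return ToyEncoder(ToyEncoderConfig(input_dim, hidden_dim, num_layers, dropout))


def toy_encoder_base(input_dim: int = 80) -> ToyEncoder:
    """Preset factory function: fixes one commonly used variation."""
    return toy_encoder(input_dim, hidden_dim=256, num_layers=4)


def toy_encoder_large(input_dim: int = 80) -> ToyEncoder:
    """Preset factory function: a larger variation of the same definition."""
    return toy_encoder(input_dim, hidden_dim=1024, num_layers=12)
```

With this split, a caller who only needs a standard variation writes `toy_encoder_base()`, while the general factory remains available for custom settings.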
Conformer Wav2Vec2 pre-train model for training from scratch.
Implements the convolution-augmented streaming transformer architecture introduced in Streaming Transformer Transducer based Speech Recognition Using Non-Causal Convolution [Shi et al., 2022].
Generator part of HiFi GAN [Kong et al., 2020].
Speech Quality and Intelligibility Measures (SQUIM) model that predicts objective metric scores for speech enhancement (e.g., STOI, PESQ, and SI-SDR).
Speech Quality and Intelligibility Measures (SQUIM) model that predicts subjective metric scores for speech enhancement (e.g., Mean Opinion Score (MOS)).
Prototype Factory Functions of Beta Models
Some model definitions are in beta, while newer factory functions for them are still in prototype. Please check the “Prototype Factory Functions” section in each model's documentation.
Acoustic model used in wav2vec 2.0 [Baevski et al., 2020].
Recurrent neural network transducer (RNN-T) model.