torchaudio.models
The torchaudio.models subpackage contains definitions of models for addressing common audio tasks.
Note
For models with pre-trained parameters, please refer to the torchaudio.pipelines module.
Model definitions are responsible for constructing computation graphs and executing them.
Some models have a complex structure and several variants. For such models, factory functions are provided.
Conformer architecture introduced in Conformer: Convolution-augmented Transformer for Speech Recognition [Gulati et al., 2020].
Conv-TasNet architecture introduced in Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation [Luo and Mesgarani, 2019].
DeepSpeech architecture introduced in Deep Speech: Scaling up end-to-end speech recognition [Hannun et al., 2014].
Emformer architecture introduced in Emformer: Efficient Memory Transformer Based Acoustic Model for Low Latency Streaming Speech Recognition [Shi et al., 2021].
Hybrid Demucs model from Hybrid Spectrogram and Waveform Source Separation [Défossez, 2021].
HuBERT model used for pretraining in HuBERT [Hsu et al., 2021].
Recurrent neural network transducer (RNN-T) model.
Beam search decoder for RNN-T model.
Tacotron2 model from Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions [Shen et al., 2018] based on the implementation from Nvidia Deep Learning Examples.
Wav2Letter model architecture from Wav2Letter: an End-to-End ConvNet-based Speech Recognition System [Collobert et al., 2016].
Acoustic model used in wav2vec 2.0 [Baevski et al., 2020].
WaveRNN model from Efficient Neural Audio Synthesis [Kalchbrenner et al., 2018] based on the implementation from fatchord/WaveRNN.