
torchaudio.models

The torchaudio.models subpackage contains definitions of models for addressing common audio tasks.

Note

For models with pre-trained parameters, refer to the torchaudio.pipelines module.

Model definitions are responsible for constructing computation graphs and executing them.

Some models have complex structures and come in several variants. For such models, factory functions are provided.

Conformer

Conformer architecture introduced in Conformer: Convolution-augmented Transformer for Speech Recognition [Gulati et al., 2020].

ConvTasNet

Conv-TasNet architecture introduced in Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation [Luo and Mesgarani, 2019].

DeepSpeech

DeepSpeech architecture introduced in Deep Speech: Scaling up end-to-end speech recognition [Hannun et al., 2014].

Emformer

Emformer architecture introduced in Emformer: Efficient Memory Transformer Based Acoustic Model for Low Latency Streaming Speech Recognition [Shi et al., 2021].

HDemucs

Hybrid Demucs model from Hybrid Spectrogram and Waveform Source Separation [Défossez, 2021].

HuBERTPretrainModel

HuBERT model used for pretraining in HuBERT [Hsu et al., 2021].

RNNT

Recurrent neural network transducer (RNN-T) model.

RNNTBeamSearch

Beam search decoder for RNN-T model.

SquimObjective

Speech Quality and Intelligibility Measures (SQUIM) model that predicts objective metric scores for speech enhancement (e.g., STOI, PESQ, and SI-SDR).

SquimSubjective

Speech Quality and Intelligibility Measures (SQUIM) model that predicts subjective metric scores for speech enhancement (e.g., Mean Opinion Score (MOS)).

Tacotron2

Tacotron2 model from Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions [Shen et al., 2018] based on the implementation from Nvidia Deep Learning Examples.

Wav2Letter

Wav2Letter model architecture from Wav2Letter: an End-to-End ConvNet-based Speech Recognition System [Collobert et al., 2016].

Wav2Vec2Model

Acoustic model used in wav2vec 2.0 [Baevski et al., 2020].

WaveRNN

WaveRNN model from Efficient Neural Audio Synthesis [Kalchbrenner et al., 2018] based on the implementation from fatchord/WaveRNN.
