torchaudio.models

The torchaudio.models subpackage contains definitions of models for addressing common audio tasks.

For pre-trained models, please refer to the torchaudio.pipelines module.

Model Definitions

Model definitions are responsible for constructing computation graphs and executing them.

Some models have complex structures and variants. For such models, Factory Functions are provided.

Conformer

Conformer architecture introduced in Conformer: Convolution-augmented Transformer for Speech Recognition [Gulati et al., 2020].

ConvTasNet

Conv-TasNet architecture introduced in Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation [Luo and Mesgarani, 2019].

DeepSpeech

DeepSpeech architecture introduced in Deep Speech: Scaling up end-to-end speech recognition [Hannun et al., 2014].

Emformer

Emformer architecture introduced in Emformer: Efficient Memory Transformer Based Acoustic Model for Low Latency Streaming Speech Recognition [Shi et al., 2021].

HDemucs

Hybrid Demucs model from Hybrid Spectrogram and Waveform Source Separation [Défossez, 2021].

HuBERTPretrainModel

HuBERT model used for pretraining in HuBERT [Hsu et al., 2021].

RNNT

Recurrent neural network transducer (RNN-T) model.

RNNTBeamSearch

Beam search decoder for RNN-T model.

Tacotron2

Tacotron2 model from Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions [Shen et al., 2018] based on the implementation from Nvidia Deep Learning Examples.

Wav2Letter

Wav2Letter model architecture from Wav2Letter: an End-to-End ConvNet-based Speech Recognition System [Collobert et al., 2016].

Wav2Vec2Model

Acoustic model used in wav2vec 2.0 [Baevski et al., 2020].

WaveRNN

WaveRNN model from Efficient Neural Audio Synthesis [Kalchbrenner et al., 2018] based on the implementation from fatchord/WaveRNN.

Factory Functions

conv_tasnet_base

Builds non-causal version of ConvTasNet.

emformer_rnnt_model

Builds Emformer-based RNNT.

emformer_rnnt_base

Builds basic version of Emformer-based RNNT.

wav2vec2_model

Builds custom Wav2Vec2Model.

wav2vec2_base

Builds "base" Wav2Vec2Model from wav2vec 2.0 [Baevski et al., 2020].

wav2vec2_large

Builds "large" Wav2Vec2Model from wav2vec 2.0 [Baevski et al., 2020].

wav2vec2_large_lv60k

Builds "large lv-60k" Wav2Vec2Model from wav2vec 2.0 [Baevski et al., 2020].

hubert_base

Builds "base" HuBERT from HuBERT [Hsu et al., 2021].

hubert_large

Builds "large" HuBERT from HuBERT [Hsu et al., 2021].

hubert_xlarge

Builds "extra large" HuBERT from HuBERT [Hsu et al., 2021].

hubert_pretrain_model

Builds custom HuBERTPretrainModel for training from scratch.

hubert_pretrain_base

Builds "base" HuBERTPretrainModel from HuBERT [Hsu et al., 2021] for pretraining.

hubert_pretrain_large

Builds "large" HuBERTPretrainModel from HuBERT [Hsu et al., 2021] for pretraining.

hubert_pretrain_xlarge

Builds "extra large" HuBERTPretrainModel from HuBERT [Hsu et al., 2021] for pretraining.

hdemucs_low

Builds low nfft (1024) version of HDemucs, suitable for sample rates around 8 kHz.

hdemucs_medium

Builds medium nfft (2048) version of HDemucs, suitable for sample rates of 16-32 kHz.

hdemucs_high

Builds high nfft (4096) version of HDemucs, suitable for sample rates of 44.1-48 kHz.

Utility Functions

import_fairseq_model

Builds Wav2Vec2Model from the corresponding model object of fairseq.

import_huggingface_model

Builds Wav2Vec2Model from the corresponding model object of Transformers.
