WAV2VEC2_XLSR53

torchaudio.pipelines.WAV2VEC2_XLSR53

Wav2vec 2.0 model (“base” architecture), pre-trained on 56,000 hours of unlabeled audio from multiple datasets ( Multilingual LibriSpeech [Pratap et al., 2020], CommonVoice [Ardila et al., 2020] and BABEL [Gales et al., 2014]), not fine-tuned.

Originally published by the authors of Unsupervised Cross-lingual Representation Learning for Speech Recognition [Conneau et al., 2020] under MIT License and redistributed with the same license. [License, Source]

Please refer to torchaudio.pipelines.Wav2Vec2Bundle for the usage.

WAV2VEC2_XLSR53

Docs

Tutorials

Resources