WAV2VEC2_XLSR_300M

torchaudio.pipelines.WAV2VEC2_XLSR_300M

XLS-R model with 300 million parameters, pre-trained on 436,000 hours of unlabeled audio in 128 languages, drawn from multiple datasets (Multilingual LibriSpeech [Pratap et al., 2020], CommonVoice [Ardila et al., 2020], VoxLingua107 [Valk and Alumäe, 2021], BABEL [Gales et al., 2014], and VoxPopuli [Wang et al., 2021]). The model is not fine-tuned.

Originally published by the authors of XLS-R [Babu et al., 2021] under the MIT License and redistributed with the same license. [License, Source]
