VOXPOPULI_ASR_BASE_10K_EN

torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_EN

wav2vec 2.0 model (“base” architecture), pre-trained on 10k hours of unlabeled audio from the VoxPopuli dataset [Wang et al., 2021] (“10k” subset, consisting of 23 languages), and fine-tuned for ASR on 543 hours of transcribed audio from the “en” subset.

Originally published by the authors of VoxPopuli [Wang et al., 2021] under CC BY-NC 4.0 and redistributed with the same license. [License, Source]
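
The following is a minimal sketch of how this bundle might be used for transcription, not a definitive recipe. It assumes the bundle follows the usual Wav2Vec2ASRBundle interface (`get_model()`, `get_labels()`, `sample_rate`), that index 0 of the label tuple is the CTC blank, and that `|` marks word boundaries. The helper names `greedy_ctc_decode` and `transcribe` are hypothetical and introduced only for illustration. Calling `transcribe` downloads the pre-trained weights, so the torch and torchaudio imports are kept inside that function.

```python
from typing import Sequence


def greedy_ctc_decode(
    indices: Sequence[int],
    labels: Sequence[str],
    blank: int = 0,
    word_sep: str = "|",
) -> str:
    """Greedy CTC decoding: collapse repeated indices, drop blanks, map to labels.

    Assumes `blank` is the CTC blank index and `word_sep` marks word boundaries.
    """
    chars = []
    prev = None
    for idx in indices:
        # Emit a label only when the index changes and is not the blank.
        if idx != prev and idx != blank:
            chars.append(labels[idx])
        prev = idx
    return "".join(chars).replace(word_sep, " ").strip()


def transcribe(path: str) -> str:
    """Hypothetical helper: transcribe one audio file with the bundle."""
    # Imported lazily so the decoder above can be used without torchaudio.
    import torch
    import torchaudio

    bundle = torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_EN
    model = bundle.get_model().eval()  # downloads weights on first use

    waveform, sample_rate = torchaudio.load(path)
    # The model expects audio at the bundle's sample rate.
    if sample_rate != bundle.sample_rate:
        waveform = torchaudio.functional.resample(
            waveform, sample_rate, bundle.sample_rate
        )

    with torch.inference_mode():
        # The model returns (emissions, lengths); emissions are per-frame scores.
        emission, _ = model(waveform)

    # Channels are treated as the batch dimension; only the first is decoded.
    indices = emission[0].argmax(dim=-1).tolist()
    return greedy_ctc_decode(indices, bundle.get_labels())
```

Greedy decoding is the simplest option and ignores any language model; a beam-search decoder may give better transcripts. Usage would look like `transcribe("speech.wav")`, where the file path is a placeholder.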