VOXPOPULI_ASR_BASE_10K_ES

torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_ES

wav2vec 2.0 model (“base” architecture), pre-trained on 10k hours of unlabeled audio from the VoxPopuli dataset [Wang et al., 2021] (“10k” subset, consisting of 23 languages), and fine-tuned for ASR on 166 hours of transcribed audio from the “es” subset.

Originally published by the authors of VoxPopuli [Wang et al., 2021] under CC BY-NC 4.0 and redistributed with the same license. [License, Source]
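
A minimal usage sketch follows, showing how this bundle might be used to transcribe a Spanish recording. It relies on the bundle's `get_model()`, `get_labels()` and `sample_rate` members. The helper names `greedy_ctc_decode` and `transcribe` are hypothetical, and the decoder assumes that the blank token sits at index 0 of the label tuple and that `|` marks word boundaries, as in torchaudio's wav2vec 2.0 tutorials. Greedy decoding is only the simplest option and is not necessarily how the published results were obtained.

```python
from typing import Sequence


def greedy_ctc_decode(
    indices: Sequence[int],
    labels: Sequence[str],
    blank: int = 0,
    word_sep: str = "|",
) -> str:
    """Collapse repeated frame labels, drop blanks, and join into text.

    Assumption: `blank` is the index of the CTC blank token and
    `word_sep` is the label used as a word boundary.
    """
    chars = []
    prev = None
    for i in indices:
        # Emit a label only when it changes and is not the blank token.
        if i != prev and i != blank:
            chars.append(labels[i])
        prev = i
    return "".join(chars).replace(word_sep, " ").strip()


def transcribe(path: str) -> str:
    """Hypothetical helper: transcribe one audio file with this bundle."""
    # Imported here so the decoder above stays usable without torch installed.
    import torch
    import torchaudio

    bundle = torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_ES
    model = bundle.get_model().eval()  # fetches the weights on first use
    labels = bundle.get_labels()

    waveform, sample_rate = torchaudio.load(path)
    # Mix down to one channel; the model expects a (batch, time) tensor.
    waveform = waveform.mean(dim=0, keepdim=True)
    if sample_rate != bundle.sample_rate:
        waveform = torchaudio.functional.resample(
            waveform, sample_rate, bundle.sample_rate
        )

    with torch.inference_mode():
        emission, _ = model(waveform)  # (batch, frames, num_labels)

    # Pick the most likely label per frame, then collapse with CTC rules.
    indices = emission[0].argmax(dim=-1).tolist()
    return greedy_ctc_decode(indices, labels)
```

Calling `transcribe("speech.wav")` (the file name is a placeholder) would return the decoded text as a single string. The decoder is kept separate from the model call so that it can be swapped for a beam-search or language-model-based decoder without touching the audio handling.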