Wav2Vec2Bundle
- class torchaudio.pipelines.Wav2Vec2Bundle
Data class that bundles associated information to use pretrained Wav2Vec2Model.
This class provides interfaces for instantiating the pretrained model along with the information necessary to retrieve pretrained weights and additional data to be used with the model.
The torchaudio library instantiates objects of this class, each of which represents a different pretrained model. Client code should access pretrained models via these instances.
Please see below for the usage and the available values.
- Example - Feature Extraction
>>> import torchaudio
>>>
>>> bundle = torchaudio.pipelines.HUBERT_BASE
>>>
>>> # Build the model and load pretrained weight.
>>> model = bundle.get_model()
Downloading: 100%|███████████████████████████████| 360M/360M [00:06<00:00, 60.6MB/s]
>>>
>>> # Resample audio to the expected sampling rate
>>> waveform = torchaudio.functional.resample(waveform, sample_rate, bundle.sample_rate)
>>>
>>> # Extract acoustic features
>>> features, _ = model.extract_features(waveform)
sample_rate
get_model
- Wav2Vec2Bundle.get_model(*, dl_kwargs=None) -> Module
Construct the model and load the pretrained weight.
The weight file is downloaded from the internet and cached with torch.hub.load_state_dict_from_url().
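Because dl_kwargs is forwarded to torch.hub.load_state_dict_from_url(), it can be used to steer where the weight file is cached and whether a progress bar is shown. The following is a minimal sketch under that assumption: model_dir and progress are keyword arguments of torch.hub.load_state_dict_from_url(), while the helper name load_cached_model and any cache directory passed to it are hypothetical.

```python
def load_cached_model(bundle, cache_dir):
    """Hypothetical helper: build the model with a custom cache directory.

    `bundle` is expected to be one of the torchaudio.pipelines bundle
    instances, for example torchaudio.pipelines.HUBERT_BASE.
    """
    dl_kwargs = {
        "model_dir": str(cache_dir),  # directory where the weight file is stored
        "progress": False,            # suppress the download progress bar
    }
    # get_model accepts dl_kwargs as a keyword-only argument.
    return bundle.get_model(dl_kwargs=dl_kwargs)
```

A call such as load_cached_model(torchaudio.pipelines.HUBERT_BASE, "./weights") would then download the weights on first use and read them from the chosen directory afterwards.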
- Parameters:
dl_kwargs (dictionary of keyword arguments) – Passed to torch.hub.load_state_dict_from_url().
- Returns:
Variation of Wav2Vec2Model.
For the models listed below, an additional layer normalization is performed on the input.
For all other models, a Wav2Vec2Model instance is returned.
WAV2VEC2_LARGE_LV60K
WAV2VEC2_ASR_LARGE_LV60K_10M
WAV2VEC2_ASR_LARGE_LV60K_100H
WAV2VEC2_ASR_LARGE_LV60K_960H
WAV2VEC2_XLSR53
WAV2VEC2_XLSR_300M
WAV2VEC2_XLSR_1B
WAV2VEC2_XLSR_2B
HUBERT_LARGE
HUBERT_XLARGE
HUBERT_ASR_LARGE
HUBERT_ASR_XLARGE
WAVLM_LARGE
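To give a rough idea of what layer normalization on the input means for the models listed above, the sketch below normalizes a whole waveform tensor to approximately zero mean and unit variance. It assumes the extra step behaves like torch.nn.functional.layer_norm applied over the full tensor shape; this page does not spell out the exact operation, and the helper name normalize_waveform and the random test signal are illustrative only.

```python
import torch
import torch.nn.functional as F


def normalize_waveform(waveform):
    """Illustrative helper: layer-normalize over every element of the tensor."""
    # Using the full shape as normalized_shape computes a single mean and
    # variance across the whole batch of waveforms.
    return F.layer_norm(waveform, waveform.shape)


# A synthetic signal with a DC offset and small amplitude (an assumption).
torch.manual_seed(0)
waveform = torch.randn(1, 16000) * 0.1 + 0.5

normalized = normalize_waveform(waveform)
```

With the bundles listed above this step is described as part of the returned model, so client code should not need to apply it by hand.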