ConformerWav2Vec2PretrainModel¶
- class torchaudio.prototype.models.ConformerWav2Vec2PretrainModel(wav2vec2: Wav2Vec2Model, mask_generator: Module, negative_sampler: Module)[source]¶
Conformer Wav2Vec2 pre-train model for training from scratch.
Note
To build the model, please use one of the factory functions, conformer_wav2vec2_base() or conformer_wav2vec2_large().
- Parameters:
wav2vec2 (nn.Module) – Conformer based Wav2Vec2 model, including feature extractor and conformer encoder components.
mask_generator (nn.Module) – Mask generator that generates the mask for masked prediction during training.
negative_sampler (nn.Module) – Negative sampler to apply after masking.
Methods¶
forward¶
- ConformerWav2Vec2PretrainModel.forward(features: Tensor, audio_lengths: Optional[Tensor] = None) → Tuple[Tensor, Optional[Tensor], Tensor, Tensor] [source]¶
- Parameters:
features (Tensor) – Tensor of audio features of shape (batch, frame, dim).
audio_lengths (Tensor or None, optional) – Tensor of the valid length of each audio in the batch. Shape: (batch, ). (Default: None)
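A minimal sketch of how per-item valid lengths are commonly turned into a (batch, frame) padding mask. This page does not describe how the model consumes audio_lengths internally, so the convention below is an assumption for illustration. lengths_to_padding_mask is a hypothetical helper, and plain Python lists stand in for tensors to keep the example dependency-free.

```python
from typing import List


def lengths_to_padding_mask(lengths: List[int], max_frames: int) -> List[List[bool]]:
    """Hypothetical helper: build a (batch, frame) mask from valid lengths.

    True marks a padded (invalid) frame, False marks a valid frame.
    This convention is an assumption, not taken from the library.
    """
    return [[frame >= n for frame in range(max_frames)] for n in lengths]


# Two clips padded to 5 frames; the second clip has only 3 valid frames.
mask = lengths_to_padding_mask([5, 3], max_frames=5)
```

With a mask of this kind, a mask generator could skip padded frames when choosing positions to mask.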
- Returns:
- Tensor
The masked sequences of probability distribution of shape (batch, frame, dim).
- Tensor or None
If the audio_lengths argument was provided, a Tensor of shape (batch, ) representing the valid length in the time axis is returned.
- Tensor
The mask indices.
- Tensor
The targets, prior to negative sampling.
- Tensor
The negative samples.
- Tensor
The indices of the negative samples.
- Return type:
(Tensor, Optional[Tensor], Tensor, Tensor, Tensor, Tensor)
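The returned targets and negative samples are the kind of inputs a contrastive objective works with. The sketch below shows one such objective in the style of the wav2vec 2.0 loss, for a single masked frame. It is an illustration under stated assumptions, not this model's training code: the function names, the cosine similarity, and the temperature of 0.1 are all choices made here, and plain Python lists stand in for tensors.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (assumes neither is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(
    output: Sequence[float],
    target: Sequence[float],
    negatives: List[Sequence[float]],
    temperature: float = 0.1,  # assumed value for illustration
) -> float:
    """Negative log-probability of picking the true target among distractors."""
    # The true target sits at index 0, followed by the negative samples.
    logits = [cosine(output, target) / temperature]
    logits += [cosine(output, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[0]


target = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
loss_good = contrastive_loss([1.0, 0.0], target, negatives)  # output matches the target
loss_bad = contrastive_loss([0.0, 1.0], target, negatives)   # output matches a negative
```

The loss is small when the model output is closest to the true target and large when it is closest to a negative sample.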
Factory Functions¶
Build a custom Conformer Wav2Vec2 Model for pre-training.
Build Conformer Wav2Vec2 Model for pre-training with "small" architecture from Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks [Srivastava et al., 2022].
Build Conformer Wav2Vec2 Model for pre-training with "large" architecture from Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks [Srivastava et al., 2022].