torchaudio.prototype.models.conformer_wav2vec2_pretrain_large
- torchaudio.prototype.models.conformer_wav2vec2_pretrain_large(extractor_input_dim: int = 64, extractor_output_dim: int = 256, encoder_projection_dropout: float = 0.0, mask_prob: float = 0.3, mask_length: int = 3, num_negatives: int = 100, cross_sample_negatives: int = 0) → ConformerWav2Vec2PretrainModel
Build a Conformer Wav2Vec2 model for pre-training with the “large” architecture from Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks [Srivastava et al., 2022].
- Parameters:
extractor_input_dim (int, optional) – Input dimension of the features. (Default: 64)
extractor_output_dim (int, optional) – Output dimension after feature extraction. (Default: 256)
encoder_projection_dropout (float, optional) – The dropout probability applied after the input feature is projected to embed_dim. (Default: 0.0)
mask_prob (float, optional) – Probability for each token to be chosen as the start of a span to be masked. (Default: 0.3)
mask_length (int, optional) – The length of each masked span. (Default: 3)
num_negatives (int, optional) – Number of sampled negatives. (Default: 100)
cross_sample_negatives (int, optional) – Number of cross sampled negatives. (Default: 0)
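To make the roles of mask_prob and mask_length more concrete, the following is a simplified, standalone sketch of span masking as the parameter descriptions word it. It is not torchaudio's implementation, which may choose span starts differently; the helper name sample_span_mask and its seed argument are hypothetical and exist only for this illustration.

```python
import random


def sample_span_mask(num_tokens, mask_prob=0.3, mask_length=3, seed=0):
    """Return a list of booleans marking masked tokens.

    Simplified illustration only; sample_span_mask is a hypothetical helper,
    not part of torchaudio.
    """
    rng = random.Random(seed)
    mask = [False] * num_tokens
    for start in range(num_tokens):
        # Each token is independently chosen as a span start with probability mask_prob.
        if rng.random() < mask_prob:
            # Mask mask_length tokens from the start, clipped at the sequence end.
            for i in range(start, min(start + mask_length, num_tokens)):
                mask[i] = True
    return mask


mask = sample_span_mask(num_tokens=20)
# Print masked positions as '#' and unmasked positions as '.'.
print("".join("#" if m else "." for m in mask))
```

Because spans may overlap, the fraction of masked tokens in this sketch is generally lower than mask_prob multiplied by mask_length.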
- Returns:
The resulting model.
- Return type: