torchaudio.prototype.models.conformer_wav2vec2_pretrain_large

torchaudio.prototype.models.conformer_wav2vec2_pretrain_large(extractor_input_dim: int = 64, extractor_output_dim: int = 256, encoder_projection_dropout: float = 0.0, mask_prob: float = 0.3, mask_length: int = 3, num_negatives: int = 100, cross_sample_negatives: int = 0) → ConformerWav2Vec2PretrainModel

Builds the Conformer Wav2Vec2 model for pre-training with the “large” architecture from Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks [Srivastava et al., 2022].

Parameters:
  • extractor_input_dim (int, optional) – Input dimension of the features. (Default: 64)

  • extractor_output_dim (int, optional) – Output dimension after feature extraction. (Default: 256)

  • encoder_projection_dropout (float, optional) – The dropout probability applied after the input features are projected to embed_dim. (Default: 0.0)

  • mask_prob (float, optional) – Probability for each token to be chosen as start of the span to be masked. (Default: 0.3)

  • mask_length (int, optional) – The length of each masked span. (Default: 3)

  • num_negatives (int, optional) – Number of sampled negatives. (Default: 100)

  • cross_sample_negatives (int, optional) – Number of cross sampled negatives. (Default: 0)

Returns:

The resulting model.

Return type:

ConformerWav2Vec2PretrainModel
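
Example (a minimal usage sketch; the input shape, the lengths argument, and the forward call below are assumptions based on the parameter descriptions above, not taken from the official documentation):

>>> import torch
>>> from torchaudio.prototype.models import conformer_wav2vec2_pretrain_large
>>>
>>> # Build the pre-training model with the default "large" configuration.
>>> model = conformer_wav2vec2_pretrain_large()
>>>
>>> # Assumed input: a batch of 2 utterances, each with 400 frames of
>>> # 64-dimensional features (extractor_input_dim), e.g. log-mel filterbanks.
>>> features = torch.randn(2, 400, 64)
>>> lengths = torch.tensor([400, 350])  # valid frames per utterance (assumed)
>>>
>>> outputs = model(features, lengths)  # exact outputs depend on the pre-training model's forward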
