torchaudio.prototype.models.conformer_wav2vec2_pretrain_base
- torchaudio.prototype.models.conformer_wav2vec2_pretrain_base(extractor_input_dim: int = 64, extractor_output_dim: int = 256, encoder_projection_dropout: float = 0.0, mask_prob: float = 0.3, mask_length: int = 3, num_negatives: int = 100, cross_sample_negatives: int = 0) → ConformerWav2Vec2PretrainModel
Build Conformer Wav2Vec2 Model for pre-training with the “small” architecture from Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks [Srivastava et al., 2022].
- Parameters:
extractor_input_dim (int, optional) – Input dimension of the features. (Default: 64)
extractor_output_dim (int, optional) – Output dimension after feature extraction. (Default: 256)
encoder_projection_dropout (float, optional) – The dropout probability applied after the input feature is projected to embed_dim. (Default: 0.0)
mask_prob (float, optional) – Probability for each token to be chosen as the start of the span to be masked. (Default: 0.3)
mask_length (int, optional) – The lengths of the mask. (Default: 3)
num_negatives (int, optional) – Number of sampled negatives. (Default: 100)
cross_sample_negatives (int, optional) – Number of cross sampled negatives. (Default: 0)
- Returns:
The resulting model.
- Return type:
ConformerWav2Vec2PretrainModel
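Example (a minimal usage sketch: the input feature shape (batch, frames, extractor_input_dim) and the call model(features, lengths) are assumptions about the pre-training model’s forward contract, not part of this reference; consult ConformerWav2Vec2PretrainModel for the exact inputs and outputs):

>>> import torch
>>> from torchaudio.prototype.models import conformer_wav2vec2_pretrain_base
>>> # Build the pre-training model with its default hyperparameters.
>>> model = conformer_wav2vec2_pretrain_base()
>>> # Hypothetical input: two feature sequences of 400 frames, each frame
>>> # having extractor_input_dim (64) channels, with per-sequence lengths.
>>> features = torch.randn(2, 400, 64)
>>> lengths = torch.tensor([400, 360])
>>> # Forward pass for pre-training; the exact outputs (e.g. encoder features,
>>> # masked indices, sampled negatives) depend on the model's forward contract.
>>> outputs = model(features, lengths)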