torchaudio.prototype.models.conformer_wav2vec2_pretrain_model¶

torchaudio.prototype.models.conformer_wav2vec2_pretrain_model(extractor_input_dim: int, extractor_output_dim: int, extractor_stride: int, encoder_embed_dim: int, encoder_projection_dropout: float, encoder_num_layers: int, encoder_num_heads: int, encoder_ff_interm_features: int, encoder_depthwise_conv_kernel_size: int, encoder_dropout: float, encoder_convolution_first: bool, encoder_use_group_norm: bool, mask_prob: float, mask_selection: str, mask_other: float, mask_length: int, no_mask_overlap: bool, mask_min_space: int, mask_channel_prob: float, mask_channel_selection: str, mask_channel_other: float, mask_channel_length: int, no_mask_channel_overlap: bool, mask_channel_min_space: int, num_negatives: int, cross_sample_negatives: int) → ConformerWav2Vec2PretrainModel[source]¶

Build a custom Conformer Wav2Vec2 Model for pre-training

Parameters:

extractor_input_dim (int) – Input dimension of the features.
extractor_output_dim (int) – Output dimension after feature extraction.
extractor_stride (int) – Stride used in time reduction layer of feature extraction.
encoder_embed_dim (int) – The dimension of the embedding in the feature projection.
encoder_projection_dropout (float) – The dropout probability applied after the input feature is projected to embed_dim
encoder_num_layers (int) – Number of Conformer layers in the encoder.
encoder_num_heads (int) – Number of heads in each Conformer layer.
encoder_ff_interm_features (int) – Hidden layer dimension of the feedforward network in each Conformer layer.
encoder_depthwise_conv_kernel_size (int or List[int]) – List of kernel sizes corresponding to each of the Conformer layers. If int is provided, all layers will have the same kernel size.
encoder_dropout (float) – Dropout probability in each Conformer layer.
encoder_convolution_first (bool) – Whether to apply the convolution module ahead of the attention module in each Conformer layer.
encoder_use_group_norm (bool) – Whether to use GroupNorm rather than BatchNorm1d in the convolution module in each Conformer layer.
mask_prob (float) – Probability for each token to be chosen as start of the span to be masked.
mask_selection (str) – How to choose the mask length. Options: [static, uniform, normal, poisson].
mask_other (float) – Secondary mask argument (used for more complex distributions).
mask_length (int) – The lengths of the mask.
no_mask_overlap (bool) – Whether to allow masks to overlap.
mask_min_space (int) – Minimum space between spans (if no overlap is enabled).
mask_channel_prob – (float): The probability of replacing a feature with 0.
mask_channel_selection (str) – How to choose the mask length for channel masking. Options: [static, uniform, normal, poisson].
mask_channel_other (float) – Secondary mask argument for channel masking (used for more complex distributions).
mask_channel_length (int) – Minimum space between spans (if no overlap is enabled) for channel masking.
no_mask_channel_overlap (bool) – Whether to allow channel masks to overlap.
mask_channel_min_space (int) – Minimum space between spans for channel masking (if no overlap is enabled).
num_negatives (int) – Number of negatives to sample.
cross_sample_negatives (int) – Number of cross sampled negatives.

Returns:

The resulting model.

Return type:

ConformerWav2Vec2PretrainModel

torchaudio.prototype.models.conformer_wav2vec2_pretrain_model¶

Docs

Tutorials

Resources