torchaudio.prototype.models.conformer_wav2vec2_model(extractor_input_dim: int, extractor_output_dim: int, extractor_stride: int, encoder_embed_dim: int, encoder_projection_dropout: float, encoder_num_layers: int, encoder_num_heads: int, encoder_ff_interm_features: int, encoder_depthwise_conv_kernel_size: Union[int, List[int]], encoder_dropout: float, encoder_convolution_first: bool, encoder_use_group_norm: bool) → Wav2Vec2Model

Build a custom Conformer Wav2Vec2Model.

Parameters:

  • extractor_input_dim (int) – Input dimension of the features.

  • extractor_output_dim (int) – Output dimension after feature extraction.

  • extractor_stride (int) – Stride used in the time reduction layer of the feature extractor.

  • encoder_embed_dim (int) – The dimension of the embedding in the feature projection.

  • encoder_projection_dropout (float) – Dropout probability applied after the input features are projected to encoder_embed_dim.

  • encoder_num_layers (int) – Number of Conformer layers in the encoder.

  • encoder_num_heads (int) – Number of heads in each Conformer layer.

  • encoder_ff_interm_features (int) – Hidden layer dimension of the feedforward network in each Conformer layer.

  • encoder_depthwise_conv_kernel_size (int or List[int]) – List of kernel sizes, one for each Conformer layer. If an int is provided, all layers use the same kernel size.

  • encoder_dropout (float) – Dropout probability in each Conformer layer.

  • encoder_convolution_first (bool) – Whether to apply the convolution module ahead of the attention module in each Conformer layer.

  • encoder_use_group_norm (bool) – Whether to use GroupNorm rather than BatchNorm1d in the convolution module in each Conformer layer.
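The int-or-list rule for encoder_depthwise_conv_kernel_size can be illustrated with a small pure-Python sketch. The helper per_layer_kernel_sizes is hypothetical (it is not part of torchaudio), and raising ValueError when the list length differs from the number of layers is an assumption of this sketch, not documented behavior.

```python
from typing import List, Union


def per_layer_kernel_sizes(kernel_size: Union[int, List[int]], num_layers: int) -> List[int]:
    """Hypothetical helper: expand an int-or-list kernel size to one entry per Conformer layer."""
    if isinstance(kernel_size, int):
        # A single int is shared by every Conformer layer.
        return [kernel_size] * num_layers
    sizes = list(kernel_size)
    # Assumption of this sketch: a list must provide exactly one size per layer.
    if len(sizes) != num_layers:
        raise ValueError(f"expected {num_layers} kernel sizes, got {len(sizes)}")
    return sizes


print(per_layer_kernel_sizes(31, 3))            # [31, 31, 31]
print(per_layer_kernel_sizes([15, 31, 31], 3))  # [15, 31, 31]
```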


Returns:

The resulting wav2vec2 model with a Conformer encoder.

Return type:

Wav2Vec2Model
