torchaudio.models.wavlm_model
- torchaudio.models.wavlm_model(extractor_mode: str, extractor_conv_layer_config: Optional[List[Tuple[int, int, int]]], extractor_conv_bias: bool, encoder_embed_dim: int, encoder_projection_dropout: float, encoder_pos_conv_kernel: int, encoder_pos_conv_groups: int, encoder_num_layers: int, encoder_num_heads: int, encoder_num_buckets: int, encoder_max_distance: int, encoder_attention_dropout: float, encoder_ff_interm_features: int, encoder_ff_interm_dropout: float, encoder_dropout: float, encoder_layer_norm_first: bool, encoder_layer_drop: float, aux_num_out: Optional[int]) → Wav2Vec2Model [source]
Builds a custom WavLM model [Chen et al., 2022]. The architecture is compatible with the Wav2Vec2 model [Baevski et al., 2020], so the output object is Wav2Vec2Model. Most of the arguments have the same meaning as in wav2vec2_model(), so please refer there for documentation.
- Parameters:
  - extractor_mode (str) – Operation mode of the feature extractor. See wav2vec2_model().
  - extractor_conv_layer_config (list of integer tuples or None) – See wav2vec2_model().
  - extractor_conv_bias (bool) – See wav2vec2_model().
  - encoder_embed_dim (int) – See wav2vec2_model().
  - encoder_projection_dropout (float) – See wav2vec2_model().
  - encoder_pos_conv_kernel (int) – See wav2vec2_model().
  - encoder_pos_conv_groups (int) – See wav2vec2_model().
  - encoder_num_layers (int) – See wav2vec2_model().
  - encoder_num_heads (int) – See wav2vec2_model().
  - encoder_num_buckets (int) – Number of buckets for relative position embedding.
  - encoder_max_distance (int) – Maximum distance for relative position embedding.
  - encoder_attention_dropout (float) – See wav2vec2_model().
  - encoder_ff_interm_features (int) – See wav2vec2_model().
  - encoder_ff_interm_dropout (float) – See wav2vec2_model().
  - encoder_dropout (float) – See wav2vec2_model().
  - encoder_layer_norm_first (bool) – See wav2vec2_model().
  - encoder_layer_drop (float) – See wav2vec2_model().
  - aux_num_out (int or None) – See wav2vec2_model().
- Returns:
The resulting model.
- Return type:
  Wav2Vec2Model
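A minimal usage sketch (not part of the original documentation): it constructs a WavLM model and runs a batch of raw waveforms through the returned Wav2Vec2Model. The hyperparameter values below are illustrative, roughly corresponding to a base-sized configuration, and are not guaranteed to match any released checkpoint.

```python
import torch
import torchaudio

# Illustrative base-sized configuration; values are assumptions, not canonical defaults.
model = torchaudio.models.wavlm_model(
    extractor_mode="group_norm",
    extractor_conv_layer_config=None,   # None -> the default conv feature-extractor layout
    extractor_conv_bias=False,
    encoder_embed_dim=768,
    encoder_projection_dropout=0.1,
    encoder_pos_conv_kernel=128,
    encoder_pos_conv_groups=16,
    encoder_num_layers=12,
    encoder_num_heads=12,
    encoder_num_buckets=320,
    encoder_max_distance=800,
    encoder_attention_dropout=0.1,
    encoder_ff_interm_features=3072,
    encoder_ff_interm_dropout=0.0,
    encoder_dropout=0.1,
    encoder_layer_norm_first=False,
    encoder_layer_drop=0.05,
    aux_num_out=None,                   # no auxiliary output layer (e.g. for CTC)
)

# The returned object is a Wav2Vec2Model, so it takes raw waveforms of shape (batch, time).
waveforms = torch.randn(1, 16000)       # one second of dummy audio at 16 kHz
features, lengths = model(waveforms)    # features: (batch, frames, encoder_embed_dim)
```

For pretrained WavLM weights, the bundled pipelines in torchaudio.pipelines are the usual entry point; this builder is for constructing the architecture with custom hyperparameters.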