HuBERTPretrainModel

class torchaudio.models.HuBERTPretrainModel

HuBERT model used for pretraining in HuBERT [Hsu et al., 2021].

Note

To build the model, please use one of the factory functions.

Methods

HuBERTPretrainModel.forward(waveforms: Tensor, labels: Tensor, audio_lengths: Optional[Tensor] = None) → Tuple[Tensor, Tensor, Tensor]

Compute the sequence of probability distributions over labels.

Parameters:

waveforms (Tensor) – Audio tensor of dimension [batch, frames].
labels (Tensor) – Labels for pre-training. A Tensor of dimension [batch, frames], where frames is the number of frames in the model output (not the number of audio samples); the label length must match the output length.
audio_lengths (Tensor or None, optional) – The valid length of each audio in the batch. Shape: [batch, ]. When waveforms contains audio of different durations, providing audio_lengths lets the model compute the corresponding valid output lengths and apply the proper mask in the transformer attention layers. If None, all audio in waveforms is assumed to be valid over its full length. Default: None.
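
The relation between audio lengths and valid output lengths can be sketched in plain Python. This is only an illustration: the (kernel size, stride) pairs below are an assumption based on the convolutional feature extractor of the "base" architecture, and `num_output_frames` is a hypothetical helper, not part of torchaudio.

```python
# Assumed (kernel_size, stride) of each convolutional layer in the "base"
# feature extractor. Custom models may use a different configuration.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def num_output_frames(num_samples: int, conv_layers=CONV_LAYERS) -> int:
    """Hypothetical helper: number of output frames for a given sample count."""
    length = num_samples
    for kernel_size, stride in conv_layers:
        # Output length of an unpadded 1D convolution, clamped at zero.
        length = max((length - kernel_size) // stride + 1, 0)
    return length


print(num_output_frames(16000))  # 49
print(num_output_frames(12000))  # 37
```

Under these assumptions, one second of 16 kHz audio maps to 49 frames, so the labels for a batch padded to 16000 samples would have shape [batch, 49].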

Returns:

Tensor: The sequence of probability distributions (as logits) for the masked frames. Shape: (masked_frames, num labels).
Tensor: The sequence of probability distributions (as logits) for the unmasked frames. Shape: (unmasked_frames, num labels).
Tensor: The mean feature value, used as an additional penalty loss. Shape: (1,).

Return type:

(Tensor, Tensor, Tensor)

Factory Functions

`hubert_pretrain_model`	Builds custom `HuBERTPretrainModel` for training from scratch.
`hubert_pretrain_base`	Builds "base" `HuBERTPretrainModel` from HuBERT [Hsu et al., 2021] for pretraining.
`hubert_pretrain_large`	Builds "large" `HuBERTPretrainModel` from HuBERT [Hsu et al., 2021] for pretraining.
`hubert_pretrain_xlarge`	Builds "extra large" `HuBERTPretrainModel` from HuBERT [Hsu et al., 2021] for pretraining.