torchaudio.models

The models subpackage contains definitions of models for addressing common audio tasks.

Wav2Letter

class torchaudio.models.Wav2Letter(num_classes: int = 40, input_type: str = 'waveform', num_features: int = 1)

Wav2Letter model architecture from the paper "Wav2Letter: an End-to-End ConvNet-based Speech Recognition System".

\(\text{padding} = \frac{\text{ceil}(\text{kernel} - \text{stride})}{2}\)
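As a quick check of the padding formula above, the following sketch evaluates it for a few kernel/stride pairs. The helper name `conv_padding` and the sample values are illustrative assumptions and are not part of torchaudio.

```python
import math


def conv_padding(kernel: int, stride: int) -> float:
    """Evaluate padding = ceil(kernel - stride) / 2, as written above."""
    return math.ceil(kernel - stride) / 2


# Hypothetical kernel/stride pairs, chosen only for illustration.
for kernel, stride in [(48, 2), (7, 1), (250, 160)]:
    print(kernel, stride, conv_padding(kernel, stride))
```

Because the ceiling is applied before the division, an odd difference between kernel and stride gives a fractional value, which would need rounding to an integer before being used as a layer's padding.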

Parameters
  • num_classes (int, optional) – Number of classes to be classified. (Default: 40)

  • input_type (str, optional) – Type of input the model accepts: waveform, power_spectrum, or mfcc. (Default: waveform)

  • num_features (int, optional) – Number of input features that the network will receive. (Default: 1)

forward(x: torch.Tensor) → torch.Tensor
Parameters

x (torch.Tensor) – Tensor of dimension (batch_size, num_features, input_length).

Returns

Predictor tensor of dimension (batch_size, num_classes, input_length).

Return type

Tensor
