ConvTasNet

class torchaudio.models.ConvTasNet(num_sources: int = 2, enc_kernel_size: int = 16, enc_num_feats: int = 512, msk_kernel_size: int = 3, msk_num_feats: int = 128, msk_num_hidden_feats: int = 512, msk_num_layers: int = 8, msk_num_stacks: int = 3, msk_activate: str = 'sigmoid')

Conv-TasNet architecture introduced in Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation [Luo and Mesgarani, 2019].

Note

This implementation corresponds to the “non-causal” setting in the paper.

Parameters:
  • num_sources (int, optional) – The number of sources to split.

  • enc_kernel_size (int, optional) – The convolution kernel size of the encoder/decoder, <L>.

  • enc_num_feats (int, optional) – The feature dimension passed to the mask generator, <N>.

  • msk_kernel_size (int, optional) – The convolution kernel size of the mask generator, <P>.

  • msk_num_feats (int, optional) – The input/output feature dimension of each conv block in the mask generator, <B, Sc>.

  • msk_num_hidden_feats (int, optional) – The internal feature dimension of each conv block in the mask generator, <H>.

  • msk_num_layers (int, optional) – The number of layers in one conv block of the mask generator, <X>.

  • msk_num_stacks (int, optional) – The number of conv blocks in the mask generator, <R>.

  • msk_activate (str, optional) – The activation function of the mask output (Default: sigmoid).

forward

ConvTasNet.forward(input: Tensor) → Tensor

Perform source separation. Generate audio source waveforms.

Parameters:

input (torch.Tensor) – 3D Tensor with shape [batch, channel==1, frames]

Returns:

3D Tensor with shape [batch, channel==num_sources, frames]

Return type:

Tensor
