ConvTasNet
- class torchaudio.models.ConvTasNet(num_sources: int = 2, enc_kernel_size: int = 16, enc_num_feats: int = 512, msk_kernel_size: int = 3, msk_num_feats: int = 128, msk_num_hidden_feats: int = 512, msk_num_layers: int = 8, msk_num_stacks: int = 3, msk_activate: str = 'sigmoid')
Conv-TasNet architecture introduced in Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation [Luo and Mesgarani, 2019].
Note
This implementation corresponds to the “non-causal” setting in the paper.
See also
conv_tasnet_base(): A factory function.
torchaudio.pipelines.SourceSeparationBundle: Source separation pipeline with pre-trained models.
- Parameters:
num_sources (int, optional) – The number of sources to split.
enc_kernel_size (int, optional) – The convolution kernel size of the encoder/decoder, <L>.
enc_num_feats (int, optional) – The feature dimension passed to the mask generator, <N>.
msk_kernel_size (int, optional) – The convolution kernel size of the mask generator, <P>.
msk_num_feats (int, optional) – The input/output feature dimension of the conv block in the mask generator, <B, Sc>.
msk_num_hidden_feats (int, optional) – The internal feature dimension of the conv block in the mask generator, <H>.
msk_num_layers (int, optional) – The number of layers in one conv block of the mask generator, <X>.
msk_num_stacks (int, optional) – The number of conv blocks in the mask generator, <R>.
msk_activate (str, optional) – The activation function of the mask output (Default: sigmoid).
forward
- ConvTasNet.forward(input: Tensor) → Tensor
Perform source separation. Generate audio source waveforms.
- Parameters:
input (torch.Tensor) – 3D Tensor with shape [batch, channel==1, frames]
- Returns:
3D Tensor with shape [batch, channel==num_sources, frames]
- Return type:
Tensor