

class torchaudio.transforms.SoudenMVDR(*args, **kwargs)

Minimum Variance Distortionless Response (MVDR [Capon, 1969]) module based on the method proposed by Souden et al. [Souden et al., 2009].

This feature supports the following devices: CPU, CUDA.

This API supports the following properties: Autograd, TorchScript.

Given the multi-channel complex-valued spectrum $\textbf{Y}$, the power spectral density (PSD) matrix of the target speech $\boldsymbol{\Phi}_{\textbf{SS}}$, the PSD matrix of the noise $\boldsymbol{\Phi}_{\textbf{NN}}$, and a one-hot vector $\textbf{u}$ that represents the reference channel, the module computes the single-channel complex-valued spectrum of the enhanced speech $\hat{\textbf{S}}$. The formula is defined as:

$$\hat{\textbf{S}}(f) = \textbf{w}_{\text{bf}}(f)^{\mathsf{H}}\,\textbf{Y}(f)$$

where $\textbf{w}_{\text{bf}}(f)$ is the MVDR beamforming weight for the $f$-th frequency bin.
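The product $\textbf{w}_{\text{bf}}(f)^{\mathsf{H}}\textbf{Y}(f)$ is a conjugate-weighted sum over channels, applied independently to every time frame of bin $f$. A minimal PyTorch sketch, assuming weights of shape (…, freq, channel) and a spectrum laid out as (…, channel, freq, time) like `specgram` in `forward`; the helper name `apply_weights` and the random inputs are hypothetical stand-ins, not the module's internals:

```python
import torch


def apply_weights(w: torch.Tensor, specgram: torch.Tensor) -> torch.Tensor:
    """Compute S_hat(f) = w(f)^H Y(f) for every frequency bin and frame.

    w:        (..., freq, channel) complex beamforming weights
    specgram: (..., channel, freq, time) complex multi-channel spectrum
    returns:  (..., freq, time) complex single-channel spectrum
    """
    # Hermitian transpose of a weight vector = conjugate it, then sum over channels.
    return torch.einsum("...fc,...cft->...ft", w.conj(), specgram)


torch.manual_seed(0)
channel, freq, time = 4, 5, 3
w = torch.randn(freq, channel, dtype=torch.cdouble)        # placeholder weights
Y = torch.randn(channel, freq, time, dtype=torch.cdouble)  # placeholder spectrum
S_hat = apply_weights(w, Y)
print(S_hat.shape)  # torch.Size([5, 3])
```

Using `einsum` keeps any leading batch dimensions intact, matching the "…" in the documented shapes.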

The beamforming weight is computed by:

$$\textbf{w}_{\text{bf}}(f) = \frac{\boldsymbol{\Phi}_{\textbf{NN}}^{-1}(f)\,\boldsymbol{\Phi}_{\textbf{SS}}(f)}{\text{Trace}\left(\boldsymbol{\Phi}_{\textbf{NN}}^{-1}(f)\,\boldsymbol{\Phi}_{\textbf{SS}}(f)\right)}\,\textbf{u}$$

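A minimal PyTorch sketch of the weight formula, assuming PSD matrices shaped (…, freq, channel, channel). The function name `souden_mvdr_weights` is hypothetical, it omits diagonal loading, and it is not presented as torchaudio's implementation. It accepts the reference channel either as an int or as a one-hot vector $\textbf{u}$, and checks the distortionless property for a rank-1 target PSD $\boldsymbol{\Phi}_{\textbf{SS}}(f) = \textbf{h}(f)\textbf{h}(f)^{\mathsf{H}}$ (an assumed steering vector $\textbf{h}$): the response $\textbf{w}(f)^{\mathsf{H}}\textbf{h}(f)$ should equal the reference-channel entry of $\textbf{h}(f)$.

```python
import torch


def souden_mvdr_weights(psd_s, psd_n, reference_channel, eps=1e-8):
    """Sketch of w(f) = Phi_NN^{-1}(f) Phi_SS(f) u / Trace(Phi_NN^{-1}(f) Phi_SS(f)).

    psd_s, psd_n: (..., freq, channel, channel) complex PSD matrices.
    reference_channel: int index, or one-hot tensor of shape (..., channel).
    Returns weights of shape (..., freq, channel).
    """
    # Solve Phi_NN X = Phi_SS rather than forming the inverse explicitly.
    numerator = torch.linalg.solve(psd_n, psd_s)
    # Trace over the two channel dimensions; eps keeps the division finite.
    trace = numerator.diagonal(dim1=-2, dim2=-1).sum(-1)
    ws = numerator / (trace[..., None, None] + eps)
    if isinstance(reference_channel, int):
        # Multiplying by a one-hot u just picks one column of ws.
        return ws[..., :, reference_channel]
    u = reference_channel.to(ws.dtype)[..., None, :, None]  # (..., 1, channel, 1)
    return (ws @ u).squeeze(-1)


torch.manual_seed(0)
freq, channel = 3, 4
h = torch.randn(freq, channel, 1, dtype=torch.cdouble)       # assumed steering vectors
psd_s = h @ h.mH                                              # rank-1 target PSD
A = torch.randn(freq, channel, channel, dtype=torch.cdouble)
psd_n = A @ A.mH + torch.eye(channel, dtype=torch.cdouble)    # Hermitian positive definite

w = souden_mvdr_weights(psd_s, psd_n, reference_channel=0)
u = torch.zeros(channel, dtype=torch.cdouble)
u[0] = 1
w_onehot = souden_mvdr_weights(psd_s, psd_n, reference_channel=u)

# Distortionless check: w(f)^H h(f) should equal h(f) at the reference channel.
response = (w.conj().unsqueeze(-2) @ h).squeeze(-1).squeeze(-1)
print(torch.allclose(response, h[:, 0, 0], atol=1e-6))
```

Calling `torch.linalg.solve` instead of `torch.linalg.inv` followed by a matrix product is the usual, numerically safer way to evaluate $\boldsymbol{\Phi}_{\textbf{NN}}^{-1}\boldsymbol{\Phi}_{\textbf{SS}}$.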
Tutorials using SoudenMVDR:
Speech Enhancement with MVDR Beamforming

forward(specgram: Tensor, psd_s: Tensor, psd_n: Tensor, reference_channel: Union[int, Tensor], diagonal_loading: bool = True, diag_eps: float = 1e-07, eps: float = 1e-08) → Tensor

Parameters:
  • specgram (torch.Tensor) – Multi-channel complex-valued spectrum. Tensor with dimensions (…, channel, freq, time).

  • psd_s (torch.Tensor) – The complex-valued power spectral density (PSD) matrix of target speech. Tensor with dimensions (…, freq, channel, channel).

  • psd_n (torch.Tensor) – The complex-valued power spectral density (PSD) matrix of noise. Tensor with dimensions (…, freq, channel, channel).

  • reference_channel (int or torch.Tensor) – Specifies the reference channel. If it is an int, it is the index of the reference channel. If it is a torch.Tensor, its shape is (…, channel), where the channel dimension is a one-hot vector.

  • diagonal_loading (bool, optional) – If True, applies diagonal loading to psd_n. (Default: True)

  • diag_eps (float, optional) – The coefficient by which the identity matrix is multiplied for diagonal loading. Only effective when diagonal_loading is True. (Default: 1e-7)

  • eps (float, optional) – Value to add to the denominator in the beamforming weight formula. (Default: 1e-8)
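Diagonal loading adds a small multiple of the identity matrix to $\boldsymbol{\Phi}_{\textbf{NN}}$ so that it can be inverted reliably even when it is rank-deficient. The sketch below follows the description of `diag_eps` literally, as `psd_n + diag_eps * I`; `diagonal_load` is a hypothetical helper, and torchaudio's internal regularization may scale the loading term differently. It compares the condition number of a singular rank-1 matrix before and after loading:

```python
import torch


def diagonal_load(psd_n: torch.Tensor, diag_eps: float = 1e-7) -> torch.Tensor:
    """Hypothetical helper: literal reading of diag_eps as psd_n + diag_eps * I.

    psd_n: (..., channel, channel) complex PSD matrix.
    """
    channel = psd_n.size(-1)
    eye = torch.eye(channel, dtype=psd_n.dtype, device=psd_n.device)
    return psd_n + diag_eps * eye


torch.manual_seed(0)
h = torch.randn(4, 1, dtype=torch.cdouble)
psd_n = h @ h.mH            # rank-1 noise PSD: singular, so inverting it is ill-posed
loaded = diagonal_load(psd_n)

# 2-norm condition numbers; a singular matrix gives inf or an enormous value.
cond_before = torch.linalg.cond(psd_n).item()
cond_after = torch.linalg.cond(loaded).item()
print(cond_before > cond_after)
```

A larger `diag_eps` makes the solve more stable at the cost of biasing the noise PSD estimate toward the identity.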


Returns:

Single-channel complex-valued enhanced spectrum with dimensions (…, freq, time).

Return type:

torch.Tensor
