SoudenMVDR
- class torchaudio.transforms.SoudenMVDR(*args, **kwargs)
Minimum Variance Distortionless Response (MVDR [Capon, 1969]) module based on the method proposed by Souden et al. [Souden et al., 2009].
Given the multi-channel complex-valued spectrum \(\textbf{Y}\), the power spectral density (PSD) matrix of target speech \(\bf{\Phi}_{\textbf{SS}}\), the PSD matrix of noise \(\bf{\Phi}_{\textbf{NN}}\), and a one-hot vector that represents the reference channel \(\bf{u}\), the module computes the single-channel complex-valued spectrum of the enhanced speech \(\hat{\textbf{S}}\). The formula is defined as:
\[\hat{\textbf{S}}(f) = \textbf{w}_{\text{bf}}(f)^{\mathsf{H}} \textbf{Y}(f)\]
where \(\textbf{w}_{\text{bf}}(f)\) is the MVDR beamforming weight for the \(f\)-th frequency bin.
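The product \(\textbf{w}_{\text{bf}}(f)^{\mathsf{H}} \textbf{Y}(f)\) can be sketched in PyTorch as below, assuming the weights are stored as a complex tensor of shape (…, freq, channel) and the spectrum uses the (…, channel, freq, time) layout documented for forward(). The helper apply_beamforming is hypothetical and not part of torchaudio; it only illustrates the formula.

```python
import torch


def apply_beamforming(weights: torch.Tensor, specgram: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: compute S_hat(f) = w(f)^H Y(f) for every bin and frame.

    weights:  complex tensor, shape (..., freq, channel)
    specgram: complex tensor, shape (..., channel, freq, time)
    returns:  complex tensor, shape (..., freq, time)
    """
    # Conjugating the weights turns the sum over channels into w^H Y.
    return torch.einsum("...fc,...cft->...ft", weights.conj(), specgram)


# Illustrative input: 4 channels, 257 frequency bins, 100 frames.
specgram = torch.randn(4, 257, 100, dtype=torch.cfloat)
# Weights that simply pick channel 0 in every frequency bin.
weights = torch.zeros(257, 4, dtype=torch.cfloat)
weights[:, 0] = 1.0
enhanced = apply_beamforming(weights, specgram)
print(enhanced.shape)  # torch.Size([257, 100])
```

Summing over the channel axis removes it, which is why the output has the single-channel (…, freq, time) shape listed under Returns.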
The beamforming weight is computed by:
\[\textbf{w}_{\text{bf}}(f) = \frac{{\bf{\Phi}_{\textbf{NN}}^{-1}}(f)\,{\bf{\Phi}_{\textbf{SS}}}(f)}{\text{Trace}\left({\bf{\Phi}_{\textbf{NN}}^{-1}}(f)\,{\bf{\Phi}_{\textbf{SS}}}(f)\right)}\,{\bf{u}}\]
- Tutorials using SoudenMVDR: Speech Enhancement with MVDR Beamforming
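A minimal sketch of the weight formula, assuming PyTorch and a one-hot reference vector of shape (…, channel). The helper souden_mvdr_weights is hypothetical and is not the torchaudio implementation; it uses torch.linalg.solve instead of forming the inverse noise PSD explicitly, and adds eps to the denominator as described for the eps parameter of forward().

```python
import torch


def souden_mvdr_weights(psd_s, psd_n, reference, eps=1e-8):
    """Hypothetical helper following the Souden MVDR weight formula.

    psd_s, psd_n: complex tensors, shape (..., freq, channel, channel)
    reference:    one-hot vector u, shape (..., channel)
    returns:      complex weights, shape (..., freq, channel)
    """
    # Phi_NN^{-1} Phi_SS, computed with a linear solve rather than an explicit inverse.
    numerator = torch.linalg.solve(psd_n, psd_s)
    # Trace over the two channel dimensions: one value per frequency bin.
    trace = numerator.diagonal(dim1=-2, dim2=-1).sum(-1)
    # eps keeps the division finite when the trace is close to zero.
    normalized = numerator / (trace[..., None, None] + eps)
    # Multiplying by the one-hot u selects the reference-channel column.
    u = reference.to(normalized.dtype)
    return torch.einsum("...fce,...e->...fc", normalized, u)


# Illustrative data: 3 frequency bins, 4 channels, random (hypothetical) steering vectors d.
torch.manual_seed(0)
num_freq, num_channels = 3, 4
d = torch.randn(num_freq, num_channels, dtype=torch.cdouble)
# Rank-one speech PSD d d^H for every bin, shape (freq, channel, channel).
psd_s = d[..., :, None] * d[..., None, :].conj()
# Random Hermitian positive-definite noise PSD.
a = torch.randn(num_freq, num_channels, num_channels, dtype=torch.cdouble)
psd_n = a @ a.conj().transpose(-2, -1) + torch.eye(num_channels, dtype=torch.cdouble)
# One-hot u selecting channel 0 as the reference.
u = torch.zeros(num_channels)
u[0] = 1.0

w = souden_mvdr_weights(psd_s, psd_n, u)
# Distortionless check: w^H d should equal the reference-channel entry of d.
response = (w.conj() * d).sum(-1)
print(torch.allclose(response, d[:, 0]))  # True
```

When the speech PSD is rank one, the weights pass the reference-channel component of the speech unchanged; the final check confirms this numerically for the illustrative data.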
- forward(specgram: Tensor, psd_s: Tensor, psd_n: Tensor, reference_channel: Union[int, Tensor], diagonal_loading: bool = True, diag_eps: float = 1e-07, eps: float = 1e-08) → Tensor
- Parameters:
specgram (torch.Tensor) – Multi-channel complex-valued spectrum. Tensor with dimensions (…, channel, freq, time).
psd_s (torch.Tensor) – The complex-valued power spectral density (PSD) matrix of target speech. Tensor with dimensions (…, freq, channel, channel).
psd_n (torch.Tensor) – The complex-valued power spectral density (PSD) matrix of noise. Tensor with dimensions (…, freq, channel, channel).
reference_channel (int or torch.Tensor) – Specifies the reference channel. If it is an int, it is the index of the reference channel. If it is a torch.Tensor, its shape is (…, channel), where the channel dimension is one-hot.
diagonal_loading (bool, optional) – If True, applies diagonal loading to psd_n. (Default: True)
diag_eps (float, optional) – The coefficient by which the identity matrix is multiplied for diagonal loading. It is only effective when diagonal_loading is set to True. (Default: 1e-7)
eps (float, optional) – Value added to the denominator of the beamforming weight formula. (Default: 1e-8)
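The two preprocessing steps controlled by reference_channel and diagonal_loading can be sketched as follows. Both helpers are hypothetical, and diagonal_load adds a plain diag_eps times the identity; the exact scaling torchaudio applies internally may differ, so this illustrates the idea rather than reproducing forward().

```python
import torch


def reference_to_one_hot(reference_channel, num_channels: int) -> torch.Tensor:
    """Hypothetical helper: turn an int channel index into a one-hot vector u."""
    if isinstance(reference_channel, int):
        u = torch.zeros(num_channels)
        u[reference_channel] = 1.0
        return u
    # A tensor input is assumed to already be one-hot with shape (..., channel).
    return reference_channel


def diagonal_load(psd_n: torch.Tensor, diag_eps: float = 1e-7) -> torch.Tensor:
    """Hypothetical helper: add diag_eps * I to each (channel, channel) matrix."""
    num_channels = psd_n.size(-1)
    eye = torch.eye(num_channels, dtype=psd_n.dtype, device=psd_n.device)
    # Broadcasting adds the same scaled identity to every frequency bin.
    return psd_n + diag_eps * eye


print(reference_to_one_hot(1, 4))  # tensor([0., 1., 0., 0.])

# An all-zero noise PSD is singular; diagonal loading makes it invertible.
psd_n = torch.zeros(257, 4, 4, dtype=torch.cfloat)
loaded = diagonal_load(psd_n, diag_eps=1e-3)
inverse = torch.linalg.inv(loaded)
```

Diagonal loading trades a small bias for numerical stability: without it, inverting or solving against a singular or ill-conditioned psd_n can raise an error or produce very large weights.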
- Returns:
Single-channel complex-valued enhanced spectrum with dimensions (…, freq, time).
- Return type:
torch.Tensor
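The psd_s and psd_n inputs are often estimated from the multi-channel spectrum with time-frequency masks, for example masks predicted by a neural network; torchaudio.transforms.PSD serves this purpose. The helper below is a hypothetical, simplified mask-weighted average, not torchaudio's implementation, and the random masks are illustrative only. It produces tensors with the (…, freq, channel, channel) shape that forward() expects.

```python
import torch


def mask_weighted_psd(specgram: torch.Tensor, mask: torch.Tensor, eps: float = 1e-10) -> torch.Tensor:
    """Hypothetical helper: mask-weighted PSD estimate.

    specgram: complex tensor, shape (..., channel, freq, time)
    mask:     real tensor with values in [0, 1], shape (..., freq, time)
    returns:  complex PSD matrices, shape (..., freq, channel, channel)
    """
    # Move the channel axis last: (..., freq, time, channel).
    y = specgram.movedim(-3, -1)
    weights = mask.to(y.dtype)
    # Sum over time of mask * y(t) y(t)^H for every frequency bin.
    psd = torch.einsum("...ft,...ftc,...fte->...fce", weights, y, y.conj())
    # Normalize by the total mask weight per bin.
    return psd / (mask.sum(-1)[..., None, None] + eps)


specgram = torch.randn(4, 257, 100, dtype=torch.cfloat)
mask_s = torch.rand(257, 100)  # illustrative speech mask
mask_n = 1.0 - mask_s          # illustrative noise mask
psd_s = mask_weighted_psd(specgram, mask_s)
psd_n = mask_weighted_psd(specgram, mask_n)
print(psd_s.shape)  # torch.Size([257, 4, 4])
```

Passing these tensors, together with specgram and a reference channel, to forward() would return an enhanced spectrum of shape (257, 100), following the shapes listed under Returns.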