torchaudio.functional.mvdr_weights_rtf¶
- torchaudio.functional.mvdr_weights_rtf(rtf: Tensor, psd_n: Tensor, reference_channel: Optional[Union[int, Tensor]] = None, diagonal_loading: bool = True, diag_eps: float = 1e-07, eps: float = 1e-08) Tensor [source]¶
Compute the Minimum Variance Distortionless Response (MVDR [Capon, 1969]) beamforming weights based on the relative transfer function (RTF) and power spectral density (PSD) matrix of noise.
Given the relative transfer function (RTF) matrix or the steering vector of target speech \(\bm{v}\), the PSD matrix of noise \(\bf{\Phi}_{\textbf{NN}}\), and a one-hot vector that represents the reference channel \(\bf{u}\), the method computes the MVDR beamforming weight martrix \(\textbf{w}_{\text{MVDR}}\). The formula is defined as:
\[\textbf{w}_{\text{MVDR}}(f) = \frac{{{\bf{\Phi}_{\textbf{NN}}^{-1}}(f){\bm{v}}(f)}} {{\bm{v}^{\mathsf{H}}}(f){\bf{\Phi}_{\textbf{NN}}^{-1}}(f){\bm{v}}(f)} \]where \((.)^{\mathsf{H}}\) denotes the Hermitian Conjugate operation.
- Parameters:
rtf (torch.Tensor) – The complex-valued RTF vector of target speech. Tensor with dimensions (…, freq, channel).
psd_n (torch.Tensor) – The complex-valued power spectral density (PSD) matrix of noise. Tensor with dimensions (…, freq, channel, channel).
reference_channel (int or torch.Tensor) – Specifies the reference channel. If the dtype is
int
, it represents the reference channel index. If the dtype istorch.Tensor
, its shape is (…, channel), where thechannel
dimension is one-hot.diagonal_loading (bool, optional) – If
True
, enables applying diagonal loading topsd_n
. (Default:True
)diag_eps (float, optional) – The coefficient multiplied to the identity matrix for diagonal loading. It is only effective when
diagonal_loading
is set toTrue
. (Default:1e-7
)eps (float, optional) – Value to add to the denominator in the beamforming weight formula. (Default:
1e-8
)
- Returns:
The complex-valued MVDR beamforming weight matrix with dimensions (…, freq, channel).
- Return type: