RTFMVDR
- class torchaudio.transforms.RTFMVDR
Minimum Variance Distortionless Response (MVDR [Capon, 1969]) module based on the relative transfer function (RTF) and power spectral density (PSD) matrix of noise.
Given the multi-channel complex-valued spectrum \(\textbf{Y}\), the relative transfer function (RTF) matrix or the steering vector of target speech \(\bm{v}\), the PSD matrix of noise \(\bf{\Phi}_{\textbf{NN}}\), and a one-hot vector that represents the reference channel \(\bf{u}\), the module computes the single-channel complex-valued spectrum of the enhanced speech \(\hat{\textbf{S}}\). The formula is defined as:
\[\hat{\textbf{S}}(f) = \textbf{w}_{\text{bf}}(f)^{\mathsf{H}} \textbf{Y}(f) \]
where \(\textbf{w}_{\text{bf}}(f)\) is the MVDR beamforming weight for the \(f\)-th frequency bin and \((.)^{\mathsf{H}}\) denotes the Hermitian conjugate operation.
The MVDR beamforming weight \(\textbf{w}_{\text{MVDR}}(f)\), used as \(\textbf{w}_{\text{bf}}(f)\) above, is computed by:
\[\textbf{w}_{\text{MVDR}}(f) = \frac{{{\bf{\Phi}_{\textbf{NN}}^{-1}}(f){\bm{v}}(f)}} {{\bm{v}^{\mathsf{H}}}(f){\bf{\Phi}_{\textbf{NN}}^{-1}}(f){\bm{v}}(f)} \]
- Tutorials using RTFMVDR: Speech Enhancement with MVDR Beamforming
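The two formulas above can be sketched in plain PyTorch as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the helper names `mvdr_weights_from_rtf` and `apply_weights` are hypothetical, the inputs are random, and the sketch omits diagonal loading, the `eps` term, and reference-channel handling.

```python
import torch


def mvdr_weights_from_rtf(rtf: torch.Tensor, psd_n: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: w(f) = Phi_NN^{-1}(f) v(f) / (v^H(f) Phi_NN^{-1}(f) v(f)).

    rtf:   (..., freq, channel)
    psd_n: (..., freq, channel, channel)
    """
    # Phi_NN^{-1} v, computed with a linear solve instead of an explicit inverse.
    numerator = torch.linalg.solve(psd_n, rtf.unsqueeze(-1)).squeeze(-1)
    # v^H Phi_NN^{-1} v, one scalar per frequency bin.
    denominator = torch.einsum("...c,...c->...", rtf.conj(), numerator)
    return numerator / denominator.unsqueeze(-1)


def apply_weights(weights: torch.Tensor, specgram: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: S(f, t) = w(f)^H Y(f, t), summing over channels."""
    return torch.einsum("...fc,...cft->...ft", weights.conj(), specgram)


torch.manual_seed(0)
channel, freq, time = 4, 5, 6

# Random stand-ins for the spectrum and the RTF vector.
specgram = torch.randn(channel, freq, time, dtype=torch.cdouble)
rtf = torch.randn(freq, channel, dtype=torch.cdouble)

# A Hermitian positive-definite matrix per frequency bin as a stand-in noise PSD.
a = torch.randn(freq, channel, channel, dtype=torch.cdouble)
psd_n = a @ a.conj().transpose(-1, -2) + torch.eye(channel, dtype=torch.float64)

weights = mvdr_weights_from_rtf(rtf, psd_n)   # (freq, channel)
enhanced = apply_weights(weights, specgram)   # (freq, time)

# Distortionless constraint: w^H v should equal 1 in every frequency bin.
response = torch.einsum("...c,...c->...", weights.conj(), rtf)
```

The final check reflects the "distortionless" part of MVDR: with these weights, a signal arriving along the steering vector passes through with unit gain.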
- forward(specgram: Tensor, rtf: Tensor, psd_n: Tensor, reference_channel: Union[int, Tensor], diagonal_loading: bool = True, diag_eps: float = 1e-07, eps: float = 1e-08) → Tensor
- Parameters:
specgram (torch.Tensor) – Multi-channel complex-valued spectrum. Tensor with dimensions (…, channel, freq, time)
rtf (torch.Tensor) – The complex-valued RTF vector of target speech. Tensor with dimensions (…, freq, channel).
psd_n (torch.Tensor) – The complex-valued power spectral density (PSD) matrix of noise. Tensor with dimensions (…, freq, channel, channel).
reference_channel (int or torch.Tensor) – Specifies the reference channel. If it is an int, it is the index of the reference channel. If it is a torch.Tensor, its shape is (…, channel), where the channel dimension is one-hot.
diagonal_loading (bool, optional) – If True, applies diagonal loading to psd_n. (Default: True)
diag_eps (float, optional) – The coefficient by which the identity matrix is multiplied for diagonal loading. It only takes effect when diagonal_loading is set to True. (Default: 1e-7)
eps (float, optional) – Value added to the denominator in the beamforming weight formula. (Default: 1e-8)
- Returns:
Single-channel complex-valued enhanced spectrum with dimensions (…, freq, time).
- Return type:
torch.Tensor