RTFMVDR
- class torchaudio.transforms.RTFMVDR
Minimum Variance Distortionless Response (MVDR [Capon, 1969]) module based on the relative transfer function (RTF) and power spectral density (PSD) matrix of noise.
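Given the multi-channel complex-valued spectrum $\mathbf{Y}$, the relative transfer function (RTF) matrix or the steering vector of target speech $\boldsymbol{v}$, the PSD matrix of noise $\mathbf{\Phi}_{\mathbf{NN}}$, and a one-hot vector that represents the reference channel $\mathbf{u}$, the module computes the single-channel complex-valued spectrum of the enhanced speech $\hat{\mathbf{S}}$. The formula is defined as:

$$\hat{\mathbf{S}}(f) = \mathbf{w}_{\text{bf}}(f)^{\mathsf{H}} \mathbf{Y}(f)$$

where $\mathbf{w}_{\text{bf}}(f)$ is the MVDR beamforming weight for the $f$-th frequency bin and $(\cdot)^{\mathsf{H}}$ denotes the Hermitian conjugate operation.
The beamforming weight is computed by:

$$\mathbf{w}_{\text{bf}}(f) = \frac{\mathbf{\Phi}_{\mathbf{NN}}^{-1}(f)\,\boldsymbol{v}(f)}{\boldsymbol{v}^{\mathsf{H}}(f)\,\mathbf{\Phi}_{\mathbf{NN}}^{-1}(f)\,\boldsymbol{v}(f)}$$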
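To make the weight computation concrete, here is a minimal sketch in plain PyTorch of the textbook RTF-based MVDR weight (the inverse noise PSD applied to the RTF vector, normalized by the quadratic form of the RTF with the inverse noise PSD), followed by applying the weights to the spectrum. The function names `mvdr_weights_from_rtf` and `apply_beamformer` are hypothetical helpers, not part of torchaudio, and the sketch leaves out diagonal loading and any reference-channel handling the library performs, so its output is not guaranteed to match the module exactly.

```python
import torch


def mvdr_weights_from_rtf(rtf, psd_n, eps=1e-8):
    """Hypothetical helper: w(f) = Phi^{-1}(f) v(f) / (v^H(f) Phi^{-1}(f) v(f)).

    rtf:   complex tensor with dimensions (..., freq, channel)
    psd_n: complex tensor with dimensions (..., freq, channel, channel)
    """
    # Solve Phi x = v rather than forming the explicit inverse.
    numerator = torch.linalg.solve(psd_n, rtf.unsqueeze(-1)).squeeze(-1)
    # v^H Phi^{-1} v is real for a Hermitian PSD matrix, so keep the real part.
    denominator = torch.einsum("...c,...c->...", rtf.conj(), numerator).real
    return numerator / (denominator.unsqueeze(-1) + eps)


def apply_beamformer(weights, specgram):
    """Hypothetical helper: S_hat(f, t) = w(f)^H Y(f, t), summed over channels."""
    return torch.einsum("...fc,...cft->...ft", weights.conj(), specgram)


torch.manual_seed(0)
channel, freq, time = 4, 5, 10

# Random stand-ins for real data (illustrative only).
specgram = torch.randn(channel, freq, time, dtype=torch.cdouble)
rtf = torch.randn(freq, channel, dtype=torch.cdouble)
# A Hermitian positive-definite matrix per frequency bin plays the noise PSD.
a = torch.randn(freq, channel, channel, dtype=torch.cdouble)
psd_n = a @ a.conj().transpose(-2, -1) + torch.eye(channel, dtype=torch.cdouble)

weights = mvdr_weights_from_rtf(rtf, psd_n)    # (freq, channel)
enhanced = apply_beamformer(weights, specgram)  # (freq, time)

# Distortionless constraint: w^H v should be close to 1 in every bin.
response = torch.einsum("fc,fc->f", weights.conj(), rtf)
```

The `response` tensor checks the "distortionless" part of MVDR: a signal arriving along the steering vector passes through the beamformer with (nearly) unit gain, up to the small `eps` added to the denominator.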
- Tutorials using RTFMVDR: Speech Enhancement with MVDR Beamforming
- forward(specgram: Tensor, rtf: Tensor, psd_n: Tensor, reference_channel: Union[int, Tensor], diagonal_loading: bool = True, diag_eps: float = 1e-07, eps: float = 1e-08) → Tensor
- Parameters:
specgram (torch.Tensor) – Multi-channel complex-valued spectrum. Tensor with dimensions (…, channel, freq, time).
rtf (torch.Tensor) – The complex-valued RTF vector of target speech. Tensor with dimensions (…, freq, channel).
psd_n (torch.Tensor) – The complex-valued power spectral density (PSD) matrix of noise. Tensor with dimensions (…, freq, channel, channel).
reference_channel (int or torch.Tensor) – Specifies the reference channel. If the type is int, it represents the reference channel index. If the type is torch.Tensor, its shape is (…, channel), where the channel dimension is one-hot.
diagonal_loading (bool, optional) – If True, enables applying diagonal loading to psd_n. (Default: True)
diag_eps (float, optional) – The coefficient multiplied to the identity matrix for diagonal loading. It is only effective when diagonal_loading is set to True. (Default: 1e-7)
eps (float, optional) – Value to add to the denominator in the beamforming weight formula. (Default: 1e-8)
- Returns:
Single-channel complex-valued enhanced spectrum with dimensions (…, freq, time).
- Return type:
torch.Tensor