torchaudio.functional.rtf_power

torchaudio.functional.rtf_power(psd_s: Tensor, psd_n: Tensor, reference_channel: Union[int, Tensor], n_iter: int = 3, diagonal_loading: bool = True, diag_eps: float = 1e-07) → Tensor

Estimate the relative transfer function (RTF) or the steering vector by the power method.
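
To make the idea of the power method concrete, the following is a minimal textbook sketch in plain PyTorch. It is not the library's implementation: the helper name rtf_power_sketch, the per-step normalization, and the final division by the reference-channel entry are all assumptions made for illustration, and the library's output scaling may differ.

```python
import torch

def rtf_power_sketch(psd_s, psd_n, reference_channel=0, n_iter=3):
    """Hypothetical helper: textbook power iteration, not torchaudio's code."""
    # phi = psd_n^{-1} @ psd_s, computed without forming the inverse explicitly
    phi = torch.linalg.solve(psd_n, psd_s)  # (..., freq, channel, channel)
    # Start from the one-hot vector that selects the reference channel
    v = torch.zeros(psd_s.shape[:-1] + (1,), dtype=psd_s.dtype, device=psd_s.device)
    v[..., reference_channel, 0] = 1.0
    for _ in range(n_iter):
        v = phi @ v
        # Normalize each step so repeated products neither overflow nor vanish
        v = v / torch.linalg.vector_norm(v, dim=-2, keepdim=True)
    # Map the dominant eigenvector back through psd_n to get the steering vector
    rtf = (psd_n @ v).squeeze(-1)  # (..., freq, channel)
    # Assumed convention: scale so the reference channel entry equals 1
    return rtf / rtf[..., reference_channel:reference_channel + 1]
```

With a rank-1 speech PSD of the form a aᴴ, this sketch recovers a divided by its reference-channel entry, which is a quick way to sanity-check the iteration.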

Parameters:

psd_s (torch.Tensor) – The complex-valued power spectral density (PSD) matrix of target speech. Tensor with dimensions (…, freq, channel, channel).
psd_n (torch.Tensor) – The complex-valued power spectral density (PSD) matrix of noise. Tensor with dimensions (…, freq, channel, channel).
reference_channel (int or torch.Tensor) – Specifies the reference channel. If an int is given, it is the index of the reference channel. If a torch.Tensor is given, its shape is (…, channel) and it is one-hot along the channel dimension.
n_iter (int, optional) – The number of iterations in the power method. (Default: 3)
diagonal_loading (bool, optional) – If True, applies diagonal loading to psd_n. (Default: True)
diag_eps (float, optional) – The coefficient by which the identity matrix is multiplied for diagonal loading. It only takes effect when diagonal_loading is True. (Default: 1e-7)
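
Two of the parameters above can be illustrated in plain PyTorch: the one-hot tensor form of reference_channel, and diagonal loading. This is only a sketch. The helper diagonal_load is hypothetical and implements the simplest reading of the description (adding diag_eps times the identity); the library may scale the coefficient further, for example by the trace of the matrix.

```python
import torch

n_channel = 4  # illustrative channel count

# Integer form and the equivalent one-hot tensor form of the reference channel
ref_index = 1
ref_onehot = torch.zeros(n_channel)
ref_onehot[ref_index] = 1.0

def diagonal_load(psd, diag_eps=1e-7):
    """Hypothetical helper: add diag_eps * I to each (channel, channel) matrix."""
    eye = torch.eye(psd.size(-1), dtype=psd.dtype, device=psd.device)
    # eye broadcasts over all leading dimensions (..., freq)
    return psd + diag_eps * eye

# An all-ones matrix is singular; loading only changes its diagonal entries
psd_n = torch.ones(2, n_channel, n_channel, dtype=torch.cfloat)
psd_n_loaded = diagonal_load(psd_n, diag_eps=0.5)
```

Diagonal loading of this kind is commonly used to keep the noise PSD matrix well conditioned before it is inverted or used in a linear solve.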

Returns:

The estimated complex-valued RTF of target speech. Tensor of dimension (…, freq, channel).

Return type:

torch.Tensor

Tutorials using rtf_power: Speech Enhancement with MVDR Beamforming