torchaudio.functional.sliding_window_cmn

torchaudio.functional.sliding_window_cmn(specgram: Tensor, cmn_window: int = 600, min_cmn_window: int = 100, center: bool = False, norm_vars: bool = False) → Tensor

Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.

Parameters:

specgram (Tensor) – Spectrogram tensor of dimension (…, time, freq).
cmn_window (int, optional) – Window size in frames for the running-average CMN computation. (Default: 600)
min_cmn_window (int, optional) – Minimum CMN window used at the start of decoding (adds latency only at the start). Only applies if center is False; ignored if center is True. (Default: 100)
center (bool, optional) – If True, use a window centered on the current frame (to the extent possible, modulo end effects). If False, the window extends to the left of the current frame. (Default: False)
norm_vars (bool, optional) – If True, normalize the variance to one. (Default: False)
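How cmn_window, min_cmn_window, and center interact is easier to see as code. The following is a minimal pure-Python sketch of the per-frame window selection, assuming the Kaldi-style logic that this function is understood to follow. The helper name cmn_window_bounds is hypothetical and not part of torchaudio, and the exact boundary handling is an assumption rather than a statement of the library's implementation.

```python
def cmn_window_bounds(t, num_frames, cmn_window=600, min_cmn_window=100, center=False):
    """Return the half-open frame range (start, end) averaged for frame t.

    Hypothetical helper for illustration only; assumes Kaldi-style windowing.
    """
    if center:
        # Window centered on the current frame.
        start = t - cmn_window // 2
        end = start + cmn_window
    else:
        # Window extends to the left and includes the current frame.
        start = t - cmn_window
        end = t + 1
    if start < 0:
        # Shift the window right so that it begins at frame 0.
        end -= start
        start = 0
    if not center and end > t:
        # Left-context mode: look ahead only as far as min_cmn_window requires.
        end = max(t + 1, min_cmn_window)
    if end > num_frames:
        # Shift the window left so that it ends at the last frame.
        start -= end - num_frames
        end = num_frames
        start = max(start, 0)
    return start, end


print(cmn_window_bounds(0, 1000))                 # (0, 100)
print(cmn_window_bounds(700, 1000))               # (100, 701)
print(cmn_window_bounds(500, 1000, center=True))  # (200, 800)
```

Under these assumptions, the first frames with center=False are normalized over min_cmn_window frames, which is the start-up latency mentioned above, while center=True slides a fixed-width window that is clamped at both ends of the utterance.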

Returns:

Tensor matching input shape (…, time, freq)

Return type:

Tensor
