LFCC¶
- class torchaudio.transforms.LFCC(sample_rate: int = 16000, n_filter: int = 128, f_min: float = 0.0, f_max: Optional[float] = None, n_lfcc: int = 40, dct_type: int = 2, norm: str = 'ortho', log_lf: bool = False, speckwargs: Optional[dict] = None)[source]¶
Create the linear-frequency cepstrum coefficients from an audio signal.
By default, this calculates the LFCC on the dB-scaled, linear-filtered spectrogram. This is not the textbook implementation, but is implemented here for consistency with librosa.
This output depends on the maximum value in the input spectrogram, and so may return different values for an audio clip split into snippets vs. a full clip.
- Parameters:
  - sample_rate (int, optional) – Sample rate of audio signal. (Default: 16000)
  - n_filter (int, optional) – Number of linear filters to apply. (Default: 128)
  - n_lfcc (int, optional) – Number of LFC coefficients to retain. (Default: 40)
  - f_min (float, optional) – Minimum frequency. (Default: 0.)
  - f_max (float or None, optional) – Maximum frequency. (Default: None)
  - dct_type (int, optional) – Type of DCT (discrete cosine transform) to use. (Default: 2)
  - norm (str, optional) – Norm to use. (Default: "ortho")
  - log_lf (bool, optional) – Whether to use log-LF spectrograms instead of dB-scaled. (Default: False)
  - speckwargs (dict or None, optional) – Arguments for Spectrogram. (Default: None)
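The default pipeline (power spectrogram → linear-frequency triangular filter bank → dB scaling → type-II DCT with "ortho" norm) can be sketched in plain NumPy. This is an illustrative approximation, not torchaudio's implementation: the framing, windowing, and log-floor choices here are assumptions, so numeric values will differ in detail from the transform above.

```python
import numpy as np

def linear_fbanks(n_freqs, f_min, f_max, n_filter, sample_rate):
    # Triangular filters with edges spaced evenly in Hz (linear scale).
    all_freqs = np.linspace(0.0, sample_rate / 2, n_freqs)
    f_pts = np.linspace(f_min, f_max, n_filter + 2)
    fb = np.zeros((n_freqs, n_filter))
    for i in range(n_filter):
        left, center, right = f_pts[i], f_pts[i + 1], f_pts[i + 2]
        rising = (all_freqs - left) / (center - left)
        falling = (right - all_freqs) / (right - center)
        fb[:, i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct2_ortho(n_lfcc, n_filter):
    # Type-II DCT basis with "ortho" normalization, truncated to n_lfcc rows.
    n = np.arange(n_filter)
    k = np.arange(n_lfcc)[:, None]
    basis = np.cos(np.pi / n_filter * (n + 0.5) * k) * np.sqrt(2.0 / n_filter)
    basis[0] /= np.sqrt(2.0)
    return basis  # shape: (n_lfcc, n_filter)

def lfcc(waveform, sample_rate=16000, n_fft=400, hop_length=160,
         n_filter=128, n_lfcc=40):
    # Power spectrogram from a simple framed FFT (Hann window, no padding).
    window = np.hanning(n_fft)
    n_frames = 1 + (len(waveform) - n_fft) // hop_length
    frames = np.stack([waveform[i * hop_length:i * hop_length + n_fft] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2            # (time, n_freqs)
    fb = linear_fbanks(n_fft // 2 + 1, 0.0, sample_rate / 2,
                       n_filter, sample_rate)
    filtered = spec @ fb                                       # (time, n_filter)
    db = 10.0 * np.log10(np.maximum(filtered, 1e-10))          # dB scaling
    return db @ dct2_ortho(n_lfcc, n_filter).T                 # (time, n_lfcc)
```

For a one-second clip at 16 kHz with these defaults, this yields a `(98, 40)` array of coefficients per frame; torchaudio's transform instead returns a `(…, n_lfcc, time)` layout and additionally handles batching and the Spectrogram options passed via `speckwargs`.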
- Example
>>> waveform, sample_rate = torchaudio.load("test.wav", normalize=True)
>>> transform = transforms.LFCC(
>>>     sample_rate=sample_rate,
>>>     n_lfcc=13,
>>>     speckwargs={"n_fft": 400, "hop_length": 160, "center": False},
>>> )
>>> lfcc = transform(waveform)
See also
- torchaudio.functional.linear_fbanks() – The function used to generate the filter banks.

Tutorials using LFCC:
- Audio Feature Extractions