MFCC
- class torchaudio.transforms.MFCC(sample_rate: int = 16000, n_mfcc: int = 40, dct_type: int = 2, norm: str = 'ortho', log_mels: bool = False, melkwargs: Optional[dict] = None)
Create the Mel-frequency cepstral coefficients (MFCCs) from an audio signal.
By default, this calculates the MFCC on the DB-scaled Mel spectrogram. This is not the textbook implementation, but is implemented here to give consistency with librosa.
This output depends on the maximum value in the input spectrogram, and so may return different values for an audio clip split into snippets vs. a full clip.
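To illustrate why the result can depend on the maximum value, the following is a minimal pure-Python sketch of a dB conversion that floors every value at a fixed range below the loudest one. The helper name `power_to_db`, the 80 dB range, the `amin` floor, and the sample power values are assumptions chosen for illustration; this is not torchaudio's implementation.

```python
import math


def power_to_db(values, top_db=80.0, amin=1e-10):
    """Convert power values to dB, then floor them at (max dB - top_db)."""
    # Guard against log10(0) by clamping very small powers to amin.
    db = [10.0 * math.log10(max(v, amin)) for v in values]
    # The floor is relative to the loudest value in *this* input.
    floor = max(db) - top_db
    return [max(d, floor) for d in db]


# Hypothetical power values: a loud segment followed by a quiet one.
loud = [1.0, 1e-2]
quiet = [1e-9, 1e-10]

# Processing the full clip at once: the quiet values hit the floor
# set by the loud segment.
full = power_to_db(loud + quiet)

# Processing the snippets separately: the quiet snippet gets its own,
# much lower floor, so its values survive unchanged.
split = power_to_db(loud) + power_to_db(quiet)

print(full)
print(split)
```

In this sketch the loud values come out the same either way, while the quiet values are clamped only when they are processed together with the loud segment. Any later step, such as the DCT, inherits that difference.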
- Parameters:
sample_rate (int, optional) – Sample rate of the audio signal. (Default: 16000)
n_mfcc (int, optional) – Number of MFC coefficients to retain. (Default: 40)
dct_type (int, optional) – Type of DCT (discrete cosine transform) to use. (Default: 2)
norm (str, optional) – Norm to use. (Default: "ortho")
log_mels (bool, optional) – Whether to use log-mel spectrograms instead of dB-scaled. (Default: False)
melkwargs (dict or None, optional) – Arguments for MelSpectrogram. (Default: None)
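As a rough illustration of what the `dct_type=2` and `norm="ortho"` defaults mean, the sketch below applies an orthonormal DCT-II to a single frame of mel energies and keeps the first few coefficients, which is the role `n_mfcc` plays. The function name `dct2_ortho` and the frame values are hypothetical, and this pure-Python version is only a sketch of the textbook transform, not the library's code.

```python
import math


def dct2_ortho(x, n_out):
    """Orthonormal DCT-II of x, keeping only the first n_out coefficients."""
    n = len(x)
    coeffs = []
    for k in range(n_out):
        # Project the frame onto the k-th cosine basis function.
        s = sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
        # Orthonormal scaling: the k = 0 term uses sqrt(1/n), the rest sqrt(2/n).
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs


# Hypothetical dB-scaled mel energies for one frame (4 mel bins).
flat_frame = [-20.0, -20.0, -20.0, -20.0]
tilted_frame = [-10.0, -20.0, -30.0, -40.0]

flat_mfcc = dct2_ortho(flat_frame, n_out=3)
tilted_mfcc = dct2_ortho(tilted_frame, n_out=3)

print(flat_mfcc)
print(tilted_mfcc)
```

For the flat frame, everything lands in coefficient 0 and the higher coefficients are zero up to rounding. The tilted frame also produces a non-zero coefficient 1, which reflects the overall slope across the mel bins.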
- Example
>>> import torchaudio
>>> from torchaudio import transforms
>>> waveform, sample_rate = torchaudio.load("test.wav", normalize=True)
>>> transform = transforms.MFCC(
...     sample_rate=sample_rate,
...     n_mfcc=13,
...     melkwargs={"n_fft": 400, "hop_length": 160, "n_mels": 23, "center": False},
... )
>>> mfcc = transform(waveform)
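To get a feel for the size of the result, the sketch below estimates the number of time frames from the `n_fft`, `hop_length`, and `center` values used in the example, using standard STFT framing arithmetic. The helper name `num_frames` is hypothetical, and the clip (mono, one second at 16 kHz) is an assumption, since the real length of `test.wav` is not given. The output is assumed to be laid out as (channel, n_mfcc, time).

```python
def num_frames(num_samples, n_fft, hop_length, center):
    """Number of STFT frames for a signal of num_samples samples."""
    if center:
        # Center padding adds n_fft // 2 samples on each side.
        num_samples += 2 * (n_fft // 2)
    if num_samples < n_fft:
        raise ValueError("signal is shorter than one analysis window")
    # One frame for the first window, plus one per full hop after it.
    return 1 + (num_samples - n_fft) // hop_length


# Assumed input: mono, one second at 16 kHz (not taken from test.wav).
channels = 1
samples = 16000

frames_no_center = num_frames(samples, n_fft=400, hop_length=160, center=False)
frames_center = num_frames(samples, n_fft=400, hop_length=160, center=True)

# Assumed output layout: (channel, n_mfcc, time) with n_mfcc=13.
expected_shape = (channels, 13, frames_no_center)

print(frames_no_center, frames_center, expected_shape)
```

With `center` set to `False`, no padding is added, so the last partial window is dropped and the frame count is slightly lower than with centered framing.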
See also
torchaudio.functional.melscale_fbanks() - The function used to generate the filter banks.
- Tutorials using MFCC:
  - Audio Feature Extractions