torchaudio.functional.spectrogram

torchaudio.functional.spectrogram(waveform: Tensor, pad: int, window: Tensor, n_fft: int, hop_length: int, win_length: int, power: Optional[float], normalized: Union[bool, str], center: bool = True, pad_mode: str = 'reflect', onesided: bool = True, return_complex: Optional[bool] = None) → Tensor

Create a spectrogram or a batch of spectrograms from a raw audio signal. The spectrogram can be either magnitude-only or complex.

Parameters:

waveform (Tensor) – Tensor of audio of dimension (…, time)
pad (int) – Two-sided padding of the signal, in samples
window (Tensor) – Window tensor that is applied/multiplied to each frame/window
n_fft (int) – Size of FFT
hop_length (int) – Length of hop between STFT windows
win_length (int) – Window size
power (float or None) – Exponent for the magnitude spectrogram (must be > 0), e.g., 1 for magnitude, 2 for power, etc. If None, the complex spectrum is returned instead.
normalized (bool or str) – Whether to normalize by magnitude after the STFT. If a str is given, the choices are "window" and "frame_length", for cases where a specific normalization type is desired. True maps to "window". With "window", the waveform is normalized by the window's L2 energy. With "frame_length", the waveform is normalized by dividing by $(\text{frame\_length})^{0.5}$.
center (bool, optional) – whether to pad waveform on both sides so that the $t$-th frame is centered at time $t \times \text{hop\_length}$. Default: True
pad_mode (string, optional) – controls the padding method used when center is True. Default: "reflect"
onesided (bool, optional) – controls whether to return half of results to avoid redundancy. Default: True
return_complex (bool, optional) – Deprecated and not used.

Returns:

Tensor of dimension (…, freq, time), where freq is n_fft // 2 + 1 when onesided is True (n_fft is the number of Fourier bins) and n_fft when onesided is False, and time is the number of window hops (n_frame).
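
The output dimensions can be worked out ahead of time with a little arithmetic. The sketch below uses a hypothetical helper, expected_spectrogram_shape, which is not part of torchaudio. It assumes that pad adds samples to both ends of the signal and that center=True adds a further n_fft // 2 samples per side before framing, which is how torch.stft is understood to behave.

```python
def expected_spectrogram_shape(num_samples, n_fft, hop_length,
                               pad=0, center=True, onesided=True):
    """Hypothetical helper (not a torchaudio function): predict (freq, time)."""
    # `pad` is applied to both ends of the signal.
    length = num_samples + 2 * pad
    # Assumption: center=True pads n_fft // 2 samples on each side.
    if center:
        length += 2 * (n_fft // 2)
    # Number of frames (window hops) that fit in the padded signal.
    n_frames = 1 + (length - n_fft) // hop_length
    # One-sided output keeps only the non-redundant frequency bins.
    freq = n_fft // 2 + 1 if onesided else n_fft
    return freq, n_frames


print(expected_spectrogram_shape(16000, n_fft=400, hop_length=200))  # (201, 81)
```

For one second of 16 kHz audio with n_fft = 400 and hop_length = 200, this gives 201 frequency bins and 81 frames.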

Return type:

Tensor
