torchaudio.functional.spectrogram
- torchaudio.functional.spectrogram(waveform: Tensor, pad: int, window: Tensor, n_fft: int, hop_length: int, win_length: int, power: Optional[float], normalized: Union[bool, str], center: bool = True, pad_mode: str = 'reflect', onesided: bool = True, return_complex: Optional[bool] = None) -> Tensor
Create a spectrogram or a batch of spectrograms from a raw audio signal. The spectrogram can be either magnitude-only or complex.
- Parameters:
waveform (Tensor) – Tensor of audio of dimension (…, time)
pad (int) – Two-sided padding of the signal
window (Tensor) – Window tensor that is multiplied with each frame
n_fft (int) – Size of FFT
hop_length (int) – Length of hop between STFT windows
win_length (int) – Window size
power (float or None) – Exponent for the magnitude spectrogram (must be > 0), e.g., 1 for magnitude, 2 for power, etc. If None, the complex spectrum is returned instead.
normalized (bool or str) – Whether to normalize by magnitude after the STFT. If a str is given, the choices are "window" and "frame_length", for when a specific normalization type is desired. True maps to "window". When normalized on "window", waveform is normalized by the window’s L2 energy. When normalized on "frame_length", waveform is normalized by dividing by \((\text{frame\_length})^{0.5}\).
center (bool, optional) – Whether to pad waveform on both sides so that the \(t\)-th frame is centered at time \(t \times \text{hop\_length}\). Default: True
pad_mode (string, optional) – Controls the padding method used when center is True. Default: "reflect"
onesided (bool, optional) – Controls whether to return half of results to avoid redundancy. Default: True
return_complex (bool, optional) – Deprecated and not used.
- Returns:
Dimension (…, freq, time), where freq is n_fft // 2 + 1, n_fft is the number of Fourier bins, and time is the number of window hops (n_frame).
- Return type:
Tensor