torch.stft(input, n_fft, hop_length=None, win_length=None, window=None, center=True, pad_mode='reflect', normalized=False, onesided=True)

Short-time Fourier transform (STFT).

Ignoring the optional batch dimension, this method computes the following expression:

$$X[m, \omega] = \sum_{k = 0}^{\text{win\_length} - 1} \text{window}[k]\ \text{input}[m \times \text{hop\_length} + k]\ \exp\left(-j \frac{2 \pi \cdot \omega k}{\text{win\_length}}\right),$$

where $m$ is the index of the sliding window, and $\omega$ is the frequency index, with $0 \leq \omega < \text{n\_fft}$. The arguments behave as follows:

  • input must be either a 1-D time sequence or a 2-D batch of time sequences.

  • If hop_length is None (default), it is treated as equal to floor(n_fft / 4).

  • If win_length is None (default), it is treated as equal to n_fft.

  • window can be a 1-D tensor of size win_length, e.g., from torch.hann_window(). If window is None (default), it is treated as if having $1$ everywhere in the window. If $\text{win\_length} < \text{n\_fft}$, window will be padded on both sides to length n_fft before being applied.

  • If center is True (default), input will be padded on both sides so that the $t$-th frame is centered at time $t \times \text{hop\_length}$. Otherwise, the $t$-th frame begins at time $t \times \text{hop\_length}$.

  • pad_mode determines the padding method used on input when center is True. See torch.nn.functional.pad() for all available options. Default is "reflect".

  • If onesided is True (default), only values for $\omega$ in $\left[0, 1, 2, \dots, \left\lfloor \frac{\text{n\_fft}}{2} \right\rfloor\right]$ are returned (that is, $\left\lfloor \frac{\text{n\_fft}}{2} \right\rfloor + 1$ frequencies), because the Fourier transform of a real input satisfies conjugate symmetry, i.e., $X[m, \omega] = X[m, \text{n\_fft} - \omega]^*$.

  • If normalized is True (default is False), the function returns the normalized STFT results, i.e., multiplied by $(\text{frame\_length})^{-0.5}$.
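To make the expression above concrete, the following is a minimal pure-Python sketch that evaluates the sum directly. It is not how torch.stft is implemented. The name stft_reference is hypothetical, and the sketch assumes center=False, no normalization, and win_length equal to n_fft, so that the window needs no padding. It returns a frame-major nested list, frames[m][w], rather than a tensor.

```python
import cmath
import math


def stft_reference(signal, n_fft, hop_length=None, window=None, onesided=True):
    """Hypothetical helper: evaluate the documented sum directly.

    Assumes center=False, normalized=False, and win_length == n_fft.
    """
    if hop_length is None:
        hop_length = n_fft // 4          # documented default: floor(n_fft / 4)
    if window is None:
        window = [1.0] * n_fft           # documented default: all ones
    n_freqs = n_fft // 2 + 1 if onesided else n_fft
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    frames = []
    for m in range(n_frames):
        start = m * hop_length           # without centering, frame m begins here
        row = []
        for w in range(n_freqs):
            acc = 0j
            for k in range(n_fft):
                phase = -2j * math.pi * w * k / n_fft
                acc += window[k] * signal[start + k] * cmath.exp(phase)
            row.append(acc)
        frames.append(row)
    return frames                        # frames[m][w] corresponds to X[m, w]


# A real sinusoid with two cycles per 8 samples puts its energy in bin 2.
signal = [math.sin(2 * math.pi * 2 * n / 8) for n in range(16)]
full = stft_reference(signal, n_fft=8, onesided=False)
half = stft_reference(signal, n_fft=8, onesided=True)
print(len(half), len(half[0]), len(full[0]))  # 5 5 8
```

Because the input is real, full[0][w] equals the complex conjugate of full[0][8 - w] up to rounding, which is why the one-sided result can drop the upper bins without losing information.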

Returns the real and imaginary parts together as one tensor of size $(* \times N \times T \times 2)$, where $*$ is the optional batch size of input, $N$ is the number of frequencies where the STFT is applied, $T$ is the total number of frames used, and each pair in the last dimension represents a complex number as its real part and imaginary part.
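As a rough guide to the sizes involved, the sketch below computes $N$ and $T$ for a single sequence. The helper name stft_output_shape is hypothetical, and the sketch assumes that centering pads n_fft // 2 samples on each side and that only complete frames are kept; neither detail is stated in the text above.

```python
def stft_output_shape(signal_length, n_fft, hop_length=None, center=True, onesided=True):
    """Hypothetical helper: return (N, T), the frequency and frame counts."""
    if hop_length is None:
        hop_length = n_fft // 4          # documented default: floor(n_fft / 4)
    # Assumption: centering pads n_fft // 2 samples on each side of the input.
    padded_length = signal_length + 2 * (n_fft // 2) if center else signal_length
    n_freqs = n_fft // 2 + 1 if onesided else n_fft
    # Assumption: only frames that fit entirely inside the signal are counted.
    n_frames = 1 + (padded_length - n_fft) // hop_length
    return n_freqs, n_frames


n_freqs, n_frames = stft_output_shape(16000, n_fft=400)
print((n_freqs, n_frames, 2))  # (201, 161, 2)
```

The trailing 2 in the printed shape stands for the real and imaginary parts stored in the last dimension.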


The signature of this function changed in version 0.4.1. Calling it with the previous signature may raise an error or return an incorrect result.

Parameters

  • input (Tensor) – the input tensor

  • n_fft (int) – size of Fourier transform

  • hop_length (int, optional) – the distance between neighboring sliding window frames. Default: None (treated as equal to floor(n_fft / 4))

  • win_length (int, optional) – the size of window frame and STFT filter. Default: None (treated as equal to n_fft)

  • window (Tensor, optional) – the optional window function. Default: None (treated as a window of all $1$s)

  • center (bool, optional) – whether to pad input on both sides so that the $t$-th frame is centered at time $t \times \text{hop\_length}$. Default: True

  • pad_mode (string, optional) – controls the padding method used when center is True. Default: "reflect"

  • normalized (bool, optional) – controls whether to return the normalized STFT results. Default: False

  • onesided (bool, optional) – controls whether to return only half of the results to avoid redundancy. Default: True
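The way the optional arguments fall back to their defaults can be sketched in plain Python. The helper resolve_stft_arguments is hypothetical and works on lists instead of tensors. It assumes the window is padded with zeros and that any odd leftover sample of padding goes on the right; the text above only says the window is padded on both sides.

```python
def resolve_stft_arguments(n_fft, hop_length=None, win_length=None, window=None):
    """Hypothetical helper: apply the documented defaults and pad the window."""
    if hop_length is None:
        hop_length = n_fft // 4          # documented default: floor(n_fft / 4)
    if win_length is None:
        win_length = n_fft               # documented default: n_fft
    if not 0 < win_length <= n_fft:
        raise ValueError("win_length must be in the range (0, n_fft]")
    if window is None:
        window = [1.0] * win_length      # documented default: all ones
    if len(window) != win_length:
        raise ValueError("window must have exactly win_length entries")
    # Assumption: zero padding, with any odd leftover sample placed on the right.
    left = (n_fft - win_length) // 2
    right = n_fft - win_length - left
    padded_window = [0.0] * left + list(window) + [0.0] * right
    return hop_length, win_length, padded_window


hop, win, padded = resolve_stft_arguments(n_fft=8, win_length=4)
print(hop, win, padded)  # 2 4 [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
```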


Returns

A tensor containing the STFT result with the shape described above

Return type

Tensor


