# torch.stft

torch.stft(input: torch.Tensor, n_fft: int, hop_length: Optional[int] = None, win_length: Optional[int] = None, window: Optional[torch.Tensor] = None, center: bool = True, pad_mode: str = 'reflect', normalized: bool = False, onesided: Optional[bool] = None, return_complex: Optional[bool] = None) → torch.Tensor

Short-time Fourier transform (STFT).

Warning

Setting return_complex explicitly will be required in a future PyTorch release. Set it to False to preserve the current behavior or True to return a complex output.

The STFT computes the Fourier transform of short overlapping windows of the input. This gives the frequency components of the signal as they change over time. The interface of this function is modeled after the librosa stft function.

Ignoring the optional batch dimension, this method computes the following expression:

$X[m, \omega] = \sum_{k = 0}^{\text{win\_length} - 1} \text{window}[k]\ \text{input}[m \times \text{hop\_length} + k]\ \exp\left(- j \frac{2 \pi \cdot \omega k}{\text{win\_length}}\right),$

where $m$ is the index of the sliding window, and $\omega$ is the frequency index with $0 \leq \omega < \text{n\_fft}$ . The arguments behave as follows:

• input must be either a 1-D time sequence or a 2-D batch of time sequences.

• If hop_length is None (default), it is treated as equal to floor(n_fft / 4).

• If win_length is None (default), it is treated as equal to n_fft.

• window can be a 1-D tensor of size win_length, e.g., from torch.hann_window(). If window is None (default), it is treated as if having $1$ everywhere in the window. If $\text{win\_length} < \text{n\_fft}$ , window will be padded on both sides to length n_fft before being applied.

• If center is True (default), input will be padded on both sides so that the $t$ -th frame is centered at time $t \times \text{hop\_length}$ . Otherwise, the $t$ -th frame begins at time $t \times \text{hop\_length}$ .

• pad_mode determines the padding method used on input when center is True. See torch.nn.functional.pad() for all available options. Default is "reflect".

• If onesided is True (default for real input), only values for $\omega$ in $\left[0, 1, 2, \dots, \left\lfloor \frac{\text{n\_fft}}{2} \right\rfloor\right]$ (that is, $\left\lfloor \frac{\text{n\_fft}}{2} \right\rfloor + 1$ frequencies) are returned, because the real-to-complex Fourier transform satisfies conjugate symmetry, i.e., $X[m, \omega] = X[m, \text{n\_fft} - \omega]^*$ . Note that if the input or window tensors are complex, onesided output is not possible.

• If normalized is True (default is False), the function returns the normalized STFT results, i.e., multiplied by $(\text{frame\_length})^{-0.5}$ .

• If return_complex is True (default if input is complex), the return is an input.dim() + 1 dimensional complex tensor. If False, the output is an input.dim() + 2 dimensional real tensor where the last dimension represents the real and imaginary components.
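The defining sum can be checked against the library on a small signal. The following is a minimal sketch rather than part of the reference itself: the helper `direct_stft` and all sizes are hypothetical choices made for illustration. It assumes center=False so that frame m starts at sample m × hop_length, onesided=False so that every ω is returned, and win_length equal to n_fft so the window needs no padding. Note that the returned tensor is laid out as (frequency, frame).

```python
import math
import torch

torch.manual_seed(0)

n_fft = 8          # win_length is left at its default, so it equals n_fft
hop_length = 2     # set explicitly instead of the default floor(n_fft / 4)
signal = torch.randn(32, dtype=torch.float64)
window = torch.hann_window(n_fft, dtype=torch.float64)

# Library result: no centering, all n_fft frequencies, complex output.
spec = torch.stft(signal, n_fft, hop_length=hop_length, window=window,
                  center=False, onesided=False, return_complex=True)


def direct_stft(x, win, hop):
    """Hypothetical helper: evaluate the defining sum frame by frame (slow)."""
    n = win.numel()
    num_frames = 1 + (x.numel() - n) // hop
    k = torch.arange(n, dtype=x.dtype)                    # sample index in a frame
    omega = torch.arange(n, dtype=x.dtype).unsqueeze(1)   # frequency index, shape (n, 1)
    angle = 2 * math.pi * omega * k / n
    # basis[omega, k] = exp(-j * 2 * pi * omega * k / n)
    basis = torch.complex(torch.cos(angle), -torch.sin(angle))
    frames = []
    for m in range(num_frames):
        segment = x[m * hop: m * hop + n] * win           # windowed frame m
        frames.append(basis @ segment.to(basis.dtype))
    return torch.stack(frames, dim=1)                     # (frequency, frame)


reference = direct_stft(signal, window, hop_length)
max_error = (spec - reference).abs().max().item()
print(spec.shape, max_error)
```

The two results should agree up to floating-point rounding; double precision is used here only to make that comparison tight.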

Returns either a complex tensor of size $(* \times N \times T)$ if return_complex is True, or a real tensor of size $(* \times N \times T \times 2)$ otherwise, where $*$ is the optional batch size of input, $N$ is the number of frequencies where STFT is applied, and $T$ is the total number of frames used.
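As a rough illustration of these shapes, the sketch below runs a small batch through the function. The sizes are arbitrary, and the frame count 1 + length // hop_length is an assumption that holds for center=True with an even n_fft. The real-valued layout is obtained here with torch.view_as_real on the complex result instead of passing return_complex=False.

```python
import torch

torch.manual_seed(0)

n_fft = 16
hop_length = 4
batch = torch.randn(3, 100)            # 2-D batch of three time sequences
window = torch.hann_window(n_fft)

# One-sided complex output: (batch, n_fft // 2 + 1, frames)
onesided = torch.stft(batch, n_fft, hop_length=hop_length, window=window,
                      return_complex=True)

# Two-sided complex output: (batch, n_fft, frames)
twosided = torch.stft(batch, n_fft, hop_length=hop_length, window=window,
                      onesided=False, return_complex=True)

# Real view with a trailing dimension of size 2 (real, imaginary)
as_real = torch.view_as_real(onesided)

# Assumed frame count for center=True and even n_fft
expected_frames = 1 + batch.shape[-1] // hop_length

print(onesided.shape)   # torch.Size([3, 9, 26])
print(twosided.shape)   # torch.Size([3, 16, 26])
print(as_real.shape)    # torch.Size([3, 9, 26, 2])
```

The first n_fft // 2 + 1 rows of the two-sided result match the one-sided result; the remaining rows are determined by conjugate symmetry.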

Warning

This function changed signature at version 0.4.1. Calling it with the previous signature may raise an error or return an incorrect result.

Parameters
• input (Tensor) – the input tensor

• n_fft (int) – size of Fourier transform

• hop_length (int, optional) – the distance between neighboring sliding window frames. Default: None (treated as equal to floor(n_fft / 4))

• win_length (int, optional) – the size of window frame and STFT filter. Default: None (treated as equal to n_fft)

• window (Tensor, optional) – the optional window function. Default: None (treated as window of all $1$ s)

• center (bool, optional) – whether to pad input on both sides so that the $t$ -th frame is centered at time $t \times \text{hop\_length}$ . Default: True

• pad_mode (string, optional) – controls the padding method used when center is True. Default: "reflect"

• normalized (bool, optional) – controls whether to return the normalized STFT results. Default: False

• onesided (bool, optional) – controls whether to return half of results to avoid redundancy for real inputs. Default: True for real input and window, False otherwise.

• return_complex (bool, optional) – whether to return a complex tensor, or a real tensor with an extra last dimension for the real and imaginary components.

Returns

A tensor containing the STFT result with shape described above

Return type

Tensor
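Example

The sketch below illustrates the center, pad_mode, and normalized arguments on a toy signal. It is not taken from the reference: the sizes are arbitrary, and two points are assumptions checked only by this comparison, namely that center=True pads n_fft // 2 samples on each side and that the normalization factor frame_length equals n_fft.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

n_fft = 16
hop_length = 4
signal = torch.randn(1, 64)
window = torch.hann_window(n_fft)

# Built-in centering with reflect padding (the defaults, written out).
centered = torch.stft(signal, n_fft, hop_length=hop_length, window=window,
                      center=True, pad_mode="reflect", return_complex=True)

# Assumption: centering is equivalent to padding n_fft // 2 samples per side.
pad = n_fft // 2
padded = F.pad(signal.unsqueeze(1), (pad, pad), mode="reflect").squeeze(1)
manual = torch.stft(padded, n_fft, hop_length=hop_length, window=window,
                    center=False, return_complex=True)
center_error = (centered - manual).abs().max().item()

# Assumption: normalized=True multiplies the result by n_fft ** -0.5.
normalized = torch.stft(signal, n_fft, hop_length=hop_length, window=window,
                        normalized=True, return_complex=True)
norm_error = (normalized - centered * n_fft ** -0.5).abs().max().item()

print(centered.shape, center_error, norm_error)
```

Both error values should be at the level of single-precision rounding if the assumptions hold.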