Spectrogram

class torchaudio.transforms.Spectrogram(n_fft: int = 400, win_length: Optional[int] = None, hop_length: Optional[int] = None, pad: int = 0, window_fn: Callable[..., Tensor] = torch.hann_window, power: Optional[float] = 2.0, normalized: Union[bool, str] = False, wkwargs: Optional[dict] = None, center: bool = True, pad_mode: str = 'reflect', onesided: bool = True, return_complex: Optional[bool] = None)

Create a spectrogram from an audio signal.

This feature supports the following devices: CPU, CUDA.
This API supports the following properties: Autograd, TorchScript.
Parameters:
  • n_fft (int, optional) – Size of FFT. Produces n_fft // 2 + 1 frequency bins when onesided is True. (Default: 400)

  • win_length (int or None, optional) – Window size. (Default: n_fft)

  • hop_length (int or None, optional) – Length of hop between STFT windows. (Default: win_length // 2)

  • pad (int, optional) – Two sided padding of signal. (Default: 0)

  • window_fn (Callable[..., Tensor], optional) – A function to create a window tensor that is applied/multiplied to each frame/window. (Default: torch.hann_window)

  • power (float or None, optional) – Exponent for the magnitude spectrogram (must be > 0), e.g., 1 for magnitude, 2 for power. If None, the complex spectrum is returned instead. (Default: 2)

  • normalized (bool or str, optional) – Whether to normalize by magnitude after the STFT. If a str is given, the choices are "window" and "frame_length", which select a specific normalization type. True maps to "window". (Default: False)

  • wkwargs (dict or None, optional) – Arguments for window function. (Default: None)

  • center (bool, optional) – Whether to pad waveform on both sides so that the \(t\)-th frame is centered at time \(t \times \text{hop\_length}\). (Default: True)

  • pad_mode (str, optional) – Controls the padding method used when center is True. (Default: "reflect")

  • onesided (bool, optional) – Controls whether to return only half of the results to avoid redundancy. (Default: True)

  • return_complex (bool, optional) – Deprecated and not used.
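
The effect of the power parameter described above can be illustrated on a single complex STFT coefficient. This is only a sketch in plain Python, not torchaudio's implementation; the helper name apply_power is hypothetical and exists purely for illustration.

```python
def apply_power(coeff, power):
    """Hypothetical helper: mimic the documented `power` semantics
    for one complex STFT coefficient."""
    if power is None:
        # power=None: the complex value is passed through unchanged.
        return coeff
    # Otherwise the magnitude is raised to the given exponent.
    return abs(coeff) ** power


z = complex(3.0, 4.0)          # magnitude is 5.0
print(apply_power(z, None))    # (3+4j)  complex spectrum
print(apply_power(z, 1.0))     # 5.0     magnitude
print(apply_power(z, 2.0))     # 25.0    power
```

With the default power=2.0, each bin therefore holds the squared magnitude of the corresponding STFT coefficient.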

Example
>>> import torchaudio
>>> waveform, sample_rate = torchaudio.load("test.wav", normalize=True)
>>> transform = torchaudio.transforms.Spectrogram(n_fft=800)
>>> spectrogram = transform(waveform)
Tutorials using Spectrogram:
  • Audio Feature Augmentation
  • StreamWriter Basic Usage
  • Music Source Separation with Hybrid Demucs
  • Speech Enhancement with MVDR Beamforming
  • Audio Feature Extractions

forward(waveform: Tensor) → Tensor
Parameters:

waveform (Tensor) – Tensor of audio of dimension (…, time).

Returns:

Dimension (…, freq, time), where freq is n_fft // 2 + 1 (n_fft being the size of the FFT) and time is the number of window hops (n_frame).

Return type:

Tensor
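
The returned dimensions can be estimated ahead of time from the constructor arguments. The sketch below is a plain-Python helper (the name spectrogram_shape is hypothetical) that resolves the documented defaults for win_length and hop_length. It assumes the frame count follows the framing rule of torch.stft, and that freq equals n_fft when onesided is False.

```python
def spectrogram_shape(num_samples, n_fft=400, win_length=None,
                      hop_length=None, pad=0, center=True, onesided=True):
    """Hypothetical helper: predict (freq, time) for a waveform of
    `num_samples` samples, assuming torch.stft-style framing."""
    # Resolve the documented defaults.
    win_length = n_fft if win_length is None else win_length
    hop_length = win_length // 2 if hop_length is None else hop_length

    # `pad` is applied on both sides of the signal.
    padded = num_samples + 2 * pad

    # One-sided output keeps only the non-redundant half of the spectrum.
    n_freq = n_fft // 2 + 1 if onesided else n_fft

    if center:
        # Frames are centered on multiples of hop_length.
        n_frames = 1 + padded // hop_length
    else:
        # Without centering, every frame must fit inside the signal.
        n_frames = 1 + (padded - n_fft) // hop_length
    return n_freq, n_frames


print(spectrogram_shape(16000))             # (201, 81)
print(spectrogram_shape(16000, n_fft=800))  # (401, 41)
```

For a batched input of shape (…, 16000), the leading dimensions are kept and only the trailing time axis is replaced by (freq, time).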
