class torchaudio.transforms.SpeedPerturbation(orig_freq: int, factors: Sequence[float])[source]

Applies the speed perturbation augmentation introduced in Audio augmentation for speech recognition [Ko et al., 2015]. For a given input, the module samples a speed-up factor from factors uniformly at random and adjusts the speed of the input by that factor.

This feature supports the following devices: CPU, CUDA

This API supports the following properties: Autograd, TorchScript

Parameters:
  • orig_freq (int) – Original frequency of the signals in waveform.

  • factors (Sequence[float]) – Factors by which to adjust speed of input. Values greater than 1.0 compress waveform in time, whereas values less than 1.0 stretch waveform in time.

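Example:

>>> import torch
>>> from torchaudio.transforms import SpeedPerturbation
>>> speed_perturb = SpeedPerturbation(16000, [0.9, 1.1, 1.0, 1.0, 1.0])
>>> # waveform speed will be adjusted by factor 0.9 with 20% probability,
>>> # 1.1 with 20% probability, and 1.0 (i.e. kept the same) with 60% probability.
>>> # forward returns a (waveform, lengths) tuple, so unpack both values.
>>> speed_perturbed_waveform, speed_perturbed_lengths = speed_perturb(waveform, lengths)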
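
The probabilities quoted in the example comments follow from how often each value appears in factors, since one entry of the list is drawn uniformly at random. The sketch below is plain Python and does not call torchaudio; `implied_probabilities` is a hypothetical helper introduced only for illustration.

```python
from collections import Counter
from fractions import Fraction


def implied_probabilities(factors):
    """Map each distinct factor to its probability under uniform sampling.

    Hypothetical helper: it only counts repeats in the list and is not
    part of torchaudio.
    """
    counts = Counter(factors)
    total = len(factors)
    return {factor: Fraction(count, total) for factor, count in counts.items()}


probs = implied_probabilities([0.9, 1.1, 1.0, 1.0, 1.0])
for factor, p in sorted(probs.items()):
    print(f"factor {factor}: probability {float(p):.0%}")
```

Repeating a value in factors is therefore a simple way to weight it more heavily, for example to leave most inputs unchanged.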
forward(waveform: Tensor, lengths: Optional[Tensor] = None) → Tuple[Tensor, Optional[Tensor]][source]

Parameters:
  • waveform (torch.Tensor) – Input signals, with shape (…, time).

  • lengths (torch.Tensor or None, optional) – Valid lengths of signals in waveform, with shape (…). If None, all elements in waveform are treated as valid. (Default: None)
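
To illustrate what lengths describes, the sketch below pads signals of different durations to a common length and records how many samples of each row are valid. It uses plain Python lists rather than tensors so it runs without torch; `pad_batch` is a hypothetical helper, and in practice both values would be converted to torch.Tensor before being passed to forward.

```python
def pad_batch(signals, pad_value=0.0):
    """Pad variable-length signals to one length and record valid lengths.

    Hypothetical helper for illustration only; not part of torchaudio.
    """
    lengths = [len(signal) for signal in signals]
    max_len = max(lengths)
    padded = [
        list(signal) + [pad_value] * (max_len - len(signal))
        for signal in signals
    ]
    return padded, lengths


# Two toy "signals" with 3 and 5 samples.
padded, lengths = pad_batch([[0.1, 0.2, 0.3], [0.5, 0.4, 0.3, 0.2, 0.1]])
print(padded)   # rows share one time dimension
print(lengths)  # number of valid samples in each row
```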



Returns:

torch.Tensor

Speed-adjusted waveform, with shape (…, new_time).

torch.Tensor or None

If lengths is not None, valid lengths of signals in speed-adjusted waveform, with shape (…); otherwise, None.

Return type:

(torch.Tensor, torch.Tensor or None)
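
As a rough guide to new_time and the returned lengths: factors above 1.0 shorten the signal and factors below 1.0 lengthen it. The sketch below assumes the output length scales as length / factor and is rounded up. This is an assumption made for illustration, and the exact rounding used by torchaudio may differ. `approx_new_length` is a hypothetical helper, not a library function.

```python
import math


def approx_new_length(length, factor):
    """Estimate the sample count after changing speed by `factor`.

    Assumption (not confirmed by the documentation above): the length
    scales as length / factor, rounded up to a whole sample.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return math.ceil(length / factor)


# One second of audio at 16 kHz under the three factors from the example.
for factor in (0.9, 1.0, 1.1):
    print(factor, approx_new_length(16000, factor))
```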

