SpeedPerturbation

class torchaudio.transforms.SpeedPerturbation(orig_freq: int, factors: Sequence[float])

Applies the speed perturbation augmentation introduced in "Audio augmentation for speech recognition" [Ko et al., 2015]. For a given input, the module samples a speed factor from factors uniformly at random and adjusts the speed of the input by that factor.

Parameters:

orig_freq (int) – Original sample rate of the signals in waveform.
factors (Sequence[float]) – Factors by which to adjust speed of input. Values greater than 1.0 compress waveform in time, whereas values less than 1.0 stretch waveform in time.

Example

>>> speed_perturb = SpeedPerturbation(16000, [0.9, 1.1, 1.0, 1.0, 1.0])
>>> # waveform speed will be adjusted by factor 0.9 with 20% probability,
>>> # 1.1 with 20% probability, and 1.0 (i.e. kept the same) with 60% probability.
>>> # forward returns a (waveform, lengths) tuple, so unpack both values.
>>> speed_perturbed_waveform, speed_perturbed_lengths = speed_perturb(waveform, lengths)
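
The probabilities quoted in the example comments follow from sampling uniformly over the entries of factors, so repeating a value raises its weight. A minimal sketch of that arithmetic in plain Python; factor_probabilities is a hypothetical helper written for illustration and is not part of torchaudio:

```python
from collections import Counter
from fractions import Fraction


def factor_probabilities(factors):
    """Return the chance of drawing each distinct factor.

    Assumes one entry of `factors` is drawn uniformly at random,
    as described for SpeedPerturbation above.
    """
    counts = Counter(factors)
    total = len(factors)
    return {factor: Fraction(count, total) for factor, count in counts.items()}


# Same factor list as in the example above.
probs = factor_probabilities([0.9, 1.1, 1.0, 1.0, 1.0])
```

Here probs[0.9] and probs[1.1] are each 1/5 and probs[1.0] is 3/5, matching the 20% / 20% / 60% split in the comments.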

forward(waveform: Tensor, lengths: Optional[Tensor] = None) → Tuple[Tensor, Optional[Tensor]]

Parameters:

waveform (torch.Tensor) – Input signals, with shape (…, time).
lengths (torch.Tensor or None, optional) – Valid lengths of signals in waveform, with shape (…). If None, all elements in waveform are treated as valid. (Default: None)

Returns:

torch.Tensor: Speed-adjusted waveform, with shape (…, new_time).
torch.Tensor or None: If lengths is not None, valid lengths of signals in speed-adjusted waveform, with shape (…); otherwise, None.

Return type:

(torch.Tensor, torch.Tensor or None)
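
As a rough guide to how new_time and the returned lengths relate to the inputs, the sketch below assumes the speed change is realised by resampling from int(factor * orig_freq) to orig_freq, with valid lengths rounded up. This is an assumption about the implementation rather than something stated above, and expected_length is a hypothetical helper, not a torchaudio function:

```python
import math


def expected_length(length, orig_freq, factor):
    """Estimate the valid length after a speed change by `factor`.

    Assumption: the change is equivalent to resampling from
    int(factor * orig_freq) to orig_freq, rounding the length up.
    """
    source = int(factor * orig_freq)
    target = int(orig_freq)
    # Reduce the rate ratio to lowest terms before scaling the length.
    divisor = math.gcd(source, target)
    source //= divisor
    target //= divisor
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-length * target // source)


# One second of 16 kHz audio under each factor from the class example.
slower = expected_length(16000, 16000, 0.9)     # longer than the input
faster = expected_length(16000, 16000, 1.1)     # shorter than the input
unchanged = expected_length(16000, 16000, 1.0)  # same length
```

Under this assumption, a factor of 0.9 stretches 16000 samples to 17778, a factor of 1.1 compresses them to 14546, and a factor of 1.0 leaves the length at 16000. The exact rounding used by torchaudio may differ.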
