torchaudio.functional
Functions to perform common audio operations.
Utility
Turn a spectrogram from the power/amplitude scale to the decibel scale.
Turn a tensor from the decibel scale to the power/amplitude scale.
Create a frequency bin conversion matrix.
Creates a linear triangular filterbank.
Create a DCT transformation matrix with shape (
Apply a mask along
Apply a mask along
Encode signal based on mu-law companding.
Decode mu-law encoded signal.
Apply codecs as a form of augmentation.
Resamples the waveform at the new frequency using bandlimited interpolation.
Measure audio loudness according to the ITU-R BS.1770-4 recommendation.
Convolves inputs along their last dimension using the direct method.
Convolves inputs along their last dimension using FFT.
Scales and adds noise to waveform per signal-to-noise ratio.
Pre-emphasizes a waveform along its last dimension, i.e. for each signal \(x\) in
De-emphasizes a waveform along its last dimension.
Adjusts waveform speed.
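As an illustration of the mu-law companding entries above, here is a minimal pure-Python sketch for a single sample. It does not call torchaudio. The helper names `mu_law_encode` and `mu_law_decode` and the default of 256 quantization channels are assumptions made for this example; the formula is the standard mu-law curve with mu = quantization_channels - 1.

```python
import math


def mu_law_encode(x: float, quantization_channels: int = 256) -> int:
    """Map a sample in [-1, 1] to an integer code in [0, quantization_channels - 1]."""
    mu = quantization_channels - 1
    x = max(-1.0, min(1.0, x))  # clamp to the expected input range
    # Logarithmic compression: small amplitudes get more resolution.
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift from [-1, 1] to [0, mu] and round to the nearest integer code.
    return int((compressed + 1) / 2 * mu + 0.5)


def mu_law_decode(code: int, quantization_channels: int = 256) -> float:
    """Invert mu_law_encode, returning an approximate sample in [-1, 1]."""
    mu = quantization_channels - 1
    y = 2 * code / mu - 1  # back to [-1, 1]
    # Exponential expansion, the inverse of the compression step.
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


if __name__ == "__main__":
    for sample in (-1.0, -0.25, 0.0, 0.5, 1.0):
        code = mu_law_encode(sample)
        print(sample, code, round(mu_law_decode(code), 4))
```

Decoding is lossy: the round trip returns a value close to the original sample, with finer resolution near zero than near full scale.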
Filtering
Design two-pole all-pass filter.
Design two-pole band filter.
Design two-pole band-pass filter.
Design two-pole band-reject filter.
Design a bass tone-control effect.
Perform a biquad filter of input tensor.
Apply contrast effect.
Apply a DC shift to the audio.
Apply IEC 60908 CD de-emphasis (shelving) IIR filter.
Apply dither
Design biquad peaking equalizer filter and perform filtering.
Apply an IIR filter forward and backward to a waveform.
Apply a flanger effect to the audio.
Apply amplification or attenuation to the whole waveform.
Design biquad highpass filter and perform filtering.
Perform an IIR filter by evaluating difference equation.
Design biquad lowpass filter and perform filtering.
Apply an overdrive effect to the audio.
Apply a phasing effect to the audio.
Apply RIAA vinyl playback equalization.
Design a treble tone-control effect.
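To make the "design a biquad, then filter" entries above more concrete, the following pure-Python sketch designs a lowpass biquad and applies it by evaluating the difference equation sample by sample. It does not call torchaudio. The function names are hypothetical, and the coefficient formulas are taken from the widely used Audio EQ Cookbook as an assumption; the library's own design may differ in detail.

```python
import math


def lowpass_coefficients(sample_rate: float, cutoff_freq: float, q: float = 0.707):
    """Return (b, a) coefficient triples for a second-order lowpass filter."""
    w0 = 2 * math.pi * cutoff_freq / sample_rate
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    b = ((1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2)
    a = (1 + alpha, -2 * cos_w0, 1 - alpha)
    return b, a


def biquad(samples, b, a):
    """Filter a sequence with the difference equation
    a0*y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2].
    """
    # Normalize so that the leading denominator coefficient is 1.
    b0, b1, b2 = (c / a[0] for c in b)
    a1, a2 = a[1] / a[0], a[2] / a[0]
    x1 = x2 = y1 = y2 = 0.0  # filter state starts at rest
    out = []
    for x0 in samples:
        y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


if __name__ == "__main__":
    b, a = lowpass_coefficients(sample_rate=16000, cutoff_freq=1000)
    constant = biquad([1.0] * 2000, b, a)  # 0 Hz passes through
    alternating = biquad([(-1.0) ** n for n in range(2000)], b, a)  # Nyquist is rejected
    print(constant[-1], alternating[-1])
```

A constant input settles at its original level while a signal alternating at the Nyquist frequency decays toward zero, which is the expected behavior of a lowpass filter.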
Feature Extractions
Voice Activity Detector.
Create a spectrogram or a batch of spectrograms from a raw audio signal.
Create an inverse spectrogram or a batch of inverse spectrograms from the provided complex-valued spectrogram.
Compute waveform from a linear scale magnitude spectrogram using the Griffin-Lim transformation.
Given a STFT tensor, speed up in time without modifying pitch by a factor of
Shift the pitch of a waveform by
Compute delta coefficients of a tensor, usually a spectrogram:
Detect pitch frequency.
Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.
Extract pitch based on the method described in "A pitch extraction algorithm tuned for automatic speech recognition" [Ghahremani et al., 2014].
Compute the spectral centroid for each channel along the time axis.
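The spectral centroid entry above can be pictured as a magnitude-weighted mean frequency. The pure-Python sketch below computes it for a single frame of a one-sided magnitude spectrum. It does not call torchaudio; the function name, the bin spacing from 0 Hz to half the sample rate, and returning 0.0 for a silent frame are all assumptions made for this example.

```python
def spectral_centroid(magnitudes, sample_rate: float) -> float:
    """Magnitude-weighted mean frequency of one spectrum frame, in Hz."""
    n_bins = len(magnitudes)
    # Assumed layout: bins evenly spaced from 0 Hz up to sample_rate / 2.
    freqs = [k * (sample_rate / 2) / (n_bins - 1) for k in range(n_bins)]
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # assumption: treat a silent frame as having centroid 0
    return sum(f * m for f, m in zip(freqs, magnitudes)) / total


if __name__ == "__main__":
    # All energy in the middle bin of five bins at 8 kHz sits at 2000 Hz.
    print(spectral_centroid([0.0, 0.0, 1.0, 0.0, 0.0], sample_rate=8000))
```

Applying this to each frame of a spectrogram in turn gives one centroid value per time step.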
Multi-channel
Compute cross-channel power spectral density (PSD) matrix.
Compute the Minimum Variance Distortionless Response (MVDR [Capon, 1969]) beamforming weights using the method proposed by Souden et al. [Souden et al., 2009].
Compute the Minimum Variance Distortionless Response (MVDR [Capon, 1969]) beamforming weights based on the relative transfer function (RTF) and power spectral density (PSD) matrix of noise.
Estimate the relative transfer function (RTF) or the steering vector by eigenvalue decomposition.
Estimate the relative transfer function (RTF) or the steering vector by the power method.
Apply the beamforming weight to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum.
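As a rough picture of the last entry above, applying beamforming weights amounts to a conjugate-weighted sum over channels for every frequency bin and time frame. The pure-Python sketch below shows that sum with nested lists of complex numbers. It does not call torchaudio; the function name and the `weights[freq][channel]` and `specgram[channel][freq][frame]` layouts are assumptions chosen for readability.

```python
def apply_beamforming(weights, specgram):
    """Combine a multi-channel spectrum into one channel.

    weights[f][c]     complex weight for frequency bin f and channel c
    specgram[c][f][t] complex spectrum for channel c, bin f, frame t
    Returns enhanced[f][t] = sum over c of conj(weights[f][c]) * specgram[c][f][t].
    """
    n_channels = len(specgram)
    n_freqs = len(specgram[0])
    n_frames = len(specgram[0][0])
    return [
        [
            sum(weights[f][c].conjugate() * specgram[c][f][t] for c in range(n_channels))
            for t in range(n_frames)
        ]
        for f in range(n_freqs)
    ]


if __name__ == "__main__":
    # Two channels, one frequency bin, two frames; equal weights average the channels.
    weights = [[0.5 + 0j, 0.5 + 0j]]
    specgram = [
        [[1 + 1j, 2 + 0j]],  # channel 0
        [[1 - 1j, 0 + 2j]],  # channel 1
    ]
    print(apply_beamforming(weights, specgram))
```

With equal real weights the result is simply the channel average; weights estimated by an MVDR method would instead steer the sum toward the target source.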
Loss
Compute the RNN Transducer loss from "Sequence Transduction with Recurrent Neural Networks" [Graves, 2012].
Metric
Calculate the word-level edit (Levenshtein) distance between two sequences.
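For readers unfamiliar with the metric, the following pure-Python sketch computes a Levenshtein distance over sequences of words using the standard dynamic-programming recurrence. It does not call torchaudio, and the function name and argument names are assumptions made for this example.

```python
def edit_distance(reference, hypothesis) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `reference` into `hypothesis` (any two sequences)."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j items of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, 1):
        cur = [i]
        for j, hyp_item in enumerate(hypothesis, 1):
            cur.append(
                min(
                    prev[j] + 1,  # deletion
                    cur[j - 1] + 1,  # insertion
                    prev[j - 1] + (ref_item != hyp_item),  # substitution or match
                )
            )
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sit on mat".split()
    print(edit_distance(ref, hyp))
```

Splitting transcripts into word lists before the call gives a word-level distance; dividing that by the number of reference words is one common way to obtain a word error rate.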