torchaudio.functional
Functions to perform common audio operations.
Utility
Turn a spectrogram from the power/amplitude scale to the decibel scale. 

Turn a tensor from the decibel scale to the power/amplitude scale. 

Create a frequency bin conversion matrix. 

Creates a linear triangular filterbank. 

Create a DCT transformation matrix with shape ( 

Apply a mask along 

Apply a mask along 

Encode signal based on mu-law companding. 

Decode mu-law encoded signal. 

Apply codecs as a form of augmentation. 

Resamples the waveform at the new frequency using bandlimited interpolation. 

Measure audio loudness according to the ITU-R BS.1770-4 recommendation. 
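
As an illustration of the mu-law encode and decode entries above, the following is a minimal pure-Python sketch of mu-law companding on a single sample in [-1, 1]. The function names and the `quantization_channels` parameter are assumptions made for this example; it is not the library's implementation and handles neither tensors nor batches.

```python
import math


def mu_law_encode(x: float, quantization_channels: int = 256) -> int:
    """Compress a sample in [-1, 1] and quantize it to an integer code."""
    mu = quantization_channels - 1
    # Logarithmic compression; the sign of the input is preserved.
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Map [-1, 1] onto [0, mu] and round to the nearest integer code.
    return int((compressed + 1) / 2 * mu + 0.5)


def mu_law_decode(code: int, quantization_channels: int = 256) -> float:
    """Expand an integer code back to an approximate sample in [-1, 1]."""
    mu = quantization_channels - 1
    # Map [0, mu] back onto [-1, 1].
    compressed = code / mu * 2 - 1
    # Invert the logarithmic compression.
    magnitude = (math.exp(abs(compressed) * math.log1p(mu)) - 1) / mu
    return math.copysign(magnitude, compressed)
```

Decoding an encoded sample recovers the input only approximately, because quantization discards information; the error is smallest for low-amplitude samples, which is the point of the companding curve.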
Filtering
Design two-pole all-pass filter. 

Design two-pole band filter. 

Design two-pole band-pass filter. 

Design two-pole band-reject filter. 

Design a bass tone-control effect. 

Perform a biquad filter of input tensor. 

Apply contrast effect. 

Apply a DC shift to the audio. 

Apply ISO 908 CD de-emphasis (shelving) IIR filter. 

Apply dither 

Design biquad peaking equalizer filter and perform filtering. 

Apply an IIR filter forward and backward to a waveform. 

Apply a flanger effect to the audio. 

Apply amplification or attenuation to the whole waveform. 

Design biquad highpass filter and perform filtering. 

Perform an IIR filter by evaluating difference equation. 

Design biquad lowpass filter and perform filtering. 

Apply an overdrive effect to the audio. 

Apply a phasing effect to the audio. 

Apply RIAA vinyl playback equalization. 

Design a treble tone-control effect. 
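
To make "evaluating the difference equation" concrete, here is a small pure-Python sketch of a direct-form IIR filter over a list of samples. The name `iir_filter` and the coefficient layout (`b` for the numerator, `a` for the denominator, lowest delay first) are assumptions for this example; the sketch does no clamping, batching, or input validation.

```python
def iir_filter(x, b, a):
    """Evaluate a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        # Feed-forward part: weighted sum of current and past inputs.
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        # Feedback part: subtract the weighted past outputs.
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        # Normalize by the leading denominator coefficient.
        y.append(acc / a[0])
    return y
```

A biquad is the special case in which `b` and `a` each hold three coefficients, so the filters in this section can be read as different ways of choosing those six numbers.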
Feature Extractions
Voice Activity Detector. 

Create a spectrogram or a batch of spectrograms from a raw audio signal. 

Create an inverse spectrogram or a batch of inverse spectrograms from the provided complex-valued spectrogram. 

Compute waveform from a linear scale magnitude spectrogram using the Griffin-Lim transformation. 

Given a STFT tensor, speed up in time without modifying pitch by a factor of 

Shift the pitch of a waveform by 

Compute delta coefficients of a tensor, usually a spectrogram: 

Detect pitch frequency. 

Apply sliding-window cepstral mean (and optionally variance) normalization per utterance. 

Extract pitch based on method described in A pitch extraction algorithm tuned for automatic speech recognition [Ghahremani et al., 2014]. 

Compute the spectral centroid for each channel along the time axis. 
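
As a rough illustration of the spectral centroid, the sketch below computes the magnitude-weighted mean frequency of a single spectrogram frame in pure Python. The function name, the assumption that bins are evenly spaced from 0 Hz up to the Nyquist frequency, and the single-frame scope are all choices made for this example rather than a description of the library's behaviour.

```python
def spectral_centroid(magnitudes, sample_rate):
    """Return the magnitude-weighted mean frequency (Hz) of one frame."""
    n_bins = len(magnitudes)
    if n_bins < 2:
        raise ValueError("need at least two frequency bins")
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("centroid is undefined for an all-zero frame")
    nyquist = sample_rate / 2
    # Assumed bin layout: evenly spaced from 0 Hz to the Nyquist frequency.
    freqs = [k * nyquist / (n_bins - 1) for k in range(n_bins)]
    return sum(f * m for f, m in zip(freqs, magnitudes)) / total
```

Applying this to each frame of a magnitude spectrogram in turn gives one centroid value per time step.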
Multi-channel
Compute cross-channel power spectral density (PSD) matrix. 

Compute the Minimum Variance Distortionless Response (MVDR [Capon, 1969]) beamforming weights by the method proposed by Souden et al. [Souden et al., 2009]. 

Compute the Minimum Variance Distortionless Response (MVDR [Capon, 1969]) beamforming weights based on the relative transfer function (RTF) and power spectral density (PSD) matrix of noise. 

Estimate the relative transfer function (RTF) or the steering vector by eigenvalue decomposition. 

Estimate the relative transfer function (RTF) or the steering vector by the power method. 

Apply the beamforming weight to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum. 
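
The last step above, applying beamforming weights, amounts to a conjugate-weighted sum over channels for every time-frequency bin. The following pure-Python sketch shows that operation on nested lists; the function name and the index order (`weights[freq][channel]`, `spectrum[channel][freq][time]`) are assumptions chosen for the example.

```python
def apply_beamforming(weights, spectrum):
    """Combine channels: Y[f][t] = sum_c conj(w[f][c]) * X[c][f][t]."""
    n_channels = len(spectrum)
    n_freqs = len(spectrum[0])
    n_frames = len(spectrum[0][0])
    enhanced = []
    for f in range(n_freqs):
        row = []
        for t in range(n_frames):
            # Conjugate each weight, then sum the weighted channels.
            row.append(
                sum(
                    weights[f][c].conjugate() * spectrum[c][f][t]
                    for c in range(n_channels)
                )
            )
        enhanced.append(row)
    return enhanced
```

With equal real weights of 0.5 on two channels, this reduces to averaging the channels, which is a convenient sanity check before trying weights estimated from PSD matrices.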
Loss
Compute the RNN Transducer loss from Sequence Transduction with Recurrent Neural Networks [Graves, 2012]. 
Metric
Calculate the word-level edit (Levenshtein) distance between two sequences. 
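
For reference, a word-level Levenshtein distance can be sketched in a few lines of pure Python using the standard two-row dynamic program. The name `edit_distance` below is used only for this example, and the code is an illustration of the metric rather than the library's implementation.

```python
def edit_distance(seq1, seq2):
    """Minimum number of insertions, deletions, and substitutions."""
    # prev[j] holds the distance between the processed prefix of seq1
    # and the first j items of seq2.
    prev = list(range(len(seq2) + 1))
    for i, a in enumerate(seq1, 1):
        cur = [i]
        for j, b in enumerate(seq2, 1):
            cur.append(
                min(
                    prev[j] + 1,             # delete a
                    cur[j - 1] + 1,          # insert b
                    prev[j - 1] + (a != b),  # substitute, or match at no cost
                )
            )
        prev = cur
    return prev[-1]
```

Passing lists of words, for example `"the cat sat".split()`, gives a word-level distance; passing plain strings gives a character-level one, since the function only compares items for equality.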