torchaudio.functional

Functions to perform common audio operations.

Utility

`amplitude_to_DB`	Turn a spectrogram from the power/amplitude scale to the decibel scale.
`DB_to_amplitude`	Turn a tensor from the decibel scale to the power/amplitude scale.
`melscale_fbanks`	Create a frequency bin conversion matrix.
`linear_fbanks`	Creates a linear triangular filterbank.
`create_dct`	Create a DCT transformation matrix with shape (`n_mels`, `n_mfcc`), normalized depending on `norm`.
`mask_along_axis`	Apply a mask along `axis`.
`mask_along_axis_iid`	Apply a mask along `axis`, sampling the mask independently for each example in the batch.
`mu_law_encoding`	Encode signal based on mu-law companding.
`mu_law_decoding`	Decode mu-law encoded signal.
`apply_codec`	DEPRECATED: Apply codecs as a form of augmentation.
`resample`	Resamples the waveform at the new frequency using bandlimited interpolation.
`loudness`	Measure audio loudness according to the ITU-R BS.1770-4 recommendation.
`convolve`	Convolves inputs along their last dimension using the direct method.
`fftconvolve`	Convolves inputs along their last dimension using FFT.
`add_noise`	Scales and adds noise to waveform per signal-to-noise ratio.
`preemphasis`	Pre-emphasizes a waveform along its last dimension, i.e. for each signal $x$ in `waveform`, computes output $y$ as $y[i] = x[i] - \text{coeff} \cdot x[i - 1]$.
`deemphasis`	De-emphasizes a waveform along its last dimension.
`speed`	Adjusts waveform speed.
`frechet_distance`	Computes the Fréchet distance between two multivariate normal distributions [Dowson and Landau, 1982].
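
The mu-law entries above describe a companding scheme that is easy to illustrate on single samples. The following is a minimal pure-Python sketch of the standard mu-law formulas, not torchaudio's tensor implementation; the helper names `mu_law_encode` and `mu_law_decode` are hypothetical, and the input is assumed to be already normalized to [-1, 1].

```python
import math


def mu_law_encode(x: float, quantization_channels: int = 256) -> int:
    """Compress a sample in [-1, 1] to an integer code in [0, quantization_channels - 1]."""
    mu = quantization_channels - 1
    # Logarithmic compression that keeps the sign of the input.
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Map [-1, 1] to [0, mu] and round to the nearest integer code.
    return int((compressed + 1) / 2 * mu + 0.5)


def mu_law_decode(code: int, quantization_channels: int = 256) -> float:
    """Expand an integer code back to an approximate sample in [-1, 1]."""
    mu = quantization_channels - 1
    # Map [0, mu] back to [-1, 1], then invert the logarithmic compression.
    y = code / mu * 2 - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


code = mu_law_encode(0.3)
restored = mu_law_decode(code)
```

Because the code is quantized, `restored` is close to the original sample but not identical to it; the error is smaller for quiet samples than for loud ones, which is the point of companding.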

Forced Alignment

`forced_align`	Align a CTC label sequence to an emission.
`merge_tokens`	Removes repeated tokens and blank tokens from the given CTC token sequence.
`TokenSpan`	Token with time stamps and score.
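
To make the token-merging step concrete, here is a small sketch of collapsing a frame-level CTC token sequence into spans. It assumes that each run of identical non-blank tokens becomes one span whose score is the mean of its frame scores; `Span` and `merge_frame_tokens` are hypothetical names standing in for `TokenSpan` and `merge_tokens`, and plain lists replace tensors.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass
class Span:
    """Hypothetical stand-in for a token with time stamps and score."""
    token: int
    start: int    # first frame of the run (inclusive)
    end: int      # frame after the run (exclusive)
    score: float  # mean of the per-frame scores in the run


def merge_frame_tokens(tokens: List[int], scores: List[float], blank: int = 0) -> List[Span]:
    """Collapse runs of repeated tokens and drop runs of the blank token."""
    spans = []
    start = 0
    for token, run in groupby(tokens):
        length = len(list(run))
        end = start + length
        if token != blank:
            spans.append(Span(token, start, end, sum(scores[start:end]) / length))
        start = end
    return spans


frame_tokens = [0, 1, 1, 0, 2, 2, 2, 0]
frame_scores = [0.9, 0.5, 0.5, 0.8, 0.25, 0.25, 0.25, 0.7]
spans = merge_frame_tokens(frame_tokens, frame_scores)
```

Two occurrences of the same token separated by a blank stay separate spans, which is how CTC distinguishes a repeated label from a label held over several frames.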

Filtering

`allpass_biquad`	Design two-pole all-pass filter.
`band_biquad`	Design two-pole band filter.
`bandpass_biquad`	Design two-pole band-pass filter.
`bandreject_biquad`	Design two-pole band-reject filter.
`bass_biquad`	Design a bass tone-control effect.
`biquad`	Apply a biquad filter to the input tensor.
`contrast`	Apply contrast effect.
`dcshift`	Apply a DC shift to the audio.
`deemph_biquad`	Apply ISO 908 CD de-emphasis (shelving) IIR filter.
`dither`	Apply dither.
`equalizer_biquad`	Design biquad peaking equalizer filter and perform filtering.
`filtfilt`	Apply an IIR filter forward and backward to a waveform.
`flanger`	Apply a flanger effect to the audio.
`gain`	Apply amplification or attenuation to the whole waveform.
`highpass_biquad`	Design biquad highpass filter and perform filtering.
`lfilter`	Apply an IIR filter by evaluating the difference equation, using a differentiable implementation developed separately by Yu et al. [Yu and Fazekas, 2023] and Forgione et al. [Forgione and Piga, 2021].
`lowpass_biquad`	Design biquad lowpass filter and perform filtering.
`overdrive`	Apply an overdrive effect to the audio.
`phaser`	Apply a phasing effect to the audio.
`riaa_biquad`	Apply RIAA vinyl playback equalization.
`treble_biquad`	Design a treble tone-control effect.
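
Most of the filters listed above follow the same two steps: design biquad coefficients, then run a second-order difference equation over the samples. The sketch below illustrates both steps for a low-pass filter in pure Python. The coefficient formulas are the widely used Audio EQ Cookbook ones, which are assumed here rather than taken from this page; the function names are hypothetical, and no output clamping is applied.

```python
import math
from typing import List, Tuple

Coeffs = Tuple[float, float, float]


def lowpass_coefficients(sample_rate: float, cutoff_freq: float, q: float = 0.707) -> Tuple[Coeffs, Coeffs]:
    """Return (b, a) coefficients for a second-order low-pass filter."""
    w0 = 2 * math.pi * cutoff_freq / sample_rate
    alpha = math.sin(w0) / (2 * q)
    b0 = (1 - math.cos(w0)) / 2
    b1 = 1 - math.cos(w0)
    b2 = b0
    a0 = 1 + alpha
    a1 = -2 * math.cos(w0)
    a2 = 1 - alpha
    return (b0, b1, b2), (a0, a1, a2)


def biquad_filter(x: List[float], b: Coeffs, a: Coeffs) -> List[float]:
    """Evaluate a0*y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]."""
    b0, b1, b2 = b
    a0, a1, a2 = a
    x1 = x2 = y1 = y2 = 0.0  # filter state, initially at rest
    y = []
    for xn in x:
        yn = (b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


# Feed a constant (0 Hz) signal through a 1 kHz low-pass at 16 kHz.
b, a = lowpass_coefficients(16000, 1000)
step_response = biquad_filter([1.0] * 2000, b, a)
```

A constant input sits entirely in the pass band, so the output settles at the input level after a short transient.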

Feature Extractions

`vad`	Voice Activity Detector.
`spectrogram`	Create a spectrogram or a batch of spectrograms from a raw audio signal.
`inverse_spectrogram`	Create an inverse spectrogram or a batch of inverse spectrograms from the provided complex-valued spectrogram.
`griffinlim`	Compute waveform from a linear scale magnitude spectrogram using the Griffin-Lim transformation.
`phase_vocoder`	Given an STFT tensor, speed up in time without modifying pitch by a factor of `rate`.
`pitch_shift`	Shift the pitch of a waveform by `n_steps` steps.
`compute_deltas`	Compute delta coefficients of a tensor, usually a spectrogram.
`detect_pitch_frequency`	Detect pitch frequency.
`sliding_window_cmn`	Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.
`spectral_centroid`	Compute the spectral centroid for each channel along the time axis.
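
As an illustration of the delta-coefficient entry above, the following sketch applies the usual regression formula $d_t = \sum_{n=1}^{N} n (c_{t+n} - c_{t-n}) / (2 \sum_{n=1}^{N} n^2)$ to a single feature track. It assumes $N = (\text{win\_length} - 1) / 2$ (integer division) and edge frames padded by replication; `delta_coefficients` is a hypothetical helper operating on a plain list rather than a tensor.

```python
from typing import List


def delta_coefficients(frames: List[float], win_length: int = 5) -> List[float]:
    """Estimate the local slope of a feature track over a sliding window."""
    n_max = (win_length - 1) // 2
    denom = 2 * sum(n * n for n in range(1, n_max + 1))
    last = len(frames) - 1

    def at(t: int) -> float:
        # Replicate the first/last frame when the window runs off the edge.
        return frames[min(max(t, 0), last)]

    return [
        sum(n * (at(t + n) - at(t - n)) for n in range(1, n_max + 1)) / denom
        for t in range(len(frames))
    ]


ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
deltas = delta_coefficients(ramp)
```

On a linear ramp the interior deltas equal the slope of the ramp, while the values near the edges shrink because the replicated padding flattens the window.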

Multi-channel

`psd`	Compute cross-channel power spectral density (PSD) matrix.
`mvdr_weights_souden`	Compute the Minimum Variance Distortionless Response (MVDR [Capon, 1969]) beamforming weights by the method proposed by Souden et al. [Souden et al., 2009].
`mvdr_weights_rtf`	Compute the Minimum Variance Distortionless Response (MVDR [Capon, 1969]) beamforming weights based on the relative transfer function (RTF) and power spectral density (PSD) matrix of noise.
`rtf_evd`	Estimate the relative transfer function (RTF) or the steering vector by eigenvalue decomposition.
`rtf_power`	Estimate the relative transfer function (RTF) or the steering vector by the power method.
`apply_beamforming`	Apply the beamforming weight to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum.
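
The final beamforming step above reduces a multi-channel spectrum to one channel. A minimal sketch of that step, assuming the common convention that each output bin is the sum over channels of the conjugated weight times the channel's spectrum, looks as follows. The name `apply_weights` and the nested-list layouts (`weights[freq][channel]`, `specgram[channel][freq][time]`) are illustrative choices, not the library's interface.

```python
from typing import List


def apply_weights(
    weights: List[List[complex]],
    specgram: List[List[List[complex]]],
) -> List[List[complex]]:
    """Combine channels per frequency bin: out[f][t] = sum_c conj(w[f][c]) * X[c][f][t]."""
    n_channels = len(specgram)
    n_freqs = len(specgram[0])
    n_frames = len(specgram[0][0])
    return [
        [
            sum(weights[f][c].conjugate() * specgram[c][f][t] for c in range(n_channels))
            for t in range(n_frames)
        ]
        for f in range(n_freqs)
    ]


# Two channels, one frequency bin, two time frames.
specgram = [
    [[1 + 1j, 2 + 0j]],
    [[3 - 1j, 0 + 2j]],
]
# Equal real weights simply average the two channels.
weights = [[0.5 + 0j, 0.5 + 0j]]
enhanced = apply_weights(weights, specgram)
```

With equal real weights the result is the channel average; MVDR-style weights differ per frequency bin and are generally complex, which is why the conjugate matters.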

Loss

`rnnt_loss`	Compute the RNN Transducer loss from Sequence Transduction with Recurrent Neural Networks [Graves, 2012].
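
The RNN Transducer loss described above is the negative log-likelihood of the target sequence summed over all alignments, which the forward algorithm from [Graves, 2012] computes on a time-by-label lattice. The sketch below handles a single sequence in pure Python. It is an illustration under stated assumptions, not the library's implementation: the input is assumed to be already normalized log-probabilities (`log_probs[t][u][k]`), the blank index is assumed to be 0, and the function name is hypothetical.

```python
import math
from typing import List


def logsumexp2(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rnnt_negative_log_likelihood(
    log_probs: List[List[List[float]]], targets: List[int], blank: int = 0
) -> float:
    """log_probs[t][u][k]: log-prob of symbol k at frame t after emitting u target symbols."""
    n_frames, n_targets = len(log_probs), len(targets)
    alpha = [[-math.inf] * (n_targets + 1) for _ in range(n_frames)]
    alpha[0][0] = 0.0
    for t in range(n_frames):
        for u in range(n_targets + 1):
            if t == 0 and u == 0:
                continue
            # Arrive by emitting blank (advance in time) ...
            stay = alpha[t - 1][u] + log_probs[t - 1][u][blank] if t > 0 else -math.inf
            # ... or by emitting the next target symbol (advance in labels).
            emit = alpha[t][u - 1] + log_probs[t][u - 1][targets[u - 1]] if u > 0 else -math.inf
            alpha[t][u] = logsumexp2(stay, emit)
    # Every alignment ends with a final blank from the last lattice node.
    return -(alpha[n_frames - 1][n_targets] + log_probs[n_frames - 1][n_targets][blank])


# Two frames, one target symbol, vocabulary of two symbols with uniform probabilities.
uniform = [[[math.log(0.5)] * 2 for _ in range(2)] for _ in range(2)]
loss = rnnt_negative_log_likelihood(uniform, targets=[1])
```

In this toy case there are two valid alignments, each with probability 0.5 ** 3, so the total likelihood is 0.25 and the loss is log 4.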

Metric

`edit_distance`	Calculate the word-level edit (Levenshtein) distance between two sequences.
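
For reference, the word-level edit distance described above can be sketched with the classic dynamic-programming recurrence, where each insertion, deletion, or substitution of a whole word costs 1. This is a plain-Python illustration with a hypothetical function name, not the library's code.

```python
from typing import Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two word sequences, using two rolling rows."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to each hyp prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hyp
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # delete ref_word
                curr[j - 1] + 1,     # insert hyp_word
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


distance = word_edit_distance("the cat sat".split(), "the cat sat down".split())
```

Dividing this distance by the number of reference words gives the word error rate commonly reported for speech recognition.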