Nightly (unstable)

torchaudio.functional

Functions to perform common audio operations.

Utility

amplitude_to_DB

Turn a spectrogram from the power/amplitude scale to the decibel scale.

DB_to_amplitude

Turn a tensor from the decibel scale to the power/amplitude scale.
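Both conversions come down to base-10 logarithms. The sketch below is a minimal standard-library illustration on single float values, not torchaudio's implementation. It uses 10·log10 for power values, 20·log10 for amplitude values, and a small floor (amin, assumed here to be 1e-10) so that the log of zero is never taken. The helper names are hypothetical. torchaudio's amplitude_to_DB operates on tensors and also takes a multiplier, a dB reference (db_multiplier) and an optional top_db clamp.

```python
import math

def power_to_db(power: float, amin: float = 1e-10) -> float:
    # 10 * log10 for power quantities; clamp at amin so log10(0) never happens
    return 10.0 * math.log10(max(power, amin))

def amplitude_to_db(amplitude: float, amin: float = 1e-10) -> float:
    # 20 * log10 for amplitude quantities (power is amplitude squared)
    return 20.0 * math.log10(max(amplitude, amin))

def db_to_power(db: float) -> float:
    # Inverse of power_to_db for inputs above the amin floor
    return 10.0 ** (db / 10.0)

def db_to_amplitude(db: float) -> float:
    # Inverse of amplitude_to_db for inputs above the amin floor
    return 10.0 ** (db / 20.0)

print(power_to_db(100.0))     # 20.0
print(amplitude_to_db(10.0))  # 20.0
print(db_to_power(20.0))      # 100.0
```

The same 20 dB corresponds to a factor of 100 in power but only a factor of 10 in amplitude, which is why the multiplier has to match the kind of spectrogram being converted.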

melscale_fbanks

Create a frequency bin conversion matrix (a mel filterbank) that maps linear-frequency bins to mel bins.

linear_fbanks

Creates a linear triangular filterbank.
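Both filterbank builders produce triangular filters; they differ in where the filter edges are placed. The following is a rough sketch under stated assumptions: it uses the HTK mel formula (one of the mel scales torchaudio offers) to place edges, and a hypothetical triangular_fbanks helper that returns a (frequency bins × filters) list of weights, where each filter rises linearly from one edge to the next and falls back to zero at the third. Spacing the edges evenly in mel gives a mel filterbank; spacing them evenly in Hz gives a linear one.

```python
import math

def hz_to_mel(f: float) -> float:
    # HTK-style mel scale
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m: float) -> float:
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_fbanks(bin_freqs, edges):
    """Rows are frequency bins, columns are filters.

    Filter i rises from edges[i] to a peak of 1 at edges[i + 1]
    and falls back to 0 at edges[i + 2]. Edges must be strictly increasing.
    """
    n_filters = len(edges) - 2
    fb = []
    for f in bin_freqs:
        row = []
        for i in range(n_filters):
            lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
            up = (f - lo) / (mid - lo)      # rising slope
            down = (hi - f) / (hi - mid)    # falling slope
            row.append(max(0.0, min(up, down)))
        fb.append(row)
    return fb

# Example: 4 mel filters between 0 Hz and 8 kHz need 6 edges spaced evenly in mel
lo_mel, hi_mel = hz_to_mel(0.0), hz_to_mel(8000.0)
edges = [mel_to_hz(lo_mel + i * (hi_mel - lo_mel) / 5) for i in range(6)]
bins = [i * 8000.0 / 256 for i in range(257)]  # e.g. n_fft = 512 at 16 kHz
fb = triangular_fbanks(bins, edges)             # 257 rows x 4 columns
```

Multiplying a (time × bins) power spectrogram by this matrix sums the energy under each triangle, which is how such a matrix converts frequency bins.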

create_dct

Create a DCT transformation matrix with shape (n_mels, n_mfcc), normalized according to the norm argument.
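The normalization can be made concrete with a plain-Python sketch of a DCT-II basis laid out as (n_mels, n_mfcc). It assumes the usual DCT-II definition cos(π/N·(n + 0.5)·k), a plain scaling by 2 when norm is None, and an orthonormal scaling when norm is "ortho"; treat it as an illustration rather than a copy of torchaudio's code.

```python
import math

def create_dct(n_mfcc: int, n_mels: int, norm=None):
    """DCT-II basis as a list of n_mels rows, each holding n_mfcc values."""
    if norm not in (None, "ortho"):
        raise ValueError("norm must be None or 'ortho'")
    mat = []
    for n in range(n_mels):
        row = []
        for k in range(n_mfcc):
            value = math.cos(math.pi / n_mels * (n + 0.5) * k)
            if norm is None:
                value *= 2.0
            else:
                # Unit-length basis vectors; the k = 0 column gets an extra 1/sqrt(2)
                value *= math.sqrt(2.0 / n_mels)
                if k == 0:
                    value /= math.sqrt(2.0)
            row.append(value)
        mat.append(row)
    return mat
```

With norm="ortho" the columns are orthonormal, so multiplying log-mel frames by the matrix is an energy-preserving change of basis truncated to n_mfcc coefficients.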

mask_along_axis

Apply a mask along an axis, using the same mask interval for every example.

mask_along_axis_iid

Apply a mask along an axis, sampling the mask interval independently for each example.
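The difference between the two masking functions is only in how often the interval is drawn. The sketch below assumes the sampling scheme described in the torchaudio docstrings as we understand them: a width v drawn from U(0, mask_param) and a start v0 drawn from U(0, size - v), masking indices [v0, v0 + v). The helpers, the list-of-lists batch layout and the mask value of 0.0 are hypothetical choices for illustration.

```python
import random

def sample_interval(size, mask_param, rng):
    # Width v ~ U(0, mask_param), start v0 ~ U(0, size - v); mask covers [v0, v0 + v)
    assert 0 <= mask_param <= size
    width = rng.random() * mask_param
    start = rng.random() * (size - width)
    return int(start), int(start + width)

def apply_interval(frames, interval, mask_value=0.0):
    start, end = interval
    return [mask_value if start <= i < end else v for i, v in enumerate(frames)]

def mask_shared(batch, mask_param, rng):
    # One interval drawn once and reused for every example (mask_along_axis style)
    interval = sample_interval(len(batch[0]), mask_param, rng)
    return [apply_interval(x, interval) for x in batch]

def mask_iid(batch, mask_param, rng):
    # A fresh interval drawn per example (mask_along_axis_iid style)
    return [apply_interval(x, sample_interval(len(x), mask_param, rng)) for x in batch]
```

Passing a seeded random.Random makes the masks reproducible, which is convenient when debugging an augmentation pipeline.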

mu_law_encoding

Encode signal based on mu-law companding.

mu_law_decoding

Decode mu-law encoded signal.
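Mu-law companding compresses the amplitude logarithmically before uniform quantization, so quiet samples get finer resolution. This standard-library sketch follows the textbook mu-law formula for a single sample, assuming inputs in [-1, 1] and mu = quantization_channels - 1; the function names are hypothetical and it is not torchaudio's tensor implementation.

```python
import math

def mu_law_encode(x: float, quantization_channels: int = 256) -> int:
    """Map a sample in [-1, 1] to an integer in [0, quantization_channels - 1]."""
    mu = quantization_channels - 1
    # Compand: sign(x) * ln(1 + mu|x|) / ln(1 + mu), still in [-1, 1]
    companded = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift to [0, mu] and round to the nearest integer level
    return int((companded + 1) / 2 * mu + 0.5)

def mu_law_decode(q: int, quantization_channels: int = 256) -> float:
    """Invert mu_law_encode, up to quantization error."""
    mu = quantization_channels - 1
    companded = q / mu * 2 - 1.0
    return math.copysign((math.exp(abs(companded) * math.log1p(mu)) - 1.0) / mu, companded)

print(mu_law_encode(0.0))  # 128
print(mu_law_encode(1.0))  # 255
```

The round trip is lossy: decoding returns the center of the quantization level, so the error grows with amplitude while staying small relative to it.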

apply_codec

DEPRECATED: Apply codecs as a form of augmentation.

resample

Resamples the waveform to the new frequency using bandlimited interpolation.

loudness

Measure audio loudness according to the ITU-R BS.1770-4 recommendation.

convolve

Convolves inputs along their last dimension using the direct method.

fftconvolve

Convolves inputs along their last dimension using FFT.
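The direct and FFT-based methods compute the same full linear convolution, of length len(x) + len(y) - 1. The sketch below checks that equivalence in plain Python: a direct double sum against a product of zero-padded discrete Fourier transforms (the convolution theorem). It uses a naive O(n²) DFT built on cmath for clarity; torchaudio works on tensors and uses a fast FFT, so this is only an illustration of the idea.

```python
import cmath

def convolve_direct(x, y):
    """Full linear convolution by the direct sum: out[n] = sum_k x[k] * y[n - k]."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out

def _dft(a, sign):
    # Naive O(n^2) discrete Fourier transform; sign=-1 forward, +1 inverse (unscaled)
    n = len(a)
    return [sum(a[k] * cmath.exp(sign * 2j * cmath.pi * k * f / n) for k in range(n))
            for f in range(n)]

def convolve_via_dft(x, y):
    """Same result via the convolution theorem; zero-padding prevents circular wrap-around."""
    n = len(x) + len(y) - 1
    X = _dft(list(x) + [0.0] * (n - len(x)), -1)
    Y = _dft(list(y) + [0.0] * (n - len(y)), -1)
    prod = [a * b for a, b in zip(X, Y)]
    return [v.real / n for v in _dft(prod, +1)]

print(convolve_direct([1.0, 2.0, 3.0], [0.0, 1.0, 0.5]))  # [0.0, 1.0, 2.5, 4.0, 1.5]
```

With a real FFT the transform route costs O(n log n), which is why it wins for long kernels such as room impulse responses, while the direct sum is cheaper for short ones.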

add_noise

Scales and adds noise to waveform per signal-to-noise ratio.
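The scaling follows from the definition of SNR in decibels, 10·log10(E_signal / E_noise). The sketch below picks the noise gain that makes this ratio hit the target exactly, assuming equal-length lists of samples and signal energy measured as a sum of squares; it is a hypothetical single-signal illustration, not torchaudio's batched tensor code.

```python
import math

def add_noise(signal, noise, snr_db):
    # Energy of each input (sum of squares)
    e_signal = sum(s * s for s in signal)
    e_noise = sum(n * n for n in noise)
    # Choose scale so that 10 * log10(e_signal / (scale**2 * e_noise)) == snr_db
    scale = math.sqrt(e_signal / (e_noise * 10.0 ** (snr_db / 10.0)))
    return [s + scale * n for s, n in zip(signal, noise)]
```

Because the scale depends on the ratio of energies, the absolute loudness of the noise clip does not matter; only its shape is kept.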

preemphasis

Pre-emphasizes a waveform along its last dimension, i.e., for each signal \(x\) in waveform, computes output \(y\) as \(y[i] = x[i] - \text{coeff} \cdot x[i-1]\).

deemphasis

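De-emphasizes a waveform along its last dimension, i.e., for each signal \(x\) in waveform, computes output \(y\) as \(y[i] = x[i] + \text{coeff} \cdot y[i-1]\).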
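Pre-emphasis is a first-order FIR difference and de-emphasis is the matching first-order IIR recursion, so applying one after the other restores the input. The sketch below writes both recurrences out for a list of samples, treating the sample before the start as 0. The default coeff of 0.97 is this sketch's choice, and the code is an illustration rather than torchaudio's implementation.

```python
def preemphasis(x, coeff=0.97):
    # y[i] = x[i] - coeff * x[i - 1], treating x[-1] as 0
    return [xi - coeff * (x[i - 1] if i > 0 else 0.0) for i, xi in enumerate(x)]

def deemphasis(y, coeff=0.97):
    # x[i] = y[i] + coeff * x[i - 1]: the recursive inverse of preemphasis
    out, prev = [], 0.0
    for yi in y:
        prev = yi + coeff * prev
        out.append(prev)
    return out

print(preemphasis([1.0, 1.0, 1.0], coeff=0.5))  # [1.0, 0.5, 0.5]
```

A constant (DC) input is attenuated by pre-emphasis while rapid changes pass through, which is the intended high-frequency boost.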

speed

Adjusts waveform speed.

frechet_distance

Computes the Fréchet distance between two multivariate normal distributions [Dowson and Landau, 1982].
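As we read the torchaudio docstring, for \(X \sim \mathcal{N}(\mu_X, \Sigma_X)\) and \(Y \sim \mathcal{N}(\mu_Y, \Sigma_Y)\) the returned quantity is the first line below. The one-dimensional reduction that follows is our own worked step, with \(\Sigma_X = \sigma_X^2\) and \(\Sigma_Y = \sigma_Y^2\).

```latex
F(X, Y) = \lVert \mu_X - \mu_Y \rVert_2^2
        + \operatorname{Tr}\!\left(\Sigma_X + \Sigma_Y - 2\,(\Sigma_X \Sigma_Y)^{1/2}\right)

% One-dimensional case: (\sigma_X^2 \sigma_Y^2)^{1/2} = \sigma_X \sigma_Y
F = (\mu_X - \mu_Y)^2 + \sigma_X^2 + \sigma_Y^2 - 2\,\sigma_X \sigma_Y
  = (\mu_X - \mu_Y)^2 + (\sigma_X - \sigma_Y)^2
```

In one dimension the expression separates into a mean term and a spread term, and it is zero only when both distributions coincide.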

Forced Alignment

forced_align

Align a CTC label sequence to an emission.

merge_tokens

Removes repeated tokens and blank tokens from the given CTC token sequence.

TokenSpan

Token with time stamps and score.
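merge_tokens collapses a frame-level CTC path, such as the one forced_align produces, into one span per emitted token. The plain-Python sketch below assumes a blank id of 0, end indices that are exclusive, and a span score equal to the mean frame score over the span; the Span dataclass is a hypothetical stand-in for TokenSpan, not the torchaudio class.

```python
from dataclasses import dataclass

@dataclass
class Span:  # hypothetical stand-in for TokenSpan
    token: int
    start: int  # first frame of the span
    end: int    # one past the last frame
    score: float

def merge_tokens(tokens, scores, blank=0):
    """Collapse runs of repeated tokens and drop blanks."""
    spans = []
    start = 0
    for i in range(1, len(tokens) + 1):
        # A run ends at the end of the sequence or where the token changes
        if i == len(tokens) or tokens[i] != tokens[start]:
            if tokens[start] != blank:
                seg = scores[start:i]
                spans.append(Span(tokens[start], start, i, sum(seg) / len(seg)))
            start = i
    return spans
```

Frame indices can be turned into seconds by multiplying by the model's frame stride, which is how token time stamps are usually reported.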

Filtering

allpass_biquad

Design two-pole all-pass filter.

band_biquad

Design two-pole band filter.

bandpass_biquad

Design two-pole band-pass filter.

bandreject_biquad

Design two-pole band-reject filter.

bass_biquad

Design a bass tone-control effect.

biquad

Apply a biquad filter to the input tensor.

contrast

Apply contrast effect.

dcshift

Apply a DC shift to the audio.

deemph_biquad

Apply ISO 908 CD de-emphasis (shelving) IIR filter.

dither

Apply dither.

equalizer_biquad

Design biquad peaking equalizer filter and perform filtering.

filtfilt

Apply an IIR filter forward and backward to a waveform.

flanger

Apply a flanger effect to the audio.

gain

Apply amplification or attenuation to the whole waveform.

highpass_biquad

Design biquad highpass filter and perform filtering.

lfilter

Perform an IIR filter by evaluating the difference equation, using the differentiable implementation developed independently by Yu et al. [Yu and Fazekas, 2023] and Forgione et al. [Forgione and Piga, 2021].
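The difference equation itself is easy to write out as a sequential loop over one list of samples. This sketch mirrors the (waveform, a_coeffs, b_coeffs) argument order and the clamp of the output to [-1, 1], which as we understand it torchaudio applies by default; the clamp here acts on the final output only, so the feedback path uses unclamped values. It is a plain loop, not the differentiable implementation cited above.

```python
def lfilter(x, a_coeffs, b_coeffs, clamp=True):
    """a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]."""
    if len(a_coeffs) != len(b_coeffs) or a_coeffs[0] == 0:
        raise ValueError("coefficient lists must match in length and a_coeffs[0] must be nonzero")
    y = []
    for n in range(len(x)):
        # Feed-forward part over current and past inputs
        acc = sum(b_coeffs[k] * x[n - k] for k in range(len(b_coeffs)) if n >= k)
        # Feedback part over past outputs
        acc -= sum(a_coeffs[k] * y[n - k] for k in range(1, len(a_coeffs)) if n >= k)
        y.append(acc / a_coeffs[0])
    if clamp:
        y = [min(1.0, max(-1.0, v)) for v in y]
    return y

# One-pole smoother y[n] = x[n] + 0.5 * y[n - 1], fed an impulse
print(lfilter([1.0, 0.0, 0.0, 0.0], [1.0, -0.5], [1.0, 0.0]))  # [1.0, 0.5, 0.25, 0.125]
```

A biquad is the special case with three coefficients on each side, so the same loop also evaluates the biquad filters listed in this section.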

lowpass_biquad

Design biquad lowpass filter and perform filtering.
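Designing the filter means computing six biquad coefficients from the cutoff frequency and Q. The sketch below uses the low-pass formulas from Robert Bristow-Johnson's Audio EQ Cookbook; that torchaudio's lowpass_biquad uses exactly these formulas is our assumption, as is the default Q of 0.707. The gain_at helper is hypothetical and evaluates the magnitude response so the design can be checked.

```python
import cmath
import math

def lowpass_biquad_coeffs(sample_rate, cutoff_freq, Q=0.707):
    # Audio EQ Cookbook low-pass design
    w0 = 2 * math.pi * cutoff_freq / sample_rate
    alpha = math.sin(w0) / (2 * Q)
    cos_w0 = math.cos(w0)
    b0 = (1 - cos_w0) / 2
    b1 = 1 - cos_w0
    b2 = b0
    a0 = 1 + alpha
    a1 = -2 * cos_w0
    a2 = 1 - alpha
    return (b0, b1, b2), (a0, a1, a2)

def gain_at(b, a, sample_rate, freq):
    # |H(e^{jw})| with z^-1 = e^{-jw}
    z_inv = cmath.exp(-1j * 2 * math.pi * freq / sample_rate)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return abs(num / den)

b, a = lowpass_biquad_coeffs(16000, 1000.0)
```

For this design the gain is 1 at DC, 0 at Nyquist and exactly Q at the cutoff, so Q ≈ 0.707 places the cutoff at the familiar -3 dB point.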

overdrive

Apply an overdrive effect to the audio.

phaser

Apply a phasing effect to the audio.

riaa_biquad

Apply RIAA vinyl playback equalization.

treble_biquad

Design a treble tone-control effect.

Feature Extractions

vad

Voice Activity Detector.

spectrogram

Create a spectrogram or a batch of spectrograms from a raw audio signal.

inverse_spectrogram

Create an inverse spectrogram or a batch of inverse spectrograms from the provided complex-valued spectrogram.

griffinlim

Compute waveform from a linear scale magnitude spectrogram using the Griffin-Lim transformation.

phase_vocoder

Given an STFT tensor, speed it up in time by a factor of rate without modifying the pitch.

pitch_shift

Shift the pitch of a waveform by n_steps steps.

compute_deltas

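Compute delta coefficients of a tensor, usually a spectrogram, as \(d_t = \frac{\sum_{n=1}^{N} n \, (c_{t+n} - c_{t-n})}{2 \sum_{n=1}^{N} n^2}\), where \(c_t\) is the coefficient at time \(t\) and \(N = \lfloor (\text{win\_length} - 1) / 2 \rfloor\).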
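The delta formula is a least-squares slope over a window of 2N + 1 frames. This sketch applies it to a single coefficient track (one spectrogram row over time), assuming the edges are handled by replicating the first and last values; the function is a hypothetical illustration, not torchaudio's tensor code.

```python
def compute_deltas(coeffs, win_length=5):
    """Delta of a 1-D coefficient track (one spectrogram row over time)."""
    if win_length < 3:
        raise ValueError("win_length must be at least 3")
    n = (win_length - 1) // 2
    denom = 2 * sum(k * k for k in range(1, n + 1))
    last = len(coeffs) - 1

    def at(t):
        # Replicate padding: indices outside [0, last] reuse the edge value
        return coeffs[min(max(t, 0), last)]

    return [
        sum(k * (at(t + k) - at(t - k)) for k in range(1, n + 1)) / denom
        for t in range(len(coeffs))
    ]

print(compute_deltas([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])[3])  # 1.0
```

On a linear ramp the interior deltas equal the slope exactly, while the replicated edges pull the estimate down near the boundaries.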

detect_pitch_frequency

Detect pitch frequency.

sliding_window_cmn

Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.

spectral_centroid

Compute the spectral centroid for each channel along the time axis.
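The spectral centroid is the magnitude-weighted mean of the bin frequencies, computed once per frame. The sketch below handles a single frame of n_fft // 2 + 1 magnitude bins and assumes the bins are spaced evenly from 0 Hz to the Nyquist frequency; it is an illustration of the definition, not torchaudio's implementation.

```python
def spectral_centroid(magnitudes, sample_rate):
    """Centroid in Hz of one frame of n_fft // 2 + 1 magnitude bins."""
    n_bins = len(magnitudes)
    nyquist = sample_rate / 2
    # Bin k sits at k * nyquist / (n_bins - 1) Hz
    freqs = [k * nyquist / (n_bins - 1) for k in range(n_bins)]
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("centroid is undefined for a silent frame")
    return sum(f * m for f, m in zip(freqs, magnitudes)) / total

print(spectral_centroid([0.0, 0.0, 1.0, 0.0, 0.0], 16000))  # 4000.0
```

A higher centroid indicates more energy in the upper frequencies, which is why it is often used as a rough "brightness" feature.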

Multi-channel

psd

Compute cross-channel power spectral density (PSD) matrix.

mvdr_weights_souden

Compute the Minimum Variance Distortionless Response (MVDR [Capon, 1969]) beamforming weights by the method proposed by Souden et al. [Souden et al., 2009].

mvdr_weights_rtf

Compute the Minimum Variance Distortionless Response (MVDR [Capon, 1969]) beamforming weights based on the relative transfer function (RTF) and power spectral density (PSD) matrix of noise.

rtf_evd

Estimate the relative transfer function (RTF) or the steering vector by eigenvalue decomposition.

rtf_power

Estimate the relative transfer function (RTF) or the steering vector by the power method.

apply_beamforming

Apply the beamforming weight to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum.
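Applying the weights is a weighted sum over channels, y = wᴴx, evaluated independently for every frequency bin and frame. The sketch below shows one time-frequency bin with plain complex numbers; that torchaudio conjugates the weights in the same way is our understanding rather than something stated on this page, and the function name is hypothetical.

```python
def apply_beamforming(weights, channels):
    """Combine one time-frequency bin across channels: y = sum_c conj(w_c) * x_c."""
    if len(weights) != len(channels):
        raise ValueError("need one weight per channel")
    return sum(complex(w).conjugate() * x for w, x in zip(weights, channels))

# Two channels with equal real weights simply average the bin
print(apply_beamforming([0.5, 0.5], [1 + 1j, 1 - 1j]))  # (1+0j)
```

Running this for every (frequency, frame) pair with that frequency's weight vector yields the single-channel enhanced spectrum.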

Loss

rnnt_loss

Compute the RNN Transducer loss from Sequence Transduction with Recurrent Neural Networks [Graves, 2012].

Metric

edit_distance

Calculate the word-level edit (Levenshtein) distance between two sequences.
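The Levenshtein distance counts the minimum number of insertions, deletions and substitutions needed to turn one sequence into the other. This sketch is the classic dynamic-programming recurrence, keeping only one previous row; it is a hypothetical stand-alone illustration, and it works on any sequences, so lists of words give a word-level distance and plain strings a character-level one.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))  # distance from an empty prefix of a
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]

print(edit_distance("the cat sat".split(), "the cat sat down".split()))  # 1
```

Dividing the word-level distance by the number of reference words gives the word error rate commonly reported for speech recognition.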
