Create SoX effects chain for preprocessing audio.
Create an object for passing SoX effect information between Python and C++.
An object with two attributes: ename (str), the name of the effect, and eopts (List[str]), the list of effect options.
- Return type
SoxEffectsChain(normalization: Union[bool, float, Callable] = True, channels_first: bool = True, out_siginfo: Any = None, out_encinfo: Any = None, filetype: str = 'raw')
SoX effects chain class.
normalization (bool, number, or callable, optional) – If boolean True, the output is divided by 1 << 31 (assumes signed 32-bit audio), which normalizes it to [-1, 1]. If a number, the output is divided by that number. If a callable, the output is passed as a parameter to the given function, and the output is then divided by the result. (Default: True)
channels_first (bool, optional) – Whether the result is laid out channels first or length first. (Default: True)
out_siginfo (sox_signalinfo_t, optional) – A sox_signalinfo_t type, which can be helpful if the audio type cannot be determined automatically. (Default: None)
out_encinfo (sox_encodinginfo_t, optional) – A sox_encodinginfo_t type, which can be set if the audio type cannot be determined automatically. (Default: None)
filetype (str, optional) – A filetype or extension to be set if sox cannot determine it automatically. (Default: 'raw')
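The three normalization modes described above can be illustrated with a small pure-Python sketch. It is not the library's implementation: apply_normalization is a hypothetical helper that works on a plain list of samples, and treating False as "leave the samples unchanged" is an assumption, since the text only describes the True case.

```python
from typing import Callable, List, Union

Normalization = Union[bool, float, Callable[[List[float]], float]]


def apply_normalization(samples: List[float],
                        normalization: Normalization = True) -> List[float]:
    """Hypothetical helper mirroring the documented normalization rules."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(normalization, bool):
        if not normalization:
            return list(samples)  # assumption: False means no scaling
        divisor = float(1 << 31)  # full scale of signed 32-bit audio
    elif callable(normalization):
        # The samples are passed to the function; its result is the divisor.
        divisor = float(normalization(samples))
    else:
        divisor = float(normalization)
    return [s / divisor for s in samples]


half_scale = apply_normalization([1 << 30, -(1 << 30)], True)
by_number = apply_normalization([2.0, 4.0], 4.0)
by_peak = apply_normalization([1.0, -2.0], lambda xs: max(abs(x) for x in xs))
```

Passing a callable such as the peak function above is one way to scale each file relative to its own content instead of a fixed full-scale value.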
An output Tensor of size [C x L] or [L x C], where L is the number of audio frames and C is the number of channels, and an integer which is the sample rate of the audio (as listed in the metadata of the file).
- Return type
Tuple[torch.Tensor, int]
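To make the two output layouts concrete, the sketch below shows the same two-channel, three-frame signal in [C x L] and [L x C] form. It uses nested Python lists instead of tensors so that it runs without torchaudio; transpose_layout is a hypothetical helper, not part of the library.

```python
from typing import List


def transpose_layout(signal: List[List[float]]) -> List[List[float]]:
    """Swap between [C x L] and [L x C] for a rectangular nested list."""
    return [list(row) for row in zip(*signal)]


# Two channels, three frames: shape [C x L] = [2 x 3].
channels_first = [[0.1, 0.2, 0.3],
                  [0.4, 0.5, 0.6]]

# The same audio with one row per frame: shape [L x C] = [3 x 2].
length_first = transpose_layout(channels_first)
```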
>>> import os
>>> import torchaudio
>>> from torch.utils.data import Dataset
>>> class MyDataset(Dataset):
...     def __init__(self, audiodir_path):
...         self.data = [
...             os.path.join(audiodir_path, fn)
...             for fn in os.listdir(audiodir_path)]
...         self.E = torchaudio.sox_effects.SoxEffectsChain()
...         self.E.append_effect_to_chain("rate", ["16000"])  # resample to 16000 Hz
...         self.E.append_effect_to_chain("channels", ["1"])  # mono signal
...
...     def __getitem__(self, index):
...         fn = self.data[index]
...         self.E.set_input_file(fn)
...         x, sr = self.E.sox_build_flow_effects()
...         return x, sr
...
...     def __len__(self):
...         return len(self.data)
...
>>> ds = MyDataset(path_to_audio_files)
>>> for sig, sr in ds:
...     pass
append_effect_to_chain(ename: str, eargs: Optional[Union[List[str], str]] = None) → None
Append effect to a sox effects chain.
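The signature accepts eargs as a list of strings, a single string, or None. One plausible way to reconcile that with an effect object holding ename and eopts is sketched below. EffectSpec and make_effect are hypothetical names, and the rules used here (wrap a bare string in a list, treat None as no options, convert items to strings) are assumptions for illustration, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class EffectSpec:
    """Hypothetical stand-in for an effect: a name plus its options."""
    ename: str
    eopts: List[str] = field(default_factory=list)


def make_effect(ename: str,
                eargs: Optional[Union[List[str], str]] = None) -> EffectSpec:
    """Normalize eargs into a list of strings (assumed rules, see lead-in)."""
    if eargs is None:
        opts: List[str] = []
    elif isinstance(eargs, str):
        opts = [eargs]  # a bare string becomes a one-element list
    else:
        opts = [str(a) for a in eargs]
    return EffectSpec(ename, opts)


# A chain is modeled here as an ordered list of effects.
chain = [
    make_effect("rate", "16000"),
    make_effect("channels", ["1"]),
    make_effect("reverse"),
]
```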
set_input_file(input_file: str) → None
Set the input file for the chain.
input_file (str) – The path to the input file.
sox_build_flow_effects(out: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, int]
Build the effects chain and flow effects from the input file to the output tensor.
out (Tensor, optional) – Where the output will be written to. (Default: None)
An output Tensor of size [C x L] or [L x C] where L is the number of audio frames and C is the number of channels. An integer which is the sample rate of the audio (as listed in the metadata of the file)
- Return type
Tuple[torch.Tensor, int]