torchaudio¶
Note
Release 2.1 will revise torchaudio.info, torchaudio.load, and torchaudio.save to allow for backend selection via function parameter rather than torchaudio.set_audio_backend, with FFmpeg being the default backend. The new API can be enabled in the current release by setting the environment variable TORCHAUDIO_USE_BACKEND_DISPATCHER=1.
See Future API for details on the new API.
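As a minimal sketch, the opt-in can also be done from Python instead of the shell. It assumes the variable is read when torchaudio is first imported, which is why it is set beforehand; the import itself is left as a comment so the snippet runs even where torchaudio is not installed.

```python
import os

# Opt in to the new dispatcher-based I/O API described in the note above.
# Assumption: the variable is read at import time, so it must be set
# before torchaudio is imported anywhere in the process.
os.environ["TORCHAUDIO_USE_BACKEND_DISPATCHER"] = "1"

# import torchaudio  # import only after the variable has been set
```

Setting the variable in the shell before starting Python has the same effect and avoids any dependence on import order.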
Current API¶
I/O functionalities¶
Audio I/O functions are implemented in the torchaudio.backend module, but for ease of use, the following functions are also available on the torchaudio module. Several backends are available, and you can switch between them with set_audio_backend().
Please refer to torchaudio.backend for details, and to the Audio I/O tutorial for usage.
- torchaudio.info(filepath: str, ...)¶
Fetch metadata of an audio file. Refer to torchaudio.backend for details.
- torchaudio.load(filepath: str, ...)¶
Load an audio file into a torch.Tensor object. Refer to torchaudio.backend for details.
- torchaudio.save(filepath: str, src: torch.Tensor, sample_rate: int, ...)¶
Save a torch.Tensor object in an audio format. Refer to torchaudio.backend for details.
Backend Utilities¶
- torchaudio.list_audio_backends() -> List[str]¶
List available backends.
- Returns:
The list of available backends.
- Return type:
List[str]
Future API¶
In the next release, each of torchaudio.info, torchaudio.load, and torchaudio.save will allow selecting the backend to use via the backend parameter.
The functions will support using any of FFmpeg, SoX, and SoundFile, provided that the corresponding library is installed.
If a backend is not explicitly chosen, the functions will select one according to the order of precedence (FFmpeg, SoX, SoundFile) and library availability.
Note that only FFmpeg and SoundFile will support file-like objects.
These functions can be enabled in the current release by setting the environment variable TORCHAUDIO_USE_BACKEND_DISPATCHER=1.
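The selection rule described above can be sketched in plain Python. This is an illustration only, not torchaudio's implementation: `pick_backend`, `PRECEDENCE`, and `FILE_LIKE_CAPABLE` are hypothetical names, and the list of available backends is passed in by hand rather than detected.

```python
from typing import Iterable, Optional

# Order of precedence stated in the text above.
PRECEDENCE = ("ffmpeg", "sox", "soundfile")
# Per the text above, only these backends accept file-like objects.
FILE_LIKE_CAPABLE = {"ffmpeg", "soundfile"}


def pick_backend(
    available: Iterable[str],
    requested: Optional[str] = None,
    file_like: bool = False,
) -> str:
    """Hypothetical helper mirroring the documented selection rule."""
    available = set(available)
    if requested is not None:
        # An explicit choice must be a known, installed backend.
        if requested not in PRECEDENCE:
            raise ValueError(f"Unknown backend: {requested!r}")
        if requested not in available:
            raise RuntimeError(f"Backend {requested!r} is not available")
        if file_like and requested not in FILE_LIKE_CAPABLE:
            raise RuntimeError(f"Backend {requested!r} cannot read file-like objects")
        return requested
    # No explicit choice: take the first usable backend in precedence order.
    for name in PRECEDENCE:
        if name in available and (not file_like or name in FILE_LIKE_CAPABLE):
            return name
    raise RuntimeError("No suitable backend is available")


print(pick_backend(["soundfile", "sox"]))                  # SoX precedes SoundFile
print(pick_backend(["soundfile", "sox"], file_like=True))  # SoX is skipped for file-like input
```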
- torchaudio.info(uri: Union[BinaryIO, str, PathLike], format: Optional[str] = None, buffer_size: int = 4096, backend: Optional[str] = None) -> AudioMetaData
Get signal information of an audio file.
- Parameters:
uri (path-like object or file-like object) –
Source of audio data. The following types are accepted:
path-like: File path.
file-like: Object with a read(size: int) -> bytes method, which returns a byte string of at most size length.
Note
When the input is a file-like object, this function cannot get the correct length (num_samples) for certain formats, such as vorbis. In this case, the value of num_samples is 0.
format (str or None, optional) – If not None, interpreted as a hint that may allow the backend to override the detected format. (Default: None)
buffer_size (int, optional) – Size of buffer to use when processing file-like objects, in bytes. (Default: 4096)
backend (str or None, optional) – I/O backend to use. If None, the function selects a backend given the input and the available backends. Otherwise, must be one of ["ffmpeg", "sox", "soundfile"], with the corresponding backend available. (Default: None)
- Returns:
Metadata of the given audio.
- Return type:
AudioMetaData
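To make the file-like contract above concrete, the following sketch builds a small WAV file in memory with the standard library and shows that io.BytesIO provides the required read(size) method. The sample rate, duration, and variable names are arbitrary choices for illustration; the torchaudio call is shown only as a comment.

```python
import io
import wave

# Build a tiny 16-bit mono WAV entirely in memory: one second of silence at 8 kHz.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)       # 2 bytes per sample = 16-bit
    writer.setframerate(8000)
    writer.writeframes(b"\x00\x00" * 8000)

# io.BytesIO satisfies the file-like contract: read(size) returns at most `size` bytes.
buffer.seek(0)
chunk = buffer.read(4096)
print(len(chunk), chunk[:4])

# Rewind before handing the object to a reader.
buffer.seek(0)
# Under the new API, the object could then be passed as `uri`, for example:
# metadata = torchaudio.info(buffer, format="wav")
```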
- torchaudio.load(uri: Union[BinaryIO, str, PathLike], frame_offset: int = 0, num_frames: int = -1, normalize: bool = True, channels_first: bool = True, format: Optional[str] = None, buffer_size: int = 4096, backend: Optional[str] = None) -> Tuple[Tensor, int]
Load audio data from file.
Note
The formats this function can handle depend on backend availability. This function is tested on the following formats:
WAV
    32-bit floating-point
    32-bit signed integer
    24-bit signed integer
    16-bit signed integer
    8-bit unsigned integer
FLAC
OGG/VORBIS
SPHERE
By default (normalize=True, channels_first=True), this function returns a Tensor with float32 dtype and the shape [channel, time].
Warning
The normalize argument does not perform volume normalization. It only converts the sample type to torch.float32 from the native sample type.
When the input format is WAV with an integer type, such as 32-bit signed integer, 16-bit signed integer, 24-bit signed integer, or 8-bit unsigned integer, providing normalize=False makes this function return an integer Tensor, where the samples are expressed within the whole range of the corresponding dtype: an int32 tensor for 32-bit signed PCM, int16 for 16-bit signed PCM, and uint8 for 8-bit unsigned PCM. Since torch does not support an int24 dtype, 24-bit signed PCM is converted to int32 tensors.
The normalize argument has no effect on 32-bit floating-point WAV or on other formats, such as flac and mp3. For these formats, this function always returns a float32 Tensor.
- Parameters:
uri (path-like object or file-like object) – Source of audio data.
frame_offset (int, optional) – Number of frames to skip before starting to read data.
num_frames (int, optional) – Maximum number of frames to read. -1 reads all the remaining samples, starting from frame_offset. This function may return fewer frames if there are not enough frames in the given file.
normalize (bool, optional) –
When True, this function converts the native sample type to float32. Default: True.
If the input file is integer WAV, giving False will change the resulting Tensor type to an integer type. This argument has no effect for formats other than integer WAV.
channels_first (bool, optional) – When True, the returned Tensor has dimension [channel, time]. Otherwise, the returned Tensor's dimension is [time, channel].
format (str or None, optional) – If not None, interpreted as a hint that may allow the backend to override the detected format. (Default: None)
buffer_size (int, optional) – Size of buffer to use when processing file-like objects, in bytes. (Default: 4096)
backend (str or None, optional) – I/O backend to use. If None, the function selects a backend given the input and the available backends. Otherwise, must be one of ["ffmpeg", "sox", "soundfile"], with the corresponding backend being available. (Default: None)
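The interaction of frame_offset, num_frames, and channels_first described above can be modelled on plain Python lists. This is a toy sketch, not the library's code: `select_frames` is a hypothetical helper, and the input is assumed to be stored as [time][channel].

```python
from typing import List, Sequence


def select_frames(
    frames: Sequence[Sequence[float]],
    frame_offset: int = 0,
    num_frames: int = -1,
    channels_first: bool = True,
) -> List[List[float]]:
    """Toy model of the windowing and layout options; `frames` is [time][channel]."""
    if num_frames == -1:
        # -1 means "everything that remains after the offset".
        window = frames[frame_offset:]
    else:
        # Slicing silently yields fewer frames if the data runs out.
        window = frames[frame_offset:frame_offset + num_frames]
    if channels_first:
        # Transpose [time, channel] into [channel, time].
        return [list(channel) for channel in zip(*window)]
    return [list(frame) for frame in window]


# Four stereo frames: left channel 0..3, right channel 10..13.
stereo = [[0, 10], [1, 11], [2, 12], [3, 13]]
print(select_frames(stereo, frame_offset=1, num_frames=2))
print(select_frames(stereo, frame_offset=3, num_frames=5))
print(select_frames(stereo, frame_offset=2, channels_first=False))
```

The second call asks for five frames but only one remains after the offset, which corresponds to the "may return fewer frames" case in the num_frames description.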
- Returns:
Resulting Tensor and sample rate.
If the input file has integer WAV format and normalization is off, the Tensor has an integer type; otherwise it has float32 type. If channels_first=True, it has shape [channel, time]; otherwise [time, channel].
- Return type:
(torch.Tensor, int)
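The sample-type conversion controlled by normalize can be illustrated without torch. The exact scaling is not stated in this section, so the sketch below rests on two assumptions: integer PCM is mapped to floats by dividing by 2^(bits-1) (after removing the offset for unsigned data), and 24-bit samples are spread over the int32 range by a left shift of 8 bits. `to_float` and `int24_to_int32` are hypothetical helper names.

```python
from typing import List, Sequence


def to_float(samples: Sequence[int], bits: int, unsigned: bool = False) -> List[float]:
    """Map integer PCM samples to floats in [-1.0, 1.0) (assumed scaling)."""
    scale = float(1 << (bits - 1))
    # Unsigned PCM is centred on the midpoint of its range.
    offset = (1 << (bits - 1)) if unsigned else 0
    return [(sample - offset) / scale for sample in samples]


def int24_to_int32(samples: Sequence[int]) -> List[int]:
    """Spread 24-bit samples over the full int32 range (assumed left shift by 8)."""
    return [sample << 8 for sample in samples]


print(to_float([-32768, 0, 16384], bits=16))             # 16-bit signed PCM
print(to_float([0, 128, 192], bits=8, unsigned=True))    # 8-bit unsigned PCM
print(int24_to_int32([-8388608, 1]))                     # 24-bit extremes in int32
```

Note that this is a pure change of representation: relative amplitudes are preserved, which is why the warning above says that normalize does not perform volume normalization.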
- torchaudio.save(uri: Union[BinaryIO, str, PathLike], src: Tensor, sample_rate: int, channels_first: bool = True, format: Optional[str] = None, encoding: Optional[str] = None, bits_per_sample: Optional[int] = None, buffer_size: int = 4096, backend: Optional[str] = None)
Save audio data to file.
Note
The formats this function can handle depend on the availability of backends. This function is tested on the following formats:
WAV
    32-bit floating-point
    32-bit signed integer
    16-bit signed integer
    8-bit unsigned integer
FLAC
OGG/VORBIS
- Parameters:
uri (path-like object or file-like object) – Destination for the audio data: a path to an audio file or a file-like object.
src (torch.Tensor) – Audio data to save. Must be a 2D tensor.
sample_rate (int) – Sampling rate.
channels_first (bool, optional) – If True, the given tensor is interpreted as [channel, time], otherwise [time, channel].
format (str or None, optional) –
Override the audio format. When the uri argument is a path-like object, the audio format is inferred from the file extension. If the file extension is missing or different, you can specify the correct format with this argument.
When the uri argument is a file-like object, this argument is required.
Valid values are "wav", "ogg", and "flac".
encoding (str or None, optional) –
Changes the encoding for supported formats. This argument is effective only for supported formats, i.e. "wav" and "flac". Valid values are:
"PCM_S" (signed integer Linear PCM)
"PCM_U" (unsigned integer Linear PCM)
"PCM_F" (floating point PCM)
"ULAW" (mu-law)
"ALAW" (a-law)
bits_per_sample (int or None, optional) – Changes the bit depth for the supported formats. When format is one of "wav" and "flac", you can change the bit depth. Valid values are 8, 16, 24, 32 and 64.
buffer_size (int, optional) – Size of buffer to use when processing file-like objects, in bytes. (Default: 4096)
backend (str or None, optional) – I/O backend to use. If None, the function selects a backend given the input and the available backends. Otherwise, must be one of ["ffmpeg", "sox", "soundfile"], with the corresponding backend being available. (Default: None)
Supported formats/encodings/bit depth/compression are:
"wav"
    32-bit floating-point PCM
    32-bit signed integer PCM
    24-bit signed integer PCM
    16-bit signed integer PCM
    8-bit unsigned integer PCM
    8-bit mu-law
    8-bit a-law
    Note: Default encoding/bit depth is determined by the dtype of the input Tensor.
"flac"
    16-bit (default)
    24-bit
"ogg"
    Doesn't accept changing configuration.
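The supported combinations listed above can be written down as a small lookup, which may help when validating arguments before calling save. This is a sketch built only from the list above: `WAV_COMBINATIONS` and `is_supported` are hypothetical names, omitted arguments are treated as "use the default", and the encoding is not checked for "flac".

```python
from typing import Optional

# (encoding, bits_per_sample) pairs listed above for "wav".
WAV_COMBINATIONS = {
    ("PCM_F", 32),
    ("PCM_S", 32),
    ("PCM_S", 24),
    ("PCM_S", 16),
    ("PCM_U", 8),
    ("ULAW", 8),
    ("ALAW", 8),
}


def is_supported(
    format: str,
    encoding: Optional[str] = None,
    bits_per_sample: Optional[int] = None,
) -> bool:
    """Hypothetical check of a format/encoding/bit-depth request against the list above."""
    if format == "wav":
        # None matches anything: the default is derived from the tensor dtype.
        return any(
            encoding in (None, enc) and bits_per_sample in (None, bits)
            for enc, bits in WAV_COMBINATIONS
        )
    if format == "flac":
        # Only the bit depth is checked here; 16-bit is the default.
        return bits_per_sample in (None, 16, 24)
    if format == "ogg":
        # "ogg" does not accept changing configuration.
        return encoding is None and bits_per_sample is None
    return False


print(is_supported("wav", "PCM_S", 24))
print(is_supported("wav", "PCM_U", 16))
print(is_supported("flac", bits_per_sample=32))
```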