torchaudio.load
- torchaudio.load(uri: Union[BinaryIO, str, PathLike], frame_offset: int = 0, num_frames: int = -1, normalize: bool = True, channels_first: bool = True, format: Optional[str] = None, buffer_size: int = 4096, backend: Optional[str] = None) → Tuple[Tensor, int]
Load audio data from source.
By default (normalize=True, channels_first=True), this function returns a Tensor with float32 dtype and shape [channel, time].
Note
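The formats this function can handle depend on the availability of backends. Use the following functions to fetch the supported formats:
- FFmpeg: torchaudio.utils.ffmpeg_utils.get_audio_decoders()
- Sox: torchaudio.utils.sox_utils.list_read_formats()
- SoundFile: Refer to the official documentation.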
Warning
The normalize argument does not perform volume normalization. It only converts the sample type from the native sample type to torch.float32.
When the input format is WAV with an integer sample type, such as 32-bit signed integer, 24-bit signed integer, 16-bit signed integer, or 8-bit unsigned integer, providing normalize=False makes this function return an integer Tensor whose samples span the whole range of the corresponding dtype: an int32 tensor for 32-bit signed PCM, int16 for 16-bit signed PCM, and uint8 for 8-bit unsigned PCM. Since torch does not support an int24 dtype, 24-bit signed PCM samples are converted to int32 tensors.
The normalize argument has no effect on 32-bit floating-point WAV or on other formats, such as flac and mp3. For these formats, this function always returns a float32 Tensor.
- Parameters:
uri (path-like object or file-like object) – Source of audio data.
frame_offset (int, optional) – Number of frames to skip before starting to read data.
num_frames (int, optional) – Maximum number of frames to read. -1 reads all the remaining samples, starting from frame_offset. This function may return fewer frames if there are not enough frames in the given file.
normalize (bool, optional) –
When True, this function converts the native sample type to float32. Default: True.
If the input file is integer WAV, giving False changes the resulting Tensor type to an integer type. This argument has no effect for formats other than integer WAV.
channels_first (bool, optional) – When True, the returned Tensor has dimension [channel, time]. Otherwise, the returned Tensor’s dimension is [time, channel].
format (str or None, optional) – If not None, interpreted as a hint that may allow the backend to override the detected format. (Default: None)
buffer_size (int, optional) – Size of buffer to use when processing file-like objects, in bytes. (Default: 4096)
backend (str or None, optional) –
I/O backend to use. If None, the function selects a backend based on the input and the available backends. Otherwise, it must be one of ["ffmpeg", "sox", "soundfile"], and the corresponding backend must be available. (Default: None)
- Returns:
- Resulting Tensor and sample rate.
If the input file has integer WAV format and normalization is off, the Tensor has an integer type; otherwise it has float32 type. If channels_first=True, its shape is [channel, time]; otherwise it is [time, channel].
- Return type:
(torch.Tensor, int)
- Tutorials using load:
  - StreamWriter Advanced Usage
  - Speech Recognition with Wav2Vec2
  - AudioEffector Usages
  - ASR Inference with CUDA CTC Decoder
  - StreamWriter Basic Usage
  - Torchaudio-Squim: Non-intrusive Speech Assessment in TorchAudio
  - Music Source Separation with Hybrid Demucs
  - Audio I/O
  - Speech Enhancement with MVDR Beamforming
  - CTC forced alignment API tutorial
  - Audio Feature Extractions
  - Audio Data Augmentation
  - ASR Inference with CTC Decoder
  - Forced alignment for multilingual data
  - Forced Alignment with Wav2Vec2