Warning
TorchAudio’s C++ API is a prototype feature. API/ABI backward compatibility is not guaranteed.
Note
The top-level namespace has been changed from torchaudio to torio. StreamReader has been renamed to StreamingMediaDecoder.
torio::io::StreamingMediaDecoder¶
StreamingMediaDecoder is the implementation used by the Python equivalent and provides a similar interface.
When working with custom I/O, such as in-memory data, the StreamingMediaDecoderCustomIO class can be used.
Both classes define the same methods, so they are used in the same way.
Constructors¶
StreamingMediaDecoder¶
-
class StreamingMediaDecoder¶
Fetch and decode audio/video streams chunk by chunk.
Subclassed by torio::io::StreamingMediaDecoderCustomIO
-
explicit torio::io::StreamingMediaDecoder::StreamingMediaDecoder(const std::string &src, const c10::optional<std::string> &format = c10::nullopt, const c10::optional<OptionDict> &option = c10::nullopt)¶
Construct media processor from source URI.
- Parameters:
src – URL of source media, in the format FFmpeg can understand.
format – Specifies format (such as mp4) or device (such as lavfi and avfoundation)
option – Custom option passed when initializing format context (opening source).
StreamingMediaDecoderCustomIO¶
-
class StreamingMediaDecoderCustomIO : private detail::CustomInput, public torio::io::StreamingMediaDecoder¶
A subclass of StreamingMediaDecoder which works with custom read function. Can be used for decoding media from memory or custom object.
-
torio::io::StreamingMediaDecoderCustomIO::StreamingMediaDecoderCustomIO(void *opaque, const c10::optional<std::string> &format, int buffer_size, int (*read_packet)(void *opaque, uint8_t *buf, int buf_size), int64_t (*seek)(void *opaque, int64_t offset, int whence) = nullptr, const c10::optional<OptionDict> &option = c10::nullopt)¶
Construct StreamingMediaDecoder with custom read and seek functions.
- Parameters:
opaque – Custom data used by the read_packet and seek functions.
format – Specify input format.
buffer_size – The size of the intermediate buffer, which FFmpeg uses to pass data to the read_packet function.
read_packet – Custom read function that is called from FFmpeg to read data from the custom source.
seek – Optional seek function that is used to seek within the custom source.
option – Custom option passed when initializing format context.
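The callback signatures above can be satisfied by plain functions that read from a memory buffer. The following is a minimal sketch, not part of torio: MemorySource, read_from_memory and seek_in_memory are hypothetical names, and kEof and kSeekSize are assumed stand-ins for FFmpeg's AVERROR_EOF and AVSEEK_SIZE (use the real macros when FFmpeg headers are available).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>   // SEEK_SET, SEEK_CUR, SEEK_END
#include <cstring>
#include <vector>

// Assumption: stand-ins for FFmpeg's AVERROR_EOF and AVSEEK_SIZE.
constexpr int kEof = -541478725;
constexpr int kSeekSize = 0x10000;

// Hypothetical in-memory source; a pointer to it is passed as `opaque`.
struct MemorySource {
  std::vector<uint8_t> data;
  int64_t pos = 0;
};

// Matches the `read_packet` parameter: copy up to buf_size bytes into buf.
int read_from_memory(void* opaque, uint8_t* buf, int buf_size) {
  assert(opaque != nullptr && buf_size >= 0);
  auto* src = static_cast<MemorySource*>(opaque);
  const int64_t remaining = static_cast<int64_t>(src->data.size()) - src->pos;
  if (remaining <= 0) {
    return kEof;  // nothing left to read
  }
  const int n = static_cast<int>(std::min<int64_t>(remaining, buf_size));
  std::memcpy(buf, src->data.data() + src->pos, static_cast<std::size_t>(n));
  src->pos += n;
  return n;
}

// Matches the `seek` parameter: move the read position, or report the size.
int64_t seek_in_memory(void* opaque, int64_t offset, int whence) {
  assert(opaque != nullptr);
  auto* src = static_cast<MemorySource*>(opaque);
  const int64_t size = static_cast<int64_t>(src->data.size());
  if (whence == kSeekSize) {
    return size;  // size query, the position is left unchanged
  }
  int64_t target = 0;
  switch (whence) {
    case SEEK_SET: target = offset; break;
    case SEEK_CUR: target = src->pos + offset; break;
    case SEEK_END: target = size + offset; break;
    default: return -1;  // unsupported mode in this sketch
  }
  if (target < 0 || target > size) {
    return -1;
  }
  src->pos = target;
  return target;
}
```

Under these assumptions, a MemorySource instance would be passed as opaque, and the two functions as read_packet and seek, together with a buffer_size chosen by the caller. The source object must outlive the decoder, since only a raw pointer is stored.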
Query Methods¶
find_best_audio_stream¶
-
int64_t torio::io::StreamingMediaDecoder::find_best_audio_stream() const¶
Find a suitable audio stream using heuristics from FFmpeg.
If successful, the index of the best stream (>=0) is returned. Otherwise a negative value is returned.
find_best_video_stream¶
-
int64_t torio::io::StreamingMediaDecoder::find_best_video_stream() const¶
Find a suitable video stream using heuristics from FFmpeg.
If successful, the index of the best stream (>=0) is returned. Otherwise a negative value is returned.
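Because both methods signal failure through a negative return value, the result should be checked before it is used as a stream index. A minimal sketch of that check follows; FakeDecoder is a hypothetical stand-in so that the snippet compiles without torio, and require_audio_stream is an illustrative helper, not a library function.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a decoder; only the two query methods are modeled.
struct FakeDecoder {
  int64_t audio;  // value returned by find_best_audio_stream()
  int64_t video;  // value returned by find_best_video_stream()
  int64_t find_best_audio_stream() const { return audio; }
  int64_t find_best_video_stream() const { return video; }
};

// Return the best audio stream index, or throw if no stream was found.
template <class Decoder>
int64_t require_audio_stream(const Decoder& decoder) {
  const int64_t index = decoder.find_best_audio_stream();
  if (index < 0) {
    throw std::runtime_error("no suitable audio stream found");
  }
  return index;
}
```

The template only relies on the method name and return type documented above, so it is expected to accept any object that exposes find_best_audio_stream().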
get_metadata¶
-
OptionDict torio::io::StreamingMediaDecoder::get_metadata() const¶
Fetch metadata of the source media.
num_src_streams¶
-
int64_t torio::io::StreamingMediaDecoder::num_src_streams() const¶
Fetch the number of source streams found in the input media.
The source streams include not only audio/video streams but also subtitle and others.
get_src_stream_info¶
-
SrcStreamInfo torio::io::StreamingMediaDecoder::get_src_stream_info(int i) const¶
Fetch information about the specified source stream.
The valid value range is [0, num_src_streams()).
num_out_streams¶
-
int64_t torio::io::StreamingMediaDecoder::num_out_streams() const¶
Fetch the number of output streams defined by client code.
get_out_stream_info¶
-
OutputStreamInfo torio::io::StreamingMediaDecoder::get_out_stream_info(int i) const¶
Fetch information about the specified output stream.
The valid value range is [0, num_out_streams()).
is_buffer_ready¶
-
bool torio::io::StreamingMediaDecoder::is_buffer_ready() const¶
Check if all the buffers of the output streams have enough decoded frames.
Configure Methods¶
add_audio_stream¶
-
void torio::io::StreamingMediaDecoder::add_audio_stream(int64_t i, int64_t frames_per_chunk, int64_t num_chunks, const c10::optional<std::string> &filter_desc = c10::nullopt, const c10::optional<std::string> &decoder = c10::nullopt, const c10::optional<OptionDict> &decoder_option = c10::nullopt)¶
Define an output audio stream.
- Parameters:
i – The index of the source stream.
frames_per_chunk – Number of frames returned as one chunk.
If a source stream is exhausted before frames_per_chunk frames are buffered, the chunk is returned as-is. Thus the number of frames in the chunk may be smaller than frames_per_chunk.
Providing -1 disables chunking, in which case the pop_chunks() method returns all the buffered frames as one chunk.
num_chunks – Internal buffer size.
When the number of buffered chunks exceeds this number, old chunks are dropped. For example, if frames_per_chunk is 5 and num_chunks is 3, then frames older than 15 are dropped.
Providing -1 disables this behavior, forcing the retention of all chunks.
filter_desc – Description of filter graph applied to the source stream.
decoder – The name of the decoder to be used. When provided, use the specified decoder instead of the default one.
decoder_option – Options passed to decoder.
To list decoder options for a decoder, you can use the ffmpeg -h decoder=<DECODER> command.
In addition to decoder-specific options, you can also pass options related to multithreading. They are effective only if the decoder supports them. If neither of them is provided, StreamingMediaDecoder defaults to single thread.
"threads": The number of threads, or the value "0" to let FFmpeg decide based on its heuristics.
"thread_type": Which multithreading method to use. The valid values are "frame" or "slice". Note that each decoder supports a different set of methods. If not provided, a default value is used.
"frame": Decode more than one frame at once. Each thread handles one frame. This will increase decoding delay by one frame per thread.
"slice": Decode more than one part of a single frame at once.
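The interaction between frames_per_chunk and num_chunks can be illustrated with a simplified model. The sketch below is not torio's implementation; ChunkBufferModel is a hypothetical class that tracks frame indices only, assuming that the oldest chunk is dropped as soon as the number of buffered chunks exceeds num_chunks.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Simplified model of the chunk buffer described above (illustration only).
// A frame is represented by its running index in the stream.
class ChunkBufferModel {
 public:
  ChunkBufferModel(int64_t frames_per_chunk, int64_t num_chunks)
      : frames_per_chunk_(frames_per_chunk), num_chunks_(num_chunks) {}

  // Append one decoded frame; drop the oldest chunk if the buffer overflows.
  void push_frame(int64_t frame_index) {
    // Start a new chunk when the current one is full. With frames_per_chunk
    // of -1 the size never matches, so everything stays in a single chunk.
    if (chunks_.empty() ||
        static_cast<int64_t>(chunks_.back().size()) == frames_per_chunk_) {
      chunks_.emplace_back();
    }
    chunks_.back().push_back(frame_index);
    // With num_chunks of -1 nothing is ever dropped.
    if (num_chunks_ >= 0 &&
        static_cast<int64_t>(chunks_.size()) > num_chunks_) {
      chunks_.pop_front();
    }
  }

  int64_t num_chunks() const { return static_cast<int64_t>(chunks_.size()); }

  int64_t num_frames() const {
    int64_t total = 0;
    for (const auto& chunk : chunks_) {
      total += static_cast<int64_t>(chunk.size());
    }
    return total;
  }

  // Index of the oldest frame still held in the buffer.
  int64_t oldest_frame() const {
    assert(!chunks_.empty() && !chunks_.front().empty());
    return chunks_.front().front();
  }

 private:
  int64_t frames_per_chunk_;
  int64_t num_chunks_;
  std::deque<std::vector<int64_t>> chunks_;
};
```

In this model, pushing 20 frames with frames_per_chunk of 5 and num_chunks of 3 leaves the 15 most recent frames in the buffer, which matches the example given for num_chunks.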
add_video_stream¶
-
void torio::io::StreamingMediaDecoder::add_video_stream(int64_t i, int64_t frames_per_chunk, int64_t num_chunks, const c10::optional<std::string> &filter_desc = c10::nullopt, const c10::optional<std::string> &decoder = c10::nullopt, const c10::optional<OptionDict> &decoder_option = c10::nullopt, const c10::optional<std::string> &hw_accel = c10::nullopt)¶
Define an output video stream.
- Parameters:
i, frames_per_chunk, num_chunks, filter_desc, decoder, decoder_option – See add_audio_stream().
hw_accel – Enable hardware acceleration.
When video is decoded on CUDA hardware (for example by specifying the "h264_cuvid" decoder), passing a CUDA device indicator to hw_accel (i.e. hw_accel="cuda:0") will make StreamingMediaDecoder place the resulting frames directly on the specified CUDA device as a CUDA tensor.
If not provided (c10::nullopt), the chunk will be moved to CPU memory.
remove_stream¶
-
void torio::io::StreamingMediaDecoder::remove_stream(int64_t i)¶
Remove an output stream.
- Parameters:
i – The index of the output stream to be removed. The valid value range is [0, num_out_streams()).
Stream Methods¶
seek¶
-
void torio::io::StreamingMediaDecoder::seek(double timestamp, int64_t mode)¶
Seek into the given time stamp.
- Parameters:
timestamp – Target time stamp in seconds.
mode – Seek mode.
0: Keyframe mode. Seek into nearest key frame before the given timestamp.
1: Any mode. Seek into any frame (including non-key frames) before the given timestamp.
2: Precise mode. First seek into the nearest key frame before the given timestamp, then decode frames until it reaches the frame closest to the given timestamp.
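Since mode is a plain integer, client code may prefer named constants. The enum below is a hypothetical convenience that mirrors the three documented values; it is not defined by torio.

```cpp
#include <cstdint>
#include <string>

// Hypothetical named constants for the integer `mode` argument of seek().
enum SeekMode : int64_t {
  kSeekKeyframe = 0,  // nearest key frame before the timestamp
  kSeekAny = 1,       // any frame before the timestamp
  kSeekPrecise = 2,   // key frame seek followed by decoding up to the target
};

// Human-readable name of a seek mode, e.g. for logging.
inline std::string seek_mode_name(int64_t mode) {
  switch (mode) {
    case kSeekKeyframe: return "keyframe";
    case kSeekAny: return "any";
    case kSeekPrecise: return "precise";
    default: return "unknown";
  }
}
```

An unscoped enum converts implicitly to int64_t, so a value such as kSeekPrecise could be passed directly as the mode argument.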
process_packet¶
-
int torio::io::StreamingMediaDecoder::process_packet()¶
Demultiplex and process one packet.
- Returns:
0: A packet was processed successfully and there are still packets left in the stream, so client code can call this method again.
1: A packet was processed successfully and it reached EOF. Client code should not call this method again.
<0: An error has happened.
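The three return codes suggest a simple driving loop: keep calling while 0 is returned, stop at 1, and treat negative values as errors. A minimal sketch follows; FakePacketSource is a hypothetical stand-in that mimics the documented codes, and drain is an illustrative helper rather than a torio function.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in that mimics the documented return codes of
// process_packet(): 0 while packets remain, 1 on the last packet (EOF).
struct FakePacketSource {
  int remaining;  // packets left before EOF
  int processed;  // number of calls made so far
  explicit FakePacketSource(int n) : remaining(n), processed(0) {}
  int process_packet() {
    ++processed;
    --remaining;
    return remaining > 0 ? 0 : 1;
  }
};

// Call process_packet() until EOF and return the number of packets processed.
// `after_packet` runs after every successful call, which is where client code
// could inspect or retrieve buffered chunks.
template <class Decoder, class AfterPacket>
int64_t drain(Decoder& decoder, AfterPacket after_packet) {
  int64_t count = 0;
  while (true) {
    const int code = decoder.process_packet();
    if (code < 0) {
      throw std::runtime_error("process_packet() reported an error");
    }
    ++count;
    after_packet(decoder);
    if (code == 1) {
      return count;  // EOF reached: do not call process_packet() again
    }
  }
}
```

With a real decoder, the callback would typically check is_buffer_ready() and call pop_chunks() so that frames are consumed as they become available.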
process_packet_block¶
-
int torio::io::StreamingMediaDecoder::process_packet_block(const double timeout, const double backoff)¶
Similar to process_packet(), but in case it fails due to resource temporarily being unavailable, it automatically retries.
This behavior is helpful when using device input, such as a microphone, during which the buffer may be busy while sample acquisition is happening.
- Parameters:
timeout – Timeout in milliseconds.
>=0: Keep retrying until the given time passes.
<0: Keep retrying forever.
backoff – Time to wait before retrying, in milliseconds.
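The timeout and backoff semantics can be sketched as a generic retry loop. This is an illustration under stated assumptions, not the library's code: kTryAgain is a placeholder for the "resource temporarily unavailable" error (FFmpeg reports it as AVERROR(EAGAIN)), and returning the last code on timeout is a choice made for this sketch, since the behavior on timeout is not specified above.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Assumption: placeholder for the "temporarily unavailable" error code.
constexpr int kTryAgain = -11;

// Retry `step()` while it reports kTryAgain.
// timeout_ms >= 0: stop retrying once that much time has passed.
// timeout_ms <  0: keep retrying forever.
// backoff_ms: time to sleep between attempts.
template <class Step>
int retry_with_backoff(Step step, double timeout_ms, double backoff_ms) {
  assert(backoff_ms >= 0);
  using Clock = std::chrono::steady_clock;
  using Millis = std::chrono::duration<double, std::milli>;
  const auto start = Clock::now();
  while (true) {
    const int code = step();
    if (code != kTryAgain) {
      return code;  // success or a non-retryable error
    }
    if (timeout_ms >= 0 &&
        Millis(Clock::now() - start).count() >= timeout_ms) {
      return code;  // timed out; this sketch simply reports the last code
    }
    std::this_thread::sleep_for(Millis(backoff_ms));
  }
}
```

A short backoff keeps latency low for device input, while a longer one reduces busy polling; the right value depends on how quickly the device refills its buffer.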
process_all_packets¶
-
void torio::io::StreamingMediaDecoder::process_all_packets()¶
Process packets until EOF.
fill_buffer¶
-
int torio::io::StreamingMediaDecoder::fill_buffer(const c10::optional<double> &timeout = c10::nullopt, const double backoff = 10.)¶
Process packets until all the chunk buffers have at least one chunk.
- Parameters:
timeout – See process_packet_block().
backoff – See process_packet_block().
Retrieval Methods¶
pop_chunks¶
-
std::vector<c10::optional<Chunk>> torio::io::StreamingMediaDecoder::pop_chunks()¶
Pop one chunk from each output stream if it is available.
Support Structures¶
Chunk¶
-
struct Chunk¶
Stores decoded frames and metadata.
Public Members
-
torch::Tensor frames¶
Audio/video frames.
For audio, the shape is [time, num_channels], and the dtype depends on output stream configurations.
For video, the shape is [time, channel, height, width], and the dtype is torch.uint8.
-
double pts¶
Presentation time stamp of the first frame, in seconds.
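Only the first frame of a chunk carries an explicit time stamp. If frames are assumed to be evenly spaced, the time of frame i can be estimated as pts + i / rate, where rate is the sample rate for audio or the frame rate for video. The helper below is a hypothetical sketch of that arithmetic; it does not depend on torch or torio.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Estimate the presentation time (in seconds) of every frame in a chunk,
// assuming evenly spaced frames at `rate` frames (or samples) per second.
// `num_frames` corresponds to the size of the first (time) dimension.
std::vector<double> frame_timestamps(double chunk_pts, int64_t num_frames,
                                     double rate) {
  assert(rate > 0);
  std::vector<double> out;
  if (num_frames <= 0) {
    return out;
  }
  out.reserve(static_cast<std::size_t>(num_frames));
  for (int64_t i = 0; i < num_frames; ++i) {
    out.push_back(chunk_pts + static_cast<double>(i) / rate);
  }
  return out;
}
```

The estimate does not hold for variable frame rate video, where the spacing between frames is not constant.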
SrcStreamInfo¶
-
struct SrcStreamInfo¶
Information about source stream found in the input media.
COMMON MEMBERS
-
AVMediaType media_type¶
The stream media type.
Please refer to the FFmpeg documentation for the available values.
- Todo:
Introduce own enum and get rid of FFmpeg dependency
-
const char *codec_name = "N/A"¶
The name of codec.
-
const char *codec_long_name = "N/A"¶
The name of codec in long, human friendly form.
-
const char *fmt_name = "N/A"¶
For audio, it is sample format.
Commonly found values are:
"u8", "u8p": 8-bit unsigned integer.
"s16", "s16p": 16-bit signed integer.
"s32", "s32p": 32-bit signed integer.
"s64", "s64p": 64-bit signed integer.
"flt", "fltp": 32-bit floating point.
"dbl", "dblp": 64-bit floating point.
For video, it is color channel format.
Commonly found values include:
"gray8": grayscale
"rgb24": RGB
"bgr24": BGR
"yuv420p": YUV420p
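The audio format names listed above follow a regular pattern: a base name that determines the sample type, and an optional trailing "p" that marks the planar variant in FFmpeg's naming. A hypothetical helper that decodes the audio names might look as follows; SampleFormatInfo and describe_sample_format are illustrative names, not part of torio.

```cpp
#include <string>

// Hypothetical summary of an audio sample format name.
struct SampleFormatInfo {
  int bits;       // bits per sample, 0 if the name is not recognized
  bool planar;    // true for the "p" variants
  bool floating;  // true for the "flt" and "dbl" families
};

// Decode one of the audio format names listed above.
SampleFormatInfo describe_sample_format(const std::string& fmt_name) {
  std::string base = fmt_name;
  bool planar = false;
  if (base.size() > 1 && base.back() == 'p') {
    planar = true;
    base.pop_back();  // "fltp" -> "flt"
  }
  if (base == "u8") return {8, planar, false};
  if (base == "s16") return {16, planar, false};
  if (base == "s32") return {32, planar, false};
  if (base == "s64") return {64, planar, false};
  if (base == "flt") return {32, planar, true};
  if (base == "dbl") return {64, planar, true};
  return {0, false, false};  // unknown name, including the default "N/A"
}
```

The helper only applies to audio streams; video names such as "yuv420p" also end in "p" but are not recognized by it and yield the unknown result.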
-
int64_t bit_rate = 0¶
Bit rate.
-
int64_t num_frames = 0¶
Number of frames.
Note
In some formats, the value is not reliable or unavailable.
-
int bits_per_sample = 0¶
Bits per sample.
-
OptionDict metadata = {}¶
Metadata.
For example, this member can hold the ID3 tags fetched from an MP3 file.
Example:
{ "title": "foo", "artist": "bar", "date": "2017" }
OutputStreamInfo¶
-
struct OutputStreamInfo¶
Information about output stream configured by user code.
AUDIO-SPECIFIC MEMBERS
-
double sample_rate = -1¶
Sample rate.
-
int num_channels = -1¶
The number of channels.
VIDEO-SPECIFIC MEMBERS
-
int width = -1¶
Width.
-
int height = -1¶
Height.
-
AVRational frame_rate = {0, 1}¶
Frame rate.
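The frame rate is stored as a rational number, so it has to be converted before it can be used in floating-point arithmetic. The sketch below uses a hypothetical Rational struct as a stand-in for AVRational, so that it compiles without FFmpeg headers; with FFmpeg available, av_q2d performs the same conversion.

```cpp
// Stand-in for FFmpeg's AVRational: a numerator / denominator pair.
struct Rational {
  int num;
  int den;
};

// Convert a rational frame rate to frames per second.
// Returns 0 when the denominator is 0 to avoid a division by zero.
double to_fps(Rational r) {
  if (r.den == 0) {
    return 0.0;
  }
  return static_cast<double>(r.num) / static_cast<double>(r.den);
}
```

The default value {0, 1} converts to 0, which can serve as a check that the frame rate has not been set.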
Public Members
-
int source_index¶
The index of the input source stream.
-
AVMediaType media_type = AVMEDIA_TYPE_UNKNOWN¶
The stream media type.
Please refer to the FFmpeg documentation for the available values.
- Todo:
Introduce own enum and get rid of FFmpeg dependency
-
int format = -1¶
Media format. AVSampleFormat for audio or AVPixelFormat for video.
-
std::string filter_description = {}¶
Filter graph definition, such as "aresample=16000,aformat=sample_fmts=fltp".