Warning

TorchAudio’s C++ API is a prototype feature. API/ABI backward compatibility is not guaranteed.

Note

The top-level namespace has been changed from torchaudio to torio. StreamReader has been renamed to StreamingMediaDecoder.

torio::io::StreamingMediaDecoder

StreamingMediaDecoder is the implementation used by the Python equivalent and provides a similar interface. When working with custom I/O, such as in-memory data, the StreamingMediaDecoderCustomIO class can be used.

Both classes define the same methods, so they are used in the same way.

Constructors

StreamingMediaDecoder

class StreamingMediaDecoder

Fetch and decode audio/video streams chunk by chunk.

Subclassed by torio::io::StreamingMediaDecoderCustomIO

explicit torio::io::StreamingMediaDecoder::StreamingMediaDecoder(const std::string &src, const c10::optional<std::string> &format = c10::nullopt, const c10::optional<OptionDict> &option = c10::nullopt)

Construct a media processor from a source URI.

Parameters:

src – URL of the source media, in a format FFmpeg can understand.
format – Specifies the format (such as mp4) or device (such as lavfi and avfoundation).
option – Custom options passed when initializing the format context (opening the source).

StreamingMediaDecoderCustomIO

class StreamingMediaDecoderCustomIO : private detail::CustomInput, public torio::io::StreamingMediaDecoder

A subclass of StreamingMediaDecoder that works with a custom read function. It can be used for decoding media from memory or from a custom object.

torio::io::StreamingMediaDecoderCustomIO::StreamingMediaDecoderCustomIO(void *opaque, const c10::optional<std::string> &format, int buffer_size, int (*read_packet)(void *opaque, uint8_t *buf, int buf_size), int64_t (*seek)(void *opaque, int64_t offset, int whence) = nullptr, const c10::optional<OptionDict> &option = c10::nullopt)

Construct StreamingMediaDecoder with custom read and seek functions.

Parameters:

opaque – Custom data used by read_packet and seek functions.
format – Specify input format.
buffer_size – The size of the intermediate buffer, which FFmpeg uses to pass data to function read_packet.
read_packet – Custom read function that FFmpeg calls to read data from the source.
seek – Optional seek function that is used to seek within the source.
option – Custom option passed when initializing format context.
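A minimal sketch of read_packet and seek callbacks for decoding from an in-memory byte buffer. MemoryReader, memory_read_packet and memory_seek are hypothetical names. The constants mirror FFmpeg's AVSEEK_SIZE, AVSEEK_FORCE and AVERROR_EOF so that the sketch compiles without FFmpeg headers; real code should include <libavformat/avio.h> and <libavutil/error.h> and use the FFmpeg macros directly.

```cpp
#include <algorithm>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstdio>   // SEEK_SET, SEEK_CUR, SEEK_END
#include <cstring>
#include <vector>

// Stand-ins mirroring FFmpeg's constants so the sketch builds without FFmpeg headers.
constexpr int kAvseekSize = 0x10000;   // AVSEEK_SIZE in libavformat/avio.h
constexpr int kAvseekForce = 0x20000;  // AVSEEK_FORCE in libavformat/avio.h
constexpr int kAverrorEof =            // AVERROR_EOF, i.e. FFERRTAG('E','O','F',' ')
    -static_cast<int>('E' | ('O' << 8) | ('F' << 16) | (' ' << 24));

// Hypothetical in-memory source, handed to FFmpeg as the `opaque` pointer.
struct MemoryReader {
  std::vector<uint8_t> data;
  int64_t pos = 0;
};

// Signature matches read_packet: copy up to buf_size bytes, return the count.
int memory_read_packet(void* opaque, uint8_t* buf, int buf_size) {
  auto* r = static_cast<MemoryReader*>(opaque);
  const int64_t remaining = static_cast<int64_t>(r->data.size()) - r->pos;
  if (remaining <= 0) {
    return kAverrorEof;  // no more data
  }
  const int n = static_cast<int>(std::min<int64_t>(remaining, buf_size));
  std::memcpy(buf, r->data.data() + r->pos, static_cast<std::size_t>(n));
  r->pos += n;
  return n;
}

// Signature matches seek: move the read position, or report the total size.
int64_t memory_seek(void* opaque, int64_t offset, int whence) {
  auto* r = static_cast<MemoryReader*>(opaque);
  const int64_t size = static_cast<int64_t>(r->data.size());
  whence &= ~kAvseekForce;  // the "force" hint is irrelevant for memory
  if (whence == kAvseekSize) {
    return size;  // FFmpeg is asking for the stream size; position unchanged
  }
  int64_t target = 0;
  switch (whence) {
    case SEEK_SET: target = offset; break;
    case SEEK_CUR: target = r->pos + offset; break;
    case SEEK_END: target = size + offset; break;
    default: return -EINVAL;
  }
  if (target < 0 || target > size) {
    return -EINVAL;  // equals AVERROR(EINVAL) on POSIX systems
  }
  r->pos = target;
  return target;
}
```

To use it, pass a pointer to a MemoryReader as opaque and the two functions as read_packet and seek. The MemoryReader must stay alive for as long as the decoder is used, because FFmpeg keeps calling the callbacks while decoding.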

Query Methods

find_best_audio_stream

int64_t torio::io::StreamingMediaDecoder::find_best_audio_stream() const

Find a suitable audio stream using heuristics from FFmpeg.

If successful, the index of the best stream (>=0) is returned. Otherwise a negative value is returned.

find_best_video_stream

int64_t torio::io::StreamingMediaDecoder::find_best_video_stream() const

Find a suitable video stream using heuristics from FFmpeg.

If successful, the index of the best stream (>=0) is returned. Otherwise a negative value is returned.

get_metadata

OptionDict torio::io::StreamingMediaDecoder::get_metadata() const: Fetch metadata of the source media.

num_src_streams

int64_t torio::io::StreamingMediaDecoder::num_src_streams() const

Fetch the number of source streams found in the input media.

The source streams include not only audio/video streams but also subtitle and other streams.

get_src_stream_info

SrcStreamInfo torio::io::StreamingMediaDecoder::get_src_stream_info(int i) const

Fetch information about the specified source stream.

The valid value range is [0, num_src_streams()).

num_out_streams

int64_t torio::io::StreamingMediaDecoder::num_out_streams() const: Fetch the number of output streams defined by client code.

get_out_stream_info

OutputStreamInfo torio::io::StreamingMediaDecoder::get_out_stream_info(int i) const

Fetch information about the specified output stream.

The valid value range is [0, num_out_streams()).

is_buffer_ready

bool torio::io::StreamingMediaDecoder::is_buffer_ready() const: Check if all the buffers of the output streams have enough decoded frames.

Configure Methods

add_audio_stream

void torio::io::StreamingMediaDecoder::add_audio_stream(int64_t i, int64_t frames_per_chunk, int64_t num_chunks, const c10::optional<std::string> &filter_desc = c10::nullopt, const c10::optional<std::string> &decoder = c10::nullopt, const c10::optional<OptionDict> &decoder_option = c10::nullopt)

Define an output audio stream.

Parameters:

i – The index of the source stream.
frames_per_chunk – Number of frames returned as one chunk.
If a source stream is exhausted before frames_per_chunk frames are buffered, the chunk is returned as-is. Thus the number of frames in the chunk may be smaller than frames_per_chunk.

Providing -1 disables chunking, in which case pop_chunks() returns all the buffered frames as one chunk.
num_chunks – Internal buffer size.
When the number of buffered chunks exceeds this number, old chunks are dropped. For example, if frames_per_chunk is 5 and num_chunks is 3, then frames older than the most recent 15 are dropped.

Providing -1 disables this behavior, forcing the retention of all chunks.
filter_desc – Description of filter graph applied to the source stream.
decoder – The name of the decoder to be used. When provided, use the specified decoder instead of the default one.
decoder_option – Options passed to decoder.
To list the options for a decoder, you can use the ffmpeg -h decoder=<DECODER> command.

In addition to decoder-specific options, you can also pass options related to multithreading. They are effective only if the decoder supports them. If neither of them is provided, StreamingMediaDecoder defaults to a single thread.
- "threads": The number of threads, or the value "0" to let FFmpeg decide based on its heuristics.
- "thread_type": Which multithreading method to use. The valid values are "frame" or "slice". Note that each decoder supports a different set of methods. If not provided, a default value is used.
  - "frame": Decode more than one frame at once. Each thread handles one frame. This increases decoding delay by one frame per thread.
  - "slice": Decode more than one part of a single frame at once.
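The interaction between frames_per_chunk and num_chunks can be illustrated with a simplified model. ChunkBufferModel is hypothetical and only mirrors the rules described above (at most frames_per_chunk * num_chunks frames are kept, -1 disables a limit, a short final chunk is returned as-is); it is not how torio stores decoded frames.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Simplified, hypothetical model of the chunking rules for frames_per_chunk and
// num_chunks. Frames are plain integers here; torio stores decoded tensors.
class ChunkBufferModel {
 public:
  ChunkBufferModel(int64_t frames_per_chunk, int64_t num_chunks)
      : frames_per_chunk_(frames_per_chunk), num_chunks_(num_chunks) {}

  void push(int frame) {
    frames_.push_back(frame);
    // Keep at most frames_per_chunk * num_chunks frames; -1 disables the cap.
    if (frames_per_chunk_ > 0 && num_chunks_ > 0) {
      const auto cap = static_cast<std::size_t>(frames_per_chunk_ * num_chunks_);
      while (frames_.size() > cap) {
        frames_.pop_front();  // the oldest frames are dropped first
      }
    }
  }

  // Returns up to frames_per_chunk of the oldest frames, or all buffered frames
  // when chunking is disabled (-1). A short chunk is returned as-is.
  std::vector<int> pop_chunk() {
    std::size_t n = frames_.size();
    if (frames_per_chunk_ > 0) {
      n = std::min(n, static_cast<std::size_t>(frames_per_chunk_));
    }
    const auto end = frames_.begin() + static_cast<std::ptrdiff_t>(n);
    std::vector<int> chunk(frames_.begin(), end);
    frames_.erase(frames_.begin(), end);
    return chunk;
  }

 private:
  int64_t frames_per_chunk_;
  int64_t num_chunks_;
  std::deque<int> frames_;
};
```

With frames_per_chunk = 5 and num_chunks = 3, pushing frames 0 to 19 into this model leaves frames 5 to 19, so the first chunk popped holds frames 5 to 9.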

add_video_stream

void torio::io::StreamingMediaDecoder::add_video_stream(int64_t i, int64_t frames_per_chunk, int64_t num_chunks, const c10::optional<std::string> &filter_desc = c10::nullopt, const c10::optional<std::string> &decoder = c10::nullopt, const c10::optional<OptionDict> &decoder_option = c10::nullopt, const c10::optional<std::string> &hw_accel = c10::nullopt)

Define an output video stream.

Parameters:

i, frames_per_chunk, num_chunks, filter_desc, decoder, decoder_option – See add_audio_stream().
hw_accel – Enable hardware acceleration.
When video is decoded on CUDA hardware (for example, by specifying the "h264_cuvid" decoder), passing a CUDA device indicator to hw_accel (i.e. hw_accel="cuda:0") makes StreamingMediaDecoder place the resulting frames directly on the specified CUDA device as a CUDA tensor.

If not provided (c10::nullopt), the chunk is moved to CPU memory.

remove_stream

void torio::io::StreamingMediaDecoder::remove_stream(int64_t i)

Remove an output stream.

Parameters:

i – The index of the output stream to be removed. The valid value range is [0, num_out_streams()).

Stream Methods

seek

void torio::io::StreamingMediaDecoder::seek(double timestamp, int64_t mode)

Seek to the given timestamp.

Parameters:

timestamp – Target timestamp in seconds.
mode – Seek mode.
- 0: Keyframe mode. Seek to the nearest key frame before the given timestamp.
- 1: Any mode. Seek to any frame (including non-key frames) before the given timestamp.
- 2: Precise mode. First seek to the nearest key frame before the given timestamp, then decode frames until reaching the frame closest to the given timestamp.
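A conceptual model of where each seek mode starts output, over a toy list of frame timestamps. ToyFrame and seek_target are hypothetical; the model assumes the frames are sorted by pts and the first frame is a keyframe, and it does not reproduce FFmpeg's actual seeking, which depends on the container's index.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical toy frame: presentation timestamp (seconds) and keyframe flag.
struct ToyFrame {
  double pts;
  bool keyframe;
};

// Conceptual model: index of the first frame output after seek(timestamp, mode).
// Assumes `frames` is sorted by pts and frames[0] is a keyframe.
std::size_t seek_target(const std::vector<ToyFrame>& frames, double timestamp, int mode) {
  std::size_t last_key = 0;  // nearest keyframe at or before timestamp
  std::size_t last_any = 0;  // nearest frame of any kind at or before timestamp
  for (std::size_t i = 0; i < frames.size() && frames[i].pts <= timestamp; ++i) {
    last_any = i;
    if (frames[i].keyframe) last_key = i;
  }
  if (mode == 0) return last_key;  // keyframe mode
  if (mode == 1) return last_any;  // any mode
  // Precise mode: start at the keyframe, decode forward, keep the closest frame.
  std::size_t best = last_key;
  for (std::size_t i = last_key; i < frames.size(); ++i) {
    if (std::fabs(frames[i].pts - timestamp) < std::fabs(frames[best].pts - timestamp)) {
      best = i;
    }
  }
  return best;
}
```

With keyframes at 0.0 s and 1.0 s and a frame every 0.5 s, seeking to 1.9 s lands on 1.0 s in keyframe mode, 1.5 s in any mode and 2.0 s in precise mode under this model.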

process_packet

int torio::io::StreamingMediaDecoder::process_packet()

Demultiplex and process one packet.

Returns:

0: A packet was processed successfully and there are still packets left in the stream, so client code can call this method again.
1: A packet was processed successfully and it reached EOF. Client code should not call this method again.
<0: An error has happened.
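A sketch of a loop that handles the return codes above. It is written as a template over any type with an int process_packet() member so that it can be exercised with a fake decoder; with torio, Decoder would be torio::io::StreamingMediaDecoder. process_until_eof is a hypothetical helper, and throwing on a negative code is only one possible policy.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: calls process_packet() until EOF, following the documented
// return codes (0 = more packets, 1 = EOF, <0 = error). Returns the number of
// packets processed. `Decoder` can be torio::io::StreamingMediaDecoder or a fake.
template <typename Decoder>
int process_until_eof(Decoder& decoder) {
  int packets = 0;
  while (true) {
    const int ret = decoder.process_packet();
    if (ret < 0) {
      throw std::runtime_error("process_packet() failed with code " + std::to_string(ret));
    }
    ++packets;
    if (ret == 1) {
      return packets;  // EOF: process_packet() must not be called again
    }
  }
}
```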

process_packet_block

int torio::io::StreamingMediaDecoder::process_packet_block(const double timeout, const double backoff)

Similar to process_packet(), but if it fails because a resource is temporarily unavailable, it automatically retries.

This behavior is helpful when using device input, such as a microphone, where the buffer may be busy while samples are being acquired.

Parameters:

timeout – Timeout in milliseconds.
- >=0: Keep retrying until the given time passes.
- <0: Keep retrying forever.
backoff – Time to wait before retrying, in milliseconds.
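A sketch of the retry policy described by timeout and backoff. It is not torio's implementation: retry_with_backoff is a hypothetical helper, attempt stands in for process_packet(), and -EAGAIN (which equals FFmpeg's AVERROR(EAGAIN) on POSIX systems) is assumed to be the "temporarily unavailable" code.

```cpp
#include <cerrno>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical sketch of the timeout/backoff retry policy. `attempt` stands in for
// process_packet(); -EAGAIN is treated as "resource temporarily unavailable".
int retry_with_backoff(const std::function<int()>& attempt, double timeout_ms, double backoff_ms) {
  using Clock = std::chrono::steady_clock;
  using Millis = std::chrono::duration<double, std::milli>;
  const auto deadline = Clock::now() + Millis(timeout_ms);
  while (true) {
    const int ret = attempt();
    if (ret != -EAGAIN) {
      return ret;  // success, EOF, or an error that retrying will not fix
    }
    // A negative timeout means "retry forever"; otherwise stop at the deadline.
    if (timeout_ms >= 0 && Clock::now() >= deadline) {
      return ret;
    }
    std::this_thread::sleep_for(Millis(backoff_ms));  // wait before the next attempt
  }
}
```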

process_all_packets

void torio::io::StreamingMediaDecoder::process_all_packets(): Process packets until EOF.

fill_buffer

int torio::io::StreamingMediaDecoder::fill_buffer(const c10::optional<double> &timeout = c10::nullopt, const double backoff = 10.)

Process packets until all the chunk buffers have at least one chunk.

Parameters:

timeout – See process_packet_block()
backoff – See process_packet_block()

Retrieval Methods

pop_chunks

std::vector<c10::optional<Chunk>> torio::io::StreamingMediaDecoder::pop_chunks(): Pop one chunk from each output stream if it is available.
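A sketch of a typical consume loop built from fill_buffer() and pop_chunks(). consume_all is hypothetical, and it assumes fill_buffer() returns 0 while more packets remain, a positive value at EOF and a negative value on error, mirroring process_packet(); verify this against the torio source before relying on it. The second loop drains chunks still buffered after EOF, since pop_chunks() returns at most one chunk per stream per call. The template only uses a boolean test and operator* on each element, so it should work with c10::optional as well as std::optional.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical consume loop. Assumes fill_buffer() returns 0 while more packets
// remain, a positive value at EOF and a negative value on error (as process_packet()
// does). `on_chunk(stream_index, chunk)` is called for every chunk that is present.
template <typename Decoder, typename OnChunk>
void consume_all(Decoder& decoder, OnChunk on_chunk) {
  // Hands every non-empty entry to on_chunk; returns whether any chunk was present.
  auto deliver = [&](auto chunks) {
    bool any = false;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
      if (chunks[i]) {  // an empty optional means stream i had nothing ready
        any = true;
        on_chunk(i, *chunks[i]);
      }
    }
    return any;
  };
  while (true) {
    const int ret = decoder.fill_buffer();
    if (ret < 0) {
      throw std::runtime_error("fill_buffer() failed");
    }
    if (ret > 0) {
      break;  // EOF
    }
    deliver(decoder.pop_chunks());
  }
  // pop_chunks() returns at most one chunk per stream, so drain what is left.
  while (deliver(decoder.pop_chunks())) {
  }
}
```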

Support Structures

Chunk

struct Chunk

Stores decoded frames and metadata.

Public Members

torch::Tensor frames

Audio/video frames.

For audio, the shape is [time, num_channels], and the dtype depends on output stream configurations.

For video, the shape is [time, channel, height, width], and the dtype is torch.uint8.

double pts: Presentation timestamp of the first frame, in seconds.
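Since pts refers to the first frame only, the time of a later frame in the same chunk can be estimated from the output stream's rate. The helper below is a sketch that assumes a constant rate and no gaps within the chunk; frame_time is a hypothetical name, and rate is the sample rate for audio or the frame rate for video, in frames per second.

```cpp
#include <cstdint>

// Hypothetical helper: estimated presentation time (seconds) of the k-th frame in
// a chunk, assuming a constant rate and no gaps. `rate` is the output stream's
// sample rate (audio) or frame rate (video).
double frame_time(double chunk_pts, int64_t k, double rate) {
  return chunk_pts + static_cast<double>(k) / rate;
}
```

For example, in an audio chunk with pts 1.5 from a 16000 Hz stream, the frame at index 8000 is at 1.5 + 8000 / 16000 = 2.0 seconds.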

SrcStreamInfo

struct SrcStreamInfo

Information about a source stream found in the input media.

COMMON MEMBERS

AVMediaType media_type

The stream media type.

Please refer to the FFmpeg documentation for the available values.

Todo: Introduce own enum and get rid of FFmpeg dependency

const char *codec_name = "N/A": The name of codec.

const char *codec_long_name = "N/A": The name of codec in long, human friendly form.

const char *fmt_name = "N/A"

For audio, it is sample format.

Commonly found values are:

"u8", "u8p": 8-bit unsigned integer.
"s16", "s16p": 16-bit signed integer.
"s32", "s32p": 32-bit signed integer.
"s64", "s64p": 64-bit signed integer.
"flt", "fltp": 32-bit floating point.
"dbl", "dblp": 64-bit floating point.

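In these names, the p suffix marks a planar layout (each channel stored in its own plane), while the names without it are interleaved. The sketch below decodes the names listed above into a per-sample size and a layout flag; SampleFormatInfo and describe_sample_format are hypothetical, and names outside that list yield {0, false}.

```cpp
#include <string>

// Hypothetical summary of an FFmpeg audio sample format name.
struct SampleFormatInfo {
  int bytes_per_sample;  // size of one sample of one channel; 0 if unknown
  bool planar;           // true for the "p" variants (one plane per channel)
};

SampleFormatInfo describe_sample_format(const std::string& name) {
  const bool planar = !name.empty() && name.back() == 'p';
  const std::string base = planar ? name.substr(0, name.size() - 1) : name;
  int bytes = 0;
  if (base == "u8") bytes = 1;
  else if (base == "s16") bytes = 2;
  else if (base == "s32" || base == "flt") bytes = 4;
  else if (base == "s64" || base == "dbl") bytes = 8;
  if (bytes == 0) return {0, false};  // not one of the listed audio formats
  return {bytes, planar};
}
```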
For video, it is color channel format.

Commonly found values include:

"gray8": grayscale
"rgb24": RGB
"bgr24": BGR
"yuv420p": YUV420p

int64_t bit_rate = 0: Bit rate.

int64_t num_frames = 0: Number of frames.

Note

In some formats, the value is not reliable or unavailable.

int bits_per_sample = 0: Bits per sample.

OptionDict metadata = {}

Metadata

This can hold ID3 tags fetched from MP3 files.

Example:

{
  "title": "foo",
  "artist": "bar",
  "date": "2017"
}

AUDIO-SPECIFIC MEMBERS

double sample_rate = 0: Sample rate.

int num_channels = 0: The number of channels.

VIDEO-SPECIFIC MEMBERS

int width = 0: Width.

int height = 0: Height.

double frame_rate = 0: Frame rate.

OutputStreamInfo

struct OutputStreamInfo

Information about an output stream configured by user code.

AUDIO-SPECIFIC MEMBERS

double sample_rate = -1: Sample rate.

int num_channels = -1: The number of channels.

VIDEO-SPECIFIC MEMBERS

int width = -1: Width.

int height = -1: Height.

AVRational frame_rate = {0, 1}: Frame rate.

Public Members

int source_index: The index of the input source stream.

AVMediaType media_type = AVMEDIA_TYPE_UNKNOWN

The stream media type.

Please refer to the FFmpeg documentation for the available values.

Todo: Introduce own enum and get rid of FFmpeg dependency

int format = -1: Media format. AVSampleFormat for audio or AVPixelFormat for video.

std::string filter_description = {}: Filter graph definition, such as "aresample=16000,aformat=sample_fmts=fltp".
