Warning

TorchAudio’s C++ API is a prototype feature. API/ABI backward compatibility is not guaranteed.

Note

The top-level namespace has been changed from torchaudio to torio. StreamReader has been renamed to StreamingMediaDecoder.

torio::io::StreamingMediaDecoder

StreamingMediaDecoder is the implementation used by its Python equivalent and provides a similar interface. When working with custom I/O, such as in-memory data, the StreamingMediaDecoderCustomIO class can be used.

Both classes have the same methods defined, so their usages are the same.

Constructors

StreamingMediaDecoder

class StreamingMediaDecoder

Fetch and decode audio/video streams chunk by chunk.

Subclassed by torio::io::StreamingMediaDecoderCustomIO

explicit torio::io::StreamingMediaDecoder::StreamingMediaDecoder(const std::string &src, const c10::optional<std::string> &format = c10::nullopt, const c10::optional<OptionDict> &option = c10::nullopt)

Construct a media processor from a source URI.

Parameters:
  • src – URL of source media, in the format FFmpeg can understand.

  • format – Specifies format (such as mp4) or device (such as lavfi and avfoundation)

  • option – Custom option passed when initializing format context (opening source).

StreamingMediaDecoderCustomIO

class StreamingMediaDecoderCustomIO : private detail::CustomInput, public torio::io::StreamingMediaDecoder

A subclass of StreamingMediaDecoder which works with a custom read function. It can be used for decoding media from memory or a custom object.

torio::io::StreamingMediaDecoderCustomIO::StreamingMediaDecoderCustomIO(void *opaque, const c10::optional<std::string> &format, int buffer_size, int (*read_packet)(void *opaque, uint8_t *buf, int buf_size), int64_t (*seek)(void *opaque, int64_t offset, int whence) = nullptr, const c10::optional<OptionDict> &option = c10::nullopt)

Construct StreamingMediaDecoder with custom read and seek functions.

Parameters:
  • opaque – Custom data used by read_packet and seek functions.

  • format – Specify input format.

  • buffer_size – The size of the intermediate buffer, which FFmpeg uses to pass data to function read_packet.

  • read_packet – Custom read function that is called from FFmpeg to read data from the source.

  • seek – Optional seek function that is used to seek within the source.

  • option – Custom option passed when initializing format context.
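
To make the callback parameters above concrete, here is a minimal sketch of read and seek functions that serve bytes from an in-memory buffer. MemoryReader, memory_read_packet and memory_seek are hypothetical names introduced for illustration. The two constants are assumed to equal FFmpeg's AVERROR_EOF and AVSEEK_SIZE; prefer the real macros when the FFmpeg headers are available.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdio>   // SEEK_SET, SEEK_CUR, SEEK_END
#include <cstring>

// Assumed to match FFmpeg's AVERROR_EOF and AVSEEK_SIZE.
constexpr int kAssumedAvErrorEof = -541478725;
constexpr int kAssumedAvSeekSize = 0x10000;

// Hypothetical helper: a read cursor over a caller-owned byte buffer.
struct MemoryReader {
  const std::uint8_t* data;
  std::int64_t size;
  std::int64_t pos;
};

// Same signature as the read_packet parameter: copy up to buf_size bytes
// into buf and return the number of bytes copied, or the EOF code.
// buf_size is assumed to be positive.
int memory_read_packet(void* opaque, std::uint8_t* buf, int buf_size) {
  auto* r = static_cast<MemoryReader*>(opaque);
  const std::int64_t remaining = r->size - r->pos;
  if (remaining <= 0) return kAssumedAvErrorEof;
  const int n = static_cast<int>(std::min<std::int64_t>(remaining, buf_size));
  std::memcpy(buf, r->data + r->pos, static_cast<std::size_t>(n));
  r->pos += n;
  return n;
}

// Same signature as the seek parameter: move the cursor and return the new
// position, report the total size on a size query, or return -1 on error.
std::int64_t memory_seek(void* opaque, std::int64_t offset, int whence) {
  auto* r = static_cast<MemoryReader*>(opaque);
  if (whence == kAssumedAvSeekSize) return r->size;
  std::int64_t target = 0;
  switch (whence) {
    case SEEK_SET: target = offset; break;
    case SEEK_CUR: target = r->pos + offset; break;
    case SEEK_END: target = r->size + offset; break;
    default: return -1;
  }
  if (target < 0 || target > r->size) return -1;
  r->pos = target;
  return r->pos;
}
```

A pointer to a MemoryReader would be passed as opaque, together with the two functions. The reader and the bytes it points to presumably need to outlive the decoder.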

Query Methods

find_best_audio_stream

int64_t torio::io::StreamingMediaDecoder::find_best_audio_stream() const

Find a suitable audio stream using heuristics from FFmpeg.

If successful, the index of the best stream (>=0) is returned. Otherwise a negative value is returned.

find_best_video_stream

int64_t torio::io::StreamingMediaDecoder::find_best_video_stream() const

Find a suitable video stream using heuristics from FFmpeg.

If successful, the index of the best stream (>=0) is returned. Otherwise a negative value is returned.
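
Because both find_best_* methods signal failure with a negative value, the result should be checked before it is used as a stream index. The sketch below shows one way to do that for audio. It is written as a template over any type exposing the documented methods, so it does not depend on torio headers; add_best_audio_stream is a hypothetical helper name.

```cpp
#include <cstdint>
#include <stdexcept>

// Decoder is any type with the documented find_best_audio_stream() and
// add_audio_stream() methods, e.g. torio::io::StreamingMediaDecoder.
template <class Decoder>
std::int64_t add_best_audio_stream(Decoder& decoder,
                                   std::int64_t frames_per_chunk,
                                   std::int64_t num_chunks) {
  const std::int64_t index = decoder.find_best_audio_stream();
  if (index < 0) {
    // A negative value means no suitable stream was found.
    throw std::runtime_error("no suitable audio stream found");
  }
  decoder.add_audio_stream(index, frames_per_chunk, num_chunks);
  return index;
}
```

The same pattern applies to find_best_video_stream() and add_video_stream().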

get_metadata

OptionDict torio::io::StreamingMediaDecoder::get_metadata() const

Fetch metadata of the source media.

num_src_streams

int64_t torio::io::StreamingMediaDecoder::num_src_streams() const

Fetch the number of source streams found in the input media.

The source streams include not only audio/video streams but also subtitle and others.

get_src_stream_info

SrcStreamInfo torio::io::StreamingMediaDecoder::get_src_stream_info(int i) const

Fetch information about the specified source stream.

The valid value range is [0, num_src_streams()).
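
As an illustration of the valid index range, the following sketch walks every source stream and collects its codec name. It is a template over any type offering num_src_streams() and get_src_stream_info() as documented here; list_codec_names is a hypothetical helper name.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Collect SrcStreamInfo::codec_name for every index in [0, num_src_streams()).
template <class Decoder>
std::vector<std::string> list_codec_names(const Decoder& decoder) {
  std::vector<std::string> names;
  const std::int64_t n = decoder.num_src_streams();
  for (std::int64_t i = 0; i < n; ++i) {
    const auto info = decoder.get_src_stream_info(static_cast<int>(i));
    names.emplace_back(info.codec_name);
  }
  return names;
}
```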

num_out_streams

int64_t torio::io::StreamingMediaDecoder::num_out_streams() const

Fetch the number of output streams defined by client code.

get_out_stream_info

OutputStreamInfo torio::io::StreamingMediaDecoder::get_out_stream_info(int i) const

Fetch information about the specified output stream.

The valid value range is [0, num_out_streams()).

is_buffer_ready

bool torio::io::StreamingMediaDecoder::is_buffer_ready() const

Check if all the buffers of the output streams have enough decoded frames.

Configure Methods

add_audio_stream

void torio::io::StreamingMediaDecoder::add_audio_stream(int64_t i, int64_t frames_per_chunk, int64_t num_chunks, const c10::optional<std::string> &filter_desc = c10::nullopt, const c10::optional<std::string> &decoder = c10::nullopt, const c10::optional<OptionDict> &decoder_option = c10::nullopt)

Define an output audio stream.

Parameters:
  • i – The index of the source stream.

  • frames_per_chunk – Number of frames returned as one chunk.

    If a source stream is exhausted before frames_per_chunk frames are buffered, the chunk is returned as-is. Thus the number of frames in the chunk may be smaller than frames_per_chunk.

    Providing -1 disables chunking, in which case, method pop_chunks() returns all the buffered frames as one chunk.

  • num_chunks – Internal buffer size.

    When the number of buffered chunks exceeds this number, old chunks are dropped. For example, if frames_per_chunk is 5 and num_chunks is 3, then frames older than the most recent 15 are dropped.

    Providing -1 disables this behavior, forcing the retention of all chunks.

  • filter_desc – Description of filter graph applied to the source stream.

  • decoder – The name of the decoder to be used. When provided, use the specified decoder instead of the default one.

  • decoder_option – Options passed to decoder.

    To list decoder options for a decoder, you can use ffmpeg -h decoder=<DECODER> command.

    In addition to decoder-specific options, you can also pass options related to multithreading. They are effective only if the decoder supports them. If neither of them is provided, StreamingMediaDecoder defaults to a single thread.

    • "threads": The number of threads or the value "0" to let FFmpeg decide based on its heuristics.

    • "thread_type": Which multithreading method to use. The valid values are "frame" or "slice". Note that each decoder supports a different set of methods. If not provided, a default value is used.

      • "frame": Decode more than one frame at once. Each thread handles one frame. This will increase decoding delay by one frame per thread.

      • "slice": Decode more than one part of a single frame at once.
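
The interaction between frames_per_chunk and num_chunks described above can be sketched with a small toy model. This is not torio's implementation; ChunkBufferModel is a hypothetical class that only mimics the documented rule: frames are grouped into chunks, and once more than num_chunks chunks are buffered, the oldest chunk is dropped.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// Toy model of the documented buffering rule, not torio's implementation.
class ChunkBufferModel {
 public:
  ChunkBufferModel(std::int64_t frames_per_chunk, std::int64_t num_chunks)
      : frames_per_chunk_(frames_per_chunk), num_chunks_(num_chunks) {}

  // Buffer one decoded frame, identified here by its running index.
  void push_frame(std::int64_t frame_id) {
    // Start a new chunk when there is none or the last one is full.
    // With frames_per_chunk == -1 a chunk is never "full" (no chunking).
    if (chunks_.empty() ||
        static_cast<std::int64_t>(chunks_.back().size()) == frames_per_chunk_) {
      chunks_.emplace_back();
    }
    chunks_.back().push_back(frame_id);
    // A negative num_chunks mirrors "-1 disables this behavior".
    if (num_chunks_ >= 0 &&
        static_cast<std::int64_t>(chunks_.size()) > num_chunks_) {
      chunks_.pop_front();  // drop the oldest chunk
    }
  }

  // Index of the oldest frame still buffered, or -1 if nothing is buffered.
  std::int64_t oldest_frame() const {
    return chunks_.empty() ? -1 : chunks_.front().front();
  }

  std::int64_t buffered_chunks() const {
    return static_cast<std::int64_t>(chunks_.size());
  }

 private:
  std::int64_t frames_per_chunk_;
  std::int64_t num_chunks_;
  std::deque<std::vector<std::int64_t>> chunks_;
};
```

In this model, pushing 20 frames with frames_per_chunk set to 5 and num_chunks set to 3 leaves three chunks holding the 15 most recent frames, which is why client code that cannot afford to lose frames should pop chunks regularly or pass -1.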

add_video_stream

void torio::io::StreamingMediaDecoder::add_video_stream(int64_t i, int64_t frames_per_chunk, int64_t num_chunks, const c10::optional<std::string> &filter_desc = c10::nullopt, const c10::optional<std::string> &decoder = c10::nullopt, const c10::optional<OptionDict> &decoder_option = c10::nullopt, const c10::optional<std::string> &hw_accel = c10::nullopt)

Define an output video stream.

Parameters:
  • i, frames_per_chunk, num_chunks, filter_desc, decoder, decoder_option – See add_audio_stream().

  • hw_accel – Enable hardware acceleration.

    When video is decoded on CUDA hardware, (for example by specifying "h264_cuvid" decoder), passing CUDA device indicator to hw_accel (i.e. hw_accel="cuda:0") will make StreamingMediaDecoder place the resulting frames directly on the specified CUDA device as a CUDA tensor.

    If not provided (c10::nullopt), the chunk will be moved to CPU memory.

remove_stream

void torio::io::StreamingMediaDecoder::remove_stream(int64_t i)

Remove an output stream.

Parameters:

i – The index of the output stream to be removed. The valid value range is [0, num_out_streams()).

Stream Methods

seek

void torio::io::StreamingMediaDecoder::seek(double timestamp, int64_t mode)

Seek to the given timestamp.

Parameters:
  • timestamp – Target timestamp in seconds.

  • mode – Seek mode.

    • 0: Keyframe mode. Seek to the nearest key frame before the given timestamp.

    • 1: Any mode. Seek to any frame (including non-key frames) before the given timestamp.

    • 2: Precise mode. First seek to the nearest key frame before the given timestamp, then decode frames until it reaches the frame closest to the given timestamp.
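
Since the mode is passed as a bare integer, client code may prefer named constants. The sketch below is one possible wrapper; SeekMode and seek_to are hypothetical names and are not part of torio, which takes a plain int64_t.

```cpp
#include <cstdint>

// Hypothetical names for the documented mode values 0, 1 and 2.
enum class SeekMode : std::int64_t { kKeyFrame = 0, kAny = 1, kPrecise = 2 };

// Decoder is any type with the documented seek(double, int64_t) method.
template <class Decoder>
void seek_to(Decoder& decoder, double timestamp_sec, SeekMode mode) {
  decoder.seek(timestamp_sec, static_cast<std::int64_t>(mode));
}
```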

process_packet

int torio::io::StreamingMediaDecoder::process_packet()

Demultiplex and process one packet.

Returns:

  • 0: A packet was processed successfully and there are still packets left in the stream, so client code can call this method again.

  • 1: A packet was processed successfully and it reached EOF. Client code should not call this method again.

  • <0: An error has happened.
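
The return codes above suggest a simple driving loop: keep calling while the result is 0, stop at 1, and treat negative values as errors. A minimal sketch follows, templated over any type with the documented process_packet() method; process_until_eof is a hypothetical helper name.

```cpp
#include <cstdint>
#include <stdexcept>

// Calls process_packet() until it reports EOF and returns the number of
// successful calls. Throws if a negative (error) code is returned.
template <class Decoder>
std::int64_t process_until_eof(Decoder& decoder) {
  std::int64_t packets = 0;
  while (true) {
    const int code = decoder.process_packet();
    if (code < 0) {
      throw std::runtime_error("process_packet() reported an error");
    }
    ++packets;
    if (code == 1) {
      return packets;  // EOF reached: do not call process_packet() again
    }
  }
}
```

Note that a loop like this never pops chunks, so with a finite num_chunks older chunks may be dropped before they are retrieved.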

process_packet_block

int torio::io::StreamingMediaDecoder::process_packet_block(const double timeout, const double backoff)

Similar to process_packet(), but if it fails because a resource is temporarily unavailable, it automatically retries.

This behavior is helpful when using device input, such as a microphone, during which the buffer may be busy while sample acquisition is happening.

Parameters:
  • timeout – Timeout in milliseconds.

    • >=0: Keep retrying until the given time passes.

    • <0: Keep retrying forever.

  • backoff – Time to wait before retrying, in milliseconds.
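
The timeout and backoff semantics can be illustrated with a generic retry loop. This is a sketch of the documented behavior under stated assumptions, not the code torio runs: Attempt and retry_with_backoff are hypothetical names, and the operation being retried is any callable that reports whether it should be tried again.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical status codes used only by this sketch.
enum class Attempt { kDone, kTryAgain };

// Retries `attempt` until it reports kDone. A non-negative timeout_ms bounds
// the total retry time; a negative timeout_ms retries forever. backoff_ms is
// the wait between attempts. Returns false if the timeout expired first.
bool retry_with_backoff(const std::function<Attempt()>& attempt,
                        double timeout_ms, double backoff_ms) {
  using Clock = std::chrono::steady_clock;
  using Ms = std::chrono::duration<double, std::milli>;
  const auto start = Clock::now();
  while (true) {
    if (attempt() == Attempt::kDone) return true;
    const Ms elapsed = Clock::now() - start;
    if (timeout_ms >= 0 && elapsed.count() >= timeout_ms) return false;
    std::this_thread::sleep_for(Ms(backoff_ms));
  }
}
```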

process_all_packets

void torio::io::StreamingMediaDecoder::process_all_packets()

Process packets until EOF.

fill_buffer

int torio::io::StreamingMediaDecoder::fill_buffer(const c10::optional<double> &timeout = c10::nullopt, const double backoff = 10.)

Process packets until all the chunk buffers have at least one chunk.

Parameters:

Retrieval Methods

pop_chunks

std::vector<c10::optional<Chunk>> torio::io::StreamingMediaDecoder::pop_chunks()

Pop one chunk from each output stream if it is available.

Support Structures

Chunk

struct Chunk

Stores decoded frames and metadata.

Public Members

torch::Tensor frames

Audio/video frames.

For audio, the shape is [time, num_channels], and the dtype depends on output stream configurations.

For video, the shape is [time, channel, height, width], and the dtype is torch.uint8.

double pts

Presentation timestamp of the first frame, in seconds.

SrcStreamInfo

struct SrcStreamInfo

Information about source stream found in the input media.

COMMON MEMBERS

AVMediaType media_type

The stream media type.

Please refer to the FFmpeg documentation for the available values.

Todo:

Introduce own enum and get rid of FFmpeg dependency

const char *codec_name = "N/A"

The name of codec.

const char *codec_long_name = "N/A"

The name of codec in long, human friendly form.

const char *fmt_name = "N/A"

For audio, it is sample format.

Commonly found values are:

  • "u8", "u8p": 8-bit unsigned integer.

  • "s16", "s16p": 16-bit signed integer.

  • "s32", "s32p": 32-bit signed integer.

  • "s64", "s64p": 64-bit signed integer.

  • "flt", "fltp": 32-bit floating point.

  • "dbl", "dblp": 64-bit floating point.

For video, it is color channel format.

Commonly found values include:

  • "gray8": grayscale

  • "rgb24": RGB

  • "bgr24": BGR

  • "yuv420p": YUV420p
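
As a small worked example of the audio sample-format names listed above, the following sketch maps fmt_name to the number of bytes per sample. In FFmpeg naming, the trailing "p" marks a planar layout and does not change the sample width. bytes_per_sample is a hypothetical helper that covers only the names listed here.

```cpp
#include <string>

// Bytes per sample for the audio format names listed above; 0 if unknown
// (for example a video pixel format or "N/A").
int bytes_per_sample(const std::string& fmt_name) {
  std::string base = fmt_name;
  // "s16p" and "s16" share a sample width, so strip the trailing "p".
  if (base.size() > 1 && base.back() == 'p') base.pop_back();
  if (base == "u8") return 1;
  if (base == "s16") return 2;
  if (base == "s32" || base == "flt") return 4;
  if (base == "s64" || base == "dbl") return 8;
  return 0;
}
```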

int64_t bit_rate = 0

Bit rate.

int64_t num_frames = 0

Number of frames.

Note

In some formats, the value is not reliable or unavailable.

int bits_per_sample = 0

Bits per sample.

OptionDict metadata = {}

Metadata

This member can hold, for example, ID3 tags from MP3 files.

Example:

{
  "title": "foo",
  "artist": "bar",
  "date": "2017"
}

AUDIO-SPECIFIC MEMBERS

double sample_rate = 0

Sample rate.

int num_channels = 0

The number of channels.

VIDEO-SPECIFIC MEMBERS

int width = 0

Width.

int height = 0

Height.

double frame_rate = 0

Frame rate.

OutputStreamInfo

struct OutputStreamInfo

Information about output stream configured by user code.

AUDIO-SPECIFIC MEMBERS

double sample_rate = -1

Sample rate.

int num_channels = -1

The number of channels.

VIDEO-SPECIFIC MEMBERS

int width = -1

Width.

int height = -1

Height.

AVRational frame_rate = {0, 1}

Frame rate.

Public Members

int source_index

The index of the input source stream.

AVMediaType media_type = AVMEDIA_TYPE_UNKNOWN

The stream media type.

Please refer to the FFmpeg documentation for the available values.

Todo:

Introduce own enum and get rid of FFmpeg dependency

int format = -1

Media format. AVSampleFormat for audio or AVPixelFormat for video.

std::string filter_description = {}

Filter graph definition, such as "aresample=16000,aformat=sample_fmts=fltp".
