
Warning

TorchAudio’s C++ API is a prototype feature. API/ABI backward compatibility is not guaranteed.

Note

The top-level namespace has been changed from torchaudio to torio. StreamWriter has been renamed to StreamingMediaEncoder.

torio::io::StreamingMediaEncoder

StreamingMediaEncoder is the implementation used by its Python equivalent and provides a similar interface. When working with custom I/O, such as in-memory data, the StreamingMediaEncoderCustomIO class can be used.

Both classes have the same methods defined, so their usages are the same.

Constructors

StreamingMediaEncoder

class StreamingMediaEncoder

Encode and write audio/video streams chunk by chunk

Subclassed by torio::io::StreamingMediaEncoderCustomIO

explicit torio::io::StreamingMediaEncoder::StreamingMediaEncoder(const std::string &dst, const std::optional<std::string> &format = c10::nullopt)

Construct StreamingMediaEncoder from destination URI

Parameters:
  • dst – Destination where encoded data are written.

  • format – Specify output format. If not provided, it is guessed from dst.

StreamingMediaEncoderCustomIO

class StreamingMediaEncoderCustomIO : private detail::CustomOutput, public torio::io::StreamingMediaEncoder

A subclass of StreamingMediaEncoder which works with a custom write function. Can be used for encoding media into memory or a custom object.

torio::io::StreamingMediaEncoderCustomIO::StreamingMediaEncoderCustomIO(void *opaque, const std::optional<std::string> &format, int buffer_size, int (*write_packet)(void *opaque, uint8_t *buf, int buf_size), int64_t (*seek)(void *opaque, int64_t offset, int whence) = nullptr)

Construct StreamingMediaEncoderCustomIO with custom write and seek functions.

Parameters:
  • opaque – Custom data used by write_packet and seek functions.

  • format – Specify output format.

  • buffer_size – The size of the intermediate buffer that FFmpeg uses to pass data to the write_packet function.

  • write_packet – Custom write function that is called from FFmpeg to actually write data to the custom destination.

  • seek – Optional seek function that is used to seek the destination.
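
As an illustration of the two callbacks, the following sketch writes encoded bytes into a growable in-memory buffer. MemorySink, sink_write and sink_seek are hypothetical names introduced here, not part of torio. Only the function signatures are taken from the constructor above. The seek callback handles the standard SEEK_SET, SEEK_CUR and SEEK_END values and returns -1 for anything else.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>   // SEEK_SET, SEEK_CUR, SEEK_END
#include <cstring>
#include <vector>

// Hypothetical in-memory destination; passed to the encoder as `opaque`.
struct MemorySink {
  std::vector<uint8_t> data;
  int64_t pos = 0;  // current write position in bytes
};

// Same signature as write_packet: copy buf_size bytes to the current
// position, growing the buffer when the write goes past its end.
int sink_write(void* opaque, uint8_t* buf, int buf_size) {
  auto* sink = static_cast<MemorySink*>(opaque);
  if (buf_size <= 0) {
    return 0;
  }
  const int64_t end = sink->pos + buf_size;
  if (end > static_cast<int64_t>(sink->data.size())) {
    sink->data.resize(static_cast<std::size_t>(end));
  }
  std::memcpy(sink->data.data() + sink->pos, buf, static_cast<std::size_t>(buf_size));
  sink->pos = end;
  return buf_size;  // number of bytes consumed
}

// Same signature as seek: move the write position and return it,
// or return -1 for an unsupported whence or a negative target.
int64_t sink_seek(void* opaque, int64_t offset, int whence) {
  auto* sink = static_cast<MemorySink*>(opaque);
  int64_t target = 0;
  switch (whence) {
    case SEEK_SET: target = offset; break;
    case SEEK_CUR: target = sink->pos + offset; break;
    case SEEK_END: target = static_cast<int64_t>(sink->data.size()) + offset; break;
    default: return -1;
  }
  if (target < 0) {
    return -1;
  }
  sink->pos = target;
  return target;
}
```

Under these assumptions, a sink instance would be passed as opaque, with sink_write as write_packet and sink_seek as seek. The sink must outlive the encoder because the encoder only holds a raw pointer to it.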

Config methods

add_audio_stream

void torio::io::StreamingMediaEncoder::add_audio_stream(int sample_rate, int num_channels, const std::string &format, const std::optional<std::string> &encoder = c10::nullopt, const std::optional<OptionDict> &encoder_option = c10::nullopt, const std::optional<std::string> &encoder_format = c10::nullopt, const std::optional<int> &encoder_sample_rate = c10::nullopt, const std::optional<int> &encoder_num_channels = c10::nullopt, const std::optional<CodecConfig> &codec_config = c10::nullopt, const std::optional<std::string> &filter_desc = c10::nullopt)

Add an output audio stream.

Parameters:
  • sample_rate – The sample rate.

  • num_channels – The number of channels.

  • format – Input sample format, which determines the dtype of the input tensor.

    • "u8": The input tensor must be torch.uint8 type.

    • "s16": The input tensor must be torch.int16 type.

    • "s32": The input tensor must be torch.int32 type.

    • "s64": The input tensor must be torch.int64 type.

    • "flt": The input tensor must be torch.float32 type.

    • "dbl": The input tensor must be torch.float64 type.

    Default: "flt".

  • encoder – The name of the encoder to be used.

    When provided, use the specified encoder instead of the default one.

    To list the available encoders, use the ffmpeg -encoders command.

  • encoder_option – Options passed to the encoder. To list the options for an encoder, use the ffmpeg -h encoder=<ENCODER> command.

  • encoder_format – Format used to encode media. When the encoder supports multiple formats, passing this argument overrides the format used for encoding. To list the formats supported by the encoder, use the ffmpeg -h encoder=<ENCODER> command.

  • encoder_sample_rate – If provided, perform resampling before encoding.

  • encoder_num_channels – If provided, change channel configuration before encoding.

  • codec_config – Codec configuration.

  • filter_desc – Additional processing to apply before encoding the input data
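
The mapping between the format string and the required tensor dtype can be captured in a small lookup table, for example to validate input before calling add_audio_stream(). This is a sketch only: SampleFormatInfo and lookup_sample_format are hypothetical names, not part of torio, and the table simply restates the list above together with the byte width of each type.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical description of what a sample format string requires.
struct SampleFormatInfo {
  const char* dtype;         // name of the required tensor dtype
  std::size_t element_size;  // bytes per sample
};

// Returns the requirements for a sample format string, or std::nullopt
// when the string is not one of the formats listed above.
std::optional<SampleFormatInfo> lookup_sample_format(const std::string& format) {
  static const std::unordered_map<std::string, SampleFormatInfo> table = {
      {"u8", {"torch.uint8", 1}},
      {"s16", {"torch.int16", 2}},
      {"s32", {"torch.int32", 4}},
      {"s64", {"torch.int64", 8}},
      {"flt", {"torch.float32", 4}},
      {"dbl", {"torch.float64", 8}},
  };
  const auto it = table.find(format);
  if (it == table.end()) {
    return std::nullopt;
  }
  return it->second;
}
```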

add_video_stream

void torio::io::StreamingMediaEncoder::add_video_stream(double frame_rate, int width, int height, const std::string &format, const std::optional<std::string> &encoder = c10::nullopt, const std::optional<OptionDict> &encoder_option = c10::nullopt, const std::optional<std::string> &encoder_format = c10::nullopt, const std::optional<double> &encoder_frame_rate = c10::nullopt, const std::optional<int> &encoder_width = c10::nullopt, const std::optional<int> &encoder_height = c10::nullopt, const std::optional<std::string> &hw_accel = c10::nullopt, const std::optional<CodecConfig> &codec_config = c10::nullopt, const std::optional<std::string> &filter_desc = c10::nullopt)

Add an output video stream.

Parameters:
  • frame_rate – Frame rate

  • width – Width

  • height – Height

  • format – Input pixel format, which determines the color channel order of the input tensor.

    • "gray8": One channel, grayscale.

    • "rgb24": Three channels in the order of RGB.

    • "bgr24": Three channels in the order of BGR.

    • "yuv444p": Three channels in the order of YUV.

    In all cases, the input tensor has to be torch.uint8 type and the shape must be (frame, channel, height, width).

  • encoder – See add_audio_stream().

  • encoder_option – See add_audio_stream().

  • encoder_format – See add_audio_stream().

  • encoder_frame_rate – If provided, change frame rate before encoding.

  • encoder_width – If provided, resize image before encoding.

  • encoder_height – If provided, resize image before encoding.

  • hw_accel – Enable hardware acceleration.

    When video is encoded on CUDA hardware, for example encoder="h264_nvenc", passing a CUDA device indicator to hw_accel (e.g. hw_accel="cuda:0") will make StreamingMediaEncoder expect the video chunk to be a CUDA Tensor. Passing a CPU Tensor will result in an error.

    If not provided, the video chunk Tensor has to be a CPU Tensor.

  • codec_config – Codec configuration.

  • filter_desc – Additional processing to apply before encoding the input data
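
To make the shape requirement concrete, the sketch below computes the chunk shape that matches a given pixel format and frame size. Both functions are hypothetical helpers, not part of torio, and only cover the four pixel formats listed above.

```cpp
#include <array>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: number of channels for the pixel formats listed above.
std::optional<int> channels_for_pixel_format(const std::string& format) {
  if (format == "gray8") {
    return 1;
  }
  if (format == "rgb24" || format == "bgr24" || format == "yuv444p") {
    return 3;
  }
  return std::nullopt;  // format not covered by this sketch
}

// Expected shape of a video chunk, in (frame, channel, height, width) order.
std::optional<std::array<int64_t, 4>> expected_chunk_shape(
    const std::string& format, int64_t num_frames, int width, int height) {
  const auto channels = channels_for_pixel_format(format);
  if (!channels) {
    return std::nullopt;
  }
  return std::array<int64_t, 4>{num_frames, *channels, height, width};
}
```

Note that height comes before width in the tensor shape, while add_video_stream() takes width before height.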

set_metadata

void torio::io::StreamingMediaEncoder::set_metadata(const OptionDict &metadata)

Set file-level metadata

Parameters:

metadata – Key-value pairs to set as file-level metadata.

Write methods

open

void torio::io::StreamingMediaEncoder::open(const std::optional<OptionDict> &opt = c10::nullopt)

Open the output file / device and write the header.

Parameters:

opt – Private options for protocol, device and muxer.

close

void torio::io::StreamingMediaEncoder::close()

Close the output file / device and finalize metadata.

write_audio_chunk

void torio::io::StreamingMediaEncoder::write_audio_chunk(int i, const torch::Tensor &frames, const std::optional<double> &pts = c10::nullopt)

Write audio data

Parameters:
  • i – Stream index.

  • frames – Waveform tensor. Shape: (frame, channel). The dtype must match what was passed to add_audio_stream() method.

  • pts

    Presentation timestamp. If provided, it overwrites the PTS of the first frame with the provided one. Otherwise, the PTS is incremented by the inverse of the sample rate for each frame. Only values that exceed the PTS values already processed internally are used.

    NOTE: The provided value is converted to an integer expressed in units of 1 / sample_rate. It is therefore truncated to the nearest value of the form n / sample_rate.
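
The effect of the conversion described in the note can be illustrated with plain arithmetic. This is not torio's implementation: pts_to_ticks and ticks_to_seconds are hypothetical helpers, and the truncation simply follows the wording of the note, so the library's exact rounding may differ.

```cpp
#include <cstdint>

// Illustration only. Convert a PTS in seconds to an integer count n in
// units of 1 / rate, dropping the fractional part as the note describes.
int64_t pts_to_ticks(double pts_seconds, int rate) {
  return static_cast<int64_t>(pts_seconds * rate);
}

// The timestamp that the integer count n actually represents: n / rate.
double ticks_to_seconds(int64_t ticks, int rate) {
  return static_cast<double>(ticks) / rate;
}
```

For example, with a sample rate of 8000, a requested PTS of 0.50001 seconds maps to n = 4000, which represents exactly 0.5 seconds. The same arithmetic applies to video streams with the frame rate in place of the sample rate.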

write_video_chunk

void torio::io::StreamingMediaEncoder::write_video_chunk(int i, const torch::Tensor &frames, const std::optional<double> &pts = c10::nullopt)

Write video data

Parameters:
  • i – Stream index.

  • frames – Video/image tensor. Shape: (time, channel, height, width). The dtype must be torch.uint8. The shape (height, width and the number of channels) must match what was configured when calling add_video_stream().

  • pts

    Presentation timestamp. If provided, it overwrites the PTS of the first frame with the provided one. Otherwise, the PTS is incremented by the inverse of the frame rate for each frame. Only values that exceed the PTS values already processed internally are used.

    NOTE: The provided value is converted to an integer expressed in units of 1 / frame_rate. It is therefore truncated to the nearest value of the form n / frame_rate.

flush

void torio::io::StreamingMediaEncoder::flush()

Flush the frames from encoders and write the frames to the destination.
