Warning
TorchAudio’s C++ API is a prototype feature. API/ABI backward compatibility is not guaranteed.
Note
The top-level namespace has been changed from torchaudio to torio.
StreamWriter has been renamed to StreamingMediaEncoder.
torio::io::StreamingMediaEncoder¶
StreamingMediaEncoder is the implementation used by its Python equivalent and provides a similar interface.
When working with custom I/O, such as in-memory data, the StreamingMediaEncoderCustomIO class can be used.
Both classes have the same methods defined, so their usages are the same.
Constructors¶
StreamingMediaEncoder¶
-
class StreamingMediaEncoder¶
Encode and write audio/video streams chunk by chunk
Subclassed by torio::io::StreamingMediaEncoderCustomIO
-
explicit torio::io::StreamingMediaEncoder::StreamingMediaEncoder(const std::string &dst, const std::optional<std::string> &format = c10::nullopt)¶
Construct StreamingMediaEncoder from destination URI
- Parameters:
dst – Destination where encoded data are written.
format – Specify output format. If not provided, it is guessed from dst.
StreamingMediaEncoderCustomIO¶
-
class StreamingMediaEncoderCustomIO : private detail::CustomOutput, public torio::io::StreamingMediaEncoder¶
A subclass of StreamingMediaEncoder which works with a custom write function. Can be used for encoding media into memory or a custom object.
-
torio::io::StreamingMediaEncoderCustomIO::StreamingMediaEncoderCustomIO(void *opaque, const std::optional<std::string> &format, int buffer_size, int (*write_packet)(void *opaque, uint8_t *buf, int buf_size), int64_t (*seek)(void *opaque, int64_t offset, int whence) = nullptr)¶
Construct StreamingMediaEncoderCustomIO with custom write and seek functions.
- Parameters:
opaque – Custom data used by the write_packet and seek functions.
format – Specify output format.
buffer_size – The size of the intermediate buffer, which FFmpeg uses to pass data to the write_packet function.
write_packet – Custom write function that is called from FFmpeg to actually write data to the custom destination.
seek – Optional seek function that is used to seek the destination.
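The callback signatures above can be illustrated with a minimal in-memory destination. This is a sketch under assumptions, not code from torio: MemoryBuffer, mem_write_packet, mem_seek and kAvSeekSize are hypothetical names, the return conventions (bytes written, new position, negative value on error) follow FFmpeg's custom I/O conventions, and the size-query flag mirrors FFmpeg's AVSEEK_SIZE.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>   // SEEK_SET, SEEK_CUR, SEEK_END
#include <cstring>  // std::memcpy
#include <vector>

// Hypothetical in-memory destination; this type is not part of torio.
struct MemoryBuffer {
  std::vector<uint8_t> data;
  int64_t pos = 0;
};

// Assumption: FFmpeg sets this flag in `whence` to ask for the total size
// (AVSEEK_SIZE in libavformat/avio.h).
constexpr int kAvSeekSize = 0x10000;

// Same signature as the `write_packet` parameter: copies `buf_size` bytes to
// the current position, growing the buffer when needed.
int mem_write_packet(void* opaque, uint8_t* buf, int buf_size) {
  assert(opaque != nullptr);
  auto* mem = static_cast<MemoryBuffer*>(opaque);
  if (buf_size < 0) {
    return -1;
  }
  const int64_t end = mem->pos + buf_size;
  if (end > static_cast<int64_t>(mem->data.size())) {
    mem->data.resize(static_cast<std::size_t>(end));
  }
  if (buf_size > 0) {
    std::memcpy(mem->data.data() + mem->pos, buf,
                static_cast<std::size_t>(buf_size));
  }
  mem->pos = end;
  return buf_size;  // number of bytes consumed
}

// Same signature as the `seek` parameter: moves the write position and
// returns it, or returns a negative value for an unsupported request.
int64_t mem_seek(void* opaque, int64_t offset, int whence) {
  assert(opaque != nullptr);
  auto* mem = static_cast<MemoryBuffer*>(opaque);
  const int64_t size = static_cast<int64_t>(mem->data.size());
  if (whence & kAvSeekSize) {
    return size;  // size query, position unchanged
  }
  int64_t target = 0;
  switch (whence) {
    case SEEK_SET: target = offset; break;
    case SEEK_CUR: target = mem->pos + offset; break;
    case SEEK_END: target = size + offset; break;
    default: return -1;
  }
  if (target < 0) {
    return -1;
  }
  mem->pos = target;
  return target;
}
```

A pointer to such a buffer would be passed as opaque, together with the two functions, to the StreamingMediaEncoderCustomIO constructor. The buffer has to outlive the encoder, since the callbacks dereference it on every write.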
Config methods¶
add_audio_stream¶
-
void torio::io::StreamingMediaEncoder::add_audio_stream(int sample_rate, int num_channels, const std::string &format, const std::optional<std::string> &encoder = c10::nullopt, const std::optional<OptionDict> &encoder_option = c10::nullopt, const std::optional<std::string> &encoder_format = c10::nullopt, const std::optional<int> &encoder_sample_rate = c10::nullopt, const std::optional<int> &encoder_num_channels = c10::nullopt, const std::optional<CodecConfig> &codec_config = c10::nullopt, const std::optional<std::string> &filter_desc = c10::nullopt)¶
Add an output audio stream.
- Parameters:
sample_rate – The sample rate.
num_channels – The number of channels.
format – Input sample format, which determines the dtype of the input tensor.
  "u8": The input tensor must be torch.uint8 type.
  "s16": The input tensor must be torch.int16 type.
  "s32": The input tensor must be torch.int32 type.
  "s64": The input tensor must be torch.int64 type.
  "flt": The input tensor must be torch.float32 type.
  "dbl": The input tensor must be torch.float64 type.
  Default: "flt".
encoder – The name of the encoder to be used. When provided, use the specified encoder instead of the default one. To list the available encoders, you can use the ffmpeg -encoders command.
encoder_option – Options passed to the encoder. To list the options for an encoder, you can use ffmpeg -h encoder=<ENCODER>.
encoder_format – Format used to encode media. When the encoder supports multiple formats, passing this argument overrides the format used for encoding. To list the formats supported by the encoder, you can use the ffmpeg -h encoder=<ENCODER> command.
encoder_sample_rate – If provided, perform resampling before encoding.
encoder_num_channels – If provided, change channel configuration before encoding.
codec_config – Codec configuration.
filter_desc – Additional processing to apply before encoding the input data.
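The mapping between the format string and the tensor dtype can be written down as a small lookup table. The sketch below is illustrative only: SampleFormatInfo, lookup_sample_format and chunk_bytes are hypothetical helpers, not part of torio, and the byte sizes are the standard element sizes of the listed dtypes.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical record describing one accepted `format` value.
struct SampleFormatInfo {
  const char* torch_dtype;       // dtype the input tensor must have
  std::size_t bytes_per_sample;  // element size of that dtype
};

// Returns the entry for `format`, or std::nullopt for a value that is not
// in the list documented above.
std::optional<SampleFormatInfo> lookup_sample_format(const std::string& format) {
  static const std::unordered_map<std::string, SampleFormatInfo> table = {
      {"u8", {"torch.uint8", 1}},
      {"s16", {"torch.int16", 2}},
      {"s32", {"torch.int32", 4}},
      {"s64", {"torch.int64", 8}},
      {"flt", {"torch.float32", 4}},
      {"dbl", {"torch.float64", 8}},
  };
  const auto it = table.find(format);
  if (it == table.end()) {
    return std::nullopt;
  }
  return it->second;
}

// Size in bytes of a (frame, channel) chunk stored in the given format.
std::optional<std::size_t> chunk_bytes(const std::string& format,
                                       std::size_t num_frames,
                                       int num_channels) {
  assert(num_channels > 0);
  const auto info = lookup_sample_format(format);
  if (!info) {
    return std::nullopt;
  }
  return num_frames * static_cast<std::size_t>(num_channels) *
         info->bytes_per_sample;
}
```

A check like this can run before a stream is configured, so that an unsupported format string is rejected by the caller rather than by the encoder.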
add_video_stream¶
-
void torio::io::StreamingMediaEncoder::add_video_stream(double frame_rate, int width, int height, const std::string &format, const std::optional<std::string> &encoder = c10::nullopt, const std::optional<OptionDict> &encoder_option = c10::nullopt, const std::optional<std::string> &encoder_format = c10::nullopt, const std::optional<double> &encoder_frame_rate = c10::nullopt, const std::optional<int> &encoder_width = c10::nullopt, const std::optional<int> &encoder_height = c10::nullopt, const std::optional<std::string> &hw_accel = c10::nullopt, const std::optional<CodecConfig> &codec_config = c10::nullopt, const std::optional<std::string> &filter_desc = c10::nullopt)¶
Add an output video stream.
- Parameters:
frame_rate – Frame rate
width – Width
height – Height
format – Input pixel format, which determines the color channel order of the input tensor.
  "gray8": One channel, grayscale.
  "rgb24": Three channels in the order of RGB.
  "bgr24": Three channels in the order of BGR.
  "yuv444p": Three channels in the order of YUV.
  In all cases, the input tensor has to be torch.uint8 type and the shape must be (frame, channel, height, width).
encoder – See add_audio_stream().
encoder_option – See add_audio_stream().
encoder_format – See add_audio_stream().
encoder_frame_rate – If provided, change frame rate before encoding.
encoder_width – If provided, resize image before encoding.
encoder_height – If provided, resize image before encoding.
hw_accel – Enable hardware acceleration.
  When video is encoded on CUDA hardware, for example encoder="h264_nvenc", passing a CUDA device indicator to hw_accel (e.g. hw_accel="cuda:0") will make StreamingMediaEncoder expect the video chunk to be a CUDA Tensor. Passing a CPU Tensor will result in an error.
  If hw_accel is not provided, the video chunk Tensor has to be a CPU Tensor.
codec_config – Codec configuration.
filter_desc – Additional processing to apply before encoding the input data.
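The pixel format and shape requirements above can be checked before a chunk is handed to the encoder. The following is a minimal sketch: channels_for_pixel_format and is_valid_video_chunk_shape are hypothetical helpers, not part of torio, and the shape is passed as a plain array so that the example does not depend on libtorch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Number of channels implied by the `format` argument of add_video_stream,
// or -1 for a value that is not in the list documented above.
int channels_for_pixel_format(const std::string& format) {
  if (format == "gray8") {
    return 1;
  }
  if (format == "rgb24" || format == "bgr24" || format == "yuv444p") {
    return 3;
  }
  return -1;
}

// Checks a (frame, channel, height, width) shape against the stream settings.
bool is_valid_video_chunk_shape(const std::array<int64_t, 4>& shape,
                                const std::string& format,
                                int width,
                                int height) {
  assert(width > 0 && height > 0);
  const int channels = channels_for_pixel_format(format);
  return channels > 0 && shape[0] >= 0 && shape[1] == channels &&
         shape[2] == height && shape[3] == width;
}
```

Note that height comes before width in the tensor shape, while add_video_stream takes width before height; mixing the two up is an easy mistake for non-square frames.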
set_metadata¶
-
void torio::io::StreamingMediaEncoder::set_metadata(const OptionDict &metadata)¶
Set file-level metadata
- Parameters:
metadata – File-level metadata to write, given as an OptionDict.
Write methods¶
open¶
-
void torio::io::StreamingMediaEncoder::open(const std::optional<OptionDict> &opt = c10::nullopt)¶
Open the output file / device and write the header.
- Parameters:
opt – Private options for protocol, device and muxer.
close¶
-
void torio::io::StreamingMediaEncoder::close()¶
Close the output file / device and finalize metadata.
write_audio_chunk¶
-
void torio::io::StreamingMediaEncoder::write_audio_chunk(int i, const torch::Tensor &frames, const std::optional<double> &pts = c10::nullopt)¶
Write audio data
- Parameters:
i – Stream index.
frames – Waveform tensor. Shape: (frame, channel). The dtype must match what was passed to the add_audio_stream() method.
pts – Presentation timestamp. If provided, it overwrites the PTS of the first frame with the provided one. Otherwise, the PTS is incremented by the inverse of the sample rate. Only values that exceed the PTS values already processed internally are valid.
NOTE: The provided value is converted to an integer value expressed in the basis of the sample rate. Therefore, it is truncated to the nearest value of n / sample_rate.
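The truncation described in the note can be sketched as plain arithmetic. This illustrates the documented behavior under the assumption that the conversion is a multiplication followed by truncation toward zero; pts_to_ticks and ticks_to_seconds are hypothetical helpers, not the library's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Converts a timestamp in seconds to an integer count in units of
// 1 / sample_rate, dropping the fractional part.
int64_t pts_to_ticks(double pts_seconds, int sample_rate) {
  assert(sample_rate > 0 && pts_seconds >= 0.0);
  return static_cast<int64_t>(pts_seconds * sample_rate);
}

// Converts the integer count back to seconds, giving the timestamp that is
// effectively used after truncation.
double ticks_to_seconds(int64_t ticks, int sample_rate) {
  assert(sample_rate > 0);
  return static_cast<double>(ticks) / sample_rate;
}
```

For example, with a sample rate of 8000, a requested PTS of 0.50001 seconds corresponds to 4000.08 ticks, which truncates to 4000 ticks, so the effective PTS is 0.5 seconds.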
write_video_chunk¶
-
void torio::io::StreamingMediaEncoder::write_video_chunk(int i, const torch::Tensor &frames, const std::optional<double> &pts = c10::nullopt)¶
Write video data
- Parameters:
i – Stream index.
frames – Video/image tensor. Shape: (time, channel, height, width). The dtype must be torch.uint8. The shape (height, width, and the number of channels) must match what was configured when calling add_video_stream().
pts – Presentation timestamp. If provided, it overwrites the PTS of the first frame with the provided one. Otherwise, the PTS is incremented by the inverse of the frame rate. Only values that exceed the PTS values already processed internally are valid.
NOTE: The provided value is converted to an integer value expressed in the basis of the frame rate. Therefore, it is truncated to the nearest value of n / frame_rate.
flush¶
-
void torio::io::StreamingMediaEncoder::flush()¶
Flush the frames from encoders and write the frames to the destination.