Decoding / Encoding images and videos
The torchvision.io package provides functions for performing IO operations.
They are currently specific to reading and writing images and videos.
Images
read_image | Reads a JPEG, PNG or GIF image into a 3-dimensional RGB or grayscale Tensor.
decode_image | Detects whether an image is a JPEG, PNG or GIF and performs the appropriate operation to decode the image into a 3-dimensional RGB or grayscale Tensor.
encode_jpeg | Takes a (list of) input tensor(s) in CHW layout and returns a (list of) buffer(s) with the contents of the corresponding JPEG file(s).
decode_jpeg | Decodes a JPEG image into a 3-dimensional RGB or grayscale Tensor.
write_jpeg | Takes an input tensor in CHW layout and saves it in a JPEG file.
decode_gif | Decodes a GIF image into a 3- or 4-dimensional RGB Tensor.
encode_png | Takes an input tensor in CHW layout and returns a buffer with the contents of its corresponding PNG file.
decode_png | Decodes a PNG image into a 3-dimensional RGB or grayscale Tensor.
write_png | Takes an input tensor in CHW layout (or HW in the case of grayscale images) and saves it in a PNG file.
read_file | Reads a file and returns its byte contents as a one-dimensional uint8 Tensor.
write_file | Writes the contents of a one-dimensional uint8 tensor to a file.
ImageReadMode | Support for various modes while reading images.
Video
read_video | Reads a video from a file, returning both the video frames and the audio frames.
read_video_timestamps | Lists the video frame timestamps.
write_video | Writes a 4-D tensor in [T, H, W, C] format to a video file.
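Because video frames are handled in [T, H, W, C] layout while most vision models expect channels first, a small conversion step is often useful. The sketch below is one way to do it, assuming the reading function is torchvision.io.read_video, that it returns a (video frames, audio frames, info) tuple with frames in [T, H, W, C] layout by default, and that a video backend such as PyAV is installed. The helper names to_model_input and load_clip are hypothetical and not part of torchvision.

```python
import torch


def to_model_input(frames_thwc):
    """Convert uint8 frames in [T, H, W, C] layout to float32 [T, C, H, W] in [0, 1]."""
    return frames_thwc.permute(0, 3, 1, 2).to(torch.float32) / 255.0


def load_clip(path):
    """Read a whole video and return (frames, audio, info).

    Hypothetical helper: the import is done lazily because video decoding
    depends on an optional backend that may not be installed.
    """
    from torchvision.io import read_video

    frames, audio, info = read_video(path, pts_unit="sec")
    return to_model_input(frames), audio, info
```

The layout conversion is kept separate from the file reading so that it can be reused on frames coming from any source.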
Fine-grained video API
In addition to the read_video function, we provide a high-performance,
lower-level API that offers more fine-grained control over video reading.
It does all this while fully supporting TorchScript.
Warning
The fine-grained video API is in Beta stage, and backward compatibility is not guaranteed.
VideoReader | Fine-grained video-reading API.
Example of inspecting a video:
import torchvision
video_path = "path to a test video"
# Constructor allocates memory and a threaded decoder
# instance per video. At the moment it takes two arguments:
# path to the video file, and a wanted stream.
reader = torchvision.io.VideoReader(video_path, "video")
# The information about the video can be retrieved using the
# `get_metadata()` method. It returns a dictionary for every stream, with
# duration and other relevant metadata (often frame rate)
reader_md = reader.get_metadata()
# metadata is structured as a dict of dicts with following structure
# {"stream_type": {"attribute": [attribute per stream]}}
#
# following would print out the list of frame rates for every present video stream
print(reader_md["video"]["fps"])
# we explicitly select the stream we would like to operate on. In
# the constructor we select a default video stream, but
# in practice, we can set whichever stream we would like
reader.set_current_stream("video:0")
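Once a stream is selected, frames can be pulled from the reader by iterating over it. The sketch below collects the frames inside a time window, assuming the reader behaves like torchvision.io.VideoReader: seek() returns the reader itself, and iteration yields dicts with a "data" tensor and a "pts" timestamp in seconds. The function name collect_frames is hypothetical.

```python
import itertools

import torch


def collect_frames(reader, start_s, end_s):
    """Stack the frames whose timestamp lies between start_s and end_s seconds.

    Assumes `reader.seek(t)` positions the reader at the first frame with a
    timestamp >= t and returns the reader, as VideoReader is expected to do.
    """
    in_range = itertools.takewhile(
        lambda frame: frame["pts"] <= end_s,
        reader.seek(start_s),
    )
    frames = [frame["data"] for frame in in_range]
    if not frames:
        return torch.empty(0)
    return torch.stack(frames)


# Usage with the reader created above (requires a real video file):
# clip = collect_frames(reader, 2.0, 5.0)
```

Using itertools.takewhile stops decoding as soon as the first frame past end_s is seen, so the rest of the video is never read.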