
Decoding / Encoding images and videos

The torchvision.io package provides functions for performing IO operations. They are currently specific to reading and writing images and videos.

Images

read_image(path[, mode, apply_exif_orientation])

Reads a JPEG, PNG or GIF image into a 3-dimensional RGB or grayscale Tensor.

decode_image(input[, mode, ...])

Detects whether an image is a JPEG, PNG or GIF and performs the appropriate operation to decode the image into a 3-dimensional RGB or grayscale Tensor.

encode_jpeg(input[, quality])

Takes a (list of) input tensor(s) in CHW layout and returns a (list of) buffer(s) with the contents of the corresponding JPEG file(s).

decode_jpeg(input[, mode, device, ...])

Decodes a JPEG image into a 3-dimensional RGB or grayscale Tensor.

write_jpeg(input, filename[, quality])

Takes an input tensor in CHW layout and saves it in a JPEG file.

decode_gif(input)

Decodes a GIF image into a 3- or 4-dimensional RGB Tensor.

encode_png(input[, compression_level])

Takes an input tensor in CHW layout and returns a buffer with the contents of its corresponding PNG file.

decode_png(input[, mode, apply_exif_orientation])

Decodes a PNG image into a 3-dimensional RGB or grayscale Tensor.

write_png(input, filename[, compression_level])

Takes an input tensor in CHW layout (or HW in the case of grayscale images) and saves it in a PNG file.

read_file(path)

Reads the contents of a file and returns them as a one-dimensional uint8 Tensor.

write_file(filename, data)

Writes the contents of a one-dimensional uint8 tensor to a file.

ImageReadMode(value)

Enum of read modes (for example ImageReadMode.GRAY and ImageReadMode.RGB) that control how an image's channels are converted while it is read.

Video

read_video(filename[, start_pts, end_pts, ...])

Reads a video from a file, returning both the video frames and the audio frames.

read_video_timestamps(filename[, pts_unit])

Lists the timestamps of the video frames.

write_video(filename, video_array, fps[, ...])

Writes a 4D tensor in [T, H, W, C] format to a video file.

Fine-grained video API

In addition to the read_video function, we provide a high-performance, lower-level API that offers more fine-grained control over decoding while fully supporting TorchScript.

Warning

The fine-grained video API is in Beta stage, and backward compatibility is not guaranteed.

VideoReader(src[, stream, num_threads])

Fine-grained video-reading API.

Example of inspecting a video:

import torchvision
video_path = "path to a test video"
# The constructor allocates memory and a threaded decoder
# instance per video. It takes the path to the video file and,
# optionally, the stream to decode (and num_threads).
reader = torchvision.io.VideoReader(video_path, "video")

# The information about the video can be retrieved using the
# `get_metadata()` method. It returns a dictionary for every stream, with
# duration and other relevant metadata (often frame rate)
reader_md = reader.get_metadata()

# The metadata is a dict of dicts with the following structure:
# {"stream_type": {"attribute": [attribute per stream]}}
#
# The following prints the list of frame rates for every video stream present.
print(reader_md["video"]["fps"])

# We explicitly select the stream we would like to operate on. The
# constructor selects a default video stream, but in practice we can
# switch to whichever stream we like.
reader.set_current_stream("video:0")
