Decoding / Encoding images and videos

The torchvision.io module provides utilities for decoding and encoding images and videos.

Image Decoding

Torchvision currently supports decoding JPEG, PNG, WEBP, GIF, AVIF, and HEIC images. JPEG decoding can also be done on CUDA GPUs.

The main entry point is the decode_image() function, which you can use as an alternative to PIL.Image.open(). It decodes images directly into image Tensors, which saves a conversion step and lets you run transforms and preprocessing natively on tensors.

from torchvision.io import decode_image

img = decode_image("path_to_image", mode="RGB")
img.dtype  # torch.uint8

# Or
raw_encoded_bytes = ...  # read encoded bytes from your file system
img = decode_image(raw_encoded_bytes, mode="RGB")

decode_image() will automatically detect the image format, and call the corresponding decoder (except for HEIC and AVIF images, see details in decode_avif() and decode_heic()). You can also use the lower-level format-specific decoders which can be more powerful, e.g. if you want to encode/decode JPEGs on CUDA.

`decode_image`(input[, mode, ...])	Decode an image into a uint8 tensor, from a path or from raw encoded bytes.
`decode_jpeg`(input[, mode, device, ...])	Decode JPEG image(s) into 3D RGB or grayscale Tensor(s), on CPU or CUDA.
`decode_png`(input[, mode, apply_exif_orientation])	Decodes a PNG image into a 3 dimensional RGB or grayscale Tensor.
`decode_webp`(input[, mode])	Decode a WEBP image into a 3 dimensional RGB[A] Tensor.
`decode_avif`(input[, mode])	Decode an AVIF image into a 3 dimensional RGB[A] Tensor.
`decode_heic`(input[, mode])	Decode an HEIC image into a 3 dimensional RGB[A] Tensor.
`decode_gif`(input)	Decode a GIF image into a 3 or 4 dimensional RGB Tensor.

ImageReadMode(value)

Allow automatic conversion to RGB, RGBA, etc while decoding.

Obsolete decoding function:

read_image(path[, mode, apply_exif_orientation])

[OBSOLETE] Use decode_image() instead.

Image Encoding

For encoding, JPEG (CPU and CUDA) and PNG are supported.

`encode_jpeg`(input[, quality])	Encode RGB tensor(s) into raw encoded jpeg bytes, on CPU or CUDA.
`write_jpeg`(input, filename[, quality])	Takes an input tensor in CHW layout and saves it in a JPEG file.
`encode_png`(input[, compression_level])	Takes an input tensor in CHW layout and returns a buffer with the contents of its corresponding PNG file.
`write_png`(input, filename[, compression_level])	Takes an input tensor in CHW layout (or HW in the case of grayscale images) and saves it in a PNG file.

IO operations

`read_file`(path)	Return the bytes contents of a file as a uint8 1D Tensor.
`write_file`(filename, data)	Write the content of a uint8 1D tensor to a file.

Video - DEPRECATED

Warning

DEPRECATED: All the video decoding and encoding capabilities of torchvision are deprecated as of version 0.22 and will be removed in version 0.24. We recommend that you migrate to TorchCodec, where we’ll consolidate the future decoding/encoding capabilities of PyTorch.

`read_video`(filename[, start_pts, end_pts, ...])	[DEPRECATED] Reads a video from a file, returning both the video frames and the audio frames
`read_video_timestamps`(filename[, pts_unit])	[DEPRECATED] List the video frames timestamps.
`write_video`(filename, video_array, fps[, ...])	[DEPRECATED] Writes a 4d tensor in [T, H, W, C] format in a video file.

Fine-grained video API

In addition to the read_video function, we provide a high-performance lower-level API for more fine-grained control. It fully supports TorchScript.

VideoReader(src[, stream, num_threads])

[DEPRECATED] Fine-grained video-reading API.
