
Transforming and augmenting images

Note

In 0.15, we released a new set of transforms available in the torchvision.transforms.v2 namespace, which add support for transforming not just images but also bounding boxes, masks, or videos. These transforms are fully backward compatible with the current ones, and you’ll see them documented below with a v2. prefix. To get started with those new transforms, you can check out Transforms v2: End-to-end object detection example. Note that these transforms are still BETA, and while we don’t expect major breaking changes in the future, some APIs may still change according to user feedback. Please submit any feedback you may have here, and you can also check out this issue to learn more about the APIs that we suspect might involve future changes.

Transforms are common image transformations available in the torchvision.transforms module. They can be chained together using Compose. Most transform classes have a function equivalent: functional transforms give fine-grained control over the transformations. This is useful if you have to build a more complex transformation pipeline (e.g. in the case of segmentation tasks).

Most transformations accept both PIL images and tensor images, although some transformations are PIL-only and some are tensor-only. The Conversion transforms may be used to convert to and from PIL images, or to convert dtypes and ranges.

The transformations that accept tensor images also accept batches of tensor images. A Tensor Image is a tensor with (C, H, W) shape, where C is the number of channels, and H and W are the image height and width. A batch of Tensor Images is a tensor of (B, C, H, W) shape, where B is the number of images in the batch.

The expected range of the values of a tensor image is implicitly defined by the tensor dtype. Tensor images with a float dtype are expected to have values in [0, 1]. Tensor images with an integer dtype are expected to have values in [0, MAX_DTYPE], where MAX_DTYPE is the largest value that can be represented in that dtype.
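To make the implicit ranges concrete, the sketch below derives MAX_DTYPE with `torch.iinfo` and checks a tensor against it. `expected_max` and `in_expected_range` are hypothetical helpers written for illustration, not torchvision functions; they treat 1.0 as the upper bound for float images.

```python
import torch


def expected_max(dtype: torch.dtype) -> float:
    """Hypothetical helper: upper bound of the implicit range for this dtype."""
    if dtype.is_floating_point:
        return 1.0
    # Integer images: MAX_DTYPE is the largest representable value
    return float(torch.iinfo(dtype).max)


def in_expected_range(img: torch.Tensor) -> bool:
    """Hypothetical helper: check that img respects the implicit range of its dtype."""
    return bool(img.min() >= 0 and img.max() <= expected_max(img.dtype))


print(expected_max(torch.uint8))    # 255.0
print(expected_max(torch.int16))    # 32767.0
print(expected_max(torch.float32))  # 1.0

print(in_expected_range(torch.tensor([0, 128, 255], dtype=torch.uint8)))  # True
print(in_expected_range(torch.tensor([0.0, 0.5, 2.0])))                 # False
```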

Randomized transformations will apply the same transformation to all the images of a given batch, but they will produce different transformations across calls. For reproducible transformations across calls, you may use functional transforms.

The following examples illustrate the use of the available transforms:

Warning

Since v0.8.0, all random transformations use the torch default random generator to sample random parameters. This is a backward-compatibility-breaking change, and users should set the random state as follows:

# Previous versions
# import random
# random.seed(12)

# Now
import torch
torch.manual_seed(17)

Please keep in mind that the same seed for the torch random generator and the Python random generator will not produce the same results.

Transforms scriptability

In order to script the transformations, please use torch.nn.Sequential instead of Compose.

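import torch
from torchvision import transforms

# Use a distinct name so the pipeline does not shadow the transforms module
transform_pipeline = torch.nn.Sequential(
    transforms.CenterCrop(10),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
)
scripted_transforms = torch.jit.script(transform_pipeline)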

Make sure to use only scriptable transformations, i.e. those that work with torch.Tensor and do not require lambda functions or PIL.Image.

Any custom transformation to be used with torch.jit.script should be derived from torch.nn.Module.

Geometry

Resize(size[, interpolation, max_size, ...])

Resize the input image to the given size.

v2.Resize(size[, interpolation, max_size, ...])

[BETA] Resize the input to the given size.

v2.ScaleJitter(target_size[, scale_range, ...])

[BETA] Perform Large Scale Jitter on the input according to "Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation".

v2.RandomShortestSize(min_size[, max_size, ...])

[BETA] Randomly resize the input.

v2.RandomResize(min_size, max_size[, ...])

[BETA] Randomly resize the input.

RandomCrop(size[, padding, pad_if_needed, ...])

Crop the given image at a random location.

v2.RandomCrop(size[, padding, ...])

[BETA] Crop the input at a random location.

RandomResizedCrop(size[, scale, ratio, ...])

Crop a random portion of image and resize it to a given size.

v2.RandomResizedCrop(size[, scale, ratio, ...])

[BETA] Crop a random portion of the input and resize it to a given size.

v2.RandomIoUCrop([min_scale, max_scale, ...])

[BETA] Random IoU crop transformation from "SSD: Single Shot MultiBox Detector".

CenterCrop(size)

Crops the given image at the center.

v2.CenterCrop(size)

[BETA] Crop the input at the center.

FiveCrop(size)

Crop the given image into four corners and the central crop.

v2.FiveCrop(size)

[BETA] Crop the image or video into four corners and the central crop.

TenCrop(size[, vertical_flip])

Crop the given image into four corners and the central crop plus the flipped version of these (horizontal flipping is used by default).

v2.TenCrop(size[, vertical_flip])

[BETA] Crop the image or video into four corners and the central crop plus the flipped version of these (horizontal flipping is used by default).

Pad(padding[, fill, padding_mode])

Pad the given image on all sides with the given "pad" value.

v2.Pad(padding[, fill, padding_mode])

[BETA] Pad the input on all sides with the given "pad" value.

v2.RandomZoomOut([fill, side_range, p])

[BETA] "Zoom out" transformation from "SSD: Single Shot MultiBox Detector".

RandomRotation(degrees[, interpolation, ...])

Rotate the image by angle.

v2.RandomRotation(degrees[, interpolation, ...])

[BETA] Rotate the input by angle.

RandomAffine(degrees[, translate, scale, ...])

Random affine transformation of the image keeping center invariant.

v2.RandomAffine(degrees[, translate, scale, ...])

[BETA] Random affine transformation of the input keeping center invariant.

RandomPerspective([distortion_scale, p, ...])

Performs a random perspective transformation of the given image with a given probability.

v2.RandomPerspective([distortion_scale, p, ...])

[BETA] Perform a random perspective transformation of the input with a given probability.

ElasticTransform([alpha, sigma, ...])

Transform a tensor image with elastic transformations.

v2.ElasticTransform([alpha, sigma, ...])

[BETA] Transform the input with elastic transformations.

RandomHorizontalFlip([p])

Horizontally flip the given image randomly with a given probability.

v2.RandomHorizontalFlip([p])

[BETA] Horizontally flip the input with a given probability.

RandomVerticalFlip([p])

Vertically flip the given image randomly with a given probability.

v2.RandomVerticalFlip([p])

[BETA] Vertically flip the input with a given probability.

Color

ColorJitter([brightness, contrast, ...])

Randomly change the brightness, contrast, saturation and hue of an image.

v2.ColorJitter([brightness, contrast, ...])

[BETA] Randomly change the brightness, contrast, saturation and hue of an image or video.

v2.RandomPhotometricDistort([brightness, ...])

[BETA] Randomly distorts the image or video as used in SSD: Single Shot MultiBox Detector.

Grayscale([num_output_channels])

Convert image to grayscale.

v2.Grayscale([num_output_channels])

[BETA] Convert images or videos to grayscale.

RandomGrayscale([p])

Randomly convert image to grayscale with a probability of p (default 0.1).

v2.RandomGrayscale([p])

[BETA] Randomly convert image or videos to grayscale with a probability of p (default 0.1).

GaussianBlur(kernel_size[, sigma])

Blurs image with randomly chosen Gaussian blur.

v2.GaussianBlur(kernel_size[, sigma])

[BETA] Blurs image with randomly chosen Gaussian blur.

RandomInvert([p])

Inverts the colors of the given image randomly with a given probability.

v2.RandomInvert([p])

[BETA] Inverts the colors of the given image or video with a given probability.

RandomPosterize(bits[, p])

Posterize the image randomly with a given probability by reducing the number of bits for each color channel.

v2.RandomPosterize(bits[, p])

[BETA] Posterize the image or video with a given probability by reducing the number of bits for each color channel.

RandomSolarize(threshold[, p])

Solarize the image randomly with a given probability by inverting all pixel values above a threshold.

v2.RandomSolarize(threshold[, p])

[BETA] Solarize the image or video with a given probability by inverting all pixel values above a threshold.

RandomAdjustSharpness(sharpness_factor[, p])

Adjust the sharpness of the image randomly with a given probability.

v2.RandomAdjustSharpness(sharpness_factor[, p])

[BETA] Adjust the sharpness of the image or video with a given probability.

RandomAutocontrast([p])

Autocontrast the pixels of the given image randomly with a given probability.

v2.RandomAutocontrast([p])

[BETA] Autocontrast the pixels of the given image or video with a given probability.

RandomEqualize([p])

Equalize the histogram of the given image randomly with a given probability.

v2.RandomEqualize([p])

[BETA] Equalize the histogram of the given image or video with a given probability.

Composition

Compose(transforms)

Composes several transforms together.

v2.Compose(transforms)

[BETA] Composes several transforms together.

RandomApply(transforms[, p])

Randomly apply a list of transformations with a given probability.

v2.RandomApply(transforms[, p])

[BETA] Randomly apply a list of transformations with a given probability.

RandomChoice(transforms[, p])

Apply a single transformation randomly picked from a list.

v2.RandomChoice(transforms[, p])

[BETA] Apply a single transformation randomly picked from a list.

RandomOrder(transforms)

Apply a list of transformations in a random order.

v2.RandomOrder(transforms)

[BETA] Apply a list of transformations in a random order.

Miscellaneous

LinearTransformation(transformation_matrix, ...)

Transform a tensor image with a square transformation matrix and a mean_vector computed offline.

v2.LinearTransformation(...)

[BETA] Transform a tensor image or video with a square transformation matrix and a mean_vector computed offline.

Normalize(mean, std[, inplace])

Normalize a tensor image with mean and standard deviation.

v2.Normalize(mean, std[, inplace])

[BETA] Normalize a tensor image or video with mean and standard deviation.

RandomErasing([p, scale, ratio, value, inplace])

Randomly selects a rectangle region in a torch.Tensor image and erases its pixels.

v2.RandomErasing([p, scale, ratio, value, ...])

[BETA] Randomly select a rectangle region in the input image or video and erase its pixels.

Lambda(lambd)

Apply a user-defined lambda as a transform.

v2.Lambda(lambd, *types)

[BETA] Apply a user-defined function as a transform.

v2.SanitizeBoundingBox([min_size, labels_getter])

[BETA] Remove degenerate/invalid bounding boxes and their corresponding labels and masks.

v2.ClampBoundingBox()

[BETA] Clamp bounding boxes to their corresponding image dimensions.

v2.UniformTemporalSubsample(num_samples)

[BETA] Uniformly subsample num_samples indices from the temporal dimension of the video.

Conversion

Note

Beware: some of the conversion transforms below scale the values while performing the conversion, while others do not scale at all. By scaling, we mean that, e.g., a uint8 -> float32 conversion maps the [0, 255] range into [0, 1] (and vice versa).

ToPILImage([mode])

Convert a tensor or an ndarray to PIL Image - this does not scale values.

v2.ToPILImage

alias of ToImagePIL

v2.ToImagePIL([mode])

[BETA] Convert a tensor or an ndarray to PIL Image - this does not scale values.

ToTensor()

Convert a PIL Image or ndarray to tensor and scale the values accordingly.

v2.ToTensor()

[BETA] Convert a PIL Image or ndarray to tensor and scale the values accordingly.

PILToTensor()

Convert a PIL Image to a tensor of the same type - this does not scale values.

v2.PILToTensor()

[BETA] Convert a PIL Image to a tensor of the same type - this does not scale values.

v2.ToImageTensor()

[BETA] Convert a tensor, ndarray, or PIL Image to Image; this does not scale values.

ConvertImageDtype(dtype)

Convert a tensor image to the given dtype and scale the values accordingly.

v2.ConvertDtype([dtype])

[BETA] Convert input image or video to the given dtype and scale the values accordingly.

v2.ConvertImageDtype

alias of ConvertDtype

v2.ToDtype(dtype)

[BETA] Converts the input to a specific dtype - this does not scale values.

v2.ConvertBoundingBoxFormat(format)

[BETA] Convert bounding box coordinates to the given format, e.g. from "CXCYWH" to "XYXY".

Auto-Augmentation

AutoAugment is a common data augmentation technique that can improve the accuracy of image classification models. Although the data augmentation policies are directly linked to the dataset they were trained on, empirical studies show that ImageNet policies provide significant improvements when applied to other datasets. In TorchVision we implemented 3 policies learned on the following datasets: ImageNet, CIFAR10 and SVHN. The new transform can be used standalone or mixed-and-matched with existing transforms:

AutoAugmentPolicy(value)

AutoAugment policies learned on different datasets.

AutoAugment([policy, interpolation, fill])

AutoAugment data augmentation method based on "AutoAugment: Learning Augmentation Strategies from Data".

v2.AutoAugment([policy, interpolation, fill])

[BETA] AutoAugment data augmentation method based on "AutoAugment: Learning Augmentation Strategies from Data".

RandAugment([num_ops, magnitude, ...])

RandAugment data augmentation method based on "RandAugment: Practical automated data augmentation with a reduced search space".

v2.RandAugment([num_ops, magnitude, ...])

[BETA] RandAugment data augmentation method based on "RandAugment: Practical automated data augmentation with a reduced search space".

TrivialAugmentWide([num_magnitude_bins, ...])

Dataset-independent data-augmentation with TrivialAugment Wide, as described in "TrivialAugment: Tuning-free Yet State-of-the-Art Data Augmentation".

v2.TrivialAugmentWide([num_magnitude_bins, ...])

[BETA] Dataset-independent data-augmentation with TrivialAugment Wide, as described in "TrivialAugment: Tuning-free Yet State-of-the-Art Data Augmentation".

AugMix([severity, mixture_width, ...])

AugMix data augmentation method based on "AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty".

v2.AugMix([severity, mixture_width, ...])

[BETA] AugMix data augmentation method based on "AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty".

Functional Transforms

Note

You’ll find below the documentation for the existing torchvision.transforms.functional namespace. The torchvision.transforms.v2.functional namespace exists as well and can be used! The same functionals are present, so you simply need to change your import to rely on the v2 namespace.

Functional transforms give you fine-grained control of the transformation pipeline. As opposed to the transformations above, functional transforms don’t contain a random number generator for their parameters. That means you have to specify/generate all parameters, but the functional transform will give you reproducible results across calls.

Example: you can apply a functional transform with the same parameters to multiple images like this:

import torchvision.transforms.functional as TF
import random

def my_segmentation_transforms(image, segmentation):
    if random.random() > 0.5:
        angle = random.randint(-30, 30)
        image = TF.rotate(image, angle)
        segmentation = TF.rotate(segmentation, angle)
    # more transforms ...
    return image, segmentation

Example: you can use a functional transform to build transform classes with custom behavior:

import torchvision.transforms.functional as TF
import random

class MyRotationTransform:
    """Rotate by one of the given angles."""

    def __init__(self, angles):
        self.angles = angles

    def __call__(self, x):
        angle = random.choice(self.angles)
        return TF.rotate(x, angle)

rotation_transform = MyRotationTransform(angles=[-30, -15, 0, 15, 30])

adjust_brightness(img, brightness_factor)

Adjust brightness of an image.

adjust_contrast(img, contrast_factor)

Adjust contrast of an image.

adjust_gamma(img, gamma[, gain])

Perform gamma correction on an image.

adjust_hue(img, hue_factor)

Adjust hue of an image.

adjust_saturation(img, saturation_factor)

Adjust color saturation of an image.

adjust_sharpness(img, sharpness_factor)

Adjust the sharpness of an image.

affine(img, angle, translate, scale, shear)

Apply affine transformation on the image keeping image center invariant.

autocontrast(img)

Maximize contrast of an image by remapping its pixels per channel so that the darkest becomes black and the lightest becomes white.

center_crop(img, output_size)

Crops the given image at the center.

convert_image_dtype(image[, dtype])

Convert a tensor image to the given dtype and scale the values accordingly. This function does not support PIL Image.

crop(img, top, left, height, width)

Crop the given image at specified location and output size.

equalize(img)

Equalize the histogram of an image by applying a non-linear mapping to the input in order to create a uniform distribution of grayscale values in the output.

erase(img, i, j, h, w, v[, inplace])

Erase the input Tensor Image with given value.

five_crop(img, size)

Crop the given image into four corners and the central crop.

gaussian_blur(img, kernel_size[, sigma])

Performs Gaussian blurring on the image with the given kernel.

get_dimensions(img)

Returns the dimensions of an image as [channels, height, width].

get_image_num_channels(img)

Returns the number of channels of an image.

get_image_size(img)

Returns the size of an image as [width, height].

hflip(img)

Horizontally flip the given image.

invert(img)

Invert the colors of an RGB/grayscale image.

normalize(tensor, mean, std[, inplace])

Normalize a float tensor image with mean and standard deviation.

pad(img, padding[, fill, padding_mode])

Pad the given image on all sides with the given "pad" value.

perspective(img, startpoints, endpoints[, ...])

Perform perspective transform of the given image.

pil_to_tensor(pic)

Convert a PIL Image to a tensor of the same type.

posterize(img, bits)

Posterize an image by reducing the number of bits for each color channel.

resize(img, size[, interpolation, max_size, ...])

Resize the input image to the given size.

resized_crop(img, top, left, height, width, size)

Crop the given image and resize it to desired size.

rgb_to_grayscale(img[, num_output_channels])

Convert an RGB image to its grayscale version.

rotate(img, angle[, interpolation, expand, ...])

Rotate the image by angle.

solarize(img, threshold)

Solarize an RGB/grayscale image by inverting all pixel values above a threshold.

ten_crop(img, size[, vertical_flip])

Generate ten cropped images from the given image.

to_grayscale(img[, num_output_channels])

Convert a PIL image of any mode (RGB, HSV, LAB, etc.) to a grayscale version of the image.

to_pil_image(pic[, mode])

Convert a tensor or an ndarray to PIL Image.

to_tensor(pic)

Convert a PIL Image or numpy.ndarray to tensor.

vflip(img)

Vertically flip the given image.
