torchvision.transforms

Transforms are common image transforms. They can be chained together using Compose.

class torchvision.transforms.Compose(transforms)

Composes several transforms together.

Parameters: transforms (list of Transform objects) – list of transforms to compose.

Example

>>> transforms.Compose([
>>>     transforms.CenterCrop(10),
>>>     transforms.ToTensor(),
>>> ])

Transforms on PIL Image

class torchvision.transforms.Resize(size, interpolation=2)

Resize the input PIL Image to the given size.

Parameters:
  • size (sequence or int) – Desired output size. If size is a sequence like (h, w), the output size will be matched to this. If size is an int, the smaller edge of the image will be matched to this number, i.e., if height > width, the image will be rescaled to (size * height / width, size).
  • interpolation (int, optional) – Desired interpolation. Default is PIL.Image.BILINEAR
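
Example (a minimal sketch of both size modes; assumes transforms has been imported from torchvision and img is a PIL Image loaded elsewhere)

>>> out = transforms.Resize(256)(img)         # smaller edge becomes 256, aspect ratio preserved
>>> out = transforms.Resize((224, 224))(img)  # exact (h, w) output
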
class torchvision.transforms.Scale(*args, **kwargs)

Note: This transform is deprecated in favor of Resize.

class torchvision.transforms.CenterCrop(size)

Crops the given PIL Image at the center.

Parameters: size (sequence or int) – Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made.
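
Example (illustrative; a common evaluation-style pairing with Resize, assuming img is a PIL Image)

>>> transform = transforms.Compose([
>>>     transforms.Resize(256),
>>>     transforms.CenterCrop(224),
>>> ])
>>> out = transform(img)  # 224x224 crop taken from the center of the resized image
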
class torchvision.transforms.RandomCrop(size, padding=0)

Crop the given PIL Image at a random location.

Parameters:
  • size (sequence or int) – Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made.
  • padding (int or sequence, optional) – Optional padding on each border of the image. Default is 0, i.e., no padding. If a sequence of length 4 is provided, it is used to pad the left, top, right and bottom borders respectively.
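
Example (a sketch of the common pad-then-crop augmentation for 32x32 images such as CIFAR-10; img is assumed to be a PIL Image)

>>> crop = transforms.RandomCrop(32, padding=4)  # pad 4 pixels on every border, then crop 32x32
>>> out = crop(img)
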
class torchvision.transforms.RandomHorizontalFlip

Horizontally flip the given PIL Image randomly with a probability of 0.5.

class torchvision.transforms.RandomVerticalFlip

Vertically flip the given PIL Image randomly with a probability of 0.5.
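
Example (illustrative; each flip is applied independently with probability 0.5, and img is assumed to be a PIL Image)

>>> augment = transforms.Compose([
>>>     transforms.RandomHorizontalFlip(),
>>>     transforms.RandomVerticalFlip(),
>>> ])
>>> out = augment(img)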

class torchvision.transforms.RandomResizedCrop(size, interpolation=2)

Crop the given PIL Image to random size and aspect ratio.

A crop of random size (0.08 to 1.0 of the original area) and random aspect ratio (3/4 to 4/3 of the original aspect ratio) is made. This crop is finally resized to the given size. This is popularly used to train the Inception networks.

Parameters:
  • size – expected output size of each edge
  • interpolation – Default: PIL.Image.BILINEAR
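
Example (a sketch of a typical training pipeline built around this transform; img is assumed to be a PIL Image)

>>> train_transform = transforms.Compose([
>>>     transforms.RandomResizedCrop(224),  # random crop, resized to 224x224
>>>     transforms.RandomHorizontalFlip(),
>>>     transforms.ToTensor(),
>>> ])
>>> out = train_transform(img)
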
class torchvision.transforms.RandomSizedCrop(*args, **kwargs)

Note: This transform is deprecated in favor of RandomResizedCrop.

class torchvision.transforms.FiveCrop(size)

Crop the given PIL Image into four corners and the central crop.

Note

This transform returns a tuple of images and there may be a mismatch in the number of inputs and targets your Dataset returns. See below for an example of how to deal with this.

Parameters: size (sequence or int) – Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop of size (size, size) is made.

Example

>>> transform = Compose([
>>>     FiveCrop(size),  # this produces a tuple of 5 PIL Images
>>>     Lambda(lambda crops: torch.stack([ToTensor()(crop) for crop in crops]))  # returns a 4D tensor
>>> ])
>>> # In your test loop you can do the following:
>>> input, target = batch  # input is a 5d tensor, target is 2d
>>> bs, ncrops, c, h, w = input.size()
>>> result = model(input.view(-1, c, h, w))  # fuse batch size and ncrops
>>> result_avg = result.view(bs, ncrops, -1).mean(1)  # average over crops
class torchvision.transforms.TenCrop(size, vertical_flip=False)

Crop the given PIL Image into four corners and the central crop, plus the flipped version of these (horizontal flipping is used by default).

Note

This transform returns a tuple of images and there may be a mismatch in the number of inputs and targets your Dataset returns. See below for an example of how to deal with this.

Parameters:
  • size (sequence or int) – Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made.
  • vertical_flip (bool) – Use vertical flipping instead of horizontal.

Example

>>> transform = Compose([
>>>     TenCrop(size),  # this produces a tuple of 10 PIL Images
>>>     Lambda(lambda crops: torch.stack([ToTensor()(crop) for crop in crops]))  # returns a 4D tensor
>>> ])
>>> # In your test loop you can do the following:
>>> input, target = batch  # input is a 5d tensor, target is 2d
>>> bs, ncrops, c, h, w = input.size()
>>> result = model(input.view(-1, c, h, w))  # fuse batch size and ncrops
>>> result_avg = result.view(bs, ncrops, -1).mean(1)  # average over crops
class torchvision.transforms.Pad(padding, fill=0)

Pad the given PIL Image on all sides with the given “pad” value.

Parameters:
  • padding (int or tuple) – Padding on each border. If a single int is provided, it is used to pad all borders. If a tuple of length 2 is provided, it is the padding on left/right and top/bottom respectively. If a tuple of length 4 is provided, it is the padding for the left, top, right and bottom borders respectively.
  • fill – Pixel fill value. Default is 0. If a tuple of length 3, it is used to fill R, G, B channels respectively.
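
Example (illustrative of the three padding forms; img is assumed to be an RGB PIL Image)

>>> out = transforms.Pad(2)(img)                                   # 2 pixels on all four borders
>>> out = transforms.Pad((10, 20))(img)                            # 10 px left/right, 20 px top/bottom
>>> out = transforms.Pad((1, 2, 3, 4), fill=(255, 255, 255))(img)  # left, top, right, bottom; white fill
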
class torchvision.transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0)

Randomly change the brightness, contrast and saturation of an image.

Parameters:
  • brightness (float) – How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness].
  • contrast (float) – How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast].
  • saturation (float) – How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation].
  • hue (float) – How much to jitter hue. hue_factor is chosen uniformly from [-hue, hue]. Should be >=0 and <= 0.5.
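
Example (illustrative values; the jitter factors are drawn uniformly from the ranges described above, and img is assumed to be a PIL Image)

>>> jitter = transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1)
>>> out = jitter(img)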

Transforms on torch.*Tensor

class torchvision.transforms.Normalize(mean, std)

Normalize a tensor image with mean and standard deviation. Given mean: (M1,...,Mn) and std: (S1,...,Sn) for n channels, this transform will normalize each channel of the input torch.*Tensor, i.e. input[channel] = (input[channel] - mean[channel]) / std[channel].

Parameters:
  • mean (sequence) – Sequence of means for each channel.
  • std (sequence) – Sequence of standard deviations for each channel.
__call__(tensor)
Parameters: tensor (Tensor) – Tensor image of size (C, H, W) to be normalized.
Returns: Normalized Tensor image.
Return type: Tensor
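
Example (a sketch using the per-channel statistics commonly quoted for ImageNet-trained models; the exact values are a convention chosen here for illustration)

>>> normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
>>>                                  std=[0.229, 0.224, 0.225])
>>> tensor = transforms.ToTensor()(img)  # (C, H, W) float tensor in [0.0, 1.0]
>>> out = normalize(tensor)              # per channel: (input - mean) / std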

Conversion Transforms

class torchvision.transforms.ToTensor

Convert a PIL Image or numpy.ndarray to tensor.

Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0].

__call__(pic)
Parameters: pic (PIL Image or numpy.ndarray) – Image to be converted to tensor.
Returns: Converted image.
Return type: Tensor
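
Example (a minimal check on a synthetic H x W x C uint8 array)

>>> import numpy as np
>>> arr = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)  # H x W x C in [0, 255]
>>> t = transforms.ToTensor()(arr)
>>> t.size()  # torch.Size([3, 64, 64]); values now lie in [0.0, 1.0]
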
class torchvision.transforms.ToPILImage(mode=None)

Convert a tensor or an ndarray to PIL Image.

Converts a torch.*Tensor of shape C x H x W or a numpy ndarray of shape H x W x C to a PIL Image while preserving the value range.

Parameters: mode (PIL.Image mode) – color space and pixel depth of input data (optional). If mode is None (default), some assumptions are made about the input data: 1. If the input has 3 channels, the mode is assumed to be RGB. 2. If the input has 4 channels, the mode is assumed to be RGBA. 3. If the input has 1 channel, the mode is determined by the data type (i.e., int, float, short).
__call__(pic)
Parameters: pic (Tensor or numpy.ndarray) – Image to be converted to PIL Image.
Returns: Image converted to PIL Image.
Return type: PIL Image
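
Example (illustrative; a random C x H x W float tensor is converted back to a PIL Image)

>>> import torch
>>> t = torch.rand(3, 64, 64)         # C x H x W, values in [0, 1]
>>> pil = transforms.ToPILImage()(t)  # 3 channels, so the mode is assumed to be RGB
>>> pil.size                          # (64, 64), reported as (width, height)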

Generic Transforms

class torchvision.transforms.Lambda(lambd)

Apply a user-defined lambda as a transform.

Parameters: lambd (function) – Lambda/function to be used for transform.
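
Example (a sketch of a custom PIL-level step; the grayscale conversion is just an illustrative choice)

>>> transform = transforms.Compose([
>>>     transforms.Lambda(lambda im: im.convert('L')),  # PIL grayscale conversion
>>>     transforms.ToTensor(),
>>> ])
>>> out = transform(img)  # img is assumed to be a PIL Image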