torchvision.datasets

All datasets are subclasses of torch.utils.data.Dataset, i.e., they implement the __getitem__ and __len__ methods. Hence, they can all be passed to a torch.utils.data.DataLoader, which can load multiple samples in parallel using torch.multiprocessing workers. For example:

import torch
import torchvision

imagenet_data = torchvision.datasets.ImageNet('path/to/imagenet_root/')
data_loader = torch.utils.data.DataLoader(imagenet_data,
                                          batch_size=4,
                                          shuffle=True,
                                          num_workers=4)  # number of worker processes

The following datasets are available:

Datasets

CelebA
CIFAR
Cityscapes
COCO
- Captions
- Detection
DatasetFolder
EMNIST
FakeData
Fashion-MNIST
Flickr
HMDB51
ImageFolder
ImageNet
Kinetics-400
KMNIST
LSUN
MNIST
Omniglot
PhotoTour
Places365
QMNIST
SBD
SBU
STL10
SVHN
UCF101
USPS
VOC

All datasets have a similar API. They all share two common arguments, transform and target_transform, which transform the input and the target, respectively.
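The convention can be sketched in plain Python. The ToyDataset class below is hypothetical and is not torchvision code; it only mirrors the documented behaviour: transform is applied to the input, target_transform to the target, and the pair is returned from __getitem__.

```python
from typing import Any, Callable, List, Optional, Tuple


class ToyDataset:
    """Hypothetical map-style dataset mirroring the transform/target_transform convention."""

    def __init__(self, samples: List[Any], targets: List[int],
                 transform: Optional[Callable] = None,
                 target_transform: Optional[Callable] = None) -> None:
        self.samples = samples
        self.targets = targets
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
        sample, target = self.samples[index], self.targets[index]
        # transform only ever sees the input ...
        if self.transform is not None:
            sample = self.transform(sample)
        # ... and target_transform only ever sees the target.
        if self.target_transform is not None:
            target = self.target_transform(target)
        return sample, target


ds = ToyDataset(samples=[[1, 2, 3], [4, 5, 6]], targets=[0, 1],
                transform=lambda xs: [x * 2 for x in xs],
                target_transform=lambda t: f"class_{t}")
print(len(ds), ds[1])  # 2 ([8, 10, 12], 'class_1')
```

An object exposing __getitem__ and __len__ like this follows the same map-style protocol that DataLoader relies on, although a real dataset would subclass torch.utils.data.Dataset.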

CelebA

class torchvision.datasets.CelebA(root: str, split: str = 'train', target_type: Union[List[str], str] = 'attr', transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None

Large-scale CelebFaces Attributes (CelebA) Dataset.

Parameters:

root (string) – Root directory where images are downloaded to.
split (string) – One of {‘train’, ‘valid’, ‘test’, ‘all’}. Selects which split of the dataset to load.
target_type (string or list, optional) – Type of target to use: attr, identity, bbox, or landmarks. Can also be a list to output a tuple with all specified target types. The targets represent:

attr (np.array shape=(40,) dtype=int): binary (0, 1) labels for attributes
identity (int): label for each person (data points with the same identity are the same person)
bbox (np.array shape=(4,) dtype=int): bounding box (x, y, width, height)
landmarks (np.array shape=(10,) dtype=int): landmark points (lefteye_x, lefteye_y, righteye_x, righteye_y, nose_x, nose_y, leftmouth_x, leftmouth_y, rightmouth_x, rightmouth_y)

Defaults to attr. If empty, None will be returned as target.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.ToTensor
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
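Because the landmarks target is a flat vector of 10 integers in the order listed above, it is often convenient to regroup it into named (x, y) points. A minimal sketch, assuming that order; landmarks_to_points and LANDMARK_NAMES are hypothetical names, not part of torchvision. The helper only indexes its input, so it should work on a list or a NumPy array alike.

```python
from typing import Dict, Sequence, Tuple

# Point names in the order given by the CelebA landmarks description above.
LANDMARK_NAMES = ("lefteye", "righteye", "nose", "leftmouth", "rightmouth")


def landmarks_to_points(landmarks: Sequence[int]) -> Dict[str, Tuple[int, int]]:
    """Regroup (lefteye_x, lefteye_y, ..., rightmouth_y) into named (x, y) pairs."""
    if len(landmarks) != 2 * len(LANDMARK_NAMES):
        raise ValueError(f"expected 10 values, got {len(landmarks)}")
    return {
        name: (int(landmarks[2 * i]), int(landmarks[2 * i + 1]))
        for i, name in enumerate(LANDMARK_NAMES)
    }


# Made-up coordinates, only to show the regrouping.
points = landmarks_to_points([69, 109, 106, 113, 77, 142, 73, 152, 108, 154])
print(points["nose"])  # (77, 142)
```

With target_type=['attr', 'landmarks'] the target is a tuple, so the helper would be applied to its second element.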

CIFAR

class torchvision.datasets.CIFAR10(root: str, train: bool = True, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None

CIFAR10 Dataset.

Parameters:

root (string) – Root directory of dataset where directory cifar-10-batches-py exists or will be saved to if download is set to True.
train (bool, optional) – If True, creates dataset from training set, otherwise creates from test set.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.

__getitem__(index: int) → Tuple[Any, Any]

Parameters:	index (int) – Index
Returns:	(image, target) where target is index of the target class.
Return type:	tuple
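Since target is a class index, it is often mapped back to a readable name. The list below is the standard CIFAR-10 class order; when a dataset instance is available, its classes attribute should provide the same names and is the safer source. cifar10_label_name is a hypothetical helper.

```python
# Standard CIFAR-10 class order (index 0..9).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def cifar10_label_name(target: int) -> str:
    """Map a CIFAR10 target index to its class name."""
    return CIFAR10_CLASSES[target]


print(cifar10_label_name(3))  # cat
```

Passing cifar10_label_name as target_transform would make the dataset return names instead of indices, although loss functions such as cross-entropy expect the integer index.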

class torchvision.datasets.CIFAR100(root: str, train: bool = True, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None

CIFAR100 Dataset.

This is a subclass of the CIFAR10 Dataset.

Cityscapes

Note

Requires the Cityscapes dataset to be downloaded.

class torchvision.datasets.Cityscapes(root: str, split: str = 'train', mode: str = 'fine', target_type: Union[List[str], str] = 'instance', transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, transforms: Union[Callable, NoneType] = None) → None

Cityscapes Dataset.

Parameters:

root (string) – Root directory of the dataset, where the directories leftImg8bit and gtFine or gtCoarse are located.
split (string, optional) – The image split to use: train, test or val if mode="fine", otherwise train, train_extra or val.
mode (string, optional) – The quality mode to use, fine or coarse.
target_type (string or list, optional) – Type of target to use, instance, semantic, polygon or color. Can also be a list to output a tuple with all specified target types.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
transforms (callable, optional) – A function/transform that takes the input sample and its target as arguments and returns a transformed version.

Examples

Get semantic segmentation target

from torchvision.datasets import Cityscapes

dataset = Cityscapes('./data/cityscapes', split='train', mode='fine',
                     target_type='semantic')

img, smnt = dataset[0]

Get multiple targets

dataset = Cityscapes('./data/cityscapes', split='train', mode='fine',
                     target_type=['instance', 'color', 'polygon'])

img, (inst, col, poly) = dataset[0]

Validate on the “coarse” set

dataset = Cityscapes('./data/cityscapes', split='val', mode='coarse',
                     target_type='semantic')

img, smnt = dataset[0]

__getitem__(index: int) → Tuple[Any, Any]

Parameters:	index (int) – Index
Returns:	(image, target) where target is a tuple of all target types if target_type is a list with more than one item. Otherwise, target is a JSON object if target_type="polygon", else the image segmentation.
Return type:	tuple

COCO

Note

These require the COCO API (pycocotools) to be installed.

Captions

class torchvision.datasets.CocoCaptions(root: str, annFile: str, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, transforms: Union[Callable, NoneType] = None) → None

MS Coco Captions Dataset.

Parameters:

root (string) – Root directory where images are downloaded to.
annFile (string) – Path to json annotation file.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.ToTensor
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
transforms (callable, optional) – A function/transform that takes the input sample and its target as arguments and returns a transformed version.

Example

import torchvision.datasets as dset
import torchvision.transforms as transforms
cap = dset.CocoCaptions(root='dir where images are',
                        annFile='json annotation file',
                        transform=transforms.ToTensor())

print('Number of samples: ', len(cap))
img, target = cap[3] # load 4th sample

print("Image Size: ", img.size())
print(target)

Output:

Number of samples:  82783
Image Size:  torch.Size([3, 427, 640])
['A plane emitting smoke stream flying over a mountain.',
'A plane darts across a bright blue sky behind a mountain covered in snow',
'A plane leaves a contrail above the snowy mountain top.',
'A mountain that has a plane flying overheard in the distance.',
'A mountain view with a plume of smoke in the background']

__getitem__(index: int) → Tuple[Any, Any]

Parameters:	index (int) – Index
Returns:	Tuple (image, target). target is a list of captions for the image.
Return type:	tuple

Detection

class torchvision.datasets.CocoDetection(root: str, annFile: str, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, transforms: Union[Callable, NoneType] = None) → None

MS Coco Detection Dataset.

Parameters:

root (string) – Root directory where images are downloaded to.
annFile (string) – Path to json annotation file.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.ToTensor
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
transforms (callable, optional) – A function/transform that takes the input sample and its target as arguments and returns a transformed version.

__getitem__(index: int) → Tuple[Any, Any]

Parameters:	index (int) – Index
Returns:	Tuple (image, target). target is the object returned by `coco.loadAnns`.
Return type:	tuple
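target is a list of annotation dicts, and COCO stores each box as [x, y, width, height]. Many detection pipelines want corner coordinates instead, which a target_transform can provide. A minimal sketch, assuming each dict carries the standard COCO bbox and category_id keys; coco_to_corners is a hypothetical name and the sample annotations are made up.

```python
from typing import Any, Dict, List, Tuple

Box = Tuple[float, float, float, float]


def coco_to_corners(annotations: List[Dict[str, Any]]) -> Tuple[List[Box], List[int]]:
    """Convert COCO [x, y, width, height] boxes to (x1, y1, x2, y2) plus category ids."""
    boxes: List[Box] = []
    labels: List[int] = []
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        boxes.append((x, y, x + w, y + h))  # top-left corner + size -> two corners
        labels.append(ann["category_id"])
    return boxes, labels


# Hand-written dicts shaped like COCO annotations (values are made up).
anns = [
    {"bbox": [10.0, 20.0, 30.0, 40.0], "category_id": 18},
    {"bbox": [0.0, 0.0, 5.0, 5.0], "category_id": 1},
]
print(coco_to_corners(anns))
```

It could then be passed as target_transform=coco_to_corners; images without annotations yield two empty lists.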

DatasetFolder

class torchvision.datasets.DatasetFolder(root: str, loader: Callable[[str], Any], extensions: Union[Tuple[str, ...], NoneType] = None, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, is_valid_file: Union[Callable[[str], bool], NoneType] = None) → None

A generic data loader where the samples are arranged in this way:

root/class_x/xxx.ext
root/class_x/xxy.ext
root/class_x/xxz.ext

root/class_y/123.ext
root/class_y/nsdf3.ext
root/class_y/asd932_.ext

Parameters:

root (string) – Root directory path.
loader (callable) – A function to load a sample given its path.
extensions (tuple[string]) – A tuple of allowed extensions. extensions and is_valid_file should not both be passed.
transform (callable, optional) – A function/transform that takes in a sample and returns a transformed version. E.g., transforms.RandomCrop for images.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
is_valid_file (callable, optional) – A function that takes the path of a file and checks whether it is a valid file (used to check for corrupt files). extensions and is_valid_file should not both be passed.

__getitem__(index: int) → Tuple[Any, Any]

Parameters:	index (int) – Index
Returns:	(sample, target) where target is class_index of the target class.
Return type:	tuple
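The directory layout above is turned into (sample path, class index) pairs. The standalone sketch below illustrates the idea, assuming that class indices are assigned in sorted order of the sub-directory names and that files are filtered by extension; list_classes and collect_samples are hypothetical names, and the code is not torchvision's implementation.

```python
import os
import tempfile
from typing import Dict, List, Tuple


def list_classes(root: str) -> Dict[str, int]:
    """Map each sub-directory name of root to an index, in sorted order."""
    names = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: idx for idx, name in enumerate(names)}


def collect_samples(root: str, extensions: Tuple[str, ...]) -> List[Tuple[str, int]]:
    """Collect (path, class_index) pairs for files with an allowed extension."""
    samples: List[Tuple[str, int]] = []
    for name, idx in list_classes(root).items():
        for dirpath, _, filenames in sorted(os.walk(os.path.join(root, name))):
            for fname in sorted(filenames):
                if fname.lower().endswith(extensions):
                    samples.append((os.path.join(dirpath, fname), idx))
    return samples


# Throwaway tree: root/class_x/a.ext, root/class_y/b.ext, root/class_y/notes.txt
with tempfile.TemporaryDirectory() as root:
    for rel in ("class_x/a.ext", "class_y/b.ext", "class_y/notes.txt"):
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    class_to_idx = list_classes(root)
    pairs = [(os.path.basename(p), i) for p, i in collect_samples(root, (".ext",))]

print(class_to_idx)  # {'class_x': 0, 'class_y': 1}
print(pairs)  # [('a.ext', 0), ('b.ext', 1)]
```

Because indices follow the sorted folder names in this scheme, adding a new class folder can shift the indices of existing classes.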

EMNIST

class torchvision.datasets.EMNIST(root: str, split: str, **kwargs) → None

EMNIST Dataset.

Parameters:

root (string) – Root directory of dataset where EMNIST/processed/training.pt and EMNIST/processed/test.pt exist.
split (string) – The dataset has 6 different splits: byclass, bymerge, balanced, letters, digits and mnist. This argument specifies which one to use.
train (bool, optional) – If True, creates dataset from training.pt, otherwise from test.pt.
download (bool, optional) – If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

FakeData

class torchvision.datasets.FakeData(size: int = 1000, image_size: Tuple[int, int, int] = (3, 224, 224), num_classes: int = 10, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, random_offset: int = 0) → None

A fake dataset that returns randomly generated images as PIL images.

Parameters:

size (int, optional) – Size of the dataset. Default: 1000 images
image_size (tuple, optional) – Size of the returned images. Default: (3, 224, 224)
num_classes (int, optional) – Number of classes in the dataset. Default: 10
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
random_offset (int) – Offsets the index-based random seed used to generate each image. Default: 0
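Because the seed is derived from the index, the same index always yields the same sample, and changing random_offset yields a different but equally reproducible dataset. The sketch below shows that idea with Python's random module, assuming the seed is simply index + random_offset; it is not torchvision's implementation (which relies on torch's random number generator), and fake_sample is a hypothetical name.

```python
import random
from typing import List, Tuple


def fake_sample(index: int, image_size: Tuple[int, int, int] = (3, 4, 4),
                num_classes: int = 10, random_offset: int = 0) -> Tuple[List[float], int]:
    """Return (flat pixel values, label) generated from the seed index + random_offset."""
    rng = random.Random(index + random_offset)  # private generator: global state is untouched
    channels, height, width = image_size
    pixels = [rng.gauss(0.0, 1.0) for _ in range(channels * height * width)]
    label = rng.randrange(num_classes)
    return pixels, label


first = fake_sample(7)
again = fake_sample(7)
shifted = fake_sample(7, random_offset=1000)
print(first == again)    # True: same index and offset, same sample
print(first == shifted)  # False: the offset changes the seed
```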

Fashion-MNIST

class torchvision.datasets.FashionMNIST(root: str, train: bool = True, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None

Fashion-MNIST Dataset.

Parameters:

root (string) – Root directory of dataset where FashionMNIST/processed/training.pt and FashionMNIST/processed/test.pt exist.
train (bool, optional) – If True, creates dataset from training.pt, otherwise from test.pt.
download (bool, optional) – If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Flickr

class torchvision.datasets.Flickr8k(root: str, ann_file: str, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None) → None

Flickr8k Entities Dataset.

Parameters:

root (string) – Root directory where images are downloaded to.
ann_file (string) – Path to annotation file.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.ToTensor
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

__getitem__(index: int) → Tuple[Any, Any]

Parameters:	index (int) – Index
Returns:	Tuple (image, target). target is a list of captions for the image.
Return type:	tuple

class torchvision.datasets.Flickr30k(root: str, ann_file: str, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None) → None

Flickr30k Entities Dataset.

Parameters:

root (string) – Root directory where images are downloaded to.
ann_file (string) – Path to annotation file.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.ToTensor
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

__getitem__(index: int) → Tuple[Any, Any]

Parameters:	index (int) – Index
Returns:	Tuple (image, target). target is a list of captions for the image.
Return type:	tuple

HMDB51

class torchvision.datasets.HMDB51(root, annotation_path, frames_per_clip, step_between_clips=1, frame_rate=None, fold=1, train=True, transform=None, _precomputed_metadata=None, num_workers=1, _video_width=0, _video_height=0, _video_min_dimension=0, _audio_samples=0)

HMDB51 dataset.

HMDB51 is an action recognition video dataset. This dataset considers every video as a collection of video clips of fixed size, specified by frames_per_clip, where the step in frames between each clip is given by step_between_clips.

To give an example, for 2 videos with 10 and 15 frames respectively, if frames_per_clip=5 and step_between_clips=5, the dataset size will be (2 + 3) = 5, where the first two elements will come from video 1, and the next three elements from video 2. Note that we drop clips which do not have exactly frames_per_clip elements, so not all frames in a video might be present.
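The dataset size in this example follows from counting how many windows of frames_per_clip frames fit into each video when the window advances by step_between_clips. A sketch of that arithmetic, assuming no frame-rate resampling (frame_rate=None); num_clips and dataset_size are hypothetical helpers, since the dataset itself delegates this work to VideoClips.

```python
from typing import List


def num_clips(num_frames: int, frames_per_clip: int, step_between_clips: int) -> int:
    """Count full-length clips in one video; a trailing partial clip is dropped."""
    if num_frames < frames_per_clip:
        return 0
    return (num_frames - frames_per_clip) // step_between_clips + 1


def dataset_size(video_lengths: List[int], frames_per_clip: int,
                 step_between_clips: int) -> int:
    """Total number of clips over all videos."""
    return sum(num_clips(n, frames_per_clip, step_between_clips) for n in video_lengths)


print(dataset_size([10, 15], frames_per_clip=5, step_between_clips=5))  # 5
```

The same count applies to Kinetics-400 and UCF101. For instance, a 12-frame video with the same settings still gives 2 clips: its last 2 frames do not fill a third clip and are dropped.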

Internally, it uses a VideoClips object to handle clip creation.

Parameters:

root (string) – Root directory of the HMDB51 Dataset.
annotation_path (str) – Path to the folder containing the split files.
frames_per_clip (int) – Number of frames in a clip.
step_between_clips (int) – Number of frames between each clip.
fold (int, optional) – Which fold to use. Should be between 1 and 3.
train (bool, optional) – If True, creates a dataset from the train split, otherwise from the test split.
transform (callable, optional) – A function/transform that takes in a TxHxWxC video and returns a transformed version.

Returns:

video (Tensor[T, H, W, C]): the T video frames
audio (Tensor[K, L]): the audio frames, where K is the number of channels and L is the number of points
label (int): class of the video clip

Return type:

tuple

ImageFolder

class torchvision.datasets.ImageFolder(root: str, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, loader: Callable[[str], Any] = <function default_loader>, is_valid_file: Union[Callable[[str], bool], NoneType] = None)

A generic data loader where the images are arranged in this way:

root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png

root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png

Parameters:

root (string) – Root directory path.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
loader (callable, optional) – A function to load an image given its path.
is_valid_file (callable, optional) – A function that takes the path of an image file and checks whether it is a valid file (used to check for corrupt files).

__getitem__(index: int) → Tuple[Any, Any]

Parameters:	index (int) – Index
Returns:	(sample, target) where target is class_index of the target class.
Return type:	tuple
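is_valid_file receives the path of each candidate file and returns a bool. A minimal sketch of such a filter, assuming a policy of accepting only non-empty PNG or JPEG files; the policy and the name is_valid_image_file are hypothetical. It checks only the extension and the file size, not whether the image actually decodes.

```python
import os
import tempfile

# Hypothetical policy: only non-empty PNG/JPEG files count as valid.
ALLOWED_EXTENSIONS = (".png", ".jpg", ".jpeg")


def is_valid_image_file(path: str) -> bool:
    """Return True for non-empty files whose extension is allowed (case-insensitive)."""
    return path.lower().endswith(ALLOWED_EXTENSIONS) and os.path.getsize(path) > 0


with tempfile.TemporaryDirectory() as tmp:
    files = {"empty.png": b"", "notes.txt": b"not an image", "cat.PNG": b"\x89PNG"}
    results = {}
    for name, payload in files.items():
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(payload)
        results[name] = is_valid_image_file(path)

print(results)  # {'empty.png': False, 'notes.txt': False, 'cat.PNG': True}
```

It would be passed as ImageFolder('path/to/root', is_valid_file=is_valid_image_file). A stricter check could try to open each file with PIL, at the cost of slower dataset construction.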

ImageNet

class torchvision.datasets.ImageNet(root: str, split: str = 'train', download: Union[str, NoneType] = None, **kwargs) → None

ImageNet 2012 Classification Dataset.

Parameters:

root (string) – Root directory of the ImageNet Dataset.
split (string, optional) – The dataset split, supports train or val.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
loader – A function to load an image given its path.

Note

This requires scipy to be installed.

Kinetics-400

class torchvision.datasets.Kinetics400(root, frames_per_clip, step_between_clips=1, frame_rate=None, extensions=('avi', ), transform=None, _precomputed_metadata=None, num_workers=1, _video_width=0, _video_height=0, _video_min_dimension=0, _audio_samples=0, _audio_channels=0)

Kinetics-400 dataset.

Kinetics-400 is an action recognition video dataset. This dataset considers every video as a collection of video clips of fixed size, specified by frames_per_clip, where the step in frames between each clip is given by step_between_clips.

To give an example, for 2 videos with 10 and 15 frames respectively, if frames_per_clip=5 and step_between_clips=5, the dataset size will be (2 + 3) = 5, where the first two elements will come from video 1, and the next three elements from video 2. Note that we drop clips which do not have exactly frames_per_clip elements, so not all frames in a video might be present.
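To make step_between_clips concrete, the sketch below lists the frame indices each clip would cover in one video, assuming clips start at frame 0 and then every step_between_clips frames, with no frame-rate resampling. clip_frame_indices is a hypothetical helper; a step smaller than frames_per_clip produces overlapping clips.

```python
from typing import List


def clip_frame_indices(num_frames: int, frames_per_clip: int,
                       step_between_clips: int) -> List[List[int]]:
    """List the frame indices covered by each full-length clip of one video."""
    starts = range(0, num_frames - frames_per_clip + 1, step_between_clips)
    return [list(range(s, s + frames_per_clip)) for s in starts]


# Non-overlapping clips: frames 10 and 11 of a 12-frame video are dropped.
print(clip_frame_indices(12, frames_per_clip=5, step_between_clips=5))
# Overlapping clips: step_between_clips is smaller than frames_per_clip.
print(clip_frame_indices(7, frames_per_clip=5, step_between_clips=1))
```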

Internally, it uses a VideoClips object to handle clip creation.

Parameters:

root (string) – Root directory of the Kinetics-400 Dataset.
frames_per_clip (int) – Number of frames in a clip.
step_between_clips (int) – Number of frames between each clip.
transform (callable, optional) – A function/transform that takes in a TxHxWxC video and returns a transformed version.

Returns:

video (Tensor[T, H, W, C]): the T video frames
audio (Tensor[K, L]): the audio frames, where K is the number of channels and L is the number of points
label (int): class of the video clip

Return type:

tuple

KMNIST

class torchvision.datasets.KMNIST(root: str, train: bool = True, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None

Kuzushiji-MNIST Dataset.

Parameters:

root (string) – Root directory of dataset where KMNIST/processed/training.pt and KMNIST/processed/test.pt exist.
train (bool, optional) – If True, creates dataset from training.pt, otherwise from test.pt.
download (bool, optional) – If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

LSUN

class torchvision.datasets.LSUN(root: str, classes: Union[str, List[str]] = 'train', transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None) → None

LSUN dataset.

Parameters:

root (string) – Root directory for the database files.
classes (string or list) – One of {‘train’, ‘val’, ‘test’} or a list of categories to load, e.g., [‘bedroom_train’, ‘church_outdoor_train’].
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

__getitem__(index: int) → Tuple[Any, Any]

Parameters:	index (int) – Index
Returns:	Tuple (image, target) where target is the index of the target category.
Return type:	tuple

MNIST

class torchvision.datasets.MNIST(root: str, train: bool = True, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None

MNIST Dataset.

Parameters:

root (string) – Root directory of dataset where MNIST/processed/training.pt and MNIST/processed/test.pt exist.
train (bool, optional) – If True, creates dataset from training.pt, otherwise from test.pt.
download (bool, optional) – If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Omniglot

class torchvision.datasets.Omniglot(root: str, background: bool = True, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None

Omniglot Dataset.

Parameters:

root (string) – Root directory of dataset where directory omniglot-py exists.
background (bool, optional) – If True, creates dataset from the “background” set, otherwise creates from the “evaluation” set. This terminology is defined by the authors.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If True, downloads the dataset zip files from the internet and puts them in the root directory. If the zip files are already downloaded, they are not downloaded again.

PhotoTour

class torchvision.datasets.PhotoTour(root: str, name: str, train: bool = True, transform: Union[Callable, NoneType] = None, download: bool = False) → None

Learning Local Image Descriptors Data Dataset.

Parameters:

root (string) – Root directory where images are.
name (string) – Name of the dataset to load.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version.
download (bool, optional) – If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.

__getitem__(index: int) → Union[torch.Tensor, Tuple[Any, Any, torch.Tensor]]

Parameters:	index (int) – Index
Returns:	(data1, data2, matches)
Return type:	tuple

Places365

class torchvision.datasets.Places365(root: str, split: str = 'train-standard', small: bool = False, download: bool = False, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, loader: Callable[[str], Any] = <function default_loader>) → None

Places365 classification dataset.

Parameters:

root (string) – Root directory of the Places365 dataset.
split (string, optional) – The dataset split. Can be one of train-standard (default), train-challenge, val.
small (bool, optional) – If True, uses the small images, i.e., resized to 256 x 256 pixels, instead of the high resolution ones.
download (bool, optional) – If True, downloads the dataset components and places them in root. Already downloaded archives are not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
loader – A function to load an image given its path.

Raises:

RuntimeError – If download is False and the meta files, i.e., the devkit, are not present or are corrupted.
RuntimeError – If download is True and the image archive is already extracted.

QMNIST

class torchvision.datasets.QMNIST(root: str, what: Union[str, NoneType] = None, compat: bool = True, train: bool = True, **kwargs) → None

QMNIST Dataset.

Parameters:

root (string) – Root directory of the dataset whose processed subdirectory contains torch binary files with the datasets.
what (string, optional) – Can be ‘train’, ‘test’, ‘test10k’, ‘test50k’, or ‘nist’ for, respectively, the MNIST-compatible training set, the 60k QMNIST testing set, the 10k QMNIST examples that match the MNIST testing set, the 50k remaining QMNIST testing examples, or all the NIST digits. The default is to select ‘train’ or ‘test’ according to the compatibility argument train.
compat (bool, optional) – Whether the target for each example is the class number (for compatibility with the MNIST dataloader) or a torch vector containing the full QMNIST information. Default: True.
download (bool, optional) – If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
train (bool, optional, compatibility) – When the argument what is not specified, this boolean decides whether to load the training set or the testing set. Default: True.
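The interplay of what and train described above can be sketched as follows: an explicit what takes precedence, and only when it is omitted does train choose between 'train' and 'test'. resolve_what is a hypothetical helper, not a torchvision function.

```python
from typing import Optional

# Subsets listed in the QMNIST parameter description above.
QMNIST_SUBSETS = ("train", "test", "test10k", "test50k", "nist")


def resolve_what(what: Optional[str] = None, train: bool = True) -> str:
    """Pick the QMNIST subset: an explicit `what` wins, otherwise `train` decides."""
    if what is None:
        what = "train" if train else "test"
    if what not in QMNIST_SUBSETS:
        raise ValueError(f"unknown subset {what!r}; expected one of {QMNIST_SUBSETS}")
    return what


print(resolve_what(train=False))            # test
print(resolve_what("test10k", train=True))  # test10k
```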

SBD

class torchvision.datasets.SBDataset(root: str, image_set: str = 'train', mode: str = 'boundaries', download: bool = False, transforms: Union[Callable, NoneType] = None) → None

Semantic Boundaries Dataset

The SBD currently contains annotations from 11355 images taken from the PASCAL VOC 2011 dataset.

Note

Please note that the train and val splits included with this dataset are different from the splits in the PASCAL VOC dataset. In particular, some “train” images might be part of VOC2012 val. If you are interested in testing on VOC 2012 val, then train with image_set=’train_noval’, which excludes all val images.

Warning

This class needs scipy to load target files from .mat format.

Parameters:

root (string) – Root directory of the Semantic Boundaries Dataset
image_set (string, optional) – Select the image_set to use, train, val or train_noval. Image set train_noval excludes VOC 2012 val images.
mode (string, optional) – Select target type. Possible values ‘boundaries’ or ‘segmentation’. In case of ‘boundaries’, the target is an array of shape [num_classes, H, W], where num_classes=20.
download (bool, optional) – If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transforms (callable, optional) – A function/transform that takes the input sample and its target as arguments and returns a transformed version. The input sample is a PIL image and the target is a numpy array if mode=’boundaries’ or a PIL image if mode=’segmentation’.

SBU

class torchvision.datasets.SBU(root: str, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = True) → None

SBU Captioned Photo Dataset.

Parameters:

root (string) – Root directory of dataset where tarball SBUCaptionedPhotoDataset.tar.gz exists.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.

__getitem__(index: int) → Tuple[Any, Any]

Parameters:	index (int) – Index
Returns:	(image, target) where target is a caption for the photo.
Return type:	tuple

STL10

class torchvision.datasets.STL10(root: str, split: str = 'train', folds: Union[int, NoneType] = None, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None

STL10 Dataset.

Parameters:

root (string) – Root directory of dataset where directory stl10_binary exists.
split (string) – One of {‘train’, ‘test’, ‘unlabeled’, ‘train+unlabeled’}. Selects which split of the dataset to load.
folds (int, optional) – One of {0-9} or None. For training, loads one of the 10 pre-defined folds of 1k samples for the standard evaluation procedure. If no value is passed, loads the 5k samples.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.

__getitem__(index: int) → Tuple[Any, Any]

Parameters:	index (int) – Index
Returns:	(image, target) where target is index of the target class.
Return type:	tuple

SVHN

class torchvision.datasets.SVHN(root: str, split: str = 'train', transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None

SVHN Dataset. Note: The SVHN dataset assigns the label 10 to the digit 0. However, in this Dataset, we assign the label 0 to the digit 0 to be compatible with PyTorch loss functions, which expect the class labels to be in the range [0, C-1].

Warning

This class needs scipy to load data from .mat format.

Parameters:

root (string) – Root directory of dataset where directory SVHN exists.
split (string) – One of {‘train’, ‘test’, ‘extra’}. Selects which split of the dataset to load; ‘extra’ is the extra training set.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.

__getitem__(index: int) → Tuple[Any, Any]

Parameters:	index (int) – Index
Returns:	(image, target) where target is index of the target class.
Return type:	tuple
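The SVHN relabeling described above replaces every 10 with 0 and leaves 1 to 9 unchanged. A sketch on a plain list; remap_svhn_labels is a hypothetical name, and the dataset performs the equivalent step internally when loading the .mat file.

```python
from typing import List


def remap_svhn_labels(raw_labels: List[int]) -> List[int]:
    """Map SVHN's label 10 (digit 0) to 0 so labels fall in [0, 9]."""
    return [0 if label == 10 else label for label in raw_labels]


print(remap_svhn_labels([10, 1, 9, 10, 5]))  # [0, 1, 9, 0, 5]
```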

UCF101

class torchvision.datasets.UCF101(root, annotation_path, frames_per_clip, step_between_clips=1, frame_rate=None, fold=1, train=True, transform=None, _precomputed_metadata=None, num_workers=1, _video_width=0, _video_height=0, _video_min_dimension=0, _audio_samples=0)

UCF101 dataset.

UCF101 is an action recognition video dataset. This dataset considers every video as a collection of video clips of fixed size, specified by frames_per_clip, where the step in frames between each clip is given by step_between_clips.

To give an example, for 2 videos with 10 and 15 frames respectively, if frames_per_clip=5 and step_between_clips=5, the dataset size will be (2 + 3) = 5, where the first two elements will come from video 1, and the next three elements from video 2. Note that we drop clips which do not have exactly frames_per_clip elements, so not all frames in a video might be present.
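The clip arithmetic in the example above can be reproduced with a short, dependency-free sketch. It assumes frame_rate=None (no temporal resampling) and that clips start at frames 0, step_between_clips, 2 * step_between_clips, and so on. The helper names are hypothetical and are not part of the VideoClips API.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def clips_per_video(num_frames: int, frames_per_clip: int, step_between_clips: int) -> int:
    """Count the complete clips in one video; a trailing partial clip is dropped."""
    if step_between_clips < 1:
        raise ValueError("step_between_clips must be at least 1")
    if num_frames < frames_per_clip:
        return 0
    return (num_frames - frames_per_clip) // step_between_clips + 1


def clip_location(index: int, clip_counts: List[int]) -> Tuple[int, int]:
    """Map a global dataset index to (video_idx, clip_idx) within that video."""
    cumulative = list(accumulate(clip_counts))
    total = cumulative[-1] if cumulative else 0
    if not 0 <= index < total:
        raise IndexError(f"index {index} out of range for {total} clips")
    # The first video whose cumulative clip count exceeds index contains the clip.
    video_idx = bisect_right(cumulative, index)
    start = cumulative[video_idx - 1] if video_idx > 0 else 0
    return video_idx, index - start


def clip_frames(clip_idx: int, frames_per_clip: int, step_between_clips: int) -> range:
    """Frame indices covered by one clip, assuming no frame-rate resampling."""
    start = clip_idx * step_between_clips
    return range(start, start + frames_per_clip)


# The example from the text: two videos with 10 and 15 frames.
counts = [clips_per_video(n, frames_per_clip=5, step_between_clips=5) for n in (10, 15)]
print(counts, sum(counts))  # [2, 3] 5
print(clip_location(3, counts))  # (1, 1): second clip of the second video (0-based)
```

Using bisect over cumulative counts keeps index lookup logarithmic in the number of videos. It also skips videos that are too short to yield any clip.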

Internally, it uses a VideoClips object to handle clip creation.

Parameters:

root (string) – Root directory of the UCF101 Dataset.
annotation_path (str) – path to the folder containing the split files
frames_per_clip (int) – number of frames in a clip.
step_between_clips (int, optional) – number of frames between each clip.
fold (int, optional) – which fold to use. Should be between 1 and 3.
train (bool, optional) – if True, creates a dataset from the train split, otherwise from the test split.
transform (callable, optional) – A function/transform that takes in a TxHxWxC video and returns a transformed version.
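The fold and train parameters together pick one split file from annotation_path. Below is a minimal sketch of that selection. It assumes the file naming of the official UCF101 train/test split archive (trainlist01.txt through trainlist03.txt and testlist01.txt through testlist03.txt). The helper ucf101_split_file is hypothetical.

```python
import os


def ucf101_split_file(annotation_path: str, fold: int = 1, train: bool = True) -> str:
    """Return the split file path for a fold; assumes trainlistNN.txt / testlistNN.txt names."""
    if not 1 <= fold <= 3:
        raise ValueError(f"fold should be between 1 and 3, got {fold}")
    name = "trainlist" if train else "testlist"
    # Folds are zero-padded to two digits, e.g. fold=2 -> "02".
    return os.path.join(annotation_path, f"{name}{fold:02d}.txt")


print(ucf101_split_file("ucfTrainTestlist", fold=2, train=False))
```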

Returns:

video (Tensor[T, H, W, C]): the T video frames
audio (Tensor[K, L]): the audio frames, where K is the number of channels and L is the number of points
label (int): class of the video clip

Return type:

tuple

USPS

class torchvision.datasets.USPS(root: str, train: bool = True, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None[source]

USPS Dataset. The data format is [label [index:value ]*256 \n] * num_lines, where label lies in [1, 10] and each pixel value lies in [-1, 1]. This class maps the labels to [0, 9] and the pixel values to [0, 255].
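Here is a minimal sketch of the conversion described above for a single line of the file. It assumes each line holds a label followed by whitespace-separated index:value pairs with 1-based pixel indices. parse_usps_line is a hypothetical helper, not part of torchvision. Scaled values are truncated toward zero, as a cast to an 8-bit integer type would do.

```python
from typing import List, Tuple


def parse_usps_line(line: str, num_pixels: int = 256) -> Tuple[int, List[int]]:
    """Parse one 'label index:value ...' line into (label in [0, 9], pixels in [0, 255])."""
    fields = line.split()
    label = int(float(fields[0])) - 1  # [1, 10] -> [0, 9]
    raw = [0.0] * num_pixels  # values absent from the line are treated as 0.0
    for field in fields[1:]:
        index, value = field.split(":")
        raw[int(index) - 1] = float(value)  # indices are assumed to be 1-based
    # [-1, 1] -> [0, 255], truncated toward zero like an 8-bit integer cast.
    pixels = [int((v + 1.0) / 2.0 * 255.0) for v in raw]
    return label, pixels


print(parse_usps_line("10 1:-1 2:1 3:0", num_pixels=3))  # (9, [0, 255, 127])
```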

Parameters:

root (string) – Root directory of the dataset in which to store the USPS data files.
train (bool, optional) – If True, creates dataset from usps.bz2, otherwise from usps.t.bz2.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.

__getitem__(index: int) → Tuple[Any, Any][source]

Parameters:	index (int) – Index
Returns:	(image, target) where target is index of the target class.
Return type:	tuple

VOC

class torchvision.datasets.VOCSegmentation(root: str, year: str = '2012', image_set: str = 'train', download: bool = False, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, transforms: Union[Callable, NoneType] = None)[source]

Pascal VOC Segmentation Dataset.

Parameters:

root (string) – Root directory of the VOC Dataset.
year (string, optional) – The dataset year; years 2007 to 2012 are supported.
image_set (string, optional) – Select the image_set to use: train, trainval, or val.
download (bool, optional) – If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
transforms (callable, optional) – A function/transform that takes an input sample and its target as entry and returns a transformed version.
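For segmentation, random geometric augmentations must be applied identically to the image and its mask. That is why a joint transforms callable, which receives both, is useful here. Separate transform and target_transform calls would make independent random choices. Below is a minimal sketch of such a joint callable. JointRandomHorizontalFlip is hypothetical, and nested lists stand in for PIL images so the example needs no imaging library.

```python
import random
from typing import List, Optional, Tuple

Grid = List[List[int]]  # rows of pixel values; stands in for a PIL image or mask


class JointRandomHorizontalFlip:
    """Flip image and segmentation mask together with probability p."""

    def __init__(self, p: float = 0.5, rng: Optional[random.Random] = None) -> None:
        self.p = p
        self.rng = rng if rng is not None else random.Random()

    def __call__(self, image: Grid, target: Grid) -> Tuple[Grid, Grid]:
        # One random draw decides for both, so the mask stays aligned with the image.
        if self.rng.random() < self.p:
            image = [row[::-1] for row in image]
            target = [row[::-1] for row in target]
        return image, target


image = [[1, 2], [3, 4]]
mask = [[0, 1], [1, 0]]
print(JointRandomHorizontalFlip(p=1.0)(image, mask))  # ([[2, 1], [4, 3]], [[1, 0], [0, 1]])
```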

__getitem__(index: int) → Tuple[Any, Any][source]

Parameters:	index (int) – Index
Returns:	(image, target) where target is the image segmentation.
Return type:	tuple

class torchvision.datasets.VOCDetection(root: str, year: str = '2012', image_set: str = 'train', download: bool = False, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, transforms: Union[Callable, NoneType] = None)[source]

Pascal VOC Detection Dataset.

Parameters:

root (string) – Root directory of the VOC Dataset.
year (string, optional) – The dataset year; years 2007 to 2012 are supported.
image_set (string, optional) – Select the image_set to use: train, trainval, or val.
download (bool, optional) – If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
transforms (callable, optional) – A function/transform that takes an input sample and its target as entry and returns a transformed version.

__getitem__(index: int) → Tuple[Any, Any][source]

Parameters:	index (int) – Index
Returns:	(image, target) where target is a dictionary of the XML tree.
Return type:	tuple
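To show what "a dictionary of the XML tree" can look like, here is a sketch that converts a VOC-style annotation into nested dicts using only the standard library. It is an illustration of the idea, not torchvision's code. The function name and the tiny annotation string are made up for the example, and all leaf values stay strings. As a design choice, the sketch always keeps object as a list, even for a single object, so downstream code can iterate over it uniformly.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Any, Dict, List


def xml_to_dict(node: ET.Element) -> Dict[str, Any]:
    """Recursively convert an XML element into {tag: value} nested dicts."""
    children = list(node)
    if not children:
        return {node.tag: (node.text or "").strip()}
    grouped: Dict[str, List[Any]] = defaultdict(list)
    for child in children:
        for tag, value in xml_to_dict(child).items():
            grouped[tag].append(value)
    # Repeated tags become lists; tags that occur once are unwrapped.
    result = {tag: values[0] if len(values) == 1 else values for tag, values in grouped.items()}
    if "object" in grouped:
        result["object"] = grouped["object"]  # always a list of objects
    return {node.tag: result}


xml = (
    "<annotation><filename>000001.jpg</filename>"
    "<object><name>dog</name><bndbox><xmin>48</xmin><ymin>240</ymin>"
    "<xmax>195</xmax><ymax>371</ymax></bndbox></object></annotation>"
)
target = xml_to_dict(ET.fromstring(xml))
print(target["annotation"]["object"][0]["bndbox"])
```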
