torchvision.datasets
All datasets are subclasses of torch.utils.data.Dataset, i.e., they have __getitem__ and __len__ methods implemented. Hence, they can all be passed to a torch.utils.data.DataLoader, which can load multiple samples in parallel using torch.multiprocessing workers.
For example:
import torch
import torchvision

imagenet_data = torchvision.datasets.ImageNet('path/to/imagenet_root/')
data_loader = torch.utils.data.DataLoader(imagenet_data,
                                          batch_size=4,
                                          shuffle=True,
                                          num_workers=4)  # number of worker processes, e.g. read from a command-line argument
All the datasets have a similar API. Most of them accept two common arguments, transform and target_transform, to transform the input and the target, respectively.
You can also create your own datasets using the provided base classes.
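As a minimal sketch of the protocol described above, the hypothetical SquaresDataset below subclasses torch.utils.data.Dataset, implements __getitem__ and __len__, and applies optional transform and target_transform callables in the same way the built-in datasets do. The class name and its toy data are illustrative assumptions, not part of torchvision.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the tensor [i, i**2], its target is i % 2."""

    def __init__(self, size=10, transform=None, target_transform=None):
        self.size = size
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        # Number of samples; the DataLoader uses this to build its index sampler.
        return self.size

    def __getitem__(self, index):
        if not 0 <= index < self.size:
            raise IndexError(index)
        sample = torch.tensor([float(index), float(index) ** 2])
        target = index % 2
        # Same convention as the torchvision datasets: input and target are transformed separately.
        if self.transform is not None:
            sample = self.transform(sample)
        if self.target_transform is not None:
            target = self.target_transform(target)
        return sample, target


dataset = SquaresDataset(size=10, transform=lambda x: x / 100.0)
# num_workers=0 loads in the main process; raise it to load with torch.multiprocessing workers.
loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0)
batches = list(loader)
```

With 10 samples and batch_size=4, the loader yields batches of 4, 4 and 2 samples; the default collate function stacks the per-sample tensors and turns the integer targets into a tensor.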
Caltech
class torchvision.datasets.Caltech101(root: str, target_type: Union[List[str], str] = 'category', transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)
Caltech 101 Dataset.
Warning
This class needs scipy to load target files from .mat format.
- Parameters
root (string) – Root directory of dataset where directory caltech101 exists or will be saved to if download is set to True.
target_type (string or list, optional) – Type of target to use, category or annotation. Can also be a list to output a tuple with all specified target types. category represents the target class, and annotation is a list of points from a hand-generated outline. Defaults to category.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
class torchvision.datasets.Caltech256(root: str, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)
Caltech 256 Dataset.
- Parameters
root (string) – Root directory of dataset where directory caltech256 exists or will be saved to if download is set to True.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
CelebA
class torchvision.datasets.CelebA(root: str, split: str = 'train', target_type: Union[List[str], str] = 'attr', transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)
Large-scale CelebFaces Attributes (CelebA) Dataset.
- Parameters
root (string) – Root directory where images are downloaded to.
split (string) – One of {‘train’, ‘valid’, ‘test’, ‘all’}. Selects the corresponding subset of the dataset.
target_type (string or list, optional) – Type of target to use, attr, identity, bbox, or landmarks. Can also be a list to output a tuple with all specified target types. The targets represent:
attr (np.array shape=(40,) dtype=int): binary (0, 1) labels for attributes
identity (int): label for each person (data points with the same identity are the same person)
bbox (np.array shape=(4,) dtype=int): bounding box (x, y, width, height)
landmarks (np.array shape=(10,) dtype=int): landmark points (lefteye_x, lefteye_y, righteye_x, righteye_y, nose_x, nose_y, leftmouth_x, leftmouth_y, rightmouth_x, rightmouth_y)
Defaults to attr. If empty, None will be returned as target.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.ToTensor
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
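The landmarks target is a flat vector of 10 coordinates in the order listed above. As a sketch, a target_transform such as the hypothetical landmarks_to_points below could regroup it into five named (x, y) points; the function name, the dictionary keys and the sample coordinates are illustrative assumptions. It only relies on the target being iterable over its 10 values, so it accepts lists, NumPy arrays and 1-D tensors alike.

```python
LANDMARK_NAMES = ("lefteye", "righteye", "nose", "leftmouth", "rightmouth")


def landmarks_to_points(landmarks):
    """Regroup a flat CelebA landmark vector into {name: (x, y)} pairs (illustrative helper)."""
    values = [int(v) for v in landmarks]  # works for lists, NumPy arrays and 1-D tensors
    if len(values) != 2 * len(LANDMARK_NAMES):
        raise ValueError(f"expected 10 values, got {len(values)}")
    # Even positions hold x coordinates, odd positions hold y coordinates.
    return {name: (values[2 * i], values[2 * i + 1]) for i, name in enumerate(LANDMARK_NAMES)}


# Made-up coordinates, only to show the regrouping.
example = [69, 109, 106, 113, 77, 142, 73, 152, 108, 154]
points = landmarks_to_points(example)
```

It would be passed as target_transform=landmarks_to_points together with target_type='landmarks'.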
CIFAR
class torchvision.datasets.CIFAR10(root: str, train: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)
CIFAR10 Dataset.
- Parameters
root (string) – Root directory of dataset where directory cifar-10-batches-py exists or will be saved to if download is set to True.
train (bool, optional) – If True, creates dataset from training set, otherwise creates from test set.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
Cityscapes
Note
Requires the Cityscapes dataset to be downloaded.
class torchvision.datasets.Cityscapes(root: str, split: str = 'train', mode: str = 'fine', target_type: Union[List[str], str] = 'instance', transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, transforms: Optional[Callable] = None)
Cityscapes Dataset.
- Parameters
root (string) – Root directory of dataset where directories leftImg8bit and gtFine or gtCoarse are located.
split (string, optional) – The image split to use, train, test or val if mode="fine", otherwise train, train_extra or val.
mode (string, optional) – The quality mode to use, fine or coarse.
target_type (string or list, optional) – Type of target to use, instance, semantic, polygon or color. Can also be a list to output a tuple with all specified target types.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
transforms (callable, optional) – A function/transform that takes input sample and its target as entry and returns a transformed version.
Examples
Get semantic segmentation target
from torchvision.datasets import Cityscapes

dataset = Cityscapes('./data/cityscapes', split='train', mode='fine', target_type='semantic')
img, smnt = dataset[0]
Get multiple targets
dataset = Cityscapes('./data/cityscapes', split='train', mode='fine', target_type=['instance', 'color', 'polygon'])
img, (inst, col, poly) = dataset[0]
Validate on the “coarse” set
dataset = Cityscapes('./data/cityscapes', split='val', mode='coarse', target_type='semantic')
img, smnt = dataset[0]
COCO
Note
These require the COCO API to be installed.
Detection
class torchvision.datasets.CocoDetection(root: str, annFile: str, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, transforms: Optional[Callable] = None)
MS Coco Detection Dataset.
- Parameters
root (string) – Root directory where images are downloaded to.
annFile (string) – Path to json annotation file.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.ToTensor
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
transforms (callable, optional) – A function/transform that takes input sample and its target as entry and returns a transformed version.
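Each CocoDetection target is the list of annotation dicts that the COCO API returns for the image, and in the COCO annotation format each bbox is stored as [x, y, width, height]. A sketch of a target_transform that converts these boxes to corner format [x1, y1, x2, y2], which many detection models expect, is shown below; the helper coco_to_xyxy and the sample annotations are hypothetical.

```python
def coco_to_xyxy(annotations):
    """Convert COCO-style annotations to (boxes, labels) with boxes as [x1, y1, x2, y2].

    Illustrative target_transform; assumes each annotation dict carries the standard
    COCO keys 'bbox' ([x, y, width, height]) and 'category_id'.
    """
    boxes, labels = [], []
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        boxes.append([x, y, x + w, y + h])
        labels.append(ann["category_id"])
    return boxes, labels


# Hand-written annotation list mimicking the COCO format (not real dataset values).
sample_target = [
    {"bbox": [10.0, 20.0, 30.0, 40.0], "category_id": 1},
    {"bbox": [0.0, 0.0, 5.5, 2.5], "category_id": 18},
]
boxes, labels = coco_to_xyxy(sample_target)
```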
EMNIST
class torchvision.datasets.EMNIST(root: str, split: str, **kwargs: Any)
EMNIST Dataset.
- Parameters
root (string) – Root directory of dataset where EMNIST/processed/training.pt and EMNIST/processed/test.pt exist.
split (string) – The dataset has 6 different splits: byclass, bymerge, balanced, letters, digits and mnist. This argument specifies which one to use.
train (bool, optional) – If True, creates dataset from training.pt, otherwise from test.pt.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
FakeData
class torchvision.datasets.FakeData(size: int = 1000, image_size: Tuple[int, int, int] = (3, 224, 224), num_classes: int = 10, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, random_offset: int = 0)
A fake dataset that returns randomly generated images as PIL images.
- Parameters
size (int, optional) – Size of the dataset. Default: 1000 images
image_size (tuple, optional) – Size of the returned images. Default: (3, 224, 224)
num_classes (int, optional) – Number of classes in the dataset. Default: 10
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
random_offset (int) – Offsets the index-based random seed used to generate each image. Default: 0
Fashion-MNIST
class torchvision.datasets.FashionMNIST(root: str, train: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)
Fashion-MNIST Dataset.
- Parameters
root (string) – Root directory of dataset where FashionMNIST/processed/training.pt and FashionMNIST/processed/test.pt exist.
train (bool, optional) – If True, creates dataset from training.pt, otherwise from test.pt.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Flickr
class torchvision.datasets.Flickr8k(root: str, ann_file: str, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None)
Flickr8k Entities Dataset.
- Parameters
root (string) – Root directory where images are downloaded to.
ann_file (string) – Path to annotation file.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.ToTensor
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
class torchvision.datasets.Flickr30k(root: str, ann_file: str, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None)
Flickr30k Entities Dataset.
- Parameters
root (string) – Root directory where images are downloaded to.
ann_file (string) – Path to annotation file.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.ToTensor
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
HMDB51
class torchvision.datasets.HMDB51(root, annotation_path, frames_per_clip, step_between_clips=1, frame_rate=None, fold=1, train=True, transform=None, _precomputed_metadata=None, num_workers=1, _video_width=0, _video_height=0, _video_min_dimension=0, _audio_samples=0)
HMDB51 dataset.
HMDB51 is an action recognition video dataset. This dataset considers every video as a collection of video clips of fixed size, specified by frames_per_clip, where the step in frames between each clip is given by step_between_clips.
To give an example, for 2 videos with 10 and 15 frames respectively, if frames_per_clip=5 and step_between_clips=5, the dataset size will be (2 + 3) = 5, where the first two elements will come from video 1, and the next three elements from video 2. Note that we drop clips which do not have exactly frames_per_clip elements, so not all frames in a video might be present.
Internally, it uses a VideoClips object to handle clip creation.
- Parameters
root (string) – Root directory of the HMDB51 Dataset.
annotation_path (str) – Path to the folder containing the split files.
frames_per_clip (int) – Number of frames in a clip.
step_between_clips (int) – Number of frames between each clip.
fold (int, optional) – Which fold to use. Should be between 1 and 3.
train (bool, optional) – If True, creates a dataset from the train split, otherwise from the test split.
transform (callable, optional) – A function/transform that takes in a TxHxWxC video and returns a transformed version.
- Returns
A 3-tuple with the following entries:
video (Tensor[T, H, W, C]): the T video frames
audio (Tensor[K, L]): the audio frames, where K is the number of channels and L is the number of points
label (int): class of the video clip
- Return type
tuple
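The clip-count arithmetic in the example above can be checked with a few lines of code. This sketch reproduces only the counting rule stated in the text (a sliding window of frames_per_clip frames advanced by step_between_clips, with incomplete clips dropped), assuming frame_rate=None so no resampling happens; it is not the VideoClips implementation, and the helper names are hypothetical. Kinetics-400 and UCF101 below describe the same rule.

```python
def num_clips(num_frames, frames_per_clip, step_between_clips=1):
    """Count the full-length clips a sliding window yields for one video.

    Clips that would run past the last frame are dropped, as described above.
    """
    if num_frames < frames_per_clip:
        return 0
    return (num_frames - frames_per_clip) // step_between_clips + 1


def dataset_size(frame_counts, frames_per_clip, step_between_clips=1):
    """Total number of clips over all videos, i.e. the dataset length."""
    return sum(num_clips(n, frames_per_clip, step_between_clips) for n in frame_counts)


# The worked example from the text: two videos with 10 and 15 frames.
per_video = [num_clips(n, frames_per_clip=5, step_between_clips=5) for n in (10, 15)]
total = dataset_size([10, 15], frames_per_clip=5, step_between_clips=5)
# A step smaller than the clip length produces overlapping clips that share frames.
overlapping = dataset_size([10, 15], frames_per_clip=5, step_between_clips=2)
```

With step_between_clips=2 the same two videos give 3 + 6 = 9 overlapping clips, which shows how the step trades dataset length against redundancy between neighbouring clips.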
ImageNet
class torchvision.datasets.ImageNet(root: str, split: str = 'train', download: Optional[str] = None, **kwargs: Any)
ImageNet 2012 Classification Dataset.
- Parameters
root (string) – Root directory of the ImageNet Dataset.
split (string, optional) – The dataset split, supports train or val.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
loader – A function to load an image given its path.
Note
This requires scipy to be installed.
Kinetics-400
class torchvision.datasets.Kinetics400(root, frames_per_clip, step_between_clips=1, frame_rate=None, extensions=('avi', ), transform=None, _precomputed_metadata=None, num_workers=1, _video_width=0, _video_height=0, _video_min_dimension=0, _audio_samples=0, _audio_channels=0)
Kinetics-400 dataset.
Kinetics-400 is an action recognition video dataset. This dataset considers every video as a collection of video clips of fixed size, specified by frames_per_clip, where the step in frames between each clip is given by step_between_clips.
To give an example, for 2 videos with 10 and 15 frames respectively, if frames_per_clip=5 and step_between_clips=5, the dataset size will be (2 + 3) = 5, where the first two elements will come from video 1, and the next three elements from video 2. Note that we drop clips which do not have exactly frames_per_clip elements, so not all frames in a video might be present.
Internally, it uses a VideoClips object to handle clip creation.
- Parameters
root (string) – Root directory of the Kinetics-400 Dataset. Should be structured as follows:
root/
├── class1
│   ├── clip1.avi
│   ├── clip2.avi
│   └── ...
└── class2
    ├── clipx.avi
    └── ...
frames_per_clip (int) – Number of frames in a clip.
step_between_clips (int) – Number of frames between each clip.
transform (callable, optional) – A function/transform that takes in a TxHxWxC video and returns a transformed version.
- Returns
A 3-tuple with the following entries:
video (Tensor[T, H, W, C]): the T video frames
audio (Tensor[K, L]): the audio frames, where K is the number of channels and L is the number of points
label (int): class of the video clip
- Return type
tuple
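A folder tree like the root/ layout described for Kinetics-400 can be mapped to integer labels as sketched below. The sketch assumes, following the ImageFolder/DatasetFolder convention, that class folders are sorted alphabetically and numbered from 0; treat that as an assumption for Kinetics400 rather than documented behaviour. The helper index_clips and the placeholder file names are hypothetical, and the extension filter mirrors the extensions=('avi', ) default in the signature.

```python
import os
import tempfile


def index_clips(root, extensions=("avi",)):
    """Return (class_to_idx, [(path, label), ...]) for a root/<class>/<clip> tree."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    samples = []
    for name in classes:
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            # Keep only files whose name ends with one of the accepted extensions.
            if fname.lower().endswith(extensions):
                samples.append((os.path.join(class_dir, fname), class_to_idx[name]))
    return class_to_idx, samples


# Build a throwaway tree with empty placeholder files to exercise the helper.
with tempfile.TemporaryDirectory() as root:
    for cls, clips in {"class2": ["clipx.avi"], "class1": ["clip1.avi", "clip2.avi"]}.items():
        os.makedirs(os.path.join(root, cls))
        for clip in clips:
            open(os.path.join(root, cls, clip), "w").close()
    class_to_idx, samples = index_clips(root)
    labels = [label for _, label in samples]
    names = [os.path.basename(path) for path, _ in samples]
```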
KITTI
class torchvision.datasets.Kitti(root: str, train: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, transforms: Optional[Callable] = None, download: bool = False)
KITTI Dataset.
- Parameters
root (string) – Root directory where images are downloaded to. Expects the following folder structure if download=False:
<root>
└── Kitti
    └── raw
        ├── training
        │   ├── image_2
        │   └── label_2
        └── testing
            └── image_2
train (bool, optional) – Use train split if true, else test split. Defaults to train.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.ToTensor
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
transforms (callable, optional) – A function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
KMNIST
class torchvision.datasets.KMNIST(root: str, train: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)
Kuzushiji-MNIST Dataset.
- Parameters
root (string) – Root directory of dataset where KMNIST/processed/training.pt and KMNIST/processed/test.pt exist.
train (bool, optional) – If True, creates dataset from training.pt, otherwise from test.pt.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
LSUN
class torchvision.datasets.LSUN(root: str, classes: Union[str, List[str]] = 'train', transform: Optional[Callable] = None, target_transform: Optional[Callable] = None)
LSUN dataset.
- Parameters
root (string) – Root directory for the database files.
classes (string or list) – One of {‘train’, ‘val’, ‘test’} or a list of categories to load, e.g., [‘bedroom_train’, ‘church_outdoor_train’].
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
MNIST
class torchvision.datasets.MNIST(root: str, train: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)
MNIST Dataset.
- Parameters
root (string) – Root directory of dataset where MNIST/processed/training.pt and MNIST/processed/test.pt exist.
train (bool, optional) – If True, creates dataset from training.pt, otherwise from test.pt.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Omniglot
class torchvision.datasets.Omniglot(root: str, background: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)
Omniglot Dataset.
- Parameters
root (string) – Root directory of dataset where directory omniglot-py exists.
background (bool, optional) – If True, creates dataset from the “background” set, otherwise creates from the “evaluation” set. This terminology is defined by the authors.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If true, downloads the dataset zip files from the internet and puts them in the root directory. If the zip files are already downloaded, they are not downloaded again.
PhotoTour
class torchvision.datasets.PhotoTour(root: str, name: str, train: bool = True, transform: Optional[Callable] = None, download: bool = False)
Multi-view Stereo Correspondence Dataset.
Note
We only provide the newer version of the dataset, since the authors state that it is more suitable for training descriptors based on difference of Gaussian, or Harris corners, as the patches are centred on real interest point detections, rather than being projections of 3D points as is the case in the old dataset.
The original dataset is available at http://phototour.cs.washington.edu/patches/default.htm.
- Parameters
root (string) – Root directory where images are.
name (string) – Name of the dataset to load.
transform (callable, optional) – A function/transform that takes in an PIL image and returns a transformed version.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
Places365
class torchvision.datasets.Places365(root: str, split: str = 'train-standard', small: bool = False, download: bool = False, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, loader: Callable[[str], Any] = <function default_loader>)
Places365 classification dataset.
- Parameters
root (string) – Root directory of the Places365 dataset.
split (string, optional) – The dataset split. Can be one of train-standard (default), train-challenge, val.
small (bool, optional) – If True, uses the small images, i.e. resized to 256 x 256 pixels, instead of the high resolution ones.
download (bool, optional) – If True, downloads the dataset components and places them in root. Already downloaded archives are not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
loader – A function to load an image given its path.
- Raises
RuntimeError – If download is False and the meta files, i.e. the devkit, are not present or corrupted.
RuntimeError – If download is True and the image archive is already extracted.
QMNIST
class torchvision.datasets.QMNIST(root: str, what: Optional[str] = None, compat: bool = True, train: bool = True, **kwargs: Any)
QMNIST Dataset.
- Parameters
root (string) – Root directory of dataset whose processed subdir contains torch binary files with the datasets.
what (string, optional) – Can be ‘train’, ‘test’, ‘test10k’, ‘test50k’, or ‘nist’ for, respectively, the MNIST-compatible training set, the 60k QMNIST testing set, the 10k QMNIST examples that match the MNIST testing set, the 50k remaining QMNIST testing examples, or all the NIST digits. The default is to select ‘train’ or ‘test’ according to the compatibility argument ‘train’.
compat (bool, optional) – A boolean that says whether the target for each example is the class number (for compatibility with the MNIST dataloader) or a torch vector containing the full QMNIST information. Default: True.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
train (bool, optional, compatibility) – When argument ‘what’ is not specified, this boolean decides whether to load the training set or the testing set. Default: True.
SBD
class torchvision.datasets.SBDataset(root: str, image_set: str = 'train', mode: str = 'boundaries', download: bool = False, transforms: Optional[Callable] = None)
Semantic Boundaries Dataset.
The SBD currently contains annotations from 11355 images taken from the PASCAL VOC 2011 dataset.
Note
Please note that the train and val splits included with this dataset are different from the splits in the PASCAL VOC dataset. In particular, some “train” images might be part of VOC2012 val. If you are interested in testing on VOC 2012 val, then use image_set=‘train_noval’, which excludes all val images.
Warning
This class needs scipy to load target files from .mat format.
- Parameters
root (string) – Root directory of the Semantic Boundaries Dataset.
image_set (string, optional) – Select the image_set to use, train, val or train_noval. Image set train_noval excludes VOC 2012 val images.
mode (string, optional) – Select target type. Possible values ‘boundaries’ or ‘segmentation’. In case of ‘boundaries’, the target is an array of shape [num_classes, H, W], where num_classes=20.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transforms (callable, optional) – A function/transform that takes input sample and its target as entry and returns a transformed version. Input sample is PIL image and target is a numpy array if mode=’boundaries’ or PIL image if mode=’segmentation’.
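In ‘boundaries’ mode the target keeps one [H, W] plane per class. If a task only needs to know where edges are, regardless of class, the planes can be collapsed into a single mask, as in the sketch below. It assumes nonzero entries mark boundary pixels; the helper merge_boundaries and the synthetic target are hypothetical. Since SBDataset only takes a joint transforms callable, it could be wired in as transforms=lambda img, tgt: (img, merge_boundaries(tgt)).

```python
import numpy as np


def merge_boundaries(target):
    """Collapse a [num_classes, H, W] boundary target into one [H, W] binary edge mask."""
    target = np.asarray(target)
    if target.ndim != 3:
        raise ValueError(f"expected a [num_classes, H, W] array, got shape {target.shape}")
    # A pixel is a boundary pixel if any class plane marks it as one.
    return (target != 0).any(axis=0).astype(np.uint8)


# Synthetic stand-in for an SBD target: 20 classes on a 4x5 grid.
target = np.zeros((20, 4, 5), dtype=np.uint8)
target[3, 1, 1:4] = 1  # a short horizontal edge for class index 3
target[7, 0:3, 4] = 1  # a vertical edge for class index 7
mask = merge_boundaries(target)
```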
SBU
class torchvision.datasets.SBU(root: str, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = True)
SBU Captioned Photo Dataset.
- Parameters
root (string) – Root directory of dataset where tarball SBUCaptionedPhotoDataset.tar.gz exists.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
SEMEION
class torchvision.datasets.SEMEION(root: str, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = True)
SEMEION Dataset.
- Parameters
root (string) – Root directory of dataset where directory semeion.py exists.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
STL10
class torchvision.datasets.STL10(root: str, split: str = 'train', folds: Optional[int] = None, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)
STL10 Dataset.
- Parameters
root (string) – Root directory of dataset where directory stl10_binary exists.
split (string) – One of {‘train’, ‘test’, ‘unlabeled’, ‘train+unlabeled’}. Selects the corresponding subset of the dataset.
folds (int, optional) – One of {0-9} or None. For training, loads one of the 10 pre-defined folds of 1k samples for the standard evaluation procedure. If no value is passed, loads the 5k samples.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
SVHN
class torchvision.datasets.SVHN(root: str, split: str = 'train', transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)
SVHN Dataset.
Note: The SVHN dataset assigns the label 10 to the digit 0. However, in this Dataset, we assign the label 0 to the digit 0 to be compatible with PyTorch loss functions, which expect the class labels to be in the range [0, C-1].
Warning
This class needs scipy to load data from .mat format.
- Parameters
root (string) – Root directory of dataset where directory SVHN exists.
split (string) – One of {‘train’, ‘test’, ‘extra’}. Selects the corresponding subset of the dataset; ‘extra’ is the extra training set.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
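The label remapping described in the SVHN note can be expressed in one NumPy call. This is a sketch of the rule on made-up raw labels, not a claim about how the class applies it internally; after remapping, every label lies in [0, 9], i.e. [0, C-1] with C = 10.

```python
import numpy as np

# Raw SVHN-style labels in which the digit 0 is stored as 10 (made-up sample values).
raw_labels = np.array([10, 1, 9, 10, 5])

labels = raw_labels.copy()
# Replace every 10 with 0 in place so that labels lie in [0, C-1] with C = 10 classes.
np.place(labels, labels == 10, 0)
```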
UCF101
class torchvision.datasets.UCF101(root, annotation_path, frames_per_clip, step_between_clips=1, frame_rate=None, fold=1, train=True, transform=None, _precomputed_metadata=None, num_workers=1, _video_width=0, _video_height=0, _video_min_dimension=0, _audio_samples=0)
UCF101 dataset.
UCF101 is an action recognition video dataset. This dataset considers every video as a collection of video clips of fixed size, specified by frames_per_clip, where the step in frames between each clip is given by step_between_clips.
To give an example, for 2 videos with 10 and 15 frames respectively, if frames_per_clip=5 and step_between_clips=5, the dataset size will be (2 + 3) = 5, where the first two elements will come from video 1, and the next three elements from video 2. Note that we drop clips which do not have exactly frames_per_clip elements, so not all frames in a video might be present.
Internally, it uses a VideoClips object to handle clip creation.
- Parameters
root (string) – Root directory of the UCF101 Dataset.
annotation_path (str) – Path to the folder containing the split files.
frames_per_clip (int) – Number of frames in a clip.
step_between_clips (int, optional) – Number of frames between each clip.
fold (int, optional) – Which fold to use. Should be between 1 and 3.
train (bool, optional) – If True, creates a dataset from the train split, otherwise from the test split.
transform (callable, optional) – A function/transform that takes in a TxHxWxC video and returns a transformed version.
- Returns
A 3-tuple with the following entries:
video (Tensor[T, H, W, C]): the T video frames
audio (Tensor[K, L]): the audio frames, where K is the number of channels and L is the number of points
label (int): class of the video clip
- Return type
tuple
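The transform receives each clip as a T x H x W x C tensor. Many 3D-convolution models expect C x T x H x W float input instead; that layout is an assumption about the downstream model, not something UCF101 requires. The sketch below is one such transform, assuming uint8 frames; the name to_cthw_float and the random stand-in clip are hypothetical.

```python
import torch


def to_cthw_float(video):
    """Reorder a T x H x W x C uint8 clip to C x T x H x W and scale values to [0, 1]."""
    if video.dim() != 4:
        raise ValueError(f"expected a 4-D T x H x W x C tensor, got {tuple(video.shape)}")
    return video.permute(3, 0, 1, 2).float() / 255.0


# Random stand-in for a decoded clip: 8 frames of 112x112 RGB.
clip = torch.randint(0, 256, (8, 112, 112, 3), dtype=torch.uint8)
out = to_cthw_float(clip)
```

It would be passed as transform=to_cthw_float when constructing the dataset.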
USPS
class torchvision.datasets.USPS(root: str, train: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)
USPS Dataset. The data format is: [label [index:value ]*256 \n] * num_lines, where label lies in [1, 10]. The value for each pixel lies in [-1, 1]. Here we transform the label into [0, 9] and make pixel values in [0, 255].
- Parameters
root (string) – Root directory of dataset to store USPS data files.
train (bool, optional) – If True, creates dataset from usps.bz2, otherwise from usps.t.bz2.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
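The two conversions in the USPS description are a label shift (label - 1 maps [1, 10] to [0, 9]) and a linear pixel rescale ((v + 1) / 2 * 255 maps [-1, 1] to [0, 255]). The sketch below applies both to one line of the documented format. It assumes the 256 index:value pairs appear in pixel order and uses only the value part; the helper parse_usps_line and the synthetic line are hypothetical.

```python
def parse_usps_line(line):
    """Parse one 'label index:value ...' line into (label in [0, 9], 256 pixels in [0, 255])."""
    tokens = line.split()
    label = int(float(tokens[0])) - 1  # [1, 10] -> [0, 9]
    # Assumes the pairs are in pixel order; only the value after ':' is used.
    raw = [float(token.split(":")[-1]) for token in tokens[1:]]
    if len(raw) != 256:
        raise ValueError(f"expected 256 pixel values, got {len(raw)}")
    pixels = [(v + 1.0) / 2.0 * 255.0 for v in raw]  # [-1, 1] -> [0, 255]
    return label, pixels


# Synthetic line in the documented format (values made up).
values = [-1.0] * 256
values[0], values[255] = 1.0, 0.0
line = "3 " + " ".join(f"{i + 1}:{v}" for i, v in enumerate(values))
label, pixels = parse_usps_line(line)
```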
VOC
class torchvision.datasets.VOCSegmentation(root: str, year: str = '2012', image_set: str = 'train', download: bool = False, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, transforms: Optional[Callable] = None)
Pascal VOC Segmentation Dataset.
- Parameters
root (string) – Root directory of the VOC Dataset.
year (string, optional) – The dataset year, supports years "2007" to "2012".
image_set (string, optional) – Select the image_set to use, "train", "trainval" or "val". If year=="2007", can also be "test".
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
transforms (callable, optional) – A function/transform that takes input sample and its target as entry and returns a transformed version.
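The joint transforms argument matters for segmentation because some augmentations must act on the image and its mask together: a random horizontal flip applied independently by transform and target_transform could flip one but not the other. A minimal sketch of a joint transform with the (image, target) to (image, target) shape that transforms takes, using nested lists as hypothetical stand-ins for the PIL image and mask; joint_random_hflip is an illustrative name, not a torchvision function.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical stand-ins: an H x W image or mask as nested lists.
Grid = List[List[int]]


def joint_random_hflip(image: Grid, mask: Grid, p: float = 0.5,
                       rng: Optional[random.Random] = None) -> Tuple[Grid, Grid]:
    """Flip image and mask horizontally together with probability p.

    One random draw decides for both inputs, so every pixel keeps its label.
    """
    rng = rng if rng is not None else random.Random()
    if rng.random() < p:
        image = [row[::-1] for row in image]
        mask = [row[::-1] for row in mask]
    return image, mask


image = [[1, 2, 3], [4, 5, 6]]
mask = [[0, 0, 15], [0, 15, 15]]  # class index per pixel
flipped_image, flipped_mask = joint_random_hflip(image, mask, p=1.0)
kept_image, kept_mask = joint_random_hflip(image, mask, p=0.0)

# Over many random draws, the image is flipped exactly when the mask is.
rng = random.Random(0)
flipped = ([row[::-1] for row in image], [row[::-1] for row in mask])
aligned = all(
    (out_image == flipped[0]) == (out_mask == flipped[1])
    for out_image, out_mask in (joint_random_hflip(image, mask, rng=rng) for _ in range(100))
)
```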
class torchvision.datasets.VOCDetection(root: str, year: str = '2012', image_set: str = 'train', download: bool = False, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, transforms: Optional[Callable] = None)
Pascal VOC Detection Dataset.
- Parameters
root (string) – Root directory of the VOC Dataset.
year (string, optional) – The dataset year; supports years "2007" to "2012".
image_set (string, optional) – Select the image_set to use: "train", "trainval" or "val". If year=="2007", can also be "test".
download (bool, optional) – If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
transforms (callable, optional) – A function/transform that takes the input sample and its target as entries and returns a transformed version.
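A target_transform for VOCDetection can reshape the raw annotation into what a detection model needs. A minimal sketch that produces (class index, bounding box) pairs, assuming the target is a nested dict mirroring the VOC XML annotation, where target["annotation"]["object"] lists objects whose name and bndbox fields hold strings. This page does not spell that format out, so treat it as an assumption; the alphabetical class indexing and the helper name voc_target_to_boxes are illustrative choices.

```python
from typing import Any, Dict, List, Tuple

# The 20 Pascal VOC object classes, indexed alphabetically (an illustrative choice).
VOC_CLASSES = sorted([
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
])
CLASS_TO_IDX = {name: i for i, name in enumerate(VOC_CLASSES)}

Box = Tuple[int, int, int, int]


def voc_target_to_boxes(target: Dict[str, Any]) -> List[Tuple[int, Box]]:
    """Turn an assumed VOC-style dict into (class_index, (xmin, ymin, xmax, ymax)) pairs."""
    objects = target["annotation"]["object"]
    if isinstance(objects, dict):  # tolerate a single object not wrapped in a list
        objects = [objects]
    pairs = []
    for obj in objects:
        bndbox = obj["bndbox"]
        # Coordinates are assumed to be strings, as read from the XML file.
        xmin, ymin, xmax, ymax = (int(float(bndbox[k])) for k in ("xmin", "ymin", "xmax", "ymax"))
        pairs.append((CLASS_TO_IDX[obj["name"]], (xmin, ymin, xmax, ymax)))
    return pairs


# Hand-written example target in the assumed format.
example_target = {
    "annotation": {
        "filename": "example.jpg",
        "object": [
            {"name": "dog", "bndbox": {"xmin": "48", "ymin": "240", "xmax": "195", "ymax": "371"}},
            {"name": "person", "bndbox": {"xmin": "8", "ymin": "12", "xmax": "352", "ymax": "498"}},
        ],
    }
}
boxes = voc_target_to_boxes(example_target)
```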
WIDERFace
class torchvision.datasets.WIDERFace(root: str, split: str = 'train', transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)
WIDERFace Dataset.
- Parameters
root (string) – Root directory where images and annotations are downloaded to. Expects the following folder structure if download=False:
<root>
    └── widerface
        ├── wider_face_split ('wider_face_split.zip' if compressed)
        ├── WIDER_train ('WIDER_train.zip' if compressed)
        ├── WIDER_val ('WIDER_val.zip' if compressed)
        └── WIDER_test ('WIDER_test.zip' if compressed)
split (string) – The dataset split to use. One of {train, val, test}. Defaults to train.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
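Before constructing the dataset with download=False, it can help to verify the layout expected under root. A minimal sketch, assuming each entry may be present either as a folder or as its .zip archive, as the tree suggests; missing_widerface_entries is a hypothetical helper, not part of torchvision, and it does not inspect archive contents or checksums.

```python
import os
import tempfile
from typing import List

# Top-level entries listed in the expected <root>/widerface layout.
EXPECTED_ENTRIES = ["wider_face_split", "WIDER_train", "WIDER_val", "WIDER_test"]


def missing_widerface_entries(root: str) -> List[str]:
    """Return the entries that exist neither as a folder nor as a .zip archive."""
    base = os.path.join(root, "widerface")
    missing = []
    for name in EXPECTED_ENTRIES:
        folder = os.path.join(base, name)
        if not (os.path.isdir(folder) or os.path.isfile(folder + ".zip")):
            missing.append(name)
    return missing


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "widerface", "WIDER_train"))  # extracted folder
    open(os.path.join(root, "widerface", "wider_face_split.zip"), "w").close()  # archive
    missing = missing_widerface_entries(root)
```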
Base classes for custom datasets
class torchvision.datasets.DatasetFolder(root: str, loader: Callable[[str], Any], extensions: Optional[Tuple[str, ...]] = None, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, is_valid_file: Optional[Callable[[str], bool]] = None)
A generic data loader.
The default directory structure (one subfolder per class, as described under find_classes()) can be customized by overriding the find_classes() method.
- Parameters
root (string) – Root directory path.
loader (callable) – A function to load a sample given its path.
extensions (tuple[string], optional) – A tuple of allowed extensions. extensions and is_valid_file should not both be passed.
transform (callable, optional) – A function/transform that takes in a sample and returns a transformed version. E.g., transforms.RandomCrop for images.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
is_valid_file (callable, optional) – A function that takes the path of a file and checks whether it is a valid file (e.g. to filter out corrupt files). extensions and is_valid_file should not both be passed.
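Because loader and transform are plain callables, DatasetFolder is not limited to images. A minimal sketch for text files: text_loader and to_tokens are hypothetical helpers, and the demo applies them in the order the parameter descriptions imply (the loader produces the sample, the transform then receives it).

```python
import os
import tempfile
from typing import List


def text_loader(path: str) -> str:
    """Hypothetical loader for DatasetFolder: read a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def to_tokens(text: str) -> List[str]:
    """Hypothetical transform: lowercase and split on whitespace."""
    return text.lower().split()


with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "review_001.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("Great Movie\n")
    # Loader first, then transform, on one file.
    sample = to_tokens(text_loader(path))
```

With a root laid out as one subfolder per class, these could be passed as DatasetFolder(root, loader=text_loader, extensions=(".txt",), transform=to_tokens).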
find_classes(directory: str) → Tuple[List[str], Dict[str, int]]
Find the class folders in a dataset structured as follows:
directory/
├── class_x
│   ├── xxx.ext
│   ├── xxy.ext
│   └── ...
│       └── xxz.ext
└── class_y
    ├── 123.ext
    ├── nsdf3.ext
    └── ...
        └── asd932_.ext
This method can be overridden to only consider a subset of classes, or to adapt to a different dataset directory structure.
- Parameters
directory (str) – Root directory path, corresponding to self.root.
- Raises
FileNotFoundError – If directory has no class folders.
- Returns
List of all classes and dictionary mapping each class to an index.
- Return type
Tuple[List[str], Dict[str, int]]
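The contract above (class folders in, a class list and a class-to-index mapping out) can be sketched in plain Python. This assumes the default sorts folder names alphabetically so that indices are deterministic; find_classes_sketch and its keep argument are hypothetical, with keep showing the "subset of classes" customization. A DatasetFolder subclass would instead override find_classes(self, directory) and return the same tuple.

```python
import os
import tempfile
from typing import Dict, List, Optional, Set, Tuple


def find_classes_sketch(directory: str,
                        keep: Optional[Set[str]] = None) -> Tuple[List[str], Dict[str, int]]:
    """Map each class subfolder of `directory` to an index.

    `keep` is a hypothetical extra argument that restricts the result to a
    subset of classes, the kind of customization an override might add.
    """
    with os.scandir(directory) as entries:
        classes = sorted(entry.name for entry in entries if entry.is_dir())
    if keep is not None:
        classes = [name for name in classes if name in keep]
    if not classes:
        raise FileNotFoundError(f"No class folders found in {directory}")
    class_to_idx = {name: index for index, name in enumerate(classes)}
    return classes, class_to_idx


with tempfile.TemporaryDirectory() as root:
    for name in ("class_y", "class_x", "class_z"):
        os.makedirs(os.path.join(root, name))
    open(os.path.join(root, "README.txt"), "w").close()  # plain files are not classes
    all_classes = find_classes_sketch(root)
    subset = find_classes_sketch(root, keep={"class_x", "class_z"})
```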
static make_dataset(directory: str, class_to_idx: Dict[str, int], extensions: Optional[Tuple[str, ...]] = None, is_valid_file: Optional[Callable[[str], bool]] = None) → List[Tuple[str, int]]
Generates a list of samples of the form (path_to_sample, class).
This can be overridden to e.g. read files from a compressed zip file instead of from the disk.
- Parameters
directory (str) – Root dataset directory, corresponding to self.root.
class_to_idx (Dict[str, int]) – Dictionary mapping class name to class index.
extensions (optional) – A tuple of allowed extensions. Exactly one of extensions and is_valid_file should be passed. Defaults to None.
is_valid_file (optional) – A function that takes the path of a file and checks whether it is a valid file (e.g. to filter out corrupt files). Exactly one of extensions and is_valid_file should be passed. Defaults to None.
- Raises
ValueError – In case class_to_idx is empty.
ValueError – In case extensions and is_valid_file are both None or both not None.
FileNotFoundError – In case no valid file was found for one or more classes.
- Returns
Samples of the form (path_to_sample, class).
- Return type
List[Tuple[str, int]]
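The parameter rules and error cases listed above can be sketched as a standalone function. This follows the documented contract rather than the torchvision source: make_dataset_sketch is a hypothetical name, and the sorted walk order, the followlinks choice, and the case-insensitive extension match are assumptions. An override that reads from a zip archive would keep the same signature and return shape.

```python
import os
import tempfile
from typing import Callable, Dict, List, Optional, Tuple


def make_dataset_sketch(
    directory: str,
    class_to_idx: Dict[str, int],
    extensions: Optional[Tuple[str, ...]] = None,
    is_valid_file: Optional[Callable[[str], bool]] = None,
) -> List[Tuple[str, int]]:
    """Collect (path_to_sample, class_index) pairs from one subfolder per class."""
    if not class_to_idx:
        raise ValueError("class_to_idx must not be empty")
    if (extensions is None) == (is_valid_file is None):
        raise ValueError("pass exactly one of extensions and is_valid_file")

    def accept(path: str) -> bool:
        if extensions is not None:
            return path.lower().endswith(tuple(ext.lower() for ext in extensions))
        return is_valid_file(path)

    samples: List[Tuple[str, int]] = []
    empty_classes = []
    for class_name in sorted(class_to_idx):
        class_dir = os.path.join(directory, class_name)
        found = False
        # Walk nested folders too, in a deterministic (sorted) order.
        for dirpath, _, filenames in sorted(os.walk(class_dir, followlinks=True)):
            for fname in sorted(filenames):
                path = os.path.join(dirpath, fname)
                if accept(path):
                    samples.append((path, class_to_idx[class_name]))
                    found = True
        if not found:
            empty_classes.append(class_name)
    if empty_classes:
        raise FileNotFoundError(f"no valid file found for classes: {', '.join(empty_classes)}")
    return samples


with tempfile.TemporaryDirectory() as root:
    for rel in ("cat/123.png", "cat/notes.txt", "dog/xxx.png", "dog/sub/xxz.PNG"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    samples = make_dataset_sketch(root, {"cat": 0, "dog": 1}, extensions=(".png",))
    relative = [(os.path.relpath(p, root).replace(os.sep, "/"), c) for p, c in samples]
    try:
        make_dataset_sketch(root, {"cat": 0}, extensions=(".png",), is_valid_file=os.path.isfile)
        both_rejected = False
    except ValueError:
        both_rejected = True
```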
class torchvision.datasets.ImageFolder(root: str, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, loader: Callable[[str], Any] = <function default_loader>, is_valid_file: Optional[Callable[[str], bool]] = None)
A generic data loader where the images are arranged in this way by default:
root/dog/xxx.png
root/dog/xxy.png
root/dog/[...]/xxz.png
root/cat/123.png
root/cat/nsdf3.png
root/cat/[...]/asd932_.png
This class inherits from DatasetFolder, so the same methods can be overridden to customize the dataset.
- Parameters
root (string) – Root directory path.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
loader (callable, optional) – A function to load an image given its path.
is_valid_file (callable, optional) – A function that takes the path of an image file and checks whether it is a valid file (e.g. to filter out corrupt files).
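ImageFolder's signature has no extensions argument, and DatasetFolder documents that extensions and is_valid_file should not both be passed, so a custom is_valid_file should do its own extension check. A minimal sketch; is_valid_image_file and ALLOWED_EXTENSIONS are hypothetical, and treating zero-byte files as corrupt is only a cheap heuristic.

```python
import os
import tempfile

# Assumed allow-list for this example; ImageFolder's own default list may differ.
ALLOWED_EXTENSIONS = (".jpg", ".jpeg", ".png")


def is_valid_image_file(path: str) -> bool:
    """Accept visible, non-empty files whose extension is in the allow-list."""
    name = os.path.basename(path)
    if name.startswith("."):
        return False  # hidden files, e.g. '._xxx.png' metadata files left by macOS
    if not name.lower().endswith(ALLOWED_EXTENSIONS):
        return False
    return os.path.getsize(path) > 0  # zero-byte files are treated as corrupt


with tempfile.TemporaryDirectory() as root:
    checks = {}
    for name, data in (("xxx.png", b"\x89PNG"), ("empty.png", b""),
                       ("._xxx.png", b"junk"), ("notes.txt", b"hi")):
        path = os.path.join(root, name)
        with open(path, "wb") as f:
            f.write(data)
        checks[name] = is_valid_image_file(path)
```

It would then be passed as ImageFolder(root, is_valid_file=is_valid_image_file).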