wrap_dataset_for_transforms_v2

torchvision.datasets.wrap_dataset_for_transforms_v2(dataset, target_keys=None)[source]

Wrap a dataset from torchvision.datasets for use with torchvision.transforms.v2.

Example

>>> import torchvision
>>> from torchvision.datasets import wrap_dataset_for_transforms_v2
>>> dataset = torchvision.datasets.CocoDetection(...)
>>> dataset = wrap_dataset_for_transforms_v2(dataset)

Note

For now, only the most popular datasets are supported. Furthermore, the wrapper only supports dataset configurations that are fully supported by torchvision.transforms.v2. If you encounter an error prompting you to raise an issue to torchvision for a dataset or configuration that you need, please do so.

The dataset samples are wrapped according to the description below.

Special cases:

CocoDetection: Instead of returning the target as a list of dicts, the wrapper returns a dict of lists. In addition, the key-value pairs "boxes" (in XYXY coordinate format), "masks" and "labels" are added, with the data wrapped in the corresponding torchvision.tv_tensors. The original keys are preserved. If target_keys is omitted, only the values for "image_id", "boxes", and "labels" are returned.

VOCDetection: The key-value pairs "boxes" and "labels" are added to the target, with the data wrapped in the corresponding torchvision.tv_tensors. The original keys are preserved. If target_keys is omitted, only the values for "boxes" and "labels" are returned.

CelebA: The target for target_type="bbox" is converted to the XYXY coordinate format and wrapped into a BoundingBoxes tv_tensor.

Kitti: Instead of returning the target as a list of dicts, the wrapper returns a dict of lists. In addition, the key-value pairs "boxes" and "labels" are added, with the data wrapped in the corresponding torchvision.tv_tensors. The original keys are preserved. If target_keys is omitted, only the values for "boxes" and "labels" are returned.

OxfordIIITPet: The target for target_type="segmentation" is wrapped into a Mask tv_tensor.

Cityscapes: The target for target_type="semantic" is wrapped into a Mask tv_tensor. The target for target_type="instance" is replaced by a dictionary with the key-value pairs "masks" (as a Mask tv_tensor) and "labels".

WIDERFace: The value for key "bbox" in the target is converted to XYXY coordinate format and wrapped into a BoundingBoxes tv_tensor.
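The two reshaping steps that recur in the special cases above (list of dicts to dict of lists, and conversion of boxes to XYXY) can be sketched in plain Python. This is an illustrative sketch only, not torchvision's implementation: the helper names and the sample annotations are hypothetical, it assumes the source boxes are stored as [x, y, width, height], and it produces plain lists where the real wrapper produces tv_tensors.

```python
from collections import defaultdict


def list_of_dicts_to_dict_of_lists(annotations):
    """Collate per-object annotation dicts into one dict of per-key lists."""
    collated = defaultdict(list)
    for annotation in annotations:
        for key, value in annotation.items():
            collated[key].append(value)
    return dict(collated)


def xywh_to_xyxy(box):
    """Convert [x, y, width, height] to [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


# Hypothetical detection target: one dict per annotated object.
raw_target = [
    {"category_id": 1, "bbox": [10, 20, 30, 40], "iscrowd": 0},
    {"category_id": 3, "bbox": [5, 5, 10, 10], "iscrowd": 0},
]

target = list_of_dicts_to_dict_of_lists(raw_target)

# Add the new keys; the original keys stay in place.
target["boxes"] = [xywh_to_xyxy(box) for box in target["bbox"]]
target["labels"] = list(target["category_id"])

print(target["boxes"])   # [[10, 20, 40, 60], [5, 5, 15, 15]]
print(target["labels"])  # [1, 3]
```

Keeping the original keys next to the added ones mirrors the "original keys are preserved" behavior described above, so a later key selection can still reach fields such as "iscrowd".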

Image classification datasets

This wrapper is a no-op for image classification datasets, since they were already fully supported by torchvision.transforms and thus no change is needed for torchvision.transforms.v2.

Segmentation datasets

Segmentation datasets, e.g. VOCSegmentation, return a two-tuple of PIL.Image.Image objects. This wrapper leaves the image (first item) as is, while wrapping the segmentation mask (second item) into a Mask.

Video classification datasets

Video classification datasets, e.g. Kinetics, return a three-tuple containing a torch.Tensor for the video, a torch.Tensor for the audio, and an int as the label. This wrapper wraps the video into a Video while leaving the other items as is.

Note

Only datasets constructed with output_format="TCHW" are supported, since the alternative output_format="THWC" is not supported by torchvision.transforms.v2.

Parameters:

dataset – the dataset instance to wrap for compatibility with transforms v2.
target_keys – Target keys to return in case the target is a dictionary. If None (default), the selected keys are specific to the dataset. If "all", the full target is returned. Can also be a collection of strings for fine-grained access. Currently only supported for CocoDetection, VOCDetection, Kitti, and WIDERFace. See above for details.
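
The three forms of target_keys described above can be illustrated with a small pure-Python sketch. The helper select_target_keys, its default_keys argument, and the sample target are hypothetical and not part of torchvision; raising an error for unknown keys is an assumption made for this sketch.

```python
def select_target_keys(target, target_keys=None, default_keys=("boxes", "labels")):
    """Filter a target dict following the described target_keys semantics.

    Hypothetical helper for illustration; not part of torchvision.
    """
    if target_keys is None:
        # Dataset-specific default selection.
        keys = set(default_keys)
    elif target_keys == "all":
        # Return the full target.
        keys = set(target)
    else:
        # Explicit collection of strings for fine-grained access.
        keys = set(target_keys)
        unknown = keys - set(target)
        if unknown:
            # Assumption of this sketch: unknown keys are an error.
            raise ValueError(f"Unknown target keys: {sorted(unknown)}")
    return {key: value for key, value in target.items() if key in keys}


# Hypothetical wrapped target with both added and original keys.
target = {
    "boxes": [[10, 20, 40, 60]],
    "labels": [1],
    "iscrowd": [0],
    "area": [1200],
}

print(sorted(select_target_keys(target)))                        # ['boxes', 'labels']
print(sorted(select_target_keys(target, "all")))                 # ['area', 'boxes', 'iscrowd', 'labels']
print(sorted(select_target_keys(target, ["boxes", "iscrowd"])))  # ['boxes', 'iscrowd']
```

With the real wrapper, the corresponding selection is made once at wrap time, for example by passing target_keys="all" or target_keys=["boxes", "labels", "masks"] to wrap_dataset_for_transforms_v2.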

Examples using wrap_dataset_for_transforms_v2:

Transforms v2: End-to-end object detection/segmentation example

Getting started with transforms v2
