wrap_dataset_for_transforms_v2
- torchvision.datasets.wrap_dataset_for_transforms_v2(dataset)[source]
[BETA] Wrap a torchvision.dataset for usage with torchvision.transforms.v2.

Warning
The wrap_dataset_for_transforms_v2 function is in Beta stage, and while we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes.
Example
>>> dataset = torchvision.datasets.CocoDetection(...)
>>> dataset = wrap_dataset_for_transforms_v2(dataset)
Note
For now, only the most popular datasets are supported. Furthermore, the wrapper only supports dataset configurations that are fully supported by torchvision.transforms.v2. If you encounter an error prompting you to raise an issue to torchvision for a dataset or configuration that you need, please do so.

The dataset samples are wrapped according to the description below.
Special cases:
- CocoDetection: Instead of returning the target as a list of dicts, the wrapper returns a dict of lists. In addition, the key-value pairs "boxes" (in XYXY coordinate format), "masks", and "labels" are added and wrap the data in the corresponding torchvision.datapoints. The original keys are preserved (see the sketch after this list).
- VOCDetection: The key-value pairs "boxes" and "labels" are added to the target and wrap the data in the corresponding torchvision.datapoints. The original keys are preserved.
- CelebA: The target for target_type="bbox" is converted to the XYXY coordinate format and wrapped into a BoundingBox datapoint.
- Kitti: Instead of returning the target as a list of dicts, the wrapper returns a dict of lists. In addition, the key-value pairs "boxes" and "labels" are added and wrap the data in the corresponding torchvision.datapoints. The original keys are preserved.
- OxfordIIITPet: The target for target_type="segmentation" is wrapped into a Mask datapoint.
- Cityscapes: The target for target_type="semantic" is wrapped into a Mask datapoint. The target for target_type="instance" is replaced by a dictionary with the key-value pairs "masks" (as Mask datapoint) and "labels".
- WIDERFace: The value for key "bbox" in the target is converted to XYXY coordinate format and wrapped into a BoundingBox datapoint.
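As a minimal sketch of the CocoDetection case, the snippet below accesses one wrapped sample; the dataset paths are hypothetical placeholders for a local COCO download:

>>> from torchvision.datasets import CocoDetection, wrap_dataset_for_transforms_v2
>>> # hypothetical paths to a local COCO download
>>> dataset = CocoDetection("coco/train2017", "coco/annotations/instances_train2017.json")
>>> dataset = wrap_dataset_for_transforms_v2(dataset)
>>> image, target = dataset[0]
>>> sorted(target.keys())  # the original COCO keys plus "boxes", "labels", "masks"
>>> target["boxes"]  # a BoundingBox datapoint in XYXY coordinate format
>>> target["masks"]  # a Mask datapoint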
Image classification datasets
This wrapper is a no-op for image classification datasets, since they were already fully supported by torchvision.transforms and thus no change is needed for torchvision.transforms.v2.
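As a quick illustration of the no-op behavior, a CIFAR10 sample should come back unchanged after wrapping (the "data" root below is a hypothetical placeholder; download=True fetches the dataset if it is missing):

>>> from torchvision.datasets import CIFAR10, wrap_dataset_for_transforms_v2
>>> # "data" is a hypothetical root directory
>>> dataset = wrap_dataset_for_transforms_v2(CIFAR10("data", download=True))
>>> image, label = dataset[0]
>>> type(image), type(label)  # still (PIL.Image.Image, int), as without the wrapper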
Segmentation datasets

Segmentation datasets, e.g. VOCSegmentation, return a two-tuple of PIL.Image.Image instances. This wrapper leaves the image as is (first item), while wrapping the segmentation mask into a Mask (second item).
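For example, the following sketch (with a hypothetical "data" root; download=True fetches the dataset if it is missing) shows the sample types of a wrapped VOCSegmentation:

>>> from torchvision.datasets import VOCSegmentation, wrap_dataset_for_transforms_v2
>>> # "data" is a hypothetical root directory
>>> dataset = VOCSegmentation("data", year="2012", image_set="train", download=True)
>>> dataset = wrap_dataset_for_transforms_v2(dataset)
>>> image, mask = dataset[0]
>>> type(image)  # PIL.Image.Image, left as is
>>> type(mask)  # torchvision.datapoints.Mask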
Video classification datasets

Video classification datasets, e.g. Kinetics, return a three-tuple containing a torch.Tensor for the video and audio and an int as label. This wrapper wraps the video into a Video datapoint while leaving the other items as is.

Note
Only datasets constructed with output_format="TCHW" are supported, since the alternative output_format="THWC" is not supported by torchvision.transforms.v2.
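For example, a Kinetics dataset constructed with the supported output format could be wrapped as follows (a minimal sketch; the root path is a hypothetical local Kinetics-400 download):

>>> from torchvision.datasets import Kinetics, wrap_dataset_for_transforms_v2
>>> # hypothetical root of a local Kinetics-400 download
>>> dataset = Kinetics("data/kinetics400", frames_per_clip=16, output_format="TCHW")
>>> dataset = wrap_dataset_for_transforms_v2(dataset)
>>> video, audio, label = dataset[0]
>>> type(video)  # torchvision.datapoints.Video; audio and label are left as is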
- Parameters:
dataset – the dataset instance to wrap for compatibility with transforms v2.
Examples using wrap_dataset_for_transforms_v2:

- Datapoints FAQ
- Transforms v2: End-to-end object detection example