Shortcuts

CocoCaptions

class torchvision.datasets.CocoCaptions(root: str, annFile: str, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, transforms: Optional[Callable] = None)[source]

MS Coco Captions Dataset.

It requires the COCO API to be installed.

Parameters
  • root (string) – Root directory where images are downloaded to.

  • annFile (string) – Path to json annotation file.

  • transform (callable, optional) – A function/transform that takes in an PIL image and returns a transformed version. E.g, transforms.PILToTensor

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

  • transforms (callable, optional) – A function/transform that takes input sample and its target as entry and returns a transformed version.

Example

import torchvision.datasets as dset
import torchvision.transforms as transforms
cap = dset.CocoCaptions(root = 'dir where images are',
                        annFile = 'json annotation file',
                        transform=transforms.PILToTensor())

print('Number of samples: ', len(cap))
img, target = cap[3] # load 4th sample

print("Image Size: ", img.size())
print(target)

Output:

Number of samples: 82783
Image Size: (3L, 427L, 640L)
[u'A plane emitting smoke stream flying over a mountain.',
u'A plane darts across a bright blue sky behind a mountain covered in snow',
u'A plane leaves a contrail above the snowy mountain top.',
u'A mountain that has a plane flying overheard in the distance.',
u'A mountain view with a plume of smoke in the background']
Special-members

__getitem__(index: int)Tuple[Any, Any]
Parameters

index (int) – Index

Returns

Sample and meta data, optionally transformed by the respective transforms.

Return type

(Any)

Docs

Access comprehensive developer documentation for PyTorch

View Docs

Tutorials

Get in-depth tutorials for beginners and advanced developers

View Tutorials

Resources

Find development resources and get your questions answered

View Resources