keypointrcnn_resnet50_fpn
- torchvision.models.detection.keypointrcnn_resnet50_fpn(*, weights: Optional[KeypointRCNN_ResNet50_FPN_Weights] = None, progress: bool = True, num_classes: Optional[int] = None, num_keypoints: Optional[int] = None, weights_backbone: Optional[ResNet50_Weights] = ResNet50_Weights.IMAGENET1K_V1, trainable_backbone_layers: Optional[int] = None, **kwargs: Any) -> KeypointRCNN
Constructs a Keypoint R-CNN model with a ResNet-50-FPN backbone.
Warning
The detection module is in Beta stage, and backward compatibility is not guaranteed.
Reference: Mask R-CNN.
The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each image, with values in the 0-1 range. Different images can have different sizes.
The behavior of the model changes depending on whether it is in training or evaluation mode.
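As a minimal sketch of the input format described above, the snippet below converts two uint8 images of different sizes into a list of float tensors in the 0-1 range. The helper name to_model_input is hypothetical and not part of torchvision; dividing by 255 assumes the raw images use the usual 0-255 range.

```python
import torch


def to_model_input(images_uint8):
    """Convert uint8 [C, H, W] images (0-255) to float32 tensors in [0, 1].

    Hypothetical helper for illustration; not a torchvision function.
    """
    return [img.to(torch.float32) / 255.0 for img in images_uint8]


# Two images with different spatial sizes. They stay in a plain Python list;
# no padding or stacking into a single batch tensor is done here.
raw = [
    torch.randint(0, 256, (3, 300, 400), dtype=torch.uint8),
    torch.randint(0, 256, (3, 500, 400), dtype=torch.uint8),
]
images = to_model_input(raw)
```

The resulting list can be passed to the model directly as its first argument.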
During training, the model expects both the input tensors and targets (a list of dictionaries), each containing:
- boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.
- labels (Int64Tensor[N]): the class label for each ground-truth box
- keypoints (FloatTensor[N, K, 3]): the locations of the K keypoints for each of the N instances, in the format [x, y, visibility], where visibility=0 means that the keypoint is not visible.
The model returns a Dict[Tensor] during training, containing the classification and regression losses for both the RPN and the R-CNN, and the keypoint loss.
During inference, the model requires only the input tensors, and returns the post-processed predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as follows, where N is the number of detected instances:
- boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.
- labels (Int64Tensor[N]): the predicted labels for each instance
- scores (Tensor[N]): the scores of each instance
- keypoints (FloatTensor[N, K, 3]): the locations of the predicted keypoints, in [x, y, v] format.
For more details on the output, you may refer to Instance segmentation models.
Keypoint R-CNN is exportable to ONNX for a fixed batch size with input images of fixed size.
Example:
>>> import torch
>>> import torchvision
>>> from torchvision.models.detection import KeypointRCNN_ResNet50_FPN_Weights
>>> model = torchvision.models.detection.keypointrcnn_resnet50_fpn(weights=KeypointRCNN_ResNet50_FPN_Weights.DEFAULT)
>>> model.eval()
>>> x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
>>> predictions = model(x)
>>>
>>> # optionally, if you want to export the model to ONNX:
>>> torch.onnx.export(model, x, "keypoint_rcnn.onnx", opset_version=11)
- Parameters:
weights (KeypointRCNN_ResNet50_FPN_Weights, optional) – The pretrained weights to use. See KeypointRCNN_ResNet50_FPN_Weights below for more details, and possible values. By default, no pre-trained weights are used.
progress (bool) – If True, displays a progress bar of the download to stderr
num_classes (int, optional) – number of output classes of the model (including the background)
num_keypoints (int, optional) – number of keypoints
weights_backbone (ResNet50_Weights, optional) – The pretrained weights for the backbone.
trainable_backbone_layers (int, optional) – number of trainable (not frozen) layers starting from final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable. If None is passed (the default) this value is set to 3.
- class torchvision.models.detection.KeypointRCNN_ResNet50_FPN_Weights(value)
The model builder above accepts the following values as the weights parameter. KeypointRCNN_ResNet50_FPN_Weights.DEFAULT is equivalent to KeypointRCNN_ResNet50_FPN_Weights.COCO_V1. You can also use strings, e.g. weights='DEFAULT' or weights='COCO_LEGACY'.
KeypointRCNN_ResNet50_FPN_Weights.COCO_LEGACY:
These weights were produced by following a training recipe similar to the one in the paper, but use a checkpoint from an early epoch.
box_map (on COCO-val2017): 50.6
kp_map (on COCO-val2017): 61.1
categories: no person, person
keypoint_names: nose, left_eye, right_eye, … (14 omitted)
min_size: height=1, width=1
num_params: 59137258
recipe:
GFLOPS: 133.92
File size: 226.1 MB
The inference transforms are available at KeypointRCNN_ResNet50_FPN_Weights.COCO_LEGACY.transforms and perform the following preprocessing operations: Accepts PIL.Image, batched (B, C, H, W) and single (C, H, W) image torch.Tensor objects. The images are rescaled to [0.0, 1.0].
KeypointRCNN_ResNet50_FPN_Weights.COCO_V1:
These weights were produced by following a training recipe similar to the one in the paper. Also available as KeypointRCNN_ResNet50_FPN_Weights.DEFAULT.
box_map (on COCO-val2017): 54.6
kp_map (on COCO-val2017): 65.0
categories: no person, person
keypoint_names: nose, left_eye, right_eye, … (14 omitted)
min_size: height=1, width=1
num_params: 59137258
recipe:
GFLOPS: 137.42
File size: 226.1 MB
The inference transforms are available at KeypointRCNN_ResNet50_FPN_Weights.COCO_V1.transforms and perform the following preprocessing operations: Accepts PIL.Image, batched (B, C, H, W) and single (C, H, W) image torch.Tensor objects. The images are rescaled to [0.0, 1.0].
Examples using keypointrcnn_resnet50_fpn: Visualization utilities