retinanet_resnet50_fpn

torchvision.models.detection.retinanet_resnet50_fpn(*, weights: Optional[torchvision.models.detection.retinanet.RetinaNet_ResNet50_FPN_Weights] = None, progress: bool = True, num_classes: Optional[int] = None, weights_backbone: Optional[torchvision.models.resnet.ResNet50_Weights] = ResNet50_Weights.IMAGENET1K_V1, trainable_backbone_layers: Optional[int] = None, **kwargs: Any) → torchvision.models.detection.retinanet.RetinaNet

Constructs a RetinaNet model with a ResNet-50-FPN backbone.
Warning
The detection module is in Beta stage, and backward compatibility is not guaranteed.
Reference: Focal Loss for Dense Object Detection.
The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each image, with values in the 0-1 range. Different images can have different sizes.

The behavior of the model changes depending on whether it is in training or evaluation mode.
During training, the model expects both the input tensors and targets (a list of dictionaries, one per image), each containing:
boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.
labels (Int64Tensor[N]): the class label for each ground-truth box
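As a rough illustration of the target format described above, the sketch below builds one target dictionary and checks the box constraints before it is handed to the model. The helper name make_target is hypothetical and not part of torchvision; it relies only on standard torch tensor operations.

```python
import torch

def make_target(boxes, labels, height, width):
    """Build one target dict (hypothetical helper, not a torchvision API)."""
    boxes = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4)
    labels = torch.as_tensor(labels, dtype=torch.int64)
    if boxes.shape[0] != labels.shape[0]:
        raise ValueError("need exactly one label per box")
    x1, y1, x2, y2 = boxes.unbind(dim=1)
    # Enforce 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H for every row.
    valid = (
        (0 <= x1) & (x1 < x2) & (x2 <= width)
        & (0 <= y1) & (y1 < y2) & (y2 <= height)
    )
    if not bool(valid.all()):
        bad_rows = torch.nonzero(~valid).flatten().tolist()
        raise ValueError(f"invalid boxes at rows {bad_rows}")
    return {"boxes": boxes, "labels": labels}

# One ground-truth box of class 1 in a 300 x 400 (H x W) image.
target = make_target([[10, 20, 200, 150]], [1], height=300, width=400)
```

Checking the constraints up front tends to give a clearer error than a failure deep inside the loss computation.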
The model returns a Dict[Tensor] during training, containing the classification and regression losses.

During inference, the model requires only the input tensors, and returns the post-processed predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as follows, where N is the number of detections:
boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.
labels (Int64Tensor[N]): the predicted labels for each detection
scores (Tensor[N]): the scores of each detection
For more details on the output, you may refer to Instance segmentation models.
Example:
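>>> import torch
>>> import torchvision
>>> from torchvision.models.detection import RetinaNet_ResNet50_FPN_Weights
>>> model = torchvision.models.detection.retinanet_resnet50_fpn(weights=RetinaNet_ResNet50_FPN_Weights.DEFAULT)
>>> model.eval()
>>> x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
>>> predictions = model(x)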
Parameters
weights (RetinaNet_ResNet50_FPN_Weights, optional) – The pretrained weights to use. See RetinaNet_ResNet50_FPN_Weights below for more details and possible values. By default, no pretrained weights are used.
progress (bool) – If True, displays a progress bar of the download to stderr. Default is True.
num_classes (int, optional) – number of output classes of the model (including the background)
weights_backbone (ResNet50_Weights, optional) – The pretrained weights for the backbone.
trainable_backbone_layers (int, optional) – number of trainable (not frozen) layers starting from the final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable. If None is passed (the default), this value is set to 3.
**kwargs – parameters passed to the torchvision.models.detection.RetinaNet base class. Please refer to the source code for more details about this class.
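To make the trainable_backbone_layers count concrete, the sketch below lists which ResNet stages would remain trainable for a given value. It assumes the selection order layer4, layer3, layer2, layer1, conv1 (final block first), which appears to match torchvision's ResNet-FPN backbone helper but should be treated as an assumption; the function name trainable_stages is hypothetical and is not a torchvision API.

```python
def trainable_stages(trainable_backbone_layers=None, default=3):
    """Return the backbone stages left unfrozen (illustrative only)."""
    # Assumed order: counting starts from the final block.
    order = ["layer4", "layer3", "layer2", "layer1", "conv1"]
    n = default if trainable_backbone_layers is None else trainable_backbone_layers
    if not 0 <= n <= 5:
        raise ValueError("trainable_backbone_layers must be between 0 and 5")
    return order[:n]

print(trainable_stages())    # default of 3
print(trainable_stages(0))   # fully frozen backbone
print(trainable_stages(5))   # every stage trainable
```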

class torchvision.models.detection.RetinaNet_ResNet50_FPN_Weights(value)

The model builder above accepts the following values as the weights parameter. RetinaNet_ResNet50_FPN_Weights.DEFAULT is equivalent to RetinaNet_ResNet50_FPN_Weights.COCO_V1. You can also use strings, e.g. weights='DEFAULT' or weights='COCO_V1'.

RetinaNet_ResNet50_FPN_Weights.COCO_V1:
These weights were produced by following a training recipe similar to the one in the paper. Also available as RetinaNet_ResNet50_FPN_Weights.DEFAULT.

box_map (on COCO-val2017): 36.4
categories: __background__, person, bicycle, … (88 omitted)
min_size: height=1, width=1
num_params: 34014999
recipe
The inference transforms are available at RetinaNet_ResNet50_FPN_Weights.COCO_V1.transforms and perform the following preprocessing operations: Accepts PIL.Image, batched (B, C, H, W) and single (C, H, W) image torch.Tensor objects. The images are rescaled to [0.0, 1.0].