Operators

torchvision.ops implements operators, losses and layers that are specific to computer vision.

Note

All operators have native support for TorchScript.

Detection and Segmentation Operators

The operators below perform the pre-processing and post-processing required by object detection and segmentation models.

`batched_nms`(boxes, scores, idxs, iou_threshold)	Performs non-maximum suppression in a batched fashion.
`masks_to_boxes`(masks)	Compute the bounding boxes around the provided masks.
`nms`(boxes, scores, iou_threshold)	Performs non-maximum suppression (NMS) on the boxes according to their intersection-over-union (IoU).
`roi_align`(input, boxes, output_size[, ...])	Performs Region of Interest (RoI) Align operator with average pooling, as described in Mask R-CNN.
`roi_pool`(input, boxes, output_size[, ...])	Performs Region of Interest (RoI) Pool operator described in Fast R-CNN.
`ps_roi_align`(input, boxes, output_size[, ...])	Performs Position-Sensitive Region of Interest (RoI) Align operator mentioned in Light-Head R-CNN.
`ps_roi_pool`(input, boxes, output_size[, ...])	Performs Position-Sensitive Region of Interest (RoI) Pool operator described in R-FCN.

`FeaturePyramidNetwork`(in_channels_list, ...)	Module that adds an FPN on top of a set of feature maps.
`MultiScaleRoIAlign`(featmap_names, ...[, ...])	Multi-scale RoIAlign pooling, which is useful for detection with or without FPN.
`RoIAlign`(output_size, spatial_scale, ...[, ...])	See `roi_align()`.
`RoIPool`(output_size, spatial_scale)	See `roi_pool()`.
`PSRoIAlign`(output_size, spatial_scale, ...)	See `ps_roi_align()`.
`PSRoIPool`(output_size, spatial_scale)	See `ps_roi_pool()`.

These utility functions perform various operations on bounding boxes.

`box_area`(boxes)	Computes the area of a set of bounding boxes, which are specified by their (x1, y1, x2, y2) coordinates.
`box_convert`(boxes, in_fmt, out_fmt)	Converts `torch.Tensor` boxes from a given `in_fmt` to `out_fmt`.
`box_iou`(boxes1, boxes2)	Return intersection-over-union (Jaccard index) between two sets of boxes.
`clip_boxes_to_image`(boxes, size)	Clip boxes so that they lie inside an image of size `size`.
`complete_box_iou`(boxes1, boxes2[, eps])	Return complete intersection-over-union (Jaccard index) between two sets of boxes.
`distance_box_iou`(boxes1, boxes2[, eps])	Return distance intersection-over-union (Jaccard index) between two sets of boxes.
`generalized_box_iou`(boxes1, boxes2)	Return generalized intersection-over-union (Jaccard index) between two sets of boxes.
`remove_small_boxes`(boxes, min_size)	Remove every box from `boxes` that has at least one side shorter than `min_size`; returns the indices of the boxes that are kept.

The following vision-specific loss functions are implemented:

`complete_box_iou_loss`(boxes1, boxes2[, ...])	Gradient-friendly IoU loss with additional penalties on the normalized distance between the boxes' centers and on the difference in their aspect ratios.
`distance_box_iou_loss`(boxes1, boxes2[, ...])	Gradient-friendly IoU loss with an additional penalty that is non-zero when the distance between boxes' centers isn't zero.
`generalized_box_iou_loss`(boxes1, boxes2[, ...])	Gradient-friendly IoU loss with an additional penalty that is non-zero when the boxes do not overlap and scales with the size of their smallest enclosing box.
`sigmoid_focal_loss`(inputs, targets[, alpha, ...])	Loss used in RetinaNet for dense detection: https://arxiv.org/abs/1708.02002.

TorchVision provides commonly used building blocks as layers:

`Conv2dNormActivation`(in_channels, ...)	Configurable block used for Convolution2d-Normalization-Activation blocks.
`Conv3dNormActivation`(in_channels, ...)	Configurable block used for Convolution3d-Normalization-Activation blocks.
`DeformConv2d`(in_channels, out_channels, ...)	See `deform_conv2d()`.
`DropBlock2d`(p, block_size[, inplace, eps])	See `drop_block2d()`.
`DropBlock3d`(p, block_size[, inplace, eps])	See `drop_block3d()`.
`FrozenBatchNorm2d`(num_features[, eps])	BatchNorm2d where the batch statistics and the affine parameters are fixed.
`MLP`(in_channels, hidden_channels, ...)	This block implements the multi-layer perceptron (MLP) module.
`Permute`(dims)	This module returns a view of the tensor input with its dimensions permuted.
`SqueezeExcitation`(input_channels, ...)	This block implements the Squeeze-and-Excitation block from https://arxiv.org/abs/1709.01507 (see Fig.
`StochasticDepth`(p, mode)	See `stochastic_depth()`.

`deform_conv2d`(input, offset, weight[, bias, ...])	Performs Deformable Convolution v2, described in "Deformable ConvNets v2: More Deformable, Better Results", if `mask` is not `None`; performs Deformable Convolution, described in "Deformable Convolutional Networks", if `mask` is `None`.
`drop_block2d`(input, p, block_size[, ...])	Implements DropBlock2d from "DropBlock: A regularization method for convolutional networks" <https://arxiv.org/abs/1810.12890>.
`drop_block3d`(input, p, block_size[, ...])	Implements DropBlock3d from "DropBlock: A regularization method for convolutional networks" <https://arxiv.org/abs/1810.12890>.
`stochastic_depth`(input, p, mode[, training])	Implements Stochastic Depth from "Deep Networks with Stochastic Depth", used to randomly drop the residual branches of residual architectures.