torchvision.ops implements operators that are specific to computer vision.


All operators have native support for TorchScript.

batched_nms(boxes, scores, idxs, iou_threshold)

Performs non-maximum suppression in a batched fashion.
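
To make "batched" concrete, here is a minimal pure-Python sketch, assuming that idxs assigns a category to each box and that suppression only happens between boxes of the same category. The names batched_nms_sketch, _nms and _iou are hypothetical helpers for illustration; this is not torchvision's implementation.

```python
def _iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def _nms(indices, boxes, scores, iou_threshold):
    # Greedy NMS restricted to the given subset of box indices.
    keep = []
    for i in sorted(indices, key=lambda i: scores[i], reverse=True):
        if all(_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def batched_nms_sketch(boxes, scores, idxs, iou_threshold):
    # Group box indices by category, run NMS per group, then merge.
    groups = {}
    for i, category in enumerate(idxs):
        groups.setdefault(category, []).append(i)
    keep = []
    for members in groups.values():
        keep.extend(_nms(members, boxes, scores, iou_threshold))
    # Report surviving indices in decreasing order of score.
    return sorted(keep, key=lambda i: scores[i], reverse=True)


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (0, 0, 10, 10)]
kept = batched_nms_sketch(boxes, [0.9, 0.8, 0.7], [0, 1, 0], 0.5)
```

In this example boxes 0 and 1 overlap heavily but belong to different categories, so both survive; box 2 duplicates box 0 within the same category and is dropped.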


box_area(boxes)

Computes the area of a set of bounding boxes, which are specified by their (x1, y1, x2, y2) coordinates.
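
As an illustration of the (x1, y1, x2, y2) convention, the area of each box is width times height. The helper below, box_area_sketch, is a hypothetical pure-Python stand-in that works on plain tuples rather than tensors.

```python
def box_area_sketch(boxes):
    # Each box is (x1, y1, x2, y2); area = (x2 - x1) * (y2 - y1).
    return [(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes]


areas = box_area_sketch([(0, 0, 10, 5), (2, 3, 4, 7)])
```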

box_convert(boxes, in_fmt, out_fmt)

Converts boxes from given in_fmt to out_fmt.
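
The conversions can be sketched in plain Python, assuming the formats of interest are corner coordinates (x1, y1, x2, y2), top-left plus size (x, y, w, h), and center plus size (cx, cy, w, h). The function names below are hypothetical and only illustrate the arithmetic.

```python
def xyxy_to_xywh(box):
    # Corners -> top-left corner plus width and height.
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


def xyxy_to_cxcywh(box):
    # Corners -> center point plus width and height.
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def cxcywh_to_xyxy(box):
    # Center plus size -> corners.
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


converted = xyxy_to_cxcywh((0, 0, 10, 20))
```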

box_iou(boxes1, boxes2)

Return intersection-over-union (Jaccard index) between two sets of boxes.
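
A minimal sketch of the pairwise computation, assuming (x1, y1, x2, y2) boxes: for N boxes in the first set and M in the second, the result is an N x M table of IoU values. iou_sketch and pairwise_iou_sketch are hypothetical names, and the code uses nested lists instead of tensors.

```python
def iou_sketch(a, b):
    # Overlap width and height are clamped at zero for disjoint boxes.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def pairwise_iou_sketch(boxes1, boxes2):
    # Row i, column j holds the IoU of boxes1[i] and boxes2[j].
    return [[iou_sketch(a, b) for b in boxes2] for a in boxes1]


table = pairwise_iou_sketch([(0, 0, 2, 2)], [(1, 1, 3, 3), (0, 0, 2, 2), (5, 5, 6, 6)])
```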

clip_boxes_to_image(boxes, size)

Clip boxes so that they lie inside an image whose dimensions are given by size.
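
Clipping amounts to clamping each coordinate to the image bounds. The sketch below assumes size is given as (height, width) and boxes as (x1, y1, x2, y2); clip_box_sketch is a hypothetical helper.

```python
def clip_box_sketch(box, size):
    height, width = size  # assumed order: (height, width)
    x1, y1, x2, y2 = box

    def clamp(value, upper):
        return min(max(value, 0), upper)

    # x coordinates are bounded by the width, y coordinates by the height.
    return (clamp(x1, width), clamp(y1, height), clamp(x2, width), clamp(y2, height))


clipped = clip_box_sketch((-5, 10, 120, 90), (80, 100))
```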

deform_conv2d(input, offset, weight[, bias, …])

Performs Deformable Convolution v2, described in “Deformable ConvNets v2: More Deformable, Better Results”, if mask is not None; otherwise performs Deformable Convolution, described in “Deformable Convolutional Networks”.

generalized_box_iou(boxes1, boxes2)

Return generalized intersection-over-union (Jaccard index) between two sets of boxes.
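
For a single pair of boxes, generalized IoU is commonly defined as the IoU minus the fraction of the smallest enclosing box that is not covered by the union. The sketch below follows that definition and assumes non-degenerate (x1, y1, x2, y2) boxes; giou_sketch is a hypothetical name.

```python
def giou_sketch(a, b):
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area_a + area_b - inter
    # Smallest axis-aligned box that encloses both inputs.
    hull = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (hull - union) / hull


overlapping = giou_sketch((0, 0, 2, 2), (1, 1, 3, 3))
disjoint = giou_sketch((0, 0, 1, 1), (2, 0, 3, 1))
```

Unlike plain IoU, the value is negative for disjoint boxes and decreases as they move apart.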


masks_to_boxes(masks)

Compute the bounding boxes around the provided masks.
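
For one binary mask, the bounding box can be sketched as the minimum and maximum row and column indices of the non-zero pixels. The code assumes a non-empty mask stored as a list of rows and returns (x1, y1, x2, y2) in pixel indices; mask_to_box_sketch is a hypothetical helper.

```python
def mask_to_box_sketch(mask):
    # Rows that contain at least one set pixel give the y extent.
    ys = [y for y, row in enumerate(mask) if any(row)]
    # Columns of all set pixels give the x extent.
    xs = [x for row in mask for x, value in enumerate(row) if value]
    return (min(xs), min(ys), max(xs), max(ys))


box = mask_to_box_sketch([
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
])
```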

nms(boxes, scores, iou_threshold)

Performs non-maximum suppression (NMS) on the boxes according to their intersection-over-union (IoU).
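
The greedy procedure usually meant by NMS can be sketched as follows: visit boxes in decreasing score order and keep a box only if its IoU with every already-kept box does not exceed the threshold. nms_sketch and _iou are hypothetical pure-Python helpers, not the library's kernel.

```python
def _iou(a, b):
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_sketch(boxes, scores, iou_threshold):
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Discard box i if it overlaps a higher-scoring kept box too much.
        if all(_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep  # indices, highest score first


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept = nms_sketch(boxes, [0.9, 0.8, 0.7], 0.5)
```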

ps_roi_align(input, boxes, output_size[, …])

Performs Position-Sensitive Region of Interest (RoI) Align operator mentioned in Light-Head R-CNN.

ps_roi_pool(input, boxes, output_size[, …])

Performs Position-Sensitive Region of Interest (RoI) Pool operator described in R-FCN.

remove_small_boxes(boxes, min_size)

Remove boxes that have at least one side smaller than min_size.
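
A minimal sketch of the filter, assuming (x1, y1, x2, y2) boxes and that the result is the list of indices of the boxes that survive; remove_small_boxes_sketch is a hypothetical name.

```python
def remove_small_boxes_sketch(boxes, min_size):
    keep = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        # A box survives only if both its width and height reach min_size.
        if (x2 - x1) >= min_size and (y2 - y1) >= min_size:
            keep.append(i)
    return keep


kept = remove_small_boxes_sketch([(0, 0, 10, 10), (0, 0, 1, 10), (0, 0, 5, 0.5)], 2)
```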

roi_align(input, boxes, output_size[, …])

Performs Region of Interest (RoI) Align operator with average pooling, as described in Mask R-CNN.

roi_pool(input, boxes, output_size[, …])

Performs Region of Interest (RoI) Pool operator described in Fast R-CNN.

sigmoid_focal_loss(inputs, targets[, alpha, …])

Computes the focal loss used in RetinaNet for dense detection.
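
For a single logit and a binary target, the focal loss as usually defined down-weights the cross-entropy of well-classified examples by (1 - p_t) ** gamma and optionally balances classes with alpha. The scalar sketch below assumes alpha=0.25 and gamma=2.0 as defaults and treats a negative alpha as "no class weighting"; sigmoid_focal_loss_sketch is a hypothetical helper.

```python
import math


def sigmoid_focal_loss_sketch(logit, target, alpha=0.25, gamma=2.0):
    p = 1.0 / (1.0 + math.exp(-logit))
    # Binary cross-entropy for this single element.
    ce = -(target * math.log(p) + (1 - target) * math.log(1 - p))
    # Probability assigned to the true class.
    p_t = p * target + (1 - p) * (1 - target)
    loss = ce * (1 - p_t) ** gamma
    if alpha >= 0:
        alpha_t = alpha * target + (1 - alpha) * (1 - target)
        loss = alpha_t * loss
    return loss


uncertain = sigmoid_focal_loss_sketch(0.0, 1)
easy = sigmoid_focal_loss_sketch(4.0, 1)
hard = sigmoid_focal_loss_sketch(-4.0, 1)
```

With gamma set to 0 and alpha negative, the sketch reduces to plain binary cross-entropy.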

stochastic_depth(input, p, mode[, training])

Implements the Stochastic Depth from “Deep Networks with Stochastic Depth” used for randomly dropping residual branches of residual architectures.
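
The behaviour can be sketched on a batch stored as a list of rows, assuming that mode "batch" drops or keeps the whole batch together, that mode "row" decides per sample, and that surviving values are rescaled by 1 / (1 - p). stochastic_depth_sketch and its rng parameter are hypothetical and exist only for illustration.

```python
import random


def stochastic_depth_sketch(rows, p, mode, training=True, rng=random):
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if mode not in ("batch", "row"):
        raise ValueError("mode must be 'batch' or 'row'")
    if not training or p == 0.0:
        return [list(row) for row in rows]  # identity outside training
    survival = 1.0 - p
    if mode == "batch":
        flags = [rng.random() < survival] * len(rows)  # one draw for all rows
    else:
        flags = [rng.random() < survival for _ in rows]  # one draw per row
    scale = 1.0 / survival if survival > 0 else 0.0
    return [[v * scale if keep else 0.0 for v in row] for row, keep in zip(rows, flags)]


batch = [[1.0, 2.0], [3.0, 4.0]]
out = stochastic_depth_sketch(batch, 0.5, "row", rng=random.Random(0))
```

Rescaling the survivors keeps the expected value of the output equal to the input, so no adjustment is needed at inference time.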

RoIAlign(output_size, spatial_scale, …)

See roi_align().

PSRoIAlign(output_size, spatial_scale, …)

See ps_roi_align().

RoIPool(output_size, spatial_scale)

See roi_pool().

PSRoIPool(output_size, spatial_scale)

See ps_roi_pool().

DeformConv2d(in_channels, out_channels, …)

See deform_conv2d().

MultiScaleRoIAlign(featmap_names, …)

Multi-scale RoIAlign pooling, which is useful for detection with or without FPN.

FeaturePyramidNetwork(in_channels_list, …)

Module that adds an FPN on top of a set of feature maps.

StochasticDepth(p, mode)

See stochastic_depth().

FrozenBatchNorm2d(num_features, eps, n)

BatchNorm2d where the batch statistics and the affine parameters are fixed.
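
Because nothing is updated, the layer reduces to a fixed per-channel scale and shift. The sketch below shows that arithmetic for the values of one channel, assuming the standard batch-norm formula with stored mean and variance; frozen_bn_channel_sketch and its parameter names are hypothetical.

```python
import math


def frozen_bn_channel_sketch(values, weight, bias, running_mean, running_var, eps=1e-5):
    # Fold the fixed statistics and affine parameters into one scale and shift.
    scale = weight / math.sqrt(running_var + eps)
    shift = bias - running_mean * scale
    return [v * scale + shift for v in values]


normalized = frozen_bn_channel_sketch([3.0, 5.0], weight=2.0, bias=1.0,
                                      running_mean=3.0, running_var=4.0, eps=0.0)
```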

ConvNormActivation(in_channels, …)

Configurable block used for Convolution-Normalization-Activation blocks.

SqueezeExcitation(input_channels, …)

This block implements the Squeeze-and-Excitation block from “Squeeze-and-Excitation Networks”.
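
The idea can be sketched without tensors: average each channel to one number (squeeze), pass the resulting vector through two small linear layers with ReLU and sigmoid (excitation), then rescale each channel by its gate. The code assumes ReLU and sigmoid as the two activations and uses explicit weight lists; squeeze_excitation_sketch and its parameters are hypothetical, not the module's interface.

```python
import math


def squeeze_excitation_sketch(x, w1, b1, w2, b2):
    # x: list of channels, each channel a list of rows (H x W).
    # Squeeze: global average pool per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation, step 1: linear layer followed by ReLU.
    hidden = [max(0.0, sum(w * p for w, p in zip(w_row, pooled)) + b)
              for w_row, b in zip(w1, b1)]
    # Excitation, step 2: linear layer followed by sigmoid, one gate per channel.
    gates = [1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(w_row, hidden)) + b)))
             for w_row, b in zip(w2, b2)]
    # Scale every value of a channel by that channel's gate.
    return [[[v * g for v in row] for row in ch] for ch, g in zip(x, gates)]


x = [[[1.0, 3.0]], [[2.0, 4.0]]]  # 2 channels, each 1 x 2
halved = squeeze_excitation_sketch(x, w1=[[0.0, 0.0]], b1=[0.0],
                                   w2=[[0.0], [0.0]], b2=[0.0, 0.0])
```

With all weights and biases at zero every gate is sigmoid(0) = 0.5, so each channel is simply halved.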

