Operators

torchvision.ops implements operators, losses, and layers that are specific to computer vision.

Note

All operators have native support for TorchScript.

Detection and Segmentation Operators

The operators below perform the pre-processing and post-processing required in object detection and segmentation models.

batched_nms(boxes, scores, idxs, iou_threshold)

Performs non-maximum suppression in a batched fashion.

masks_to_boxes(masks)

Compute the bounding boxes around the provided masks.
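As a rough illustration of what this operator computes, here is a plain-Python sketch for a single 2-D mask. It assumes the box is defined by the minimum and maximum indices of the non-zero pixels; mask_to_box is a hypothetical helper name, not part of torchvision.

```python
def mask_to_box(mask):
    """Return (x1, y1, x2, y2) enclosing the non-zero pixels of a 2-D mask.

    `mask` is a list of rows; each row is a list of 0/1 values.
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        raise ValueError("mask contains no non-zero pixels")
    return (min(xs), min(ys), max(xs), max(ys))


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
box = mask_to_box(mask)
print(box)  # (1, 1, 3, 2)
```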

nms(boxes, scores, iou_threshold)

Performs non-maximum suppression (NMS) on the boxes according to their intersection-over-union (IoU).
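The greedy procedure behind NMS can be sketched in plain Python. This is a reference illustration only, not torchvision's implementation; iou and greedy_nms are hypothetical helper names, and boxes are assumed to be in (x1, y1, x2, y2) format.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, in decreasing order of score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any
        # higher-scoring box that was already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores, iou_threshold=0.5)
print(kept)  # [0, 2]
```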

roi_align(input, boxes, output_size[, ...])

Performs Region of Interest (RoI) Align operator with average pooling, as described in Mask R-CNN.

roi_pool(input, boxes, output_size[, ...])

Performs the Region of Interest (RoI) Pool operator described in Fast R-CNN.

ps_roi_align(input, boxes, output_size[, ...])

Performs Position-Sensitive Region of Interest (RoI) Align operator mentioned in Light-Head R-CNN.

ps_roi_pool(input, boxes, output_size[, ...])

Performs the Position-Sensitive Region of Interest (RoI) Pool operator described in R-FCN.

FeaturePyramidNetwork(in_channels_list, ...)

Module that adds an FPN on top of a set of feature maps.

MultiScaleRoIAlign(featmap_names, ...[, ...])

Multi-scale RoIAlign pooling, which is useful for detection with or without FPN.

RoIAlign(output_size, spatial_scale, ...[, ...])

See roi_align().

RoIPool(output_size, spatial_scale)

See roi_pool().

PSRoIAlign(output_size, spatial_scale, ...)

See ps_roi_align().

PSRoIPool(output_size, spatial_scale)

See ps_roi_pool().

Box Operators

These utility functions perform various operations on bounding boxes.

box_area(boxes)

Computes the area of a set of bounding boxes, which are specified by their (x1, y1, x2, y2) coordinates.

box_convert(boxes, in_fmt, out_fmt)

Converts torch.Tensor boxes from a given in_fmt to out_fmt.
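To make the conversion concrete, here is a plain-Python sketch for two common box layouts, corner format (x1, y1, x2, y2) and center format (cx, cy, w, h). The helper names are hypothetical, and the sketch handles one box at a time rather than a tensor of boxes.

```python
def xyxy_to_cxcywh(box):
    """Corner format (x1, y1, x2, y2) -> center format (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def cxcywh_to_xyxy(box):
    """Center format (cx, cy, w, h) -> corner format (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


center = xyxy_to_cxcywh((10, 20, 50, 80))
print(center)                  # (30.0, 50.0, 40, 60)
print(cxcywh_to_xyxy(center))  # (10.0, 20.0, 50.0, 80.0)
```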

box_iou(boxes1, boxes2)

Return intersection-over-union (Jaccard index) between two sets of boxes.

clip_boxes_to_image(boxes, size)

Clip boxes so that they lie inside an image of the given size.
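The clipping itself is a coordinate-wise clamp, sketched below in plain Python for a single box. The sketch assumes that size is ordered as (height, width) and that boxes are in (x1, y1, x2, y2) format; clip_box is a hypothetical helper name.

```python
def clip_box(box, size):
    """Clamp an (x1, y1, x2, y2) box to an image of size (height, width)."""
    height, width = size
    x1, y1, x2, y2 = box

    def clamp(value, upper):
        return max(0, min(value, upper))

    # x coordinates are limited by the width, y coordinates by the height.
    return (clamp(x1, width), clamp(y1, height),
            clamp(x2, width), clamp(y2, height))


clipped = clip_box((-5, 10, 120, 90), size=(80, 100))
print(clipped)  # (0, 10, 100, 80)
```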

complete_box_iou(boxes1, boxes2[, eps])

Return complete intersection-over-union (Jaccard index) between two sets of boxes.

distance_box_iou(boxes1, boxes2[, eps])

Return distance intersection-over-union (Jaccard index) between two sets of boxes.

generalized_box_iou(boxes1, boxes2)

Return generalized intersection-over-union (Jaccard index) between two sets of boxes.
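Generalized IoU subtracts from the ordinary IoU the fraction of the smallest enclosing box that is not covered by the union, so it stays informative even when two boxes do not overlap. A plain-Python sketch for one pair of boxes, assuming (x1, y1, x2, y2) format and boxes of positive area (giou is a hypothetical helper name):

```python
def giou(a, b):
    """Generalized IoU of two (x1, y1, x2, y2) boxes with positive area."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])

    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box that encloses both inputs.
    enclose_w = max(a[2], b[2]) - min(a[0], b[0])
    enclose_h = max(a[3], b[3]) - min(a[1], b[1])
    enclose = enclose_w * enclose_h

    return inter / union - (enclose - union) / enclose


print(giou((0, 0, 10, 10), (0, 0, 10, 10)))   # 1.0
print(giou((0, 0, 10, 10), (20, 0, 30, 10)))  # negative: boxes are disjoint
```

The value lies in [-1, 1]: identical boxes score 1, and disjoint boxes score below zero, more negative the farther apart they are.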

remove_small_boxes(boxes, min_size)

Remove every box from boxes that has at least one side shorter than min_size.
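A plain-Python sketch of the filtering rule, assuming boxes in (x1, y1, x2, y2) format and assuming the result is reported as the indices of the boxes that are kept. keep_large_boxes is a hypothetical helper name.

```python
def keep_large_boxes(boxes, min_size):
    """Return indices of boxes whose width and height are both >= min_size."""
    return [
        i
        for i, (x1, y1, x2, y2) in enumerate(boxes)
        if (x2 - x1) >= min_size and (y2 - y1) >= min_size
    ]


boxes = [
    (0, 0, 10, 10),  # 10 x 10, kept
    (0, 0, 2, 10),   # too narrow
    (0, 0, 10, 1),   # too short
    (5, 5, 8, 8),    # 3 x 3, kept (equal to min_size)
]
kept_indices = keep_large_boxes(boxes, min_size=3)
print(kept_indices)  # [0, 3]
```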

Losses

The following vision-specific loss functions are implemented:

complete_box_iou_loss(boxes1, boxes2[, ...])

Gradient-friendly IoU loss with an additional penalty that is non-zero when the boxes do not overlap.

distance_box_iou_loss(boxes1, boxes2[, ...])

Gradient-friendly IoU loss with an additional penalty that is non-zero when the distance between boxes' centers isn't zero.

generalized_box_iou_loss(boxes1, boxes2[, ...])

Gradient-friendly IoU loss with an additional penalty that is non-zero when the boxes do not overlap and scales with the size of their smallest enclosing box.

sigmoid_focal_loss(inputs, targets[, alpha, ...])

Loss used in RetinaNet for dense detection: https://arxiv.org/abs/1708.02002.
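The focal loss from the linked paper scales the cross-entropy of each prediction by (1 - p_t)^gamma, so well-classified examples contribute little. A scalar plain-Python sketch follows; the defaults alpha=0.25 and gamma=2 are the values recommended in the paper, and focal_loss is a hypothetical helper name rather than the library function.

```python
import math


def focal_loss(logit, target, alpha=0.25, gamma=2.0):
    """Focal loss for one binary prediction; `target` is 0 or 1."""
    p = 1.0 / (1.0 + math.exp(-logit))           # sigmoid probability
    p_t = p if target == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


hard = focal_loss(0.0, target=1)   # uncertain prediction (p = 0.5)
easy = focal_loss(4.0, target=1)   # confident, correct prediction
print(hard, easy)
```

The confident prediction yields a loss several orders of magnitude smaller than the uncertain one, which is the down-weighting effect the loss is designed for.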

Layers

TorchVision provides commonly used building blocks as layers:

Conv2dNormActivation(in_channels, ...)

Configurable block used for Convolution2d-Normalization-Activation blocks.

Conv3dNormActivation(in_channels, ...)

Configurable block used for Convolution3d-Normalization-Activation blocks.

DeformConv2d(in_channels, out_channels, ...)

See deform_conv2d().

DropBlock2d(p, block_size[, inplace, eps])

See drop_block2d().

DropBlock3d(p, block_size[, inplace, eps])

See drop_block3d().

FrozenBatchNorm2d(num_features[, eps])

BatchNorm2d where the batch statistics and the affine parameters are fixed.

MLP(in_channels, hidden_channels, ...)

This block implements the multi-layer perceptron (MLP) module.

Permute(dims)

This module returns a view of the tensor input with its dimensions permuted.

SqueezeExcitation(input_channels, ...)

This block implements the Squeeze-and-Excitation block from https://arxiv.org/abs/1709.01507 (see Fig.

StochasticDepth(p, mode)

See stochastic_depth().

deform_conv2d(input, offset, weight[, bias, ...])

Performs Deformable Convolution v2, described in Deformable ConvNets v2: More Deformable, Better Results, if mask is not None; performs Deformable Convolution, described in Deformable Convolutional Networks, if mask is None.

drop_block2d(input, p, block_size[, ...])

Implements DropBlock2d from "DropBlock: A regularization method for convolutional networks" (https://arxiv.org/abs/1810.12890).

drop_block3d(input, p, block_size[, ...])

Implements DropBlock3d from "DropBlock: A regularization method for convolutional networks" (https://arxiv.org/abs/1810.12890).

stochastic_depth(input, p, mode[, training])

Implements Stochastic Depth from "Deep Networks with Stochastic Depth", used for randomly dropping residual branches of residual architectures.
