# torchvision.ops

torchvision.ops implements operators that are specific to computer vision.

Note

These operators currently do not support TorchScript.

torchvision.ops.nms(boxes, scores, iou_threshold)

Performs non-maximum suppression (NMS) on the boxes according to their intersection-over-union (IoU).

NMS iteratively removes lower-scoring boxes that have an IoU greater than iou_threshold with another (higher-scoring) box.

Parameters
• boxes (Tensor[N, 4]) – boxes to perform NMS on. They are expected to be in (x1, y1, x2, y2) format

• scores (Tensor[N]) – scores for each one of the boxes

• iou_threshold (float) – discards all overlapping boxes with IoU > iou_threshold

Returns

keep – int64 tensor with the indices of the elements that have been kept by NMS, sorted in decreasing order of scores

Return type

Tensor
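As an illustration of the greedy procedure described above, the following pure-Python sketch reproduces the documented behavior on plain lists. It is not torchvision's implementation: the helper names `iou` and `greedy_nms` are hypothetical, and the handling of zero-area boxes is an assumption made for this example.

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Assumption for this sketch: degenerate boxes yield an IoU of 0.
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, in decreasing order of score."""
    # Visit boxes from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already kept
        # (higher-scoring) box by more than iou_threshold.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores, iou_threshold=0.5)
```

In this example the first two boxes overlap with an IoU of 81/119 (about 0.68), so the lower-scoring one is dropped at a threshold of 0.5, while the third box overlaps nothing and is kept.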

torchvision.ops.roi_align(input, boxes, output_size, spatial_scale=1.0, sampling_ratio=-1, aligned=False)

Performs the Region of Interest (RoI) Align operator described in Mask R-CNN.

Parameters
• input (Tensor[N, C, H, W]) – input tensor

• boxes (Tensor[K, 5] or List[Tensor[L, 4]]) – the box coordinates in (x1, y1, x2, y2) format where the regions will be taken from. If a single Tensor is passed, then the first column should contain the batch index. If a list of Tensors is passed, then the i-th Tensor holds the boxes for element i of the batch

• output_size (int or Tuple[int, int]) – the size of the output after the cropping is performed, as (height, width)

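• spatial_scale (float) – a scaling factor that maps the box coordinates to the input coordinates. For example, if the boxes are defined on a 224x224 image and the input is a 112x112 feature map, spatial_scale should be 0.5. Default: 1.0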

• sampling_ratio (int) – number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, then exactly sampling_ratio x sampling_ratio grid points are used. If <= 0, then an adaptive number of grid points is used (computed as ceil(roi_width / pooled_w), and likewise for height). Default: -1

• aligned (bool) – If False, use the legacy implementation. If True, shift the box coordinates by -0.5 pixels for better alignment with the two neighboring pixel indices. This version is used in Detectron2

Returns

output (Tensor[K, C, output_size[0], output_size[1]])
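The sampling_ratio rule above can be sketched in a few lines of plain Python. This is only an illustration of the documented formula, not the library's code: the helper name `sampling_grid` is hypothetical, and `roi_height` and `roi_width` are assumed to be the RoI size already expressed in the coordinate system of the input tensor.

```python
import math


def sampling_grid(roi_height, roi_width, output_size, sampling_ratio=-1):
    """Return (points along height, points along width) sampled per output bin."""
    # output_size may be a single int or a (height, width) pair.
    if isinstance(output_size, int):
        pooled_h, pooled_w = output_size, output_size
    else:
        pooled_h, pooled_w = output_size
    if sampling_ratio > 0:
        # Fixed grid: exactly sampling_ratio x sampling_ratio points per bin.
        return sampling_ratio, sampling_ratio
    # Adaptive grid: ceil(roi_size / pooled_size) along each axis.
    return math.ceil(roi_height / pooled_h), math.ceil(roi_width / pooled_w)


# A 14 x 21 RoI pooled to 7 x 7 uses a 2 x 3 grid of points in each bin.
adaptive = sampling_grid(14, 21, output_size=7)
# With sampling_ratio=2 the grid is 2 x 2 regardless of the RoI size.
fixed = sampling_grid(14, 21, output_size=(7, 7), sampling_ratio=2)
```

With the adaptive setting, larger RoIs are sampled more densely, so the cost per RoI grows with its area. A fixed positive sampling_ratio keeps the cost per output bin constant.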

torchvision.ops.roi_pool(input, boxes, output_size, spatial_scale=1.0)

Performs the Region of Interest (RoI) Pool operator described in Fast R-CNN.

Parameters
• input (Tensor[N, C, H, W]) – input tensor

• boxes (Tensor[K, 5] or List[Tensor[L, 4]]) – the box coordinates in (x1, y1, x2, y2) format where the regions will be taken from. If a single Tensor is passed, then the first column should contain the batch index. If a list of Tensors is passed, then the i-th Tensor holds the boxes for element i of the batch

• output_size (int or Tuple[int, int]) – the size of the output after the cropping is performed, as (height, width)

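• spatial_scale (float) – a scaling factor that maps the box coordinates to the input coordinates. For example, if the boxes are defined on a 224x224 image and the input is a 112x112 feature map, spatial_scale should be 0.5. Default: 1.0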

Returns

output (Tensor[K, C, output_size[0], output_size[1]])
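The two accepted layouts for boxes carry the same information. The sketch below shows, on plain Python lists rather than Tensors, how a per-image list of boxes corresponds to the single [K, 5] layout with the batch index in the first column. The helper name `boxes_list_to_rois` is hypothetical and only illustrates the layout; it is not a torchvision function.

```python
def boxes_list_to_rois(boxes_per_image):
    """Flatten per-image (x1, y1, x2, y2) boxes into [batch_index, x1, y1, x2, y2] rows."""
    rois = []
    for batch_index, boxes in enumerate(boxes_per_image):
        for x1, y1, x2, y2 in boxes:
            # The first column records which batch element the box belongs to.
            rois.append([float(batch_index), x1, y1, x2, y2])
    return rois


# Image 0 has one box, image 1 has none, image 2 has two.
per_image = [
    [[0.0, 0.0, 4.0, 4.0]],
    [],
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
]
rois = boxes_list_to_rois(per_image)
```

Here `rois` has K = 3 rows, matching the first dimension of the documented output shape. An image without boxes contributes no rows.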

class torchvision.ops.RoIAlign(output_size, spatial_scale, sampling_ratio, aligned=False)

See roi_align

class torchvision.ops.RoIPool(output_size, spatial_scale)

See roi_pool