class torchvision.ops.MultiScaleRoIAlign(featmap_names: List[str], output_size: Union[int, Tuple[int], List[int]], sampling_ratio: int, *, canonical_scale: int = 224, canonical_level: int = 4)[source]

Multi-scale RoIAlign pooling, which is useful for detection with or without FPN.

It infers the scale of the pooling via the heuristics specified in eq. 1 of the Feature Pyramid Network paper. They keyword-only parameters canonical_scale and canonical_level correspond respectively to 224 and k0=4 in eq. 1, and have the following meaning: canonical_level is the target level of the pyramid from which to pool a region of interest with w x h = canonical_scale x canonical_scale.

  • featmap_names (List[str]) – the names of the feature maps that will be used for the pooling.

  • output_size (List[Tuple[int, int]] or List[int]) – output size for the pooled region

  • sampling_ratio (int) – sampling ratio for ROIAlign

  • canonical_scale (int, optional) – canonical_scale for LevelMapper

  • canonical_level (int, optional) – canonical_level for LevelMapper


>>> m = torchvision.ops.MultiScaleRoIAlign(['feat1', 'feat3'], 3, 2)
>>> i = OrderedDict()
>>> i['feat1'] = torch.rand(1, 5, 64, 64)
>>> i['feat2'] = torch.rand(1, 5, 32, 32)  # this feature won't be used in the pooling
>>> i['feat3'] = torch.rand(1, 5, 16, 16)
>>> # create some random bounding boxes
>>> boxes = torch.rand(6, 4) * 256; boxes[:, 2:] += boxes[:, :2]
>>> # original image size, before computing the feature maps
>>> image_sizes = [(512, 512)]
>>> output = m(i, [boxes], image_sizes)
>>> print(output.shape)
>>> torch.Size([6, 5, 3, 3])
forward(x: Dict[str, Tensor], boxes: List[Tensor], image_shapes: List[Tuple[int, int]]) Tensor[source]
  • x (OrderedDict[Tensor]) – feature maps for each level. They are assumed to have all the same number of channels, but they can have different sizes.

  • boxes (List[Tensor[N, 4]]) – boxes to be used to perform the pooling operation, in (x1, y1, x2, y2) format and in the image reference size, not the feature map reference. The coordinate must satisfy 0 <= x1 < x2 and 0 <= y1 < y2.

  • image_shapes (List[Tuple[height, width]]) – the sizes of each image before they have been fed to a CNN to obtain feature maps. This allows us to infer the scale factor for each one of the levels to be pooled.


result (Tensor)


Access comprehensive developer documentation for PyTorch

View Docs


Get in-depth tutorials for beginners and advanced developers

View Tutorials


Find development resources and get your questions answered

View Resources