MultiScaleRoIAlign

class torchvision.ops.MultiScaleRoIAlign(featmap_names: List[str], output_size: Union[int, Tuple[int], List[int]], sampling_ratio: int, *, canonical_scale: int = 224, canonical_level: int = 4)

Multi-scale RoIAlign pooling, which is useful for detection with or without FPN.

It infers the scale of the pooling via the heuristic specified in eq. 1 of the Feature Pyramid Network paper. The keyword-only parameters canonical_scale and canonical_level correspond respectively to 224 and k0=4 in eq. 1, and have the following meaning: canonical_level is the target level of the pyramid from which to pool a region of interest with w x h = canonical_scale x canonical_scale.
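Concretely, eq. 1 assigns a box of size w x h to pyramid level k = floor(k0 + log2(sqrt(w * h) / canonical_scale)), clamped to the levels that are actually available. A minimal sketch of that heuristic, assuming a 4-level FPN (the function name and the k_min/k_max bounds are illustrative, not part of the torchvision API):

import math

def map_roi_to_level(w, h, canonical_scale=224, canonical_level=4,
                     k_min=2, k_max=5, eps=1e-6):
    # eq. 1 of the FPN paper: k = floor(k0 + log2(sqrt(w * h) / 224)),
    # clamped to the pyramid levels that are actually available
    k = canonical_level + math.log2(math.sqrt(w * h) / canonical_scale + eps)
    return int(min(max(math.floor(k), k_min), k_max))

# a canonical_scale-sized RoI pools from the canonical level;
# halving its side length moves it one level down the pyramid
assert map_roi_to_level(224, 224) == 4
assert map_roi_to_level(112, 112) == 3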

Parameters:
  • featmap_names (List[str]) – the names of the feature maps that will be used for the pooling.

  • output_size (int, Tuple[int] or List[int]) – output size for the pooled region; a single int is expanded to a square (size, size)

  • sampling_ratio (int) – sampling ratio for roi_align, i.e. the number of sampling points per pooled output bin (a value <= 0 means an adaptive number of points is used)

  • canonical_scale (int, optional) – the reference box size used by the level mapper (the 224 in eq. 1). Default: 224.

  • canonical_level (int, optional) – the pyramid level from which a box of size canonical_scale x canonical_scale is pooled (the k0=4 in eq. 1). Default: 4.

Examples:

>>> import torch
>>> import torchvision
>>> from collections import OrderedDict
>>> m = torchvision.ops.MultiScaleRoIAlign(['feat1', 'feat3'], 3, 2)
>>> i = OrderedDict()
>>> i['feat1'] = torch.rand(1, 5, 64, 64)
>>> i['feat2'] = torch.rand(1, 5, 32, 32)  # this feature won't be used in the pooling
>>> i['feat3'] = torch.rand(1, 5, 16, 16)
>>> # create some random bounding boxes in (x1, y1, x2, y2) format
>>> boxes = torch.rand(6, 4) * 256; boxes[:, 2:] += boxes[:, :2]
>>> # original image size, before computing the feature maps
>>> image_sizes = [(512, 512)]
>>> output = m(i, [boxes], image_sizes)
>>> print(output.shape)
torch.Size([6, 5, 3, 3])
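For context, this op is the box pooler used by torchvision's FPN-based detection models, where the feature maps produced by the FPN are named '0' through '3'. A rough sketch of how such a pooler is constructed (the exact defaults may differ between releases):

roi_pooler = torchvision.ops.MultiScaleRoIAlign(
    featmap_names=['0', '1', '2', '3'],
    output_size=7,
    sampling_ratio=2,
)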

forward(x: Dict[str, Tensor], boxes: List[Tensor], image_shapes: List[Tuple[int, int]]) → Tensor
Parameters:
  • x (OrderedDict[Tensor]) – feature maps for each level. They are all assumed to have the same number of channels, but they can have different sizes.

  • boxes (List[Tensor[N, 4]]) – boxes to be used to perform the pooling operation, in (x1, y1, x2, y2) format and in the image reference size, not the feature map reference. The coordinates must satisfy 0 <= x1 < x2 and 0 <= y1 < y2 (see the sanitization sketch after this list).

  • image_shapes (List[Tuple[height, width]]) – the sizes of each image before they have been fed to a CNN to obtain feature maps. This allows us to infer the scale factor for each one of the levels to be pooled.
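Because degenerate boxes (x1 >= x2 or y1 >= y2) are invalid here, it can help to sanitize the boxes before pooling. A small sketch using torchvision's box utilities, assuming a 512 x 512 image (the tensor values are illustrative):

import torch
import torchvision

boxes = torch.tensor([[10., 20., 50., 60.], [0., 0., 600., 600.]])
# clamp coordinates to the image extent, given as (height, width)
boxes = torchvision.ops.clip_boxes_to_image(boxes, (512, 512))
# keep only boxes whose width and height are both at least min_size
keep = torchvision.ops.remove_small_boxes(boxes, min_size=1e-3)
boxes = boxes[keep]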

Returns:

result (Tensor) – the pooled features, of shape (K, C, output_size[0], output_size[1]), where K is the total number of boxes over all images and C is the number of channels of the input feature maps
