MultiScaleRoIAlign

class torchvision.ops.MultiScaleRoIAlign(featmap_names: list[str], output_size: Union[int, tuple[int], list[int]], sampling_ratio: int, *, canonical_scale: int = 224, canonical_level: int = 4)

Multi-scale RoIAlign pooling, which is useful for detection with or without FPN.

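It infers the pooling scale using the heuristic specified in eq. 1 of the Feature Pyramid Network paper. The keyword-only arguments canonical_scale and canonical_level correspond to 224 and k0=4 in eq. 1, respectively, and have the following meaning: canonical_level is the target level of the pyramid from which to pool a region of interest with w x h = canonical_scale x canonical_scale.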
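As a rough illustration of this heuristic, the sketch below evaluates eq. 1, k = floor(k0 + log2(sqrt(w * h) / 224)), in plain Python. The function name `fpn_level`, the clamping bounds `k_min` and `k_max`, and the small epsilon are assumptions made for this example; they are not part of the documented API.

```python
import math


def fpn_level(box, canonical_scale=224, canonical_level=4, k_min=2, k_max=5):
    """Return the pyramid level for one box given as (x1, y1, x2, y2).

    Hypothetical helper: k_min and k_max are assumed bounds for the
    available pyramid levels, used only to clamp the result.
    """
    x1, y1, x2, y2 = box
    # Box scale is the square root of its area, i.e. sqrt(w * h).
    scale = math.sqrt((x2 - x1) * (y2 - y1))
    # Eq. 1 of the FPN paper; the epsilon guards against floating-point
    # error when the scale ratio is an exact power of two.
    k = math.floor(canonical_level + math.log2(scale / canonical_scale) + 1e-6)
    # Clamp to the levels that actually exist in the pyramid.
    return max(k_min, min(k_max, k))


level_canonical = fpn_level((0, 0, 224, 224))
level_half = fpn_level((0, 0, 112, 112))
level_tiny = fpn_level((0, 0, 10, 10))
```

With the defaults, a 224 x 224 box maps to level 4, and halving its side length moves it one level down. Very small or very large boxes are clamped to the lowest or highest assumed level. The sketch assumes boxes with positive area, since a zero-area box has no defined logarithm.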

Parameters:

featmap_names (List[str]) – the names of the feature maps that will be used for the pooling.
output_size (int, Tuple[int] or List[int]) – output size for the pooled region.
sampling_ratio (int) – sampling ratio for ROIAlign.
canonical_scale (int, optional) – canonical_scale for LevelMapper.
canonical_level (int, optional) – canonical_level for LevelMapper.

Examples

>>> from collections import OrderedDict
>>> import torch
>>> import torchvision
>>> m = torchvision.ops.MultiScaleRoIAlign(['feat1', 'feat3'], 3, 2)
>>> i = OrderedDict()
>>> i['feat1'] = torch.rand(1, 5, 64, 64)
>>> i['feat2'] = torch.rand(1, 5, 32, 32)  # this feature won't be used in the pooling
>>> i['feat3'] = torch.rand(1, 5, 16, 16)
>>> # create some random bounding boxes in (x1, y1, x2, y2) format
>>> boxes = torch.rand(6, 4) * 256; boxes[:, 2:] += boxes[:, :2]
>>> # original image size, before computing the feature maps
>>> image_sizes = [(512, 512)]
>>> output = m(i, [boxes], image_sizes)
>>> print(output.shape)
torch.Size([6, 5, 3, 3])

forward(x: dict[str, torch.Tensor], boxes: list[torch.Tensor], image_shapes: list[tuple[int, int]]) → Tensor

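Parameters:

x (OrderedDict[Tensor]) – feature maps for each level. They are assumed to all have the same number of channels, but they can have different sizes.
boxes (List[Tensor[N, 4]]) – boxes to be used to perform the pooling operation, in (x1, y1, x2, y2) format, expressed relative to the actual image size rather than the feature map size. The coordinates must satisfy 0 <= x1 < x2 and 0 <= y1 < y2.
image_shapes (List[Tuple[height, width]]) – the size of each image before it is fed to the CNN to obtain the feature maps. This allows the scale factor of each level to be pooled to be inferred.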
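
To make the role of image_shapes concrete, the sketch below shows one way a per-level scale factor could be inferred from a feature map size and the original image size. It assumes each level is downsampled by a power of two; `infer_scale` is a hypothetical helper written for this example, not a torchvision function.

```python
import math


def infer_scale(feature_hw, image_hw):
    """Estimate the downsampling scale of one feature map.

    Assumption: the true scale is a power of two, so the raw ratio
    feature_size / image_size is rounded to the nearest power of two.
    """
    scales = []
    for feat_size, img_size in zip(feature_hw, image_hw):
        approx = feat_size / img_size
        scales.append(2.0 ** round(math.log2(approx)))
    # Height and width are expected to agree; return the height scale.
    return scales[0]


scale_feat1 = infer_scale((64, 64), (512, 512))   # 64 / 512
scale_feat3 = infer_scale((16, 16), (512, 512))   # 16 / 512
scale_padded = infer_scale((38, 50), (600, 800))  # 38 / 600 is not exact
```

For the 512 x 512 image in the example above, the 64 x 64 and 16 x 16 feature maps give scales of 1/8 and 1/32. Rounding in log space keeps the estimate stable when the feature map size was rounded up, as in the 600 x 800 case.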

Returns:

result (Tensor)
