Code Walkthrough of the Anchor Box Section in Mu Li's Dive into Deep Learning - CSDN Blog

Article link: https://blog.csdn.net/weixin_44708624/article/details/138624985

This post only walks through the code. I had not read the later material when I wrote it, so in places I can only guess at what the code is for.

First, import the tools we need. PyTorch is used here.

%matplotlib inline
import torch
from d2l import torch as d2l

torch.set_printoptions(2)  # print fewer decimal places

1. Generating anchor boxes

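Next comes the first hard part. This code needs to be worked through carefully, otherwise many of its details stay unclear.
For the formulas, see https://zh-v2.d2l.ai/chapter_computer-vision/anchor.html#subsec-predicting-bounding-boxes-nms
Mu Li's formulas are fine; the code below is simply their normalized form.
Many people do not understand why the normalized w is multiplied by in_height/in_width. Suppose the factor were absent:
the final anchor box would be w * in_width wide and h * in_height tall in pixels, so only the height is scaled by in_height.
With w = w * in_height/in_width, anchor_w = (w*in_height/in_width) * in_width = w * in_height.
Now the pixel width and height are both scaled by in_height, so their ratio equals the normalized w/h, and r is the actual width-to-height ratio of the anchor box.
That said, this depends on the problem. I think leaving out * in_height/in_width would also be acceptable, since the factor changes the anchor's area as well.
In the end, accurate boxes come from training, and the predicted boxes will end up enclosing the object.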
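
To make the argument above concrete, here is a minimal sketch in plain Python (no PyTorch) that converts a normalized anchor back to pixels with and without the in_height/in_width factor. The helper name anchor_pixel_size and the 400 x 800 image size are assumptions chosen for illustration; they do not come from the book's code.

```python
import math

def anchor_pixel_size(s, r, in_height, in_width, rescale=True):
    """Hypothetical helper: pixel (width, height) of an anchor with scale s and ratio r.

    It mirrors the normalized w and h computed in multibox_prior.
    """
    w_norm = s * math.sqrt(r)
    h_norm = s / math.sqrt(r)
    if rescale:
        # The factor under discussion
        w_norm *= in_height / in_width
    # Convert normalized sizes back to pixels
    return w_norm * in_width, h_norm * in_height

# With the factor: pixel width / pixel height equals r
w_px, h_px = anchor_pixel_size(0.5, 2.0, in_height=400, in_width=800)
ratio_with = w_px / h_px

# Without the factor: the ratio is stretched by in_width / in_height
w_px2, h_px2 = anchor_pixel_size(0.5, 2.0, in_height=400, in_width=800, rescale=False)
ratio_without = w_px2 / h_px2

print(ratio_with, ratio_without)
```

On this 400 x 800 example the first ratio comes out at r = 2 and the second at about 4, that is r * in_width / in_height, which is the mismatch the factor removes.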

#@save
def multibox_prior(data, sizes, ratios):
    """Generate anchor boxes with different shapes centered on each pixel."""
    in_height, in_width = data.shape[-2:]
    device, num_sizes, num_ratios = data.device, len(sizes), len(ratios)
    boxes_per_pixel = (num_sizes + num_ratios - 1)
    size_tensor = torch.tensor(sizes, device=device)
    ratio_tensor = torch.tensor(ratios, device=device)

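    # To move the anchor point to the center of a pixel, set an offset.
    # A pixel has height 1 and width 1, so we shift the center by 0.5
    offset_h, offset_w = 0.5, 0.5
    steps_h = 1.0 / in_height  # scaled step along the y axis
    steps_w = 1.0 / in_width  # scaled step along the x axis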

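    # Generate all center points of the anchor boxes
    # shift_y and shift_x hold the y and x coordinate of every center
    # For example, for the four points (1, 0), (2, 0), (1, 1), (2, 1), written as (x, y),
    # shift_y is [[0, 0], [1, 1]] and shift_x is [[1, 2], [1, 2]]; after reshape(-1) each has one entry per center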
    center_h = (torch.arange(in_height, device=device) + offset_h) * steps_h
    center_w = (torch.arange(in_width, device=device) + offset_w) * steps_w
    shift_y, shift_x = torch.meshgrid(center_h, center_w, indexing='ij')
    shift_y, shift_x = shift_y.reshape(-1), shift_x.reshape(-1)

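    # Generate "boxes_per_pixel" heights and widths,
    # later used to build the anchor box corner coordinates (xmin, ymin, xmax, ymax)
    w = torch.cat((size_tensor * torch.sqrt(ratio_tensor[0]),
                   sizes[0] * torch.sqrt(ratio_tensor[1:])))\
                   * in_height / in_width  # handle rectangular inputs: this factor keeps w/h equal to r in pixels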
    h = torch.cat((size_tensor / torch.sqrt(ratio_tensor[0]),
                   sizes[0] / torch.sqrt(ratio_tensor[1:])))
    
    # Divide by 2 to get the half height and half width,
    # since these offsets are added to the center point
    anchor_manipulations = torch.stack((-w, -h, w, h)).T.repeat(
                                        in_height * in_width, 1) / 2

    # Each center point has "boxes_per_pixel" anchor boxes,
    # so build a grid of all anchor centers, each repeated "boxes_per_pixel" times (one per size/ratio combination)
    out_grid = torch.stack([shift_x, shift_y, shift_x, shift_y],
                dim=1).repeat_interleave(boxes_per_pixel, dim=0)
    output = out_grid + anchor_manipulations
    return output.unsqueeze(0)
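
The ordering of the output is easy to lose track of, so here is a minimal pure-Python sketch of what the function above computes for a single pixel. The helpers anchors_at_pixel and flat_index are hypothetical names introduced only for this illustration, and the 4 x 4 input and the sizes and ratios are made-up values. As far as I can tell from the repeat_interleave on the centers and the repeat on the offsets, the k-th anchor of pixel (row, col) sits at position (row * in_width + col) * boxes_per_pixel + k along dimension 1 of the returned tensor.

```python
import math

def anchors_at_pixel(row, col, in_height, in_width, sizes, ratios):
    """Hypothetical helper: anchors of one pixel as (xmin, ymin, xmax, ymax),
    normalized to [0, 1], in the same order multibox_prior uses."""
    # Center of the pixel, shifted by 0.5 and normalized
    cx = (col + 0.5) / in_width
    cy = (row + 0.5) / in_height
    # Every size with the first ratio, then the first size with the remaining ratios
    combos = [(s, ratios[0]) for s in sizes] + [(sizes[0], r) for r in ratios[1:]]
    boxes = []
    for s, r in combos:
        w = s * math.sqrt(r) * in_height / in_width
        h = s / math.sqrt(r)
        # Half width and half height are added to and subtracted from the center
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

def flat_index(row, col, k, in_width, boxes_per_pixel):
    """Hypothetical helper: position of the k-th anchor of pixel (row, col)."""
    return (row * in_width + col) * boxes_per_pixel + k

sizes, ratios = [0.75, 0.5, 0.25], [1, 2, 0.5]
boxes = anchors_at_pixel(1, 2, in_height=4, in_width=4, sizes=sizes, ratios=ratios)
idx = flat_index(1, 2, 0, in_width=4, boxes_per_pixel=len(boxes))
print(len(boxes), idx, boxes[0])
```

Each pixel gets len(sizes) + len(ratios) - 1 anchors, here 5, instead of all 9 size/ratio combinations, which keeps the total number of anchors manageable.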

img = d2l.plt.imread('./catdog.jpg')
h, w = img.shape[:2]

print(h, w)
X = torch.rand