Default boxes and aspect ratios
We associate a set of default bounding boxes with each feature map cell, for multiple feature maps at the top of the network. The default boxes tile the feature map in a convolutional manner, so that the position of each box relative to its corresponding cell is fixed. At each feature map cell (each position on a feature map is called a cell), we predict the offsets relative to the default box shapes in the cell, as well as the per-class scores that indicate the presence of a class instance in each of those boxes. Specifically, for each box out of $k$ at a given location, we compute $c$ class scores and the 4 offsets relative to the original default box shape. This results in a total of $(c+4)k$ filters that are applied around each location in the feature map, yielding $(c+4)kmn$ outputs for an $m \times n$ feature map.
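To make the $(c+4)k$ and $(c+4)kmn$ counts concrete, here is a minimal sketch. The function names are hypothetical, the example values (21 classes, $k = 4$, a $38 \times 38$ map) are only illustrative, and the per-box channel layout (scores first, then offsets) is an assumption; many implementations use two separate predictor layers for scores and offsets instead.

```python
def ssd_head_sizes(num_classes: int, boxes_per_cell: int, m: int, n: int) -> tuple:
    """Return (filters per location, total outputs) for one m x n feature map.

    Each of the k default boxes at a cell needs c class scores plus 4 offsets,
    so the predictor needs (c + 4) * k output channels per location.
    """
    filters = (num_classes + 4) * boxes_per_cell
    return filters, filters * m * n


def split_cell_output(cell_output, num_classes: int, boxes_per_cell: int):
    """Split the (c + 4) * k values predicted at one cell into per-box
    (class_scores, offsets) pairs.

    Assumed layout (hypothetical): box by box, each box holding its c scores
    followed by its 4 offsets.
    """
    stride = num_classes + 4
    if len(cell_output) != stride * boxes_per_cell:
        raise ValueError("expected (c + 4) * k values for one cell")
    boxes = []
    for b in range(boxes_per_cell):
        chunk = cell_output[b * stride:(b + 1) * stride]
        boxes.append((chunk[:num_classes], chunk[num_classes:]))
    return boxes


if __name__ == "__main__":
    # Illustrative: 21 classes (20 + background), k = 4, a 38 x 38 feature map.
    print(ssd_head_sizes(21, 4, 38, 38))  # (100, 144400)
```

The output count grows with both the number of default boxes per cell and the map resolution, which is why the finest feature maps dominate the total number of predictions.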
For an illustration of default boxes, please refer to Fig. 1. Our default boxes are similar to the anchor boxes used in Faster R-CNN [2]; however, we apply them to several feature maps of different resolutions. Allowing different default box shapes in several feature maps lets us efficiently discretize the space of possible output box shapes.
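The sketch below generates default boxes on several feature maps of different resolutions. It centers each box on its cell at $((j+0.5)/n, (i+0.5)/m)$ in normalized coordinates, so each box's position relative to its cell is fixed. It sets width and height from a per-map scale $s$ and an aspect ratio $a$ as $w = s\sqrt{a}$, $h = s/\sqrt{a}$. The map sizes, scales, and aspect ratios used below are hypothetical values chosen for illustration, not the paper's configuration.

```python
import math


def default_boxes(feature_map_sizes, scales, aspect_ratios):
    """Generate default boxes (cx, cy, w, h) in normalized [0, 1] coordinates.

    feature_map_sizes: list of (m, n) = (rows, cols), one entry per feature map.
    scales: one box scale per feature map (hypothetical values in the example).
    aspect_ratios: width/height ratios, shared by every feature map here.
    """
    boxes = []
    for (m, n), s in zip(feature_map_sizes, scales):
        for i in range(m):          # row index of the cell
            for j in range(n):      # column index of the cell
                # The box center is tied to its cell, so boxes tile the map
                # in a convolutional manner.
                cx = (j + 0.5) / n
                cy = (i + 0.5) / m
                for ar in aspect_ratios:
                    boxes.append((cx, cy, s * math.sqrt(ar), s / math.sqrt(ar)))
    return boxes


if __name__ == "__main__":
    # Coarser maps get larger scales, so they cover larger objects.
    boxes = default_boxes([(4, 4), (2, 2), (1, 1)], [0.2, 0.5, 0.9], [1.0, 2.0, 0.5])
    print(len(boxes))   # 4*4*3 + 2*2*3 + 1*1*3 = 63
    print(boxes[0])     # (0.125, 0.125, 0.2, 0.2)
```

Using several resolutions with different scales is what lets a small, fixed set of shapes per cell cover a wide range of object sizes.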
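The SSD model does not compute directly with a bounding box's center coordinates, width, and height; instead, it works with the offsets of the bounding box relative to its default box.
An anchor is a prior box and is fixed; what the network learns is the offset relative to that prior box.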
Suppose a default box is represented as $d=(d_{cx}, d_{cy}, d_w, d_h)$ and the corresponding bounding box as $b=(b_{cx}, b_{cy}, b_w, b_h)$, where $cx, cy$ are the center coordinates and $w, h$ are the width and height of the box. The model's bounding-box output $t$ can then be expressed as:
$$t_x = \frac{b_{cx} - d_{cx}}{d_w},\qquad t_y = \frac{b_{cy} - d_{cy}}{d_h},\qquad t_w = \ln\frac{b_w}{d_w},\qquad t_h = \ln\frac{b_h}{d_h}$$
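A minimal sketch of this encoding and its inverse follows. It assumes boxes are plain `(cx, cy, w, h)` tuples and uses $t_x = (b_{cx}-d_{cx})/d_w$, $t_y = (b_{cy}-d_{cy})/d_h$, $t_w = \ln(b_w/d_w)$, $t_h = \ln(b_h/d_h)$. The names `encode` and `decode` are hypothetical and do not refer to any particular library.

```python
import math


def encode(b, d):
    """Offsets t of box b relative to default box d; both are (cx, cy, w, h)."""
    b_cx, b_cy, b_w, b_h = b
    d_cx, d_cy, d_w, d_h = d
    return (
        (b_cx - d_cx) / d_w,   # t_x: center shift in units of the default width
        (b_cy - d_cy) / d_h,   # t_y: center shift in units of the default height
        math.log(b_w / d_w),   # t_w: log of the width ratio
        math.log(b_h / d_h),   # t_h: log of the height ratio
    )


def decode(t, d):
    """Invert encode: recover box (cx, cy, w, h) from offsets t and default box d."""
    t_x, t_y, t_w, t_h = t
    d_cx, d_cy, d_w, d_h = d
    return (
        t_x * d_w + d_cx,
        t_y * d_h + d_cy,
        d_w * math.exp(t_w),   # exp keeps the decoded width positive
        d_h * math.exp(t_h),
    )


if __name__ == "__main__":
    d = (0.5, 0.5, 0.2, 0.2)      # hypothetical default box
    b = (0.55, 0.45, 0.4, 0.1)    # hypothetical ground-truth box
    t = encode(b, d)              # approximately (0.25, -0.25, ln 2, ln 0.5)
    print(t)
    print(decode(t, d))           # recovers b up to floating-point rounding
```

Normalizing the center shift by the default box size and taking the log of the size ratio makes the targets comparable across small and large default boxes, and the exponential in `decode` guarantees a positive width and height for any predicted $t_w, t_h$.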