torch.cat() / torch.stack() / concat operations / channel feature merging in FPN-style models

This article covers two commonly used approaches to feature merging: FPN (Feature Pyramid Network) and BiFPN (Bi-directional Feature Pyramid Network). It describes FPN's channel-merging strategy and how BiFPN differs by fusing features through learnable weight parameters. It also compares torch.cat() and torch.stack() for concatenating features in PyTorch, highlighting their requirements on the concatenation dimension and on matching tensor sizes.

Feature merging (concat)


1. Merging methods

(1) FPN

A 1*1 convolution is first used to unify the channel count, the coarser feature map is then upsampled, and the two maps are added element-wise. After this merge, to reduce the aliasing effect, a 3*3 convolution is applied to produce the final feature map of each level. In addition, the output channels of all levels are set to a fixed number, because every level shares the same classifier/regressor, just as in a traditional featurized image pyramid.
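As a concrete illustration, here is a minimal sketch of one FPN top-down merge step in PyTorch. The channel counts, the module name FPNMerge, and the nearest-neighbor upsampling are illustrative assumptions, not the exact FPN implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNMerge(nn.Module):
    # One top-down merge step: 1x1 lateral conv, upsample, add, 3x3 smoothing conv
    def __init__(self, c_lateral=512, c_out=256):
        super().__init__()
        self.lateral_conv = nn.Conv2d(c_lateral, c_out, kernel_size=1)           # 1x1 conv: unify channels
        self.smooth_conv = nn.Conv2d(c_out, c_out, kernel_size=3, padding=1)     # 3x3 conv: reduce aliasing

    def forward(self, c_i, p_top):
        lateral = self.lateral_conv(c_i)                                          # match channel count
        top_down = F.interpolate(p_top, size=lateral.shape[-2:], mode='nearest')  # upsample coarser map
        p_i = lateral + top_down                                                   # element-wise addition
        return self.smooth_conv(p_i)                                               # final per-level feature map

merge = FPNMerge()
c4 = torch.randn(1, 512, 32, 32)   # backbone feature at this level
p5 = torch.randn(1, 256, 16, 16)   # already-merged feature from the level above
p4 = merge(c4, p5)
print(p4.shape)                    # torch.Size([1, 256, 32, 32])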

At the same time, because the feature maps of the different FPN levels have different sizes, the anchor boxes assigned to each level also have different sizes (though the aspect ratios are the same at every level). Likewise, because multiple scales are present, the RoI Pooling layer also has to be configured for different scales (for extracting RoIs). Note that ResNets provide no pre-trained fc layers for these heads: those layers are randomly initialized.
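For illustration, here is a small sketch of how per-level anchor sizes with shared aspect ratios can be generated. The base sizes 32 to 512 for levels P2 to P6 and the ratios {1:2, 1:1, 2:1} follow the FPN paper; the helper function itself is just an assumption for this example:

import math

def anchor_sizes_per_level(base_sizes=(32, 64, 128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    # One base size per pyramid level, the same aspect ratios at every level
    anchors = {}
    for level, size in enumerate(base_sizes, start=2):   # P2..P6
        shapes = []
        for r in ratios:
            w = size * math.sqrt(1.0 / r)                 # keep area ~ size**2 while varying h/w ratio
            h = size * math.sqrt(r)
            shapes.append((round(w, 1), round(h, 1)))
        anchors[f'P{level}'] = shapes
    return anchors

print(anchor_sizes_per_level())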

(2) BiFPN

For feature fusion, the approach commonly used since FPN is to first resize one feature map and then add it to the feature map from another level. The BiFPN authors argue that this is not ideal, because it implicitly assumes the two feature maps deserve equal weight. Since their contributions to the output should in general differ, a natural idea is to introduce learnable weight parameters w so that the importance of each input is learned automatically.
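A minimal sketch of the "fast normalized fusion" idea from the BiFPN/EfficientDet paper: each input gets a learnable scalar weight, the weights are kept non-negative with ReLU and then normalized to sum to roughly one. The module name WeightedFusion and the channel count are illustrative assumptions:

import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    # Fuse N equally-shaped feature maps with learnable, normalized, non-negative weights
    def __init__(self, num_inputs=2, eps=1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))   # one learnable scalar per input
        self.eps = eps

    def forward(self, feats):
        w = F.relu(self.weights)                               # keep weights non-negative
        w = w / (w.sum() + self.eps)                           # fast normalized fusion: weights sum to ~1
        return sum(wi * fi for wi, fi in zip(w, feats))

fuse = WeightedFusion(num_inputs=2)
p4_in = torch.randn(1, 64, 32, 32)
p5_up = torch.randn(1, 64, 32, 32)                             # already resized to the same spatial size
out = fuse([p4_in, p5_up])
print(out.shape)                                               # torch.Size([1, 64, 32, 32])

In the actual BiFPN the fused map is additionally passed through a (depthwise-separable) convolution; the sketch above only shows the weighted sum.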

2. The torch.cat() function

torch.cat((A, B), dim=0) accepts a tuple of two (or more) tensors and concatenates them row-wise (along dim 0), so the two (or more) tensors must have the same number of columns.

torch.cat((A, B), dim=1) concatenates column-wise (along dim 1), so the two tensors must have the same number of rows.
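A short example of both cases (the tensor shapes are arbitrary):

import torch

A = torch.ones(2, 3)
B = torch.zeros(4, 3)
print(torch.cat((A, B), dim=0).shape)   # torch.Size([6, 3]): rows stacked, column count must match

C = torch.ones(2, 3)
D = torch.zeros(2, 5)
print(torch.cat((C, D), dim=1).shape)   # torch.Size([2, 8]): columns appended, row count must match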


In deep learning on images, the common formats are the 3-channel RGB color image and the single-channel grayscale image. The tensor size is c*h*w, i.e. channels * image height * image width. When using torch.cat to concatenate two images, the spatial sizes generally have to match while the channel counts may differ; that is, h and w must be the same, but c can differ. (If we call cat((A, B)) directly, the merge happens along dim 0 by default, which is the channel dimension here.)
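For example (the 224x224 size is just an illustrative choice):

import torch

rgb  = torch.randn(3, 224, 224)   # c*h*w color image
gray = torch.randn(1, 224, 224)   # same h and w, different channel count

merged = torch.cat((rgb, gray))   # dim defaults to 0, the channel dimension here
print(merged.shape)               # torch.Size([4, 224, 224])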


Summary: when using torch.cat((A, B), dim), every dimension other than the concatenation dimension dim must have the same size; only then can the tensors be aligned.

3. The torch.stack() function

torch.stack() likewise takes a list of tensors and a dimension as arguments. The difference from cat is that torch.stack() requires all input tensors to have exactly the same size, and the resulting tensor has one more dimension than the inputs; the extra dimension is the stacking dimension, and its size equals the number of input tensors.
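A short example contrasting stack with cat (shapes chosen arbitrarily):

import torch

A = torch.ones(3, 4)
B = torch.zeros(3, 4)                     # must have exactly the same size as A

print(torch.stack((A, B), dim=0).shape)   # torch.Size([2, 3, 4]): new leading dimension of size 2
print(torch.stack((A, B), dim=1).shape)   # torch.Size([3, 2, 4]): new dimension inserted at position 1
print(torch.cat((A, B), dim=0).shape)     # torch.Size([6, 4]): cat, by contrast, keeps the number of dimensions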


A reader comment asks: please substitute BiFPN into the following code (the YOLOP network definition):

import torch
from torch import tensor
import torch.nn as nn
import sys,os
import math
import sys
sys.path.append(os.getcwd())
#sys.path.append("lib/models")
#sys.path.append("lib/utils")
#sys.path.append("/workspace/wh/projects/DaChuang")
from lib.utils import initialize_weights
# from lib.models.common2 import DepthSeperabelConv2d as Conv
# from lib.models.common2 import SPP, Bottleneck, BottleneckCSP, Focus, Concat, Detect
from lib.models.common import Conv, SPP, Bottleneck, BottleneckCSP, Focus, Concat, Detect, SharpenConv
from torch.nn import Upsample
from lib.utils import check_anchor_order
from lib.core.evaluate import SegmentationMetric
from lib.utils.utils import time_synchronized

# The lane line and the driving area segment branches without share information with each other and without link
YOLOP = [
    [24, 33, 42],   #Det_out_idx, Da_Segout_idx, LL_Segout_idx
    [ -1, Focus, [3, 32, 3]],                    #0
    [ -1, Conv, [32, 64, 3, 2]],                 #1
    [ -1, BottleneckCSP, [64, 64, 1]],           #2
    [ -1, Conv, [64, 128, 3, 2]],                #3
    [ -1, BottleneckCSP, [128, 128, 3]],         #4
    [ -1, Conv, [128, 256, 3, 2]],               #5
    [ -1, BottleneckCSP, [256, 256, 3]],         #6
    [ -1, Conv, [256, 512, 3, 2]],               #7
    [ -1, SPP, [512, 512, [5, 9, 13]]],          #8
    [ -1, BottleneckCSP, [512, 512, 1, False]],  #9
    [ -1, Conv,[512, 256, 1, 1]],                #10
    [ -1, Upsample, [None, 2, 'nearest']],       #11
    [ [-1, 6], Concat, [1]],                     #12
    [ -1, BottleneckCSP, [512, 256, 1, False]],  #13
    [ -1, Conv, [256, 128, 1, 1]],               #14
    [ -1, Upsample, [None, 2, 'nearest']],       #15
    [ [-1,4], Concat, [1]],                      #16  #Encoder
    [ -1, BottleneckCSP, [256, 128, 1, False]],  #17
    [ -1, Conv, [128, 128, 3, 2]],               #18
    [ [-1, 14], Concat, [1]],                    #19
    [ -1, BottleneckCSP, [256, 256, 1, False]],  #20
    [ -1, Conv, [256, 256, 3, 2]],               #21
    [ [-1, 10], Concat, [1]],                    #22
    [ -1, BottleneckCSP, [512, 512, 1, False]],  #23
    [ [17, 20, 23], Detect, [1, [[3,9,5,11,4,20], [7,18,6,39,12,31], [19,50,38,81,68,157]], [128, 256, 512]]], #Detection head 24
    [ 16, Conv, [256, 128, 3, 1]],               #25
    [ -1, Upsample, [None, 2, 'nearest']],       #26
    [ -1, BottleneckCSP, [128, 64, 1, False]],   #27
    [ -1, Conv, [64, 32, 3, 1]],                 #28
    [ -1, Upsample, [None, 2, 'nearest']],       #29
    [ -1, Conv, [32, 16, 3, 1]],                 #30
    [ -1, BottleneckCSP, [16, 8, 1, False]],     #31
    [ -1, Upsample, [None, 2, 'nearest']],       #32
    [ -1, Conv, [8, 2, 3, 1]],                   #33 Driving area segmentation head
    [ 16, Conv, [256, 128, 3, 1]],               #34
    [ -1, Upsample, [None, 2, 'nearest']],       #35
    [ -1, BottleneckCSP, [128, 64, 1, False]],   #36
    [ -1, Conv, [64, 32, 3, 1]],                 #37
    [ -1, Upsample, [None, 2, 'nearest']],       #38
    [ -1, Conv, [32, 16, 3, 1]],                 #39
    [ -1, BottleneckCSP, [16, 8, 1, False]],     #40
    [ -1, Upsample, [None, 2, 'nearest']],       #41
    [ -1, Conv, [8, 2, 3, 1]]                    #42 Lane line segmentation head
]

class MCnet(nn.Module):
    def __init__(self, block_cfg, **kwargs):
        super(MCnet, self).__init__()
        layers, save = [], []
        self.nc = 1
        self.detector_index = -1
        self.det_out_idx = block_cfg[0][0]
        self.seg_out_idx = block_cfg[0][1:]

        # Build model
        for i, (from_, block, args) in enumerate(block_cfg[1:]):
            block = eval(block) if isinstance(block, str) else block  # eval strings
            if block is Detect:
                self.detector_index = i
            block_ = block(*args)
            block_.index, block_.from_ = i, from_
            layers.append(block_)
            save.extend(x % i for x in ([from_] if isinstance(from_, int) else from_) if x != -1)  # append to savelist
        assert self.detector_index == block_cfg[0][0]

        self.model, self.save = nn.Sequential(*layers), sorted(save)
        self.names = [str(i) for i in range(self.nc)]

        # set stride, anchor for detector
        Detector = self.model[self.detector_index]  # detector
        if isinstance(Detector, Detect):
            s = 128  # 2x min stride
            # for x in self.forward(torch.zeros(1, 3, s, s)):
            #     print(x.shape)
            with torch.no_grad():
                model_out = self.forward(torch.zeros(1, 3, s, s))
                detects, _, _ = model_out
                Detector.stride = torch.tensor([s / x.shape[-2] for x in detects])  # forward
            # print("stride" + str(Detector.stride))
            Detector.anchors /= Detector.stride.view(-1, 1, 1)  # Set the anchors for the corresponding scale
            check_anchor_order(Detector)
            self.stride = Detector.stride
            self._initialize_biases()

        initialize_weights(self)

    def forward(self, x):
        cache = []
        out = []
        det_out = None
        Da_fmap = []
        LL_fmap = []
        for i, block in enumerate(self.model):
            if block.from_ != -1:
                x = cache[block.from_] if isinstance(block.from_, int) else [x if j == -1 else cache[j] for j in block.from_]  # calculate concat detect
            x = block(x)
            if i in self.seg_out_idx:  # save driving area segment result
                m = nn.Sigmoid()
                out.append(m(x))
            if i == self.detector_index:
                det_out = x
            cache.append(x if block.index in self.save else None)
        out.insert(0, det_out)
        return out

    def _initialize_biases(self, cf=None):  # initialize biases into Detect(), cf is class frequency
        # https://arxiv.org/abs/1708.02002 section 3.3
        # cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.
        # m = self.model[-1]  # Detect() module
        m = self.model[self.detector_index]  # Detect() module
        for mi, s in zip(m.m, m.stride):  # from
            b = mi.bias.view(m.na, -1)  # conv.bias(255) to (3,85)
            b.data[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
            b.data[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum())  # cls
            mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)

def get_net(cfg, **kwargs):
    m_block_cfg = YOLOP
    model = MCnet(m_block_cfg, **kwargs)
    return model

if __name__ == "__main__":
    from torch.utils.tensorboard import SummaryWriter
    model = get_net(False)
    input_ = torch.randn((1, 3, 256, 256))
    gt_ = torch.rand((1, 2, 256, 256))
    metric = SegmentationMetric(2)
    model_out, SAD_out = model(input_)
    detects, dring_area_seg, lane_line_seg = model_out
    Da_fmap, LL_fmap = SAD_out
    for det in detects:
        print(det.shape)
    print(dring_area_seg.shape)
    print(lane_line_seg.shape)
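As a rough answer to that question, here is a minimal sketch of how the plain Concat used at layers 12, 16, 19 and 22 of the block list could be replaced by a BiFPN-style weighted concatenation. The module name BiFPN_Concat and its argument layout are assumptions for illustration; it only adds learnable, normalized weights on top of the ordinary concatenation and is not the full bidirectional BiFPN topology:

import torch
import torch.nn as nn

class BiFPN_Concat(nn.Module):
    # Concat with learnable per-input weights (fast normalized fusion), intended as a drop-in for Concat
    def __init__(self, num_inputs=2, dimension=1, eps=1e-4):
        super().__init__()
        self.d = dimension
        self.eps = eps
        self.w = nn.Parameter(torch.ones(num_inputs))    # one learnable weight per input branch

    def forward(self, x):                                 # x is a list of feature maps
        w = torch.relu(self.w)                            # keep weights non-negative
        w = w / (w.sum() + self.eps)                      # normalize so the weights sum to ~1
        return torch.cat([w[i] * x[i] for i in range(len(x))], dim=self.d)

With this module, one could for example change an entry such as [ [-1, 6], Concat, [1]] into [ [-1, 6], BiFPN_Concat, [2, 1]] (two inputs, concatenation along dim 1). Because the weighted concat does not change tensor shapes, the channel counts of the surrounding BottleneckCSP blocks can stay as they are.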