本篇文章将介绍一个新的改进模块——SCConv(小波空间和通道重构卷积),并阐述如何将其应用于YOLOv11中,显著提升模型性能。为了减少YOLOv11模型的空间和通道维度上的冗余,我们引入空间和通道重构卷积。首先,我们将解析SCConv的工作原理,它通过空间重构单元(SRU)和通道重构单元(CRU)减少卷积神经网络中的空间和通道冗余。随后,我们会详细说明如何将该模块与YOLOv11相结合,展示代码实现细节及其使用方法,最终展现这一改进对目标检测效果的积极影响。
1. Spatial and Channel reconstruction Convolution(SCConv)结构介绍
SCConv模块由两个核心部分组成:空间重建单元 (SRU) 和 通道重建单元 (CRU)。它们按照顺序组合使用,首先通过SRU减少空间维度上的冗余,然后通过CRU减少通道维度上的冗余。SCConv可以无缝集成到现有的CNN中,用于替代标准卷积操作(Li_SCConv_Spatial_and_C…)。
1.1. 空间重建单元 (SRU)
SRU的主要目标是减少空间冗余。其工作流程如下:
- 分离操作:SRU通过训练好的参数对输入特征进行加权,分离出包含丰富空间信息的特征和不包含太多信息的冗余特征。
- 通过对特征图使用Group Normalization (GN),提取每个特征图的缩放因子(即γ),γ反映了空间像素的方差,值越大,说明该特征图包含的空间信息越丰富。
- 基于这些缩放因子,SRU将特征图分为两部分:一部分包含丰富空间信息,另一部分则是较少的信息。
- 重建操作:SRU通过交叉重建的方式将分离的特征进行重组,提升信息流。该操作不仅减少了冗余,还通过将富有信息的特征与低信息的特征组合,进一步增强了特征的空间表达能力。
1.2. 通道重建单元 (CRU)
CRU的目标是减少通道维度上的冗余。其流程分为三个步骤:
-
分离 (Split):CRU首先将输入特征图的通道分成两部分,分别包含αC个通道和(1-α)C个通道,然后通过1×1卷积进行压缩,减少计算量。
-
变换 (Transform):分离后的上半部分特征图通过Group-wise Convolution (GWC) 和**Point-wise Convolution (PWC)**进行变换,提取代表性强的高层特征;下半部分通过廉价的1×1卷积提取浅层特征,作为补充。
-
融合 (Fuse):通过全局平均池化(Pooling)和注意力机制,CRU将上半部分的高层特征和下半部分的浅层特征进行加权融合,得到最终的通道精炼特征。这种融合确保了信息在通道维度上的有效传递和冗余的消除。
2. YOLOv11与SCConv的结合
1. 改进C3k2:本文使用SCConv卷积改进C3k2,构建C3k2_SCConv模块,然后使用C3k2_SCConv替换原有的C3k2,这样就可以利用SCConv减少C3k2中的空间和通道的冗余。
2. 在backbone添加SCConv:本文将SCConv卷积添加到SPPF模块之前,减少backbone中的空间和通道的冗余。通过将空间和通道信息分别优化,减少冗余信息,从而提升模型的整体表现
3. Spatial and Channel reconstruction Convolution(SCConv)代码部分
import torch
import torch.nn.functional as F
import torch.nn as nnfrom .conv import Conv
from .block import C2f, C3, Bottleneckclass GroupBatchnorm2d(nn.Module):def __init__(self, c_num: int,group_num: int = 16,eps: float = 1e-10):super(GroupBatchnorm2d, self).__init__()assert c_num >= group_numself.group_num = group_numself.weight = nn.Parameter(torch.randn(c_num, 1, 1))self.bias = nn.Parameter(torch.zeros(c_num, 1, 1))self.eps = epsdef forward(self, x):N, C, H, W = x.size()x = x.view(N, self.group_num, -1)mean = x.mean(dim=2, keepdim=True)std = x.std(dim=2, keepdim=True)x = (x - mean) / (std + self.eps)x = x.view(N, C, H, W)return x * self.weight + self.biasclass SRU(nn.Module):def __init__(self,oup_channels: int,group_num: int = 16,gate_treshold: float = 0.5,torch_gn: bool = False):super().__init__()self.gn = nn.GroupNorm(num_channels=oup_channels, num_groups=group_num) if torch_gn else GroupBatchnorm2d(c_num=oup_channels, group_num=group_num)self.gate_treshold = gate_tresholdself.sigomid = nn.Sigmoid()def forward(self, x):gn_x = self.gn(x)w_gamma = self.gn.weight / torch.sum(self.gn.weight)w_gamma = w_gamma.view(1, -1, 1, 1)reweigts = self.sigomid(gn_x * w_gamma)# Gateinfo_mask = reweigts >= self.gate_tresholdnoninfo_mask = reweigts < self.gate_tresholdx_1 = info_mask * gn_xx_2 = noninfo_mask * gn_xx = self.reconstruct(x_1, x_2)return xdef reconstruct(self, x_1, x_2):x_11, x_12 = torch.split(x_1, x_1.size(1) // 2, dim=1)x_21, x_22 = torch.split(x_2, x_2.size(1) // 2, dim=1)return torch.cat([x_11 + x_22, x_12 + x_21], dim=1)class CRU(nn.Module):'''alpha: 0<alpha<1'''def __init__(self,op_channel: int,alpha: float = 1 / 2,squeeze_radio: int = 2,group_size: int = 2,group_kernel_size: int = 3,):super().__init__()self.up_channel = up_channel = int(alpha * op_channel)self.low_channel = low_channel = op_channel - up_channelself.squeeze1 = nn.Conv2d(up_channel, up_channel // squeeze_radio, kernel_size=1, bias=False)self.squeeze2 = nn.Conv2d(low_channel, low_channel // squeeze_radio, kernel_size=1, bias=False)# upself.GWC = nn.Conv2d(up_channel // squeeze_radio, op_channel, kernel_size=group_kernel_size, stride=1,padding=group_kernel_size // 2, groups=group_size)self.PWC1 = nn.Conv2d(up_channel // squeeze_radio, op_channel, kernel_size=1, bias=False)# lowself.PWC2 = nn.Conv2d(low_channel // squeeze_radio, op_channel - low_channel // squeeze_radio, kernel_size=1,bias=False)self.advavg = nn.AdaptiveAvgPool2d(1)def forward(self, x):# Splitup, low = torch.split(x, [self.up_channel, self.low_channel], dim=1)up, low = self.squeeze1(up), self.squeeze2(low)# TransformY1 = self.GWC(up) + self.PWC1(up)Y2 = torch.cat([self.PWC2(low), low], dim=1)# Fuseout = torch.cat([Y1, Y2], dim=1)out = F.softmax(self.advavg(out), dim=1) * outout1, out2 = torch.split(out, out.size(1) // 2, dim=1)return out1 + out2class ScConv(nn.Module):def __init__(self,op_channel: int,group_num: int = 4,gate_treshold: float = 0.5,alpha: float = 1 / 2,squeeze_radio: int = 2,group_size: int = 2,group_kernel_size: int = 3,):super().__init__()self.SRU = SRU(op_channel,group_num=group_num,gate_treshold=gate_treshold)self.CRU = CRU(op_channel,alpha=alpha,squeeze_radio=squeeze_radio,group_size=group_size,group_kernel_size=group_kernel_size)def forward(self, x):x = self.SRU(x)x = self.CRU(x)return xclass Bottleneck_ScConv(nn.Module):"""Standard bottleneck."""def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):"""Initializes a standard bottleneck module with optional shortcut connection and configurable parameters."""super().__init__()c_ = int(c2 * e) # hidden channelsself.cv1 = Conv(c1, c_, k[0], 1)self.cv2 = ScConv(c2)self.add = shortcut and c1 == c2def forward(self, x):"""Applies the YOLO FPN to input data."""return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))class C3k(C3):"""C3k is a CSP bottleneck module with customizable kernel sizes for feature extraction in neural networks."""def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, k=3):"""Initializes the C3k module with specified channels, number of layers, and configurations."""super().__init__(c1, c2, n, shortcut, g, e)c_ = int(c2 * e) # hidden channels# self.m = nn.Sequential(*(RepBottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))self.m = nn.Sequential(*(Bottleneck_ScConv(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))# 在c3k=True时,使用Bottleneck_ScConv特征融合,为false的时候我们使用普通的Bottleneck提取特征
class C3k2_SC(C2f):"""Faster Implementation of CSP Bottleneck with 2 convolutions."""def __init__(self, c1, c2, n=1, c3k=False, e=0.5, g=1, shortcut=True):"""Initializes the C3k2 module, a faster CSP Bottleneck with 2 convolutions and optional C3k blocks."""super().__init__(c1, c2, n, shortcut, g, e)self.m = nn.ModuleList(C3k(self.c, self.c, 2, shortcut, g) if c3k else Bottleneck(self.c, self.c, shortcut, g) for _ in range(n))if __name__ == '__main__':DW = ScConv(256)#创建一个输入张量batch_size = 8input_tensor=torch.randn(batch_size, 256, 64, 64 )#运行模型并打印输入和输出的形状output_tensor =DW(input_tensor)print("Input shape:",input_tensor.shape)print("0utput shape:",output_tensor.shape)
4. 将SCConv引入到YOLOv11中
第一: 将下面的核心代码复制到D:\bilibili\model\YOLO11\ultralytics-main\ultralytics\nn路径下,如下图所示。
第二:在task.py中导入SCConv包
第三:在task.py中的模型配置部分下面代码
第一个改进需修改的地方
第二个改进,需修改的地方
elif m is ScConv:args = [ch[f]]
第四:将模型配置文件复制到YOLOV11.YAMY文件中
第一个修改的配置文件
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'# [depth, width, max_channels]n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPss: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPsm: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPsl: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPsx: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs# YOLO11n backbone
backbone:# [from, repeats, module, args]- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4- [-1, 2, C3k2_SC, [256, False, 0.25]]- [-1, 1, Conv, [256, 3, 2]] # 3-P3/8- [-1, 2, C3k2_SC, [512, False, 0.25]]- [-1, 1, Conv, [512, 3, 2]] # 5-P4/16- [-1, 2, C3k2_SC, [512, True]]- [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32- [-1, 2, C3k2_SC, [1024, True]]- [-1, 1, SPPF, [1024, 5]] # 9- [-1, 2, C2PSA, [1024]] # 10# YOLO11n head
head:- [-1, 1, nn.Upsample, [None, 2, "nearest"]]- [[-1, 6], 1, Concat, [1]] # cat backbone P4- [-1, 2, C3k2_SC, [512, False]] # 13- [-1, 1, nn.Upsample, [None, 2, "nearest"]]- [[-1, 4], 1, Concat, [1]] # cat backbone P3- [-1, 2, C3k2_SC, [256, False]] # 16 (P3/8-small)- [-1, 1, Conv, [256, 3, 2]]- [[-1, 13], 1, Concat, [1]] # cat head P4- [-1, 2, C3k2_SC, [512, False]] # 19 (P4/16-medium)- [-1, 1, Conv, [512, 3, 2]]- [[-1, 10], 1, Concat, [1]] # cat head P5- [-1, 2, C3k2_SC, [1024, True]] # 22 (P5/32-large)- [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
第二个修改的配置文件
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'# [depth, width, max_channels]n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPss: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPsm: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPsl: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPsx: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs# YOLO11n backbone
backbone:# [from, repeats, module, args]- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4- [-1, 2, C3k2, [256, False, 0.25]]- [-1, 1, Conv, [256, 3, 2]] # 3-P3/8- [-1, 2, C3k2, [512, False, 0.25]]- [-1, 1, Conv, [512, 3, 2]] # 5-P4/16- [-1, 2, C3k2, [512, True]]- [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32- [-1, 2, C3k2, [1024, True]]- [-1, 1, ScConv, []]- [-1, 1, SPPF, [1024, 5]] # 9- [-1, 2, C2PSA, [1024]] # 10# YOLO11n head
head:- [-1, 1, nn.Upsample, [None, 2, "nearest"]]- [[-1, 6], 1, Concat, [1]] # cat backbone P4- [-1, 2, C3k2, [512, False]] # 13- [-1, 1, nn.Upsample, [None, 2, "nearest"]]- [[-1, 4], 1, Concat, [1]] # cat backbone P3- [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)- [-1, 1, Conv, [256, 3, 2]]- [[-1, 14], 1, Concat, [1]] # cat head P4- [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)- [-1, 1, Conv, [512, 3, 2]]- [[-1, 11], 1, Concat, [1]] # cat head P5- [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)- [[17, 20, 23], 1, Detect, [nc]] # Detect(P3, P4, P5)
第五:运行成功
from ultralytics.models import NAS, RTDETR, SAM, YOLO, FastSAM, YOLOWorldif __name__=="__main__":# 使用自己的YOLOv11.yamy文件搭建模型并加载预训练权重训练模型model = YOLO(r"D:\bilibili\model\YOLO11\ultralytics-main\ultralytics\cfg\models\11\yolo11_SConv.yaml")\.load(r'D:\bilibili\model\YOLO11\ultralytics-main\yolo11n.pt') # build from YAML and transfer weightsresults = model.train(data=r'D:\bilibili\model\ultralytics-main\ultralytics\cfg\datasets\VOC_my.yaml',epochs=100, imgsz=640, batch=8)