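YOLOv11 NCNN Android Deployment
Note: When I earlier replaced the activation function with ReLU and retrained the model, inference with the ncnn C++ code worked well, but after deploying to Android the CPU mode produced a large number of false detection boxes. The repository has now been switched back to the official default weights.
Preface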
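The frame rate currently holds steady at around 20 FPS. The project's GitHub repository is: https://github.com/gaoxumustwin/ncnn-android-yolov11
The low detection accuracy in the demo is because this model was trained for only 5 epochs (one epoch takes about 15 minutes on an RTX 3090). Weights trained for 50 and 100 epochs will be added to the repository later.
I had previously reproduced a YOLOv8-pose ncnn Android deployment project. While browsing GitHub I came across an ncnn repository for YOLOv11; reading its code, I found that the author had adapted it from triple-Mu's (三木君) code, so it is very similar to the YOLOv8-pose ncnn code. While that port was still fresh in my mind, I adapted the code for YOLOv11 ncnn Android deployment.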
Environment Setup
At the time of writing, the installation was done on November 29, 2024.
pip install ultralytics
The installed ultralytics version is 8.3.39, located at: /root/miniconda3/lib/python3.8/site-packages/ultralytics
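The site-packages path above is specific to the author's AutoDL image (Python 3.8 under /root/miniconda3), and later steps edit files inside that package. A minimal standard-library sketch for finding the installed version and package directory on your own machine; locate_package is a hypothetical helper, and the relative file paths are the ones this post edits.

```python
import importlib.util
from importlib import metadata
from pathlib import Path
from typing import Optional, Tuple


def locate_package(name: str) -> Optional[Tuple[str, Path]]:
    """Return (version, package_dir) for an installed package, or None if missing.

    find_spec locates the package without importing it.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = "unknown"  # importable but no distribution metadata (e.g. stdlib)
    return version, Path(spec.origin).parent


if __name__ == "__main__":
    info = locate_package("ultralytics")
    if info is None:
        print("ultralytics is not installed")
    else:
        version, pkg_dir = info
        print("version:", version)
        # Files edited later in this post, relative to the package directory
        for rel in ("cfg/datasets/coco.yaml", "nn/modules/conv.py",
                    "nn/modules/head.py", "engine/exporter.py"):
            print(pkg_dir / rel)
```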
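Data Setup
The default YOLOv11 detection models are trained on the COCO2017 dataset. If you want to train on COCO, I recommend doing it on AutoDL, because COCO2017 is available there as a public dataset.
How to view AutoDL's shared datasets: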
root@autodl-container-3686439328-168c7bd7:~# ls /root/autodl-pub/
ADEChallengeData2016 COCO2017 DIV2K ImageNet100 VOCdevkit mvtec_anomaly_detection.tar.xz
Aishell CUB200-2011 DOTA KITTI_Depth_Completion.tar Vimeo-90k nuScenes
BERT-Pretrain-Model CULane GOT10k KITTI_Object cifar-100
CASIAWebFace CelebA ImageNet SemanticKITTI cityscapes
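Dataset Preparation
If you find the dataset you need in the instance and want to use the shared data, you cannot extract it in place (you will get a read-only error); extract it to your own data disk (/root/autodl-tmp) instead.
Follow the steps below: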
cd /root/autodl-tmp/
mkdir images
cd images
unzip /root/autodl-pub/COCO2017/train2017.zip
unzip /root/autodl-pub/COCO2017/val2017.zip
images now contains only train2017 and val2017.
Download the COCO2017 labels:
cd /root/autodl-tmp
mkdir labels
cd labels
wget https://github.com/ultralytics/assets/releases/download/v0.0.0/coco2017labels.zip
unzip coco2017labels.zip
rm coco2017labels.zip
cd coco
rm -r annotations/
rm -r images/
rm -r LICENSE
rm -r README.txt
rm -r test-dev2017.txt
rm -r train2017.txt
rm -r val2017.txt
mv labels/* ../
cd ..
rm -r coco/
labels now contains only train2017 and val2017.
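Before training, it can help to confirm the layout matches what ultralytics expects: each image in images/<split> pairs with a same-named .txt file in labels/<split>. A minimal sketch of such a check, assuming the /root/autodl-tmp layout built above (check_split is a hypothetical helper). Images without a label file are treated as background by ultralytics, and some COCO images have no annotated objects, so a small number of unlabeled images is expected.

```python
from pathlib import Path
from typing import Dict

IMG_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def check_split(root: str, split: str) -> Dict[str, int]:
    """Count images, labels and mismatches for one split, e.g. 'train2017'.

    Assumes <root>/images/<split>/*.jpg and <root>/labels/<split>/*.txt.
    """
    img_dir = Path(root) / "images" / split
    lbl_dir = Path(root) / "labels" / split
    images = {p.stem for p in img_dir.iterdir() if p.suffix.lower() in IMG_EXTS}
    labels = {p.stem for p in lbl_dir.iterdir() if p.suffix == ".txt"}
    return {
        "images": len(images),
        "labels": len(labels),
        "images_without_label": len(images - labels),  # used as background
        "labels_without_image": len(labels - images),  # likely a bad unzip/move
    }


if __name__ == "__main__":
    root = "/root/autodl-tmp"
    if not Path(root, "images").exists():
        print("dataset root not found:", root)
    else:
        for split in ("train2017", "val2017"):
            print(split, check_split(root, split))
```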
Dataset configuration file
Copy the COCO2017 configuration file into the training directory:
# workspace root
mkdir train
cp /root/miniconda3/lib/python3.8/site-packages/ultralytics/cfg/datasets/coco.yaml ./train
Edit path, train, and val in coco.yaml:
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: /root/autodl-tmp # dataset root dir
train: images/train2017 # train images (relative to 'path') 118287 images
val: images/val2017 # val images (relative to 'path') 5000 images
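With these settings the image directories are resolved relative to path, and each label file is found by replacing the last /images/ component of an image path with /labels/ and its extension with .txt. A small sketch of that convention, modelled on the img2label_paths helper in ultralytics/data/utils.py (the function names below are illustrative, and the expected outputs assume a Linux path separator):

```python
import os


def resolve_split(path: str, split: str) -> str:
    """Join the yaml 'path' root with a relative entry such as 'images/train2017'."""
    return os.path.join(path, split)


def img2label_path(img_path: str) -> str:
    """Map .../images/<split>/<name>.<ext> to .../labels/<split>/<name>.txt.

    Only the last '/images/' component is replaced.
    """
    sa, sb = f"{os.sep}images{os.sep}", f"{os.sep}labels{os.sep}"
    return sb.join(img_path.rsplit(sa, 1)).rsplit(".", 1)[0] + ".txt"


if __name__ == "__main__":
    train_dir = resolve_split("/root/autodl-tmp", "images/train2017")
    print(train_dir)  # /root/autodl-tmp/images/train2017
    print(img2label_path(os.path.join(train_dir, "000000000009.jpg")))
    # /root/autodl-tmp/labels/train2017/000000000009.txt
```

This is why the labels must sit in /root/autodl-tmp/labels/train2017 and /root/autodl-tmp/labels/val2017, mirroring the images directories.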
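Changing the Activation Function
Retraining and deploying with a different activation function caused a problem: CPU inference produced many false detection boxes while GPU inference did not, and switching back to the original ncnn weights provided by the referenced YOLOv11-ncnn project made the problem disappear. Due to limited time I did not investigate further, but I still believe changing the activation function is a sound approach.
If you'd like to verify this yourself, you can follow the steps below:
YOLOv11 uses SiLU as its default activation function; replace it with ReLU, which is cheaper to compute.
After changing the activation function, the PyTorch model must be retrained before it is exported to ONNX.
In /root/miniconda3/lib/python3.8/site-packages/ultralytics/nn/modules/conv.py, change default_act = nn.SiLU() (around line 39) to default_act = nn.ReLU().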
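For reference, SiLU is x · sigmoid(x) and needs one exponential per element, while ReLU is a single comparison, which is where the speedup on mobile CPUs comes from. SiLU also stays slightly negative for negative inputs (its minimum is about -0.28), whereas ReLU clips them to zero, which is why the network has to be retrained after the swap. A plain-Python sketch of the two functions, for illustration only (the model itself uses torch's nn.SiLU and nn.ReLU):

```python
import math


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x). Needs one exp() per element."""
    if x >= 0.0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)  # this branch avoids overflow for very negative x
    return x * e / (1.0 + e)


def relu(x: float) -> float:
    """ReLU: max(0, x). Just a comparison, no transcendental function."""
    return x if x > 0.0 else 0.0


if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  silu={silu(x):+.4f}  relu={relu(x):+.4f}")
```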
Training
Download the pretrained weights:
wget https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt
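Run training
Training script train.py:
from ultralytics import YOLO

# Build the model from yolo11.yaml and load the pretrained weights (still worthwhile: it helps training).
# The yaml name has no scale letter, so ultralytics falls back to scale 'n', matching yolo11n.pt.
model = YOLO('yolo11.yaml').load('yolo11n.pt')
results = model.train(data='./coco.yaml', epochs=100, imgsz=640, batch=64, project='runs')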
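A quick budget check for this configuration, using figures from this post: 118287 training images (from the coco.yaml comment), batch=64, and roughly 15 minutes per epoch on a 3090. It assumes the last partial batch is not dropped; training_budget is a hypothetical helper.

```python
import math


def training_budget(num_images: int, batch: int, epochs: int, minutes_per_epoch: float):
    """Return (iterations per epoch, total iterations, estimated wall-clock hours)."""
    iters_per_epoch = math.ceil(num_images / batch)  # partial last batch counted
    total_iters = iters_per_epoch * epochs
    hours = epochs * minutes_per_epoch / 60.0
    return iters_per_epoch, total_iters, hours


if __name__ == "__main__":
    # COCO train2017, batch=64, ~15 min/epoch on an RTX 3090 (figures from this post)
    for epochs in (5, 50, 100):
        it, total, h = training_budget(118287, 64, epochs, 15.0)
        print(f"{epochs:>3} epochs: {it} it/epoch, {total} iterations, ~{h:.2f} h")
```

At 100 epochs this comes to 1849 iterations per epoch and about 25 hours, which puts the 5-epoch demo model in context.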
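Model Export
Modifying the model structure
The modification below does not affect training, because the new branch only runs in export mode.
Edit /root/miniconda3/lib/python3.8/site-packages/ultralytics/nn/modules/head.py: to change how the Detect class is exported, add the following code at the start of its forward function: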
if self.export or torch.onnx.is_in_onnx_export():
    results = self.forward_export(x)
    return tuple(results)
Also add the following method to the Detect class:
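def forward_export(self, x):
    results = []
    for i in range(self.nl):
        # Box branch: raw DFL logits (4 * reg_max channels), NCHW -> NHWC
        dfl = self.cv2[i](x[i]).permute(0, 2, 3, 1)
        # Class branch: sigmoid scores (nc channels), NCHW -> NHWC
        cls = self.cv3[i](x[i]).sigmoid().permute(0, 2, 3, 1)
        # Concatenate on the last axis: (1, H, W, 4 * reg_max + nc)
        results.append(torch.cat((dfl, cls), -1))
    return results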
The modified class looks like this:
class Detect(nn.Module):
    """YOLO Detect head for detection models."""

    dynamic = False  # force grid reconstruction
    export = False  # export mode
    format = None  # export format
    end2end = False  # end2end
    max_det = 300  # max_det
    shape = None
    anchors = torch.empty(0)  # init
    strides = torch.empty(0)  # init
    legacy = False  # backward compatibility for v3/v5/v8/v9 models

    def __init__(self, nc=80, ch=()):
        """Initializes the YOLO detection layer with specified number of classes and channels."""
        super().__init__()
        self.nc = nc  # number of classes
        self.nl = len(ch)  # number of detection layers
        self.reg_max = 16  # DFL channels (ch[0] // 16 to scale 4/8/12/16/20 for n/s/m/l/x)
        self.no = nc + self.reg_max * 4  # number of outputs per anchor
        self.stride = torch.zeros(self.nl)  # strides computed during build
        c2, c3 = max((16, ch[0] // 4, self.reg_max * 4)), max(ch[0], min(self.nc, 100))  # channels
        self.cv2 = nn.ModuleList(
            nn.Sequential(Conv(x, c2, 3), Conv(c2, c2, 3), nn.Conv2d(c2, 4 * self.reg_max, 1)) for x in ch
        )
        self.cv3 = (
            nn.ModuleList(nn.Sequential(Conv(x, c3, 3), Conv(c3, c3, 3), nn.Conv2d(c3, self.nc, 1)) for x in ch)
            if self.legacy
            else nn.ModuleList(
                nn.Sequential(
                    nn.Sequential(DWConv(x, x, 3), Conv(x, c3, 1)),
                    nn.Sequential(DWConv(c3, c3, 3), Conv(c3, c3, 1)),
                    nn.Conv2d(c3, self.nc, 1),
                )
                for x in ch
            )
        )
        self.dfl = DFL(self.reg_max) if self.reg_max > 1 else nn.Identity()

        if self.end2end:
            self.one2one_cv2 = copy.deepcopy(self.cv2)
            self.one2one_cv3 = copy.deepcopy(self.cv3)

    def forward(self, x):
        """Concatenates and returns predicted bounding boxes and class probabilities."""
        # Added: in export mode, return the raw per-level head outputs
        if self.export or torch.onnx.is_in_onnx_export():
            results = self.forward_export(x)
            return tuple(results)

        if self.end2end:
            return self.forward_end2end(x)

        for i in range(self.nl):
            x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
        if self.training:  # Training path
            return x
        y = self._inference(x)
        return y if self.export else (y, x)

    # Added: per-level NHWC outputs [DFL logits | sigmoid class scores]
    def forward_export(self, x):
        results = []
        for i in range(self.nl):
            dfl = self.cv2[i](x[i]).permute(0, 2, 3, 1)
            cls = self.cv3[i](x[i]).sigmoid().permute(0, 2, 3, 1)
            results.append(torch.cat((dfl, cls), -1))
        return results

    # forward_end2end, _inference and the remaining methods are unchanged (omitted here)
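With this change the ONNX graph has one output per detection level instead of a single decoded tensor. Each output is NHWC with shape (1, H, W, 4·reg_max + nc): the first 64 channels are raw DFL logits for the four box sides and the remaining nc channels are sigmoid class scores. A sketch computing these shapes, assuming imgsz=640, the standard YOLO11 strides 8/16/32 and nc=80 for COCO (export_output_shapes is a hypothetical helper; imgsz should be a multiple of 32):

```python
from typing import List, Sequence, Tuple


def export_output_shapes(imgsz: int = 640, nc: int = 80, reg_max: int = 16,
                         strides: Sequence[int] = (8, 16, 32)) -> List[Tuple[int, int, int, int]]:
    """Shapes of the tuple returned by the modified Detect.forward_export (batch 1)."""
    channels = 4 * reg_max + nc  # DFL logits (4 sides x reg_max bins) + class scores
    return [(1, imgsz // s, imgsz // s, channels) for s in strides]


if __name__ == "__main__":
    shapes = export_output_shapes()
    for s in shapes:
        print(s)
    # (1, 80, 80, 144)
    # (1, 40, 40, 144)
    # (1, 20, 20, 144)
    print("total anchors:", sum(h * w for _, h, w, _ in shapes))  # 8400
```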
Renaming the exported outputs
To change the output names, modify the export_onnx function in /root/miniconda3/lib/python3.8/site-packages/ultralytics/engine/exporter.py.
Export
Export script export.py:
from ultralytics import YOLO

# load model
model = YOLO('best.pt')
# export onnx
model.export(format='onnx', opset=11, simplify=True, dynamic=False, imgsz=640)
NCNN Conversion and Optimization
$ ./onnx2ncnn best.onnx yolov11.param yolov11.bin
$ ./ncnnoptimize yolov11.param yolov11.bin yolov11-opt.param yolov11-opt.bin 1
The trailing 1 tells ncnnoptimize to store the weights as fp16 (0 keeps fp32).
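The Android code passes output blob names to ex.extract(), and since the modified head now has three outputs, not all of them may carry the names you expect. An ncnn .param file is plain text: a magic line (7767517), a line with the layer and blob counts, then one line per layer giving type, name, input count, output count, input blob names, output blob names and key=value parameters. A minimal sketch that lists blobs produced but never consumed, i.e. the graph outputs; find_output_blobs is hypothetical and the sample is a made-up fragment, not the real yolov11-opt.param:

```python
from typing import List


def find_output_blobs(param_text: str) -> List[str]:
    """List blobs produced by some layer but consumed by none (the graph outputs).

    Layer lines: <type> <name> <n_in> <n_out> <inputs...> <outputs...> [k=v ...]
    """
    rows = [ln.split() for ln in param_text.splitlines() if ln.strip()]
    if not rows or rows[0] != ["7767517"]:
        raise ValueError("not an ncnn .param file (bad magic number)")
    produced: List[str] = []
    consumed = set()
    for fields in rows[2:]:  # skip the magic line and "layer_count blob_count"
        n_in, n_out = int(fields[2]), int(fields[3])
        consumed.update(fields[4:4 + n_in])
        produced.extend(fields[4 + n_in:4 + n_in + n_out])
    return [blob for blob in produced if blob not in consumed]


if __name__ == "__main__":
    # Made-up fragment for illustration; read yolov11-opt.param instead
    sample = """7767517
4 5
Input        in0     0 1 in0
Convolution  conv_0  1 1 in0 1 0=16 1=3
Split        split_0 1 2 1 2 3
Sigmoid      sig_0   1 1 3 out1
"""
    print(find_output_blobs(sample))  # ['2', 'out1']
```

Running it on the real file, e.g. find_output_blobs(open("yolov11-opt.param").read()), should list three blobs, one per detection level, if the export went as intended.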
Android Code Changes
I adapted the code from these two repositories:
https://github.com/gaoxumustwin/ncnn-android-yolov8-pose
https://github.com/zhouweigogogo/yolo11-ncnn
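Compared with yolo11-ncnn, I made the following changes:
- Replaced the softmax function with a sigmoid built on the fast exponential approximation fast_exp
- Replaced cv::dnn::NMSBoxes with a pure C++ implementation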
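A Python sketch of both ideas, for illustration only (the Android code is C++). fast_exp here is Schraudolph's bit-trick approximation, a common choice in ncnn examples: it writes a scaled, offset x straight into the bits of a float32. I am assuming the repository uses a variant of this; its exact constants may differ, and the clamp is added here to keep the bit pattern valid. The NMS is plain greedy IoU suppression by descending score, the same idea cv::dnn::NMSBoxes implements; per-class handling is not shown.

```python
import struct
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def fast_exp(x: float) -> float:
    """Approximate exp(x) by writing a scaled x straight into float32 bits.

    Schraudolph's trick; relative error stays within about 5%.
    x is clamped so the bit pattern is a finite, normal float.
    """
    x = max(-87.0, min(88.0, x))
    bits = int((1 << 23) * (1.4426950409 * x + 126.93490512))
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def fast_sigmoid(x: float) -> float:
    """Sigmoid built on the approximate exponential."""
    return 1.0 / (1.0 + fast_exp(-x))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_thresh: float) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep one unless it
    overlaps an already-kept box by more than iou_thresh. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep
```

For example, nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)], [0.9, 0.8, 0.7], 0.5) keeps indices 0 and 2, because the first two boxes overlap with an IoU of about 0.68.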
Converting ncnn-android-yolov8-pose into ncnn-android-yolov11 mainly meant replacing everything specific to YOLOv8-pose with its YOLOv11 counterpart.
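The main YOLOv11-specific piece of post-processing is decoding the exported head outputs. In each grid cell, each of the four box sides (left, top, right, bottom) has reg_max = 16 DFL logits; a softmax over those bins gives a distribution whose expected value is the distance from the cell's anchor point in grid units, which is then multiplied by the stride. A minimal sketch for one cell, assuming the NHWC layout produced by forward_export (64 DFL values, then the class scores) and anchor points at cell centers (col + 0.5, row + 0.5) as in ultralytics; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def softmax_expectation(logits: Sequence[float]) -> float:
    """Softmax over the DFL bins, then the expected bin index (grid units)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    return sum(i * e for i, e in enumerate(exps)) / sum(exps)


def decode_cell(feat: Sequence[float], row: int, col: int, stride: int,
                reg_max: int = 16) -> Tuple[Tuple[float, float, float, float], int, float]:
    """Decode one cell vector [4*reg_max DFL logits | nc class scores].

    Returns ((x1, y1, x2, y2) in input pixels, best class id, best score).
    Class scores are already sigmoid-activated by forward_export.
    """
    left, top, right, bottom = (
        softmax_expectation(feat[k * reg_max:(k + 1) * reg_max]) for k in range(4)
    )
    cx, cy = col + 0.5, row + 0.5  # anchor point at the cell center, grid units
    box = ((cx - left) * stride, (cy - top) * stride,
           (cx + right) * stride, (cy + bottom) * stride)
    scores = feat[4 * reg_max:]
    cls_id = max(range(len(scores)), key=lambda i: scores[i])
    return box, cls_id, scores[cls_id]
```

Since the expected bin index lies in [0, reg_max - 1], each side can extend at most 15 cells from the anchor point, i.e. 15 × stride pixels. The decoded boxes from all three levels are then filtered by score and passed to NMS.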
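If you're interested in the details, see the code in the repository.
My skills are limited, so the code certainly still has room for improvement and optimization!
References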
https://github.com/gaoxumustwin/ncnn-android-yolov8-pose
https://github.com/zhouweigogogo/yolo11-ncnn
https://github.com/triple-Mu/ncnn-examples/blob/main/cpp/yolov8/src/triplemu-yolov8.cpp
https://zhuanlan.zhihu.com/p/769076635
https://blog.csdn.net/u012863603/article/details/142977809