MaskFormer is a transformer-based semantic segmentation codebase.
Repository:
https://github.com/facebookresearch/MaskFormer/tree/main
Dataset used for testing: ADE20K Dataset
MIT Scene Parsing Benchmark
The dataset can be downloaded from the link above; training contains 20,210 images and validation contains 2,000 images. The SceneParsing directory holds the scene-parsing (semantic segmentation) label images, and InstanceSegmentation holds the instance segmentation label images.
1. Environment Setup
I ran my experiments on a Linux server with Python 3.10, CUDA 11.8, and torch 2.1.0. After installing torch via pip, follow the instructions in INSTALL.md to install Detectron2 and its dependencies.
A few points to note:
1. Install the headless build of OpenCV:
pip install opencv-python-headless
2. Install a 1.x version of numpy:
pip install numpy==1.26.0
3. When loading the model with timm, some layers fail to import from the old path. In mask_former/modeling/backbone/swin.py, change the import as follows (a version-agnostic variant is sketched after this list):
# from timm.models.layers import DropPath, to_2tuple, trunc_normal_
from timm.layers import DropPath, to_2tuple, trunc_normal_
4. Install the panopticapi package:
git clone https://github.com/cocodataset/panopticapi.git
cd panopticapi
python setup.py build_ext --inplace
python setup.py build_ext install
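If you want swin.py to work across timm versions, a small try/except shim (my own suggestion, not from the repo) avoids hard-coding either import path:

# Suggested version-agnostic import (not part of the MaskFormer repo):
# newer timm releases moved these helpers from timm.models.layers
# to timm.layers.
try:
    from timm.layers import DropPath, to_2tuple, trunc_normal_
except ImportError:
    from timm.models.layers import DropPath, to_2tuple, trunc_normal_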
The full environment I ended up with is listed below:
Package Version Editable project location
----------------------- ------------------ ------------------------------------
absl-py 2.2.1
antlr4-python3-runtime 4.9.3
black 25.1.0
certifi 2025.1.31
charset-normalizer 3.4.1
click 8.1.8
cloudpickle 3.1.1
coloredlogs 15.0.1
contourpy 1.3.1
cycler 0.12.1
Cython 3.0.12
detectron2 0.6 /home/shengpeng/downloads/detectron2
filelock 3.18.0
flatbuffers 25.2.10
fonttools 4.56.0
fsspec 2025.3.0
fvcore 0.1.5.post20221221
grpcio 1.71.0
h5py 3.13.0
huggingface-hub 0.29.3
humanfriendly 10.0
hydra-core 1.3.2
idna 3.10
iopath 0.1.9
Jinja2 3.1.6
kiwisolver 1.4.8
Markdown 3.7
markdown-it-py 3.0.0
MarkupSafe 3.0.2
matplotlib 3.10.1
mdurl 0.1.2
mpmath 1.3.0
mypy-extensions 1.0.0
networkx 3.4.2
numpy 1.26.0
omegaconf 2.3.0
onnx 1.17.0
onnx-simplifier 0.4.36
onnxruntime 1.21.0
opencv-python-headless 4.11.0.86
packaging 24.2
panopticapi 0.1
pathspec 0.12.1
pillow 11.1.0
pip 25.0
platformdirs 4.3.7
portalocker 3.1.1
protobuf 6.30.2
pycocotools 2.0.8
Pygments 2.19.1
pyparsing 3.2.3
python-dateutil 2.9.0.post0
PyYAML 6.0.2
requests 2.32.3
rich 13.9.4
safetensors 0.5.3
scipy 1.15.2
setuptools 75.8.0
shapely 2.0.7
six 1.17.0
sympy 1.13.3
tabulate 0.9.0
tensorboard 2.19.0
tensorboard-data-server 0.7.2
termcolor 2.5.0
timm 1.0.15
tomli 2.2.1
torch 2.1.0+cu118
torchvision 0.16.0+cu118
tqdm 4.67.1
triton 2.1.0
typing_extensions 4.13.0
urllib3 2.3.0
Werkzeug 3.1.3
wheel 0.45.1
yacs 0.1.8
Download a pretrained model, then run demo/demo.py with a config file and the pretrained weights to run inference on an image and check the predictions:
python demo/demo.py \
--config-file configs/ade20k-150/maskformer_R50_bs16_160k.yaml \
--input images/ADE/ADE_test_00000001.jpg \
--opts MODEL.WEIGHTS weights/MaskFormer_seg_R50_512x512.pkl
Training script:
python train_net.py \
--num-gpus 2 \
    --config-file configs/ade20k-150/maskformer_R50_bs16_160k.yaml
In train_net.py, the dataset root must be specified:
os.environ['DETECTRON2_DATASETS']='/home/shengpeng/code/github_proj2/ADE2016/SceneParsing'
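Detectron2's builtin ADE20K loader looks for an ADEChallengeData2016 directory under this root; the layout below is an assumption based on detectron2's dataset conventions, so adjust it to your actual download. A quick sanity check:

# Hypothetical sanity check for the dataset layout assumed by
# detectron2's builtin ADE20K semantic segmentation datasets.
import os
root = os.environ['DETECTRON2_DATASETS']
for sub in ('images/training', 'images/validation', 'annotations/training'):
    path = os.path.join(root, 'ADEChallengeData2016', sub)
    print(path, '->', 'ok' if os.path.isdir(path) else 'MISSING')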
On two RTX 3090 cards, training took roughly one night; the results are as follows:
Even the smallest model, trained on the R50 backbone, comes to over 160 MB.
2. Converting the torch Model to ONNX
This codebase does not ship with ONNX export code, so you have to roll your own. The main obstacle is that torch.onnx.export traces the model with a plain tensor input, while detectron2 models take a list of dicts, so both the predictor and the model's forward() need rewriting.
In the downloaded detectron2 source, in detectron2/detectron2/engine/defaults.py, rewrite the __call__ method of class DefaultPredictor as follows:
def __call__(self, original_image):
    with torch.no_grad():
        image = original_image[:, :, ::-1]
        input_blob = torch.as_tensor(image.astype("float32").transpose(2, 0, 1))
        input_blob = input_blob.unsqueeze(0)
        # print('self.cfg.MODEL.DEVICE:', self.cfg.MODEL.DEVICE)
        pixel_mean = self.cfg.MODEL.PIXEL_MEAN
        pixel_std = self.cfg.MODEL.PIXEL_STD
        pixel_mean = torch.Tensor(pixel_mean).view(-1, 1, 1)
        pixel_std = torch.Tensor(pixel_std).view(-1, 1, 1)
        input_blob = (input_blob - pixel_mean) / pixel_std
        input_blob = input_blob.to(self.cfg.MODEL.DEVICE)
        print('input_blob.shape:', input_blob.shape)
        predictions = self.model(input_blob)[0]
        return predictions
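Compared with the stock DefaultPredictor, which resizes with ResizeShortestEdge and passes a {"image", "height", "width"} dict, this version feeds the model a single normalized NCHW tensor, matching the interface the exported ONNX graph will expose.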
Then rewrite the forward() method of class MaskFormer in MaskFormer/mask_former/mask_former_model.py:
def forward(self, input_blob):
    print('MaskFormer input_blob:', input_blob.shape)
    print('self.device:', self.device)
    print('input_blob.device:', input_blob.device)
    input_h, input_w = input_blob.shape[2], input_blob.shape[3]
    features = self.backbone(input_blob)
    outputs = self.sem_seg_head(features)
    if self.training:
        # # mask classification target
        # if "instances" in batched_inputs[0]:
        #     gt_instances = [x["instances"].to(self.device) for x in batched_inputs]
        #     targets = self.prepare_targets(gt_instances, images)
        # else:
        #     targets = None
        targets = None
        # bipartite matching-based loss
        losses = self.criterion(outputs, targets)
        for k in list(losses.keys()):
            if k in self.criterion.weight_dict:
                losses[k] *= self.criterion.weight_dict[k]
            else:
                # remove this loss if not specified in `weight_dict`
                losses.pop(k)
        return losses
    else:
        mask_cls_results = outputs["pred_logits"]
        mask_pred_results = outputs["pred_masks"]
        # return mask_cls_results, mask_pred_results
        # upsample masks
        mask_pred_results = F.interpolate(
            mask_pred_results,
            size=(input_h, input_w),
            mode="bilinear",
            align_corners=False,
        )
        print('mask_cls_results:', mask_cls_results.shape)
        print('mask_pred_results:', mask_pred_results.shape)
        processed_results = []
        if self.sem_seg_postprocess_before_inference:
            mask_pred_results = sem_seg_postprocess(
                mask_pred_results, [input_h, input_w], input_h, input_w
            )
        # semantic segmentation inference
        r = self.semantic_inference(mask_cls_results, mask_pred_results)
        print(f'r1:{r.shape}')
        if not self.sem_seg_postprocess_before_inference:
            r = sem_seg_postprocess(r, [input_h, input_w], input_h, input_w)
        print(f'r2:{r.shape}')
        processed_results.append({"sem_seg": r})
        print('processed_results num:', len(processed_results))
        return processed_results
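For context, semantic_inference fuses the two output heads into a per-pixel class score map. The repo version (paraphrased below) is written per image, so with the batched tensors passed above the einsum subscripts need a batch axis:

# Paraphrase of MaskFormer's semantic_inference (repo version, per image):
def semantic_inference(self, mask_cls, mask_pred):
    mask_cls = F.softmax(mask_cls, dim=-1)[..., :-1]  # drop the "no object" class
    mask_pred = mask_pred.sigmoid()
    # per-image subscripts; for batched tensors use "bqc,bqhw->bchw"
    semseg = torch.einsum("qc,qhw->chw", mask_cls, mask_pred)  # (C, H, W) scores
    return semseg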
In tools, create a new conversion script, convert_torchvision_to_onnx.py:
import argparse
import glob
import multiprocessing as mp
import os

# fmt: off
import sys
sys.path.insert(1, os.path.join(sys.path[0], '..'))
# fmt: on

import tempfile
import time
import warnings

import cv2
import numpy as np
import tqdm

from detectron2.config import get_cfg
from detectron2.data.detection_utils import read_image
from detectron2.projects.deeplab import add_deeplab_config
from detectron2.utils.logger import setup_logger

from mask_former import add_mask_former_config
from demo.predictor import VisualizationDemo

import onnx
import torch


def setup_cfg(args):
    # load config from file and command-line arguments
    cfg = get_cfg()
    add_deeplab_config(cfg)
    add_mask_former_config(cfg)
    cfg.merge_from_file(args.config_file)
    cfg.merge_from_list(args.opts)
    cfg.freeze()
    return cfg


def get_parser():
    parser = argparse.ArgumentParser(description="Detectron2 demo for builtin configs")
    parser.add_argument("--config-file", default="configs/ade20k-150/maskformer_R50_bs16_160k.yaml")
    parser.add_argument("--input", nargs="+")
    parser.add_argument(
        "--output",
        help="A file or directory to save output visualizations. "
        "If not given, will show output in an OpenCV window.",
    )
    parser.add_argument(
        "--confidence-threshold",
        type=float,
        default=0.5,
        help="Minimum score for instance predictions to be shown",
    )
    parser.add_argument(
        "--opts",
        help="Modify config options using the command-line 'KEY VALUE' pairs",
        default=['MODEL.WEIGHTS', 'output/model_0159999.pth'],
        nargs=argparse.REMAINDER,
    )
    return parser


if __name__ == "__main__":
    args = get_parser().parse_args()
    cfg = setup_cfg(args)

    demo = VisualizationDemo(cfg)
    net = demo.predictor.model
    net.to('cpu')

    input_model_path = cfg.MODEL.WEIGHTS
    print('input_model_path:%s' % (input_model_path))
    output_model_path = input_model_path.replace('.pth', '.onnx')

    im = torch.zeros(1, 3, 512, 512).to('cpu')  # image size(1, 3, 512, 512) BCHW
    input_layer_names = ["images"]
    output_layer_names = ["output"]
    dynamic = False

    # Export the model
    print(f'Starting export with onnx {onnx.__version__}.')
    torch.onnx.export(
        net,
        im,
        f=output_model_path,
        verbose=False,
        opset_version=12,
        training=torch.onnx.TrainingMode.EVAL,
        do_constant_folding=True,
        input_names=input_layer_names,
        output_names=output_layer_names,
        dynamic_axes={'images': {0: 'batch'}, 'output': {0: 'batch'}} if dynamic else None,
    )

    # Checks
    model_onnx = onnx.load(output_model_path)  # load onnx model
    onnx.checker.check_model(model_onnx)  # check onnx model

    # Simplify onnx
    simplify = 1
    if simplify:
        import onnxsim
        print(f'Simplifying with onnx-simplifier {onnxsim.__version__}.')
        onnx_sim_model, check = onnxsim.simplify(model_onnx)
        assert check, 'assert check failed'
        # save the simplified model
        onnx.save(onnx_sim_model, output_model_path)
        print('Onnx model saved as {}'.format(output_model_path))
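With the defaults above, the script can be run as follows (substitute your own checkpoint for the hypothetical weights path):

python tools/convert_torchvision_to_onnx.py \
    --config-file configs/ade20k-150/maskformer_R50_bs16_160k.yaml \
    --opts MODEL.WEIGHTS output/model_0159999.pth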
The conversion then succeeds and produces the corresponding ONNX model, which can be loaded with onnxruntime for inference.
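A minimal onnxruntime sketch (my own; the file names are hypothetical, the mean/std are the usual ImageNet RGB values and should be replaced with your cfg.MODEL.PIXEL_MEAN / PIXEL_STD, and the output shape assumes the rewritten forward() above):

# Minimal onnxruntime inference sketch (hypothetical paths/names).
import cv2
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('output/model_0159999.onnx',
                            providers=['CPUExecutionProvider'])
img = cv2.imread('images/ADE/ADE_test_00000001.jpg')
blob = cv2.resize(img, (512, 512))[:, :, ::-1]          # BGR -> RGB
blob = blob.astype('float32').transpose(2, 0, 1)[None]  # HWC -> NCHW
mean = np.array([123.675, 116.28, 103.53], 'float32').reshape(1, 3, 1, 1)
std = np.array([58.395, 57.12, 57.375], 'float32').reshape(1, 3, 1, 1)
blob = np.ascontiguousarray((blob - mean) / std)
out = sess.run(None, {'images': blob})[0]               # per-pixel class scores
pred = np.asarray(out).reshape(-1, 512, 512).argmax(0)  # (512, 512) class ids
print('prediction shape:', pred.shape)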
3. Inference Speed Test
In C++, I loaded the ONNX models, converted them to FP16 TensorRT engines, and measured inference speed, comparing the 14 MB SegFormer model against this 161 MB MaskFormer model at the same 512x512 resolution:
segformer_b0: ~10 ms
maskformer_R50: ~220 ms
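For a quick engine build and benchmark without writing C++, TensorRT's trtexec tool can consume the ONNX file directly (hypothetical file names; flags may differ across TensorRT versions):

trtexec --onnx=model_0159999.onnx --fp16 --saveEngine=maskformer_r50_fp16.engine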
This result suggests that MaskFormer is not suited to scenarios with strict latency requirements; it is a better fit for tasks with many classes or for panoptic segmentation.