MaskFormer is a transformer-based semantic segmentation codebase.
Repository:
https://github.com/facebookresearch/MaskFormer/tree/main
Dataset used for testing: ADE20K Dataset
MIT Scene Parsing Benchmark
The dataset can be downloaded from the link above; training contains 20,210 images and validation contains 2,000 images. The SceneParsing directory holds the scene-parsing (semantic segmentation) label images, and InstanceSegmentation holds the instance segmentation label images.
1. Environment Setup
I ran my experiments on a Linux server with Python 3.10, CUDA 11.8, and torch 2.1.0. After installing torch via pip, follow the instructions in INSTALL.md to install Detectron2 and its dependencies.
A few points to note:
1. Install the headless build of OpenCV:
pip install opencv-python-headless
2. Install a 1.x version of numpy:
pip install numpy==1.26.0
3. When loading the model with timm, some layers fail to import from the old path. In mask_former/modeling/backbone/swin.py, change the import as follows (a version-agnostic variant is sketched after this list):
# from timm.models.layers import DropPath, to_2tuple, trunc_normal_
from timm.layers import DropPath, to_2tuple, trunc_normal_
4. Install the panopticapi package:
git clone https://github.com/cocodataset/panopticapi.git
cd panopticapi
python setup.py build_ext --inplace
python setup.py build_ext install
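If you want swin.py to work across timm versions, a small try/except shim (my own suggestion, not from the repo) avoids hard-coding either import path:

# Suggested version-agnostic import (not part of the MaskFormer repo):
# newer timm releases moved these helpers from timm.models.layers
# to timm.layers.
try:
    from timm.layers import DropPath, to_2tuple, trunc_normal_
except ImportError:
    from timm.models.layers import DropPath, to_2tuple, trunc_normal_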
The full environment I ended up with is listed below:
Package Version Editable project location
----------------------- ------------------ ------------------------------------
absl-py 2.2.1
antlr4-python3-runtime 4.9.3
black 25.1.0
certifi 2025.1.31
charset-normalizer 3.4.1
click 8.1.8
cloudpickle 3.1.1
coloredlogs 15.0.1
contourpy 1.3.1
cycler 0.12.1
Cython 3.0.12
detectron2 0.6 /home/shengpeng/downloads/detectron2
filelock 3.18.0
flatbuffers 25.2.10
fonttools 4.56.0
fsspec 2025.3.0
fvcore 0.1.5.post20221221
grpcio 1.71.0
h5py 3.13.0
huggingface-hub 0.29.3
humanfriendly 10.0
hydra-core 1.3.2
idna 3.10
iopath 0.1.9
Jinja2 3.1.6
kiwisolver 1.4.8
Markdown 3.7
markdown-it-py 3.0.0
MarkupSafe 3.0.2
matplotlib 3.10.1
mdurl 0.1.2
mpmath 1.3.0
mypy-extensions 1.0.0
networkx 3.4.2
numpy 1.26.0
omegaconf 2.3.0
onnx 1.17.0
onnx-simplifier 0.4.36
onnxruntime 1.21.0
opencv-python-headless 4.11.0.86
packaging 24.2
panopticapi 0.1
pathspec 0.12.1
pillow 11.1.0
pip 25.0
platformdirs 4.3.7
portalocker 3.1.1
protobuf 6.30.2
pycocotools 2.0.8
Pygments 2.19.1
pyparsing 3.2.3
python-dateutil 2.9.0.post0
PyYAML 6.0.2
requests 2.32.3
rich 13.9.4
safetensors 0.5.3
scipy 1.15.2
setuptools 75.8.0
shapely 2.0.7
six 1.17.0
sympy 1.13.3
tabulate 0.9.0
tensorboard 2.19.0
tensorboard-data-server 0.7.2
termcolor 2.5.0
timm 1.0.15
tomli 2.2.1
torch 2.1.0+cu118
torchvision 0.16.0+cu118
tqdm 4.67.1
triton 2.1.0
typing_extensions 4.13.0
urllib3 2.3.0
Werkzeug 3.1.3
wheel 0.45.1
yacs 0.1.8
Download a pretrained model, then run demo/demo.py with a config file and the pretrained weights to run inference on an image and check the predictions:
python demo/demo.py \
--config-file configs/ade20k-150/maskformer_R50_bs16_160k.yaml \
--input images/ADE/ADE_test_00000001.jpg \
--opts MODEL.WEIGHTS weights/MaskFormer_seg_R50_512x512.pkl
Training script:
python train_net.py \
--num-gpus 2 \
    --config-file configs/ade20k-150/maskformer_R50_bs16_160k.yaml
In train_net.py, the dataset root must be specified:
os.environ['DETECTRON2_DATASETS']='/home/shengpeng/code/github_proj2/ADE2016/SceneParsing'
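Detectron2's builtin ADE20K loader looks for an ADEChallengeData2016 directory under this root; the layout below is an assumption based on detectron2's dataset conventions, so adjust it to your actual download. A quick sanity check:

# Hypothetical sanity check for the dataset layout assumed by
# detectron2's builtin ADE20K semantic segmentation datasets.
import os
root = os.environ['DETECTRON2_DATASETS']
for sub in ('images/training', 'images/validation', 'annotations/training'):
    path = os.path.join(root, 'ADEChallengeData2016', sub)
    print(path, '->', 'ok' if os.path.isdir(path) else 'MISSING')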
On two RTX 3090 cards, training took roughly one night; the results are as follows:
Even the smallest model, trained on the R50 backbone, comes to over 160 MB.
2. Converting the torch Model to ONNX
This codebase does not ship with ONNX export code, so you have to roll your own. The main obstacle is that torch.onnx.export traces the model with a plain tensor input, while detectron2 models take a list of dicts, so both the predictor and the model's forward() need rewriting.
In the downloaded detectron2 source, in detectron2/detectron2/engine/defaults.py, rewrite the __call__ method of class DefaultPredictor as follows:
def __call__(self, original_image):
    with torch.no_grad():
        image = original_image[:, :, ::-1]
        input_blob = torch.as_tensor(image.astype("float32").transpose(2, 0, 1))
        input_blob = input_blob.unsqueeze(0)
        # print('self.cfg.MODEL.DEVICE:', self.cfg.MODEL.DEVICE)
        pixel_mean = self.cfg.MODEL.PIXEL_MEAN
        pixel_std = self.cfg.MODEL.PIXEL_STD
        pixel_mean = torch.Tensor(pixel_mean).view(-1, 1, 1)
        pixel_std = torch.Tensor(pixel_std).view(-1, 1, 1)
        input_blob = (input_blob - pixel_mean) / pixel_std
        input_blob = input_blob.to(self.cfg.MODEL.DEVICE)
        print('input_blob.shape:', input_blob.shape)
        predictions = self.model(input_blob)[0]
        return predictions
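Compared with the stock DefaultPredictor, which resizes with ResizeShortestEdge and passes a {"image", "height", "width"} dict, this version feeds the model a single normalized NCHW tensor, matching the interface the exported ONNX graph will expose.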
Then rewrite the forward() method of class MaskFormer in MaskFormer/mask_former/mask_former_model.py:
def forward(self, input_blob):
    print('MaskFormer input_blob:', input_blob.shape)
    print('self.device:', self.device)
    print('input_blob.device:', input_blob.device)
    input_h, input_w = input_blob.shape[2], input_blob.shape[3]
    features = self.backbone(input_blob)
    outputs = self.sem_seg_head(features)
    if self.training:
        # # mask classification target
        # if "instances" in batched_inputs[0]:
        #     gt_instances = [x["instances"].to(self.device) for x in batched_inputs]
        #     targets = self.prepare_targets(gt_instances, images)
        # else:
        #     targets = None
        targets = None
        # bipartite matching-based loss
        losses = self.criterion(outputs, targets)
        for k in list(losses.keys()):
            if k in self.criterion.weight_dict:
                losses[k] *= self.criterion.weight_dict[k]
            else:
                # remove this loss if not specified in `weight_dict`
                losses.pop(k)
        return losses
    else:
        mask_cls_results = outputs["pred_logits"]
        mask_pred_results = outputs["pred_masks"]
        # return mask_cls_results, mask_pred_results
        # upsample masks
        mask_pred_results = F.interpolate(
            mask_pred_results,
            size=(input_h, input_w),
            mode="bilinear",
            align_corners=False,
        )
        print('mask_cls_results:', mask_cls_results.shape)
        print('mask_pred_results:', mask_pred_results.shape)
        processed_results = []
        if self.sem_seg_postprocess_before_inference:
            mask_pred_results = sem_seg_postprocess(
                mask_pred_results, [input_h, input_w], input_h, input_w
            )
        # semantic segmentation inference
        r = self.semantic_inference(mask_cls_results, mask_pred_results)
        print(f'r1:{r.shape}')
        if not self.sem_seg_postprocess_before_inference:
            r = sem_seg_postprocess(r, [input_h, input_w], input_h, input_w)
        print(f'r2:{r.shape}')
        processed_results.append({"sem_seg": r})
        print('processed_results num:', len(processed_results))
        return processed_results
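For context, semantic_inference fuses the two output heads into a per-pixel class score map. The repo version (paraphrased below) is written per image, so with the batched tensors passed above the einsum subscripts need a batch axis:

# Paraphrase of MaskFormer's semantic_inference (repo version, per image):
def semantic_inference(self, mask_cls, mask_pred):
    mask_cls = F.softmax(mask_cls, dim=-1)[..., :-1]  # drop the "no object" class
    mask_pred = mask_pred.sigmoid()
    # per-image subscripts; for batched tensors use "bqc,bqhw->bchw"
    semseg = torch.einsum("qc,qhw->chw", mask_cls, mask_pred)  # (C, H, W) scores
    return semseg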
In tools, create a new conversion script, convert_torchvision_to_onnx.py:
import argparse
import glob
import multiprocessing as mp
import os

# fmt: off
import sys
sys.path.insert(1, os.path.join(sys.path[0], '..'))
# fmt: on

import tempfile
import time
import warnings

import cv2
import numpy as np
import tqdm

from detectron2.config import get_cfg
from detectron2.data.detection_utils import read_image
from detectron2.projects.deeplab import add_deeplab_config
from detectron2.utils.logger import setup_logger

from mask_former import add_mask_former_config
from demo.predictor import VisualizationDemo

import onnx
import torch


def setup_cfg(args):
    # load config from file and command-line arguments
    cfg = get_cfg()
    add_deeplab_config(cfg)
    add_mask_former_config(cfg)
    cfg.merge_from_file(args.config_file)
    cfg.merge_from_list(args.opts)
    cfg.freeze()
    return cfg


def get_parser():
    parser = argparse.ArgumentParser(description="Detectron2 demo for builtin configs")
    parser.add_argument("--config-file", default="configs/ade20k-150/maskformer_R50_bs16_160k.yaml")
    parser.add_argument("--input", nargs="+")
    parser.add_argument(
        "--output",
        help="A file or directory to save output visualizations. "
        "If not given, will show output in an OpenCV window.",
    )
    parser.add_argument(
        "--confidence-threshold",
        type=float,
        default=0.5,
        help="Minimum score for instance predictions to be shown",
    )
    parser.add_argument(
        "--opts",
        help="Modify config options using the command-line 'KEY VALUE' pairs",
        default=['MODEL.WEIGHTS', 'output/model_0159999.pth'],
        nargs=argparse.REMAINDER,
    )
    return parser


if __name__ == "__main__":
    args = get_parser().parse_args()
    cfg = setup_cfg(args)

    demo = VisualizationDemo(cfg)
    net = demo.predictor.model
    net.to('cpu')

    input_model_path = cfg.MODEL.WEIGHTS
    print('input_model_path:%s' % (input_model_path))
    output_model_path = input_model_path.replace('.pth', '.onnx')

    im = torch.zeros(1, 3, 512, 512).to('cpu')  # image size(1, 3, 512, 512) BCHW
    input_layer_names = ["images"]
    output_layer_names = ["output"]
    dynamic = False

    # Export the model
    print(f'Starting export with onnx {onnx.__version__}.')
    torch.onnx.export(
        net,
        im,
        f=output_model_path,
        verbose=False,
        opset_version=12,
        training=torch.onnx.TrainingMode.EVAL,
        do_constant_folding=True,
        input_names=input_layer_names,
        output_names=output_layer_names,
        dynamic_axes={'images': {0: 'batch'}, 'output': {0: 'batch'}} if dynamic else None,
    )

    # Checks
    model_onnx = onnx.load(output_model_path)  # load onnx model
    onnx.checker.check_model(model_onnx)  # check onnx model

    # Simplify onnx
    simplify = 1
    if simplify:
        import onnxsim
        print(f'Simplifying with onnx-simplifier {onnxsim.__version__}.')
        onnx_sim_model, check = onnxsim.simplify(model_onnx)
        assert check, 'assert check failed'
        # save the simplified model
        onnx.save(onnx_sim_model, output_model_path)
        print('Onnx model saved as {}'.format(output_model_path))
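With the defaults above, the script can be run as follows (substitute your own checkpoint for the hypothetical weights path):

python tools/convert_torchvision_to_onnx.py \
    --config-file configs/ade20k-150/maskformer_R50_bs16_160k.yaml \
    --opts MODEL.WEIGHTS output/model_0159999.pth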
The conversion then succeeds and produces the corresponding ONNX model, which can be loaded with onnxruntime for inference.
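A minimal onnxruntime sketch (my own; the file names are hypothetical, the mean/std are the usual ImageNet RGB values and should be replaced with your cfg.MODEL.PIXEL_MEAN / PIXEL_STD, and the output shape assumes the rewritten forward() above):

# Minimal onnxruntime inference sketch (hypothetical paths/names).
import cv2
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('output/model_0159999.onnx',
                            providers=['CPUExecutionProvider'])
img = cv2.imread('images/ADE/ADE_test_00000001.jpg')
blob = cv2.resize(img, (512, 512))[:, :, ::-1]          # BGR -> RGB
blob = blob.astype('float32').transpose(2, 0, 1)[None]  # HWC -> NCHW
mean = np.array([123.675, 116.28, 103.53], 'float32').reshape(1, 3, 1, 1)
std = np.array([58.395, 57.12, 57.375], 'float32').reshape(1, 3, 1, 1)
blob = np.ascontiguousarray((blob - mean) / std)
out = sess.run(None, {'images': blob})[0]               # per-pixel class scores
pred = np.asarray(out).reshape(-1, 512, 512).argmax(0)  # (512, 512) class ids
print('prediction shape:', pred.shape)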
3. Inference Speed Test
In C++, I loaded the ONNX models, converted them to FP16 TensorRT engines, and measured inference speed, comparing the 14 MB SegFormer model against this 161 MB MaskFormer model at the same 512x512 resolution:
segformer_b0: ~10 ms
maskformer_R50: ~220 ms
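For a quick engine build and benchmark without writing C++, TensorRT's trtexec tool can consume the ONNX file directly (hypothetical file names; flags may differ across TensorRT versions):

trtexec --onnx=model_0159999.onnx --fp16 --saveEngine=maskformer_r50_fp16.engine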
This result suggests that MaskFormer is not suited to scenarios with strict latency requirements; it is a better fit for tasks with many classes or for panoptic segmentation.