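MTANet: Multi-Task Attention Network for Automatic Medical Image Segmentation and Classification | Literature Digest: Deep Learning for Medical Image Diagnosis and Lesion Segmentation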

Title

MTANet: Multi-Task Attention Network for Automatic Medical Image Segmentation and Classification

Introduction

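Medical image segmentation and classification are two key steps in current clinical practice, and their accuracy depends largely on the expertise of the individual clinician. Computer-aided diagnosis (CAD) systems have attracted wide attention in medical image diagnosis; they aim to help clinicians make diagnostic decisions more accurately and objectively. In recent years, machine learning methods, deep learning in particular, have made notable progress on many medical imaging tasks, including segmentation and classification.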

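Convolutional neural networks (CNNs) have been highly successful in many medical image segmentation tasks. In particular, UNet achieved a major breakthrough in medical image segmentation through end-to-end pixel-level prediction. The skip connections that UNet introduced between the encoder and the decoder merge low-resolution features into high-resolution features to improve segmentation. Inspired by UNet's success, most leading models of recent years build on the UNet architecture, including ResUNet, DenseUNet, UNet++, DoubleUNet, and ensemble approaches.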
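As a rough illustration of the skip connection described above, the sketch below upsamples a coarse decoder feature map and stacks it channel-wise with the encoder feature map of matching resolution. It uses plain Python lists instead of a deep learning framework, and the function names, nearest-neighbor upsampling, and toy shapes are all assumptions made for the example, not details taken from UNet or MTANet code.

```python
def upsample_nearest_2x(channel):
    """Double the height and width of one 2-D channel by repeating each pixel."""
    out = []
    for row in channel:
        wide = [value for value in row for _ in range(2)]  # repeat along width
        out.append(wide)
        out.append(list(wide))                             # repeat along height
    return out


def skip_connect(decoder_feat, encoder_feat):
    """Upsample coarse decoder channels and stack them with encoder channels.

    Both arguments are lists of 2-D channels (a channels-first feature map).
    """
    upsampled = [upsample_nearest_2x(channel) for channel in decoder_feat]
    same_height = len(upsampled[0]) == len(encoder_feat[0])
    same_width = len(upsampled[0][0]) == len(encoder_feat[0][0])
    if not (same_height and same_width):
        raise ValueError("spatial sizes must match after upsampling")
    return encoder_feat + upsampled  # channel-wise concatenation


# Toy shapes (assumed): 2 encoder channels at 4x4, 3 decoder channels at 2x2.
encoder_feat = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
decoder_feat = [[[0.5] * 2 for _ in range(2)] for _ in range(3)]
fused = skip_connect(decoder_feat, encoder_feat)
print(len(fused), len(fused[0]), len(fused[0][0]))  # 5 4 4
```

In a real network the concatenated map would then pass through convolution layers; that step is omitted here to keep the data flow visible.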

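However, these methods focus mainly on the whole region of a medical object and are less sensitive to small medical objects. Attention mechanisms drew wide interest after their success in transformer models. Rather than using all available features, an attention mechanism selects a subset of relevant information to detect salient features. Following their success in natural-scene image segmentation networks, attention mechanisms were brought into many medical image segmentation works, such as Focus UNet, MedT, TransUNet, and UACANet. These methods perform well on medical segmentation tasks, but they rarely consider the high-resolution features in the decoder or the connections between the encoder and the decoder.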
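To make the idea of "selecting relevant features" concrete, here is a minimal sketch of scaled dot-product attention, the form used in transformer models, for a single query vector. It is a generic illustration only: the vectors are made-up toy values, and nothing here is claimed to match the attention modules of MTANet or of the other networks named above.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query over a list of key/value vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * value[i] for w, value in zip(weights, values)) for i in range(dim)]
    return output, weights


# Toy data (assumed): the first key is aligned with the query, the third opposes it.
query = [1.0, 0.0]
keys = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
output, weights = attend(query, keys, values)
print([round(w, 3) for w in weights])
```

The weight on the first value is the largest because its key matches the query best, so the output is dominated by the most relevant feature instead of being an equal mix of all three.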

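In addition, transformer-based architectures have shown state-of-the-art performance on semantic segmentation tasks. Spurred by the development of Vision Transformer (ViT) based methods, recent transformer-based backbones match or exceed CNN-based backbones in performance.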

Abstract

Medical image segmentation and classification are two of the most important steps in computer-aided clinical diagnosis. The region of interest is usually segmented first to extract useful features for subsequent disease classification. However, these methods are computationally complex and time-consuming. In this paper, we propose a one-stage multi-task attention network (MTANet) that efficiently classifies objects in an image while generating a high-quality segmentation mask for each medical object. A reverse addition attention module was designed in the segmentation task to fuse areas in the global map with boundary cues in high-resolution features, and an attention bottleneck module was used in the classification task to fuse image features and clinical features. We evaluated the performance of MTANet with CNN-based and transformer-based architectures across three imaging modalities for different tasks: the CVC-ClinicDB dataset for polyp segmentation, the ISIC-2018 dataset for skin lesion segmentation, and our private ultrasound dataset for liver tumor segmentation and classification. Our proposed model outperformed state-of-the-art models on all three datasets and was superior to all 25 radiologists for liver tumor diagnosis.

Method

Figure 2 shows an overview of the proposed MTANet. For the segmentation branch, it uses a reverse addition attention module with a parallel partial decoder in the decoder of the basic UNet model to obtain more high-resolution features. For the classification branch, it uses attention bottleneck modules in the fully connected layers to fuse image features and clinical features. Each component is introduced as follows.
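This digest does not describe the internals of the attention bottleneck module, so the sketch below is not that module. It only illustrates, under stated assumptions, what attention-weighted fusion of an image feature vector and a clinical feature vector can look like: each vector gets a relevance score, the scores are normalized with softmax, and the fused vector is the weighted sum. The function name `fuse`, the scoring vector, and all numbers are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(image_feat, clinical_feat, scoring_vec):
    """Weight two same-length feature vectors by softmax attention and sum them.

    scoring_vec stands in for learned parameters; here it is fixed by hand.
    """
    feats = [image_feat, clinical_feat]
    scores = [sum(f * s for f, s in zip(feat, scoring_vec)) for feat in feats]
    weights = softmax(scores)
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, feats))
        for i in range(len(scoring_vec))
    ]
    return fused, weights


# Hypothetical inputs, assumed already projected to the same length.
image_feat = [0.2, 0.8, 0.1]      # e.g. pooled encoder features
clinical_feat = [1.0, 0.0, 0.5]   # e.g. encoded clinical variables
scoring_vec = [0.5, -0.5, 1.0]    # placeholder for learned weights
fused, weights = fuse(image_feat, clinical_feat, scoring_vec)
print([round(w, 3) for w in weights])
```

With these toy numbers the clinical vector scores higher than the image vector, so it receives the larger weight. In a trained model the scoring parameters would be learned, letting the network balance the two sources per case.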

Conclusion

In conclusion, we proposed MTANet, an end-to-end one-stage network for automatic medical image analysis. A reverse addition attention module was designed to fuse areas in the global map with boundary cues in high-resolution features, and an attention bottleneck module was introduced to balance the clinical features and image features. Both CNN-based and transformer-based architectures were proposed. Experiments on three datasets of different imaging modalities demonstrated the capability of the proposed MTANet.

Figure

Fig. 1. Flowchart of the patient enrollment process.

Fig. 2. Overview of the proposed MTANet.

Fig. 3. Qualitative results for automatic medical image segmentation. Green lines denote the ground truth, while red lines denote the predictions of our model.

Fig. 4. Structure of the models. Model-I denotes the single classification network. Model-II denotes the two-stage classification network. Model-III denotes our one-stage classification network.