Close Reading (Highly Recommended): Generating Mammography Reports from Multi-view Mammograms with BERT (Part 1)

One of the authors here is named Ilya, which gave me a start.

1. Abstract

Writing mammography reports can be error-prone and time-consuming for radiologists. In this paper we propose a method to generate mammography reports given four images, corresponding to the four views used in screening mammography. To the best of our knowledge our work represents the first attempt to generate the mammography report using deep-learning. We propose an encoder-decoder model that includes an EfficientNet-based encoder and a Transformer-based decoder. We demonstrate that the Transformer-based attention mechanism can combine visual and semantic information to localize salient regions on the input mammograms and generate a visually interpretable report. The conducted experiments, including an evaluation by a certified radiologist, show the effectiveness of the proposed method. Our code is available at https://github.com/sberbank-ai-lab/mammo2text.

[Image]

2. Introduction

Breast cancer represents a global healthcare problem (Glo, 2016). Increasing numbers of new cases and deaths are observed in both developed and less developed countries, only partially attributable to the increasing population age. Serial screening with mammography is the most effective method to detect early stage disease and decrease mortality. The goal of screening is to detect breast cancers when still curable to decrease breast cancer-specific mortality (Duffy et al., 2020).
The aim is to catch the disease while it is still curable and so reduce mortality.

The European Society of Breast Imaging (EUSOBI) together with 30 national breast radiology bodies recommend that only qualified radiologists should be involved in screening programs (Sardanelli et al., 2017). As the amount of organized breast screening programs grows across the world, the burden on radiologists increases with it. In national screening programs such as in Holland or Sweden, radiologists may need to read 100 radiology images per hour (Abbey et al., 2020). With a growing number of screening programs, we need more trained radiologists and new technologies that can make their workflow more effective. Since one of the most time consuming procedures in radiology is writing medical-imaging reports, we explore the potential for deep-learning to automatically generate diagnostic reports of screening mammograms.
This sets up the background: the workload on radiologists is what motivates automatic report generation.
The rapid evolution of deep learning and artificial intelligence technologies enables them to be used as a strong tool for providing clinical decision-making support to the medical community. While many problems in the area of medical imaging and text analysis have been addressed effectively, there is no known approach to generating clinical reports for mammography studies. There are various reasons for this, such as the requirements regarding the accuracy, completeness and diagnostic relevance of the clinical information contained in the report. In this article, we present a framework (Figure 1) that takes mammograms as an input, automatically generates mammography reports, and visualizes the attention of the model to provide the interpretability of the process.
We use an encoder-decoder architecture, where the encoder extracts visual features and the decoder generates reports. We adopt a convolutional neural network, specifically EfficientNet (M Tan, 2019), to extract visual features of the four images, corresponding to the four views used in screening mammography.

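Reference:
Tan, M., & Le, Q. (2019, May). EfficientNet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning (pp. 6105-6114). PMLR.
EfficientNet (M Tan, 2019) is really a paradigm for building vision networks. 😐 Why use this kind of vision model? How do you build a better hybrid vision model, and how should its parameters be combined? Is it used here because medical images differ from natural images, especially mammograms with their fine-grained calcification points? Could that be the reason for "rethinking" how the vision model is structured?

😐 In fact, the authors explain this here: for high-resolution images such as mammograms, EfficientNet B0 is more efficient!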
We use a deep multi-view (N Wu, 2019) CNN based on EfficientNet B0 (M Tan, 2019). We chose EfficientNet B0 because it is relatively lightweight and fits in GPU memory when using high resolution images. We have one EfficientNet instance for all views (R-CC, L-CC, R-MLO, L-MLO), i.e. model weights are shared. The first convolutional layer is replaced to accept a one-channel image. The last fully-connected layer of EfficientNet is discarded. Outputs from all four views are averaged by channels and one fully connected layer is added.


For language modeling, we utilize BERT (Devlin et al., 2018), inserting an additional attention sub-layer to perform multi-head attention over the regional feature embeddings produced by the encoder.

We modify the Transformer-based attention mechanism (Vaswani et al., 2017) such that it attends to the visual information on four mammography views and previously generated words. We use the attention scores to build visually interpretable image-text attention mappings.
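To make the idea of attending over regional feature embeddings more concrete, here is a minimal sketch of single-head scaled dot-product attention in plain Python. It is an illustration only: the function name `cross_attention` and the toy vectors are assumptions, and the paper's decoder uses multi-head attention with learned projections inside BERT, which this sketch leaves out.

```python
import math

def cross_attention(query, region_keys, region_values):
    """Attend from one text-side query vector over a list of image-region vectors.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    d = len(query)
    # Similarity of the query to every region, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in region_keys]
    # Numerically stable softmax turns the scores into attention weights.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The context vector is the attention-weighted average of the region values.
    dim = len(region_values[0])
    context = [sum(w * v[i] for w, v in zip(weights, region_values))
               for i in range(dim)]
    return weights, context

# Toy example: three image regions with 2-dimensional features.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, context = cross_attention(query, keys, values)
```

The `weights` list is what an attention map visualizes: one weight per image region for the word currently being generated.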

In addition to that, we conduct a series of in-depth quantitative and qualitative experiments with the help of an experienced radiologist to demonstrate the clinical validity of our approach. We compare the predictions of our models with the ground truth to understand where the models make mistakes and demonstrate that our best model successfully describes different parts of the breast, and detects pathological regions and abnormalities. We evaluate the image-text attention mappings to demonstrate the interpretability of our model. As far as we are aware, our work represents the first attempt to generate the mammography report using deep-learning.

Pay particular attention to the attention maps and to the share of the vision model; judging from the paper, the results are very good.

To summarize, we make the following contributions in this paper:
  1. We propose a novel framework for mammography report generation using EfficientNet in the encoder and BERT in the decoder.
  2. We demonstrate that the Transformer-based attention mechanism can combine visual and textual information to localize salient regions on the input mammograms and generate a visually interpretable report.
  3. We conduct doctor evaluation and extensive experiments with automatic metrics to show the effectiveness of the proposed framework.
  4. We conduct a qualitative analysis including interpretation of image-text attention mappings to demonstrate how the model is able to generate mammography reports in a meaningful way.

3. Related work

The task of image captioning is creating a model that given a previously unseen query image generates a caption that is both grammatically and semantically correct. The main approaches to image captioning are retrieval-based, template-based and novel caption generation.

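Summary of approaches
  1. Retrieval-based: This approach searches a predefined database for the description that best matches the current image. The descriptions in the database are written by humans for different images. Given a new image, the system tries to find the most similar image (or set of images) and reuses its description for the new image. The advantage is high text quality, since every description is human-written. The drawbacks are that it is hard to extend to new, unseen images, and that finding a closely matching image in the database can be difficult.
  2. Template-based: This approach generates descriptions from predefined templates containing variable slots that are filled in dynamically from the image content. For example, a template could be "This is a photo of [object] in [scene]", where "[object]" and "[scene]" are filled in from the image recognition results. The advantage is that templates are easy to implement and understand, and the generated text is usually grammatical. The drawback is that the descriptions can lack diversity and creativity, since they are all built from fixed templates.
  3. Novel caption generation: This approach uses deep learning models, such as a convolutional neural network (CNN) combined with a recurrent neural network (RNN) or a Transformer, to generate new descriptions directly from the image. It does not rely on predefined templates or a database; instead the model learns from a large dataset of images paired with descriptions how to describe image content. The advantage is that it can produce diverse, rich descriptions and can be applied to unseen images. The challenge is that it needs a large amount of annotated data, and training is computationally expensive.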

In retrieval-based methods (Hodosh et al., 2013), (Ordonez et al., 2011) candidate captions for query images are selected from a pool of existing captions based on some measure of similarity. The downside of this approach is the inability to generate novel image-specific captions.
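As a rough illustration of the retrieval-based idea, the sketch below picks the caption whose image embedding is closest to the query by cosine similarity. The embeddings, the captions, and the helper names `cosine` and `retrieve_caption` are all made up for the example; the cited works use their own features and similarity measures.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_caption(query_vec, pool):
    """Return the caption of the pool entry most similar to the query embedding."""
    best_vec, best_caption = max(pool, key=lambda item: cosine(query_vec, item[0]))
    return best_caption

# Hypothetical pool of (image embedding, human-written caption) pairs.
pool = [
    ([1.0, 0.0, 0.0], "a dog on grass"),
    ([0.0, 1.0, 0.0], "a cat on a sofa"),
    ([0.0, 0.0, 1.0], "a boat at sea"),
]
caption = retrieve_caption([0.1, 0.9, 0.2], pool)
```

Whatever the query image looks like, the output is always one of the captions already in the pool, which is exactly the limitation the paragraph above points out.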

In template-based methods (Farhadi et al., 2010), (Kulkarni et al., 2013), (Li et al., 2011) image captions are generated by filling the blanks in fixed templates. These methods can generate grammatically and semantically correct novel captions not present in the training set but cannot generate variable-length captions.
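A template-based generator can be sketched in a few lines. The template string and the `fill_template` helper are assumptions for illustration, with the slot values standing in for the output of some object and scene detector.

```python
# Hypothetical fixed template with two slots.
TEMPLATE = "This is a photo of {obj} in {scene}."

def fill_template(detections):
    """Fill the fixed template from a dict of detected slot values."""
    return TEMPLATE.format(obj=detections["obj"], scene=detections["scene"])

caption = fill_template({"obj": "a dog", "scene": "a park"})
```

Every caption has the same sentence structure and only the slot contents change, which is why such methods cannot produce variable-length captions.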

Novel caption generation methods (Xu et al., 2015), (Yao et al., 2017), (You et al., 2016) use a representation of the query image as an input for a language model responsible for generating the captions. This approach follows the encoder-decoder architecture first applied to machine translation tasks (Cho et al., 2014).

To generate an image caption, a representation of the image must first be constructed either via generating handcrafted features or extracting such features automatically, for example using deep neural networks. Examples of hand-crafted features are local binary patterns (Ojala et al., 2002), scale-invariant keypoints (Lowe, 2004), or histograms of oriented gradients (Dalal and Triggs, 2005). Automatic feature extraction from images is commonly used by applying convolutional neural networks (CNN) (LeCun et al., 1998) to the query image. These features may be further enhanced, for example by using a spatial Transformer (Pedersoli et al., 2017).

A sub-field of image captioning is diagnostic captioning (DC). Diagnostic captioning is automatic generation of diagnostic text based on a set of medical images of a patient. DC systems can increase the speed of producing a report for experienced physicians and decrease the number of diagnostic errors for inexperienced doctors (for a recent survey on DC methods see (Pavlopoulos et al., 2021)). The majority of the work in DC is done using encoder-decoder architecture. In addition to evaluation of grammatical and semantical correctness of captions, which is commonly assessed by calculating lexical overlap between generated captions and ground truth (Pavlopoulos et al., 2019), DC quality can be assessed by clinical correctness by conducting clinical experiments with physicians evaluating the generated reports (Zhang et al., 2019), (Liu et al., 2019).

Language models commonly used in DC usually apply recurrent neural networks (RNN) such as LSTM (Hochreiter and Schmidhuber, 1997), see (Vinyals et al., 2015), (Xu et al., 2015), with works using Transformer-based models beginning to appear (Chen et al., 2020). A common approach in DC is the use of 'visual attention' that allows the decoder to focus on particular areas of input images when generating the captions (Jing et al., 2017), (Yuan et al., 2019). Such mechanisms also can be used to highlight the regions of interest on the input images adding to the interpretability of the models (Zhang et al., 2017).

(Chen et al., 2020): Zhihong Chen
(Jing et al., 2017): Baoyu Jing
(Yuan et al., 2019): Jianbo Yuan

We split the dataset into the training, validation and test subsets in the proportion of 91%, 4% and 5% respectively (having 22463, 934 and 1229 cases in each subset). The splits are the same for encoder
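A single case may carry several labels here, so the label counts do not add up to the total number of cases.
[Image]
The dataset split is done very carefully and is worth learning from. As for how exactly it was done, they presumably used some method to keep the classes as balanced as possible.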
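The split sizes quoted from the paper can be checked against the stated percentages with a few lines of arithmetic. The case counts come from the quoted text; the variable names are only for illustration.

```python
# Case counts per subset, as quoted from the paper.
counts = {"train": 22463, "validation": 934, "test": 1229}

total = sum(counts.values())  # 24626 cases overall
# Share of each subset in percent, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
```

The shares come out at 91.2%, 3.8% and 5.0%, which matches the rounded 91% / 4% / 5% proportions in the text.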

We use a deep multi-view (N Wu, 2019) CNN based on EfficientNet B0 (M Tan, 2019). We chose EfficientNet B0 because it is relatively lightweight and fits in GPU memory when using high resolution images. We have one EfficientNet instance for all views (R-CC, L-CC, R-MLO, L-MLO), i.e. model weights are shared. The first convolutional layer is replaced to accept a one-channel image. The last fully-connected layer of EfficientNet is discarded. Outputs from all four views are averaged by channels and one fully connected layer is added.

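The model design broadly matches what I expected, though there are also many differences; it offers a good line of thinking.
  1. The first convolutional layer does not use an extra-large kernel but a regular one, and it takes 1 input channel instead of 3.
  2. All images pass through the same visual encoder, which is what weight sharing means here.
  3. The outputs of all views are averaged to obtain the final output features.
  4. The last fully connected layer is removed.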
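The weight sharing and channel-wise averaging described above can be sketched without any deep learning framework. In the toy code below a single linear map stands in for the shared EfficientNet backbone; `encode_view`, `multi_view_features` and all the numbers are hypothetical, and the final fully connected layer from the paper is omitted.

```python
def encode_view(pixels, weights):
    """Stand-in for the shared CNN: one linear map from pixels to channel features."""
    return [sum(w * p for w, p in zip(row, pixels)) for row in weights]

def multi_view_features(views, shared_weights):
    """Encode every view with the SAME weights, then average channel by channel."""
    per_view = [encode_view(pixels, shared_weights) for pixels in views.values()]
    n_views = len(per_view)
    n_channels = len(shared_weights)
    return [sum(feat[c] for feat in per_view) / n_views for c in range(n_channels)]

# Four toy "images" with two pixels each, one per mammography view.
views = {
    "R-CC": [1.0, 0.0],
    "L-CC": [0.0, 1.0],
    "R-MLO": [1.0, 1.0],
    "L-MLO": [0.0, 0.0],
}
shared_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 output channels
features = multi_view_features(views, shared_weights)
```

Because one set of weights encodes all four views, the parameter count does not grow with the number of views, and the averaged vector has the same size as a single view's features.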

The encoder is pretrained to predict multilabel targets important for diagnosis in mammography screening, shown in Table 1. The binary targets were extracted with regular expressions from text descriptions of the studies. Targets № 0-4 are typical pathological changes in breast tissue. During training, the images are cropped and resized to 1350x900 px.
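A minimal sketch of how binary targets might be pulled out of report text with regular expressions follows. The label names and the English patterns are purely hypothetical: the paper's actual targets are those in its Table 1 and its patterns are not given in the quoted text. The sketch also ignores negation ("no mass"), which any real extraction would have to handle.

```python
import re

# Hypothetical label -> pattern mapping, for illustration only.
PATTERNS = {
    "mass": re.compile(r"\bmass(es)?\b", re.IGNORECASE),
    "calcification": re.compile(r"\bcalcifications?\b", re.IGNORECASE),
    "asymmetry": re.compile(r"\basymmetr(y|ies)\b", re.IGNORECASE),
}

def extract_targets(report):
    """Return a 0/1 flag per label, set to 1 when its pattern occurs in the report."""
    return {name: int(bool(pattern.search(report)))
            for name, pattern in PATTERNS.items()}

labels = extract_targets(
    "Scattered calcifications in the left breast; a round mass in the right breast."
)
```

Since several patterns can match the same report, one case can carry several positive labels, which makes this a multilabel target.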

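The dataset handling here is better than mine, and the vision model is almost certainly better than the one I designed myself. The pretraining stage, however, now has more options, since models such as CLIP and GLoRIA can be used for pretraining.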

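Skipping the model architecture for now, let's look at the evaluation first.

I think this shows a good paradigm for evaluating a model. The random baseline here illustrates the model's effectiveness well, and beyond the BLEU, METEOR and ROUGE-L seen before, it also shows that the CIDEr metric can be used.
[Image]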

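Evaluation metrics:
  1. BLEU (Bilingual Evaluation Understudy): One of the most widely used metrics for evaluating machine translation quality. It scores a translation by the n-gram overlap between the machine output and a set of human reference translations. A higher BLEU score means more overlap with the references, which is usually taken to indicate better translation quality.

  2. METEOR (Metric for Evaluation of Translation with Explicit Ordering): An improvement over BLEU that considers not only exact word matches but also matches on word forms (stems), synonyms, and word order. METEOR also weights the matched words, giving different types of match (such as stem or synonym matches) different importance.

  3. ROUGE-L (Recall-Oriented Understudy for Gisting Evaluation - Longest Common Subsequence): ROUGE is mainly used to evaluate automatic text summarization or machine translation. The "L" stands for longest common subsequence (LCS): the metric measures the similarity between the candidate and reference texts by their longest common subsequence, which takes word order into account.

  4. CIDEr (Consensus-based Image Description Evaluation): A metric designed specifically for image description tasks; it measures quality by the n-gram similarity between the candidate description and a set of reference descriptions. CIDEr emphasizes distinctive vocabulary: it uses TF-IDF statistics to up-weight rare words, encouraging descriptions that reflect the specific, distinctive content of the image.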
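To make the n-gram overlap and LCS ideas concrete, here is a small sketch of the two core computations: the clipped n-gram precision that BLEU is built on, and an LCS-based F1 score in the spirit of ROUGE-L. It is a simplification under stated assumptions: full BLEU combines several n-gram orders with a brevity penalty, ROUGE-L is normally reported with a weighted F-measure, and the example sentences are made up.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference, with counts clipped."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

def rouge_l_f1(candidate, reference):
    """Unweighted F1 of LCS-based precision and recall (a simplified ROUGE-L)."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)

reference = "no suspicious mass in the right breast".split()
candidate = "no mass in the left breast".split()
p1 = clipped_precision(candidate, reference, 1)
p2 = clipped_precision(candidate, reference, 2)
rl = rouge_l_f1(candidate, reference)
```

Note that the candidate scores high on these metrics even though it names the wrong breast, which shows why lexical overlap alone is not enough for clinical text and why the paper also uses a doctor evaluation.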

[Image]

They go further and use evaluation metrics specific to the clinical findings. This is an excellent idea and well worth learning from.

[Image]

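Figures like these are excellent and well worth learning from; Russian AI research is really well done.

The model architecture is analyzed in Part 2; here we skip ahead to the conclusion.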

In this paper we present a first-of-its-kind framework for generating mammography reports given four mammography views using deep-learning. Our model utilizes pretrained models including EfficientNet for visual extraction and BERT for report generation. We demonstrate that the Transformer-based attention mechanism that simultaneously attends to four mammography views and text from the report significantly improves the performance. Our method provides a novel perspective for breast screening: generating mammography reports and providing image-text attention mappings, which makes the automatic breast screening process semantically and visually interpretable. The validity of our approach is confirmed by the corresponding doctor evaluation. In the conducted qualitative analysis we demonstrate that our best model successfully detects pathological regions, and describes abnormalities and parts of the breast.
