[Automated Program Repair] Reading Notes on Empirical-Analysis Papers, August 9, 2019

Contents

- Preface
- Paper list
  - Better Test Cases for Better Automated Program Repair
  - A Theoretical and Empirical Analysis of Program Spectra Diagnosability
  - An empirical study on TensorFlow program bugs
  - An Empirical Study on Real Bug Fixes
  - LogTracker: Learning Log Revision Behaviors Proactively from Software Evolution History
  - Fine-grained and accurate source code differencing
- Summary


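Preface

Why I write posts like this:
1) to keep up with frontier research in the field;
2) to share the papers I read with you (some of them are genuine classics and well worth reading).
So I hope to write one every day. As the old saying goes: if you can renew yourself for one day, do so every day, and keep on renewing.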

Paper list

Better Test Cases for Better Automated Program Repair

Source: a paper I had already been following.

Citation:
@inproceedings{yang_better_2017,
title = {Better test cases for better automated program repair},
booktitle = {Proceedings of the 2017 11th {Joint} {Meeting} on {Foundations} of {Software} {Engineering}},
publisher = {ACM},
author = {Yang, Jinqiu and Zhikhartsev, Alexey and Liu, Yuefei and Tan, Lin},
year = {2017},
pages = {831--841}
}

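Structure:
3 Approach

Overview. (presents a workflow diagram)
Challenges. (lays out the difficulties)
3.1 Generating New Test Cases Using Fuzz Testing
3.2 Generating Memory-Safety Oracles (the concrete technique)
3.3 Measuring the Overfitness of a Patch Using an Overfitness Metric (O-measure) (introduces the O-measure metric along the way, backed by a fair amount of theoretical proof. I followed the definition but not the proofs that come after it; this needs a closer read.)
3.4 An Optimized Setting of Opad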

4 EVALUATION
Experimental Setup.
RQ1: How many overfitted patches does Opad filter out?

Motivation.
Approach.
Results.

…

5 THREATS TO VALIDITY

This structure really is a style of its own.
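To make the idea behind Sections 3.1 to 3.3 more concrete, here is a minimal sketch of how I currently read it. It rests on several assumptions that are mine, not confirmed by these notes: that the O-measure counts generated tests that pass on the buggy program but fail on the patched one, that a patch with a nonzero O-measure is treated as overfitted, and that an uncaught Python exception can stand in for a crash or memory-safety violation. The toy programs and the seeded random generator (a stand-in for a real fuzzer) are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Program = Callable[[int], object]


def passes(program: Program, test_input: int) -> bool:
    """Crash oracle (assumption): a test passes if the program does not raise."""
    try:
        program(test_input)
        return True
    except Exception:
        return False


def o_measure(buggy: Program, patched: Program, tests: Sequence[int]) -> int:
    """Count tests that pass on the buggy program but fail on the patched one."""
    return sum(1 for t in tests if passes(buggy, t) and not passes(patched, t))


def filter_overfitted(buggy: Program, patches: Sequence[Program],
                      tests: Sequence[int]) -> List[Program]:
    """Keep only the patches whose O-measure is zero on the generated tests."""
    return [p for p in patches if o_measure(buggy, p, tests) == 0]


# Hypothetical toy programs, not taken from the paper.
def buggy(x: int) -> int:
    return 10 // x  # crashes only when x == 0


def good_patch(x: int) -> int:
    return 0 if x == 0 else 10 // x


def overfitted_patch(x: int) -> int:
    known = {0: 0, 1: 10, 2: 5}  # hard-codes the inputs of the original tests
    return known[x]              # KeyError on every other input


rng = random.Random(0)  # seeded stand-in for a real fuzzer
fuzz_inputs = [rng.randint(-50, 50) for _ in range(200)]

kept = filter_overfitted(buggy, [good_patch, overfitted_patch], fuzz_inputs)
print([p.__name__ for p in kept])
```

The point of the sketch is only that new inputs plus a cheap oracle can expose a patch that breaks behavior the buggy program already got right; the paper's actual definitions and proofs still need the closer read mentioned above.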

A Theoretical and Empirical Analysis of Program Spectra Diagnosability

Source: it came to mind.

Citation:

@article{perez_theoretical_2019,
title = {A {Theoretical} and {Empirical} {Analysis} of {Program} {Spectra} {Diagnosability}},
journal = {IEEE Transactions on Software Engineering},
author = {Perez, Alexandre and Abreu, Rui and Van Deursen, Arie},
year = {2019},
}

Abstract:
Current metrics for assessing the adequacy of a test-suite plainly focus on the number of components (be it lines, branches, paths) covered by the suite, but do not explicitly check how the tests actually exercise these components and whether they provide enough information so that spectrum-based fault localization techniques can perform accurate fault isolation. We propose a metric, called DDU, aimed at complementing adequacy measurements by quantifying a test-suite’s diagnosability, i.e., the effectiveness of applying spectrum-based fault localization to pinpoint faults in the code in the event of test failures. Our aim is to increase the value generated by creating thorough test-suites, so they are not only regarded as error detection mechanisms but also as effective diagnostic aids that help widely-used fault-localization techniques to accurately pinpoint the location of bugs in the system. We have performed a topology-based simulation of thousands of spectra and have found that DDU can effectively establish an upper bound on the effort to diagnose faults. Furthermore, our empirical experiments using the Defects4J dataset show that optimizing a test suite with respect to DDU yields a 34% gain in spectrum-based fault localization report accuracy when compared to the standard branch-coverage metric.

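Interesting.
1) It targets a metric for test suites (an adequacy measure).
2) It mentions an upper bound. I did not fully understand that part of the abstract, but it is intriguing.
3) DDU was also evaluated on Defects4J, where it "yields a 34% gain in spectrum-based fault localization report accuracy". That is impressive.

The simulator reminds me of some earlier work...

Papers by top researchers really are different.
When I have time, I should study the formulas in this paper carefully.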
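As a first step toward those formulas, here is a minimal sketch of DDU as I recall it from the paper, not as stated in the abstract above, so treat every formula as an assumption to verify against the text. The input is a binary activity matrix with one row per test and one column per component. Density is normalized as 1 - |1 - 2ρ|, diversity is a Gini-Simpson index over identical test rows, uniqueness is the fraction of distinct component columns, and DDU is the product of the three. The example matrices are made up.

```python
from collections import Counter
from typing import List, Sequence

Matrix = Sequence[Sequence[int]]  # rows = tests, columns = components


def density(matrix: Matrix) -> float:
    """Normalized density: 1 when exactly half of the cells are active."""
    n, m = len(matrix), len(matrix[0])
    rho = sum(sum(row) for row in matrix) / (n * m)
    return 1 - abs(1 - 2 * rho)


def diversity(matrix: Matrix) -> float:
    """Gini-Simpson index over groups of tests with identical coverage rows."""
    n = len(matrix)
    if n < 2:
        return 0.0
    counts = Counter(tuple(row) for row in matrix)
    return 1 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def uniqueness(matrix: Matrix) -> float:
    """Fraction of components whose coverage column is distinct."""
    m = len(matrix[0])
    return len(set(zip(*matrix))) / m


def ddu(matrix: Matrix) -> float:
    return density(matrix) * diversity(matrix) * uniqueness(matrix)


# Made-up spectra for illustration.
ideal: List[List[int]] = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
ambiguous: List[List[int]] = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],  # duplicate test row lowers diversity
    [0, 0, 1, 1],  # paired columns lower uniqueness
]
all_covered: List[List[int]] = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

for name, spectrum in [("ideal", ideal), ("ambiguous", ambiguous),
                       ("all_covered", all_covered)]:
    print(name, round(ddu(spectrum), 3))
```

Under these assumed definitions, a suite in which every test touches every component scores zero even though its coverage is 100%, which matches the abstract's point that coverage alone says little about diagnosability.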

An empirical study on TensorFlow program bugs

Source: it came to mind.
Citation:
@inproceedings{zhang_empirical_2018,
title = {An empirical study on {TensorFlow} program bugs},
booktitle = {Proceedings of the 27th {ACM} {SIGSOFT} {International} {Symposium} on {Software} {Testing} and {Analysis}},
publisher = {ACM},
author = {Zhang, Yuhao and Chen, Yifan and Cheung, Shing-Chi and Xiong, Yingfei and Zhang, Lu},
year = {2018},
pages = {129--140}
}

Content:
Deep learning applications become increasingly popular in important domains such as self-driving systems and facial identity systems. Defective deep learning applications may lead to catastrophic consequences. Although recent research efforts were made on testing and debugging deep learning applications, the characteristics of deep learning defects have never been studied.

This angle is very clever.

Good phrases and sentences:

have been proposed to facilitate programming of such applications.
applications differs significantly from that of traditional applications
The development of DL applications often faces tasks that are seldom encountered in developing their traditional counterparts, e.g., configuring
As DL is increasingly adopted for mission-critical applications
defective DL applications can lead to catastrophic consequences
Despite these efforts, the characteristics of defects in DL applications have never been systematically studied. In particular, it is still unclear what new challenges the paradigm shift from traditional program languages to DL languages bring to fault detection and localization.

This is the work of experts... it is so well written.

To ease presentation, we refer to the defects in TF programs as bugs. We ...
(This phrasing is really professional...)
Our study has led to multiple findings. In particular, we identify four types of symptoms, seven types of root causes, five challenges in detection and fault localization, and five strategies that the TF users have adopted to address the challenges.
(This kind of empirical work is really impressive.)

Organization:
The rest of the paper is organized as follows. In Section 2, we provide a background of programming over the TensorFlow framework. In Section 3, we propose three research questions. In Section 4, we present how we collected our data. In Section 5, 6, and 7, we answer these three research questions respectively.

Structure:
3 RESEARCH QUESTIONS
4 DATA COLLECTION
5 RQ1: SYMPTOMS AND ROOT CAUSES
6 RQ2
7 RQ3
8 THREATS TO VALIDITY

First, our study involves manual inspections on bugs. These subjective steps can be biased due to our inference of the code's intention in the lack of documentation. In order to reduce this threat, two authors analyzed the bugs separately and discussed inconsistent issues until an agreement was reached.
Very interesting, and well worth learning from.

Lesson learned...

Two groups of people can benefit from this study. For TF users, we summarized five strategies used by other TF users to detect and debug the bugs in TF programs. For software engineering researchers, we pointed out five new challenges which call for more research efforts. Our classification of causes and symptoms offers both TF users and software engineering researchers a better understanding of deep learning program bugs.

(Every word is a gem. Read more, learn more.)

An Empirical Study on Real Bug Fixes

Source: found by association.

Citation:
@inproceedings{zhong_empirical_2015,
title = {An empirical study on real bug fixes},
booktitle = {Proceedings of the 37th {International} {Conference} on {Software} {Engineering}-{Volume} 1},
publisher = {IEEE Press},
author = {Zhong, Hao and Su, Zhendong},
year = {2015},
pages = {913--923}
}

This study covers a fairly broad scope, and its conclusions are numerous and varied.
Very interesting.

II. METHODOLOGY (mainly covers how the dataset was collected and built)
A. Dataset
B. Research Questions

III. EMPIRICAL RESULTS
A. RQ1: Fault Distribution
B-F (all RQs)
G. Threats to Validity

IV. DISCUSSIONS AND FUTURE WORK

LogTracker: Learning Log Revision Behaviors Proactively from Software Evolution History

Source: found by association.
Citation:

Content:
For taking the first step towards solving the second problem, this paper is inspired by code clones and assumes that logging code with similar context is pervasive in software and deserves similar modifications. To verify our assumptions, we conduct an empirical study on eight open-source projects.

This is quite meaningful work.

Fine-grained and accurate source code differencing

Source: I wanted to take a look at the GumTree tool.

Citation:
@inproceedings{falleri_fine-grained_2014,
title = {Fine-grained and accurate source code differencing},
booktitle = {Proceedings of the 29th {ACM}/{IEEE} international conference on {Automated} software engineering},
publisher = {ACM},
author = {Falleri, Jean-Rémy and Morandat, Floréal and Blanc, Xavier and Martinez, Matias and Monperrus, Martin},
year = {2014},
pages = {313--324}
}

Abstract:
At the heart of software evolution is a sequence of edit actions, called an edit script, made to a source code file. Since software systems are stored version by version, the edit script has to be computed from these versions, which is known as a complex task. Existing approaches usually compute edit scripts at the text granularity with only add line and delete line actions. However, inferring syntactic changes from such an edit script is hard. Since moving code is a frequent action performed when editing code, it should also be taken into account. In this paper, we tackle these issues by introducing an algorithm computing edit scripts at the abstract syntax tree granularity including move actions. Our objective is to compute edit scripts that are short and close to the original developer intent. Our algorithm is implemented in a freely-available and extensible tool that has been intensively validated.

So it turns out to be a tool for computing and displaying source code differences.
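The limitation the abstract describes, text-level edit scripts with only add-line and delete-line actions, is easy to reproduce. The sketch below is not GumTree and does not implement its algorithm; it uses Python's standard `difflib.SequenceMatcher` on a made-up two-function file to show that a pure code move is reported as unrelated deletions and additions.

```python
import difflib
from typing import List, Tuple

EditAction = Tuple[str, str]  # (action, line)


def line_edit_script(old: List[str], new: List[str]) -> List[EditAction]:
    """Text-granularity edit script using only 'delete' and 'add' line actions."""
    script: List[EditAction] = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            script.extend(("delete", line) for line in old[i1:i2])
        if tag in ("insert", "replace"):
            script.extend(("add", line) for line in new[j1:j2])
    return script


# Made-up example: the two functions simply swap places.
before = ["def a():", "    return 1", "", "def b():", "    return 2"]
after = ["def b():", "    return 2", "", "def a():", "    return 1"]

script = line_edit_script(before, after)
for action, line in script:
    print(f"{action:6} {line!r}")
```

Every deleted line reappears as an added line, yet nothing in the script says that a function was moved. That is the gap an AST-level edit script with explicit move actions is meant to close.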