Every now and then, we encounter graphs and charts that fail to represent the spirit of the underlying data. This may be hard to believe given how far statistics and technology have advanced, yet “junky charts” still find their way into scientific literature and media outlets.
One recent example is the Georgia Department of Public Health’s botched bar graph (Figure 1), in which the days on the x-axis were reordered so that COVID-19 cases in the state appeared to follow a downward trajectory (you can read more about it in The Atlanta Journal-Constitution).
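To make the ordering problem concrete, here is a minimal sketch using made-up daily counts (not Georgia's actual figures). The names `daily_cases` and `is_chronological` are hypothetical helpers introduced only for this illustration. It shows how sorting bars by height rather than by date can manufacture an apparent decline.

```python
from datetime import date

# Made-up daily case counts for illustration only (not Georgia's real data).
daily_cases = [
    (date(2020, 5, 1), 120),
    (date(2020, 5, 2), 95),
    (date(2020, 5, 3), 130),
    (date(2020, 5, 4), 110),
    (date(2020, 5, 5), 140),
]


def is_chronological(rows):
    """Return True if the rows appear in non-decreasing date order."""
    dates = [day for day, _ in rows]
    return dates == sorted(dates)


# Honest ordering: the x-axis follows the calendar.
chronological = sorted(daily_cases, key=lambda row: row[0])

# Misleading ordering: bars sorted by height, which manufactures a decline.
by_count_desc = sorted(daily_cases, key=lambda row: row[1], reverse=True)

print([count for _, count in chronological])
print([count for _, count in by_count_desc])
print(is_chronological(chronological), is_chronological(by_count_desc))
```

A check like `is_chronological` could be run before plotting any time series, so that a reordered time axis is caught before the chart is published.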
Given the sensitivity of the information, this error by the Georgia Department of Public Health could have caused real harm. The department later corrected the mistake. But why do these graphical errors keep occurring in the first place? Answers to this question can be found in The Visual Display of Quantitative Information, first published in 1982 by Dr. Edward Tufte.
Tufte’s work was a major breakthrough in the field of visualization and changed how illustrators perceive statistical graphics. He cites numerous examples of substandard charts throughout the book and lays out principles for narrating, investigating, and summarizing data effectively through graphical design. Reading it will help you refine your graphs and, in turn, become a better data scientist or analyst.
Let us first discuss the reasons for graphical distortions. Tufte explains three different doctrines of inferior graphical work.
Lack of Quantitative Skills of Professional Artists: Illustrators with little or no experience in statistics often lack competency in analyzing quantitative evidence. They tend to think of charts and graphs in terms of “creative, concept, and style” rather than aiming to capture the essence of the data. This leads them to focus on “beautifying data” in ways that compromise “statistical integrity”.
The Doctrine that Statistical Data is Boring: Designers of inept graphics often treat statistics as “boring” and “tedious”. Because of this misconception, they unnecessarily inflate the evidence in their datasets with decorative styles. Tufte notes in his book that the doctrine of boring data also serves political ends, promoting certain interests over others (page 80).
The Doctrine that Graphics are Only for the Unsophisticated Readers: Illustrators who believe in this doctrine think that readers are not sophisticated enough to understand the complexity of words in the text. This leads them to unnecessarily beautify and animate their graphs to entertain their readers.
These erroneous presumptions lead to two problems: “graphical distortion” and “over-decoration”. Graphical distortions correspond to “chart lies”, and over-decoration is captured by the term “chartjunk”. Let us discuss what Tufte means by these terms.
Chart Lies: Chart lies can be quantified using the lie factor. The formula for the lie factor is given by
Lie factor = (size of effect shown in graphic) / (size of effect in data)
In simple words, the lie factor is the ratio of the size of the effect shown in the graphic to the size of the effect in the numerical quantities themselves. If the ratio is 1, the graphic represents the data faithfully. A ratio greater than 1 means the graphic exaggerates the effect in the data, and a ratio less than 1 means it understates it.
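The ratio described above can be sketched in a few lines of Python. This is an illustration under one assumption: the "size of effect" is measured as the relative change between two values, both in the data and in the drawn graphic. The function names and the sample numbers are hypothetical.

```python
def relative_change(first: float, second: float) -> float:
    """Size of effect, measured here as the relative change from first to second."""
    if first == 0:
        raise ValueError("first value must be non-zero to compute a relative change")
    return abs(second - first) / abs(first)


def lie_factor(graphic_first: float, graphic_second: float,
               data_first: float, data_second: float) -> float:
    """Ratio of the effect shown in the graphic to the effect in the data."""
    shown = relative_change(graphic_first, graphic_second)
    actual = relative_change(data_first, data_second)
    if actual == 0:
        raise ValueError("the data shows no change, so the ratio is undefined")
    return shown / actual


# Hypothetical chart: the data rises from 100 to 150 (a 50% increase),
# but the bars are drawn 2 cm and 6 cm tall (a 200% increase).
factor = lie_factor(graphic_first=2.0, graphic_second=6.0,
                    data_first=100.0, data_second=150.0)
print(f"Lie factor: {factor:.2f}")
```

In this made-up case the graphic overstates the change fourfold. A chart drawn to scale (for example, bars 2 cm and 3 cm tall) would give a factor of 1.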
Presenting a graph as evidence for something other than what it actually depicts also falls under the category of chart lies. Recently, the graph of the ratio of confirmed deaths to confirmed cases, also known as the case fatality rate, has been put forward to support the argument that coronavirus cases in the United States are under control. Contrary to that claim, a declining case fatality rate indicates only that progress has been made in treating COVID-19 patients. The graph of the ratio of confirmed deaths to the total population, known as the mortality rate, is a better way of communicating the impact of the virus on a community.
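The difference between the two ratios is easy to see with a small calculation. The sketch below uses invented numbers for a hypothetical community of one million people; the function names are illustrative and not from any library.

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Confirmed deaths divided by confirmed cases."""
    return deaths / cases


def mortality_rate(deaths: int, population: int, per: int = 100_000) -> float:
    """Confirmed deaths per `per` people in the total population."""
    return deaths * per / population


# Invented figures for two periods in the same hypothetical community.
population = 1_000_000
early = {"cases": 1_000, "deaths": 50}
late = {"cases": 10_000, "deaths": 200}

cfr_early = case_fatality_rate(early["deaths"], early["cases"])
cfr_late = case_fatality_rate(late["deaths"], late["cases"])
mort_early = mortality_rate(early["deaths"], population)
mort_late = mortality_rate(late["deaths"], population)

print(f"Case fatality rate: {cfr_early:.1%} -> {cfr_late:.1%}")
print(f"Deaths per 100,000: {mort_early:.1f} -> {mort_late:.1f}")
```

In this invented scenario the case fatality rate falls while deaths per 100,000 residents rise, which is why a declining case fatality rate on its own says little about whether an outbreak is under control.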
Chartjunk: We may recall viewing a figure in a newspaper or in scientific literature that is heavily decorated with busy grid lines, cluttered information, or dark background colors. Extra decorations like these, which do not provide any new information to the reader, are called chartjunk. Some graphic designers create overly complex visualizations by adding chartjunk to demonstrate their artistic flair. Graphics, ideally, should be easy to understand. Very few readers want to go through puzzle-solving maneuvers simply to make sense of a visualization.
The most common form of chartjunk is the moiré effect, where the artist plays with different graphical patterns in a way that makes the chart appear to vibrate. Moiré vibrations distract from the design, and the reader often has to glance back and forth between the legend and the graph, which creates a “physiological tremor to the eye”.
What methods and techniques can we apply to eliminate chartjunk from our graphics and improve our design? To answer this question, we first need to understand the concept of the data-ink ratio. Tufte describes the data-ink ratio as the ratio between the non-erasable core ink of a graphic and the total ink used in the graphic. Mathematically, it can be expressed as:
Data-ink ratio = (data-ink) / (total ink used to print the graphic)
Data ink is the non-redundant information in a chart; if it is removed, the chart loses its main content. A good practitioner strives for a high data-ink ratio by deleting non-data ink from the display. It is also important to remember that graphics might not be the best option for representing small datasets. In those circumstances, a simple table with background information about the data may suffice.
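As a rough illustration of the ratio, the sketch below assigns made-up "ink" amounts to the elements of a hypothetical bar chart and flags which ones carry data. Real ink cannot be measured this simply, and the element names, amounts, and data/non-data labels are all assumptions chosen for the example.

```python
# Each entry: (element name, ink in arbitrary units, whether it carries data).
# All values are invented for illustration.
chart_elements = [
    ("bars", 30.0, True),
    ("value labels", 10.0, True),
    ("grid lines", 25.0, False),
    ("background shading", 30.0, False),
    ("border", 5.0, False),
]


def data_ink_ratio(elements):
    """Data-ink divided by the total ink used in the graphic."""
    total_ink = sum(ink for _, ink, _ in elements)
    data_ink = sum(ink for _, ink, is_data in elements if is_data)
    return data_ink / total_ink


def without(elements, names_to_remove):
    """Return the elements with the named non-data decorations erased."""
    return [e for e in elements if e[0] not in names_to_remove]


before = data_ink_ratio(chart_elements)
cleaned = without(chart_elements, {"grid lines", "background shading"})
after = data_ink_ratio(cleaned)

print(f"Data-ink ratio before: {before:.2f}")
print(f"Data-ink ratio after:  {after:.2f}")
```

Erasing the grid lines and background shading leaves the data-ink untouched while shrinking the total, so the ratio rises. That is the effect the practitioner is aiming for when deleting non-data ink.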
Edward Tufte provided unique insights about data visualization that remain relevant today. The principles I discussed in this article only scratch the surface of the deep insights on visualization that Dr. Tufte’s book provides.
Tufte, however, is not without his critics. Some of the principles that he espoused, especially the data-ink ratio and chart lies, are difficult to quantify. Another principle that has received some criticism is the concept of chartjunk. A study by Bateman et al. (2010) concluded that embellished charts may actually be easier to remember than the kind of simple graphs that Edward Tufte advocated in his work. Their study also reported that people’s accuracy in describing the embellished charts was the same as their accuracy in describing simple charts. This suggests that chartjunk might not be as detrimental as Edward Tufte perceived.
Despite these challenges, I still recommend that all aspiring data analysts/scientists add this book to their library. At the end of the day, even if some of Tufte’s principles may be controversial, I believe that they still have intrinsic value that cannot be overlooked.
I would like to thank Jason Forrest for helping me improve this article. For more posts like this, you can follow me at my Twitter account.
Translated from: https://medium.com/nightingale/improve-your-visualization-skills-using-tuftes-principles-of-graphical-design-3a0f40a53a2c