PRACTICAL BUSINESS ANALYSIS
How to Display Data Badly
Author(s): Howard Wainer
Source: The American Statistician , May, 1984, Vol. 38, No. 2 (May, 1984), pp. 137-147
Published by: Taylor & Francis, Ltd. on behalf of the American Statistical Association
Stable URL: https://www.jstor.org/stable/2683253
Taylor & Francis, Ltd. and American Statistical Association are collaborating with JSTOR to digitize, preserve and extend access to The American Statistician
This content downloaded from 128.193.164.203 on Fri, 08 Jan 2021 03:00:58 UTC
All use subject to https://about.jstor.org/terms
Commentaries are informative essays dealing with viewpoints of statistical practice, statistical education, and other topics considered to be of general interest to the broad readership of The American Statistician. Commentaries are similar in spirit to Letters to the Editor, but they involve longer discussions of background, issues, and perspectives. All commentaries will be refereed for their merit and compatibility with these criteria.
HOWARD WAINER*
Methods for displaying data badly have been developing for many years, and a wide variety of interesting and inventive schemes have emerged. Presented here is a synthesis yielding the 12 most powerful techniques that seem to underlie many of the realizations found in practice. These 12 (the dirty dozen) are identified and illustrated.
KEY WORDS: Graphics; Data display; Data density; Data-ink ratio.
1. INTRODUCTION
The display of data is a topic of substantial contemporary interest and one that has occupied the thoughts of many scholars for almost 200 years. During this time there have been a number of attempts to codify standards of good practice (e.g., ASME Standards 1915; Cox 1978; Ehrenberg 1977) as well as a number of books that have illustrated them (e.g., Bertin 1973, 1977, 1981; Schmid 1954; Schmid and Schmid 1979; Tufte 1983). The last decade or so has seen a tremendous increase in the development of new display techniques and tools that have been reviewed recently (Macdonald-Ross 1977; Fienberg 1979; Cox 1978; Wainer and Thissen 1981). We wish to concentrate on methods of data display that leave the viewers as uninformed as they were before seeing the display or, worse, those that induce confusion. Although such techniques are broadly practiced, to my knowledge they have not as yet been gathered into a single source or carefully categorized. This article is the beginning of such a compendium.
The aim of good data graphics is to display data accurately and clearly. Let us use this definition as a starting point for categorizing methods of bad data display. The definition has three parts. These are (a) showing data, (b) showing data accurately, and (c) showing data clearly. Thus, if we wish to display data badly, we have three avenues to follow. Let us examine them in sequence, parse them into some of their component parts, and see if we can identify means for measuring the success of each strategy.
2. SHOWING DATA
Obviously, if the aim of a good display is to convey information, the less information carried in the display,
[Figure 1 image: "Change in Science Achievement of 9-, 13-, and 17-Year-Olds, by Type of Exercise: 1969-1977"; one panel each for 9-, 13-, and 17-year-olds; vertical axis: change in percent correct; series: biological science and physical science; horizontal axis: 1969, 1970, 1973, 1977.]
Figure 1. An example of a low density graph (from SI3 [ddi = .3]).
*Howard Wainer is Senior Research Scientist, Educational Testing Service, Princeton, NJ 08541. This is the text of an invited address to the American Statistical Association. It was supported in part by the Program Statistics Research Project of the Educational Testing Service. The author would like to express his gratitude to the numerous friends and colleagues who read or heard this article and offered valuable suggestions for its improvement. Especially helpful were David Andrews, Paul Holland, Bruce Kaplan, James O. Ramsay, Edward Tufte, the participants in the Stanford Workshop on Advanced Graphical Presentation, two anonymous referees, the long-suffering associate editor, and Gary Koch.
[Figure 2 image: a nearly empty plot; horizontal axis: location difference, from 0.0 to 1.0; vertical axis from 0.0 to 0.8.]
Figure 2. A low density graph (from Friedman and Rafsky 1981 [ddi = .5]).
the worse it is. Tufte (1983) has devised a scheme for measuring the amount of information in displays, called the data density index (ddi), which is "the number of numbers plotted per square inch." This easily calculated index is often surprisingly informative. In popular and technical media we have found a range from .1 to 362. This provides us with the first rule of bad data display.
Rule 1-Show as Few Data as Possible (Minimize the Data Density)
What does a data graphic with a ddi of .3 look like? Shown in Figure 1 is a graphic from the book Social Indicators III (SI3), originally done in four colors (original size 7" by 9"), that contains 18 numbers (18/63 = .3). The median data graph in SI3 has a data density of .6 numbers/in²; this one is not an unusual choice. Shown in Figure 2 is a plot from the article by Friedman and Rafsky (1981) with a ddi of .5 (it shows 4 numbers in 8 in²). This is unusual for JASA, where the median data graph has a ddi of 27. In defense of the producers of this plot, the point of the graph is to show that a method of analysis suggested by a critic of their paper was not fruitful. I suspect that prose would have worked pretty well also.

[Figure 3 image: a decorated chart of labor productivity, U.S. vs. Japan (output per man-hour in manufacturing); labeled values: 100%, 70%, 62.3%, 44%.]
Figure 3. A low density graph (© 1978, The Washington Post) with chart-junk to fill in the space (ddi = .2).

[Figure 4 image: "Public and Private Elementary Schools, Selected Years 1929-1970"; vertical axis: thousands of schools, with a tick at 300; horizontal axis: school year, 1929-30 to 1969-70; series: public and private.]
Figure 4. Hiding the data in the scale (from SI3).
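The ddi arithmetic quoted above can be checked with a few lines of Python. This is only a sketch: the function name `data_density_index` is hypothetical, and the 2" by 4" shape used for Figure 2 is an assumption (the text gives only its area of 8 square inches).

```python
def data_density_index(numbers_plotted: int, width_in: float, height_in: float) -> float:
    """Return the number of numbers plotted per square inch of graphic."""
    area = width_in * height_in
    if area <= 0:
        raise ValueError("graphic area must be positive")
    return numbers_plotted / area


# Figure 1: 18 numbers on a 7" by 9" graphic (63 square inches).
fig1_ddi = data_density_index(18, 7, 9)

# Figure 2: 4 numbers in 8 square inches; the 2 x 4 split is assumed,
# only the product matters for the index.
fig2_ddi = data_density_index(4, 2, 4)

print(round(fig1_ddi, 1), round(fig2_ddi, 1))
```

Both results agree with the values reported in the figure captions after rounding to one decimal place.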
Although arguments can be made that high data density does not imply that a graphic will be good, nor one with low density bad, it does reflect on the efficiency of the transmission of information. Obviously, if we hold clarity and accuracy constant, more information is better than less. One of the great assets of graphical techniques is that they can convey large amounts of information in a small space.

[Figure 5 image: "The Number of Private Elementary Schools From 1930-1970"; vertical axis ticks from 9 to 15; horizontal axis: 1930 to 1970; plotted values listed on the chart: 1930, 9.275; 1940, 10.000; 1950, 10.375; 1960, 13.574; 1970, 14.372.]
Figure 5. Expanding the scale and showing the data in Figure 4 (from SI3).

[Figure 6 image: "A New Set of Projections for the U.S. Supply of Energy"; bar chart comparing two projections of United States energy supply in the year 2000 with the actual 1977 supply, all figures in quads (quadrillion British thermal units); bars for 1977, 2000-A, and 2000-B.]
Figure 6. Ignoring the visual metaphor (© 1978, The New York Times).
We note that when a graph contains little or no information the plot can look quite empty (Figure 2) and thus raise suspicions in the viewer that there is nothing to be communicated. A way to avoid these suspicions is to fill up the plot with nondata figurations, what Tufte has termed "chartjunk." Figure 3 shows a plot of the labor productivity of Japan relative to that of the United States. It contains one number for each of three years. Obviously, a graph of such sparse information would have a lot of blank space, so filling the space hides the paucity of information from the reader.

A convenient measure of the extent to which this practice is in use is Tufte's "data-ink ratio." This measure is the ratio of the amount of ink used in graphing the data to the total amount of ink in the graph. The closer to zero this ratio gets, the worse the graph. The notion of the data-ink ratio brings us to the second principle of bad data display.
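The definition of the data-ink ratio translates directly into a one-line computation. The sketch below is illustrative only: `data_ink_ratio` is a hypothetical helper, and the ink amounts are made-up units (for example, counts of dark pixels), not measurements of any figure in this article.

```python
def data_ink_ratio(data_ink: float, total_ink: float) -> float:
    """Ratio of ink used to graph the data to all ink in the graph."""
    if total_ink <= 0:
        raise ValueError("total ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data ink must lie between 0 and total ink")
    return data_ink / total_ink


# Hypothetical ink amounts in arbitrary units, chosen only for illustration.
chartjunk_heavy = data_ink_ratio(data_ink=50, total_ink=5000)
plain_line_chart = data_ink_ratio(data_ink=800, total_ink=1000)

print(chartjunk_heavy, plain_line_chart)
```

A ratio near zero corresponds to a display dominated by nondata ink; a ratio near one corresponds to a display in which almost all ink carries data.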
Rule 2-Hide What Data You Do Show (Minimize the Data-Ink Ratio)
One can hide data in a variety of ways. One method that occurs with some regularity is hiding the data in the grid. The grid is useful for plotting the points, but only rarely afterwards. Thus to display data badly, use a fine grid and plot the points dimly (see Tufte 1983, pp. 94-95 for one repeated version of this).
A second way to hide the data is in the scale. This
corresponds to blowing up the scale (i.e., looking at the data from far away) so that any variation in the data is obscured by the magnitude of the scale. One can justify this practice by appealing to "honesty requires that we start the scale at zero," or other sorts of sophistry.
Figure 4 is a plot (from SI3) that effectively hides the growth of private schools in the scale. A redrawing of the number of private schools on a different scale conveys the growth that took place during the mid-1950's (Figure 5). The relationship between this rise and Brown vs. Topeka School Board becomes an immediate question.
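How thoroughly a scale can hide data is easy to quantify: compare the range of the data with the range of the axis. The sketch below uses the five values listed on Figure 5. The axis limits are assumptions read from the figures as they survive in this copy (0 to 300 for Figure 4, 9 to 15 for Figure 5), and `fraction_of_axis_used` is a hypothetical helper.

```python
# Private elementary schools, as listed on Figure 5 (same units as its axis).
private_schools = {1930: 9.275, 1940: 10.000, 1950: 10.375, 1960: 13.574, 1970: 14.372}


def fraction_of_axis_used(values, axis_min: float, axis_max: float) -> float:
    """Share of the axis length spanned by the data's own range."""
    values = list(values)
    span = axis_max - axis_min
    if span <= 0:
        raise ValueError("axis_max must exceed axis_min")
    return (max(values) - min(values)) / span


# Assumed axis limits: 0-300 for the Figure 4 scale, 9-15 for the Figure 5 scale.
wide_scale = fraction_of_axis_used(private_schools.values(), 0, 300)
tight_scale = fraction_of_axis_used(private_schools.values(), 9, 15)

print(f"wide scale: {wide_scale:.1%} of the axis; tight scale: {tight_scale:.1%}")
```

Under these assumed limits the same growth occupies under 2% of the wide axis and about 85% of the tight one, which is why it vanishes in one plot and is plain in the other.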
To conclude this section, we have seen that we can display data badly either by not including them (Rule 1) or by hiding them (Rule 2). We can measure the extent to which we are successful in excluding the data through the data density; we can sometimes convince viewers that we have included the data through the incorporation of chartjunk. Hiding the data can be done either by using an overabundance of chartjunk or by cleverly choosing the scale so that the data disappear. A measure of the success we have achieved in hiding the data is through the data-ink ratio.

[Figure 7 image: two panels, each in millions of U.S. dollars. Left panel: U.S. exports to China and U.S. imports from China, 1972 to 1980, vertical scale to 3,000. Right panel: U.S. imports from Taiwan and U.S. exports to Taiwan, 1970 to 1980, vertical scale to 6,000. Source: Department of Commerce.]
Figure 7. Reversing the metaphor in mid-graph while changing scales on both axes (© June 14, 1981, The New York Times).
3. SHOWING DATA ACCURATELY
The essence of a graphic display is that a set of numbers having both magnitudes and an order are represented by an appropriate visual metaphor: the magnitude and order of the metaphorical representation match the numbers. We can display data badly by ignoring or distorting this concept.
Rule 3-Ignore the Visual Metaphor Altogether
If the data are ordered and if the visual metaphor has a natural order, a bad display will surely emerge if you shuffle the relationship. In Figure 6 note that the bar labeled 14.1 is longer than the bar labeled 18. Another method is to change the meaning of the metaphor in the middle of the plot. In Figure 7 the dark shading represents imports on one side and exports on the other. This is but one of the problems of this graph; more serious still is the change of scale. There is also a difference in the time scale, but that is minor. A common theme in Playfair's (1786) work was the difference between imports and exports. In Figure 8, a 200-year-old graph tells the story clearly. Two such plots would have illustrated the story surrounding this graph quite clearly.
Rule 4-Only Order Matters
One frequent trick is to use length as the visual metaphor when area is what is perceived. This was used quite effectively by The Washington Post in Figure 9. Note that this graph also has a low data density (.1), and its data-ink ratio is close to zero. We can also calculate Tufte's (1983) measure of perceptual distortion (PD) for this graph. The PD in this instance is the perceived change in the value of the dollar from Eisenhower to Carter divided by the actual change. I read and measure thus:

Actual: (1.00 - .44)/.44 = 1.27
Measured: (22.00 - 2.06)/2.06 = 9.68
PD = 9.68/1.27 = 7.62

This distortion of over 700% is substantial but by no means a record. A less distorted view of these data is provided in Figure 10. In addition, the spacing suggested by the presidential faces is made explicit on the time scale.

[Figure 8 image: a Playfair chart of exports and imports.]
Figure 8. A plot on the same topic done well two centuries earlier (from Playfair 1786).

[Figure 9 image: dollar bills of shrinking size, one per president; legible labels: 1958, Eisenhower: $1.00; 1963, Kennedy: 94¢; 1968, Johnson; 1978, Carter: 44¢ (August). Source: Labor Department.]
Figure 9. An example of how to goose up the effect by squaring the eyeball (© 1978, The Washington Post).

[Figure 10 image: line chart; horizontal axis: year, 1958 to 1978, with presidential terms marked; vertical axis from 0 to 1.0.]
Figure 10. The data in Figure 9 as an unadorned line chart (from Wainer 1980).
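The perceptual distortion calculation for Figure 9 can be reproduced as a short sketch. The function names are hypothetical; the inputs are the four readings quoted in the text (1.00 and .44 for the actual dollar values, 22.00 and 2.06 for the measured sizes).

```python
def relative_change(start: float, end: float) -> float:
    """Change from start to end, expressed relative to the end value,
    which is how the calculation in the text is set up."""
    return (start - end) / end


def perceptual_distortion(shown_start, shown_end, actual_start, actual_end) -> float:
    """Perceived (measured) relative change divided by the actual one."""
    return relative_change(shown_start, shown_end) / relative_change(actual_start, actual_end)


actual = relative_change(1.00, 0.44)
measured = relative_change(22.00, 2.06)
distortion = perceptual_distortion(22.00, 2.06, 1.00, 0.44)

print(round(actual, 2), round(measured, 2), round(distortion, 2))
```

Carried at full precision the ratio comes to about 7.61; the 7.62 in the text results from dividing the already rounded values 9.68 and 1.27. Either way the distortion exceeds 700%.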
Rule 5-Graph Data Out of Context
Often we can modify the perception of the graph (particularly for time series data) by choosing carefully the interval displayed. A precipitous drop can disappear if we choose a starting date just after the drop. Similarly, we can turn slight meanders into sharp changes by focusing on a single meander and expanding the scale. Often the choice of scale is arbitrary but can have profound effects on the perception of the display. Figure 11 shows a famous example in which President Reagan gives an out-of-context view of the effects of his tax cut. The Times' alternative provides the context for a deeper understanding. Simultaneously omitting the context as well as any quantitative scale is the key to the practice of Ordinal Graphics (see also Rule 4). Automatic rules do not always work, and wisdom is always required.
In Section 3 we discussed three rules for the accurate display of data. One can compromise accuracy by ignoring visual metaphors (Rule 3), by only paying attention to the order of the numbers and not their magnitude (Rule 4), or by showing data out of context (Rule 5). We advocated the use of Tufte's measure of perceptual distortion as a way of measuring the extent to which the accuracy of the data has been compromised by the display. One can think of modifications that would allow it to be applied in other situations, but we leave such expansion to other accounts.
4. SHOWING DATA CLEARLY
In this section we discuss methods for badly displaying data that do not seem as serious as those described previously; that is, the data are displayed, and they might even be accurate in their portrayal. Yet subtle (and not so subtle) techniques can be used to effectively obscure the most meaningful or interesting aspects of the data. It is more difficult to provide objective measures of presentational clarity, but we rely on the reader to judge from the examples presented.

[Figure 11 image: from The New York Times, Sunday, August 2, 1981. Two charts of taxes paid by a family with average income of $20,000: one labeled "Your Taxes" with lines marked "Their Bill" and "Our Bill" and no vertical scale; one with a dollar scale from $500 to $2,500 showing payments under the Ways and Means Committee plan and payments under the President's proposal, 1982 to 1986.]
Figure 11. The White House showing neither scale nor context (© 1981, The New York Times, reprinted with permission).
Rule 6-Change Scales in Mid-Axis
This is a powerful technique that can make large differences look small and make exponential changes look linear.
In Figure 12 is a graph that supports the associated
story about the skyrocketing circulation of The New
York Post compared to the plummeting Daily News
circulation. The reason given is that New Yorkers
"trust" the Post. It takes a careful look to note the
700,000 jump that the scale makes between the two lines.
In Figure 13 is a plot of physicians' incomes over
time. It appears to be linear, with a slight tapering off
in recent years. A careful look at the scale shows that it
starts out plotting every eight years and ends up plotting
yearly. A more regular scale (in Figure 14) tells quite a different story.
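A quick way to catch this kind of mid-axis scale change is to difference the tick labels. The sketch below uses the year labels as they appear along the horizontal axis of Figure 13 in this copy; the helper names are hypothetical.

```python
# Year labels along the horizontal axis of Figure 13, as they appear in this copy.
ticks = [1939, 1947, 1951, 1955, 1963, 1965, 1967, 1970, 1972, 1973, 1974, 1975, 1976]


def tick_gaps(ticks):
    """Differences between consecutive tick labels."""
    return [later - earlier for earlier, later in zip(ticks, ticks[1:])]


def is_uniform(ticks) -> bool:
    """True when equally spaced tick marks stand for equal intervals."""
    return len(set(tick_gaps(ticks))) <= 1


gaps = tick_gaps(ticks)
print(gaps)
print("uniform scale:", is_uniform(ticks))
```

The gaps shrink from eight years at the left of the axis to one year at the right, so equal horizontal distances do not represent equal spans of time.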
[Figure 12 image: "The soaraway Post, the daily paper New Yorkers trust"; circulation lines for the News and the Post; vertical scale labels include 1,900,000, 1,800,000, 1,700,000, and 1,500,000; labeled values include 1,829,000, 1,636,000, 1,555,000, and 1,491,000; horizontal axis ending at 1982.]
Figure 12. Changing scale in mid-axis to make large differences [small].
[Figure 13 image: "Incomes of Doctors Vs. Other Professionals (Median Net Incomes)"; source: Council on Wage and Price Stability; series: office-based nonsalaried physicians; labeled values rising from $3,262 through 8,744, 13,150, 16,107, 25,050, 34,740, 43,100, 46,780, and 50,823 to 62,799 (one label is partly illegible); horizontal axis labels: 1939, 1947, 1951, 1955, 1963, 1965, 1967, 1970, 1972, 1973, 1974, 1975, 1976.]
Figure 13. Changing scale in mid-axis to make exponential growth linear (© The Washington Post).
Rule 7-Emphasize the Trivial (Ignore the Important)
Sometimes the data that are to be displayed have one important aspect and others that are trivial. The graph can be made worse by emphasizing the trivial part. In Figure 15 we have a page from SI3 that compares the income levels of men and women by educational levels. It reveals the not surprising result that better educated individuals are paid better than more poorly educated ones and that changes across time expressed in constant dollars are reasonably constant. The comparison of greatest interest and current concern, comparing salaries between sexes within education level, must be made clumsily by vertically transposing from one graph to another. It seems clear that Rule 7 must have been operating here, for it would have been easy to place the graphs side by side and allow the comparison of interest to be made more directly. Looking at the problem from a strictly data-analytic point of view, we note that there are two large main effects (education and sex) and a small time effect. This would have implied a plot that showed the large effects clearly and placed the smallish time trend into the background (Figure 16).

[Figure 14 image: "Incomes of Doctors vs. Other Professionals"; horizontal axis: year, 1939 to 1974 in five-year steps; series: doctors and other professionals; annotation: Medicare started.]
Figure 14. Data from Figure 13 redone with linear scale (from Wainer 1980).

[Figure 15 image: "Median Income of Year-Round, Full-Time Workers 25 to 34 Years Old, by Sex and Educational Attainment: 1968-1977"; constant 1977 dollars; a panel for males placed above a panel for females; vertical axes from $0 to $20,000; one line per level of schooling; horizontal axis: 1968 to 1980.]
Figure 15. Emphasizing the trivial: Hiding the main effect of sex differences in income through the vertical placement of plots (from SI3).
[Figure 16 image: "Median Income of Year-Round Full Time Workers 25-34 Years Old by Sex and Educational Attainment: 1968-1977 (in Constant 1977 Dollars)"; vertical axis from 4 to 20; males and females shown together; legend: maximum, median (over time).]
[Figure 17 image: "U.S. Imports of Red Meats", in billions of pounds (carcass weight equivalent); stacked, shaded areas including beef and veal and lamb, mutton, and goat meat; vertical axis from 0 to 2.5; horizontal axis: 1960 to 1978.]
Figure 17. Jiggling the baseline makes comparisons more difficult (from Handbook of Agricultural Charts).
Rule 8-Jiggle the Baseline
Making comparisons is always aided when the quantities being compared start from a common base. Thus we can always make the graph worse by starting from different bases. Such schemes as the hanging or suspended rootogram and the residual plot are meant to facilitate comparisons. In Figure 17 is a plot of U.S. imports of red meat taken from the Handbook of Agricultural Charts published by the U.S. Department of Agriculture. Shading beneath each line is a convention that indicates summation, telling us that the amount of each kind of meat is added to the amounts below it. Because of the dominance of and the fluctuations in importation of beef and veal, it is hard to see what the changes are in the other kinds of meat. Is the importation of pork increasing? Decreasing? Staying constant? The only purpose for stacking is to indicate graphically the total summation. This is easily done through the addition of another line for TOTAL. Note that a TOTAL will always be clear and will never intersect the other lines on the plot. A version of these data is shown in Figure 18 with the separate amounts of each meat, as well as a summation line, shown clearly. Note how easily one can see the structure of import of each kind of meat now that the standard of comparison is a straight line (the time axis) and no longer the import amount of those meats with greater volume.

[Figure 18 image: "U.S. Imports of Red Meats", in billions of pounds; separate lines for each kind of meat, including pork, plus a total; horizontal axis: 1960 to 1978. Source: Handbook of Agricultural Charts, U.S. Department of Agriculture, 1976, p. 93. Chart source: original.]
Figure 18. An alternative version of Figure 17 with a straight line used as the basis of comparison.

[Figure 19 image: "Life Expectancy at Birth, by Sex, Selected Countries, Most Recent Available Year"; paired bars for male and female; rows in alphabetical order: Austria, Canada, Finland, France, Germany (Fed. Rep.), Japan, U.S.S.R., Sweden, United Kingdom, United States; horizontal axis: years of life expectancy, marked 0 and 50 to 90.]
Figure 19. Austria First! Obscuring the data structure by alphabetizing the plot (from SI3).
Rule 9-Austria First!
Ordering graphs and tables alphabetically can obscure structure in the data that would have been obvious had the display been ordered by some aspect of the data. One can defend oneself against criticisms by pointing out that alphabetizing "aids in finding entries of interest." Of course, with lists of modest length such aids are unnecessary; with longer lists the indexing schemes common in 19th century statistical atlases provide easy lookup capability.

Figure 19 is another graph from SI3 showing life expectancies, divided by sex, in 10 industrialized nations. The order of presentation is alphabetical (with the USSR positioned as Russia). The message we get is that there is little variation and that women live longer than men. Redone as a stem-and-leaf diagram (Figure 20 is simply a reordering of the data with spacing proportional to the numerical differences), the magnitude of the sex difference leaps out at us. We also note that the USSR is an outlier for men.
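The effect of reordering can be tried on the Figure 20 values themselves. The sketch below is a minimal illustration under stated assumptions: the numbers are my reading of the stem-and-leaf display as it survives in this copy, and the USSR is left out because the row for USSR men is not clearly legible here.

```python
# Life expectancy at birth in years, read from the Figure 20 display.
# Assumption: values as legible in this copy; USSR omitted (male value unclear).
women = {"Austria": 75, "Canada": 76, "Finland": 75, "France": 76, "Germany": 74,
         "Japan": 76, "Sweden": 78, "UK": 75, "US": 76}
men = {"Austria": 68, "Canada": 69, "Finland": 67, "France": 69, "Germany": 68,
       "Japan": 71, "Sweden": 72, "UK": 69, "US": 69}

# One row per country: (name, women, men, difference between the sexes).
rows = [(c, women[c], men[c], women[c] - men[c]) for c in women]

alphabetical = sorted(rows)                                   # "Austria first"
by_male_value = sorted(rows, key=lambda r: (-r[2], r[0]))     # ordered by the data

for country, w, m, gap in by_male_value:
    print(f"{country:<8} women {w}  men {m}  difference {gap}")
```

Sorted by the data, the listing puts Sweden and Japan at the top for men and shows at a glance that the difference between the sexes runs from five to eight years in these nine countries, structure that the alphabetical order scatters.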
Rule 10-Label (a) Illegibly, (b) Incompletely, (c) Incorrectly, and (d) Ambiguously
There are many instances of labels that either do not tell the whole story, tell the wrong story, tell two or more stories, or are so small that one cannot figure out what story they are telling. One of my favorite examples of small labels is from The New York Times (August 1978), in which the article complains that fare cuts lower commission payments to travel agents. The graph (Figure 21) supports this view until one notices the tiny label indicating that the small bar showing the decline is for just the first half of 1978. This omits such heavy travel periods as Labor Day, Thanksgiving, Christmas, and so on, so that merely doubling the first-half data is probably not enough. Nevertheless, when this bar is doubled (Figure 22), we see that the agents are doing very well indeed compared to earlier years.

LIFE EXPECTANCY AT BIRTH, BY SEX, MOST RECENT AVAILABLE YEAR

WOMEN                        YEARS   MEN
SWEDEN                        78
                              77
FRANCE, US, JAPAN, CANADA     76
FINLAND, AUSTRIA, UK          75
USSR, GERMANY                 74
                              73
                              72     SWEDEN
                              71     JAPAN
                              70
                              69     CANADA, UK, US, FRANCE
                              68     GERMANY, AUSTRIA
                              67     FINLAND
                              66
                              65
                              64
                              63     USSR [or 62; the row is unclear in this copy]
                              62

Figure 20. Ordering and spacing the data from Figure 19 as a stem-and-leaf diagram provides insights previously difficult to extract (from SI3).

[Figure 21 image: bar chart of commission payments to travel agents by airline, including Eastern and United, with accompanying text on discount fares and travel agents' overhead offsetting revenue gains from higher volume.]
Figure 21. Mixing a changed metaphor with a tiny label reverses the meaning of the data (© 1978, The New York Times).

[Figure 22 image: "Commission Payments to Travel Agents"; vertical axis: millions of dollars, 0 to 150; horizontal axis: year, 1976, 1977, 1978 (estimated); lines for United, TWA, Eastern, and Delta.]
Figure 22. Figure 21 redrawn with 1978 data placed on a comparable basis (from Wainer 1980).
Rule 11-More Is Murkier: (a) More Decimal Places and (b) More Dimensions
We often see tables in which the number of decimal places presented is far beyond the number that can be perceived by a reader. They are also commonly presented to show more accuracy than is justified. A display can be made clearer by presenting less. In Table 1 is a section of a table from Dhariyal and Dudewicz's (1981) JASA paper. The table entries are presented to five decimal places! In Table 2 is a heavily rounded version that shows what the authors intended clearly. It also shows that the various columns might have a substantial redundancy in them (the maximum expected gain with b/c = 10 is about 1/10th that of b/c = 100 and 1/100th that of b/c = 1,000). If they do, the entire table could have been reduced substantially.
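Both the rounding and the suspected redundancy can be checked mechanically. The sketch below copies the gain columns of Table 1, rounds them with Python's built-in `round` to the precision used in Table 2, and computes the ratios between adjacent columns. The variable names and the choice of decimals per column are my own, made to match Table 2.

```python
# Gain columns of Table 1, (GN(r*) - a)/c, for N = 3, ..., 10, keyed by b/c.
table1 = {
    10: [0.20000, 0.26333, 0.32333, 0.38267, 0.44600, 0.50743, 0.56743, 0.62948],
    100: [2.22500, 2.88833, 3.54167, 4.23767, 4.90100, 5.57650, 6.26025, 6.92358],
    1000: [22.47499, 29.13832, 35.79166, 42.78764, 49.45097, 56.33005, 63.20129, 69.86462],
}

# Decimal places kept per column, chosen to match Table 2.
decimals = {10: 1, 100: 1, 1000: 0}
rounded = {bc: [round(g, decimals[bc]) for g in col] for bc, col in table1.items()}

# Redundancy check: ratio of each column to the one with b/c ten times smaller.
ratio_100_to_10 = [b / a for a, b in zip(table1[10], table1[100])]
ratio_1000_to_100 = [b / a for a, b in zip(table1[100], table1[1000])]

for bc, col in rounded.items():
    print(bc, col)
print([round(r, 1) for r in ratio_100_to_10])
print([round(r, 1) for r in ratio_1000_to_100])
```

The rounded columns reproduce Table 2, and the ratios stay close to 10 (roughly 11 between the first two columns and 10.1 between the last two), which supports the remark that the columns are largely redundant.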
Table 1. Optimal Selection From a Finite Sequence With Sampling Cost

        b/c = 10.0             b/c = 100.0            b/c = 1,000.0
  N   r*   (GN(r*) - a)/c    r*   (GN(r*) - a)/c    r*   (GN(r*) - a)/c
  3    2       .20000         2      2.22500         2      22.47499
  4    2       .26333         2      2.88833         2      29.13832
  5    2       .32333         3      3.54167         3      35.79166
  6    3       .38267         3      4.23767         3      42.78764
  7    3       .44600         3      4.90100         3      49.45097
  8    3       .50743         4      5.57650         4      56.33005
  9    3       .56743         4      6.26025         4      63.20129
 10    4       .62948         4      6.92358         4      69.86462

NOTE: g(Xs + r - 1) = bR(Xs + r - 1) + a, if S = s, and g(Xs + r - 1) = 0, otherwise. Source: Dhariyal and Dudewicz (1981).

Just as increasing the number of decimal places can make a table harder to understand, so can increasing the number of dimensions make a graph more confusing. We have already seen how extra dimensions can cause ambiguity (Is it length or area or volume?). In addition, human perception of areas is inconsistent.

Just what is confusing and what is not is sometimes only a conjecture, yet a hint that a particular configuration will be confusing is obtained if the display confused the grapher. Shown in Figure 23 is a plot of per share earnings and dividends over a six-year period. We note (with some amusement) that 1975 is the side of a bar; the third dimension of this bar (rectangular parallelopiped?) chart has confused the artist! I suspect that 1975 is really what is labeled 1976, and the unlabeled bar at the end is probably 1977. A simple line chart with this interpretation is shown in Figure 24.
Table 2. Optimal Selection From a Finite Sequence With Sampling Cost (revised)

        b/c = 10     b/c = 100    b/c = 1,000
  N    r*    G      r*    G      r*    G
  3     2    .2      2   2.2      2    22
  4     2    .3      2   2.9      2    29
  5     2    .3      3   3.5      3    36
  6     3    .4      3   4.2      3    43
  7     3    .4      3   4.9      3    49
  8     3    .5      4   5.6      4    56
  9     3    .6      4   6.3      4    63
 10     4    .6      4   6.9      4    70

NOTE: g(Xs + r - 1) = bR(Xs + r - 1) + a, if S = s, and g(Xs + r - 1) = 0, otherwise.

In Section 4 we illustrated six more rules for displaying data badly. These rules fall broadly under the heading of how to obscure the data. The techniques mentioned were to change the scale in mid-axis, emphasize the trivial, jiggle the baseline, order the chart by a characteristic unrelated to the data, label poorly, and include more dimensions or decimal places than are justified or needed. These methods will work separately or in combination with others to produce graphs and tables of little use. Their common effect will usually be to leave the reader uninformed about the points of interest in the data, although sometimes they will misinform us; the physicians' income plot in Figure 13 is a prime example of misinformation.

Finally, the availability of color usually means that there are additional parameters that can be misused. The U.S. Census' two-variable color map is a wonderful example of how using color in a graph can seduce us into thinking that we are communicating more than we are (see Fienberg 1979; Wainer and Francolini 1980; Wainer 1981). This leads us to the last rule.
Rule 12-If It Has Been Done Well in the Past, Think of Another Way to Do It
The two-variable color map was done rather well by Mayr (1874), 100 years before the U.S. Census version.
He used bars of varying width and frequency to accom- plish gracefully what the U.S. Census used varying saturations to do clumsily.
Figure 23. An extra dimension confuses even the grapher (© 1979, The Washington Post).
[Chart title: "Earnings Per Share and Dividends (Dollars)"; series: Earnings, Dividends; years 1972 to 1977.]

Figure 24. Data from Figure 23 redrawn simply (from Wainer 1980).
[Line plot of Earnings and Dividends; vertical axis ticks from 1.00 to 2.00; horizontal axis YEAR, 1972 to 1976.]

A particularly enlightening experience is to look carefully through the six books of graphs that William Playfair published during the period 1786-1822. One discovers clear, accurate, and data-laden graphs containing many ideas that are useful and too rarely applied today. In the course of preparing this article, I spent many hours looking at a variety of attempts to display data. Some of the horrors that I have presented were the fruits of that search. In addition, jewels sometimes emerged. I saved the best for last, and will conclude with one of those jewels, my nominee for the title of "World's Champion Graph." It was produced by Minard in 1861 and portrays the devastating losses suffered by the French army during the course of Napoleon's ill-fated Russian campaign of 1812. This graph (originally in color) appears in Figure 25 and is reproduced from Tufte's book (1983, p. 40). His narrative follows.
Beginning at the left on the Polish-Russian border near the
Nieman River, the thick band shows the size of the army (422,000
men) as it invaded Russia in June 1812. The width of the band
indicates the size of the army at each place on the map. In September,
the army reached Moscow, which was by then sacked and
deserted, with 100,000 men. The path of Napoleon's retreat from
Moscow is depicted by the darker, lower band, which is linked to
a temperature scale and dates at the bottom of the chart. It was a
bitterly cold winter, and many froze on the march out of Russia.
As the graphic shows, the crossing of the Berezina River was a
disaster, and the army finally struggled back to Poland with only
10,000 men remaining. Also shown are the movements of auxiliary
troops, as they sought to protect the rear and flank of the advancing
army. Minard's graphic tells a rich, coherent story with its
multivariate data, far more enlightening than just a single number
bouncing along over time. Six variables are plotted: the size of the
army, its location on a two-dimensional surface, direction of the
army's movement, and temperature on various dates during the
retreat from Moscow.
It may well be the best statistical graphic ever drawn.
Figure 25. Minard's (1861) graph of the French Army's ill-fated foray into Russia. A candidate for the title of "World's Champion Graph" (see Tufte 1983, p. 176, for a superb reproduction of this in its original color).
[Chart title: "Carte figurative des pertes successives en hommes de l'Armée Française dans la campagne de Russie 1812-1813. Dressée par M. Minard, Inspecteur Général des Ponts et Chaussées en retraite."]

5. SUMMING UP
Although the tone of this presentation tended to be light and pointed in the wrong direction, the aim is serious. There are many paths that one can follow that will cause deteriorating quality of our data displays; the 12 rules that we described were only the beginning. Nevertheless, they point clearly toward an outlook that provides many hints for good display. The measures of display described are interlocking. The data density cannot be high if the graph is cluttered with chartjunk; the data-ink ratio grows with the amount of data displayed; perceptual distortion manifests itself most frequently when additional dimensions or worthless metaphors are included. Thus, the rules for good display are quite simple. Examine the data carefully enough to know what they have to say, and then let them say it with a minimum of adornment. Do this while following reasonable regularity practices in the depiction of scale, and label clearly and fully. Last, and perhaps most important, spend some time looking at the work of the masters of the craft. An hour spent with Playfair or Minard will not only benefit your graphical expertise but will also be enjoyable. Tukey (1977) offers 236 graphs and little chartjunk. The work of Francis Walker (1894) concerning statistical maps is clear and concise, and it is truly a mystery that their current counterparts do not make better use of the schema developed a century and more ago.
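The interlocking of these measures can be seen with a little arithmetic. The sketch below assumes the definitions used by Tufte (1983): data density as the number of numbers plotted per unit of display area, and the data-ink ratio as the share of a graphic's ink that presents data. The function names and every input figure are hypothetical, chosen only to show the direction of the effect.

```python
def data_density(numbers_plotted: int, width_in: float, height_in: float) -> float:
    """Numbers plotted per square inch of display area."""
    area = width_in * height_in
    if area <= 0:
        raise ValueError("display area must be positive")
    return numbers_plotted / area


def data_ink_ratio(data_ink: float, non_data_ink: float) -> float:
    """Fraction of all ink in the graphic that is used to present data."""
    total_ink = data_ink + non_data_ink
    if total_ink <= 0:
        raise ValueError("total ink must be positive")
    return data_ink / total_ink


# Hypothetical graph: 12 numbers (two series over six years) in a 4 x 3 inch plot.
sparse = data_density(12, 4.0, 3.0)
# Same plot area carrying 120 numbers.
dense = data_density(120, 4.0, 3.0)

# Hold the frame, axes, and labels fixed at 50 units of ink (an arbitrary unit)
# and add more data ink: the ratio rises toward 1.
ratios = [data_ink_ratio(d, 50.0) for d in (10.0, 50.0, 200.0)]

print(sparse, dense)
print(ratios)
```

With the non-data ink held fixed, each increase in data ink raises the ratio, which is the sense in which the data-ink ratio grows with the amount of data displayed.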
[Received September 1982. Revised September 1983.]
REFERENCES
BERTIN, J. (1973), Semiologie Graphique (2nd ed.), The Hague: Mouton-Gautier.
BERTIN, J. (1977), La Graphique et le Traitement Graphique de l'Information, France: Flammarion.
BERTIN, J. (1981), Graphics and the Graphical Analysis of Data, translation, W. Berg, tech. ed., H. Wainer, Berlin: DeGruyter.
COX, D.R. (1978), "Some Remarks on the Role in Statistics of Graphical Methods," Applied Statistics, 27, 4-9.
DHARIYAL, I.D., and DUDEWICZ, E.J. (1981), "Optimal Selection From a Finite Sequence With Sampling Cost," Journal of the American Statistical Association, 76, 952-959.
EHRENBERG, A.S.C. (1977), "Rudiments of Numeracy," Journal of the Royal Statistical Society, Ser. A, 140, 277-297.
FIENBERG, S.E. (1979), "Graphical Methods in Statistics," The American Statistician, 33, 165-178.
FRIEDMAN, J.H., and RAFSKY, L.C. (1981), "Graphics for the Multivariate Two-Sample Problem," Journal of the American Statistical Association, 76, 277-287.
JOINT COMMITTEE ON STANDARDS FOR GRAPHIC PRESENTATION, PRELIMINARY REPORT (1915), Journal of the American Statistical Association, 14, 790-797.
MACDONALD-ROSS, M. (1977), "How Numbers Are Shown: A Review of Research on the Presentation of Quantitative Data in Texts," Audiovisual Communications Review, 25, 359-409.
MAYR, G. VON (1874), "Gutachen Uber die Anwendung der Graphischen und Geographischen," Method in der Statistik, Munich.
MINARD, C.J. (1845-1869), Tableaus Graphiques et Cartes Figuratives de M. Minard, Bibliotheque de l'Ecole Nationale des Ponts et Chaussees, Paris.
PLAYFAIR, W. (1786), The Commercial and Political Atlas, London: Corry.
SCHMID, C.F. (1954), Handbook of Graphic Presentation, New York: Ronald Press.
SCHMID, C.F., and SCHMID, S.E. (1979), Handbook of Graphic Presentation (2nd ed.), New York: John Wiley.
TUFTE, E.R. (1977), "Improving Data Display," University of Chicago, Dept. of Statistics.
TUFTE, E.R. (1983), The Visual Display of Quantitative Information, Cheshire, Conn.: Graphics Press.
TUKEY, J.W. (1977), Exploratory Data Analysis, Reading, Mass: Addison-Wesley.
WAINER, H. (1980), "Making Newspaper Graphs Fit to Print," in Processing of Visible Language, Vol. 2, eds. H. Bouma, P.A. Kolers, and M.E. Wrolsted, New York: Plenum, 125-142.
WAINER, H. (1981), "Reply" to Meyer and Abt (1981), The American Statistician, 57.
WAINER, H. (1983), "How Are We Doing? A Review of Social Indicators III," Journal of the American Statistical Association, 78, 492-496.
WAINER, H., and FRANCOLINI, C. (1980), "An Empirical Inquiry Concerning Human Understanding of Two-Variable Color Maps," The American Statistician, 34, 81-93.
WAINER, H., and THISSEN, D. (1981), "Graphical Data Analysis," Annual Review of Psychology, 32, 191-241.
WALKER, F.A. (1894), Statistical Atlas of the United States Based on the Results of the Ninth Census, Washington, D.C.: U.S. Bureau of the Census.