
STATISTICAL DIVERSIONS

Peter Petocz and Eric Sowey
Macquarie University, Sydney, and The University of New South Wales, Sydney, Australia

“The world is a complicated place.” This is often heard in social conversation, and the speaker generally lets it go at that. But consider what it means for someone who is trying to understand how things actually work in this complicated world – how the brain detects patterns, how consumers respond to rises in credit card interest rates, how aeroplane wings deflect during a supersonic flight and so on. Understanding will not get very far without some initially simplified representation of whatever situation is being examined. We call such a simplified representation a model of reality. A neat definition of a model is “a concise abstraction of reality.” A model is an abstraction in the sense that it does not include every detail of reality, but only those details that are centrally relevant to the matter under investigation. And a model is concise in the sense that it is relatively easy to comprehend and to work with.

A simple example of a model is a page of a street directory. The page shows the directions and names of streets in a certain locality and represents, by a colour coding, the relative importance of the streets as traffic arteries. It’s an abstraction of reality in that it supplies the main information that a motorist needs, but little else. For example, it’s two-dimensional and so does not show the steepness of hills; neither does it show all the buildings that line those streets nor the boundaries of the land that each building occupies. And the page is concise in that it’s drawn on a small scale (typically, 1 cm = 100 m).

Because there are many different kinds of things in the world that we seek to understand, there are many different kinds of models. However, there is a basic distinction between physical models and algebraic (also called computational) models. A physical model is, as the name suggests, some kind of object (whether in two or in three dimensions). Each page in a street directory is evidently a physical model. So is an architect’s three-dimensional representation of the finished appearance of a building, and so also is a child’s balsa wood aeroplane. An algebraic model, by contrast, uses equations to describe the main features of interest in a real world situation and their interrelations. If these equations describe relations that are certain, or relations where chance influences are ignored, then the model is called a mathematical model. Newton’s three ‘laws’ of motion and Einstein’s famous equation, $E = mc^2$, are examples of mathematical models. If, however, the equations explicitly include the influence of chance, then the model is called a statistical model.

Although introductory textbooks of statistics may not highlight the fact, all the standard probability distributions (binomial, Poisson, Normal, etc) are indeed statistical models. To see this in the case of the binomial distribution, let’s take a real world situation. A big bin of apricots is delivered, and we pull out 4 apricots on separate occasions to eat. What is the probability (we may be interested to know) of getting 3 ripe apricots and 1 unripe apricot in a selection of 4 apricots? It’s difficult to answer this question in the real world because there are so many things about this situation that we don’t know. For example, we don’t know (1) whether the apricots were selected deliberately or at random, (2) how a ‘ripe’ apricot is defined, and (3) how the ripe and unripe apricots are distributed through the bin.

We can simplify the problem if we make some assumptions. From the textbook we learn that (1) if we select the apricots at random, (2) if we define a selected apricot as either ripe or unripe (that is, there are only two possible outcomes), (3) if the chance of selecting a ripe apricot from the bin is always the same each time we select an apricot, and (4) if the occasions on which we select an apricot are unconnected with (that is, independent of) each other, then there is a concise formula – the binomial distribution – for evaluating the probability of getting 3 ripe apricots and 1 unripe apricot in a selection of 4 apricots from the bin.

© 2010 The Authors Journal compilation © 2010 Teaching Statistics Trust

28 • Teaching Statistics. Volume 32, Number 1, Spring 2010

These four assumptions collectively imply a quite marked abstraction from the reality of the situation. Let’s see how. Firstly, there could be a third possible outcome: a selected apricot might look ripe but be unripe inside. Secondly, once most of the apricots in the bin have been drawn out, it will not be very realistic to continue to assume that the chance of selecting a ripe apricot is constant at each draw. After all, every time an unripe apricot is selected from the bin, that increases the chance that a ripe apricot will be selected from the remaining apricots at the next draw. Finally, selections – even if they are seemingly random – will not always be independent in practice. If an apricot is drawn for eating, but turns out to be unripe, then another apricot will be immediately selected. That subsequent draw will occur precisely because the previously drawn apricot was unripe: that’s not independence of draws!

Before any analytical use is made of a statistical model, it is clearly important to validate the model, that is, to check that the model captures all the characteristics of the real world that are essential for the purpose at hand. Checking the fit of the model to the real world is a two-stage process: ensuring that the model is practically suitable, and then testing that the model provides a statistically close match between its prediction of real world data and the actual real world data. Good applied statistical work requires careful attention to both stages.

Returning to our apricots example, if the four assumptions set out above are judged sufficiently realistic, then the binomial distribution will be a practically suitable model. However, if we judge it essential to take into account the possibility of three outcomes (apricot ripe, apricot unripe or apricot looks ripe but is unripe), then a practically suitable model will be the trinomial distribution. Alternatively, if two possible outcomes are sufficiently realistic, but we are not comfortable with assuming that the probability of drawing a ripe apricot remains the same each time we select an apricot, then a practically suitable model will be the hypergeometric distribution. You may not be familiar with these standard probability distributions – the trinomial and the hypergeometric – but that need not be an obstacle in following this discussion. The point is that when some standard probability model is not practically suitable, there is often another one already in the statistician’s tool kit that fits the bill more appropriately. But if there is not, then it is usually possible to develop an improved model from scratch.
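To make the contrast between the binomial and the hypergeometric models concrete, here is a minimal Python sketch. The bin size (20 apricots, 10 of them ripe) is purely hypothetical and is not taken from the example above; the function names are ours. The sketch compares the probability of 3 ripe apricots in 4 draws when the chance of ripeness is held constant (binomial) and when apricots are removed from a small bin without replacement (hypergeometric).

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) when each of n independent draws is ripe with constant probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def hypergeometric_pmf(x, n, ripe, total):
    """P(X = x) when n apricots are drawn without replacement from a bin
    of `total` apricots, of which `ripe` are ripe."""
    return comb(ripe, x) * comb(total - ripe, n - x) / comb(total, n)


# Hypothetical bin (an assumption for illustration): 20 apricots, 10 ripe.
p_binomial = binomial_pmf(3, 4, 0.5)
p_hypergeometric = hypergeometric_pmf(3, 4, ripe=10, total=20)

# The same proportion of ripe apricots in a much larger hypothetical bin.
p_hypergeometric_big = hypergeometric_pmf(3, 4, ripe=5000, total=10000)

print(p_binomial, p_hypergeometric, p_hypergeometric_big)
```

Under these assumptions the two models disagree slightly for the small bin and are almost indistinguishable for the large one, which suggests why the constant-probability assumption is harmless only while the bin remains large relative to the number of apricots drawn.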

Suppose we decide to adopt the binomial model as practically suitable for our apricots example. Next we test whether the model’s fit to real world data is statistically close. To generate some real world observations, we perform, say, 120 random draws of 4 apricots. Suppose we find:

Number of ripe apricots per draw     4    3    2    1    0
Observed number of draws            10   30   42   26   12

The binomial model defines the probability of $x$ ripe apricots in a random draw of 4 apricots by the formula $\binom{4}{x} p^x (1 - p)^{4 - x}$. To proceed, we need to estimate the parameter $p$, the probability of selecting a ripe apricot at any one draw. We note the property of the binomial model, that the mean number of ripe apricots in selections of 4 apricots is $4p$. Next, we calculate the weighted mean number of ripe apricots per draw from the above real world data. This is [10(4) + 30(3) + 42(2) + 26(1) + 12(0)] / 120 = 2.0. Equating the model’s and the real world data’s mean values, $4p^* = 2.0$, we find an estimate of $p$ to be $p^* = 0.5$. We can now calculate model-predicted probabilities, and frequencies over 120 draws:

Number of ripe apricots per draw     4        3      2       1      0
Predicted probability                0.0625   0.25   0.375   0.25   0.0625
Predicted number of draws            7.5      30     45      30     7.5
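The estimation step above can be retraced in a few lines of Python. This is only a sketch of the arithmetic already described: the observed counts are those tabulated above, while the variable names are our own.

```python
from math import comb

ripe_counts = [4, 3, 2, 1, 0]      # number of ripe apricots per draw
observed = [10, 30, 42, 26, 12]    # observed number of draws
draw_size = 4
n_draws = sum(observed)            # 120 draws in all

# Weighted mean number of ripe apricots per draw.
sample_mean = sum(x * o for x, o in zip(ripe_counts, observed)) / n_draws

# Equate the binomial mean 4p to the sample mean to estimate p.
p_star = sample_mean / draw_size

# Model-predicted probabilities and frequencies over all draws.
predicted_prob = [
    comb(draw_size, x) * p_star**x * (1 - p_star) ** (draw_size - x)
    for x in ripe_counts
]
predicted_freq = [n_draws * q for q in predicted_prob]

print(sample_mean, p_star)
print(predicted_prob)
print(predicted_freq)
```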

The statistical fit of the binomial model can then be tested by the chi-square goodness of fit test. The test statistic is $K = \sum_{i=1}^{5} (O_i - P_i)^2 / P_i$, where $O_i$ is the observed and $P_i$ is the predicted number of draws, and $i = 1, 2, \ldots, 5$. Small values of $K$ (implying $O$ and $P$ values close together) signal a good fit. The critical value of the chi-square statistic with (here) 3 degrees of freedom and a 5% level of significance is 7.81. This means that if $K > 7.81$ there is a statistically significant difference between the model’s predictions and the real world observations, that is, the data suggest the model is a poor fit. But if $K \le 7.81$ the data do not contradict the model, so we may act as if the model is a good fit. For these data, $K = 4.27$, so we conclude that the binomial probability model is a statistically close fit to probabilities in the real world. We may, thus, use the binomial model with some confidence in further analyses that may be of interest in connection with this crop of apricots – for example, to infer from a sample whether another bin of these apricots contains an unacceptable number of unripe ones.
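As a check on the value of the test statistic, the calculation can be sketched in Python. The observed and predicted frequencies and the critical value 7.81 are those quoted above; the variable names are ours. The 3 degrees of freedom arise as 5 categories, less 1, less 1 for the estimated parameter.

```python
observed = [10, 30, 42, 26, 12]     # observed number of draws
predicted = [7.5, 30, 45, 30, 7.5]  # predicted by the binomial model with p* = 0.5

# Chi-square goodness of fit statistic: sum over categories of (O - P)^2 / P.
K = sum((o - p) ** 2 / p for o, p in zip(observed, predicted))

# Critical value quoted in the text (3 degrees of freedom, 5% level).
critical_value = 7.81
model_rejected = K > critical_value

print(round(K, 2), model_rejected)
```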

This has been an example of validation of a univariate statistical model. Statistics is also concerned with bivariate models (e.g. the simple regression model) and higher order (multivariate) models. In every case, the procedure for validating a statistical model is the same: firstly, ensure that the model is practically suitable, then test that its predictions are statistically close to what is observed in the real world. With a valid model, it may then become a little easier to understand how it works – this complicated place, the world.

Here are our solutions to the five questions posed in our previous column:

Question 1

The medieval English Benedictine monk was the Venerable Bede (673–735), writing in the first chapter of his book De Temporum Ratione (On the Reckoning of Time, 725). His system of dactylonomy (Greek: daktylos (finger) + nomos (law)) or finger counting is able to represent numbers up to 9999. It was based on methods used by Arabs and Romans over many thousands of years. In his blog, Laputan Logic, John Hardy reproduces a fifteenth century illustration of Bede’s finger counting system (see http://www.laputanlogic.com/articles/2004/05/11-0001.html). How Bede’s system works in practice is explained on pp. 201–207 of Karl Menninger’s Number Words and Number Symbols – A Cultural History of Numbers, MIT Press, 1969 (reprinted Dover, 1992). For more on this and other historical counting systems, see chapter 3 of Georges Ifrah’s The Universal History of Numbers, Harvill Press, 1998.

Question 2

From 1879 onwards, Herman Hollerith (1860–1929) experimented – firstly at the US Census Bureau and then at the Massachusetts Institute of Technology – with punched paper tape and later with punched cards as ways of recording data, which could be read and tabulated by machine. In 1887, he received a patent for an electric punched-card reader. By 1890, he had devised an electric tabulating system, comprising punching, reading, sorting and tabulating machines. These machines were used to prepare the results of the eleventh census of the USA in 1890. This work was completed in an eighth of the time needed for obtaining manually the results of the previous census in 1880. Pictures of Hollerith’s machines can be seen at http://www.officemuseum.com/data_processing_machines.htm. An interesting article on Hollerith’s system by Mark Howells, with further pictures, is available at http://www.oz.net/~markhow/writing/holl.htm.

Question 3

Let $A_i$ be the event that there are initially $i$ red balls in the urn. The answer to this question depends on what assumption we make about the value of the probability $P(A_i)$ for $i = 0, 1, \ldots, 10$. Since we have no prior information on how the urn was initially filled, a simple option is to follow the ‘principle of insufficient reason’ and assume that all the events, $A_0, A_1, \ldots, A_{10}$, are equally likely. Then, $P(A_i) = 1/11$. Next, using $r$ to denote the selection of a red ball, we apply Bayes’ Theorem: $P(A_1 \mid r) = P(r \mid A_1) \times P(A_1) / \sum_{i=0}^{10} P(r \mid A_i) \times P(A_i)$. Substituting assumed values: $P(A_1 \mid r) = [0.1 \times 1/11] / \sum_{i=0}^{10} [(i/10) \times (1/11)] = 1/55$.

Another possibility for defining the values $P(A_i)$ is to assume that the urn was initially filled with a random selection of 10 balls from some vast reservoir of balls containing black and red in equal numbers. This implies a binomial distribution for $A_i$, with $P(A_i) = \binom{10}{i} (0.5)^i \times (0.5)^{10-i} = \binom{10}{i} (0.5)^{10}$. Again, applying Bayes’ Theorem: $P(A_1 \mid r) = [0.1 \times \binom{10}{1} (0.5)^{10}] / \sum_{i=0}^{10} [(i/10) \times \binom{10}{i} (0.5)^{10}] = 1/512$.
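Both answers can be verified with exact rational arithmetic. The following Python sketch assumes, as in the solution above, an urn of 10 balls with $P(r \mid A_i) = i/10$; the function and variable names are our own.

```python
from fractions import Fraction
from math import comb

N_BALLS = 10


def posterior_one_red(prior):
    """P(A_1 | r) by Bayes' Theorem, where prior[i] = P(A_i) and P(r | A_i) = i/10."""
    likelihood = [Fraction(i, N_BALLS) for i in range(N_BALLS + 1)]
    evidence = sum(lik * pr for lik, pr in zip(likelihood, prior))
    return likelihood[1] * prior[1] / evidence


# Prior 1: all 11 events A_0, ..., A_10 equally likely.
uniform_prior = [Fraction(1, N_BALLS + 1)] * (N_BALLS + 1)

# Prior 2: urn filled at random from a vast half-red, half-black reservoir.
binomial_prior = [Fraction(comb(N_BALLS, i), 2**N_BALLS) for i in range(N_BALLS + 1)]

p_uniform = posterior_one_red(uniform_prior)
p_binomial = posterior_one_red(binomial_prior)

print(p_uniform, p_binomial)
```

Using `Fraction` rather than floating-point numbers keeps the results in the same form as the answers quoted above.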

Question 4

Intuitively, if one good doubles in price and the other halves, there should be no change in the average price level. This is what the geometric mean shows (see table 1). The arithmetic mean, however, here indicates a 6.25% rise in the average price level. There is no merit in using a measure that so contradicts intuition. What the arithmetic mean shows is a statistical artefact – a spurious result that arises from the way this mean is defined. Good teachers and textbooks stress that the arithmetic mean is inappropriate for averaging ratios. (It has obvious appeal, however, for those who wish to use statistics to mislead!) For another statistical artefact arising from the arithmetic mean, see the solution to the previous Question 1 in our column in Teaching Statistics, vol. 31, no. 3. And yet another is on display when we say that the mean number of children per family is 2.2.
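The figures in table 1 can be reproduced with a short Python sketch. The prices are those of table 1; the helper functions are our own and are written out in full rather than taken from a library.

```python
from math import prod

# Price per kg in dollars, from table 1.
prices_year1 = {"Bread": 6, "Butter": 10}
prices_year2 = {"Bread": 12, "Butter": 5}


def arithmetic_mean(values):
    values = list(values)
    return sum(values) / len(values)


def geometric_mean(values):
    values = list(values)
    return prod(values) ** (1 / len(values))


# Proportional change in the average price level under each mean.
am_change = arithmetic_mean(prices_year2.values()) / arithmetic_mean(prices_year1.values()) - 1
gm_change = geometric_mean(prices_year2.values()) / geometric_mean(prices_year1.values()) - 1

print(am_change, gm_change)
```

The arithmetic mean reports a rise of 0.0625 (that is, 6.25%), while the geometric mean reports no change.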

A common context in which the prices of goods in a ‘basket’ are averaged between two time periods is the construction of a price index, such as the well-known consumer price index (CPI). The ‘basket’ for the CPI is a set of goods (and services) that, surveys show, a large majority of consumers buy regularly. The goods are present in the basket not in equal quantities (as implied in the present question) but in the proportions that they form of the monthly household budget. It follows that, for the CPI, the appropriate average of the ratios of prices between two periods is a weighted geometric mean.
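A weighted geometric mean of price ratios might be computed as in the sketch below. The price ratios come from table 1, but the budget shares are entirely hypothetical, chosen only to illustrate the calculation; real CPI weights come from household expenditure surveys.

```python
from math import prod

# Ratio of year 2 price to year 1 price, from table 1.
price_ratios = {"Bread": 12 / 6, "Butter": 5 / 10}


def weighted_geometric_mean(ratios, weights):
    """Product of ratio ** (weight / total weight) over all goods."""
    total = sum(weights.values())
    return prod(ratios[good] ** (weights[good] / total) for good in ratios)


# Equal shares reproduce the unweighted geometric mean: no change in price level.
equal_shares = {"Bread": 0.5, "Butter": 0.5}
index_equal = weighted_geometric_mean(price_ratios, equal_shares)

# Hypothetical budget shares (an assumption, not data): bread 75%, butter 25%.
hypothetical_shares = {"Bread": 0.75, "Butter": 0.25}
index_weighted = weighted_geometric_mean(price_ratios, hypothetical_shares)

print(index_equal, index_weighted)
```

With the hypothetical shares, the good that doubled in price dominates the basket, so the index rises above 1 even though the unweighted geometric mean shows no change.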

Question 5

The power curve for the equal-tail test of $H_0: \mu = \mu_0$ against $H_1: \mu \neq \mu_0$, based on the mean of a random sample drawn from a Normal population, $N(\mu, \sigma^2)$, where the value of $\sigma^2$ is known, typically has the shape of the solid line in figure 1.

In this illustration, the level of significance (corresponding to the ordinate at the minimum value) is 0.05, or 5%. This diagram is occasionally shown in statistics textbooks, sometimes with the comment that it resembles an upside-down Normal distribution.

It is, however, not an upside-down Normal distribution. Precisely expressed, it is the sum of the ordinates of two S-shaped curves – a cumulative Normal distribution and a decumulative Normal distribution (i.e. a ‘reversed’ cumulative Normal distribution) – that cross at the point where $\mu = \mu_0$ and where each has an ordinate value of 0.025.

The cumulative Normal distribution is the power curve of the one-sided test of $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$ at the 2.5% level of significance. The decumulative Normal distribution is the power curve of the one-sided test of $H_0: \mu = \mu_0$ against $H_1: \mu < \mu_0$ at the 2.5% level of significance. The power curve for the two-sided test is the sum of these, in the same way as the probability of a type I error in the two-sided test is the sum of the probabilities of a type I error in each of the one-sided tests.
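The decomposition can be explored numerically with a minimal Python sketch. It expresses the true mean through the standardised shift $\delta = (\mu - \mu_0) / (\sigma / \sqrt{n})$, where $n$ is the sample size; this parametrisation and the function names are our own choices for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal distribution
ALPHA = 0.05               # level of significance of the two-sided test

# Critical value for each 2.5% tail (approximately 1.96).
z = STD_NORMAL.inv_cdf(1 - ALPHA / 2)


def power_upper(delta):
    """Power of the one-sided test against mu > mu0 at the 2.5% level."""
    return STD_NORMAL.cdf(delta - z)


def power_lower(delta):
    """Power of the one-sided test against mu < mu0 at the 2.5% level."""
    return STD_NORMAL.cdf(-delta - z)


def power_two_sided(delta):
    """Power of the equal-tail two-sided test: the sum of the two one-sided curves."""
    return power_upper(delta) + power_lower(delta)


for delta in (-3, -1, 0, 1, 3):
    print(delta, round(power_two_sided(delta), 4))
```

At $\delta = 0$ each one-sided curve has ordinate 0.025 and their sum is the 5% level of significance, matching the minimum of the curve described above.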

An alternative graphical demonstration of this explanation is found in Shoesmith, E. (1983), Simple power curve constructions, Teaching Statistics, 5(3), 78–83, online at http://www.rsscse.org.uk/ts/bts/shoesmith/text.html. Shoesmith shows that when the power curve for the two-sided test is plotted on Normal probability paper, it looks like a blunted V – two diagonal lines joined by a curve around the null hypothesis value $\mu = \mu_0$.

Now here are five new questions:

1. Lewis Carroll, author of Alice in Wonderland, also wrote a two-volume novel about the adventures of two children, Sylvie and Bruno. Here is a quotation from the second volume, Sylvie and Bruno Concluded:

“What do you consider the largest map that would be really useful?” “About six inches to the mile.” “Only six inches!” exclaimed Mein Herr . . . “We actually made a map of the country, on the scale of a mile to the mile!” “Have you used it much?” I enquired. “It has never been spread out, yet,” said Mein Herr: “the farmers objected: they said it would cover the whole country, and shut out the sunlight!”

Is Mein Herr’s map “on the scale of a mile to the mile” a model?

2. Under what circumstances will the Normal probability distribution be a practically suitable statistical model for human heights?

Table 1. Price per kg

                   Year 1 ($)   Year 2 ($)
Bread                  6           12
Butter                10            5
Arithmetic mean        8            8.50
Geometric mean         7.75         7.75

Fig 1. [Figure: power curve. Vertical axis: Power, from 0.0 to 1.0; horizontal axis: $\mu$, with $\mu_0$ marked.]


3. Which statistical model was originally developed from a consideration of the deliberations of juries, and subsequently applied to the science of artillery, both in the early nineteenth century and in the mid-twentieth century, in relation to the German V2 rockets in World War II?

4. The exponential distribution is widely used as a statistical model in the analysis of queuing problems. What particular property does it have that makes it useful in such contexts?

5. A tourist information office is staffed by two officers, and the time they spend serving any customer is exponentially distributed with the same mean service time for each. When C walks into the office, she finds A and B each being served by one of the officers. As soon as an officer is free, she will be served. What is the probability that C is the last of the three customers to leave the office?

If you have any comments on this column, please e-mail us at [email protected].

NEWS AND NOTES

As you may have inferred from the inside front cover of this issue and the previous issue, the Royal Statistical Society Centre for Statistical Education has recently relocated from Nottingham Trent University to the University of Plymouth. Please note that the .ntu.ac.uk domain name for the RSSCSE websites and the .ntu.ac.uk email addresses are no longer in use. The new RSSCSE website address is www.rsscse.org.uk. The CensusAtSchool and ExperimentsAtSchool sites may be reached using the URL www.experimentsatschool.org.uk.

At the site, www.rsscse.org.uk, by the way, note the heading “Support for Teachers of Statistics”. Here you will find a link for information on possible financial support for professional development and conference attendance provided by the Trustees of Teaching Statistics. In particular, teachers giving presentations at the International Conference on Teaching Statistics (ICOTS-8) may apply for financial assistance.

ICOTS-8 will be held in Ljubljana, Slovenia, 11–16 July 2010. The theme of the conference is “Data and Context in Statistics Education: Towards an Evidence-Based Society”, and further information may be found at http://icots8.org/.

The 2010 INternational Technology, Education and Development (INTED) Conference will be held 8–10 March 2010 in Valencia, Spain. Further details may be found at www.inted2010.org.

The 4th International Conference on Mathematics and Statistics will be held 14–17 June 2010 in Athens, Greece. This conference is sponsored by the Mathematics and Statistics Research Unit of the AThens INstitute for Education and Research. Further details may be found at http://www.atiner.gr/docs/Mathematics.htm.

Those of you who find the article “How LO can you GO?” in this issue to be of interest may wish to investigate “How LO can you GO?: Using the Dice-Based Golf Game GOLO to Illustrate Inferences on Proportions and Discrete Probability Distributions”, by Paul Stephenson, Mary Richardson, John Gabrosek and Diann Reischman. This article, available online at http://www.amstat.org/publications/jse/v17n2/stephenson.html, was published by the Journal of Statistics Education.

The March issue of Significance will announce a “Crystal Ball” competition open to individuals and groups (e.g. classes). Questions such as “How many goals will be scored in the 2010 World Cup?” and “What will the Dow Jones stand at 11 July 2010?” will apparently be included. Consider incorporating this contest into your classes!


Copyright of Teaching Statistics is the property of Blackwell Publishing Limited and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.