Reading assignment for statistical process control

Do You Have Leptokurtophobia?

The abnormal need for normal distributions

The symptoms of leptokurtophobia are (1) routinely asking if your data are normally distributed and

(2) transforming your data to make them appear to be less leptokurtic and more “mound shaped.” If

you have exhibited either of these symptoms then you need to read this article.

The origins of leptokurtophobia go back to the surge in statistical process control (SPC) training in

the 1980s. Before this surge only two universities in the United States were teaching SPC, and only a

handful of instructors had any experience with SPC. As a result many of the SPC instructors of the

1980s were, of necessity, neophytes, and many things that were taught at that time can only be

classified as superstitious nonsense. One of these erroneous ideas was that you must have normally

distributed data before you can put your data on a process behavior chart (also known as a control

chart).

When he created the process behavior chart, Shewhart was

looking for a way to separate the routine variation from the

exceptional variation. Since the exceptional variation, by

definition, dominates the routine variation, Shewhart figured

that the easiest way to tell the difference would be to filter out

the bulk of the routine variation. After looking at several

different ways of doing this he found that three-sigma limits

will cover all, or almost all, of the routine variation for virtually

all types of data.

To show how three-sigma limits do this, figure 1 contains six

different probability models for routine variation. These models

range from the uniform distribution to the exponential

distribution. (The last three models are leptokurtic.) Each of

these models is standardized so that they all have a mean of

zero and a standard deviation parameter of 1.00. Figure 1 shows

the three-sigma limits and that proportion of the area under

each curve that falls within those three-sigma limits.

Leptokurtophobia

Leptokurtophobes are those who

feel like they must transform the

data to make them appear to be

more like a normal distribution

prior to using the data in a

statistical analysis such as a

control chart. This phobia was

originally held in check by the

difficulty of performing the

nonlinear transformations

usually required. It has recently

become epidemic due to the

availability of software that will

perform the complex

transformations for the

leptokurtophobe.

Leptokurtosis literally means

“thin mound” and refers to

probability models that have a

central mound that is narrower

than that of a normal

distribution. In practice, due to

the mathematics, leptokurtosis

actually refers to those

probability models having

heavier tails than the normal

distribution. By a wide margin,

most leptokurtic distributions

are also skewed.

Figure 1: How three-sigma limits filter out virtually all of the routine variation regardless of the

probability model used

There are four lessons that can be learned from figure 1.

• The first lesson of figure 1 is that three-sigma limits will filter out virtually all of the

routine variation regardless of the shape of the histogram.

These six models are radically different, yet in spite of these differences, three-sigma limits cover 98

percent to 100 percent of the area under each curve.
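
The coverage figures can be checked by hand for three of the families named above. The sketch below is an illustration, not the author's computation: it assumes standardized uniform, normal, and exponential models (mean 0, standard deviation 1) and uses their closed-form distribution functions. The function names are hypothetical.

```python
import math

def coverage_uniform(k=3.0):
    # Standardized uniform: support is [-sqrt(3), +sqrt(3)], so limits at
    # +/- k with k >= sqrt(3) contain the whole distribution.
    half_width = math.sqrt(3.0)
    return min(k, half_width) / half_width

def coverage_normal(k=3.0):
    # P(-k <= Z <= k) for a standard normal variable.
    return math.erf(k / math.sqrt(2.0))

def coverage_exponential(k=3.0):
    # Standardized exponential: X = E - 1 with E ~ Exp(1), so X >= -1.
    # For k >= 1 the lower limit lies below the boundary and cuts off nothing.
    upper_cdf = 1.0 - math.exp(-(1.0 + k))
    lower_cdf = 1.0 - math.exp(-max(1.0 - k, 0.0))
    return upper_cdf - lower_cdf

for name, func in [("uniform", coverage_uniform),
                   ("normal", coverage_normal),
                   ("exponential", coverage_exponential)]:
    print(f"{name:12s} {func():.4f}")
```

Under these assumptions the exponential model, the heaviest-tailed of the three, still has about 98 percent of its area inside the three-sigma limits.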

• The second lesson is that any data point that falls outside the three-sigma limits is a

potential signal of a process change.

Since it will be a rare event for routine variation to take you outside the three-sigma limits, it is more

likely that any point that falls outside these limits is a signal of a process change.

• The third lesson is that symmetric, three-sigma limits work with skewed data.

Four of the six models shown are skewed. As we scan down the figure we see that no matter how

skewed the model, no matter how heavy the tail becomes, the three-sigma limits are stretched at

essentially the same rate as the tail. This means that the length of the elongated tail will effectively

determine the three-sigma distance in each case, and that three-sigma limits will cover the bulk of

the elongated tail no matter how skewed the data become.

“But that certainly makes the other limit look silly.” Yes, it does. Here we need to pause and think

about those situations where we have skewed data. In most cases skewed data occur when the data

pile up against a barrier or boundary condition. Whenever a boundary value falls within the

computed limits, the boundary takes precedence over the computed limit, and we end up with a one-

sided chart. When this happens the remaining limit covers the long tail and allows us to separate the

routine variation from potential signals of deviation away from the boundary. This is how

symmetric, three-sigma limits can work with skewed data.

• The fourth lesson is that any uncertainty in where we draw the three-sigma lines will

not greatly affect the coverage of the limits.

All of the curves are so flat by the time they reach the neighborhood of the three-sigma limits that

any errors we may make when we estimate the limits will have, at most, a minimal effect upon how

the chart works.
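
A rough way to see the fourth lesson is to move the limits slightly and watch how little the coverage changes. The sketch below assumes standardized normal and exponential models and limits placed at 2.8, 3.0, and 3.2 sigma; the multipliers and function names are illustrative choices, not values taken from the article.

```python
import math

def normal_coverage(k):
    # P(-k <= Z <= k) for a standard normal variable.
    return math.erf(k / math.sqrt(2.0))

def exponential_coverage(k):
    # Standardized exponential (mean 0, sd 1, lower boundary at -1).
    # Valid for k >= 1, where only the upper limit removes any area.
    return 1.0 - math.exp(-(1.0 + k))

multipliers = [2.8, 3.0, 3.2]  # hypothetical mis-estimates of the 3-sigma distance
for k in multipliers:
    print(f"k={k:.1f}  normal={normal_coverage(k):.4f}  "
          f"exponential={exponential_coverage(k):.4f}")

normal_spread = normal_coverage(3.2) - normal_coverage(2.8)
exponential_spread = exponential_coverage(3.2) - exponential_coverage(2.8)
```

For both models, shifting the limits by 0.2 sigma in either direction changes the coverage by less than one percentage point.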

The six probability models in figure 1 effectively summarize what was found when this author looked

at more than 1,100 different probability models from seven commonly used families of models.

These 1,143 models effectively covered all of the shape characterization plane, with 916 mound-

shaped models, 182 J-shaped models, and 45 U-shaped models. Eleven hundred and twelve of these

models (or 97.3%) had better than 97.5 percent of their area covered by symmetric three-sigma

limits.

Thus, three-sigma limits work by brute force. They are sufficiently general to work with all types and

shapes of histograms. They work with skewed data, and they work even when the limits are based on

few data.

To illustrate this point, I used the exponential probability model from figure 1 to generate the 100

values shown in rows in the table in figure 2. The histogram for these values is shown in figure 3.

Since such values should, by definition, display only routine variation, we would hope to find almost

all of the observations within the limits in figure 4. We do. Hence, the process behavior chart will

work as advertised even with skewed data.
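
The experiment can be imitated with a short simulation. This is a sketch under stated assumptions, not a reproduction of figures 2 through 4: the values come from Python's random number generator with an arbitrary seed, and the limits are computed the usual way for an X chart of individual values (average plus or minus 2.66 times the average moving range). The article does not state which formula was used for figure 4.

```python
import random
import statistics

def xmr_limits(values):
    # Natural process limits for individual values:
    # average +/- 2.66 * average moving range (2.66 = 3 / 1.128).
    center = statistics.fmean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.fmean(moving_ranges)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

random.seed(1)  # arbitrary seed, chosen only to make the run repeatable
# Standardized exponential: Exp(1) shifted to have mean 0; boundary at -1.
values = [random.expovariate(1.0) - 1.0 for _ in range(100)]

lower, center, upper = xmr_limits(values)
lower = max(lower, -1.0)  # the boundary takes precedence over the computed limit
inside = sum(lower <= v <= upper for v in values)
print(f"limits: {lower:.2f} to {upper:.2f}, points inside: {inside} of 100")
```

Because the data are bounded below at -1, the chart becomes one-sided, and the upper limit does the work of covering the long tail.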

Figure 2: 100 observations from the standardized exponential distribution

Figure 3: Histogram of 100 exponential observations

Figure 4: X chart for 100 exponential observations

Therefore, we do not have to pre-qualify our data before we place them on a process behavior chart.

We do not need to check the data for normality, nor do we need to define a reference distribution

prior to computing limits. Anyone who tells you anything to the contrary is simply trying to

complicate your life unnecessarily.

Transformations of the data

“But the software suggests transforming the data!” Such advice is simply another piece of confusion.

The fallacy of transforming the data is as follows.

The first principle for understanding data is that no data have meaning apart from their context.

Analysis begins with context, is driven by context, and ends with the results being interpreted in the

context of the original data. This principle requires that there must always be a link between what

you do with the data and the original context for the data. Any transformation of the data risks

breaking this linkage.

If a transformation makes sense both in terms of the original data and the objectives of the analysis,

then it will be okay to use that transformation. Transformations of this type might be things like the

use of daily or weekly averages in place of hourly values, or the use of proportions or rates in place of

counts to take into account the differing areas of opportunity in different time periods.

Only you as the user can determine when a transformation will make sense in the context of the data.

(The software cannot do this because it will never know the context.) Moreover, since these sensible

transformations will tend to be fairly simple in nature, they do not tend to distort the data.

A second class of transformations would be those that rescale the data in order to achieve certain

statistical properties. (These are the only type of transformations that any software can suggest.)

Here the objective is usually to make the data appear to be more “normally distributed” in order to

have an “estimate of dispersion that is independent of the estimate of location.” Unfortunately, these

transformations will tend to be very complex and nonlinear in nature, involving exponential, inverse

exponential, or logarithmic functions. (And just what does the logarithm of the percentage of on-

time shipments represent?) These nonlinear transformations will distort the data in two ways: at one

end of the histogram, values that were originally far apart will now be close together; at the other end

of the histogram, values that were originally close together will now be far apart.

To illustrate the effect of transformations to achieve statistical properties we will use the hot metal

transit times shown in rows in the table in figure 5. These values are the times (to the nearest 5

minutes) between the phone call alerting the steel furnace that a load of hot metal was on the way

and the actual arrival of that load at the steel furnace ladle house.

Figure 5: The hot metal transit times in minutes

Figure 6: Histogram of the hot metal transit times

Figure 7: Histogram of the logarithms of hot metal transit times

Given the skewed nature of the data in figure 6 some programs would suggest using a logarithmic

transformation. Taking the natural logarithm of each of these transit times results in the histogram

in figure 7. (The horizontal scales show both the original and transformed values.) Notice how the

values on the left of figure 7 are spaced out while those on the right are crowded together. After the

transformation the distance from 20 to 25 minutes is about the same size as the distance from 140 to

180 minutes. How could you begin to explain this to your boss?
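
The arithmetic behind this comparison is easy to verify. The short check below uses only the four transit times quoted in the paragraph above; the variable names are illustrative.

```python
import math

# Width of each interval on the original scale (minutes).
raw_gap_low = 25 - 20      # 5 minutes
raw_gap_high = 180 - 140   # 40 minutes

# Width of the same intervals after a natural-log transformation.
log_gap_low = math.log(25) - math.log(20)
log_gap_high = math.log(180) - math.log(140)

print(f"20 to 25 min:   raw width {raw_gap_low:3d}, log width {log_gap_low:.3f}")
print(f"140 to 180 min: raw width {raw_gap_high:3d}, log width {log_gap_high:.3f}")
```

An interval eight times as wide on the original scale ends up only slightly wider on the logarithmic scale, because the logarithm depends on the ratio of the endpoints rather than on their difference.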

By itself, this distortion of the data is sufficient to call into question the practice of transforming the

data to achieve statistical properties. However, the impact of these non-linear transformations is not

confined to the histograms.

Figure 8 shows the X Chart for the original, untransformed data of the table in figure 5. Eleven of the

141 transit times are above the upper limit, confirming the impression given by the histogram that

these data come from a mixture of at least two different processes. Even after the steel furnace gets

the phone call, they still have no idea when the hot metal will arrive at the ladle house.

Figure 8: X chart for the hot metal transit times

However, if we transform the data before we put them on a process behavior chart we end up with

figure 9. There we find no points outside the limits!

Figure 9: X chart for the logarithms of the hot metal transit times

Clearly the logarithmic transformation has obliterated the signals. What good is a transformation

that changes the message contained within the data? The transformation of the data to achieve

statistical properties is simply a complex way of distorting both the data and the truth.

The results shown here are typical of what happens with nonlinear transformations of the original

data. These transformations hide the signals contained within the data simply because they are based

upon computations that presume there are no signals within the data.

To see how the computations do this, we need to pause to consider the nature of the formulas for

common descriptive statistics. For a descriptive measure of location we usually use the average,

which is simply based upon the sum of the data. However, once we leave the average behind, the

formulas become much more complex. For a descriptive measure of dispersion we commonly use the

global standard deviation statistic, which is a function of the squared deviations from the average.

For descriptive measures of shape we commonly use the skewness and kurtosis statistics which,

respectively, depend upon the third and fourth powers of the deviations of the data from the average.

When we aggregate the data together in this manner and use the second, third, and fourth powers of

the distance between each observation and the average value, we are implicitly assuming that these

computations make sense. Whether they be measures of dispersion, or measures of skewness,

or even measures of kurtosis, any high-order descriptive statistic that is computed globally is

implicitly based upon a very strong assumption that the data are homogeneous.
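
To make the role of the powers concrete, here is a minimal sketch of these descriptive statistics. It assumes the simple moment-based definitions of skewness and kurtosis; software packages often apply small-sample adjustments, so their values may differ. The function name and the five-value data set are hypothetical.

```python
import math

def describe(values):
    # Every statistic below pools all of the values around one global average,
    # which is where the assumption of homogeneity enters.
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    m2 = sum(d ** 2 for d in deviations) / n  # second central moment
    m3 = sum(d ** 3 for d in deviations) / n  # third central moment
    m4 = sum(d ** 4 for d in deviations) / n  # fourth central moment
    return {
        "mean": mean,
        # Global standard deviation statistic (n - 1 divisor).
        "sd": math.sqrt(sum(d ** 2 for d in deviations) / (n - 1)),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }

stats = describe([1, 2, 3, 4, 5])  # hypothetical data, for illustration only
print(stats)
```

Because the deviations are raised to the third and fourth powers, a few values from a different process can dominate the skewness and kurtosis statistics.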

When the data are not homogeneous it is not the shape of the histogram that is wrong, but the

computation and use of the descriptive statistics that is erroneous. We do not need to distort the

histogram to make the transformed values more homogeneous, but we need to stop and question

what the lack of homogeneity means in the context of the original observations.

So how can we determine when a data set is homogeneous? That is the purpose of the process

behavior chart! Transforming the data to achieve statistical properties prior to placing them on a

process behavior chart is an example of getting everything backwards. It assumes that we need to

make the data more homogeneous prior to checking them for homogeneity. Any recommendation

regarding the transformation of the data prior to placing them on a process behavior chart reveals a

fundamental lack of understanding about the purpose of process behavior charts.

Shewhart’s approach, with its generic three-sigma limits computed empirically from the data, does

not even require the specification of a probability model. In fact, on page 54 of Statistical Method

from the Viewpoint of Quality Control, Shewhart wrote “… we are not concerned with the functional

form of the universe [i.e., the probability model], but merely with the assumption that a universe

exists.” [Italics in the original.]

When you transform the data to achieve statistical properties you deceive both yourself and everyone

else who is not sophisticated enough to catch you in your deception. When you check your data for

normality prior to placing them on a process behavior chart you are practicing statistical voodoo.

Transforming the data prior to using them on a process behavior chart is not only bad advice, it is

also an outright mistake.

Whenever the teachers lack understanding, superstitious nonsense is inevitable. Until you learn to

separate myth from fact you will be fair game for those who were taught the nonsense. And you may

end up with leptokurtophobia without even knowing it.

ABOUT THE AUTHOR

Donald J. Wheeler

Dr. Donald J. Wheeler is a Fellow of both the American Statistical Association and the American

Society for Quality, and is the recipient of the 2010 Deming Medal. As the author of 25 books and

hundreds of articles, he is one of the leading authorities on statistical process control and applied

data analysis. Find out more about Dr. Wheeler’s books at www.spcpress.com

Dr. Wheeler welcomes your questions.
