Reading assignment for statistical process control
Do You Have Leptokurtophobia?
The abnormal need for normal distributions
The symptoms of leptokurtophobia are (1) routinely asking if your data are normally distributed and
(2) transforming your data to make them appear to be less leptokurtic and more “mound shaped.” If
you have exhibited either of these symptoms then you need to read this article.
The origins of leptokurtophobia go back to the surge in statistical process control (SPC) training in
the 1980s. Before this surge only two universities in the United States were teaching SPC, and only a
handful of instructors had any experience with SPC. As a result many of the SPC instructors of the
1980s were, of necessity, neophytes, and many things that were taught at that time can only be
classified as superstitious nonsense. One of these erroneous ideas was that you must have normally
distributed data before you can put your data on a process behavior chart (also known as a control
chart).
When he created the process behavior chart, Shewhart was
looking for a way to separate the routine variation from the
exceptional variation. Since the exceptional variation, by
definition, dominates the routine variation, Shewhart figured
that the easiest way to tell the difference would be to filter out
the bulk of the routine variation. After looking at several
different ways of doing this he found that three-sigma limits
will cover all, or almost all, of the routine variation for virtually
all types of data.
To show how three-sigma limits do this, figure 1 contains six
different probability models for routine variation. These models
range from the uniform distribution to the exponential
distribution. (The last three models are leptokurtic.) Each of
these models is standardized so that they all have a mean of
zero and a standard deviation parameter of 1.00. Figure 1 shows
the three-sigma limits and that proportion of the area under
each curve that falls within those three-sigma limits.
Leptokurtophobia
Leptokurtophobes are those who feel like they must transform the data to make them appear to be
more like a normal distribution prior to using the data in a statistical analysis such as a control
chart. This phobia was originally held in check by the difficulty of performing the nonlinear
transformations usually required. It has recently become epidemic due to the availability of
software that will perform the complex transformations for the leptokurtophobe.
Leptokurtosis literally means “thin mound” and refers to probability models that have a central
mound that is narrower than that of a normal distribution. In practice, due to the mathematics,
leptokurtosis actually refers to those probability models having heavier tails than the normal
distribution. By a wide margin, most leptokurtic distributions are also skewed.
Figure 1: How three-sigma limits filter out virtually all of the routine variation regardless of the
probability model used
There are four lessons that can be learned from figure 1.
• The first lesson of figure 1 is that three-sigma limits will filter out virtually all of the
routine variation regardless of the shape of the histogram.
These six models are radically different, yet in spite of these differences, three-sigma limits cover 98
percent to 100 percent of the area under each curve.
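The coverage figures can be checked by hand for the models that have simple closed forms. The sketch below is only an illustration: it covers the uniform, normal, and exponential models (the other models in figure 1 are not named in the text, so they are left out), and the function names are hypothetical.

```python
import math

def coverage_uniform(k=3.0):
    """Area within +/- k sigma for a uniform model with mean 0 and sd 1."""
    # Such a uniform model spans -sqrt(3) to +sqrt(3).
    half_width = math.sqrt(3.0)
    return min(k, half_width) / half_width

def coverage_normal(k=3.0):
    """Area within +/- k sigma for a standard normal model."""
    return math.erf(k / math.sqrt(2.0))

def coverage_exponential(k=3.0):
    """Area within +/- k sigma for an exponential model with mean 0 and sd 1."""
    # Standardized exponential: X = E - 1 with E ~ Exp(1), so X >= -1.
    # P(-k <= X <= k) = P(max(0, 1 - k) <= E <= 1 + k).
    lower = max(0.0, 1.0 - k)
    return math.exp(-lower) - math.exp(-(1.0 + k))

for name, fn in [("uniform", coverage_uniform),
                 ("normal", coverage_normal),
                 ("exponential", coverage_exponential)]:
    print(f"{name:12s} {fn():.4f}")
```

Under these assumptions the three results are about 1.000, 0.997, and 0.982, which is consistent with the 98 percent to 100 percent range quoted above.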
• The second lesson is that any data point that falls outside the three-sigma limits is a
potential signal of a process change.
Since it will be a rare event for routine variation to take you outside the three-sigma limits, it is more
likely that any point that falls outside these limits is a signal of a process change.
• The third lesson is that symmetric, three-sigma limits work with skewed data.
Four of the six models shown are skewed. As we scan down the figure we see that no matter how
skewed the model, no matter how heavy the tail becomes, the three-sigma limits are stretched at
essentially the same rate as the tail. This means that the length of the elongated tail will effectively
determine the three-sigma distance in each case, and that three-sigma limits will cover the bulk of
the elongated tail no matter how skewed the data become.
“But that certainly makes the other limit look silly.” Yes, it does. Here we need to pause and think
about those situations where we have skewed data. In most cases skewed data occur when the data
pile up against a barrier or boundary condition. Whenever a boundary value falls within the
computed limits, the boundary takes precedence over the computed limit, and we end up with a one-
sided chart. When this happens the remaining limit covers the long tail and allows us to separate the
routine variation from potential signals of deviation away from the boundary. This is how
symmetric, three-sigma limits can work with skewed data.
• The fourth lesson is that any uncertainty in where we draw the three-sigma lines will
not greatly affect the coverage of the limits.
All of the curves are so flat by the time they reach the neighborhood of the three-sigma limits that
any errors we may make when we estimate the limits will have, at most, a minimal effect upon how
the chart works.
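One way to see the fourth lesson numerically is to move the limit half a sigma in either direction and watch the coverage. This is a minimal sketch for two models only (normal and standardized exponential); the choice of 2.5 and 3.5 sigma as the "misplaced" limits is an assumption made for illustration, not a figure from the article.

```python
import math

def normal_coverage(k):
    """Area within +/- k sigma for a standard normal model."""
    return math.erf(k / math.sqrt(2.0))

def exponential_coverage(k):
    """Area within +/- k sigma for an exponential model with mean 0 and sd 1."""
    # The standardized exponential has a lower bound of -1.
    lower = max(0.0, 1.0 - k)
    return math.exp(-lower) - math.exp(-(1.0 + k))

# Compare coverage when the limit is drawn too tight, on target, and too wide.
for k in (2.5, 3.0, 3.5):
    print(f"k = {k:.1f}  normal = {normal_coverage(k):.4f}  "
          f"exponential = {exponential_coverage(k):.4f}")
```

For both models, shifting the limit by half a sigma changes the coverage by roughly a percentage point, because so little area remains in the tails at that distance.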
The six probability models in figure 1 effectively summarize what was found when this author looked
at more than 1,100 different probability models from seven commonly used families of models.
These 1,143 models effectively covered all of the shape characterization plane, with 916 mound-
shaped models, 182 J-shaped models, and 45 U-shaped models. Eleven hundred and twelve of these
models (or 97.3%) had better than 97.5 percent of their area covered by symmetric three-sigma
limits.
Thus, three-sigma limits work by brute force. They are sufficiently general to work with all types and
shapes of histograms. They work with skewed data, and they work even when the limits are based on
few data.
To illustrate this point, I used the exponential probability model from figure 1 to generate the 100
values shown in rows in the table in figure 2. The histogram for these values is shown in figure 3.
Since such values should, by definition, display only routine variation, we would hope to find almost
all of the observations within the limits in figure 4. We do. Hence, the process behavior chart will
work as advertised even with skewed data.
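The experiment is easy to repeat. The sketch below generates a fresh set of 100 standardized exponential values (not the values in figure 2) and computes X chart limits in the conventional way for individual values: the average plus or minus 2.66 times the average moving range. The article does not spell out this formula, so treat the 2.66 scaling factor, the seed, and the function name as assumptions.

```python
import random

SCALING = 2.66  # conventional XmR factor (3 / 1.128); assumed, not quoted from the article

def xmr_limits(values):
    """Return (lower, center, upper) limits for an X chart of individual values."""
    center = sum(values) / len(values)
    # Moving ranges are the absolute differences between successive values.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return center - SCALING * mr_bar, center, center + SCALING * mr_bar

random.seed(1)  # arbitrary seed so the run is repeatable
# Standardized exponential: Exp(1) shifted so the mean is zero.
data = [random.expovariate(1.0) - 1.0 for _ in range(100)]

lower, center, upper = xmr_limits(data)
inside = sum(lower <= x <= upper for x in data)
print(f"limits: {lower:.2f} to {upper:.2f}, {inside} of {len(data)} values inside")
```

Because these data contain only routine variation, nearly all of the simulated values should land inside the limits, with at most a handful outside in the long tail.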
Figure 2: 100 observations from the standardized exponential distribution
Figure 3: Histogram of 100 exponential observations
Figure 4: X chart for 100 exponential observations
Therefore, we do not have to pre-qualify our data before we place them on a process behavior chart.
We do not need to check the data for normality, nor do we need to define a reference distribution
prior to computing limits. Anyone who tells you anything to the contrary is simply trying to
complicate your life unnecessarily.
Transformations of the data
“But the software suggests transforming the data!” Such advice is simply another piece of confusion.
The fallacy of transforming the data is as follows.
The first principle for understanding data is that no data have meaning apart from their context.
Analysis begins with context, is driven by context, and ends with the results being interpreted in the
context of the original data. This principle requires that there must always be a link between what
you do with the data and the original context for the data. Any transformation of the data risks
breaking this linkage.
If a transformation makes sense both in terms of the original data and the objectives of the analysis,
then it will be okay to use that transformation. Transformations of this type might be things like the
use of daily or weekly averages in place of hourly values, or the use of proportions or rates in place of
counts to take into account the differing areas of opportunity in different time periods.
Only you as the user can determine when a transformation will make sense in the context of the data.
(The software cannot do this because it will never know the context.) Moreover, since these sensible
transformations will tend to be fairly simple in nature, they do not tend to distort the data.
A second class of transformations would be those that rescale the data in order to achieve certain
statistical properties. (These are the only type of transformations that any software can suggest.)
Here the objective is usually to make the data appear to be more “normally distributed” in order to
have an “estimate of dispersion that is independent of the estimate of location.” Unfortunately, these
transformations will tend to be very complex and nonlinear in nature, involving exponential, inverse
exponential, or logarithmic functions. (And just what does the logarithm of the percentage of on-
time shipments represent?) These nonlinear transformations will distort the data in two ways: at one
end of the histogram, values that were originally far apart will now be close together; at the other end
of the histogram, values that were originally close together will now be far apart.
To illustrate the effect of transformations to achieve statistical properties we will use the hot metal
transit times shown in rows in the table in figure 5. These values are the times (to the nearest 5
minutes) between the phone call alerting the steel furnace that a load of hot metal was on the way
and the actual arrival of that load at the steel furnace ladle house.
Figure 5: The hot metal transit times in minutes
Figure 6: Histogram of the hot metal transit times
Figure 7: Histogram of the logarithms of hot metal transit times
Given the skewed nature of the data in figure 6 some programs would suggest using a logarithmic
transformation. Taking the natural logarithm of each of these transit times results in the histogram
in figure 7. (The horizontal scales show both the original and transformed values.) Notice how the
values on the left of figure 7 are spaced out while those on the right are crowded together. After the
transformation the distance from 20 to 25 minutes is about the same size as the distance from 140 to
180 minutes. How could you begin to explain this to your boss?
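The arithmetic behind that comparison is short enough to check directly. This sketch simply measures both gaps on the natural-log scale; the helper name is hypothetical.

```python
import math

def log_distance(a, b):
    """Distance between two positive values after a natural-log transformation."""
    return math.log(b) - math.log(a)

short_gap = log_distance(20, 25)    # a 5-minute difference
long_gap = log_distance(140, 180)   # a 40-minute difference

print(f"20 to 25 minutes:   {short_gap:.3f} log units")
print(f"140 to 180 minutes: {long_gap:.3f} log units")
```

The two gaps come out at about 0.22 and 0.25 log units, so a 40-minute difference at the high end occupies nearly the same space as a 5-minute difference at the low end.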
By itself, this distortion of the data is sufficient to call into question the practice of transforming the
data to achieve statistical properties. However, the impact of these nonlinear transformations is not
confined to the histograms.
Figure 8 shows the X Chart for the original, untransformed data of the table in figure 5. Eleven of the
141 transit times are above the upper limit, confirming the impression given by the histogram that
these data come from a mixture of at least two different processes. Even after the steel furnace gets
the phone call, they still have no idea when the hot metal will arrive at the ladle house.
Figure 8: X chart for the hot metal transit times
However, if we transform the data before we put them on a process behavior chart we end up with
figure 9. There we find no points outside the limits!
Figure 9: X chart for the logarithms of the hot metal transit times
Clearly the logarithmic transformation has obliterated the signals. What good is a transformation
that changes the message contained within the data? The transformation of the data to achieve
statistical properties is simply a complex way of distorting both the data and the truth.
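The same effect can be reproduced on a very small scale. The data below are made up purely for illustration (they are not the hot metal transit times): twenty routine values alternating between 5 and 10, followed by one exceptional value of 30. The limits use the conventional 2.66 times average moving range computation, which is an assumption, since the article does not give the formula.

```python
import math

SCALING = 2.66  # conventional XmR factor; assumed, not quoted from the article

def points_outside(values):
    """Return the values falling outside X chart limits computed from those values."""
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    lower = center - SCALING * mr_bar
    upper = center + SCALING * mr_bar
    return [x for x in values if x < lower or x > upper]

# Hypothetical toy data: routine values of 5 and 10, then one exceptional value.
toy = [5, 10] * 10 + [30]

raw_signals = points_outside(toy)
log_signals = points_outside([math.log(x) for x in toy])

print("signals on original scale:", raw_signals)
print("signals on log scale:     ", log_signals)
```

On the original scale the value of 30 falls above the upper limit. After the logarithmic transformation the same point sits comfortably inside the limits, because the transformation inflates the routine differences between 5 and 10 while shrinking the jump to 30.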
The results shown here are typical of what happens with nonlinear transformations of the original
data. These transformations hide the signals contained within the data simply because they are based
upon computations that presume there are no signals within the data.
To see how the computations do this, we need to pause to consider the nature of the formulas for
common descriptive statistics. For a descriptive measure of location we usually use the average,
which is simply based upon the sum of the data. However, once we leave the average behind, the
formulas become much more complex. For a descriptive measure of dispersion we commonly use the
global standard deviation statistic, which is a function of the squared deviations from the average.
For descriptive measures of shape we commonly use the skewness and kurtosis statistics which,
respectively, depend upon the third and fourth powers of the deviations of the data from the average.
When we aggregate the data together in this manner and use the second, third, and fourth powers of
the distance between each observation and the average value, we are implicitly assuming that these
computations make sense. Whether they be measures of dispersion, or measures of skewness,
or even measures of kurtosis, any high-order descriptive statistic that is computed globally is
implicitly based upon a very strong assumption that the data are homogeneous.
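Written out, the statistics in question look like this. The sketch uses one common convention (divisor n - 1 for the standard deviation, simple moment ratios for skewness and kurtosis); other conventions exist, and the function name and the example values are hypothetical.

```python
import math

def descriptive_stats(values):
    """Global average, standard deviation, skewness, and kurtosis of a data set."""
    n = len(values)
    average = sum(values) / n
    deviations = [x - average for x in values]
    # Every statistic below pools all deviations from the single global average,
    # raised to the second, third, or fourth power.
    m2 = sum(d ** 2 for d in deviations) / n
    m3 = sum(d ** 3 for d in deviations) / n
    m4 = sum(d ** 4 for d in deviations) / n
    return {
        "average": average,
        "std_dev": math.sqrt(sum(d ** 2 for d in deviations) / (n - 1)),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }

homogeneous = [1, 2, 3, 4, 5]
with_exception = [1, 2, 3, 4, 50]   # same data with one exceptional value

print(descriptive_stats(homogeneous))
print(descriptive_stats(with_exception))
```

Comparing the two printouts shows how a single exceptional value enters every sum at the second, third, and fourth power, so it can dominate all of the higher-order statistics at once. That is the sense in which these global computations presume homogeneous data.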
When the data are not homogeneous it is not the shape of the histogram that is wrong, but the
computation and use of the descriptive statistics that is erroneous. We do not need to distort the
histogram to make the transformed values more homogeneous, but we need to stop and question
what the lack of homogeneity means in the context of the original observations.
So how can we determine when a data set is homogeneous? That is the purpose of the process
behavior chart! Transforming the data to achieve statistical properties prior to placing them on a
process behavior chart is an example of getting everything backwards. It assumes that we need to
make the data more homogeneous prior to checking them for homogeneity. Any recommendation
regarding the transformation of the data prior to placing them on a process behavior chart reveals a
fundamental lack of understanding about the purpose of process behavior charts.
Shewhart’s approach, with its generic three-sigma limits computed empirically from the data, does
not even require the specification of a probability model. In fact, on page 54 of Statistical Method
from the Viewpoint of Quality Control, Shewhart wrote “… we are not concerned with the functional
form of the universe [i.e., the probability model], but merely with the assumption that a universe
exists.” [Italics in the original.]
When you transform the data to achieve statistical properties you deceive both yourself and everyone
else who is not sophisticated enough to catch you in your deception. When you check your data for
normality prior to placing them on a process behavior chart you are practicing statistical voodoo.
Transforming the data prior to using them on a process behavior chart is not only bad advice, it is
also an outright mistake.
Whenever the teachers lack understanding, superstitious nonsense is inevitable. Until you learn to
separate myth from fact you will be fair game for those who were taught the nonsense. And you may
end up with leptokurtophobia without even knowing it.
ABOUT THE AUTHOR
Donald J. Wheeler
Dr. Donald J. Wheeler is a Fellow of both the American Statistical Association and the American
Society for Quality, and is the recipient of the 2010 Deming Medal. As the author of 25 books and
hundreds of articles, he is one of the leading authorities on statistical process control and applied
data analysis. Find out more about Dr. Wheeler’s books at www.spcpress.com
Dr. Wheeler welcomes your questions. You can contact him at [email protected].