MTH 245 Lesson 16 Notes: Assessing Normality

Typically, if a statistician wants to determine if a data set comes from a population that is normally distributed, they will perform a formal goodness-of-fit test (more on these tests in Lesson 29).

However, it is possible to perform a graphical assessment to get a rough idea if the data are normally distributed. We'll do this using three different graphs: a relative frequency histogram, a modified boxplot, and a Q-Q plot (aka quantile-quantile or normal quantile plot).

Note: if you stare at any of the following plots for long enough, you'll start to see patterns even when there are none. Often, a serious departure from normality will show itself pretty clearly in one or more plots. If a plot looks "mostly OK," chances are the data are relatively close to normal.

Relative Frequency Histogram. Build the histogram using the method introduced in Lesson 5. Do not worry about lowest lower bound or class width; use the StatCrunch defaults. Under "Display Options:", select "Normal" from the "Overlay distrib.:" pull-down (this will overlay a bell-shaped reference curve on the histogram).

If the data are normally distributed, the histogram should conform roughly to the reference curve. It does not have to be exactly symmetrical for the data to be normal. Note also that for relatively small samples (n < 30), the histogram will not be a reliable means of assessing normality, because it is possible that the histogram will not look symmetrical even if the data are normal.
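For readers who want to see what "conforming to the reference curve" means numerically, here is a minimal sketch in Python (standard library only; this is not part of the StatCrunch workflow). It compares the relative frequency in each histogram bin with the probability that a normal curve fitted to the sample assigns to that bin. The function names, the choice of 8 equal-width bins, and the randomly generated sample are all assumptions made for illustration; StatCrunch chooses its own bins.

```python
import random
from statistics import NormalDist, mean, stdev

def relative_frequencies(data, num_bins=8):
    """Return (bin_edges, relative_freqs) using equal-width bins."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins
    edges = [lo + i * width for i in range(num_bins + 1)]
    counts = [0] * num_bins
    for x in data:
        # Clamp so the maximum value lands in the last bin, not past it.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    return edges, [c / len(data) for c in counts]

def normal_bin_probabilities(data, edges):
    """Probability that a normal curve fitted to the data gives each bin."""
    fitted = NormalDist(mean(data), stdev(data))
    return [fitted.cdf(b) - fitted.cdf(a) for a, b in zip(edges, edges[1:])]

# Hypothetical sample: 200 made-up values, NOT the "Lesson 16 Test Data".
random.seed(245)
sample = [random.gauss(70, 10) for _ in range(200)]

edges, observed = relative_frequencies(sample)
expected = normal_bin_probabilities(sample, edges)
for a, b, o, e in zip(edges, edges[1:], observed, expected):
    print(f"[{a:6.1f}, {b:6.1f})  observed {o:.3f}  normal curve {e:.3f}")
```

If the two columns track each other roughly, the histogram would sit close to the overlaid curve. Rerunning with a much smaller sample (say, 15 values) tends to show larger gaps even though the data are generated from a normal distribution, which is the small-sample caution above.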

Example 1: Construct a relative frequency histogram using the "Lesson 16 Test Data" data set. Do the data appear to be normally distributed? Why or why not?

The data do not exactly conform to the red line, but the histogram has a mound-like shape, so it can be argued that the data are approximately normally distributed.

Modified Boxplot. Build the boxplot using the method introduced in Lesson 7. A normally distributed data set will have very few outliers, if any, and those that exist will be evenly distributed between the left and right sides of the plot. A plot with a large number of outliers to one side is an indication that the data have a skewed (non-normal) distribution.

Example 2: Using the same data as for Example 1, construct a modified boxplot. Do the data appear to be normally distributed? Why or why not?

The boxplot has only one outlier, and its shape (narrow, roughly symmetric box with wider, roughly symmetric whiskers) is consistent with normally distributed data.
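As a rough check on what the modified boxplot is showing, the sketch below counts outliers on each side using the usual 1.5 × IQR fences. It is written in Python with the standard library only. The function name and the list of scores are hypothetical, and Python's default quartile method may give slightly different quartiles than StatCrunch, so treat the fence values as approximate.

```python
from statistics import quantiles

def outlier_summary(data):
    """Flag values beyond the 1.5 * IQR fences used by a modified boxplot."""
    q1, _median, q3 = quantiles(data, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "fences": (lower_fence, upper_fence),
        "low": [x for x in data if x < lower_fence],    # left-side outliers
        "high": [x for x in data if x > upper_fence],   # right-side outliers
    }

# Hypothetical scores, made up for illustration only.
scores = [52, 61, 64, 66, 68, 70, 71, 73, 75, 77, 80, 84, 118]
summary = outlier_summary(scores)
print("fences:", summary["fences"])
print("low outliers:", summary["low"])
print("high outliers:", summary["high"])
```

A single outlier, as here, is not by itself alarming. Several outliers piled on one side would point toward a skewed distribution.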

Q-Q Plot. A Q-Q plot graphs theoretical quantiles of a normal distribution against the sample quantiles of the data set. Ideally, the plot will be a straight line; usually, the pattern is linear in the center, less so on either end. If the pattern appears to be noticeably non-linear, the data are probably not normally distributed. The following figure illustrates typical Q-Q plot patterns for certain classes of distributions.

To construct a Q-Q plot in StatCrunch, use the following procedure:

1. Import/enter data.
2. Select "Graph", then "QQ Plot".
3. Select the appropriate data column.
4. Click "Compute!".
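To see what the software is plotting, here is a minimal Python sketch (standard library only) that pairs each sorted data value with the corresponding quantile of the standard normal distribution. The plotting position (i - 0.5)/n is one common convention and is an assumption here; StatCrunch may use a slightly different formula, and it may put the axes the other way around. The function name and the data values are hypothetical.

```python
from statistics import NormalDist

def qq_points(data):
    """Return (theoretical_quantile, sample_quantile) pairs for a normal Q-Q plot."""
    ordered = sorted(data)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # assumed plotting position for the i-th smallest value
        points.append((standard_normal.inv_cdf(p), value))
    return points

# Hypothetical data, made up for illustration only.
values = [68, 71, 59, 75, 80, 66, 73, 62, 77, 70]
for theoretical, observed in qq_points(values):
    print(f"{theoretical:7.3f}  {observed}")
```

Plotting these pairs gives the dots on the Q-Q plot. If the data are close to normal, the observed values rise at a roughly constant rate as the theoretical quantiles increase, which is the straight-line pattern described above.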

Example 3: Using the same data as for Example 1, construct a Q-Q plot. Do the data appear to be normally distributed? Why or why not?

The dots conform closely to the trend line, suggesting the data are normally distributed.

Try It Yourself: Using the "Lesson 16 TIY" data set, construct a relative frequency histogram, a modified boxplot, and a Q-Q plot. Do the data appear to be normally distributed? Why or why not?