The Empirical Rule


MTH 245 Lesson 14 Notes

Graphs of the Normal Distribution

Many real-life data sets produce a histogram that is "normal": symmetric, unimodal, and bell-shaped. However, not every bell-shaped distribution is normal. A normal curve has a specific relationship between its height and width. Normal curves can be tall and skinny or short and fat. The figure below shows two different normal curves drawn on the same scale. Both have $\mu = 100$, but the one on the left has a standard deviation of 10 and the one on the right has a standard deviation of 5. Notice that the larger standard deviation makes the graph wider (more spread out) and shorter.

Figure #6.2.1: Different Normal Distribution Graphs

Every normal distribution curve has the same basic features in common:

− The center, or the highest point, is at $\mu$, the population mean.

− The inflection points, the places where the curve changes from a "hill" to a "valley", are one standard deviation from $\mu$.

− The area under the whole curve is exactly 1 (or 100%). Therefore, the area under the half of the curve on either side of the mean is 0.5 (or 50%).
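The features listed above can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`. The two distributions reuse the mean and standard deviations quoted for the figure; the helper name `second_difference` and the step size `h` are illustrative assumptions, not part of the lesson.

```python
from statistics import NormalDist

wide = NormalDist(mu=100, sigma=10)    # larger standard deviation
narrow = NormalDist(mu=100, sigma=5)   # smaller standard deviation

def second_difference(dist, x, h):
    """Finite-difference estimate of concavity at x (only the sign matters)."""
    return dist.pdf(x - h) - 2 * dist.pdf(x) + dist.pdf(x + h)

for dist in (wide, narrow):
    mu, sigma = dist.mean, dist.stdev
    h = sigma / 1000  # small step, assumed for illustration
    # The highest point is at the mean: the curve is lower just to either side.
    assert dist.pdf(mu) > dist.pdf(mu - h) and dist.pdf(mu) > dist.pdf(mu + h)
    # Concave down ("hill") just inside mu + sigma, concave up ("valley") just outside.
    assert second_difference(dist, mu + 0.9 * sigma, h) < 0
    assert second_difference(dist, mu + 1.1 * sigma, h) > 0
    # Print the peak height and the area to the left of the mean.
    print(f"sigma = {sigma}: peak height = {dist.pdf(mu):.4f}, area left of mean = {dist.cdf(mu)}")
```

The curve with the larger standard deviation has the lower peak, which matches the "wider and shorter" description of the figure.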

The following graph illustrates these features:

Unlike in a discrete probability distribution, where an event can be that a random variable takes on a single value, in a continuous probability distribution, an event must be defined as a range of values. The probability of observing a value of $x$ between two values $a$ and $b$, or $P(a \le x \le b)$, is the area under the curve between $a$ and $b$.
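As a rough illustration of "probability as area", the sketch below computes the area between two values as a difference of cumulative areas, using Python's standard-library `statistics.NormalDist`. The function name `prob_between` and the parameters (mean 100, standard deviation 10) are assumptions chosen only for the example.

```python
from statistics import NormalDist

def prob_between(dist, a, b):
    """P(a <= x <= b): area under the curve between a and b."""
    return dist.cdf(b) - dist.cdf(a)

dist = NormalDist(mu=100, sigma=10)   # example parameters, assumed for illustration

print(prob_between(dist, 90, 110))    # area over an interval
print(prob_between(dist, 100, 100))   # a single value has zero width, so zero area
```

The second call shows why an event in a continuous distribution must be a range: the area over a single point is zero.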

The $z$-Score

The normal distribution has the additional property that any value $x$ of a normally distributed random variable with mean $\mu$ and standard deviation $\sigma$ (not necessarily $\mu = 0$ and $\sigma = 1$) can be converted to a corresponding value $z$ of a standard normal random variable ($\mu = 0$ and $\sigma = 1$) using the following linear transformation:

$$z = \frac{x - \mu}{\sigma}$$

This is known as a $z$-score (or standard score), and it measures the distance, in units (or multiples) of standard deviation, of an observed value of a normal random variable $x$ from the mean $\mu$. Round $z$-scores to two decimal places.

To find the $z$-score of an observed value in a data set, that is, the distance of a data value $x$ from the sample mean $\bar{x}$, simply substitute $\bar{x}$ and $s$ for $\mu$ and $\sigma$:

$$z = \frac{x - \bar{x}}{s}$$

The $z$-score has the following important properties:

− It is unitless.

− The $z$-score of $\bar{x}$ is 0.00 (the mean is not different from itself).

− If $x < \bar{x}$, then $x$ has a negative $z$-score; if $x > \bar{x}$, then $x$ has a positive $z$-score.

The $z$-score can be used to determine significance using the Range Rule of Thumb. Recall that the Rule states that the bounds on significance are $\bar{x} \pm 2s$, and that any value of $x$ between those bounds is not significant. The "2" in those formulas means that those bounds have $z$-scores of $\pm 2$. Therefore, if $x$ has a $z$-score where $-2 \le z \le 2$, then $\bar{x} - 2s \le x \le \bar{x} + 2s$ and $x$ is not significant. Similarly, if $x$ has a $z$-score greater than 2, it is significantly high, and if $x$ has a $z$-score less than $-2$, it is significantly low.
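The $z$-score formula and the Range Rule of Thumb can be sketched in a few lines of Python. The function names `z_score` and `classify`, and the sample statistics (mean 50, standard deviation 4), are hypothetical and chosen only to exercise each case.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean in units of standard deviation, rounded to 2 places."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return round((x - mean) / sd, 2)

def classify(z):
    """Range Rule of Thumb: values with -2 <= z <= 2 are not significant."""
    if z > 2:
        return "significantly high"
    if z < -2:
        return "significantly low"
    return "not significant"

# Hypothetical sample statistics, chosen only to exercise each branch.
mean, sd = 50.0, 4.0
for x in (50.0, 42.0, 59.0, 40.0):
    z = z_score(x, mean, sd)
    print(f"x = {x}: z = {z}, {classify(z)}")
```

Note that a value sitting exactly on a bound ($z = \pm 2$) is treated as not significant, following the inequality $-2 \le z \le 2$.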

Example 1: Suppose the mean pulse rate for a sample of 40 adult males is $\bar{x} = 67.3$ bpm with a standard deviation of $s = 10.3$ bpm. One of the men had a pulse rate of 48 bpm. What is the $z$-score of this pulse reading? How many units of standard deviation from the sample mean is it, and in which direction (higher/lower)? In the context of this particular sample, would a pulse rate of 48 bpm be considered significantly low?

$$z = \frac{48 - 67.3}{10.3} = -1.87$$

so $x = 48$ is 1.87 multiples of standard deviation less than $\bar{x} = 67.3$. Since $-2.00 < z = -1.87 < 2.00$, $x = 48$ is not significantly low.

Example 2: Suppose 50 students take a standardized test, and their scores have a sample mean of $\bar{x} = 100$ with a standard deviation of $s = 16$. Further suppose that one of the students scored a 135. What is the $z$-score of this test score? How many units of standard deviation from the sample mean is it, and in which direction? In the context of this particular sample, would a test score of 135 be considered significantly high?

$$z = \frac{135 - 100}{16} = 2.19$$

so $x = 135$ is 2.19 multiples of standard deviation greater than $\bar{x} = 100$. Since $z = 2.19 > 2.00$, $x = 135$ is significantly high.
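The same conclusions can be reached by translating the bounds $z = \pm 2$ back into the original units. The sketch below does this for the two examples above; the helper name `range_rule_bounds` is an assumption introduced for illustration.

```python
def range_rule_bounds(mean, sd):
    """Return (low, high) = (mean - 2*sd, mean + 2*sd)."""
    return mean - 2 * sd, mean + 2 * sd

# (sample mean, sample standard deviation, observed value) from Examples 1 and 2
examples = {
    "pulse rate (bpm)": (67.3, 10.3, 48),
    "test score": (100, 16, 135),
}

for label, (mean, sd, x) in examples.items():
    low, high = range_rule_bounds(mean, sd)
    z = round((x - mean) / sd, 2)
    inside = low <= x <= high
    print(f"{label}: z = {z}, bounds = ({low:.1f}, {high:.1f}), within bounds: {inside}")
```

A pulse of 48 bpm falls inside its bounds, while a score of 135 falls above the upper bound, in agreement with the $z$-score comparisons.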

Approximating Normal Probabilities Using the Empirical Rule

Before looking at the process for calculating exact areas under the normal distribution curve, it is useful to consider the Empirical Rule (or the 68-95-99.7 Rule), which gives approximate values for these areas. The Empirical Rule is an approximation method; we will only use it to get an idea of the sizes of the probabilities for different shadings. The Empirical Rule for any normal distribution states that:

− Roughly 68% of the area is within one standard deviation of the mean.

− Roughly 95% of the area is within two standard deviations.

− Roughly 99.7% of the area is within three standard deviations.
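To see how rough these percentages are, the sketch below compares them with the areas computed from the standard normal distribution, using Python's standard-library `statistics.NormalDist`. The function name `area_within` is an assumption introduced for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mu = 0, sigma = 1

def area_within(k):
    """Area within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

empirical = {1: 0.68, 2: 0.95, 3: 0.997}  # the 68-95-99.7 Rule

for k, approx in empirical.items():
    exact = area_within(k)
    print(f"within {k} standard deviation(s): rule = {approx:.3f}, computed = {exact:.4f}")
```

Each rule value agrees with the computed area to within about half a percentage point.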

The following graph shows the approximate percentage of area under the curve between certain multiples of standard deviation from the mean:

To approximate normal probabilities, find the $z$-scores of the end points of the event interval, then use the graph to add up the approximate area of the segments between those values. This approximation only works if the end points have $z$-scores that are integers between $-3$ and $3$.

Warning: Only use the Empirical Rule for informal estimates, and only when you can't calculate the exact probability!
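The segment-adding procedure can be sketched as a small lookup. The segment percentages below are not read from the graph; they are derived from the 68-95-99.7 figures by symmetry, which is an assumption about what the graph shows. The names `SEGMENTS` and `empirical_percent` are hypothetical.

```python
# Approximate percent of area in each one-standard-deviation segment,
# keyed by the segment's left-hand z-score. Derived by symmetry:
# 68/2 = 34, (95 - 68)/2 = 13.5, (99.7 - 95)/2 = 2.35.
SEGMENTS = {-3: 2.35, -2: 13.5, -1: 34.0, 0: 34.0, 1: 13.5, 2: 2.35}

def empirical_percent(z_low, z_high):
    """Approximate percent of area between two integer z-scores in [-3, 3]."""
    for z in (z_low, z_high):
        if z != int(z) or not -3 <= z <= 3:
            raise ValueError("end points must have integer z-scores between -3 and 3")
    if z_low > z_high:
        raise ValueError("z_low must not exceed z_high")
    # Add up every segment whose left edge lies in [z_low, z_high).
    return sum(SEGMENTS[left] for left in range(int(z_low), int(z_high)))

print(empirical_percent(-1, 2))   # 34 + 34 + 13.5
print(empirical_percent(-3, 3))   # all six segments
```

Rejecting non-integer end points mirrors the restriction stated above: the rule says nothing about an interval that ends at, say, $z = 1.5$.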

Example 3: Suppose male resting heart rates are normally distributed with $\mu = 66$ beats per minute (bpm) and $\sigma = 4$ bpm. Use the Empirical Rule to answer the following questions:

a. What percentage of heart rates lie between 54 and 78 bpm?

$$z_{54} = \frac{54 - 66}{4} = -3.00 \quad \text{and} \quad z_{78} = \frac{78 - 66}{4} = 3.00$$

Per the Empirical Rule plot, 99.7% of the area under the curve lies between $\pm 3.00$. Therefore, 99.7% of male resting heart rates lie between 54 and 78 bpm.

b. What is the approximate probability that on any random day, a random male's resting heart rate will measure between 62 and 74 bpm?

$$z_{62} = \frac{62 - 66}{4} = -1.00 \quad \text{and} \quad z_{74} = \frac{74 - 66}{4} = 2.00$$

Per the Empirical Rule plot, 68% of the area under the curve lies between $\pm 1.00$, and another 13.5% between 1.00 and 2.00, for a total of 81.5%. Therefore, the probability that a random male's heart rate will lie between 62 and 74 bpm is 0.815.
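For comparison, the areas in Example 3 can also be computed directly rather than approximated. The sketch below uses Python's standard-library `statistics.NormalDist` with the parameters from the example; the helper name `exact_prob` is an assumption.

```python
from statistics import NormalDist

heart_rate = NormalDist(mu=66, sigma=4)   # parameters from Example 3

def exact_prob(dist, a, b):
    """Area under the normal curve between a and b."""
    return dist.cdf(b) - dist.cdf(a)

# (lower bound, upper bound, Empirical Rule answer) for parts a and b
for a, b, rule_value in ((54, 78, 0.997), (62, 74, 0.815)):
    z_a = (a - heart_rate.mean) / heart_rate.stdev
    z_b = (b - heart_rate.mean) / heart_rate.stdev
    computed = exact_prob(heart_rate, a, b)
    print(f"{a} to {b} bpm (z from {z_a:.2f} to {z_b:.2f}): "
          f"rule = {rule_value}, computed = {computed:.4f}")
```

In both parts the Empirical Rule answer is close to the computed area, which is what makes it useful as an informal estimate.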

Example 4: The body temperatures of a group of healthy adults have a bell-shaped distribution with a mean of 98.33 °F and a standard deviation of 0.58 °F. Using the Empirical Rule, find each approximate percentage below.

a. What is the approximate probability that a healthy adult will have a body temperature between 97.17 and 98.91 °F?

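$$z_{97.17} = \frac{97.17 - 98.33}{0.58} = -2.00 \quad \text{and} \quad z_{98.91} = \frac{98.91 - 98.33}{0.58} = 1.00$$

Per the Empirical Rule plot, 13.5% of the area under the curve lies between $-2.00$ and $-1.00$, and 68% lies between $\pm 1.00$, for a total of 81.5%. Therefore, the probability that a healthy adult's body temperature will lie between 97.17 and 98.91 °F is approximately 0.815.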

b. What is the approximate probability that a healthy adult will have a body temperature between 96.59 and 98.91 °F?

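$$z_{96.59} = \frac{96.59 - 98.33}{0.58} = -3.00 \quad \text{and} \quad z_{98.91} = \frac{98.91 - 98.33}{0.58} = 1.00$$

Per the Empirical Rule plot, 2.35% of the area under the curve lies between $-3.00$ and $-2.00$, 13.5% lies between $-2.00$ and $-1.00$, and 68% lies between $\pm 1.00$, for a total of 83.85%. Therefore, the probability that a healthy adult's body temperature will lie between 96.59 and 98.91 °F is approximately 0.8385.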
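Example 4 can be reproduced end to end with a short sketch: convert each end point to a $z$-score rounded to two decimal places, then add the Empirical Rule segments between them. Rounding matters here because decimal inputs such as 98.91 and 0.58 are not stored exactly in floating point, so the unrounded quotient may not be a whole number. The segment percentages are derived from 68-95-99.7 by symmetry, and the names `SEGMENTS` and `empirical_prob` are hypothetical.

```python
# Percent of area per one-standard-deviation segment, keyed by left-hand z-score
# (assumed values: 68/2 = 34, (95 - 68)/2 = 13.5, (99.7 - 95)/2 = 2.35).
SEGMENTS = {-3: 2.35, -2: 13.5, -1: 34.0, 0: 34.0, 1: 13.5, 2: 2.35}

MEAN, SD = 98.33, 0.58   # body temperature parameters from Example 4

def empirical_prob(a, b, mean, sd):
    """Approximate P(a <= x <= b) using the Empirical Rule segments."""
    # Round z-scores to two decimal places before testing for whole numbers.
    z_a = round((a - mean) / sd, 2)
    z_b = round((b - mean) / sd, 2)
    if z_a != int(z_a) or z_b != int(z_b) or not -3 <= z_a <= z_b <= 3:
        raise ValueError("end points must have integer z-scores between -3 and 3")
    percent = sum(SEGMENTS[left] for left in range(int(z_a), int(z_b)))
    return round(percent / 100, 4)

print(empirical_prob(97.17, 98.91, MEAN, SD))   # part a
print(empirical_prob(96.59, 98.91, MEAN, SD))   # part b
```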