Measures of Variability

© 2019 Laureate Education, Inc.

Measures of Variability Program Transcript

[MUSIC PLAYING]

STACY BJORKMAN: Why are measures of variability important? Imagine that you are considering applying to graduate school. You talk to a few schools about financial aid. Val U tells you that their average grad student takes out $15,000 in loans, with a standard deviation of $2,000. Spendy School says that their average student loan is $15,000, too, but their standard deviation is $10,000. Which school's average student loan amount gives you a better picture of what you'll really pay?

Understanding measures of variability will help you think critically about various claims. We'll revisit this example at the end of the tutorial to see if you can explain the truth behind the numbers.

Measures of variability help us understand how scores are spread out around the mean, or the center of the distribution. They answer the question-- how big are the distances between scores? Samples with large variability have lots of distance between scores. Scores are widely dispersed, so measures of central tendency will not accurately summarize the data. Samples with small variability are consistent. Scores are not very spread out, which means that the measures of central tendency will be more accurate summaries of those distributions.

Variability affects the shape of the distribution. Distributions with small variability have scores that cluster tightly around the mean, and distributions with large variability have scores that spread out widely around the mean.

Let's look at a few specific measures of variability. First, we refer to the difference between a score and the mean as the score's deviation from the mean-- the distance that score falls from what is typical. It is calculated by subtracting the mean from the score. So if the mean number of times people in a sample dine out in a month is 3.8 and one person dines out 19 times, that score's deviation from the mean is positive 15.2, because 19 minus 3.8 equals 15.2, which means that the person who dined out 19 times did so 15.2 times more than average. That's the story of the person who ate out 19 times.

Scores that fall above the mean have positive deviations, like we saw here, and scores that fall below the mean have negative deviations. It's all about where the score falls in relation to the mean. In other words, deviation tells you how much error there is between a score and the mean.

Press pause on this tutorial and calculate the deviation of the score 1.4 from the mean of 4.4. The deviation is negative 3.0. 1.4 minus 4.4 equals negative 3. Remember to always take the score minus the mean, not the other way around.
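A minimal Python sketch of the deviation calculation, using the two examples from the tutorial. The function name `deviation` is an illustrative choice and does not come from the program.

```python
def deviation(score, mean):
    """Return a score's deviation from the mean: score minus mean."""
    return score - mean

# Dining example: a score of 19 against a mean of 3.8.
dining = round(deviation(19, 3.8), 1)

# Practice problem: a score of 1.4 against a mean of 4.4.
practice = round(deviation(1.4, 4.4), 1)

# Rounding to one decimal hides tiny floating-point residue.
print(dining, practice)
```

The sign carries the meaning: a positive result is a score above the mean, a negative result a score below it.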

Because the mean falls in the exact center of the distribution, adding the deviations in a sample together will always equal 0, because the positive deviations and the negative deviations cancel each other out. This will become important when we talk about standard deviation in a minute.
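The cancellation can be checked with a short Python sketch. The transcript does not list the raw scores, so the list below is a hypothetical data set, chosen only so that it includes the score of 19 and has a mean of 3.8.

```python
# Hypothetical scores (an assumption, not the tutorial's actual data).
scores = [1, 1, 1, 1, 1, 2, 3, 4, 5, 19]

mean = sum(scores) / len(scores)

# Each deviation is the score minus the mean.
deviations = [x - mean for x in scores]
total = sum(deviations)

# Floating-point arithmetic may leave a tiny residue, so compare
# against a small tolerance instead of testing for exactly 0.
print(mean, abs(total) < 1e-9)
```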

Another measure of variability, the range, tells us the distance between the lowest score and the highest score. Remember how we learned that the mode was a crude measure of central tendency because it ignored a lot of information? The same is true of the range. It tells us the difference between the two most extreme scores.

For example, the range of the dining data set is 18. 19 is the highest score, and 1 is the lowest score. Our range of 18 means that the person who dined out the most did so 18 times more than the person who dined out the least. See how the range is impacted by the outlier of 19? Press pause on this tutorial and calculate the range of the data set if 19 were deleted.
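One way to try this comparison is with a few lines of Python. The raw scores are not given in the transcript, so the list here is a hypothetical data set that only shares the lowest score of 1 and the highest score of 19; the function name `score_range` is likewise illustrative.

```python
def score_range(scores):
    """Return the distance between the highest and lowest score."""
    return max(scores) - min(scores)

# Hypothetical scores (an assumption, not the tutorial's actual data).
scores = [1, 1, 1, 1, 1, 2, 3, 4, 5, 19]

with_outlier = score_range(scores)

# Drop the extreme score and measure the range again.
without_outlier = score_range([x for x in scores if x != 19])

print(with_outlier, without_outlier)
```

Because the range uses only two scores, removing a single extreme value can change it sharply.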

Removing the highest value reduced the range to 4. 5 minus 1 equals 4. That's a big difference. A more sophisticated measure of variability is the standard deviation. Like the mean, it takes all scores into account to provide a summary of the typical variability of the entire sample.

Remember that adding all the deviations in a sample together will always equal 0 because the positive deviations and the negative deviations cancel each other out? For that reason, we can't simply average all of the individual scores' deviations. They would average out to 0, which is why things get a little tricky. Don't worry, it's nothing you can't handle, and the standard deviation has an important story to tell.

There are several formulas for the standard deviation. Here we will examine just one-- the computing formula for the estimated population standard deviation. It lets us estimate the variability in a population based on the variability in a sample. The standard deviation is the most widely used and easily interpreted measure of variability. You can think of the standard deviation as the average of all the deviations from the mean. Again, it tells us how accurately the mean summarizes the distribution. A large standard deviation suggests the mean is a weaker measure of what is typical-- lots of scores are far from the mean. And a small standard deviation suggests the mean is a stronger measure of what is typical-- lots of scores are close to the mean.

It is also notable that the standard deviation tells us the percentage of scores that fall in different areas under the curve. A defining characteristic of the standard deviation is that, when scores follow a normal curve, most of them, about 68%, fall between one standard deviation below the mean and one standard deviation above the mean. Let's go ahead and compute a standard deviation. We'll use this data set again.
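The 68% figure mentioned above can be checked in Python under the assumption that scores follow a normal curve. The sketch uses `statistics.NormalDist` from the standard library with a mean of 0 and a standard deviation of 1; the share is the same for any normal distribution.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
curve = NormalDist(mu=0, sigma=1)

# Area under the curve between one standard deviation below
# the mean and one standard deviation above it.
share_within_one_sd = curve.cdf(1) - curve.cdf(-1)

print(round(share_within_one_sd, 4))
```

For skewed data, or data with outliers, the actual share within one standard deviation can differ from this value.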

To compute the standard deviation, you need three pieces of information-- the number of participants in the sample, or n, the sum of squared x's, and the squared sum of x. There are 10 participants in this sample, so n equals 10. That's the easy part. Remember that when we square a number, we multiply it by itself. To calculate the sum of the squared x's, square each raw score and then add those together. This is what it would look like for our data set.

When we add all of those squared values together, it equals 420. The sum of squared x's is 420. To calculate the squared sum of x, add the raw scores first, then square that sum. So if you add all the numbers together, it equals 38. And 38 squared is 1,444. The squared sum of x is 1,444.

You now have the basic building blocks to compute the standard deviation-- the n of 10, the sum of squared x's of 420, and the squared sum of x of 1,444. You'll use those building blocks to complete this formula. Let's do the easy part first-- the denominator, or the bottom part of the formula, n minus 1. n is 10, so 10 minus 1 equals 9. That's easy enough.
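The formula appears on screen in the program and is not captured in the transcript. Judging from the steps described, it is presumably the standard computing formula for the estimated population standard deviation, written here in LaTeX. The symbols $s$ and $X$ are notational assumptions; $n$ is the sample size as used in the tutorial.

```latex
s = \sqrt{\frac{\sum X^2 - \dfrac{\left(\sum X\right)^2}{n}}{n - 1}}
```

Here $\sum X^2$ is the sum of squared x's (420), $\left(\sum X\right)^2$ is the squared sum of x (1,444), and $n - 1$ is the denominator (9).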

Next, let's take the squared sum of x divided by n. 1,444 divided by 10 is 144.4. Now we're ready to compute the numerator, or the top part of the formula. The sum of squared x is 420, and the squared sum of x divided by n is 144.4. So 420 minus 144.4 equals 275.6.

Finally, we take the numerator of 275.6 and divide it by the denominator of 9 to get a variance of 30.62. We know that the standard deviation is the square root of the variance, and the square root of 30.62 is 5.53. The standard deviation for this data set is 5.53, which means that, on average, people in this sample differ from the typical number of times eating out in a week by 5.53 times.
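The steps above can be sketched in Python using only the three building blocks reported in the tutorial (n of 10, sum of x of 38, sum of squared x's of 420). The function name `estimated_population_sd` and its parameter names are illustrative choices, not part of the program.

```python
import math

def estimated_population_sd(n, sum_x, sum_x_squared):
    """Return (variance, standard deviation) from the computing formula."""
    squared_sum = sum_x ** 2                      # 38 squared = 1,444
    numerator = sum_x_squared - squared_sum / n   # 420 - 144.4
    variance = numerator / (n - 1)                # divide by n minus 1
    return variance, math.sqrt(variance)

variance, sd = estimated_population_sd(n=10, sum_x=38, sum_x_squared=420)

print(round(variance, 2), round(sd, 2))
```

Working from the sums instead of the raw scores mirrors the hand calculation, so each intermediate value can be compared with the transcript.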

But wait, can that be? The mean is only 3.8, so a standard deviation of 5.53 is really high. That means that 68% of scores fall between negative 1.73 and 9.33 times eating out per week. That's the mean of 3.8 minus the standard deviation and plus the standard deviation.

But we can't have a negative number of times eating out per week. What happened? Remember that pesky outlier of 19, and how it affected the mean? It also affected the standard deviation by causing scores to look more spread out, on average, than they really are.

That is one of the disadvantages of the standard deviation. Like the mean, it's affected by extreme scores. Without the score of 19, the standard deviation would surely be lower. Press pause on this tutorial and try calculating the standard deviation of the data set without the number 19. Here is the data set.

The new standard deviation is 1.54. Wow! What a difference. This lower standard deviation tells us that 68% of scores fall between 0.56 and 3.64 times eating out per week. That's the new mean of about 2.1 minus the standard deviation and plus the standard deviation.
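A hedged Python sketch of this recalculation follows. It does not need the raw scores: it starts from the totals reported for all 10 scores (sum of 38, sum of squared x's of 420) and subtracts the outlier's contribution. The variable names are illustrative, and results may differ in the last decimal place depending on where intermediate values are rounded.

```python
import math

# Totals reported in the tutorial for all 10 scores.
n, sum_x, sum_x_squared = 10, 38, 420
outlier = 19

# Remove the outlier's contribution from each building block.
n_new = n - 1
sum_new = sum_x - outlier
sum_sq_new = sum_x_squared - outlier ** 2

mean_new = sum_new / n_new
variance_new = (sum_sq_new - sum_new ** 2 / n_new) / (n_new - 1)
sd_new = math.sqrt(variance_new)

# One standard deviation below and above the new mean.
low, high = mean_new - sd_new, mean_new + sd_new

print(round(sd_new, 2), round(low, 2), round(high, 2))
```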

Again, remember to use caution when interpreting the results of descriptive statistics. They only summarize data, so they cannot tell us why anything is the way it is. They can only describe what we see. We cannot say if these measures of variability are good or bad, nor can we make any assumptions about why the data are the way they are.

Now that you understand the measures of variability, let's revisit the example of the two schools that presented their typical student loan amounts. If you choose Val U, what amount of student loans might you expect to take out? With a mean of $15,000 and a standard deviation of $2,000, most students can expect to take out between $13,000 and $17,000 in loans. If you choose Spendy School, which reported a mean of $15,000 and a standard deviation of $10,000, most students can expect to take out between $5,000 and $25,000. Students of Val U tend to take out similar loan amounts, whereas there is a lot of variation in the loan amounts at Spendy School. This could be a factor when making your decision.
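The two loan ranges above come from the same arithmetic: the mean minus and plus one standard deviation. A minimal Python sketch, with the function name `one_sd_interval` chosen for illustration:

```python
def one_sd_interval(mean, sd):
    """Return (mean - sd, mean + sd), the span one SD either side of the mean."""
    return mean - sd, mean + sd

# Figures quoted by each school in the tutorial, in dollars.
val_u = one_sd_interval(mean=15_000, sd=2_000)
spendy = one_sd_interval(mean=15_000, sd=10_000)

# Width of each interval shows how much loan amounts vary.
val_u_width = val_u[1] - val_u[0]
spendy_width = spendy[1] - spendy[0]

print(val_u, spendy, val_u_width, spendy_width)
```

Both schools share a mean, so the width of the interval is the only thing separating them.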

[MUSIC PLAYING]