Marketing Assignments
Section 4
Measures of Dispersion
Rhonda Knehans Drake
Associate Professor, New York University
Data Analytics, Interpretation and Reporting Copyright © 2013
Introduction
• One way to better understand your sample data is to compute descriptive measures, or statistics.
• Each basic summary statistic has its own purpose, so each plays a distinct role in helping you describe and understand your data. However, not fully understanding what each statistic measures, how it is calculated, or when to use one rather than another can lead you to draw erroneous conclusions.
• The spread of your data also tells you how solid your estimate of the mean is: the tighter the spread, the more precise your estimate of the average. In other words, all else being equal, we want our data sets to have as little spread as possible.
• This section shows you how to calculate the basic measures of dispersion.
Measures of Dispersion
• The two main measures of dispersion covered here are:
1. The range
2. The variance and standard deviation
The Range I
• The range is the largest observation minus the smallest observation for the variable of interest in your sample.
• The range gives a sense of the total spread of all observations in the data set, from the lowest value to the highest.
• In fact, the range is sometimes simply called the “spread.” When it is reported, it is often accompanied by the minimum and maximum values observed.
• The range is calculated with the following formula:
Range = (the maximum observation) − (the minimum observation)
The Range II
• For example, assume we survey 20 people and ask each of them for their online expenditures in 2009:
$100 $50 $100 $150 $125 $100 $80 $75 $125 $150 $150 $175 $50 $80 $25 $100 $100 $75 $125 $100
What is the range of this data set?
Max = $175
Min = $25
Range = $175 − $25 = $150
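The range calculation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original course materials; the names `spend` and `data_range` are our own.

```python
# Online expenditures (in dollars) reported by the 20 survey respondents for 2009
spend = [100, 50, 100, 150, 125, 100, 80, 75, 125, 150,
         150, 175, 50, 80, 25, 100, 100, 75, 125, 100]

def data_range(values):
    """Return (minimum, maximum, range) for a list of numbers."""
    lo, hi = min(values), max(values)
    return lo, hi, hi - lo

low, high, spread = data_range(spend)
print(f"Min = {low}, Max = {high}, Range = {spread}")  # Min = 25, Max = 175, Range = 150
```

Reporting the minimum and maximum alongside the range, as the function does, matches the reporting convention described above.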
The Standard Deviation I
• The standard deviation measures how far the observations of the variable of interest typically lie from its mean. Strictly, it is the square root of an average of the squared distances from the mean.
• In other words, the standard deviation tells us roughly how far, on average, the data lie from the mean.
• It answers the question of whether the observations are tightly clustered or widely dispersed around the mean.
• To obtain the standard deviation, take the square root of the variance.
The Standard Deviation II
• The sample variance (S²) is the sum of the squared deviations of each observation from the mean, divided by the sample size minus one:
$S^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
• We square each deviation because we do not care whether an observation is, for example, 5 units above or 5 units below the mean, only that it is 5 units away from it. Squaring also keeps positive and negative deviations from cancelling out: unsquared deviations from the mean always sum to zero.
• We divide by n − 1 rather than n because doing so gives an unbiased estimate of the true population variance; dividing by n would, on average, underestimate it.
• To determine the standard deviation (S), we take the square root of the variance:
$S = \sqrt{S^2}$
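The sample variance formula translates directly into code. Below is a minimal Python sketch; the function names `sample_variance` and `sample_std` are our own, and the three-value data set is hypothetical, chosen so the arithmetic is easy to check by hand.

```python
import math

def sample_variance(values):
    """Sample variance S^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """Sample standard deviation S: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))

values = [3, 5, 7]                   # hypothetical observations, mean = 5
mean = sum(values) / len(values)
print(sum(x - mean for x in values))  # unsquared deviations cancel out to zero
print(sample_variance(values), sample_std(values))
```

Here the deviations are −2, 0 and 2, so S² = (4 + 0 + 4) / (3 − 1) = 4 and S = 2.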
The Standard Deviation III
• We denote the population variance and standard deviation with the Greek letter sigma (σ):
• Population variance = σ²
• Population standard deviation = σ
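To make the sample/population distinction concrete, the sketch below uses Python's standard `statistics` module, whose `pvariance`/`pstdev` functions treat the data as a whole population (dividing by N) while `variance`/`stdev` treat it as a sample (dividing by n − 1). The data set is hypothetical and used only for illustration.

```python
import statistics

# Hypothetical data set, used only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Treating the values as an entire population: divide by N
pop_var = statistics.pvariance(data)   # sigma^2
pop_sd = statistics.pstdev(data)       # sigma

# Treating the values as a sample: divide by n - 1
samp_var = statistics.variance(data)   # S^2
samp_sd = statistics.stdev(data)       # S

print(f"sigma^2 = {pop_var}, sigma = {pop_sd}")
print(f"S^2 = {samp_var:.4f}, S = {samp_sd:.4f}")
```

Because n − 1 is smaller than N, the sample versions come out slightly larger (here S² = 32/7 versus σ² = 4); the gap shrinks as the data set grows.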
The Standard Deviation IV
• Let’s calculate the variance and standard deviation for our online spend example.
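As a check on the hand calculation, the following minimal Python sketch applies the definitional formula (sum of squared deviations divided by n − 1) to the 20 online spend values from the range example. Variable names are our own.

```python
import math

# Online expenditures (in dollars) from the 20-person survey
spend = [100, 50, 100, 150, 125, 100, 80, 75, 125, 150,
         150, 175, 50, 80, 25, 100, 100, 75, 125, 100]

n = len(spend)
mean = sum(spend) / n                            # 2035 / 20 = 101.75

squared_devs = [(x - mean) ** 2 for x in spend]  # (X - mean)^2 for each observation
ss = sum(squared_devs)                           # sum of squared deviations

variance = ss / (n - 1)                          # sample variance S^2
std_dev = math.sqrt(variance)                    # sample standard deviation S

print(f"mean = {mean:.2f}, S^2 = {variance:.2f}, S = {std_dev:.2f}")
```

The sum of squared deviations is 27,613.75, so S² = 27,613.75 / 19 ≈ 1,453.36 and S ≈ $38.12.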
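The Standard Deviation V
• We can also use the shortcut formula, which avoids computing each deviation separately:
$S^2 = \dfrac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$
Using the shortcut formula for our online spend example (n = 20, ΣX = 2,035, ΣX² = 234,675), we get:
$S^2 = \dfrac{234{,}675 - \frac{(2{,}035)^2}{20}}{20 - 1} = \dfrac{27{,}613.75}{19} \approx 1{,}453.36$, so $S = \sqrt{1{,}453.36} \approx 38.12$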
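The shortcut (computational) formula, S² = (ΣX² − (ΣX)²/n) / (n − 1), needs only two running totals. The minimal Python sketch below applies it to the online spend data; variable names are our own.

```python
import math

# Online expenditures (in dollars) from the 20-person survey
spend = [100, 50, 100, 150, 125, 100, 80, 75, 125, 150,
         150, 175, 50, 80, 25, 100, 100, 75, 125, 100]

n = len(spend)
sum_x = sum(spend)                    # sum of X
sum_x2 = sum(x * x for x in spend)    # sum of X squared

# Shortcut formula: S^2 = (sum X^2 - (sum X)^2 / n) / (n - 1)
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)

print(f"sum X = {sum_x}, sum X^2 = {sum_x2}, S^2 = {variance:.2f}, S = {std_dev:.2f}")
```

It gives the same result as the definitional formula. One caution: with very large values, subtracting two nearly equal large totals can lose precision in floating-point arithmetic, which is why statistical software usually works with deviations from the mean instead.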
The Standard Deviation VI
• The larger the variance, the more spread out the data.
• The larger the variance, the less precise our estimates, and the more difficult it is for the marketer to draw inferences from the data.
Figure: Dispersion of data around the mean (three distributions with variances S₃² > S₂² > S₁²).
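To see that data sets with the same mean can differ greatly in spread, the sketch below compares two hypothetical sets of dollar amounts, both with a mean of 100. The values are invented for illustration, and the code uses Python's standard `statistics` module.

```python
import statistics

# Two hypothetical sets of spend (dollars), both with a mean of 100
tight = [95, 98, 100, 102, 105]
wide = [40, 70, 100, 130, 160]

for name, values in [("tight", tight), ("wide", wide)]:
    print(f"{name}: mean = {statistics.mean(values)}, S = {statistics.stdev(values):.2f}")
```

The means are identical, but the standard deviation of `wide` (about 47) is more than ten times that of `tight` (about 4), so a mean of 100 describes a typical observation in `tight` far better.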
Data Dispersion Rules I
• There are several such rules, but we will focus only on the “Empirical Rule,” which states that if your data have a symmetric, bell-shaped distribution, then approximately:
68% of the observations will lie within one standard deviation of the mean
95% of the observations will lie within two standard deviations of the mean
99.7% of the observations will lie within three standard deviations of the mean
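The Empirical Rule percentages come from the normal (bell-shaped) distribution. As a minimal sketch, assuming an exactly normal distribution, the share lying within k standard deviations of the mean equals erf(k/√2), which Python's `math.erf` computes.

```python
import math

def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

This prints roughly 68.3%, 95.4% and 99.7%. Real data that are only approximately bell-shaped will match these figures only approximately.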
Data Dispersion Rules II
• Pictorially this looks as follows:
Figure: The Empirical Rule (bell-shaped curve with the intervals μ ± σ, μ ± 2σ and μ ± 3σ marked)
Outliers
• We typically classify an observation as an outlier if it lies more than 3, 4, 5, or 6 standard deviations from the mean.
• Which cutoff is appropriate depends on the quantities you are dealing with.
• Whether an outlier is legitimate or an error, you must deal with it (typically by removing it) before examining relationships in the data, because outliers can distort those relationships.
• SAS and SPSS offer drop-down menus where you can easily eliminate outliers.
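The standard-deviation rule for outliers can be sketched as a small filter. This is a minimal illustration using Python's `statistics` module; the function name `flag_outliers`, its `k` parameter, and the extra $1,000 respondent are hypothetical additions to the online spend survey data.

```python
import statistics

def flag_outliers(values, k=3):
    """Return observations lying more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [x for x in values if abs(x - mean) > k * sd]

# Online spend survey data plus one hypothetical respondent who reported $1,000
survey = [100, 50, 100, 150, 125, 100, 80, 75, 125, 150,
          150, 175, 50, 80, 25, 100, 100, 75, 125, 100]
with_extreme = survey + [1000]

print(flag_outliers(with_extreme, k=3))  # cutoff of 3 standard deviations
print(flag_outliers(with_extreme, k=5))  # stricter cutoff of 5 standard deviations
```

The $1,000 value lies about 4.3 standard deviations above the mean, so it is flagged with a cutoff of 3 but not with a cutoff of 5, which illustrates why the choice of cutoff matters. Note that the extreme value also inflates the standard deviation used to judge it.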
Calculating the Standard Deviation I
(Using Excel - PC)
• Let’s now use Excel to calculate the variance and standard deviation.
Consider the table below, which shows the ages of 10 NYU college graduates.
Age
23
25
24
23
24
23
20
23
22
30
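If you want to check Excel's output, the minimal Python sketch below computes the dispersion statistics for the 10 ages using the standard `statistics` module. It assumes Excel's "Standard Deviation" and "Sample Variance" rows use the sample (n − 1) formulas; the dictionary labels are our own choice, meant only to mirror the rows Excel reports.

```python
import statistics

# Ages of the 10 NYU college graduates
ages = [23, 25, 24, 23, 24, 23, 20, 23, 22, 30]

summary = {
    "Mean": statistics.mean(ages),
    "Standard Deviation": statistics.stdev(ages),   # sample SD (divides by n - 1)
    "Sample Variance": statistics.variance(ages),
    "Range": max(ages) - min(ages),
    "Minimum": min(ages),
    "Maximum": max(ages),
    "Count": len(ages),
}

for label, value in summary.items():
    print(f"{label}: {round(value, 4)}")
```

By hand: the mean is 23.7, the squared deviations sum to 60.1, so S² = 60.1 / 9 ≈ 6.68 and S ≈ 2.58 years, with a range of 30 − 20 = 10 years.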
Calculating the Standard Deviation II
(Using Excel - PC)
• To determine the measures of dispersion in Excel, we first go to the Data tab, click Data Analysis, and then select “Descriptive Statistics” (just as we did to calculate the mean).
Calculating the Standard Deviation III
(Using Excel - PC)
• Highlight your data in the “Input Range,” check “Labels,” choose your “Output Range,” and then check “Summary Statistics.”
Calculating the Standard Deviation IV
(Using Excel - PC)
• Excel then reports the range and standard deviation of the data set, along with the other summary statistics.
Section 4 Exercises I
4.1 The range, as a measure of spread, has the disadvantage of being influenced by outliers. Illustrate this with an example.
4.2 Can the standard deviation have a negative value? Explain.
4.3 When is the value of the standard deviation for a data set zero? Give an example. Calculate the standard deviation for this example and show that its value is zero.
4.4 The following table gives the 1995 revenues (rounded to billions of dollars) of the top 10 companies in Fortune magazine’s Global 500 (Fortune Magazine). Find the range, variance, and standard deviation for these data (by hand and in Excel).
Company | 1995 Revenue (in billions of U.S. dollars)
Mitsubishi (Japan) | 184
Mitsui (Japan) | 182
Itochu (Japan) | 169
General Motors (U.S.) | 169
Sumitomo (Japan) | 168
Marubeni (Japan) | 161
Ford Motor (U.S.) | 137
Toyota Motor (Japan) | 111
Exxon (U.S.) | 110
Royal Dutch/Shell Group (Brit/Neth) | 110
Section 4 Exercises II
4.5 Consider the following two data sets.
Data Set I: 12 25 37 8 41
Data Set II: 19 32 44 15 48
Note that each value of the second data set is obtained by adding 7 to the corresponding value of the first data set. Calculate the standard deviation for each of these two data sets using the formula for sample data. Comment on the relationship between the two standard deviations.