Section 4

Measures of Dispersion

Rhonda Knehans Drake

Associate Professor, New York University

Data Analytics, Interpretation and Reporting Copyright © 2013

• One way to better understand your sample data is to use descriptive measures, or statistics.

• Each basic summary statistic has its own purpose and therefore plays a critical role in helping you describe and understand your data. However, not fully understanding what each statistic measures, how it is calculated, or when to use one rather than another can lead you to draw erroneous conclusions.

• Did you know the spread of your data reveals how solid your estimate of the mean is? The tighter the spread, the better your estimate of the average. In other words, we want our data sets to have as little spread as possible.

• This section will show you how to calculate the basic measures of dispersion.

Introduction

• The two main measures of dispersion of concern are:

1. The Range

2. The Variance and Standard Deviation

Measures of Dispersion

• The range is the largest observation minus the smallest observation for the variable of concern in your sample.

• The range gives a sense of the “true spread” of all observations in the data set.

• In fact, the range is sometimes referred to as the “spread.” When being reported, it is often accompanied by the minimum and maximum values observed.

• The range is denoted by the following formula:

Range = (the maximum observation) - (the minimum observation)

The Range I

• For example, assume we survey 20 people and ask them about their online expenditures for 2009.

$100 $50 $100 $150 $125 $100 $80 $75 $125 $150 $150 $175 $50 $80 $25 $100 $100 $75 $125 $100

The Range II

What is the range of this data set?

Max = 175

Min = 25

Range = 175 – 25 = 150
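The same calculation can be sketched in Python. This is a minimal illustration rather than part of the original slides; the variable names are chosen here for readability.

```python
# Online expenditures (in dollars) reported by the 20 survey respondents.
expenditures = [100, 50, 100, 150, 125, 100, 80, 75, 125, 150,
                150, 175, 50, 80, 25, 100, 100, 75, 125, 100]

largest = max(expenditures)          # the maximum observation
smallest = min(expenditures)         # the minimum observation
data_range = largest - smallest      # Range = max - min

print(f"Max = {largest}, Min = {smallest}, Range = {data_range}")
# prints: Max = 175, Min = 25, Range = 150
```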

• The standard deviation is an average measure of dispersion of each observation in your data set from the mean for the variable of concern.

• In other words, the standard deviation tells us how far, on average, the observations lie from the mean.

• It tells us whether the observations for the variable of concern cluster tightly around the mean or are widely dispersed.

• To obtain the standard deviation, take the square root of the variance.

The Standard Deviation I

• The sample variance ($S^2$) is the sum of the squared differences between each observation and the sample mean, divided by the sample size minus one:

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

• We square each difference because we do not care whether an observation is, for example, five units above or below the mean, only that it is five units away from the mean. Squaring also keeps the positive and negative differences from cancelling each other out.

• We divide by n - 1 rather than n because dividing by n - 1 gives an unbiased estimate of the true population variance, whereas dividing by n tends to underestimate it.

• To determine the standard deviation ($S$), we take the square root of the variance:

$$S = \sqrt{S^2}$$

The Standard Deviation II

• We denote the population variance and standard deviation with the Greek letter sigma:

• Population variance = $\sigma^2$

• Population standard deviation = $\sigma$

The Standard Deviation III

• Let’s calculate the variance and standard deviation for our online spend example.
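The worked calculation from the slide is not reproduced in this text, so the following Python sketch walks through the same steps for the 20 online expenditure values: compute the mean, sum the squared deviations, divide by n - 1, and take the square root. The variable names are illustrative, and the final line cross-checks the result against Python's standard `statistics` module.

```python
import math
import statistics

# Online expenditures (in dollars) from the survey of 20 people.
expenditures = [100, 50, 100, 150, 125, 100, 80, 75, 125, 150,
                150, 175, 50, 80, 25, 100, 100, 75, 125, 100]

n = len(expenditures)
mean = sum(expenditures) / n

# Squared difference between each observation and the mean.
squared_deviations = [(x - mean) ** 2 for x in expenditures]

sample_variance = sum(squared_deviations) / (n - 1)   # S^2
sample_sd = math.sqrt(sample_variance)                # S

print(f"mean = {mean:.2f}")             # mean = 101.75
print(f"S^2  = {sample_variance:.2f}")  # S^2  = 1453.36
print(f"S    = {sample_sd:.2f}")        # S    = 38.12

# Cross-check: statistics.variance also divides by n - 1.
assert math.isclose(sample_variance, statistics.variance(expenditures))
```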

The Standard Deviation IV

• We can also use the shortcut formula as follows:

Using the shortcut formula for our online spend example we get:
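The formula shown on the slide did not survive in this text. The sketch below assumes the usual computational shortcut, $S^2 = \left(\sum X^2 - (\sum X)^2 / n\right) / (n - 1)$, which needs only the sum of the observations and the sum of their squares. It should give the same variance as the definitional formula.

```python
import math

# Online expenditures (in dollars) from the survey of 20 people.
expenditures = [100, 50, 100, 150, 125, 100, 80, 75, 125, 150,
                150, 175, 50, 80, 25, 100, 100, 75, 125, 100]

n = len(expenditures)
sum_x = sum(expenditures)                         # sum of X   = 2035
sum_x_squared = sum(x * x for x in expenditures)  # sum of X^2 = 234675

# Assumed shortcut formula: no need to compute each deviation from the mean.
shortcut_variance = (sum_x_squared - sum_x ** 2 / n) / (n - 1)
shortcut_sd = math.sqrt(shortcut_variance)

print(f"S^2 = {shortcut_variance:.2f}")  # S^2 = 1453.36
print(f"S   = {shortcut_sd:.2f}")        # S   = 38.12
```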

The Standard Deviation V

• The larger the variation, the more spread out the data.

• The larger the variation, the more difficult it will be for the marketer to make inferences about the data.

Dispersion of data around the mean (figure: three distributions with variances $S_1^2$, $S_2^2$ and $S_3^2$, where $S_3^2 > S_2^2 > S_1^2$).

The Standard Deviation VI

• There are several rules, but we will focus only on the “Empirical Rule,” which states that if your sample has a symmetric, bell-shaped distribution, then:

  • About 68% of the observations within a data set will lie within one standard deviation of the mean

  • About 95% of the observations within a data set will lie within two standard deviations of the mean

  • About 99.7% of the observations within a data set will lie within three standard deviations of the mean

Data Dispersion Rules I

• Pictorially this looks as follows:

The Empirical Rule (figure: bell curve with the horizontal axis marked at μ - 3σ, μ - 2σ, μ - σ, μ, μ + σ, μ + 2σ and μ + 3σ)

Data Dispersion Rules II

• We determine whether observations in our data are outliers by examining whether they lie more than 3, 4, 5, or 6 standard deviations from the mean.

• The cutoff you choose depends on the quantities you are dealing with.

• Legitimate or not, you must remove outliers before examining relationships in the data.

• SAS and SPSS offer a drop-down menu where you can easily eliminate outliers.
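Outside SAS or SPSS, the same screening can be sketched in a few lines of Python. The helper `find_outliers` and its `threshold` parameter are hypothetical names introduced here, and the $1,000 respondent is a made-up value added only to show the rule firing; it is not part of the survey data.

```python
import statistics


def find_outliers(values, threshold=3.0):
    """Return values lying more than `threshold` sample SDs from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [x for x in values if abs(x - mean) > threshold * sd]


# Online expenditures (in dollars) from the survey of 20 people.
expenditures = [100, 50, 100, 150, 125, 100, 80, 75, 125, 150,
                150, 175, 50, 80, 25, 100, 100, 75, 125, 100]

# Hypothetical extra respondent, NOT in the original data.
with_extreme = expenditures + [1000]

print(find_outliers(expenditures))   # []
print(find_outliers(with_extreme))   # [1000]
```

Raising `threshold` to 4, 5 or 6 makes the screen more conservative, in line with the range of cutoffs mentioned above.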

Outliers

• Let’s now use Excel to calculate the variance and standard deviation.

Consider the table below, which shows the ages of 10 college graduates from NYU.

Age
23
25
24
23
24
23
20
23
22
30

Calculating the Standard Deviation I

(Using Excel - PC)

• To determine the measures of dispersion in Excel, we first go to the Data tab, click Data Analysis, and then select “Descriptive Statistics” (just as we did to calculate the mean).

Calculating the Standard Deviation II

(Using Excel - PC)

• Highlight your data in the “Input Range,” check “Labels,” choose your “Output Range,” and then check “Summary Statistics.”

Calculating the Standard Deviation III

(Using Excel - PC)

• Then you have the range and standard deviation of the data set.
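The Excel output itself is not reproduced in this text. As a cross-check, the Python sketch below computes the same sample statistics for the 10 ages; Excel's range, sample variance and standard deviation should agree with these values up to rounding. The variable names are illustrative.

```python
import statistics

# Ages of the 10 NYU college graduates from the table.
ages = [23, 25, 24, 23, 24, 23, 20, 23, 22, 30]

age_range = max(ages) - min(ages)
age_variance = statistics.variance(ages)   # sample variance (divides by n - 1)
age_sd = statistics.stdev(ages)            # sample standard deviation

print(f"Range              = {age_range}")          # 10
print(f"Sample variance    = {age_variance:.2f}")   # 6.68
print(f"Standard deviation = {age_sd:.2f}")         # 2.58
```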

Calculating the Standard Deviation IV

(Using Excel - PC)

4.1 The range, as a measure of spread, has the disadvantage of being influenced by outliers. Illustrate this with an example.

4.2 Can the standard deviation have a negative value? Explain.

4.3 When is the value of the standard deviation for a data set zero? Give one example. Calculate the standard deviation for this example and show that its value is zero.

4.4 The following table gives the 1995 revenues (rounded to billions of dollars) of the top 10 companies in Fortune magazine’s Global 500 (Fortune Magazine). Find the range, variance, and standard deviation for these data (by hand and in Excel).

Company                                 1995 Revenue (in billions of U.S. dollars)
Mitsubishi (Japan)                      184
Mitsui (Japan)                          182
Itochu (Japan)                          169
General Motors (U.S.)                   169
Sumitomo (Japan)                        168
Marubeni (Japan)                        161
Ford Motor (U.S.)                       137
Toyota Motor (Japan)                    111
Exxon (U.S.)                            110
Royal Dutch/Shell Group (Brit/Neth)     110

Section 4 Exercises I

Section 4 Exercises II

4.5 Consider the following two data sets.

Data Set I: 12 25 37 8 41

Data Set II: 19 32 44 15 48

Note that each value of the second data set is obtained by adding 7 to the corresponding value of the first data set. Calculate the standard deviation for each of these two data sets using the formula for sample data. Comment on the relationship between the two standard deviations.