Marketing Assignments
Section 3
Measures of Central Tendency
Rhonda Knehans Drake
Associate Professor, New York University
Data Analytics, Interpretation and Reporting Copyright © 2013
2
• One way to aid in better understanding your sample data is with descriptive measures or statistics.
• Each basic summary statistic has its own purpose and, therefore, each plays a critical role in helping you describe and understand your data. However, not fully understanding what each of these statistics measures, how it is calculated, or when to use one versus another can cause you to draw erroneous conclusions.
• For example, did you know that the average, or mean, is quite sensitive to extreme data values (outliers), which could cause you to draw incorrect conclusions from the data? Do you know what to do in this case?
• This section will show you how to calculate the basic measures of central tendency.
Introduction
3
• One way to gain a better understanding of your quantitative data is with measures of central tendency.
• These measures tell you, for example, where the center of the distribution of income levels lies.
• The Three Measures of Central Tendency are:
– Mean
– Median
– Mode
Measures of Central Tendency
4
• The Mean is the sum of all observations divided by the number of observations.
• The mean or average is the most widely used measure of central tendency of a set of observations.
• We will denote the sample mean as x̄ (pronounced “x bar”), the population mean as µ (the Greek letter mu), the number of observations in a sample as n, the number of observations in a population as N, the observations for the variable of concern as x1, x2, x3, x4, …, and the sum of all observations as Σx (Σ is the uppercase Greek letter sigma). With this notation, x̄ = Σx / n.
• With the sample mean we are estimating the population mean denoted as µ.
The Mean I
5
Example: Suppose the following are the prices of 5 houses sold in Seattle, in thousands of dollars.
158 189 265 127 191
What is the mean?
The Mean II
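The definition of the mean (sum of all observations divided by the number of observations) can be sketched in a few lines of Python. This is only an illustration; the function name sample_mean is our own and is not part of the course material.

```python
# House prices from the Seattle example, in thousands of dollars.
prices = [158, 189, 265, 127, 191]


def sample_mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)


mean_price = sample_mean(prices)
print(mean_price)  # 186.0, i.e. $186 thousand
```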
6
• Sometimes a data set contains outliers, which are extremely low or high values. They may or may not be legitimate.
Example: Consider our sample prices of houses. Assume the 265 figure was 982 instead.
158 189 982 127 191
This outlier of 982 pulls the mean up, so the mean no longer reflects a typical house price.
Old mean = $930 / 5 = $186 (in thousands)
New mean = $1,647 / 5 ≈ $329 (in thousands)
So, what do we do in this case?
The Mean III
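To see how much a single outlier moves the mean, the two versions of the house-price sample can be compared side by side. This is a minimal sketch using only the figures from the slides; the variable names are our own.

```python
# Same sample with and without the outlier, in thousands of dollars.
original = [158, 189, 265, 127, 191]
with_outlier = [158, 189, 982, 127, 191]

old_mean = sum(original) / len(original)          # 186.0
new_mean = sum(with_outlier) / len(with_outlier)  # 329.4

# Changing one observation out of five raised the mean by this much.
shift = new_mean - old_mean
print(old_mean, new_mean, round(shift, 1))
```

Only one of the five prices changed, yet the mean now sits above four of the five observations.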
7
• Median represents the “exact middle” observation for the variable of concern when the values in your sample are ranked from lowest to highest.
• Median is also an important measure of central tendency and is not as sensitive to outliers.
The median is determined by performing the following steps:
1. Rank the observations in your data set from the lowest value to the highest value.
2. Select the observation in position (n + 1)/2 of this ranked data set, where n is the size of the sample drawn.
If the sample size, n, is an even number then (n + 1)/2 will lie exactly between two observations. In this case, the median is simply the average of these two observations.
The Median I
8
Example: Consider the following 5 observations which are weight loss figures in pounds at a health club after 4 weeks for new members.
10 5 19 8 3
What is the median?
The Median II
3 5 8 10 19
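The two steps for finding the median (rank the data, then pick position (n + 1)/2) can be sketched as follows. The function name median_by_rank is our own, and the sketch assumes a non-empty list of numbers.

```python
def median_by_rank(values):
    """Median via the rank-then-select steps; assumes a non-empty list."""
    ranked = sorted(values)      # Step 1: rank from lowest to highest
    n = len(ranked)
    position = (n + 1) / 2       # Step 2: 1-based position of the median
    if position.is_integer():    # odd n: a single middle observation
        return ranked[int(position) - 1]
    # Even n: the position falls between two observations, so average them.
    lower = ranked[int(position) - 1]
    upper = ranked[int(position)]
    return (lower + upper) / 2


weight_loss = [10, 5, 19, 8, 3]
print(sorted(weight_loss))          # [3, 5, 8, 10, 19]
print(median_by_rank(weight_loss))  # 8
```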
9
Because the mean can be influenced by outliers, the best practice is to show both the mean and the median on corporate-level reports and dashboards.
Corporate Reporting
10
• Mode is merely the most frequently occurring observation for the variable of concern in your sample.
• A less commonly used measure of central tendency is the mode.
The mode is determined by the following rules:
1. The most frequently occurring observation in the data set.
2. If all observations within the data set occur the same number of times, there is no mode.
3. If there is a tie for the most frequently occurring observation in the data set, the data set has multiple modes.
There can be more than one mode in a set of values.
The Mode I
12
Example: Suppose the following are the ages of 10 college graduates from NYU.
23 25 24 23 24 23 20 23 22 30
What is the mode?
The Mode II
Observation   Occurrences
20            1
22            1
23            4
24            2
25            1
30            1
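The frequency table and the three rules for the mode can be reproduced with a short Python sketch. The function name modes is our own. Returning an empty list for "no mode" is a choice made for this illustration, and a data set with a single distinct value is treated as having that value as its mode.

```python
from collections import Counter


def modes(values):
    """Return the mode(s) as a sorted list, or [] when there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # Rule 2: every observation occurs the same number of times -> no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    # Rules 1 and 3: one winner, or several tied winners (multiple modes).
    return sorted(v for v, c in counts.items() if c == highest)


ages = [23, 25, 24, 23, 24, 23, 20, 23, 22, 30]
for observation, occurrences in sorted(Counter(ages).items()):
    print(observation, occurrences)
print(modes(ages))  # [23]
```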
13
The relationship between the mean, median and mode is shown below for various data sets.
Mean = Median = Mode: the mean, median and mode are all equal when the distribution is symmetric and bell-shaped.
Mean < Median < Mode: the mean is less than the median and mode when the distribution is skewed to the left.
Mode < Median < Mean: the mean is greater than the median and mode when the distribution is skewed to the right.
Mean vs. Median vs. Mode
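As a rough rule of thumb, comparing the mean with the median hints at the direction of skew. The sketch below is a heuristic only, not a formal test; the function name skew_hint and its tolerance parameter are our own, and small samples can easily mislead.

```python
from statistics import mean, median


def skew_hint(values, tolerance=0.0):
    """Guess the skew direction from the gap between mean and median."""
    m, med = mean(values), median(values)
    if m > med + tolerance:
        return "skewed right"   # mean pulled above the median
    if m < med - tolerance:
        return "skewed left"    # mean pulled below the median
    return "roughly symmetric"


# House prices with the 982 outlier: mean 329.4 versus median 189.
print(skew_hint([158, 189, 982, 127, 191]))  # skewed right
```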
14
Let’s do an in-class example.
Suppose we sample 10 customers from the database and note their online expenditures for 2010.
$27 $55 $42 $38 $75 $62 $54 $21 $398 $42
Calculate the mean and median, and explain what is happening here.
In Class Example
21 27 38 42 42 54 55 62 75 398
Mean:
$814 / 10 = $81.40
Median:
(42 + 54) / 2 = 48
What measure of central tendency is best?
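The in-class example can be checked with Python's built-in statistics module. Dropping the $398 value is our own addition here, included only to show how differently the two measures react to it.

```python
from statistics import mean, median

# Online expenditures for the 10 sampled customers, in dollars.
spend = [27, 55, 42, 38, 75, 62, 54, 21, 398, 42]

avg = mean(spend)    # 81.4
mid = median(spend)  # 48.0

# Same sample without the $398 customer.
without_outlier = [x for x in spend if x != 398]
avg_trimmed = mean(without_outlier)    # about 46.2
mid_trimmed = median(without_outlier)  # 42

print(avg, mid)
print(round(avg_trimmed, 1), mid_trimmed)
```

The mean falls from about 81 to about 46 when one customer is removed, while the median only moves from 48 to 42.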
15
How does Excel do this for us?
Consider the table below, which shows the ages of 10 college graduates from NYU.
Age
23
25
24
23
24
23
20
23
22
30
Calculating the Mean, Median and Mode I
(Using Excel – PC or Mac prior version 8)
16
To obtain the measures of central tendency in Excel, go to the Data tab, click “Data Analysis,” and then select “Descriptive Statistics.”
Calculating the Mean, Median and Mode II
(Using Excel – PC or Mac prior version 8)
17
Select your data as the “Input Range,” check “Labels” (the first cell holds the column heading, Age), choose your “Output Range,” and then check “Summary Statistics.”
Calculating the Mean, Median and Mode III
(Using Excel – PC or Mac prior version 8)
18
Then you have the mean, median and mode of the data set.
Calculating the Mean, Median and Mode IV
(Using Excel – PC or Mac prior version 8)
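For readers without Excel, a similar summary can be sketched with Python's standard library. This is not what Excel runs internally, and the dictionary keys below simply mirror a few of the row labels in Excel's summary output; multimode requires Python 3.8 or later.

```python
from statistics import mean, median, multimode, stdev

# Ages of the 10 NYU graduates from the table.
ages = [23, 25, 24, 23, 24, 23, 20, 23, 22, 30]

summary = {
    "Mean": mean(ages),
    "Median": median(ages),
    "Mode": multimode(ages),            # list, since ties are possible
    "Standard Deviation": stdev(ages),  # sample standard deviation (n - 1)
    "Minimum": min(ages),
    "Maximum": max(ages),
    "Sum": sum(ages),
    "Count": len(ages),
}

for label, value in summary.items():
    print(f"{label}: {value}")
```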
19
Skewness measures how asymmetric the distribution is, and kurtosis measures how peaked it is.
20
The skewness measure indicates the level of non-symmetry. If the distribution of the data is symmetric, then skewness will be close to 0 (zero). The further from 0, the more skewed the data. A negative value indicates a skew to the left, and a positive value a skew to the right. Here we note the data is slightly skewed to the right.
Kurtosis is a measure of the peakedness of the data. Excel reports kurtosis relative to a normal distribution, so for data that is not excessively peaked, kurtosis is close to 0 (zero). In this case our data is somewhat peaked.
http://www.stattutorials.com/EXCEL/EXCEL-DESCRIPTIVE-STATISTICS.html
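The skewness and kurtosis figures can also be sketched in Python. The formulas below are the bias-adjusted sample versions that we understand Excel's SKEW and KURT functions to use; treat that correspondence as an assumption and compare against your own Excel output. The function names are our own, and the sketch needs at least four observations that are not all identical.

```python
from statistics import mean, stdev


def sample_skewness(values):
    """Bias-adjusted sample skewness (assumed to match Excel's SKEW)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    cubed = sum(((x - m) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


def sample_excess_kurtosis(values):
    """Bias-adjusted excess kurtosis (assumed to match Excel's KURT)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    fourth = sum(((x - m) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


ages = [23, 25, 24, 23, 24, 23, 20, 23, 22, 30]
skew = sample_skewness(ages)
kurt = sample_excess_kurtosis(ages)
print(skew > 0)  # True: the ages are skewed to the right
print(kurt > 0)  # True: more peaked than a kurtosis of 0
```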
Levels of Kurtosis
Kurtosis of 0
22
When we examine the average home prices in Seattle, we may also want to compare them to the average home prices in Chicago.
For example, suppose the sample average for Seattle was $186,000 and for Chicago it was $175,000.
Is the observed difference in average home prices significant?
We will learn this in a few weeks.
Teaser for Future Discussion
23
Appendix Part 1
How to Load the Analysis ToolPak in Windows Excel
24
• Go to the data tab.
• In all likelihood you will not see the “Data Analysis” option on the tool bar as is displayed here.
• So, here is what you do.
Appendix Part 1
25
• Click on the Office Button in the upper left-hand corner and select “Excel Options.”
Appendix Part 1
26
• Select Add-ins and then click on the Go button.
Appendix Part 1
27
• Check Analysis ToolPak and select OK.
• It will now be ready to use.
Appendix Part 1
28
Appendix Part 2
Loading StatPlus for Mac Users
29
Appendix Part 2
If you have a Mac, you unfortunately do not have the Analysis ToolPak option in Excel. Luckily, StatPlus offers a free version for Mac owners, with Microsoft’s approval:
http://www.analystsoft.com/en/products/statplusmacle/
30
Appendix Part 2
Once downloaded, open the program, open your data, go to Statistics, then Basic Statistics, and then Descriptive Statistics.
31
Appendix Part 2
Highlight the data and click OK
32
Appendix Part 2
Voila!
33
3.1 Which of the three measures of central tendency (the mean, the median, and the mode) can be calculated for quantitative data only, and which ones can be calculated for both quantitative and qualitative data? Illustrate with examples.
3.2 Which of the three measures of central tendency (the mean, the median, and the mode) can assume more than one value for a data set? Give an example of a data set for which that summary measure assumes more than one value.
3.3 Prices of cars have a distribution that is skewed to the right with outliers in the right tail. Which of the measures of central tendency is the best to summarize that data set? Explain.
3.4 The following data give the number of car thefts that occurred in a city during the past 12 days.
6 3 7 11 5 3 8 7 2 6 9 13
Find the mean, median, and mode.
Section 3 Exercises I
34
3.5 The following data give the 2010 total area of farmland (in millions of acres) for 10 states (Statistical Abstract of the United States). The data, in the order entered, are for Colorado, Iowa, Kansas, Minnesota, Missouri, Nebraska, North Dakota, Oklahoma, South Dakota, and Texas. (Do in Excel)
33 33 48 30 30 47 40 34 44 129
a) Calculate the mean and median for these data.
b) Do these data contain an outlier? If yes, drop this value and recalculate the mean and median. Which of the two summary measures changes by a larger amount when you drop the outlier?
c) Is the mean or the median a better summary measure for these data? Explain.
3.6 The mean 2009 income for five families was $39,520. What was the total 2009 income of these five families?
Section 3 Exercises II
35
3.7 Consider the following two data sets.
Data set 1: 12 25 37 8 41
Data set 2: 19 32 44 15 48
Notice that each value of the second data set is obtained by adding 7 to the corresponding value of the first data set.
a) Calculate the mean for each of these two data sets.
b) Comment on the relationship between the two means.
Section 3 Exercises III