Marketing Assignments
Section 5
Data Distribution
Rhonda Knehans Drake
Associate Professor, New York University
Data Analytics, Interpretation and Reporting Copyright © 2013
• There are many distributional forms that our marketing data can take on.
• Most data follows one recognizable distributional form or another.
• Because of this, we can make estimates and forecasts from our data and do so with a stated level of confidence.
Introduction
Examples of common forms of data in industry are:
– Time to failure follows what is known as an exponential distribution. Xerox takes advantage of this well-known fact to determine servicing needs and to set contract prices for its various equipment.
– Department stores take advantage of the fact that the period of time between the arrivals of two successive customers also follows an exponential distribution.
Examples of Data Distribution I
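As a minimal sketch of how the exponential distribution supports estimates like these, the Python below computes the chance that a waiting time falls under or over a given length. The 4-minute average gap between customers is a hypothetical figure chosen for illustration, and the function names are assumptions, not part of the course material.

```python
import math

def exponential_cdf(t, mean):
    """P(T <= t) for an exponential distribution with the given mean."""
    if t < 0:
        return 0.0
    return 1.0 - math.exp(-t / mean)

def exponential_survival(t, mean):
    """P(T > t): the chance the wait (or lifetime) exceeds t."""
    return 1.0 - exponential_cdf(t, mean)

# Hypothetical assumption: customers arrive on average 4 minutes apart.
mean_gap_minutes = 4.0

p_within_2 = exponential_cdf(2.0, mean_gap_minutes)
p_over_10 = exponential_survival(10.0, mean_gap_minutes)

print(f"P(next customer within 2 minutes) = {p_within_2:.3f}")
print(f"P(gap longer than 10 minutes)     = {p_over_10:.3f}")
```

The same two functions apply to time to failure: replace the mean gap with the mean lifetime of the equipment.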
Examples of common forms of data in industry are:
– If you are counting the number of occurrences of a specific event within a set time period, that count follows what is called the Poisson distribution.
– The number of murders in NYC during the month of April follows a Poisson distribution. Based on this fact, the city can then estimate the murder rate for the next month.
– The hypergeometric distribution is used in quality control (QC) checks for defective items in a large lot. With this distribution, you can determine the probability that, based on a sample of size n drawn from a lot of size N, you will accept the lot when in reality its defect rate exceeds your acceptable limit.
Examples of Data Distribution I
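The two counting distributions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a method from the course: the average rate of 3 events per period, the lot of 100 items with 10 defectives, the sample of 5, and the accept-on-zero-defectives rule are all hypothetical numbers, and the function names are made up for this example.

```python
import math

def poisson_pmf(k, rate):
    """P(X = k) when events occur at the given average rate per period."""
    return math.exp(-rate) * rate**k / math.factorial(k)

def hypergeom_pmf(k, lot_size, defectives, sample_size):
    """P(exactly k defectives in a sample drawn without replacement)."""
    return (math.comb(defectives, k)
            * math.comb(lot_size - defectives, sample_size - k)
            / math.comb(lot_size, sample_size))

# Hypothetical assumption: an average of 3 events per period.
rate = 3.0
p_at_most_2 = sum(poisson_pmf(k, rate) for k in range(3))

# Hypothetical acceptance plan: lot of N = 100 containing 10 defectives,
# sample of n = 5, accept the lot only if the sample has no defectives.
p_accept = hypergeom_pmf(0, lot_size=100, defectives=10, sample_size=5)

print(f"P(at most 2 events in the period) = {p_at_most_2:.3f}")
print(f"P(accepting the lot)              = {p_accept:.3f}")
```

In the QC case, a high acceptance probability for a lot that is 10% defective would suggest the sample size is too small for the plan.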
• However, the most prevalent and most important functional form for a marketer is the normal distribution.
• It is also known as the bell-shaped curve.
• Many data elements follow the normal distribution:
– Age
– Income Levels
– GPAs
– Spend
– Years of education
– Etc.
• And because all these measures tend to follow the normal curve, we as marketers and researchers are able to make many inferences about our metrics with a high degree of confidence.
Examples of Data Distribution II
• Referring back to the distribution of income from a prior chapter, we now know this data to be normally distributed.
• The distribution of income was symmetric and bell-shaped.
[Figure: vertical axis "Relative Frequency" (0 to 0.35); horizontal axis "Income Categories" in $15,000 bands from $10,000-$25,000 to $100,000-$115,000.]
Histogram and polygon for the relative frequency distribution of income levels.
Normal Distribution
• What would you expect the measures of skewness and kurtosis to be for this data?
[Figure: vertical axis "Relative Frequency" (0 to 0.35); horizontal axis "Income Categories" in $15,000 bands from $10,000-$25,000 to $100,000-$115,000.]
Histogram and polygon for the relative frequency distribution of income levels.
Normal Distribution
• Recall the Empirical Rule from Section 4. This rule states that if the distribution of your data is symmetric and bell-shaped (that is, normally distributed):
– About 68% of the observations in the data set will lie within one standard deviation of the mean
– About 95% of the observations in the data set will lie within two standard deviations of the mean
– About 99.7% of the observations in the data set will lie within three standard deviations of the mean
The Spread of Normally Distributed Data I
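The Empirical Rule percentages can be checked directly from the normal distribution. The sketch below uses only Python's standard library; the helper names `normal_cdf` and `share_within` are assumptions introduced for this illustration.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for a normal distribution, via the error function."""
    z = (x - mean) / (sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def share_within(k, mean=0.0, sd=1.0):
    """Share of observations within k standard deviations of the mean."""
    upper = normal_cdf(mean + k * sd, mean, sd)
    lower = normal_cdf(mean - k * sd, mean, sd)
    return upper - lower

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

The shares do not depend on the particular mean or standard deviation, which is why the rule can be applied to any normally distributed measure.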
• Pictorially, this looks as follows:
[Figure: normal curve with the horizontal axis marked at μ - 3σ, μ - 2σ, μ - σ, μ + σ, μ + 2σ, and μ + 3σ.]
The Spread of Normally Distributed Data II
5.1 Rite Aid Pharmacy wishes to monitor the number of customers arriving at
the checkout counter on Sunday afternoons for staffing purposes. What
distributional form will this data follow?
5.2 How is the time to failure for GE light bulbs distributed?
5.3 How would you suspect the average age of your customer base to be
distributed?
5.4 Income on your customer database is distributed normally with a mean of
$55,000 and a standard deviation of $10,000. What percent of the
database do you estimate will have an income within the range $35,000
to $75,000?
5.5 How do the width and height of a normal distribution change when its
mean remains the same but its standard deviation decreases? Show this
graphically.
5.6 How do the width and/or height of a normal distribution change when its
standard deviation remains the same but its mean increases? Show this
graphically.
Section 5 Exercises