
Section 6

The Central Limit Theorem

Rhonda Knehans Drake

Associate Professor, New York University

Data Analytics, Interpretation and Reporting Copyright © 2013


• The assessment of sample means (averages and percentages) is the basis of many everyday business decisions.

• Therefore, understanding exactly how an average is distributed is KEY to properly assessing one versus another (Pre vs. Post, Control vs. Test, etc.).

• As luck would have it, averages follow an approximately normal distribution as n gets large (as a rule of thumb, n > 30).

Introduction


• When you take a sample from a population and calculate the average dollars spent, for example, that average or mean has certain distributional properties.

• According to the Central Limit Theorem, regardless of how the population from which we sampled is distributed, the sample mean (or response rate, or click-through rate) will be approximately normally distributed for roughly n ≥ 30, with a mean equal to the mean of the population from which the sample came and a standard deviation equal to the population standard deviation divided by the square root of the sample size.

• This was proven a long time ago. And we can take advantage of it.
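The statement on this slide can be written compactly. The symbols below are notation introduced here for illustration and do not appear in the original slides: μ is the population mean, σ the population standard deviation, n the sample size, and X̄ the sample mean.

```latex
\bar{X} \;\overset{\text{approx.}}{\sim}\; \mathcal{N}\!\left(\mu,\ \frac{\sigma^{2}}{n}\right)
\qquad\text{so that}\qquad
\operatorname{SD}(\bar{X}) = \frac{\sigma}{\sqrt{n}}
```

The quantity σ/√n is often called the standard error of the mean; it shrinks as the sample size grows.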

The Central Limit Theorem I


• So what does this mean? Even if the distribution of dollars spent is highly skewed, not symmetric, and therefore not normal at all, the distribution of your statistics, such as average spend, will be approximately normal.

• Let’s take a look at what the Central Limit Theorem is saying.

The Central Limit Theorem I


[Diagram: the ACME Database (10,000,000 customer records) feeds 1,000 random samples, Sample 1 through Sample 1,000, each with n = 10,000 customers. Each sample yields one mean:

X1 = avg. income of sample 1
X2 = avg. income of sample 2
X3 = avg. income of sample 3
X4 = avg. income of sample 4
...
X1,000 = avg. income of sample 1,000]

The Central Limit Theorem II

• The database analyst at ACME Direct draws 1,000 random samples, of size 10,000 each, from the database and observes the data element “household income.”


• The analyst then calculates the mean “household income” for each of the 1,000 samples and creates a frequency distribution of the average incomes from the 1,000 samples drawn.

Income Range          Frequency
Less than $10,000            32
$10,000 - $20,000           119
$20,000 - $30,000           187
$30,000 - $40,000           326
$40,000 - $50,000           192
$50,000 - $60,000           116
$60,000+                     28
Total                     1,000
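The procedure the analyst follows (draw many samples, average each one, tabulate the averages) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is a made-up, right-skewed lognormal "income" distribution, and the population and sample sizes are much smaller than ACME's so the sketch runs quickly. The function names `simulate_sample_means` and `frequency_table` are hypothetical helpers, not part of the original material.

```python
import random
import statistics


def simulate_sample_means(population, num_samples, sample_size, seed=0):
    """Draw repeated random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


def frequency_table(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    counts = {}
    for v in values:
        lower = int(v // bin_width) * bin_width  # lower edge of the bin
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Assumption: a hypothetical, highly skewed "household income" population.
pop_rng = random.Random(42)
population = [pop_rng.lognormvariate(10.5, 0.8) for _ in range(100_000)]

# 1,000 samples, as on the slide, but of size 500 rather than 10,000.
means = simulate_sample_means(population, num_samples=1_000, sample_size=500)
table = frequency_table(means, bin_width=1_000)

for lower, count in table.items():
    print(f"${lower:,} - ${lower + 1_000:,}: {count}")
```

Even though the individual incomes are strongly skewed, the printed counts should rise toward a central bin and fall away on both sides, which is the bell shape the slide describes.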

The Central Limit Theorem III


• The analyst creates a histogram using the frequency table and notes that the 1,000 sample mean income values are approximately normally distributed (symmetric and bell-shaped).

[Figure: Distribution of 1,000 Sample Means. Histogram of the frequency table; x-axis: Income Ranges (Less than $10,000 through $60,000+); y-axis: Frequency (0 to 350).]

The Central Limit Theorem IV


• According to the Central Limit Theorem, the histogram of sample means will be approximately normally distributed (bell-shaped and symmetric) with

– a mean equal to the true mean “household income” level of all people on the database and

– a standard deviation equal to the true standard deviation for all people on the database divided by the square root of n, the sample size.
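These two properties can be checked numerically. The sketch below assumes a hypothetical skewed population (exponential, with a mean near 40,000) and compares the observed mean and standard deviation of many sample means with the values the theorem predicts. All names and sizes are illustrative choices, not taken from the slides.

```python
import math
import random
import statistics

rng = random.Random(7)

# Assumption: a hypothetical, highly skewed population of incomes.
population = [rng.expovariate(1 / 40_000) for _ in range(50_000)]
mu = statistics.fmean(population)      # true population mean
sigma = statistics.pstdev(population)  # true population standard deviation

n = 400  # size of each sample
sample_means = [
    statistics.fmean(rng.sample(population, n)) for _ in range(2_000)
]

observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.stdev(sample_means)
predicted_sd = sigma / math.sqrt(n)  # population SD divided by sqrt(n)

print(f"population mean      : {mu:,.0f}")
print(f"mean of sample means : {observed_mean:,.0f}")
print(f"predicted SD of means: {predicted_sd:,.0f}")
print(f"observed SD of means : {observed_sd:,.0f}")
```

The two means should nearly coincide, and the observed standard deviation of the sample means should sit close to the predicted value, far below the standard deviation of individual incomes.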

The Central Limit Theorem V


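The normal curve is so important that it was printed, alongside a portrait of Gauss, on the German 10 Deutsche Mark banknote, which was issued from 1991 and circulated until the euro replaced the mark in 2002.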

Gauss and the Deutsche Mark


Suppose you work for American Express and conducted a test with 1,000 new card members to stimulate spend.

• The test produced an average spend of $175 with a standard deviation of $25.

• You know this is only an estimate, because you tested a sample rather than all new card members.

How can you estimate the spend level for rollout to all new card members?

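• Based on the CLT we can say with roughly 95% confidence that the true average spend lies within plus or minus 2 standard errors of our test average, where the standard error is the standard deviation divided by the square root of n: $25 / √1,000 ≈ $0.79.

• So, in other words, the true average spend should lie somewhere between about $173.42 and $176.58 with 95% confidence. (The wider range of $125 to $225, plus or minus 2 standard deviations, describes where roughly 95% of individual card members' spend would fall if spend were approximately normal; it is not the uncertainty in the average.)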
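The arithmetic behind this example can be laid out as a short calculation. The sketch below assumes the $25 figure is the standard deviation of individual card members' spend in the test, and it uses the slide's multiplier of 2 rather than the more exact 1.96. The helper name `mean_interval` is hypothetical.

```python
import math


def mean_interval(sample_mean, sample_sd, n, z=2.0):
    """Approximate interval for the true mean: mean +/- z * (sd / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


avg_spend = 175.0  # average spend observed in the test
sd_spend = 25.0    # assumed: standard deviation of individual spend
n = 1_000          # number of card members in the test

# Interval for the true AVERAGE spend (uses the standard error).
low, high = mean_interval(avg_spend, sd_spend, n)
print(f"true average spend: ${low:.2f} to ${high:.2f}")

# For contrast: +/- 2 standard deviations covers most INDIVIDUAL spend values.
ind_low = avg_spend - 2 * sd_spend
ind_high = avg_spend + 2 * sd_spend
print(f"individual spend  : ${ind_low:.2f} to ${ind_high:.2f}")
```

The first range is narrow because averaging over 1,000 members divides the standard deviation by √1,000 (about 31.6); the second range is what plus or minus 2 standard deviations of the raw data gives.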

An Example of the CLT in Practice


6.1 You note the number of arrivals each day at Starbucks in Grand Central Station between the hours of 5 pm and 6 pm for the months of April and May. You do the same for the Starbucks at Penn Station. You calculate the average for Grand Central and the average for Penn Station.

a) How is the variable "number of arrivals between 5 pm and 6 pm" distributed?

b) How is the average number of arrivals for Grand Central and Penn Station distributed?

6.2 You know income to be normally distributed on your customer file. You sample 20 people on the file. How will the average income be distributed for this small sample?

6.3 Income on your database is highly skewed. You sample 20 people on the file. How will the average income be distributed for this small sample?

Section 6 Exercises I


6.4 Income on your database is highly skewed. You sample 1,000 people on the file. How will the average income be distributed for this sample of size 1,000?

Section 6 Exercises II