
Sampling

BUS 115, Winter 2017

The Process of Gathering Primary Data

◦ Define what you want to learn: known as the research question
◦ Define whom you want to learn from: known as the population
◦ Decide whether you could/want to reach all or only some of them
  ◦ All: census
  ◦ Some: sampling
◦ Learn from the selected people: known as research methods

Census vs. Sampling

◦ A census measures the parameter
  ◦ A characteristic or measure of a population
  ◦ The variable that you are interested in (e.g., how much all of our target consumers like us)
◦ Sampling measures the statistic
  ◦ A characteristic or measure of a sample
  ◦ Statistics are calculated from sample data and used to estimate population parameters (e.g., how much the consumers in this sample like us)

Census vs. Sampling

◦ Ideal method
  ◦ Census: ask everyone in the city
◦ Realistic (cost-effective) method
  ◦ Sample from the population, e.g., Nielsen Ratings

Example: TV Ratings

Procedure for Drawing a Sample

◦ Step 1: Identify the sampling frame
◦ Step 2: Select a sampling method
◦ Step 3: Determine the sample size
◦ Step 4: Collect data from the sample elements

Identify the sampling frame

Sampling frame: the LIST of population elements from which a sample (n) will be drawn
◦ May not cover the entire population, which causes non-coverage error
◦ Example: BMW customers vs. McDonald's customers

Sampling methods

◦ Probability samples
  ◦ Everyone has a known chance of being selected
◦ Non-probability samples
  ◦ Everyone's chance of being selected is affected by the researcher's judgment

Simple Random Sample (SRS)


Each member of the population has an equal probability of being selected.
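A minimal Python sketch of simple random sampling, assuming the sampling frame is available as a list; the `frame` of customer IDs and the helper name `simple_random_sample` are hypothetical. `random.Random.sample` draws without replacement, so every element of the frame has the same chance of ending up in the sample.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements without replacement; each element is equally likely to be chosen."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)

# Hypothetical sampling frame of 1,000 customer IDs
frame = [f"C{i:04d}" for i in range(1000)]
sample = simple_random_sample(frame, 50, seed=1)
print(len(sample), sample[:5])
```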

Systematic Sampling


A random start with a constant skip interval.
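A sketch of systematic sampling under simple assumptions: the frame is an ordered list, the skip interval is k = N // n, and the random start is drawn from the first k positions. The function name and the frame of IDs are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Random start, then every k-th element (k = constant skip interval)."""
    k = len(frame) // n            # skip interval; assumes len(frame) >= n
    rng = random.Random(seed)
    start = rng.randrange(k)       # random start in [0, k)
    return frame[start::k][:n]     # every k-th element, keeping exactly n

frame = list(range(1, 1001))       # hypothetical frame of 1,000 IDs
sample = systematic_sample(frame, 40, seed=7)
```

With N = 1,000 and n = 40 the interval is 25. When N is not a multiple of n, this sketch truncates to n elements, so the last few elements of the list can never be chosen.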

Cluster Sampling


SRS among mutually exclusive clusters; census within each selected cluster.
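A sketch of the two stages of cluster sampling, assuming clusters are given as a dictionary from cluster name to members. The class sections and student IDs are hypothetical. Stage one is an SRS of cluster names; stage two keeps every member of each chosen cluster (a census within the cluster).

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """SRS of whole clusters, then a census of every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS among clusters
    return {c: list(clusters[c]) for c in chosen}      # census within each chosen cluster

# Hypothetical clusters: 10 class sections with 30 students each
clusters = {f"Sec{s}": [f"Sec{s}-Stu{i}" for i in range(30)] for s in range(1, 11)}
picked = cluster_sample(clusters, 3, seed=42)
respondents = [stu for members in picked.values() for stu in members]
```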

Stratified Sampling


SRS within mutually exclusive strata.
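A sketch of stratified sampling as an independent SRS inside each stratum. How many elements to take per stratum is a separate decision; this example assumes proportional allocation (n_l = n × N_l / N) purely for illustration, and the strata labels and sizes are hypothetical.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Independent SRS of sizes[s] elements within each stratum s."""
    rng = random.Random(seed)
    return {s: rng.sample(members, sizes[s]) for s, members in strata.items()}

# Hypothetical strata with 500, 300 and 200 members
strata = {
    "A": [f"A{i}" for i in range(500)],
    "B": [f"B{i}" for i in range(300)],
    "C": [f"C{i}" for i in range(200)],
}
N = sum(len(m) for m in strata.values())
n = 60
# Proportional allocation (an assumption for this example): n_l = n * N_l / N
sizes = {s: round(n * len(m) / N) for s, m in strata.items()}
sample = stratified_sample(strata, sizes, seed=3)
```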

Stratified vs. Cluster Sampling

Stratified Sampling
◦ Homogeneity within groups
◦ Heterogeneity between groups
◦ All groups are included
◦ Example: sample by ethnicity
◦ Purpose: increase precision; assure representation under randomness

Cluster Sampling
◦ Homogeneity between groups
◦ Heterogeneity within groups
◦ Random selection of groups
◦ Example: sample by class section
◦ Purpose: decrease cost

Sampling methods


Non-probability Sampling: Convenience Sampling

Definition: Approach whoever is most accessible

Disadvantage: Non-visitors and infrequent visitors are underrepresented

Non-probability Sampling: Judgment Sampling

Definition: Members are chosen by the researcher's subjective judgment

Disadvantage: Relies on the researcher's knowledge and experience

Non-probability Sampling: Snowball Sampling

Definition: Additional members are selected based on their relationship with current members (e.g., through referrals)

Disadvantage: Opposing voices are underrepresented
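A sketch of snowball sampling as a breadth-first walk over a referral network, assuming each respondent names the people they would refer. The `referrals` data and names are hypothetical, and following referrals wave by wave is one simple way to model the process, not a prescribed procedure. Note that "Gus" and "Hal" can never be reached from the seed, which mirrors how people outside the seed's network stay unrepresented.

```python
from collections import deque

def snowball_sample(referrals, seeds, target):
    """Start from seed respondents and add the people they refer, wave by wave."""
    sampled, queue = [], deque(seeds)
    seen = set(seeds)
    while queue and len(sampled) < target:
        person = queue.popleft()
        sampled.append(person)
        for friend in referrals.get(person, []):  # people this respondent refers
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return sampled

# Hypothetical referral network
referrals = {
    "Ann": ["Bob", "Cat"],
    "Bob": ["Ann", "Dan"],
    "Cat": ["Eve"],
    "Dan": [],
    "Eve": ["Fay"],
    "Gus": ["Hal"],  # not connected to the seed's network
}
sample = snowball_sample(referrals, ["Ann"], target=5)
```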

Non-probability Sampling: Quota Sampling

Definition: Convenience sampling within each of several mutually exclusive strata

Disadvantage: Non-visitors and infrequent visitors (within each stratum) are underrepresented
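A sketch of quota sampling as convenience sampling with a per-stratum cap. The order of `arrivals` stands in for "who is most accessible" (an assumption of this example), and the groups, quotas, and people are hypothetical. Whoever shows up first fills each quota; later arrivals are skipped once their stratum is full.

```python
def quota_sample(arrivals, quotas):
    """Take people in the order they become accessible until each stratum's quota is filled."""
    filled = {group: [] for group in quotas}
    for person, group in arrivals:
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break  # every quota met; stop approaching people
    return filled

arrivals = [("p1", "student"), ("p2", "student"), ("p3", "staff"), ("p4", "student"),
            ("p5", "faculty"), ("p6", "staff"), ("p7", "faculty")]
quotas = {"student": 2, "staff": 2, "faculty": 1}
result = quota_sample(arrivals, quotas)
```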

The Nielsen Method


Error

◦ What do we wish to learn from marketing research?
  ◦ Information. In most cases, information about a population, usually the mean of a certain variable (e.g., how much our target consumers like our product).
◦ How do we find this information (i.e., the parameter)?
  ◦ Census: ask/observe everyone, which is usually not feasible. So we use sampling.
◦ The likely distance between a statistic and the parameter is called ERROR. Any estimate from a random sample has error, usually expressed as “± X” or “± X%”.
◦ If you replicate the research with another random sample from the same population, the finding will “very likely” fall within that ± range. So will the parameter.
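A small simulation can make the “very likely” claim concrete. This sketch builds a hypothetical population of 20,000 ratings, treats its standard deviation σ as known, and repeatedly draws random samples of n = 100. Each time it checks whether the parameter falls within x̄ ± Z·σ/√n (the error formula derived under Sample Size Calculation), with Z = 1.96 for 95% confidence. The population values and the seed are made up for illustration.

```python
import math
import random
import statistics

rng = random.Random(2017)
# Hypothetical population of 20,000 ratings; the parameter is their mean
population = [rng.gauss(3.5, 1.2) for _ in range(20_000)]
mu = statistics.fmean(population)       # the parameter
sigma = statistics.pstdev(population)   # population standard deviation

n, z, reps = 100, 1.96, 2_000
error = z * sigma / math.sqrt(n)        # the "± X" at 95% confidence
hits = 0
for _ in range(reps):
    x_bar = statistics.fmean(rng.sample(population, n))  # one replication
    if abs(x_bar - mu) <= error:        # parameter inside x_bar ± error
        hits += 1
coverage = hits / reps
print(f"± {error:.3f}; share of samples whose range contains the parameter = {coverage:.3f}")
```

The printed share should land close to 0.95, which is what “95% confidence” means in practice.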


Intuition: What could matter for the error?

◦ The number of people in the sample?
◦ The actual average rating in that population?
◦ Population diversity (i.e., how much the individuals' opinions differ from each other)?
◦ The number of people in that population?
◦ The sampling method?

Error (revisited)

In the Error discussion above, “very likely” refers to the confidence level, and “± X” is the maximum error allowed.


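Sample Size Calculation *

[Figure: normal sampling distribution of the statistic x̄, centered on the parameter; the points −1.96 SD and +1.96 SD (the “Z score”) leave 2.5% in each tail]

The sample mean follows (approximately) a normal distribution.

Confidence   Tail (each side)   Z
90%          5%                 1.645
95%          2.5%               1.96
99%          0.5%               2.576
100%         0                  ∞

$$\text{Error}_{\text{allowed}} = Z \cdot \sigma_{\bar{x}} = Z \cdot \frac{\sigma}{\sqrt{n}}$$

$$\text{Error}_{\text{allowed}}^2 = Z^2 \cdot \frac{\sigma^2}{n} \quad\Rightarrow\quad n = \frac{\sigma^2 Z^2}{\text{Error}^2}$$

where σ² is the population variance, n is the sample size, σ_x̄ = σ/√n is the standard deviation of the sample mean, and Error_allowed is the maximum error allowed.

* Supplementary reading; not required for the exam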

Sample Size Calculation

$$n = \frac{\sigma^2 Z^2}{\text{Error}^2}$$

Questions:
1. To reduce the error, should n increase or decrease?
2. To increase confidence, should n increase or decrease?

◦ σ²: the variance of the population, i.e., how different the population elements are from each other
◦ Error: the range that the population parameter may fall into
◦ Z: the Z score of the confidence level, i.e., how confident we are that the population parameter falls into the above range
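The formula translates directly into a one-line helper. This sketch (the name `required_sample_size` is hypothetical) rounds up with `math.ceil`, since rounding down would leave the error slightly above the target.

```python
import math

def required_sample_size(sigma2, z, error):
    """n = sigma^2 * Z^2 / Error^2, rounded up to the next whole respondent."""
    return math.ceil(sigma2 * z**2 / error**2)

# Example: variance 1.0, 95% confidence (Z = 1.96), maximum error 0.1
n = required_sample_size(sigma2=1.0, z=1.96, error=0.1)
```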

Sample Size Calculation

Suppose you want to learn the average monthly living expense of all UCR students. You want to keep the error of your result within “±$50” with 90% confidence.

How many students do you need to sample?

$$n = \frac{\sigma^2 Z^2}{\text{Error}^2}$$

◦ Confidence = 90%, so Z = 1.645
◦ Error = 50
◦ σ² = ? (σ² can be learned from secondary data, experience, or a preliminary survey)
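To finish the example, σ² has to come from somewhere. This sketch assumes, purely hypothetically, that a preliminary survey suggested a standard deviation of about $300 (σ² = 90,000); the real value would have to be estimated as described above.

```python
import math

sigma2 = 300 ** 2   # hypothetical: SD of about $300 from a pilot survey
z = 1.645           # 90% confidence
error = 50          # ±$50

n = math.ceil(sigma2 * z**2 / error**2)  # 97.4..., rounded up
print(n)
```

Because σ² enters the formula directly, a pilot estimate that is too low will produce a sample that is too small.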

Sample Size Calculation

Suppose you want to learn the proportion of UCR students whose average monthly living expense is greater than $1,000. You want to keep the error of your result within “±5%” with 95% confidence.

How many students do you need to sample?

$$n = \frac{\sigma^2 Z^2}{\text{Error}^2}$$

◦ Confidence = 95%, so Z = 1.96
◦ Error = 0.05
◦ σ² = ? For proportion data (binomial distribution), σ² = p(1 − p)
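For a proportion, σ² = p(1 − p) depends on the unknown p. One common choice, sketched here, is p = 0.5, because p(1 − p) is largest there and the resulting n is therefore enough for any p. The second calculation assumes a hypothetical pilot estimate of p = 0.3 to show how a prior guess shrinks n.

```python
import math

def n_for_proportion(p, z, error):
    """Sample size for a proportion: sigma^2 = p(1 - p), n = sigma^2 * Z^2 / Error^2."""
    return math.ceil(p * (1 - p) * z**2 / error**2)

n_conservative = n_for_proportion(0.5, 1.96, 0.05)  # worst case p = 0.5
n_pilot = n_for_proportion(0.3, 1.96, 0.05)         # hypothetical pilot estimate
```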

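Stratified Sampling *

◦ Homogeneity within group
◦ Heterogeneity between groups
◦ All groups are included
◦ Example: sample by ethnicity
◦ Purpose: increase precision

$$\text{Error}^2 = Z^2 \cdot \frac{\sigma^2}{n}$$

Because each stratum is internally homogeneous, the variance that matters is the smaller within-stratum variance, so the error shrinks for the same n.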


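Stratified Sampling *

L: total number of strata

N_l: the population size of stratum l (N = Σ N_l)

n_l: the sample size of stratum l

x̄_l: the mean of the sample from stratum l

σ_l: the population standard deviation within stratum l

W_l: the weight assigned to stratum l (0 < W_l < 1, typically W_l = N_l / N), so that

$$\bar{x} = \sum_{l=1}^{L} W_l \bar{x}_l$$

$$\operatorname{Var}(\bar{x}) \approx \sum_{l=1}^{L} \frac{W_l^2 \sigma_l^2}{n_l}$$

Neyman allocation:

$$n_l = n \cdot \frac{W_l \sigma_l}{\sum_{k=1}^{L} W_k \sigma_k}$$

with which Var(x̄) reaches its minimum value:

$$\operatorname{Var}_{\min}(\bar{x}) = \frac{\left(\sum_{l=1}^{L} W_l \sigma_l\right)^2}{n}$$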

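A numeric check of Neyman allocation, with hypothetical stratum weights and standard deviations. The sketch computes the allocation n_l = n·W_lσ_l / ΣW_kσ_k, plugs it into Var(x̄) ≈ Σ W_l²σ_l²/n_l (ignoring any finite-population correction), and compares the result with the closed-form minimum (ΣW_lσ_l)²/n and with proportional allocation (n_l = n·W_l). Allocations are left fractional here; in practice they would be rounded to whole respondents.

```python
def neyman_allocation(weights, sds, n):
    """n_l = n * W_l * sigma_l / sum(W_k * sigma_k); fractional, round in practice."""
    total = sum(w * s for w, s in zip(weights, sds))
    return [n * w * s / total for w, s in zip(weights, sds)]

def var_stratified_mean(weights, sds, sizes):
    """Var(x_bar) ~ sum(W_l^2 * sigma_l^2 / n_l), no finite-population correction."""
    return sum(w**2 * s**2 / m for w, s, m in zip(weights, sds, sizes))

# Hypothetical strata: weights W_l and within-stratum standard deviations sigma_l
weights = [0.5, 0.3, 0.2]
sds = [10.0, 20.0, 40.0]
n = 200

neyman = neyman_allocation(weights, sds, n)
proportional = [n * w for w in weights]  # comparison: n_l proportional to W_l
v_neyman = var_stratified_mean(weights, sds, neyman)
v_prop = var_stratified_mean(weights, sds, proportional)
v_min = sum(w * s for w, s in zip(weights, sds)) ** 2 / n
print(f"Neyman: {v_neyman:.3f}  proportional: {v_prop:.3f}  formula minimum: {v_min:.3f}")
```

Neyman allocation puts more respondents in the high-variance stratum (σ = 40), which is why it beats proportional allocation here.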

Critical Thinking Questions

Sample size does not depend on population size?

$$n = \frac{\sigma^2 Z^2}{\text{Error}^2}$$

◦ Nielsen TV rating sampling
  ◦ 114M households
  ◦ 20,000 sampled
  ◦ Reasonable?

In business practice, sample size sometimes IS a function of population size. Based on this formula, think about why.
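One way to probe the Nielsen question is to run the formula in reverse: given n, what error does it imply? This sketch treats the rating as a proportion and the 20,000 households as a simple random sample, both simplifying assumptions (Nielsen's actual design is not described in these slides), and uses the worst case p = 0.5 at 95% confidence. The 114M population size never enters the calculation.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Error = Z * sqrt(p(1 - p) / n); the population size N does not appear."""
    return z * math.sqrt(p * (1 - p) / n)

err = margin_of_error(0.5, 20_000)  # worst case p = 0.5, 95% confidence
print(f"± {err * 100:.2f} percentage points")
```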