what could have gone wrong
Sampling: BUS 115, Winter 2017
The Process of Gathering Primary Data
◦ Define what you want to learn (known as: the research question)
◦ Define whom you want to learn from (known as: the population)
◦ Decide whether you could/want to reach all or some of them
  ◦ All: Census
  ◦ Some: Sampling
◦ Learn from the selected people (known as: research methods)
Census vs. Sampling
Census measures the Parameter
◦ A characteristic or measure of a population
◦ The variable that you are interested in (e.g., how much all of our target consumers like us)
Sampling measures the Statistic
◦ A characteristic or measure of a sample
◦ Statistics are calculated from sample data and used to estimate population parameters (e.g., how much this sample of consumers like us)
Census vs. Sampling
◦ Ideal method
  ◦ Census: Ask everyone in the city
◦ Realistic (cost-effective) method
  ◦ Sample from the population: Nielsen Ratings
Example: TV Ratings
Procedure for Drawing a Sample
◦ Step 1: Identify the sampling frame
◦ Step 2: Select a sampling method
◦ Step 3: Determine the sample size
◦ Step 4: Collect data from the sample elements
Identify the sampling frame
Sampling frame
The LIST of population elements from which a sample (n) will be drawn
◦ May not cover the entire population (non-coverage error)
◦ BMW customers vs. McDonald’s customers
Sampling methods
◦ Probability samples
  ◦ Everyone has a known chance of being selected
◦ Non-probability samples
  ◦ Everyone’s chance of being selected is affected by the researcher’s judgment
Simple Random Sample (SRS)
Each member of the population has an equal probability of being selected.
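A minimal sketch of a simple random sample, assuming the sampling frame is available as a Python list. The frame of 1,000 customer IDs, the function name `simple_random_sample`, and the seed are hypothetical and only there for illustration.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n elements without replacement; every element is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical sampling frame: 1,000 customer IDs.
frame = list(range(1, 1001))
srs = simple_random_sample(frame, n=50, seed=115)
print(len(srs), "distinct IDs drawn:", len(set(srs)) == len(srs))
```

Fixing the seed only makes the draw repeatable; it does not change each element's chance of selection.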
Systematic Sampling
A random start with a constant skip interval.
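One way to sketch "random start, constant skip interval" in code. It assumes the frame is an ordered list and that the sample size is no larger than the frame; the frame, function name, and seed are hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick a random start, then take every k-th element of the frame."""
    k = len(frame) // n           # constant skip interval (assumes n <= len(frame))
    rng = random.Random(seed)
    start = rng.randrange(k)      # random start within the first interval
    return frame[start::k][:n]


# Hypothetical sampling frame: 1,000 customer IDs in list order.
frame = list(range(1, 1001))
sys_sample = systematic_sample(frame, n=50, seed=115)
print(sys_sample[:5])
```

With 1,000 elements and n = 50 the skip interval is 20, so consecutive selected IDs are always 20 apart.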
Cluster Sampling
SRS among mutually exclusive clusters; census within each selected cluster.
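A small sketch of the two stages, assuming clusters are stored as a dictionary that maps a cluster name to its members. The class sections, student labels, and function name are made up for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """SRS of clusters, then a census (every member) of each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample


# Hypothetical clusters: 10 class sections with 30 students each.
clusters = {
    f"section_{i}": [f"s{i}_{j}" for j in range(30)] for i in range(1, 11)
}
chosen, sample = cluster_sample(clusters, n_clusters=3, seed=115)
print(chosen, len(sample))
```

Only the selected sections are visited, which is where the cost saving of cluster sampling comes from.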
Stratified Sampling
SRS within mutually exclusive strata.
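A sketch of SRS within each stratum. The slide does not say how many elements to take per stratum, so this example assumes the same sampling fraction in every stratum (proportional allocation); the strata, fraction, and function name are hypothetical.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Draw an SRS inside every stratum (ASSUMPTION: same fraction in each)."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        n_l = max(1, round(len(members) * fraction))  # at least one per stratum
        sample[name] = rng.sample(members, n_l)
    return sample


# Hypothetical strata of unequal size (IDs are just integers).
strata = {
    "A": list(range(0, 600)),
    "B": list(range(600, 900)),
    "C": list(range(900, 1000)),
}
strat = stratified_sample(strata, fraction=0.05, seed=115)
print({name: len(members) for name, members in strat.items()})
```

Every stratum contributes to the sample, unlike cluster sampling, where only the randomly chosen groups do.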
Stratified vs. Cluster Sampling
Stratified Sampling
◦ Homogeneity within group
◦ Heterogeneity between groups
◦ All groups are included
◦ Example: sample by ethnicity
◦ Purpose: increase precision; assure representation under randomness
Cluster Sampling
◦ Homogeneity between groups
◦ Heterogeneity within group
◦ Random selection of groups
◦ Example: sample by class section
◦ Purpose: decrease cost
Sampling methods
Non-probability Sampling: Convenience Sampling
◦ Definition: Approach whoever is most accessible
◦ Disadvantage: non- or infrequent visitors are underrepresented
Non-probability Sampling: Judgment Sampling
◦ Definition: Subjective choice
◦ Disadvantage: Relies on the researcher’s knowledge and experience
Non-probability Sampling: Snowball Sampling
◦ Definition: Selection of additional members is based on their relationship with the current one
◦ Disadvantage: opposing voices are underrepresented
Non-probability Sampling: Quota Sampling
◦ Definition: Convenience sampling within each mutually exclusive stratum
◦ Disadvantage: non- or infrequent visitors (within the stratum) are underrepresented
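Quota sampling can be sketched as taking people in the order they happen to be encountered until each stratum's quota is filled. The arrival list, the strata labels, and the quotas below are invented for illustration.

```python
def quota_sample(arrivals, quotas):
    """Accept people in encounter order until every stratum's quota is full."""
    counts = {stratum: 0 for stratum in quotas}
    sample = []
    for person, stratum in arrivals:
        if stratum in counts and counts[stratum] < quotas[stratum]:
            sample.append(person)
            counts[stratum] += 1
        if counts == quotas:      # all quotas filled, stop approaching people
            break
    return sample


# Hypothetical stream of (person, stratum) pairs in the order encountered.
arrivals = [("p1", "F"), ("p2", "F"), ("p3", "M"),
            ("p4", "F"), ("p5", "M"), ("p6", "M")]
quotas = {"F": 2, "M": 2}
quota = quota_sample(arrivals, quotas)
print(quota)  # ['p1', 'p2', 'p3', 'p5']
```

Nothing here is random: whoever shows up first fills the quota, which is why infrequent visitors within a stratum tend to be underrepresented.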
The Nielsen Method
Error
◦ What do we wish to learn from marketing research?
  ◦ Information. In most cases, the information about a population, usually the mean of a certain variable (e.g., how much our target consumers like our product).
◦ How to find the information (i.e., the parameter)?
  ◦ Census: ask/observe everyone, which is usually not feasible. So, we do sampling.
◦ The likely distance between a statistic and the parameter is called ERROR. A random sample will always have error, usually expressed as “± X” or “± X%”.
◦ If you replicate the research with another random sample from the same population, the finding will be “very likely” to fall in that ± range. So will the parameter.
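The replication idea can be checked with a small simulation. This is a sketch under stated assumptions: the population of 10,000 ratings is synthetic, the sample size of 100 is arbitrary, and the ± range is computed as 1.96 × σ/√n, the 95% form of the sample-size relation used later in the deck.

```python
import random
import statistics

rng = random.Random(115)

# Hypothetical population: 10,000 ratings clipped to a 0-10 scale.
population = [min(10.0, max(0.0, rng.gauss(6.5, 2.0))) for _ in range(10_000)]
parameter = statistics.fmean(population)   # what a census would measure
sigma = statistics.pstdev(population)      # population standard deviation

n = 100
error = 1.96 * sigma / n ** 0.5            # the "± X" range at 95% confidence

# Replicate the study many times with fresh random samples.
replications = 1000
hits = 0
for _ in range(replications):
    statistic = statistics.fmean(rng.sample(population, n))
    if abs(statistic - parameter) <= error:
        hits += 1
coverage = hits / replications

print(f"parameter = {parameter:.2f}, error = ±{error:.2f}, "
      f"share of samples within range = {coverage:.1%}")
```

The share of samples landing within the ± range should come out close to 95%, which is what "very likely" means in this setup.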
What could matter for the error?
Intuition:
◦ The number of people in the sample?
◦ The actual average rating in that population?
◦ Population diversity (i.e., how much the individuals’ opinions differ from each other)?
◦ The number of people in that population?
◦ The sampling method?
Error
◦ “Confidence”: how “very likely” the finding is to fall in the ± range
◦ “Maximum error allowed”: the “± X” or “± X%” range
Sample Size Calculation *
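Sample mean follows Normal distribution.
[Figure: normal curve of the statistic \( \bar{x} \) centered on the parameter, with 2.5% in each tail beyond −1.96 SD and +1.96 SD]
“Z Score”
Confidence   Tail    Z
90%          5%      1.645
95%          2.5%    1.96
99%          0.5%    2.575
100%         0       ∞
Maximum error allowed:
\( \text{Error}_{\text{allowed}} = Z \cdot \text{SD}(\bar{x}) \), where \( \text{SD}(\bar{x}) \) is the standard deviation of the sample mean
\( \text{Error}_{\text{allowed}}^2 = Z^2 \cdot \text{Var}(\bar{x}) = Z^2 \cdot \sigma^2 / n \)
\( \sigma^2 \): population variance; \( n \): sample size
\( \Rightarrow n = \sigma^2 Z^2 / \text{Error}^2 \)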
* Supplementary reading; not required in the exam
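The Z scores in the table can be reproduced from the standard normal distribution. A short sketch using Python's standard library; the function name `z_for_confidence` is just an illustrative choice.

```python
from statistics import NormalDist


def z_for_confidence(confidence):
    """Two-sided Z score: leave (1 - confidence) / 2 in each tail."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)


for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%} confidence: tail = {(1 - c) / 2:.1%}, Z = {z_for_confidence(c):.3f}")
```

The 99% value computes to about 2.576; the table lists it as 2.575, a common textbook rounding.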
Sample Size Calculation
\( n = \sigma^2 Z^2 / \text{Error}^2 \)
◦ \( \sigma^2 \): the variance of the population (how different the population elements are from each other)
◦ Error: the range that the population parameter may fall into
◦ \( Z \): (the Z score of) the confidence level that the population parameter may fall into the above range
Questions:
1. To reduce error, n?
2. To increase confidence, n?
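A quick way to explore the two questions is to plug numbers into the formula. This is only a sketch: the variance of 100 and the error values are arbitrary, and rounding up to a whole respondent is an added convention, not something the slide states.

```python
import math


def sample_size(variance, z, error):
    """n = variance * Z^2 / Error^2, rounded up to a whole respondent."""
    return math.ceil(variance * z ** 2 / error ** 2)


variance = 100.0  # hypothetical population variance

base = sample_size(variance, z=1.96, error=2.0)         # 95% confidence, ±2
half_error = sample_size(variance, z=1.96, error=1.0)   # same confidence, ±1
more_conf = sample_size(variance, z=2.575, error=2.0)   # 99% confidence, ±2

print(f"baseline n = {base}")
print(f"halving the error: n = {half_error}")
print(f"raising confidence to 99%: n = {more_conf}")
```

Because Error is squared in the denominator, halving the allowed error roughly quadruples the required sample size.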
Sample Size Calculation
Suppose you want to learn the average of all UCR students’ monthly living expenses. You want to keep the error of your result within “±$50” with 90% confidence.
How many students do you need to sample?
\( n = \sigma^2 Z^2 / \text{Error}^2 \)
◦ Confidence = 90%, so \( Z = 1.645 \)
◦ Error = 50
◦ \( \sigma^2 \) = ? (\( \sigma^2 \) is learned from secondary data, experience, or a preliminary survey)
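Since the slide leaves σ² open, the sketch below tries a few candidate standard deviations to show how the answer moves. The values $200, $300, and $400 are pure assumptions for illustration, not data about UCR students.

```python
import math

z = 1.645      # 90% confidence
error = 50.0   # ±$50

# ASSUMPTION: candidate population standard deviations in dollars.
# In practice sigma would come from secondary data or a preliminary survey.
candidate_sigmas = [200.0, 300.0, 400.0]

sizes = {}
for sigma in candidate_sigmas:
    sizes[sigma] = math.ceil(sigma ** 2 * z ** 2 / error ** 2)
    print(f"sigma = ${sigma:.0f}: n = {sizes[sigma]}")
```

Doubling the assumed σ from $200 to $400 roughly quadruples n, so the preliminary estimate of σ² matters a great deal.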
Sample Size Calculation
Suppose you want to learn the proportion of UCR students whose average monthly living expense is greater than $1,000. You want to keep the error of your result within “±5%” with 95% confidence.
How many students do you need to sample?
\( n = \sigma^2 Z^2 / \text{Error}^2 \)
◦ Confidence = 95%, so \( Z = 1.96 \)
◦ Error = 0.05
◦ \( \sigma^2 \) = ? For proportion data (Binomial distribution), \( \sigma^2 = p(1-p) \)
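The proportion p is unknown before the survey, so the sketch below evaluates the formula for a few assumed values. Using p = 0.5 is a common conservative choice because p(1 − p) is largest there; all values of p here are assumptions.

```python
import math


def sample_size_for_proportion(p, z, error):
    """n = p(1 - p) * Z^2 / Error^2, rounded up."""
    return math.ceil(p * (1 - p) * z ** 2 / error ** 2)


z = 1.96       # 95% confidence
error = 0.05   # ±5 percentage points

# ASSUMPTION: guesses for the unknown proportion p.
for p in (0.2, 0.35, 0.5):
    print(f"p = {p}: n = {sample_size_for_proportion(p, z, error)}")

worst_case = sample_size_for_proportion(0.5, z, error)
```

If nothing is known about p, planning for the p = 0.5 case guarantees the error target is met whatever the true proportion turns out to be.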
Stratified Sampling *
◦ Homogeneity within group
◦ Heterogeneity between groups
◦ All groups are included
◦ Example: sample by ethnicity
◦ Purpose: Increase precision
\( \text{Error}^2 = Z^2 \cdot \sigma^2 / n \)
* Supplementary reading; not required in the exam
Stratified Sampling *
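◦ \( L \): total number of strata
◦ \( N_l \): the population size of stratum \( l \)
◦ \( n_l \): the sample size of stratum \( l \)
◦ \( \bar{x}_l \): the mean of the sample from stratum \( l \)
◦ \( W_l \): the weight assigned to stratum \( l \) (\( 0 < W_l < 1 \)), so that
\( \bar{x} = \sum_{l=1}^{L} W_l \bar{x}_l \)
\( \text{Var}(\bar{x}) \approx \sum_{l=1}^{L} W_l^2 \sigma_l^2 / n_l \)
Neyman Allocation: \( n_l = n W_l \sigma_l / \sum_{k=1}^{L} W_k \sigma_k \), with which \( \text{Var}(\bar{x}) \) reaches the minimum value: \( \left( \sum_{l=1}^{L} W_l \sigma_l \right)^2 / n \)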
* Supplementary reading; not required in the exam
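A numerical sketch of Neyman allocation using the variance formula above. The three strata, their weights, and their standard deviations are hypothetical, and the allocations are left fractional (a real study would round them to whole respondents).

```python
def neyman_allocation(n, weights, sigmas):
    """n_l = n * W_l * sigma_l / sum(W_k * sigma_k)."""
    products = [w * s for w, s in zip(weights, sigmas)]
    total = sum(products)
    return [n * p / total for p in products]


def variance_of_mean(weights, sigmas, sizes):
    """Var(x_bar) ~ sum(W_l^2 * sigma_l^2 / n_l)."""
    return sum(w ** 2 * s ** 2 / n_l for w, s, n_l in zip(weights, sigmas, sizes))


# Hypothetical strata: weights sum to 1; sigma differs a lot across strata.
weights = [0.5, 0.3, 0.2]
sigmas = [10.0, 20.0, 40.0]
n = 200

neyman = neyman_allocation(n, weights, sigmas)
proportional = [n * w for w in weights]   # comparison: n_l proportional to W_l

var_neyman = variance_of_mean(weights, sigmas, neyman)
var_prop = variance_of_mean(weights, sigmas, proportional)

print("Neyman n_l:", [round(x, 1) for x in neyman])
print(f"Var with Neyman allocation:       {var_neyman:.3f}")
print(f"Var with proportional allocation: {var_prop:.3f}")
```

Neyman allocation shifts sample toward the strata with larger W_l σ_l, which is why its variance is lower than that of proportional allocation with the same total n.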
Critical Thinking Questions
Sample size does not depend on population size?
\( n = \sigma^2 Z^2 / \text{Error}^2 \)
◦ Nielsen TV Rating sampling
  ◦ 114M households
  ◦ 20,000 sample
  ◦ Reasonable?
In business practice, sample size sometimes IS a function of population size. Based on this formula, think of why.
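One way to approach the "Reasonable?" question is to rearrange the formula and ask what error a sample of 20,000 implies. This sketch assumes a rating is a proportion with the worst-case p = 0.5 and 95% confidence; both are assumptions, not Nielsen's actual design.

```python
import math

n = 20_000    # sample size from the slide
z = 1.96      # ASSUMPTION: 95% confidence
p = 0.5       # ASSUMPTION: worst-case proportion, so sigma^2 = p(1 - p) = 0.25

# Rearranged from n = sigma^2 * Z^2 / Error^2.
implied_error = z * math.sqrt(p * (1 - p) / n)

print(f"implied error = ±{implied_error:.2%}")
```

Under these assumptions the implied error is below one percentage point, and the 114M population size never enters the calculation.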