BUSMGT 712 Business Analytics
Week 3 – Sampling Distribution & Confidence Interval
Agenda
· Sampling
· Sampling Error
· Sampling Distribution
· Confidence Interval
Sampling, Sampling Error, Sampling Distribution
What is ‘Sampling’ and Why?
· Sampling – The process by which members of a population are selected for a sample.
· Random sampling - A sampling method to gather a representative sample of the population data.
· The purpose of sampling is to obtain sufficient information to draw a valid inference about a population.
Estimating Population Parameters
| 27 | 20 | 24 | 25 | 26 | 22 | 22 |
| 26 | 27 | 24 | 21 | 22 | 20 | 22 |
| 27 | 22 | 20 | 27 | 26 | 27 | 27 |
| 20 | 25 | 20 | 27 | 27 | 27 | 20 |
| 23 | 22 | 27 | 26 | 25 | 27 | 26 |
| 26 | 21 | 24 | 27 | 21 | 21 | 23 |
| 35 | 25 | 26 | 22 | 25 | 27 | 24 |
| 27 | 20 | 22 | 22 | 27 | 23 | 22 |
| 20 | 20 | 20 | 27 | 21 | 21 | 20 |
| 27 | 22 | 27 | 20 | 24 | 24 | 25 |
| 27 | 26 | 21 | 21 | 25 | 23 | 25 |
| 24 | 24 | 24 | 22 | 20 | 23 | 23 |
| 20 | 20 | 26 | 23 | 27 | 24 | 20 |
| 40 | 23 | 20 | 26 | 24 | 22 | |
| 20 | 22 | 21 | 27 | 26 | 22 | |
| 34 | 21 | 26 | 27 | 24 | 20 | |
| 24 | 23 | 26 | 20 | 20 | 21 | |
| 24 | 24 | 20 | 24 | 27 | 24 | |
| 25 | 22 | 23 | 24 | 22 | 24 | |
| 26 | 26 | 25 | 24 | 27 | 23 | |
· For example, suppose we define Cohort 14, which has 133 students, as a population.
· The age of every student is listed in the table above.
· Population Mean=24
· Population Standard Deviation=3
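The two population parameters above can be reproduced from the listed ages. A minimal sketch in Python, assuming the 133 ages are read row by row from the table; the variable names are illustrative and not part of the course material:

```python
import statistics

# Ages of the 133 students in Cohort 14, read row by row from the table.
ages = [
    27, 20, 24, 25, 26, 22, 22,
    26, 27, 24, 21, 22, 20, 22,
    27, 22, 20, 27, 26, 27, 27,
    20, 25, 20, 27, 27, 27, 20,
    23, 22, 27, 26, 25, 27, 26,
    26, 21, 24, 27, 21, 21, 23,
    35, 25, 26, 22, 25, 27, 24,
    27, 20, 22, 22, 27, 23, 22,
    20, 20, 20, 27, 21, 21, 20,
    27, 22, 27, 20, 24, 24, 25,
    27, 26, 21, 21, 25, 23, 25,
    24, 24, 24, 22, 20, 23, 23,
    20, 20, 26, 23, 27, 24, 20,
    40, 23, 20, 26, 24, 22,
    20, 22, 21, 27, 26, 22,
    34, 21, 26, 27, 24, 20,
    24, 23, 26, 20, 20, 21,
    24, 24, 20, 24, 27, 24,
    25, 22, 23, 24, 22, 24,
    26, 26, 25, 24, 27, 23,
]

population_mean = statistics.mean(ages)
# pstdev divides by N because the whole population is observed.
population_sd = statistics.pstdev(ages)

print(len(ages), round(population_mean), round(population_sd))
```

Because every member of the population is listed, the population formula (`pstdev`) is used here rather than the sample formula.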
Estimating Population Parameters
· In most cases, it is very hard to get data for an entire population.
· So the dataset we work with is more likely to be a sample.
· Let’s do an experiment and randomly select 5 students from Cohort 14 and ask for their ages.
https://hotcubator.com.au/research/acomplete-guide-to-sampling-techniques/
Estimating Population Parameters
| 22 | 22 | 20 | 21 | 24 |
· For this random sample:
· Sample Mean = 22
· Sample Standard Deviation=1
Recall that
· True Population Mean=24
· True Population Standard Deviation=3
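The sample statistics on this slide can be checked with a few lines of Python. This is a small sketch using the five ages shown above; the variable names are illustrative:

```python
import statistics

sample = [22, 22, 20, 21, 24]  # ages of the 5 randomly selected students

sample_mean = statistics.mean(sample)  # point estimate of the population mean
sample_sd = statistics.stdev(sample)   # sample standard deviation (divides by n - 1)

print(round(sample_mean), round(sample_sd))  # prints: 22 1
```

Both estimates fall short of the true population values (24 and 3), which is the gap the following slides explore.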
Estimating Population Parameters
· Some definitions that you have to know:
· Estimation involves assessing the value of an unknown population parameter using sample data.
· Estimators are the measures used to estimate population parameters, e.g. sample mean, sample variance, sample proportion.
· A point estimate is a single number derived from sample data that is used to estimate the value of a population parameter.
Estimating Population Parameters
| 22 | 22 | 20 | 21 | 24 |
· So, this is a case of point estimation.
· The sample mean and sample standard deviation are called ‘estimators’.
· Sample Mean = 22
· Sample Standard Deviation=1
Recall that
· True Population Mean=24
· True Population Standard Deviation=3
· Can this sample perfectly predict the population mean?
Estimating Population Parameters
| 22 | 27 | 20 | 26 | 23 |
· Now, I randomly select 5 students from the population again:
· Sample Mean = 24
· Sample Standard Deviation=3
Recall that
· True Population Mean=24
· True Population Standard Deviation=3
· This sample seems to be a ‘better’ random sample.
Estimating Population Parameters
| | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 | Sample 6 | Sample 7 | Sample 8 | Sample 9 | Sample 10 |
| | 24 | 22 | 27 | 22 | 24 | 20 | 20 | 26 | 20 | 26 |
| | 28 | 27 | 26 | 21 | 23 | 26 | 25 | 35 | 40 | 20 |
| | 20 | 20 | 27 | 20 | 22 | 23 | 25 | 27 | 20 | 27 |
| | 22 | 26 | 20 | 21 | 22 | 24 | 23 | 20 | 34 | 22 |
| | 27 | 23 | 23 | 24 | 27 | 22 | 20 | 27 | 24 | 25 |
| Mean | 24 | 24 | 25 | 22 | 24 | 23 | 23 | 27 | 28 | 24 |
| SD | 3 | 3 | 3 | 2 | 2 | 2 | 3 | 5 | 9 | 3 |
· This time, I randomly selected 5 students from Cohort 14 ten times, and obtained the ‘mean’ and ‘SD’ for each sample.
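The Mean and SD rows of the ten-sample table can be recomputed column by column. A minimal sketch, assuming each column of the table is one sample of 5 students and that the SD row uses the sample formula (dividing by n - 1), which reproduces the rounded values shown:

```python
import statistics

# Rows of the table; each column is one sample of 5 students.
rows = [
    [24, 22, 27, 22, 24, 20, 20, 26, 20, 26],
    [28, 27, 26, 21, 23, 26, 25, 35, 40, 20],
    [20, 20, 27, 20, 22, 23, 25, 27, 20, 27],
    [22, 26, 20, 21, 22, 24, 23, 20, 34, 22],
    [27, 23, 23, 24, 27, 22, 20, 27, 24, 25],
]

# Transpose so that each inner list holds one sample.
samples = [list(column) for column in zip(*rows)]

means = [round(statistics.mean(s)) for s in samples]
sds = [round(statistics.stdev(s)) for s in samples]

print(means)  # [24, 24, 25, 22, 24, 23, 23, 27, 28, 24]
print(sds)    # [3, 3, 3, 2, 2, 2, 3, 5, 9, 3]
```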
Estimating Population Parameters
· The sample statistics (e.g. mean, s.d.) differ from sample to sample.
· Among the ten samples above, samples 1, 2, and 10 match the population mean (24) and s.d. (3) after rounding.
· The variation within each sample also differs considerably, so we can observe some ‘big’ and ‘small’ s.d.s (e.g. for sample 9, s.d.=9; for sample 4, s.d.=2).
Sampling Error
· Sampling Error: The variation that occurs due to selecting a single sample from the population (compare the ten sample means above, which range from 22 to 28, with the population mean of 24).
· It depends primarily on the variation in the population itself and on the size of the sample selected. Larger samples have less sampling error, but are more costly to take.
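The effect of sample size on sampling error can be illustrated with a small simulation. This is only a sketch: the population below is hypothetical (133 ages drawn uniformly from 20 to 27, not the actual Cohort 14 data), and the function name `mean_abs_error` is made up for the example.

```python
import random
import statistics

random.seed(712)  # fixed seed so the run is repeatable

# Hypothetical population: 133 ages drawn uniformly from 20 to 27.
population = [random.randint(20, 27) for _ in range(133)]
mu = statistics.mean(population)

def mean_abs_error(n, repeats=2000):
    """Average |sample mean - population mean| over many random samples of size n."""
    errors = []
    for _ in range(repeats):
        sample = random.sample(population, n)  # simple random sample, without replacement
        errors.append(abs(statistics.mean(sample) - mu))
    return statistics.mean(errors)

error_n5 = mean_abs_error(5)
error_n20 = mean_abs_error(20)
print(f"average error, n=5: {error_n5:.2f}; n=20: {error_n20:.2f}")
```

Under these assumptions, the average distance between the sample mean and the population mean is noticeably smaller for n=20 than for n=5.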
Estimating Population Parameters
| | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 | Sample 6 | Sample 7 | Sample 8 | Sample 9 | Sample 10 |
| | 20 | 23 | 27 | 25 | 24 | 27 | 23 | 26 | 26 | 24 |
| | 25 | 25 | 21 | 27 | 26 | 22 | 25 | 23 | 27 | 21 |
| | 22 | 24 | 20 | 20 | 24 | 20 | 25 | 24 | 21 | 20 |
| | 25 | 20 | 27 | 20 | 22 | 21 | 27 | 20 | 27 | 27 |
| | 27 | 23 | 22 | 20 | 23 | 20 | 20 | 26 | 22 | 22 |
| | 26 | 27 | 20 | 20 | 23 | 35 | 20 | 20 | 25 | 25 |
| | 24 | 20 | 26 | 25 | 25 | 24 | 25 | 20 | 24 | 26 |
| | 20 | 21 | 26 | 27 | 24 | 26 | 26 | 27 | 26 | 26 |
| | 24 | 22 | 22 | 22 | 22 | 27 | 21 | 22 | 24 | 27 |
| | 21 | 40 | 24 | 22 | 24 | 27 | 26 | 23 | 23 | 24 |
| | 21 | 27 | 27 | 27 | 26 | 24 | 22 | 27 | 27 | 23 |
| | 24 | 26 | 27 | 27 | 27 | 27 | 21 | 21 | 21 | 27 |
| | 23 | 27 | 25 | 26 | 27 | 23 | 21 | 27 | 23 | 25 |
| | 27 | 24 | 20 | 26 | 23 | 21 | 27 | 25 | 20 | 24 |
| | 24 | 27 | 40 | 24 | 27 | 23 | 24 | 22 | 26 | 27 |
| | 23 | 23 | 27 | 27 | 22 | 20 | 24 | 26 | 25 | 20 |
| | 20 | 21 | 21 | 24 | 25 | 27 | 21 | 27 | 25 | 24 |
| | 24 | 22 | 21 | 27 | 25 | 23 | 24 | 22 | 35 | 27 |
| | 20 | 26 | 22 | 25 | 20 | 21 | 22 | 25 | 21 | 25 |
| | 21 | 24 | 24 | 24 | 27 | 26 | 20 | 27 | 21 | 40 |
| Mean | 23 | 25 | 24 | 24 | 24 | 24 | 23 | 24 | 24 | 25 |
| S.D. | 2 | 4 | 5 | 3 | 2 | 4 | 2 | 3 | 3 | 4 |
· When we increase the sample size to n=20, the sampling error appears smaller: the sample means now range only from 23 to 25, compared with 22 to 28 for n=5.
Sampling Distribution
· If you create a histogram of the ten sample means for n=5 (24, 24, 25, 22, 24, 23, 23, 27, 28, 24), you will find that, in this case, it is right-skewed.
Sampling Distribution
· However, when the sample size is increased to n=20, the distribution of the 10 sample means (23, 25, 24, 24, 24, 24, 23, 24, 24, 25) is close to a symmetric ‘bell’ shape (normal distribution).
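The two shapes described on these slides can be seen in a rough text histogram. The sketch below uses the rounded sample means from the n=5 and n=20 tables; the helper name `text_histogram` is illustrative only.

```python
import statistics
from collections import Counter

# Rounded sample means taken from the two ten-sample tables.
means_n5 = [24, 24, 25, 22, 24, 23, 23, 27, 28, 24]
means_n20 = [23, 25, 24, 24, 24, 24, 23, 24, 24, 25]

def text_histogram(values):
    """One line per integer value between min and max, with one star per occurrence."""
    counts = Counter(values)
    return "\n".join(
        f"{v}: {'*' * counts[v]}" for v in range(min(values), max(values) + 1)
    )

print("n=5")
print(text_histogram(means_n5))
print("n=20")
print(text_histogram(means_n20))

# Spread of the sample means themselves, for comparison.
spread_n5 = statistics.stdev(means_n5)
spread_n20 = statistics.stdev(means_n20)
print(f"spread of means: n=5 {spread_n5:.2f}, n=20 {spread_n20:.2f}")
```

With only ten means per table this is a coarse picture, but the n=5 means show a tail towards 27 and 28, while the n=20 means cluster symmetrically around 24.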
Sampling Distribution
· Sampling Distribution – The probability distribution of a sample statistic for all possible samples of a given size n for a given population.
[Figure: histograms of the sample means for n=5 and n=20]
Data Distribution vs. Sampling Distribution
[Figure: Population Distribution (x_i), N=133; Sampling Distribution (x̄), n=5; Sampling Distribution (x̄), n=20]
Central Limit Theorem
· When the population has a normal distribution, the sampling distribution of x̄ is normally distributed for any sample size.
· When the population does not have a normal distribution, the central limit theorem is helpful in identifying the shape of the sampling distribution of x̄.
· Central Limit Theorem: In selecting random samples of size n from a population, the sampling distribution of the sample mean x̄ can be approximated by a normal distribution as the sample size becomes large.
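The central limit theorem can be illustrated by simulation. The sketch below is an assumption-laden example, not course material: it uses a hypothetical right-skewed population (exponential with mean 1) and measures how the skewness of the sample means shrinks as n grows. The function names are made up for the example.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

def skewness(values):
    """Mean cubed deviation divided by the cubed standard deviation (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3

def sample_means(n, repeats=5000):
    """Means of `repeats` random samples of size n from a hypothetical exponential population."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(repeats)
    ]

skew_by_n = {n: skewness(sample_means(n)) for n in (2, 5, 30)}
for n, skew in skew_by_n.items():
    print(f"n={n}: skewness of sample means = {skew:.2f}")
```

Under these assumptions the skewness moves towards 0 as n increases, meaning the distribution of x̄ becomes more symmetric and bell-shaped even though the population itself is strongly skewed.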
Central Limit Theorem
· Left panel shows that none of the populations are normally distributed
· Bottom three panels show the shape of the sampling distribution for samples n=2, n=5, and n=30
· General statistical practice is to assume that, for most applications, the sampling distribution can be approximated by normal distribution whenever the sample size is 30 or more
A Comparison of the Sampling Distributions of x̄ for Simple Random Samples of n=20 and n=50 Students
Confidence Interval
Interval Estimation
· A point estimator (e.g. Sample Mean) CANNOT be expected to provide the exact value of a population parameter.
· Therefore, interval estimation is frequently used to generate an estimate of the value of a population parameter.
· An interval estimate is often computed by adding and subtracting a value, called the margin of error, to the point estimate.
· The general form of an interval estimate is:
Point estimate ± Margin of error
Interval Estimation
· However, the question is: how do we decide the size of the ‘Margin of Error’?
· For this, we rely on the ‘Empirical Rules’.
Interval Estimation - Empirical Rules
· For many data sets encountered in practice:
· Approximately 68% of the observations fall within one standard deviation of the mean: between x̄ − s and x̄ + s (z=1)
· Approximately 95% fall within two standard deviations of the mean: x̄ ± 2s (z=2)
· Approximately 99.7% fall within three standard deviations of the mean: x̄ ± 3s (z=3)
· These rules are commonly used to characterize the natural variation in manufacturing processes and other business phenomena.
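For data that follow a normal distribution, the three percentages in the empirical rules can be computed directly. A short sketch using `statistics.NormalDist` from the Python standard library; the dictionary name `coverage` is illustrative:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution lying within z standard deviations of the mean.
coverage = {
    z: standard_normal.cdf(z) - standard_normal.cdf(-z)
    for z in (1, 2, 3)
}

for z, share in coverage.items():
    print(f"within {z} standard deviation(s): {share:.1%}")
```

The results are close to the 68%, 95%, and 99.7% quoted above; real data sets that are not exactly normal will only match approximately.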
Interval Estimation
Interval Estimation of the Population Mean (n>=30)
For any normally distributed random variable:
· 90% of the values lie within 1.645 standard errors of the mean
· 95% of the values lie within 1.960 standard errors of the mean
· 99% of the values lie within 2.576 standard errors of the mean
· Formula for a sample standard error: SE = s / √n, where s is the sample standard deviation and n is the sample size
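The z-values on this slide and the standard error can be obtained in code. This is a minimal sketch, assuming the usual formula SE = s / √n (which matches the SE of 0.89 used for a sample with SD=2 and n=5 in the examples); the function names are illustrative.

```python
import math
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided z-value: the central `level` share of a standard normal lies within ±z."""
    return NormalDist().inv_cdf(0.5 + level / 2)

def standard_error(s, n):
    """Standard error of the mean from a sample standard deviation s and sample size n."""
    return s / math.sqrt(n)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_for_confidence(level):.3f}")

print(f"SE for s=2, n=5: {standard_error(2, 5):.2f}")
```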
What is ‘Standard Error of the Mean’
· The standard error of the mean is the standard deviation of the sampling distribution of the mean.
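· So, the difference between SD and SE is: the SD measures how much individual observations vary around the mean, whereas the SE measures how much the sample mean varies from sample to sample.
[Figure: Sampling Distribution of the Sample Mean]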
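The statement that the standard error is the standard deviation of the sampling distribution of the mean can be checked by simulation. The sketch below makes several assumptions that are not in the slides: a hypothetical population of 133 ages between 20 and 27, samples of n=20 drawn with replacement, and 5000 repetitions.

```python
import math
import random
import statistics

random.seed(19)  # fixed seed so the run is repeatable

# Hypothetical population of 133 ages (not the actual Cohort 14 data).
population = [random.randint(20, 27) for _ in range(133)]
sigma = statistics.pstdev(population)  # SD: spread of individual ages

n = 20
# Draw many samples with replacement and record each sample mean.
sample_means = [
    statistics.fmean(random.choices(population, k=n)) for _ in range(5000)
]

sd_of_means = statistics.pstdev(sample_means)  # spread of the sample means
theoretical_se = sigma / math.sqrt(n)          # SE formula

print(f"SD of ages: {sigma:.2f}")
print(f"SD of sample means: {sd_of_means:.2f}; sigma/sqrt(n): {theoretical_se:.2f}")
```

The simulated spread of the sample means should land close to σ/√n, and both are much smaller than the SD of the individual ages.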
Interval Estimation - Example
| Sample 4 | 22 | 21 | 20 | 21 | 24 |
| Mean | 22 |
If we are ‘unlucky’ and we have obtained sample 4:
Sample Mean=22
Sample SD=2, n=5
SE = SD/√n = 2/√5 ≈ 0.89
If your confidence level is 68%, then the Z-score for the confidence interval should be: 1
UCL=Mean+z*SE=22+1*0.89=22.9
LCL=Mean-z*SE=22-1*0.89=21.1
Interval Estimation - Example
| Sample 4 | 22 | 21 | 20 | 21 | 24 |
| Mean | 22 |
If we are ‘unlucky’ and we have obtained sample 4:
Sample Mean=22
Sample SD=2, n=5
SE = SD/√n = 2/√5 ≈ 0.89
If your confidence level is 90%, then the Z-score for the confidence interval should be: 1.65
UCL=Mean+z*SE=22+1.65*0.89=23.5
LCL=Mean-z*SE=22-1.65*0.89=20.5
Interval Estimation - Example
| Sample 4 | 22 | 21 | 20 | 21 | 24 |
| Mean | 22 |
If we are ‘unlucky’ and we have obtained sample 4:
Sample Mean=22
Sample SD=2, n=5
SE = SD/√n = 2/√5 ≈ 0.89
If your confidence level is 95%, then the Z-score for the confidence interval should be: 1.96
UCL=Mean+z*SE=22+1.96*0.89=23.8
LCL=Mean-z*SE=22-1.96*0.89=20.2
Interval Estimation - Example
| Sample 4 | 22 | 21 | 20 | 21 | 24 |
| Mean | 22 |
If we are ‘unlucky’ and we have obtained sample 4:
Sample Mean=22
Sample SD=2, n=5
SE = SD/√n = 2/√5 ≈ 0.89
If your confidence level is 99%, then the Z-score for the confidence interval should be: 2.58
UCL=Mean+z*SE=22+2.58*0.89=24.3
LCL=Mean-z*SE=22-2.58*0.89=19.7
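The four Sample 4 examples can be reproduced in one loop. The sketch below uses the rounded sample statistics from the slides (mean 22, SD 2, n 5) and the z-values 1, 1.645, 1.96, and 2.576; the slides round two of these to 1.65 and 2.58, which gives the same limits to one decimal place. The check against the population mean of 24 is possible only because Cohort 14 is fully observed; the variable names are illustrative.

```python
import math

sample_mean, sample_sd, n = 22, 2, 5
population_mean = 24  # known here only because the whole cohort was listed

se = sample_sd / math.sqrt(n)  # standard error of the mean, about 0.89
z_scores = {"68%": 1.0, "90%": 1.645, "95%": 1.96, "99%": 2.576}

intervals = {}
covers = {}
for level, z in z_scores.items():
    lcl = sample_mean - z * se  # lower confidence limit
    ucl = sample_mean + z * se  # upper confidence limit
    intervals[level] = (round(lcl, 1), round(ucl, 1))
    covers[level] = lcl <= population_mean <= ucl
    print(f"{level}: LCL={lcl:.1f}, UCL={ucl:.1f}, contains 24: {covers[level]}")
```

For this particular sample, only the 99% interval (19.7 to 24.3) contains the true population mean of 24; the narrower intervals miss it.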
Does it work?
Thank you