
BUSMGT 712 Business Analytics

Week 3

Sampling Distribution & Confidence Interval

Agenda

· Sampling

· Sampling Error

· Sampling Distribution

· Confidence Interval


Sampling, Sampling Error, Sampling Distribution


What is ‘Sampling’ and Why?

· Sampling – The process by which members of a population are selected for a sample.

· Random sampling - A sampling method to gather a representative sample of the population data.

· The purpose of sampling is to obtain sufficient information to draw a valid inference about a population.

Estimating Population Parameters

27, 20, 24, 25, 26, 22, 22, 26, 27, 24, 21, 22, 20, 22, 27, 22, 20, 27, 26
27, 27, 20, 25, 20, 27, 27, 27, 20, 23, 22, 27, 26, 25, 27, 26, 26, 21, 24
27, 21, 21, 23, 35, 25, 26, 22, 25, 27, 24, 27, 20, 22, 22, 27, 23, 22, 20
20, 20, 27, 21, 21, 20, 27, 22, 27, 20, 24, 24, 25, 27, 26, 21, 21, 25, 23
25, 24, 24, 24, 22, 20, 23, 23, 20, 20, 26, 23, 27, 24, 20, 40, 23, 20, 26
24, 22, 20, 22, 21, 27, 26, 22, 34, 21, 26, 27, 24, 20, 24, 23, 26, 20, 20
21, 24, 24, 20, 24, 27, 24, 25, 22, 23, 24, 22, 24, 26, 26, 25, 24, 27, 23

· For example, define Cohort 14, which has 133 students, as the population.

· The age of every student is listed above.

· Population Mean=24

· Population Standard Deviation=3

Estimating Population Parameters

· In most cases, it is very hard to get data for a population.

· So, the dataset we work with is more likely to be a sample.

· Let’s do an experiment and randomly select 5 students from Cohort 14 and ask for their ages.

https://hotcubator.com.au/research/acomplete-guide-to-sampling-techniques/

Estimating Population Parameters

22, 22, 20, 21, 24

· For this random sample:

· Sample Mean = 22

· Sample Standard Deviation=1

Recall that

· True Population Mean=24

· True Population Standard Deviation=3
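
As a minimal sketch, the two point estimates on this slide can be reproduced with Python's standard `statistics` module. The variable names are illustrative, and the sample SD is assumed to use the n − 1 (sample) formula, which is consistent with the rounded values shown on the slides.

```python
import statistics

# Ages of the 5 randomly selected students from the slide
sample = [22, 22, 20, 21, 24]

sample_mean = statistics.mean(sample)   # point estimate of the population mean
sample_sd = statistics.stdev(sample)    # sample SD (n - 1 in the denominator)

print(f"mean = {sample_mean:.1f}, SD = {sample_sd:.2f}")
```

Rounded to whole years, these are the 22 and 1 reported above, compared with the population values of 24 and 3.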

Estimating Population Parameters

· Some definitions that you have to know:

· Estimation involves assessing the value of an unknown population parameter using sample data.

· Estimators are the measures used to estimate population parameters (e.g., sample mean, sample variance, sample proportion).

· A point estimate is a single number derived from sample data that is used to estimate the value of a population parameter.

Estimating Population Parameters

22, 22, 20, 21, 24

· So, this is a case of point estimation.

· The sample mean and sample standard deviation are called ‘estimators’.

· Sample Mean = 22

· Sample Standard Deviation=1

Recall that

· True Population Mean=24

· True Population Standard Deviation=3

· Can this sample perfectly predict the population mean?

Estimating Population Parameters

22, 27, 20, 26, 23

· Now, I randomly select 5 students from the population again:

· Sample Mean = 24

· Sample Standard Deviation=3

Recall that

· True Population Mean=24

· True Population Standard Deviation=3

· This sample seems to be a ‘better’ random sample.

Estimating Population Parameters

Sample     1   2   3   4   5   6   7   8   9  10
          24  22  27  22  24  20  20  26  20  26
          28  27  26  21  23  26  25  35  40  20
          20  20  27  20  22  23  25  27  20  27
          22  26  20  21  22  24  23  20  34  22
          27  23  23  24  27  22  20  27  24  25
Mean      24  24  25  22  24  23  23  27  28  24
SD         3   3   3   2   2   2   3   5   9   3

· This time, I randomly selected 5 students from Cohort 14 ten times over, and computed the ‘mean’ and ‘SD’ of each sample.
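
The Mean and SD rows of the table can be recomputed as a quick check. This is a sketch only: it assumes each column of the table is one sample, and that the SD is the sample (n − 1) standard deviation rounded to a whole number.

```python
import statistics

# Rows of the table: each row holds one observation for samples 1..10
rows = [
    [24, 22, 27, 22, 24, 20, 20, 26, 20, 26],
    [28, 27, 26, 21, 23, 26, 25, 35, 40, 20],
    [20, 20, 27, 20, 22, 23, 25, 27, 20, 27],
    [22, 26, 20, 21, 22, 24, 23, 20, 34, 22],
    [27, 23, 23, 24, 27, 22, 20, 27, 24, 25],
]

# Transpose so that each inner list is one sample of 5 students
samples = [list(col) for col in zip(*rows)]

means = [round(statistics.mean(s)) for s in samples]
sds = [round(statistics.stdev(s)) for s in samples]

print("Mean:", means)
print("SD:  ", sds)
```

Under those assumptions, the rounded results match the Mean and SD rows of the table.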

Estimating Population Parameters


· The sample statistics (e.g. mean, s.d.) differ from sample to sample.

· Among these samples, samples 1, 2, and 10 reproduce the (rounded) population mean and s.d. exactly.

· The variation within each sample also differs considerably, so we observe both ‘big’ and ‘small’ s.d.s (e.g. s.d.=9 for sample 9 and s.d.=2 for sample 4).

Sampling Error


· Sampling Error: The variation that occurs due to selecting a single sample from the population.

· It depends primarily on the variation in the population itself and on the size of the sample selected. Larger samples have less sampling error, but are more costly to take.
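
To make the idea concrete, the sketch below computes the sampling error of each of the 10 sample means (n = 5) as the sample mean minus the population mean. It uses the rounded values from the slides; the constant and variable names are illustrative.

```python
POPULATION_MEAN = 24  # true population mean from the slides

# Rounded sample means of the 10 samples of size n = 5
sample_means = [24, 24, 25, 22, 24, 23, 23, 27, 28, 24]

# Sampling error of each sample mean: estimate minus true value
errors = [m - POPULATION_MEAN for m in sample_means]
largest = max(abs(e) for e in errors)

print("Errors:", errors)
print("Largest absolute error:", largest)
```

With these rounded figures, the errors range from −2 to +4 years, even though every sample was drawn at random from the same population.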

Estimating Population Parameters

Sample     1   2   3   4   5   6   7   8   9  10
          20  23  27  25  24  27  23  26  26  24
          25  25  21  27  26  22  25  23  27  21
          22  24  20  20  24  20  25  24  21  20
          25  20  27  20  22  21  27  20  27  27
          27  23  22  20  23  20  20  26  22  22
          26  27  20  20  23  35  20  20  25  25
          24  20  26  25  25  24  25  20  24  26
          20  21  26  27  24  26  26  27  26  26
          24  22  22  22  22  27  21  22  24  27
          21  40  24  22  24  27  26  23  23  24
          21  27  27  27  26  24  22  27  27  23
          24  26  27  27  27  27  21  21  21  27
          23  27  25  26  27  23  21  27  23  25
          27  24  20  26  23  21  27  25  20  24
          24  27  40  24  27  23  24  22  26  27
          23  23  27  27  22  20  24  26  25  20
          20  21  21  24  25  27  21  27  25  24
          24  22  21  27  25  23  24  22  35  27
          20  26  22  25  20  21  22  25  21  25
          21  24  24  24  27  26  20  27  21  40
Mean      23  25  24  24  24  24  23  24  24  25
S.D.       2   4   5   3   2   4   2   3   3   4

· When we increase the sample size (here to n=20), the sampling error appears to shrink: the sample means now range only from 23 to 25.
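
One rough way to quantify this is to compare how much the 10 sample means vary at each sample size. The sketch below uses the rounded means from the two tables, so the figures are approximate.

```python
import statistics

# Rounded sample means taken from the two tables on the slides
means_n5 = [24, 24, 25, 22, 24, 23, 23, 27, 28, 24]
means_n20 = [23, 25, 24, 24, 24, 24, 23, 24, 24, 25]

# Spread of the sample means around their own average
spread_n5 = statistics.stdev(means_n5)
spread_n20 = statistics.stdev(means_n20)

print(f"n = 5:  SD of sample means = {spread_n5:.2f}")
print(f"n = 20: SD of sample means = {spread_n20:.2f}")
```

The sample means for n = 20 are clustered much more tightly around 24 than those for n = 5.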

Sampling Distribution

Sample     1   2   3   4   5   6   7   8   9  10
          24  22  27  22  24  20  20  26  20  26
          28  27  26  21  23  26  25  35  40  20
          20  20  27  20  22  23  25  27  20  27
          22  26  20  21  22  24  23  20  34  22
          27  23  23  24  27  22  20  27  24  25
Mean      24  24  25  22  24  23  23  27  28  24
SD         3   3   3   2   2   2   3   5   9   3

· If you create a histogram of the 10 sample means (n=5), you will find that, in this case, it is right-skewed.

Sampling Distribution

Sample     1   2   3   4   5   6   7   8   9  10
          20  23  27  25  24  27  23  26  26  24
          25  25  21  27  26  22  25  23  27  21
          22  24  20  20  24  20  25  24  21  20
          25  20  27  20  22  21  27  20  27  27
          27  23  22  20  23  20  20  26  22  22
          26  27  20  20  23  35  20  20  25  25
          24  20  26  25  25  24  25  20  24  26
          20  21  26  27  24  26  26  27  26  26
          24  22  22  22  22  27  21  22  24  27
          21  40  24  22  24  27  26  23  23  24
          21  27  27  27  26  24  22  27  27  23
          24  26  27  27  27  27  21  21  21  27
          23  27  25  26  27  23  21  27  23  25
          27  24  20  26  23  21  27  25  20  24
          24  27  40  24  27  23  24  22  26  27
          23  23  27  27  22  20  24  26  25  20
          20  21  21  24  25  27  21  27  25  24
          24  22  21  27  25  23  24  22  35  27
          20  26  22  25  20  21  22  25  21  25
          21  24  24  24  27  26  20  27  21  40
Mean      23  25  24  24  24  24  23  24  24  25
S.D.       2   4   5   3   2   4   2   3   3   4

· However, in this case, where the sample size is increased to n=20, the distribution of the 10 sample means looks much more like a symmetric ‘bell’ shape (normal distribution).

Sampling Distribution

Sampling Distribution – The probability distribution of a sample statistic for all possible samples of a given size n for a given population.

[Figure: sampling distributions for n=5 and n=20]
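
A sampling distribution can be approximated by simulation: draw many random samples of the same size and record each sample mean. The sketch below is illustrative only. It uses a hypothetical right-skewed population of 133 ages (not the Cohort 14 data) and 2,000 draws rather than all possible samples; the helper name `sample_means` and the seed are assumptions.

```python
import random
import statistics

def sample_means(population, n, draws, rng):
    """Means of `draws` simple random samples (without replacement) of size n."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(draws)]

rng = random.Random(712)  # fixed seed so the run is repeatable

# Hypothetical right-skewed population of 133 ages (NOT the Cohort 14 data)
population = [20 + round(rng.expovariate(1 / 4)) for _ in range(133)]

means_n5 = sample_means(population, 5, 2000, rng)
means_n20 = sample_means(population, 20, 2000, rng)

print("population mean:", round(statistics.mean(population), 2))
for n, means in ((5, means_n5), (20, means_n20)):
    centre = round(statistics.mean(means), 2)
    spread = round(statistics.stdev(means), 2)
    print(f"n = {n}: mean of sample means = {centre}, SD of sample means = {spread}")
```

In a run like this, both sets of sample means should centre near the population mean, with the n = 20 means noticeably less spread out than the n = 5 means.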

Data Distribution vs. Sampling Distribution

[Figure: Population Distribution (x_i), N=133; Sampling Distribution (x̄), n=5; Sampling Distribution (x̄), n=20]

Central Limit Theorem

· When the population has a normal distribution, the sampling distribution of x̄ is normally distributed for any sample size.

· When the population does not have a normal distribution, the central limit theorem is helpful in identifying the shape of the sampling distribution of x̄.

· Central Limit Theorem: In selecting random samples of size n from a population, the sampling distribution of the sample mean x̄ can be approximated by a normal distribution as the sample size becomes large.
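
The theorem can be illustrated with a small simulation. The sketch below assumes an exponential population (strongly right-skewed) purely for illustration, and measures the skewness of the simulated sample means for n = 2, 5, and 30; a value near 0 indicates a symmetric, bell-like shape. The helper names, seed, and number of draws are assumptions.

```python
import random
import statistics

def skewness(values):
    """Skewness as the mean cubed z-score (0 for a symmetric distribution)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return statistics.fmean(((v - mu) / sigma) ** 3 for v in values)

rng = random.Random(3)  # fixed seed so the run is repeatable

def mean_of_draws(n):
    """Mean of n independent draws from an exponential (right-skewed) population."""
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

# Skewness of 5,000 simulated sample means at each sample size
skew_by_n = {
    n: skewness([mean_of_draws(n) for _ in range(5000)])
    for n in (2, 5, 30)
}

for n, skew in skew_by_n.items():
    print(f"n = {n:2d}: skewness of sample means = {skew:.2f}")
```

The skewness should fall towards 0 as n grows, which is the pattern the theorem describes.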

Central Limit Theorem

· Left panel shows that none of the populations are normally distributed

· Bottom three panels show the shape of the sampling distribution for samples n=2, n=5, and n=30

· General statistical practice is to assume that, for most applications, the sampling distribution can be approximated by a normal distribution whenever the sample size is 30 or more.

A Comparison of the Sampling Distributions of x̄ for Simple Random Samples of n=20 and n=50 Students

Confidence Interval

Interval Estimation

· A point estimator (e.g. the sample mean) CANNOT be expected to provide the exact value of a population parameter.

· Therefore, interval estimation is frequently used to generate an estimate of the value of a population parameter.

· An interval estimate is often computed by adding and subtracting a value, called the margin of error, to and from the point estimate.

· The general form of an interval estimate is:

Point estimate ± Margin of error

Interval Estimation

· However, the question is: how do we decide the size of the ‘Margin of Error’?

· For this, we rely on the ‘Empirical Rule’.


Interval Estimation - Empirical Rules

· For many data sets encountered in practice:

· Approximately 68% of the observations fall within one standard deviation of the mean, i.e. between x̄ − s and x̄ + s (z=1)

· Approximately 95% fall within two standard deviations of the mean, x̄ ± 2s (z=2)

· Approximately 99.7% fall within three standard deviations of the mean, x̄ ± 3s (z=3)

· These rules are commonly used to characterize the natural variation in manufacturing processes and other business phenomena.
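
These percentages can be checked against the standard normal distribution. The sketch below assumes the data are exactly normal, which real data sets only approximate, and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Share of a normal distribution within z standard deviations of the mean
shares = {z: std_normal.cdf(z) - std_normal.cdf(-z) for z in (1, 2, 3)}

for z, share in shares.items():
    print(f"within {z} SD of the mean: {share:.1%}")
```

The results (about 68.3%, 95.4%, and 99.7%) agree with the rounded figures in the rule.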

Interval Estimation

Interval Estimation of the Population Mean (n>=30)

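When the sampling distribution of the sample mean is normal:

· 90% of the sample means lie within 1.645 standard errors of the population mean

· 95% of the sample means lie within 1.960 standard errors of the population mean

· 99% of the sample means lie within 2.576 standard errors of the population mean

· Formula for the standard error of the sample mean: SE = s / √n, where s is the sample standard deviation and n is the sample size.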
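
The z-values 1.645, 1.960, and 2.576 quoted above come from the standard normal distribution. As a minimal sketch, they can be derived from the confidence level with `statistics.NormalDist` (Python 3.8 or later); the helper name `z_for_confidence` is illustrative.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided critical z-value for a confidence level such as 0.95."""
    # A central interval covering `level` leaves (1 - level) / 2 in each tail
    return NormalDist().inv_cdf(0.5 + level / 2)

z_values = {level: round(z_for_confidence(level), 3) for level in (0.90, 0.95, 0.99)}
print(z_values)
```

For example, `z_for_confidence(0.95)` is roughly 1.96, the multiplier used for a 95% confidence interval.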

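What is the ‘Standard Error of the Mean’?

· The standard error of the mean is the standard deviation of the sampling distribution of the mean.

· So, the difference between SD and SE is: the SD measures how much individual observations vary around their mean, whereas the SE measures how much the sample mean varies from sample to sample.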

Sampling Distribution of the Sample Mean

Sample     1   2   3   4   5   6   7   8   9  10
          24  22  27  22  24  20  20  26  20  26
          28  27  26  21  23  26  25  35  40  20
          20  20  27  20  22  23  25  27  20  27
          22  26  20  21  22  24  23  20  34  22
          27  23  23  24  27  22  20  27  24  25
Mean      24  24  25  22  24  23  23  27  28  24
SD         3   3   3   2   2   2   3   5   9   3


Interval Estimation - Example

Sample 4: 22, 21, 20, 21, 24

If we are ‘unlucky’ and obtain sample 4:

Sample Mean=22

Sample SD=2, n=5

SE=SD/√n=2/√5≈0.894

If your confidence level is 68%, then the z-score for the confidence interval should be 1.

UCL=Mean+z*SE=22+1*0.894≈22.9

LCL=Mean-z*SE=22-1*0.894≈21.1

Interval Estimation - Example

Sample 4: 22, 21, 20, 21, 24

If we are ‘unlucky’ and obtain sample 4:

Sample Mean=22

Sample SD=2, n=5

SE=SD/√n=2/√5≈0.894

If your confidence level is 90%, then the z-score for the confidence interval should be 1.65.

UCL=Mean+z*SE=22+1.65*0.894≈23.5

LCL=Mean-z*SE=22-1.65*0.894≈20.5

Interval Estimation - Example

Sample 4: 22, 21, 20, 21, 24

If we are ‘unlucky’ and obtain sample 4:

Sample Mean=22

Sample SD=2, n=5

SE=SD/√n=2/√5≈0.894

If your confidence level is 95%, then the z-score for the confidence interval should be 1.96.

UCL=Mean+z*SE=22+1.96*0.894≈23.8

LCL=Mean-z*SE=22-1.96*0.894≈20.2

Interval Estimation - Example

Sample 4: 22, 21, 20, 21, 24

If we are ‘unlucky’ and obtain sample 4:

Sample Mean=22

Sample SD=2, n=5

SE=SD/√n=2/√5≈0.894

If your confidence level is 99%, then the z-score for the confidence interval should be 2.58.

UCL=Mean+z*SE=22+2.58*0.894≈24.3

LCL=Mean-z*SE=22-2.58*0.894≈19.7
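
The four worked examples follow the same recipe, which can be sketched as one small function. The function name `confidence_interval` is illustrative; the inputs are the rounded sample 4 figures from the slides (mean 22, SD 2, n = 5) and the rounded z-scores used above.

```python
import math

def confidence_interval(mean, sd, n, z):
    """Return (LCL, UCL) = mean -/+ z * SE, with SE = sd / sqrt(n)."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se

# Confidence levels and the z-scores used on the slides
for level, z in (("68%", 1), ("90%", 1.65), ("95%", 1.96), ("99%", 2.58)):
    lcl, ucl = confidence_interval(mean=22, sd=2, n=5, z=z)
    print(f"{level}: LCL = {lcl:.1f}, UCL = {ucl:.1f}")
```

Of these four intervals, only the 99% interval (19.7 to 24.3) contains the true population mean of 24, which shows the trade-off: higher confidence requires a wider interval.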

Does it work?

[Figure: Mean ± 2 SD]

Thank you
