BUSMGT 712 Business Analytics

Week 3: Sampling Distribution & Confidence Interval

Agenda

· Sampling

· Sampling Error

· Sampling Distribution

· Confidence Interval

Sampling, Sampling Error, Sampling Distribution

What is ‘Sampling’ and Why?

· Sampling – The process by which members of a population are selected for a sample.

· Random sampling - A sampling method in which members are selected by chance, so that the sample is representative of the population.

· The purpose of sampling is to obtain sufficient information to draw a valid inference about a population.

Estimating Population Parameters

27	20	24	25	26	22	22
26	27	24	21	22	20	22
27	22	20	27	26	27	27
20	25	20	27	27	27	20
23	22	27	26	25	27	26
26	21	24	27	21	21	23
35	25	26	22	25	27	24
27	20	22	22	27	23	22
20	20	20	27	21	21	20
27	22	27	20	24	24	25
27	26	21	21	25	23	25
24	24	24	22	20	23	23
20	20	26	23	27	24	20
40	23	20	26	24	22
20	22	21	27	26	22
34	21	26	27	24	20
24	23	26	20	20	21
24	24	20	24	27	24
25	22	23	24	22	24
26	26	25	24	27	23

· For example, suppose we define Cohort 14, which has 133 students, as a population.

· The age of every student is listed in the table above.

· Population Mean=24

· Population Standard Deviation=3
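As a quick check, the population mean and standard deviation quoted above can be recomputed from the 133 ages in the table. The sketch below uses Python's standard `statistics` module; the variable names are illustrative, and `pstdev` (the population formula, dividing by N) is an assumption made because Cohort 14 is treated as the whole population.

```python
import statistics

# Ages of all 133 students in Cohort 14, copied row by row from the table.
ages = [
    27, 20, 24, 25, 26, 22, 22,
    26, 27, 24, 21, 22, 20, 22,
    27, 22, 20, 27, 26, 27, 27,
    20, 25, 20, 27, 27, 27, 20,
    23, 22, 27, 26, 25, 27, 26,
    26, 21, 24, 27, 21, 21, 23,
    35, 25, 26, 22, 25, 27, 24,
    27, 20, 22, 22, 27, 23, 22,
    20, 20, 20, 27, 21, 21, 20,
    27, 22, 27, 20, 24, 24, 25,
    27, 26, 21, 21, 25, 23, 25,
    24, 24, 24, 22, 20, 23, 23,
    20, 20, 26, 23, 27, 24, 20,
    40, 23, 20, 26, 24, 22,
    20, 22, 21, 27, 26, 22,
    34, 21, 26, 27, 24, 20,
    24, 23, 26, 20, 20, 21,
    24, 24, 20, 24, 27, 24,
    25, 22, 23, 24, 22, 24,
    26, 26, 25, 24, 27, 23,
]

population_mean = statistics.mean(ages)
population_sd = statistics.pstdev(ages)  # population formula: divide by N

print(len(ages))                   # number of students in the population
print(round(population_mean, 2))   # rounds to 24 as on the slide
print(round(population_sd, 2))     # rounds to 3 as on the slide
```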

Estimating Population Parameters

· In most cases, it is very hard to get data for a population.

· So, our dataset is more likely to be a sample.

· Let’s do an experiment and randomly select 5 students from Cohort 14 and ask for their ages.

https://hotcubator.com.au/research/acomplete-guide-to-sampling-techniques/

Estimating Population Parameters

· For this random sample:

· Sample Mean = 22

· Sample Standard Deviation=1

Recall that

· True Population Mean=24

· True Population Standard Deviation=3

Estimating Population Parameters

· Some definitions that you have to know:

· Estimation involves assessing the value of an unknown population parameter using sample data.

· Estimators are the measures used to estimate population parameters, e.g. sample mean, sample variance, sample proportion.

· A point estimate is a single number derived from sample data that is used to estimate the value of a population parameter.

Estimating Population Parameters

· So, this is a case of point estimation.

· The sample mean and sample standard deviation are called ‘estimators’.

· Sample Mean = 22

· Sample Standard Deviation=1

Recall that

· True Population Mean=24

· True Population Standard Deviation=3

· Can this sample perfectly predict the population mean?

Estimating Population Parameters

· Now, I randomly select 5 students from the population again:

· Sample Mean = 24

· Sample Standard Deviation=3

Recall that

· True Population Mean=24

· True Population Standard Deviation=3

· This sample seems to be a ‘better’ random sample.

Estimating Population Parameters

	Sample 1	Sample 2	Sample 3	Sample 4	Sample 5	Sample 6	Sample 7	Sample 8	Sample 9	Sample 10
	24	22	27	22	24	20	20	26	20	26
	28	27	26	21	23	26	25	35	40	20
	20	20	27	20	22	23	25	27	20	27
	22	26	20	21	22	24	23	20	34	22
	27	23	23	24	27	22	20	27	24	25
Mean	24	24	25	22	24	23	23	27	28	24
SD	3	3	3	2	2	2	3	5	9	3

· This time, I have randomly selected 5 students from Cohort 14 ten times, and I can obtain the ‘mean’ and ‘SD’ for each sample.
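The ‘Mean’ and ‘SD’ rows of the table can be reproduced with a few lines of Python. This is a minimal sketch: the dictionary layout is illustrative, and it assumes the slide used the sample standard deviation (dividing by n - 1), which is the formula that matches the rounded values shown.

```python
import statistics

# The ten random samples of five ages each, copied column by column from the table.
samples = {
    "Sample 1": [24, 28, 20, 22, 27],
    "Sample 2": [22, 27, 20, 26, 23],
    "Sample 3": [27, 26, 27, 20, 23],
    "Sample 4": [22, 21, 20, 21, 24],
    "Sample 5": [24, 23, 22, 22, 27],
    "Sample 6": [20, 26, 23, 24, 22],
    "Sample 7": [20, 25, 25, 23, 20],
    "Sample 8": [26, 35, 27, 20, 27],
    "Sample 9": [20, 40, 20, 34, 24],
    "Sample 10": [26, 20, 27, 22, 25],
}

for name, ages in samples.items():
    mean = statistics.mean(ages)
    sd = statistics.stdev(ages)  # sample formula: divide by n - 1
    print(f"{name}: mean = {mean:.1f} (rounds to {round(mean)}), "
          f"SD = {sd:.2f} (rounds to {round(sd)})")

# Rounded to whole numbers, as displayed in the table.
rounded_means = [round(statistics.mean(a)) for a in samples.values()]
rounded_sds = [round(statistics.stdev(a)) for a in samples.values()]
```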

Estimating Population Parameters

· The sample statistics (e.g. mean, s.d.) differ from sample to sample.

· Among these samples, samples 1, 2, and 10 match the population mean (24) and s.d. (3) after rounding.

· The variation within each sample also differs considerably, so we observe both ‘big’ and ‘small’ s.d.s (e.g. for sample 9, s.d.=9; for sample 4, s.d.=2).

Sampling Error

· Sampling Error: The difference between a sample statistic and the corresponding population parameter that arises because we observe only a single sample rather than the whole population.

· It depends primarily on the variation in the population itself and on the size of the sample selected. Larger samples have less sampling error but are more costly to take.
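One common way to put a number on sampling error is to subtract the population mean from each sample mean. The sketch below does this for the ten samples of five students, using the rounded means from the table and the population mean of 24; the variable names are illustrative.

```python
# Rounded means of the ten samples (n = 5) and the population mean from the slides.
population_mean = 24
sample_means = [24, 24, 25, 22, 24, 23, 23, 27, 28, 24]

# Sampling error of each sample: sample mean minus population mean.
sampling_errors = [m - population_mean for m in sample_means]

# The error that is largest in absolute size (sign kept).
largest_error = max(sampling_errors, key=abs)

print(sampling_errors)  # [0, 0, 1, -2, 0, -1, -1, 3, 4, 0]
print(largest_error)    # 4
```

Samples 8 and 9, which happen to include the oldest students, show the largest errors.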

Estimating Population Parameters

	Sample 1	Sample 2	Sample 3	Sample 4	Sample 5	Sample 6	Sample 7	Sample 8	Sample 9	Sample 10
	20	23	27	25	24	27	23	26	26	24
	25	25	21	27	26	22	25	23	27	21
	22	24	20	20	24	20	25	24	21	20
	25	20	27	20	22	21	27	20	27	27
	27	23	22	20	23	20	20	26	22	22
	26	27	20	20	23	35	20	20	25	25
	24	20	26	25	25	24	25	20	24	26
	20	21	26	27	24	26	26	27	26	26
	24	22	22	22	22	27	21	22	24	27
	21	40	24	22	24	27	26	23	23	24
	21	27	27	27	26	24	22	27	27	23
	24	26	27	27	27	27	21	21	21	27
	23	27	25	26	27	23	21	27	23	25
	27	24	20	26	23	21	27	25	20	24
	24	27	40	24	27	23	24	22	26	27
	23	23	27	27	22	20	24	26	25	20
	20	21	21	24	25	27	21	27	25	24
	24	22	21	27	25	23	24	22	35	27
	20	26	22	25	20	21	22	25	21	25
	21	24	24	24	27	26	20	27	21	40
Mean	23	25	24	24	24	24	23	24	24	25
S.D.	2	4	5	3	2	4	2	3	3	4

· When we increase the sample size from 5 to 20, the sampling error appears smaller: the sample means now range only from 23 to 25, compared with 22 to 28 for n=5.
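The effect of sample size can be summarised by the average absolute distance between the sample means and the population mean. The sketch below compares the ‘Mean’ rows of the n=5 and n=20 tables; `mean_abs_error` is a hypothetical helper name, and the rounded means from the slides are used as given.

```python
import statistics

population_mean = 24
means_n5 = [24, 24, 25, 22, 24, 23, 23, 27, 28, 24]    # "Mean" row, n = 5
means_n20 = [23, 25, 24, 24, 24, 24, 23, 24, 24, 25]   # "Mean" row, n = 20

def mean_abs_error(sample_means, true_mean):
    """Average absolute distance between the sample means and the true mean."""
    return statistics.mean(abs(m - true_mean) for m in sample_means)

mae_n5 = mean_abs_error(means_n5, population_mean)
mae_n20 = mean_abs_error(means_n20, population_mean)

print(mae_n5)    # 1.2
print(mae_n20)   # 0.4
```

With only ten samples per size this is an illustration rather than proof, but the direction agrees with the claim that larger samples have less sampling error.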

Sampling Distribution

· If you create a histogram of the 10 sample means (n=5), you will find that in this case it is right-skewed.

Sampling Distribution

· However, when the sample size is increased to n=20, the distribution of the 10 sample means is close to a symmetric ‘bell’ shape (normal distribution).
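The two shapes can be checked without plotting software. The sketch below prints a text histogram of the rounded sample means for each sample size and uses the sum of cubed deviations from the mean as a rough indicator of skew direction (positive for a longer right tail, zero for perfect symmetry). Both helper functions are illustrative, not part of the course material.

```python
from collections import Counter

means_n5 = [24, 24, 25, 22, 24, 23, 23, 27, 28, 24]    # "Mean" row, n = 5
means_n20 = [23, 25, 24, 24, 24, 24, 23, 24, 24, 25]   # "Mean" row, n = 20

def text_histogram(values):
    """Return one line per integer value in range, with one '*' per occurrence."""
    counts = Counter(values)
    return "\n".join(f"{v}: {'*' * counts[v]}"
                     for v in range(min(values), max(values) + 1))

def skewness_sign(values):
    """Sum of cubed deviations from the mean: > 0 suggests right skew, 0 symmetry."""
    centre = sum(values) / len(values)
    return sum((v - centre) ** 3 for v in values)

print(text_histogram(means_n5))
print(skewness_sign(means_n5) > 0)    # True: the n = 5 means have a longer right tail
print(text_histogram(means_n20))
print(skewness_sign(means_n20))       # 0.0: the n = 20 means are symmetric around 24
```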

Sampling Distribution

· Sampling Distribution – The probability distribution of a sample statistic for all possible samples of a given size n for a given population.

[Figure: sampling distributions of the sample mean for n=5 and n=20]

Data Distribution vs. Sampling Distribution

[Figure: population distribution of x_i (N=133) alongside the sampling distributions of x̄ for n=5 and n=20]

Central Limit Theorem

· When the population has a normal distribution, the sampling distribution of x̄ is normally distributed for any sample size.

· When the population does not have a normal distribution, the central limit theorem is helpful in identifying the shape of the sampling distribution of x̄.

· Central Limit Theorem: In selecting random samples of size n from a population, the sampling distribution of the sample mean x̄ can be approximated by a normal distribution as the sample size becomes large.
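The theorem can be explored with a small simulation. The sketch below assumes a deliberately non-normal population (an exponential distribution with mean 1 and standard deviation 1, chosen only for illustration) and draws many samples of size 2, 5 and 30. The function name, seed and repetition count are arbitrary choices.

```python
import random
import statistics

def simulate_sample_means(n, repetitions=5000, seed=0):
    """Draw `repetitions` samples of size n from a skewed population; return their means."""
    rng = random.Random(seed)
    # Exponential population with rate 1: strongly right-skewed, mean 1, SD 1.
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(repetitions)]

for n in (2, 5, 30):
    means = simulate_sample_means(n)
    print(
        f"n = {n}: "
        f"mean of sample means = {statistics.fmean(means):.3f}, "
        f"SD of sample means = {statistics.stdev(means):.3f}, "
        f"sigma / sqrt(n) = {1 / n ** 0.5:.3f}"
    )
```

As n grows, the sample means stay centred on the population mean while their spread shrinks towards sigma / sqrt(n); plotting the three lists as histograms would show the shape becoming more bell-like.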

Central Limit Theorem

· Left panel shows that none of the populations are normally distributed

· Bottom three panels show the shape of the sampling distribution for samples n=2, n=5, and n=30

· General statistical practice is to assume that, for most applications, the sampling distribution can be approximated by normal distribution whenever the sample size is 30 or more

A Comparison of the Sampling Distributions of x̄ for Simple Random Samples of n=20 and n=50 Students

Confidence Interval

Interval Estimation

· Because a point estimator (e.g. the sample mean) cannot be expected to provide the exact value of a population parameter, interval estimation is frequently used to generate an estimate of the value of a population parameter.

· An interval estimate is often computed by adding and subtracting a value, called the margin of error, to and from the point estimate.

· The general form of an interval estimate is:

Point estimate ± Margin of error

Interval Estimation

· However, the question is how to decide the size of the ‘Margin of Error’.

· For that, we rely on the ‘Empirical Rules’.

Interval Estimation - Empirical Rules

· For many data sets encountered in practice:

· Approximately 68% of the observations fall within one standard deviation of the mean, between x̄ − s and x̄ + s (z=1)

· Approximately 95% fall within two standard deviations of the mean, x̄ ± 2s (z=2)

· Approximately 99.7% fall within three standard deviations of the mean, x̄ ± 3s (z=3)

· These rules are commonly used to characterize the natural variation in manufacturing processes and other business phenomena.
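For data that follow a normal distribution, the three percentages in the empirical rules can be computed directly. The sketch below uses `statistics.NormalDist` from the Python standard library; `share_within` is an illustrative helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(z):
    """Proportion of a normal distribution lying within z standard deviations of the mean."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)

for z in (1, 2, 3):
    print(f"z = {z}: {share_within(z):.4f}")
```

The results are close to 68%, 95% and 99.7%, which is where the rounded figures in the rules come from.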

Interval Estimation

Interval Estimation of the Population Mean (n>=30)

When the sample mean is approximately normally distributed:

· 90% of sample means lie within 1.645 standard errors of the population mean

· 95% of sample means lie within 1.960 standard errors of the population mean

· 99% of sample means lie within 2.576 standard errors of the population mean

· Formula for the sample standard error: SE = s / √n, where s is the sample standard deviation and n is the sample size
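The multipliers 1.645, 1.960 and 2.576 quoted above are quantiles of the standard normal distribution. The sketch below recovers them with `NormalDist.inv_cdf` from the standard library; `z_critical` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-sided critical value: the z with `confidence_level` of the area between -z and +z."""
    tail = (1 - confidence_level) / 2   # area left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

The same helper gives the multiplier for any other confidence level, for example `z_critical(0.80)`.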

What is ‘Standard Error of the Mean’

· The standard error of the mean is the standard deviation of the sampling distribution of the mean.

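· So, the difference between SD and SE is: SD describes the spread of individual observations within a sample, whereas SE describes the spread of sample means across repeated samples (SE = SD / √n).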

Sampling Distribution of the Sample Mean

	Sample 1	Sample 2	Sample 3	Sample 4	Sample 5	Sample 6	Sample 7	Sample 8	Sample 9	Sample 10
	24	22	27	22	24	20	20	26	20	26
	28	27	26	21	23	26	25	35	40	20
	20	20	27	20	22	23	25	27	20	27
	22	26	20	21	22	24	23	20	34	22
	27	23	23	24	27	22	20	27	24	25
Mean	24	24	25	22	24	23	23	27	28	24
SD	3	3	3	2	2	2	3	5	9	3
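The table makes the SD versus SE distinction concrete: each ‘SD’ entry measures spread among five individual ages, while the spread of the ten values in the ‘Mean’ row is an empirical standard error. The sketch below assumes the usual formula SE = s / √n (it reproduces the 0.89 used for Sample 4 in the examples that follow) and uses the rounded figures from the slides; `standard_error` is an illustrative name.

```python
import math
import statistics

def standard_error(sd, n):
    """Standard error of the mean for a sample of size n with standard deviation sd."""
    return sd / math.sqrt(n)

sample_means = [24, 24, 25, 22, 24, 23, 23, 27, 28, 24]  # "Mean" row of the table

se_from_sample_4 = standard_error(2, 5)      # from Sample 4's rounded SD of 2
se_from_population = standard_error(3, 5)    # from the population SD of 3
sd_of_sample_means = statistics.stdev(sample_means)  # observed spread of the ten means

print(round(se_from_sample_4, 2))     # 0.89
print(round(se_from_population, 2))   # 1.34
print(round(sd_of_sample_means, 2))   # 1.84
```

With only ten samples the observed spread of the means is a rough estimate, but it is clearly smaller than the population SD of 3, as the formula predicts.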

Interval Estimation - Example

	Sample 4
	22
	21
	20
	21
	24
Mean	22

If we are ‘unlucky’ and obtain Sample 4:

Sample Mean = 22

Sample SD = 2, n = 5

SE = SD/√n = 2/√5 ≈ 0.89

If your confidence level is 68%, then the z-score for the confidence interval is 1.

UCL = Mean + z*SE = 22 + 1*0.89 = 22.9

LCL = Mean - z*SE = 22 - 1*0.89 = 21.1

Interval Estimation - Example

For the same Sample 4 (Sample Mean = 22, Sample SD = 2, n = 5, SE ≈ 0.89):

If your confidence level is 90%, then the z-score for the confidence interval is 1.65.

UCL = Mean + z*SE = 22 + 1.65*0.89 = 23.5

LCL = Mean - z*SE = 22 - 1.65*0.89 = 20.5

Interval Estimation - Example

For the same Sample 4 (Sample Mean = 22, Sample SD = 2, n = 5, SE ≈ 0.89):

If your confidence level is 95%, then the z-score for the confidence interval is 1.96.

UCL = Mean + z*SE = 22 + 1.96*0.89 ≈ 23.8 (computed with the unrounded SE, 0.894)

LCL = Mean - z*SE = 22 - 1.96*0.89 ≈ 20.2

Interval Estimation - Example

For the same Sample 4 (Sample Mean = 22, Sample SD = 2, n = 5, SE ≈ 0.89):

If your confidence level is 99%, then the z-score for the confidence interval is 2.58.

UCL = Mean + z*SE = 22 + 2.58*0.89 = 24.3

LCL = Mean - z*SE = 22 - 2.58*0.89 = 19.7
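The four worked examples follow the same recipe, so they can be wrapped in one small function. This is a sketch that uses the slides' rounded inputs for Sample 4 (mean 22, SD 2, n = 5) and the z-scores quoted in the examples; `confidence_interval` is a hypothetical name.

```python
import math

def confidence_interval(sample_mean, sample_sd, n, z):
    """Return (LCL, UCL) = sample_mean -/+ z * sample_sd / sqrt(n)."""
    standard_error = sample_sd / math.sqrt(n)
    margin_of_error = z * standard_error
    return sample_mean - margin_of_error, sample_mean + margin_of_error

for level, z in [("68%", 1), ("90%", 1.65), ("95%", 1.96), ("99%", 2.58)]:
    lcl, ucl = confidence_interval(22, 2, 5, z)
    print(f"{level}: LCL = {lcl:.1f}, UCL = {ucl:.1f}")
```

The printed limits match the UCL and LCL values in the examples, and the interval widens as the confidence level rises.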

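Does it work?

· The true population mean is 24. The 68%, 90%, and 95% intervals from Sample 4 (21.1 to 22.9, 20.5 to 23.5, and 20.2 to 23.8) all miss it; only the 99% interval (19.7 to 24.3) contains it.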
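One way to explore whether intervals built this way ‘work’ is to repeat the sampling many times and count how often the interval contains the true mean. The simulation below is a sketch under stated assumptions: ages are drawn from a normal population with the slides' mean of 24 and SD of 3 (Cohort 14 itself is only roughly like this), and the function name, seed and repetition count are arbitrary.

```python
import math
import random
import statistics

def coverage(n=30, z=1.96, repetitions=2000, seed=1):
    """Share of simulated z-based intervals that contain the true mean."""
    rng = random.Random(seed)
    true_mean, true_sd = 24, 3   # the slides' population mean and SD
    hits = 0
    for _ in range(repetitions):
        # Assumption: a normal population, which Cohort 14 only approximates.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        margin = z * statistics.stdev(sample) / math.sqrt(n)
        if mean - margin <= true_mean <= mean + margin:
            hits += 1
    return hits / repetitions

coverage_n30 = coverage(n=30)
coverage_n5 = coverage(n=5)

print(f"n = 30: {coverage_n30:.1%} of 95% intervals contain the true mean")
print(f"n = 5:  {coverage_n5:.1%} of 95% intervals contain the true mean")
```

With n = 30 the share should come out close to the nominal 95%, while with n = 5 it tends to fall short, which is consistent with the n>=30 condition attached to this method.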

Thank you
