Computer Base Module 3 - Reflection


Business Analytics

© 2021 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

Statistical Inference

Chapter 6

Introduction (Slide 1 of 2)

A census collects data from every element in the population of interest.

Taking a census poses many potential difficulties; it may be:

Expensive.

Time consuming.

Misleading.

Unnecessary.

Impractical.

Introduction (Slide 2 of 2)

Statistical inference uses sample data to make estimates of or draw conclusions about one or more characteristics of a population.

The sampled population is the population from which the sample is drawn.

A frame is a list of elements from which the sample will be selected.

Selecting a Sample

Sampling from a Finite Population

Sampling from an Infinite Population

Selecting a Sample (Slide 1 of 4)

Parameter: A measurable factor that defines a characteristic of a population, process, or system.

Sampling from a Finite Population:

Statisticians recommend selecting a probability sample when sampling from a finite population because a probability sample allows you to make valid statistical inferences about the population.

Simple Random Sample (Finite Population):

A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.
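As a rough illustration of the definition above, a simple random sample can be drawn from a frame with Python's standard library. The frame of 2,500 employee IDs, the sample size of 30, and the helper name `simple_random_sample` are hypothetical choices for this sketch, not values taken from the slides. `random.sample` draws without replacement, so every possible sample of size n is equally likely.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the frame; every subset of size n is equally likely."""
    rng = random.Random(seed)      # a seed only makes the sketch reproducible
    return rng.sample(frame, n)    # sampling without replacement

frame = list(range(1, 2501))       # hypothetical frame: employee IDs 1..2500
sample = simple_random_sample(frame, 30, seed=1)
print(sample)
```

Figure 6.1 carries out the same idea in Excel; the code is only an alternative sketch of that selection step.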

Selecting a Sample (Slide 2 of 4)

Figure 6.1: Using Excel to Select a Simple Random Sample

Selecting a Sample (Slide 3 of 4)

Sampling from an Infinite Population:

With an infinite population, you cannot select a simple random sample because you cannot construct a frame consisting of all the elements.

Statisticians recommend selecting what is called a random sample.

Random Sample (Infinite Population):

A random sample of size n from an infinite population is a sample selected such that the following conditions are satisfied:

Each element selected comes from the same population.

Each element is selected independently.

Selecting a Sample (Slide 4 of 4)

Care and judgment must be exercised in the selection process for a random sample from an infinite population to ensure that:

Each element selected comes from the same population.

Each element is selected independently.

Situations involving sampling from an infinite population are usually associated with a process that operates over time.

Point Estimation

Practical Advice

Point Estimation (Slide 1 of 5)

To estimate the value of a population parameter, compute a corresponding characteristic of the sample—a sample statistic.

Using the data in Table 6.1:

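The sample mean is: $\bar{x} = \frac{\sum x_i}{n}$

The sample proportion is: $\bar{p} = \frac{x}{n}$, where x is the number of sampled elements that possess the characteristic of interest.

The sample standard deviation is: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$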

Point Estimation (Slide 2 of 5)

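Calculating the sample mean, sample standard deviation, and sample proportion is called point estimation: each sample statistic is the point estimator of the corresponding population parameter ($\bar{x}$ for the mean $\mu$, $s$ for the standard deviation $\sigma$, and $\bar{p}$ for the proportion $p$), and its computed value is the point estimate.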

Point Estimation (Slide 3 of 5)

Table 6.1: Annual Salary and Training Program Status for a Simple Random Sample of 30 EAI Employees

Point Estimation (Slide 4 of 5)

Table 6.2: Summary of Point Estimates Obtained from a Simple Random Sample of 30 EAI Employees

The point estimates differ somewhat from the values of corresponding population parameters.
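The three point estimates can be computed in a few lines. The sketch below uses a small hypothetical sample of six employees (not the Table 6.1 data); the variable names are illustrative only.

```python
import statistics

# Hypothetical sample (not the Table 6.1 data): annual salary and training status
salaries = [71000.0, 69500.0, 73250.0, 72100.0, 70400.0, 74150.0]
completed_training = [True, False, True, True, False, True]

n = len(salaries)
x_bar = statistics.mean(salaries)        # point estimate of the population mean
s = statistics.stdev(salaries)           # sample standard deviation (divides by n - 1)
p_bar = sum(completed_training) / n      # point estimate of the population proportion

print(f"x_bar = {x_bar:.2f}, s = {s:.2f}, p_bar = {p_bar:.3f}")
```

Note that `statistics.stdev` uses the n - 1 divisor, which is the sample standard deviation; `statistics.pstdev` would give the population version instead.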

Point Estimation (Slide 5 of 5)

Practical Advice:

When making inferences, it is important to have a close correspondence between the sampled population and the target population:

Target population: Population about which we want to make inferences.

Sampled population: Population from which the sample is taken.

Good judgment is a necessary ingredient of sound statistical practice.

Sampling Distributions

Sampling Distribution of $\bar{x}$

Sampling Distribution of $\bar{p}$

Sampling Distributions (Slide 1 of 18)

A random variable is a quantity whose values are not known with certainty.

Sampling Distributions (Slide 2 of 18)

Sampling Distributions (Slide 3 of 18)

Mean Annual Salary ($)    Frequency    Relative Frequency
69,500.00–69,999.99           2            0.004
70,000.00–70,499.99          16            0.032
70,500.00–70,999.99          52            0.104
71,000.00–71,499.99         101            0.202
71,500.00–71,999.99         133            0.266
72,000.00–72,499.99         110            0.220
72,500.00–72,999.99          54            0.108
73,000.00–73,499.99          26            0.052
73,500.00–73,999.99           6            0.012
Totals:                     500            1.000

Sampling Distributions (Slide 4 of 18)

Sampling Distributions (Slide 5 of 18)

Sampling Distributions (Slide 6 of 18)

A sampling distribution has:

An expected value or mean.

A standard deviation.

A characteristic shape or form.

Sampling Distributions (Slide 7 of 18)

When the expected value of a point estimator equals the population parameter, we say the point estimator is unbiased.

Sampling Distributions (Slide 8 of 18)

Sampling Distributions (Slide 9 of 18)

Finite population correction factor: $\sqrt{\frac{N - n}{N - 1}}$, where N is the population size and n is the sample size.

In many practical sampling situations, the finite population correction factor is close to 1, so the difference between the values of the standard deviation for the finite and infinite populations is negligible.
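To see how close to 1 the correction typically is, the sketch below computes the standard deviation of the sample mean with and without it. It assumes the standard formulas (σ/√n for an infinite population, multiplied by √((N − n)/(N − 1)) for a finite one); the values σ = 4000, n = 30, and N = 2500 are hypothetical.

```python
import math

def standard_error_mean(sigma, n, N=None):
    """Standard deviation of the sample mean; applies the finite population correction when N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))   # finite population correction factor
    return se

sigma, n, N = 4000.0, 30, 2500               # hypothetical values
fpc = math.sqrt((N - n) / (N - 1))
print(f"correction factor = {fpc:.4f}")
print(f"infinite-population SE = {standard_error_mean(sigma, n):.2f}")
print(f"finite-population SE   = {standard_error_mean(sigma, n, N):.2f}")
```

With a sample that is a small fraction of the population, the two standard errors differ by well under one percent, which is why the correction is often ignored.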

Sampling Distributions (Slide 10 of 18)

Sampling Distributions (Slide 11 of 18)

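When the population has a normal distribution, the sampling distribution of $\bar{x}$ is normally distributed for any sample size.

When the population does not have a normal distribution, the central limit theorem is helpful in identifying the shape of the sampling distribution of $\bar{x}$.

Central limit theorem:

In selecting random samples of size n from a population, the sampling distribution of the sample mean $\bar{x}$ can be approximated by a normal distribution as the sample size becomes large.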

Sampling Distributions (Slide 12 of 18)

Figure 6.4: Illustration of the Central Limit Theorem for Three Populations

Top panel shows that none of the populations are normally distributed.

Bottom three panels show the shape of the sampling distribution for samples of size n = 2, n = 5, and n = 30.

General statistical practice is to assume that, for most applications, the sampling distribution of $\bar{x}$ can be approximated by a normal distribution whenever the sample size is 30 or more.
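A small simulation can make the central limit theorem concrete. The sketch below assumes a skewed exponential population with mean 1 and standard deviation 1 (a hypothetical stand-in for the non-normal populations in Figure 6.4) and draws 2,000 samples at each of the three sample sizes shown in the figure.

```python
import random
import statistics

def draw_exponential(rng):
    """One observation from a skewed population with mean 1 and standard deviation 1."""
    return rng.expovariate(1.0)

def sample_means(draw, n, repetitions, rng):
    """Return the means of `repetitions` independent random samples of size n."""
    return [statistics.mean([draw(rng) for _ in range(n)]) for _ in range(repetitions)]

rng = random.Random(42)
for n in (2, 5, 30):
    means = sample_means(draw_exponential, n, 2000, rng)
    # The center stays near 1 while the spread shrinks roughly like 1 / sqrt(n)
    print(n, round(statistics.mean(means), 3), round(statistics.stdev(means), 3))
```

Plotting a histogram of `means` for each n would show the shape moving from skewed toward bell-shaped as n grows.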

Sampling Distributions (Slide 13 of 18)

Sampling Distributions (Slide 14 of 18)

Sampling Distributions (Slide 15 of 18)

The formula for computing the sample proportion is: $\bar{p} = \frac{x}{n}$

where

x = the number of elements in the sample that possess the characteristic of interest.

n = sample size.

Sampling Distributions (Slide 16 of 18)

Sampling Distributions (Slide 17 of 18)

Sampling Distributions (Slide 18 of 18)

Interval Estimation

Interval Estimation of the Population Mean

Interval Estimation of the Population Proportion

Interval Estimation (Slide 1 of 15)

Because a point estimator cannot be expected to provide the exact value of a population parameter, interval estimation is frequently used to generate an estimate of the value of a population parameter.

An interval estimate is often computed by adding a value, called the margin of error, to the point estimate and subtracting the same value from it.

The general form of an interval estimate is: Point estimate ± Margin of error

Interval Estimation (Slide 2 of 15)

Interval Estimation of the Population Mean:

An interval estimate provides information about how close the point estimate is to the value of the population parameter.

General form of an interval estimate of a population mean is: $\bar{x} \pm \text{Margin of error}$

General form of an interval estimate of a population proportion is: $\bar{p} \pm \text{Margin of error}$

Interval Estimation (Slide 3 of 15)

Interval Estimation of the Population Mean (cont.):

For any normally distributed random variable:

90% of the values lie within 1.645 standard deviations of the mean.

95% of the values lie within 1.960 standard deviations of the mean.

99% of the values lie within 2.576 standard deviations of the mean.
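The three multipliers above can be recovered from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_for_confidence` is an illustrative choice.

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """z such that the central `confidence` share of a normal distribution lies within +/- z standard deviations."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)   # leave alpha / 2 in each tail

for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))
```

Each value leaves half of the remaining probability in each tail, which is why the 95% multiplier comes from the 0.975 quantile rather than the 0.95 quantile.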

Interval Estimation (Slide 4 of 15)

Figure 6.8: Sampling Distribution of the Sample Mean

Interval Estimation (Slide 5 of 15)

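When the population standard deviation $\sigma$ is unknown and is estimated by the sample standard deviation $s$, the margin of error carries an additional source of uncertainty. If the sampling distribution of $\bar{x}$ follows a normal distribution, address this additional source of uncertainty by using a probability distribution known as the t distribution: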

A family of similar probability distributions.

The shape of each specific one depends on a parameter referred to as the degrees of freedom.

Similar in shape to the standard normal distribution, but wider.

As the degrees of freedom increase, the t distribution narrows, its peak becomes higher, and it becomes more similar to the standard normal distribution.

Interval Estimation (Slide 6 of 15)

Figure 6.9: Comparison of the Standard Normal Distribution with t Distributions with 10 and 20 Degrees of Freedom

Interval Estimation (Slide 7 of 15)

Figure 6.10: t Distribution with 29 Degrees of Freedom

Interval Estimation (Slide 8 of 15)

Figure 6.11: Intervals Formed Around Sample Means from 10 Independent Random Samples

Interval Estimation (Slide 9 of 15)

Because approximately 90% of all the intervals constructed will contain the population mean, we say that we are approximately 90% confident that the interval will include the population mean:

Say that the interval has been established at the 90% confidence level.

The value of 0.90 is referred to as the confidence coefficient.

The interval is called the 90% confidence interval.

The level of significance is the probability that the interval estimation procedure will generate an interval that does not contain the population mean: $\alpha = 1 - \text{confidence coefficient}$, so $\alpha = 0.10$ for a 90% confidence interval.

Interval Estimation (Slide 10 of 15)

Interval Estimation (Slide 11 of 15)

Table 6.5: Credit Card Balances for a Sample of 70 Households

9,430 14,661 7,159 9,071 9,691 11,032
7,535 12,195 8,137 3,603 11,448 6,525
4,078 10,544 9,467 16,804 8,279 5,239
5,604 13,659 12,595 13,479 5,649 6,195
5,179 7,061 7,917 14,044 11,298 12,584
4,416 6,245 11,346 6,817 4,353 15,415
10,676 13,021 12,806 6,845 3,467 15,917
1,627 9,719 4,972 10,493 6,191 12,591
10,112 2,200 11,356 615 12,851 9,743
6,567 10,746 7,117 13,627 5,337 10,324
13,627 12,744 9,465 12,557 8,372
18,719 5,742 19,263 6,232 7,445
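A 95% confidence interval for the mean balance can be sketched from the Table 6.5 data. The code assumes the usual t-based interval, $\bar{x} \pm t_{\alpha/2}\, s/\sqrt{n}$, which is not spelled out in the slide text. The critical value 1.995 is an assumption: a rounded table lookup for 69 degrees of freedom, hard-coded because the Python standard library has no t distribution.

```python
import math
import statistics

# Credit card balances for the 70 households in Table 6.5
balances = [
    9430, 14661, 7159, 9071, 9691, 11032,
    7535, 12195, 8137, 3603, 11448, 6525,
    4078, 10544, 9467, 16804, 8279, 5239,
    5604, 13659, 12595, 13479, 5649, 6195,
    5179, 7061, 7917, 14044, 11298, 12584,
    4416, 6245, 11346, 6817, 4353, 15415,
    10676, 13021, 12806, 6845, 3467, 15917,
    1627, 9719, 4972, 10493, 6191, 12591,
    10112, 2200, 11356, 615, 12851, 9743,
    6567, 10746, 7117, 13627, 5337, 10324,
    13627, 12744, 9465, 12557, 8372,
    18719, 5742, 19263, 6232, 7445,
]

n = len(balances)
x_bar = statistics.mean(balances)        # point estimate of the population mean
s = statistics.stdev(balances)           # sample standard deviation
t_crit = 1.995                           # assumed: t value for upper-tail area 0.025, 69 degrees of freedom
margin = t_crit * s / math.sqrt(n)       # margin of error
lower, upper = x_bar - margin, x_bar + margin

print(f"n = {n}, x_bar = {x_bar:.0f}, s = {s:.0f}")
print(f"95% confidence interval: {lower:.0f} to {upper:.0f}")
```

A statistics package would supply the exact critical value; the hard-coded constant only keeps the sketch dependency-free.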

Interval Estimation (Slide 12 of 15)

Figure 6.13: 95% Confidence Interval for Credit Card Balances

Interval Estimation (Slide 13 of 15)

Interval Estimation of the Population Proportion:

Interval Estimation (Slide 14 of 15)

Interval Estimation (Slide 15 of 15)

Figure 6.15: 95% Confidence Interval for Survey of Women Golfers

Hypothesis Tests

Developing Null and Alternative Hypotheses

Type I and Type II Errors

Hypothesis Test of the Population Mean

Hypothesis Test of the Population Proportion

Hypothesis Tests (Slide 1 of 27)

In hypothesis testing, we begin by making a tentative conjecture about a population parameter; this tentative conjecture is called the null hypothesis.

The opposite of what is stated in the null hypothesis is the alternative hypothesis.

The hypothesis testing procedure uses data from a sample to test the validity of the two competing statements about a population.

Hypothesis Tests (Slide 2 of 27)

Developing Null and Alternative Hypotheses:

Context of the situation is very important in determining how the hypotheses should be stated.

All hypothesis testing applications involve collecting a random sample and using the sample results to provide evidence for drawing a conclusion.

Ask:

What is the purpose of collecting the sample?

What conclusions are we hoping to make?

Hypothesis Tests (Slide 3 of 27)

Developing Null and Alternative Hypotheses (cont.):

Many applications of hypothesis testing involve an attempt to gather evidence in support of a research hypothesis—best to begin with the alternative hypothesis and make it the conclusion that the researcher hopes to support.

Not all hypothesis tests involve a research hypothesis:

Begin with a belief or a conjecture that a statement about the value of a population parameter is true.

Use a hypothesis test to challenge the conjecture and determine whether there is statistical evidence to conclude that the conjecture is incorrect.

Helpful to develop the null hypothesis first; the alternative hypothesis is that the belief or conjecture is incorrect.

Hypothesis Tests (Slide 4 of 27)

Developing Null and Alternative Hypotheses (cont.):

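Depending upon the situation, hypothesis tests about a population parameter may take one of three forms. Two use inequalities in the null hypothesis; the third uses an equality in the null hypothesis. For a population mean $\mu$ with hypothesized value $\mu_0$:

$H_0: \mu \ge \mu_0$, $H_a: \mu < \mu_0$

$H_0: \mu \le \mu_0$, $H_a: \mu > \mu_0$

$H_0: \mu = \mu_0$, $H_a: \mu \ne \mu_0$

The first two forms are called one-tailed tests.

The third form is called a two-tailed test.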

Hypothesis Tests (Slide 5 of 27)

Type I and Type II Errors:

Table 6.6: Errors and Correct Conclusions in Hypothesis Testing

                                  Population Condition
Conclusion          H0 True               Ha True
Do Not Reject H0    Correct conclusion    Type II error
Reject H0           Type I error          Correct conclusion

Hypothesis Tests (Slide 6 of 27)

Type I and Type II Errors (cont.):

Hypothesis Tests (Slide 7 of 27)

Level of Significance:

The level of significance is the probability of making a Type I error when the null hypothesis is true as an equality.

The person responsible for the hypothesis test specifies the level of significance and thereby controls the probability of making a Type I error.

Applications of hypothesis testing that only control the Type I error are called significance tests.

Most applications of hypothesis testing control the probability of making a Type I error; they do not always control the probability of making a Type II error.
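The link between the level of significance and the Type I error rate can be checked by simulation. The sketch below is a simplification under stated assumptions: it uses a lower-tail z test with a known population standard deviation rather than the t statistic, and the values μ0 = 3.0, σ = 0.2, and n = 36 are hypothetical. Samples are generated with the null hypothesis true as an equality, so every rejection is a Type I error.

```python
import math
import random
from statistics import NormalDist, mean

def type_i_error_rate(mu0, sigma, n, alpha, trials, seed=0):
    """Share of samples, drawn with H0 true as an equality, for which a lower-tail z test rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(alpha)      # reject H0 when z <= z_crit
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
        if z <= z_crit:
            rejections += 1                   # H0 is true here, so this is a Type I error
    return rejections / trials

rate = type_i_error_rate(mu0=3.0, sigma=0.2, n=36, alpha=0.05, trials=4000)
print(f"simulated Type I error rate: {rate:.3f}")
```

The simulated rate should land close to the chosen level of significance; nothing in the procedure controls how often a false null hypothesis goes unrejected.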

Hypothesis Tests (Slide 8 of 27)

Hypothesis Test of the Population Mean:

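One-tailed tests about a population mean take one of the following forms:

Lower-tail test: $H_0: \mu \ge \mu_0$, $H_a: \mu < \mu_0$

Upper-tail test: $H_0: \mu \le \mu_0$, $H_a: \mu > \mu_0$

Develop the null and alternative hypotheses for the test.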

Specify the level of significance for the test.

Collect the sample data and compute the value of what is called a test statistic.

Hypothesis Tests (Slide 9 of 27)

Hypothesis Tests (Slide 10 of 27)

Hypothesis Tests (Slide 11 of 27)

Hypothesis Tests (Slide 12 of 27)

The key question for a lower-tail test is, How small must the test statistic t be before we choose to reject the null hypothesis?

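p Value:

A p value is the probability, assuming that $H_0$ is true, of obtaining a random sample of size n that results in a test statistic at least as extreme as the one observed in the current sample.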

Hypothesis Tests (Slide 13 of 27)

Figure 6.18: Hypothesis Test about a Population Mean

Hypothesis Tests (Slide 14 of 27)

Hypothesis Tests (Slide 15 of 27)

Different decision makers may express different opinions concerning the cost of making a Type I error and may choose a different level of significance.

Providing the p value as part of the hypothesis testing results allows each decision maker to compare the reported p value to their own level of significance.

The level of significance indicates the strength of evidence that is needed in the sample data before rejection of the null hypothesis.

Hypothesis Tests (Slide 16 of 27)

For an upper-tail test, the p value is the probability of obtaining a value for the test statistic as large as or larger than that provided by the sample.

Computation of p Values for One-Tailed Tests:

1. Compute the value of the test statistic using equation (6.11).

2. Lower-tail test: Using the t distribution, compute the probability that t is less than or equal to the value of the test statistic (area in the lower tail).

3. Upper-tail test: Using the t distribution, compute the probability that t is greater than or equal to the value of the test statistic (area in the upper tail).
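The three steps above can be sketched without external libraries. The code assumes that equation (6.11) is the usual t statistic, $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$, with n − 1 degrees of freedom. Because the Python standard library has no t distribution, the tail area is obtained by numerically integrating the t density with Simpson's rule. The function names and the sample values are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t), integrating the density on [0, |t|] with Simpson's rule (steps must be even)."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                       # area between 0 and |t|
    return 0.5 + area if t > 0 else 0.5 - area # symmetry about 0

def one_tailed_p_value(x_bar, s, n, mu0, tail):
    """Test statistic and p value for a one-tailed test about a mean; tail is 'lower' or 'upper'."""
    t = (x_bar - mu0) / (s / math.sqrt(n))     # step 1: test statistic
    lower_area = t_cdf(t, n - 1)
    p = lower_area if tail == "lower" else 1.0 - lower_area   # step 2 or step 3
    return t, p

# Hypothetical sample summary for a lower-tail test of H0: mu >= 3.0
t_stat, p_value = one_tailed_p_value(x_bar=2.92, s=0.17, n=36, mu0=3.0, tail="lower")
print(f"t = {t_stat:.3f}, p value = {p_value:.4f}")
```

In practice a statistics package would supply the t distribution directly; the numerical integration here only keeps the sketch self-contained.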

Hypothesis Tests (Slide 17 of 27)

In hypothesis testing, the general form for a two-tailed test about a population mean is: $H_0: \mu = \mu_0$, $H_a: \mu \ne \mu_0$

Hypothesis Tests (Slide 18 of 27)

Figure 6.20: p Value for the Holiday Toys Two-Tailed Hypothesis Test

Hypothesis Tests (Slide 19 of 27)

Figure 6.21: Two-Tailed Hypothesis Test for Holiday Toys

Hypothesis Tests (Slide 20 of 27)

Computation of p Values for Two-Tailed Tests:

1. Compute the value of the test statistic using equation (6.11).

2. If the value of the test statistic is in the upper tail, compute the probability that t is greater than or equal to the value of the test statistic (the upper-tail area). If the value of the test statistic is in the lower tail, compute the probability that t is less than or equal to the value of the test statistic (the lower-tail area).

3. Double the probability (or tail area) from step 2 to obtain the p value.

Hypothesis Tests (Slide 21 of 27)

Table 6.7: Summary of Hypothesis Tests About a Population Mean

Hypothesis Tests (Slide 22 of 27)

Steps of Hypothesis Testing:

Step 1. Develop the null and alternative hypotheses.

Step 2. Specify the level of significance.

Step 3. Collect the sample data and compute the value of the test statistic.

Step 4. Use the value of the test statistic to compute the p value.

Step 5. Reject $H_0$ if the p value $\le \alpha$.

Step 6. Interpret the statistical conclusion in the context of the application.

Hypothesis Tests (Slide 23 of 27)

Hypothesis Tests (Slide 24 of 27)

Hypothesis Test of the Population Proportion:

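The three forms for a hypothesis test about a population proportion, where $p_0$ denotes the hypothesized value, are:

$H_0: p \ge p_0$, $H_a: p < p_0$

$H_0: p \le p_0$, $H_a: p > p_0$

$H_0: p = p_0$, $H_a: p \ne p_0$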

The first form is called a lower-tail test.

The second form is called an upper-tail test.

The third form is called a two-tailed test.
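All three forms can be handled by one small function. The sketch assumes the standard large-sample test statistic $z = (\bar{p} - p_0)/\sqrt{p_0(1 - p_0)/n}$ with a standard normal reference distribution, which the slide text does not show. The counts used in the example (100 of 400) and the hypothesized value 0.20 are hypothetical inputs.

```python
import math
from statistics import NormalDist

def proportion_test(x, n, p0, tail):
    """z statistic and p value for a test about a population proportion.

    tail is 'lower', 'upper', or 'two', matching the three forms of the test.
    """
    p_bar = x / n                                     # sample proportion
    z = (p_bar - p0) / math.sqrt(p0 * (1 - p0) / n)   # standard error uses the hypothesized p0
    lower_area = NormalDist().cdf(z)
    if tail == "lower":
        p_value = lower_area
    elif tail == "upper":
        p_value = 1 - lower_area
    else:
        p_value = 2 * min(lower_area, 1 - lower_area) # double the smaller tail area
    return z, p_value

# Hypothetical upper-tail test of H0: p <= 0.20 with 100 successes in 400 trials
z, p_value = proportion_test(x=100, n=400, p0=0.20, tail="upper")
print(f"z = {z:.2f}, p value = {p_value:.4f}")
```

The standard error is computed from the hypothesized proportion rather than the sample proportion because the test statistic is evaluated under the assumption that the null hypothesis holds as an equality.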

Hypothesis Tests (Slide 25 of 27)

Figure 6.22: Calculation of the p Value for the Pine Creek Hypothesis Test

Hypothesis Tests (Slide 26 of 27)

Figure 6.23: Hypothesis Test for Pine Creek Golf Course

Hypothesis Tests (Slide 27 of 27)

Table 6.8: Summary of Hypothesis Tests About a Population Proportion

Big Data, Statistical Inference, and Practical Significance

Sampling Error

Nonsampling Error

Big Data

Understanding What Big Data Is

Big Data and Sampling Error

Big Data and the Precision of Confidence Intervals

Implications of Big Data for Confidence Intervals

Big Data, Hypothesis Testing, and p Values

Implications of Big Data in Hypothesis Testing

Big Data, Statistical Inference, and Practical Significance (Slide 1 of 19)

Sampling Error:

Sampling error is the deviation of the sample from the population that results from random sampling.

If repeated independent random samples of the same size are collected from the population of interest using probability sampling techniques, on average the samples will be representative of the population.

The random collection of sample data does not ensure that any single sample will be perfectly representative of the population of interest.

Sampling error is unavoidable when collecting a random sample.

Big Data, Statistical Inference, and Practical Significance (Slide 2 of 19)

Nonsampling Error:

Deviations of the sample from the population that occur for reasons other than random sampling are referred to as nonsampling error:

If the research objective and the population from which a sample is to be drawn are not aligned, the data collected will not help accomplish the research objective; this type of error is referred to as a coverage error.

Even when the sample is taken from the appropriate population, nonsampling error can occur when segments of the target population are systematically underrepresented or overrepresented in the sample; this type of error is referred to as a nonresponse error.

A measurement error is an incorrect measurement of the characteristic of interest.

Big Data, Statistical Inference, and Practical Significance (Slide 3 of 19)

Nonsampling Error (cont.):

Care must be taken to minimize the introduction of nonsampling error:

Carefully define the target population.

Carefully design the data collection process and train the data collectors.

Pretest the data collection procedure to identify potential sources of error.

Use stratified random sampling when population-level information about an important qualitative variable is available.

Use cluster sampling when the population can be divided into heterogeneous subgroups or clusters.

Use systematic sampling when population-level information about an important quantitative variable is available.

Recognize that every random sample will suffer from some degree of sampling error.

Big Data, Statistical Inference, and Practical Significance (Slide 4 of 19)

Big Data:

Recent estimates state that approximately 2.5 quintillion bytes of data are created worldwide each day.

The data sets that are generated are so large or complex that current data processing capacity and/or analytic methods are not adequate for analyzing the data; these are examples of big data.

Sources of big data: sensors, mobile devices, Internet activities, digital processes, and social media.

A few years ago, a petabyte of data seemed almost unimaginably large, but we now routinely describe data in terms of yottabytes.

Big Data, Statistical Inference, and Practical Significance (Slide 5 of 19)

Table 6.9: Terminology for Describing the Size of Data Sets

Number of Bytes    Metric    Name
1000^1             kB        kilobyte
1000^2             MB        megabyte
1000^3             GB        gigabyte
1000^4             TB        terabyte
1000^5             PB        petabyte
1000^6             EB        exabyte
1000^7             ZB        zettabyte
1000^8             YB        yottabyte
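To make the scale concrete, the following sketch converts a raw byte count into the units of Table 6.9, where each step is a factor of 1000. The helper name `describe_bytes` is an assumption made for illustration.

```python
UNITS = ["kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def describe_bytes(num_bytes):
    """Express a byte count in the largest Table 6.9 unit that keeps the value >= 1."""
    value = float(num_bytes)
    unit = "bytes"
    for name in UNITS:
        if value < 1000:
            break
        value /= 1000  # each metric step is a factor of 1000
        unit = name
    return f"{value:g} {unit}"

# 2.5 quintillion bytes, the daily worldwide estimate quoted on the earlier slide
print(describe_bytes(2.5e18))  # 2.5 EB
```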

Big Data, Statistical Inference, and Practical Significance (Slide 6 of 19)

Understanding What Big Data Is:

The processes that generate big data can be described by four attributes or dimensions:

Volume—the amount of data generated.

Variety—the diversity in types and structures of data generated.

Veracity—the reliability of the data generated.

Velocity—the speed at which the data are generated.

Big data can be tall data: A data set that has so many observations that traditional statistical inference has little meaning.

Big data can also be wide data: A data set that has so many variables that simultaneous consideration of all variables is infeasible.

Big Data, Statistical Inference, and Practical Significance (Slide 7 of 19)

Big Data and Sampling Error:

Table 6.10 shows how the standard error of the sampling distribution of the sample mean time spent by individual customers when they visit PDT’s web site decreases as the sample size increases.

Table 6.11 shows how the standard error of the sampling distribution of the proportion of the sample that clicked on any of the ads featured on PenningtonDailyTimes.com decreases as the sample size increases.

In both Table 6.10 and Table 6.11, the standard error when n = 1,000,000,000 is one ten-thousandth of the standard error when n = 10.
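The one ten-thousandth relationship follows directly from the square root of n in the standard error formulas. A minimal sketch, assuming s = 20 and a sample proportion of 0.51 (the values named in the captions of Tables 6.10 and 6.11); the function names are illustrative only:

```python
import math

def se_mean(s, n):
    """Standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def se_proportion(p_bar, n):
    """Standard error of the sample proportion: sqrt(p_bar * (1 - p_bar) / n)."""
    return math.sqrt(p_bar * (1 - p_bar) / n)

s, p_bar = 20, 0.51  # values taken from the captions of Tables 6.10 and 6.11
for n in (10, 1_000, 1_000_000, 1_000_000_000):
    print(f"n={n:>13,}  SE(mean)={se_mean(s, n):.5f}  SE(prop)={se_proportion(p_bar, n):.5f}")

# Ratio of the standard errors at n = 1,000,000,000 and n = 10
ratio = se_mean(s, 1_000_000_000) / se_mean(s, 10)
print(ratio)
```

The ratio equals sqrt(10 / 1,000,000,000) = 0.0001 regardless of s or the sample proportion, which is why both tables show the same relative decrease.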

Big Data, Statistical Inference, and Practical Significance (Slide 8 of 19)

Big Data, Statistical Inference, and Practical Significance (Slide 9 of 19)

Big Data, Statistical Inference, and Practical Significance (Slide 10 of 19)

Big Data and the Precision of Confidence Intervals:

Confidence intervals are powerful tools for making inferences about population parameters, but the validity of any interval estimate depends on the quality of the data used to develop the interval estimate.

Confidence intervals for the population mean and population proportion become narrower as the size of the sample increases.

The potential sampling error also decreases as the sample size increases.

Table 6.12 and Table 6.13 show that at a given confidence level, the margins of error decrease as the sample sizes increase.
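The shrinking margin of error can be reproduced approximately with the sketch below. It assumes s = 20 and a sample proportion of 0.51 (the values in the captions of Tables 6.10 and 6.11) and uses the standard normal quantile for both cases; an interval for the mean with unknown sigma would normally use a t value, which differs noticeably only for small n. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error_mean(s, n, confidence=0.95):
    """Approximate margin of error for the mean, using z in place of t (assumption)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * s / math.sqrt(n)

def margin_of_error_proportion(p_bar, n, confidence=0.95):
    """Margin of error for the proportion under the normal approximation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p_bar * (1 - p_bar) / n)

for n in (100, 10_000, 1_000_000):
    print(n, round(margin_of_error_mean(20, n), 5), round(margin_of_error_proportion(0.51, n), 5))
```

Each 100-fold increase in n divides the margin of error by 10.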

Big Data, Statistical Inference, and Practical Significance (Slide 11 of 19)

Table 6.12: Margin of Error for Interval Estimates of the Population Mean at the 95% Confidence Level for Various Sample Sizes n

Big Data, Statistical Inference, and Practical Significance (Slide 12 of 19)

Table 6.13: Margin of Error for Interval Estimates of the Population Proportion at the 95% Confidence Level for Various Sample Sizes n

Big Data, Statistical Inference, and Practical Significance (Slide 13 of 19)

Implications of Big Data for Confidence Intervals:

There are three possible reasons that a sample mean differs from last year’s population mean:

Sampling error.

Nonsampling error.

The population mean has changed since last year.

Big Data, Statistical Inference, and Practical Significance (Slide 14 of 19)

Implications of Big Data for Confidence Intervals (cont.):

If the sample has provided reliable evidence and the population mean has changed since last year, you must still consider the potential impact of the difference between the sample mean and the mean from last year. Example:

If a 0.1 second difference has a consequential effect on what PDT can charge for advertising on its site, this result could have practical business implications for PDT.

Otherwise, there may be no practical significance of the 0.1 second difference.

Big Data, Statistical Inference, and Practical Significance (Slide 15 of 19)

Implications of Big Data for Confidence Intervals (cont.):

Confidence intervals are extremely useful, but only effective when properly applied:

Interval estimates become increasingly precise as the sample size increases; extremely large samples will yield extremely precise estimates.

No interval estimate, no matter how precise, will accurately reflect the parameter being estimated unless the sample is relatively free of nonsampling error.

When using interval estimation, it is always important to carefully consider whether a random sample of the population of interest has been taken.

Big Data, Statistical Inference, and Practical Significance (Slide 16 of 19)

Big Data, Hypothesis Testing, and p Values:

The standard errors of the associated sampling distributions decrease as the sample size increases.

In Tables 6.14 and 6.15, the p value associated with a given difference between a point estimate and a hypothesized value of a parameter decreases as the sample size increases.
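The effect of sample size on the p value can be sketched as follows. The hypotheses and point estimates (a null value of 84 seconds against a sample mean of 84.1, and a null value of .50 against a sample proportion of .51) come from the captions of Tables 6.14 and 6.15. The value s = 20 is assumed to carry over from Table 6.10, and the normal distribution stands in for the t distribution, so results for small n are approximate. The function names are illustrative.

```python
import math
from statistics import NormalDist

def upper_tail_p_mean(x_bar, mu0, s, n):
    """Upper-tail p value for a test of the mean (normal approximation to t)."""
    z = (x_bar - mu0) / (s / math.sqrt(n))
    return 1 - NormalDist().cdf(z)

def upper_tail_p_proportion(p_bar, p0, n):
    """Upper-tail p value for a test of the proportion."""
    z = (p_bar - p0) / math.sqrt(p0 * (1 - p0) / n)
    return 1 - NormalDist().cdf(z)

for n in (10, 1_000, 100_000, 1_000_000):
    p_mean = upper_tail_p_mean(84.1, 84, 20, n)   # s = 20 is an assumption
    p_prop = upper_tail_p_proportion(0.51, 0.50, n)
    print(f"n={n:>9,}  p(mean)={p_mean:.6f}  p(proportion)={p_prop:.6f}")
```

The observed differences stay fixed at 0.1 second and .01, yet the p values fall toward zero as n grows, which is the pattern the two tables illustrate.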

Big Data, Statistical Inference, and Practical Significance (Slide 17 of 19)

Big Data, Statistical Inference, and Practical Significance (Slide 18 of 19)

Big Data, Statistical Inference, and Practical Significance (Slide 19 of 19)

Implications of Big Data in Hypothesis Testing:

The results of any hypothesis test, no matter the sample size, are only reliable if the sample is relatively free of nonsampling error.

If nonsampling error is introduced in the data collection process, the likelihood of making a Type I or Type II error may be higher than if the sample data are free of nonsampling error.

When testing a hypothesis, it is always important to think carefully about whether a random sample of the population of interest has been taken.

No business decision should be based solely on statistical inference; practical significance should always be considered in conjunction with statistical significance.

End of Chapter 6

$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{325{,}009{,}260}{29}} = \$3{,}348$

$\bar{p} = \dfrac{x}{n} = \dfrac{19}{30} = 0.63$

The sample mean $\bar{x}$ is the point estimator of the population mean $\mu$.

The sample standard deviation $s$ is the point estimator of the population standard deviation $\sigma$.

The sample proportion $\bar{p}$ is the point estimator of the population proportion $p$.

The numerical value obtained for $\bar{x}$, $s$, or $\bar{p}$ is called the point estimate.

Because the various possible values of $\bar{x}$ are the result of different simple random samples, the probability distribution of $\bar{x}$ is called the sampling distribution of $\bar{x}$.

Knowledge of the sampling distribution and its properties enables us to make probability statements about how close the sample mean $\bar{x}$ is to the population mean $\mu$.

Table 6.3: Values of $\bar{x}$ and $\bar{p}$ from 500 Simple Random Samples of 30 EAI Employees

Table 6.4: Frequency and Relative Frequency Distributions of $\bar{x}$ from 500 Simple Random Samples of 30 EAI Employees

Figure 6.2: Relative Frequency Histogram of $\bar{x}$ Values from 500 Simple Random Samples of Size 30 Each

Figure 6.3: Relative Frequency Histogram of $\bar{p}$ Values from 500 Simple Random Samples of Size 30 Each

Sampling distribution of $\bar{x}$:

The sampling distribution of $\bar{x}$ is the probability distribution of all possible values of the sample mean $\bar{x}$.

The formula for the standard deviation of $\bar{x}$ depends on whether the population is finite or infinite.

Using the following notation:

$\sigma_{\bar{x}}$ = the standard deviation of $\bar{x}$, or the standard error of the mean.

$\sigma$ = the standard deviation of the population.

$n$ = the sample size.

$N$ = the population size.

Finite population: $\sigma_{\bar{x}} = \sqrt{\dfrac{N - n}{N - 1}} \left( \dfrac{\sigma}{\sqrt{n}} \right)$

Infinite population: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

Estimated standard error: $s_{\bar{x}} = \dfrac{s}{\sqrt{n}} = \dfrac{3{,}348}{\sqrt{30}} = 611.3$.

True standard error: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{4{,}000}{\sqrt{30}} = 730.3$.

The difference between $s_{\bar{x}}$ and $\sigma_{\bar{x}}$ is due to sampling error.

... distribution of $\bar{x}$ is normally distributed for any sample size.

... distribution of $\bar{x}$.

... distribution of the sample mean $\bar{x}$ can be approximated by a normal distribution as the sample size becomes large.
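The standard error of the mean, with and without the finite population correction, can be sketched as follows. The population standard deviation of 4,000 and the sample size of 30 are the EAI values quoted in this deck; the population size N = 2,500 is a hypothetical value chosen for illustration, and the function name is an assumption.

```python
import math

def standard_error_mean(sigma, n, N=None):
    """Standard error of the sample mean.

    With N given, applies the finite population correction sqrt((N - n) / (N - 1)).
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(round(standard_error_mean(4000, 30), 1))           # infinite population: 730.3
print(round(standard_error_mean(4000, 30, N=2500), 2))   # N = 2,500 is hypothetical
```

When n is small relative to N, the correction factor is close to 1 and the two results nearly coincide.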

Figure 6.5: Sampling Distribution of $\bar{x}$ for the Mean Annual Salary of a Simple Random Sample of 30 EAI Employees

Figure 6.6: A Comparison of the Sampling Distributions of $\bar{x}$ for Simple Random Samples of $n = 30$ and $n = 100$ EAI Employees

Sampling Distribution of $\bar{p}$:

The sample proportion $\bar{p}$ is the point estimator of the population proportion $p$.

$\bar{p} = \dfrac{x}{n}$

Sampling distribution of $\bar{p}$: The sampling distribution of $\bar{p}$ is the probability distribution of all possible values of the sample proportion $\bar{p}$.

Figure 6.7: Sampling Distribution of $\bar{p}$ for the Proportion of EAI Employees Who Participated in the Management Training Program

The sampling distribution of $\bar{p}$ can be approximated by a normal distribution whenever $np \ge 5$ and $n(1 - p) \ge 5$.

Point estimate $\pm$ Margin of error

$\bar{x} \pm$ Margin of error

$\bar{p} \pm$ Margin of error

Use Excel's T.INV.2T function to find the value from a $t$ distribution such that a given percentage of the distribution is included in the interval $\pm t$ for any degrees of freedom.

$\alpha$ = level of significance = 1 − confidence coefficient

Figure 6.12: $t$ Distribution with $\alpha/2$ Area or Probability in the Upper Tail

The sampling distribution of $\bar{p}$ plays a key role in computing the margin of error in the interval estimate.

Figure 6.14: Normal Approximation of the Sampling Distribution of $\bar{p}$

$H_0: \mu = \mu_0$, $H_a: \mu \ne \mu_0$

$H_0: \mu \ge \mu_0$, $H_a: \mu < \mu_0$

$H_0: \mu \le \mu_0$, $H_a: \mu > \mu_0$

First row shows what can happen if the conclusion is to accept $H_0$:

If $H_0$ is true, this conclusion is correct.

If $H_a$ is true, we made a Type II error (accepted $H_0$ when it is false).

Second row shows what can happen if the conclusion is to reject $H_0$:

If $H_0$ is true, we made a Type I error (rejected $H_0$ when it is true).

If $H_a$ is true, rejecting $H_0$ is correct.

Lower-Tail Test: $H_0: \mu \ge \mu_0$, $H_a: \mu < \mu_0$

Upper-Tail Test: $H_0: \mu \le \mu_0$, $H_a: \mu > \mu_0$

Figure 6.16: Sampling Distribution of $\bar{x}$ for the Hilltop Coffee Study When the Null Hypothesis Is True as an Equality ($\mu = 3$)

Figure 6.17: Lower-Tail Probability for $t = -3$ from a $t$ Distribution with 35 Degrees of Freedom

Use the $t$-distributed random variable $t$ as a test statistic to determine whether $\bar{x}$ deviates from the hypothesized value of $\mu$ enough to justify rejecting the null hypothesis.

A $p$ value is the probability, assuming that $H_0$ is true, of obtaining a random sample of size $n$ that results in a test statistic at least as extreme as the one observed in the current sample.

Figure 6.19: $p$ Value for the Hilltop Coffee Study When $\bar{x} = 2.92$ and $s = 0.17$

$H_0: \mu = \mu_0$, $H_a: \mu \ne \mu_0$

Reject $H_0$ if the $p$ value $\le \alpha$.
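As a worked instance of the test statistic for the Hilltop Coffee figures quoted in this deck (sample mean 2.92, sample standard deviation 0.17, hypothesized mean 3), the calculation below assumes n = 36, a value inferred here from the 35 degrees of freedom rather than stated in this section:

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
  = \frac{2.92 - 3}{0.17 / \sqrt{36}}
  = \frac{-0.08}{0.0283}
  \approx -2.82
```

The lower-tail area below this value under a t distribution with 35 degrees of freedom is the p value that is compared with the level of significance.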

$H_0: p = p_0$, $H_a: p \ne p_0$

$H_0: p \ge p_0$, $H_a: p < p_0$

$H_0: p \le p_0$, $H_a: p > p_0$

Table 6.10: Standard Error of the Sample Mean $\bar{x}$ When $s = 20$ at Various Sample Sizes $n$

Table 6.11: Standard Error of the Sample Proportion $\bar{p}$ When $\bar{p} = 0.51$ at Various Sample Sizes $n$

When the sample size $n$ is very large, almost any difference between the sample mean $\bar{x}$ and the hypothesized population mean $\mu_0$ results in rejection of the null hypothesis.

Table 6.14: Values of the Test Statistic $t$ and the $p$ Values for the Test of the Null Hypothesis $H_0: \mu \le 84$ and Sample Mean $\bar{x} = 84.1$ Seconds for Various Sample Sizes $n$

Table 6.15: Values of the Test Statistic $z$ and the $p$ Values for the Test of the Null Hypothesis $H_0: p \le .50$ and Sample Proportion $\bar{p} = .51$ for Various Sample Sizes $n$