WK3 DQ - Data Analysis & Business Intelligence
Sampling, Sampling Methods, and the Central Limit Theorem
Chapter 8
Copyright ©2021 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.
This chapter begins our study of sampling. Sampling is a process of selecting items from a population so we can use the information to make judgments or inferences about the population.
Learning Objectives
LO8-1 Explain why populations are sampled and describe four methods to sample a population
LO8-2 Define sampling error
LO8-3 Explain the sampling distribution of the sample mean.
LO8-4 Recite the central limit theorem and define the mean and standard error of the sampling distribution of the sample mean
LO8-5 Apply the central limit theorem to calculate probabilities
Reasons for Sampling a Population
Contacting the whole population would be time-consuming.
The cost of studying all the items in a population may be prohibitive.
The physical impossibility of checking all items in the population.
The destructive nature of some tests.
There are many practical reasons why we prefer to select portions or samples of a population to observe and measure when studying characteristics of a population.
Probability Sampling Methods
Simple random sample - all members of the population have the same chance of being selected for the sample
Systematic sample - a random starting point is selected, and then every kth item thereafter is selected for the sample
Stratified sample - the population is divided into several groups, called strata, and then a random sample is selected from each stratum
Cluster sample - the population is divided into primary units (clusters), some of those units are randomly selected, and then samples are drawn from the selected units
These are not the only methods of sampling available to a researcher. If you become involved in a major research project, consult books devoted to sample theory and sample design.
Simple Random Sampling
The most widely used method of sampling is a simple random sample
Example
There were 750 Major League Baseball players at the end of the 2016 season. A committee of 10 players is to be formed to study the issue of concussions. To make sure every player has an equal chance of being selected, write each name on a piece of paper, place the names in a box and mix them up, then draw 10 names.
SIMPLE RANDOM SAMPLE A sample selected so that each item or person in the population has the same chance of being selected.
In an unbiased sample, all members of the population have a chance of being selected for the sample. If the population is relatively small, one could place all the items of interest, say names, in a box and select the sample. For a large population like the 750 players in this example, this is very time-consuming.
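For a population of this size, software can stand in for the box of names. The following is a minimal sketch in Python, assuming the 750 players are simply numbered 1 through 750; the helper name `draw_committee` and the seed value are illustrative and do not come from the text.

```python
import random

def draw_committee(population_size=750, committee_size=10, seed=None):
    """Return a simple random sample of player numbers, without replacement."""
    rng = random.Random(seed)  # a seed only makes the run repeatable
    players = range(1, population_size + 1)  # players numbered 1..750
    # random.sample gives every player the same chance of being selected
    return sorted(rng.sample(players, committee_size))

committee = draw_committee(seed=2016)
print(committee)
```

Because the draw is made without replacement, no player can appear on the committee twice.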
Using a Table of Random Numbers
Suppose the population of interest is the 750 Major League Baseball players on the active rosters of the 30 teams at the end of the 2016 season. A committee of 10 players is to be formed to study the issue of concussions. To make sure every player has an equal chance of being selected, use a table of random numbers.
Prepare a list of all the players and number them 1 through 750
Randomly pick a starting place in the random number table
Select 10 three-digit numbers between 1 and 750
Other ways to select a sample are to use a table of random numbers, like the one in Appendix B.4, or to use a statistical software program or Excel. This example shows how to select random numbers using a portion of a random number table. To choose a starting point, you could close your eyes and simply point at a number in the table. Another way is to randomly pick a column and a row. Suppose the time is 3:04; go to column three and down to row four. Here, the number is 03759. We use the first three digits of the five-digit number, so player 037 is selected. To continue selecting, you could move in any direction; in this example, we move right. The next number is 44723, so we choose player 447. The next two numbers begin with 961 and 784; both are higher than 750, so we skip them. The next number is 18910, so we select player 189, and so on until we have a list of 10 players. The instructions on how to use Excel to select a random sample are in Appendix C.
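The table procedure in the notes can be sketched in Python as follows. The function name `select_from_table` is hypothetical, and skipping repeated numbers is an assumption (the notes only mention skipping numbers that are too high). The entries are the ones quoted in the notes; 961 and 784 appear there only as three digits, so they are kept that way.

```python
def select_from_table(table_entries, population_size=750, sample_size=10):
    """Walk through random-number-table entries and pick player numbers."""
    selected = []
    for entry in table_entries:
        number = int(entry[:3])  # use the first three digits of each entry
        # Skip numbers outside 1..population_size, and (assumption) repeats
        if 1 <= number <= population_size and number not in selected:
            selected.append(number)
        if len(selected) == sample_size:
            break
    return selected

# Entries as quoted in the notes, read left to right from the starting point.
entries = ["03759", "44723", "961", "784", "18910"]
print(select_from_table(entries))  # [37, 447, 189]
```

With a full table, the loop would simply continue until 10 valid player numbers have been collected.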
Systematic Random Sampling
If you do not have a list of the entire population to begin with, you can use the systematic random sample
Example
Stood’s Grocery Store wants to study the length of time customers spend in their store
Randomly select the days of the week, the times, and the starting point of the study, then systematically select the customers and measure the time each spends in the store
SYSTEMATIC RANDOM SAMPLE A random starting point is selected, and then every kth member of the population is selected.
Caution: If the population is in some order already, like invoices arranged in increasing dollar amounts, the systematic procedure should not be used.
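As a rough illustration of the definition, the sketch below picks a random starting point among the first k items and then takes every kth item after it. The customer list, the value k = 10, and the function name `systematic_sample` are all assumptions made for the example.

```python
import random

def systematic_sample(items, k, seed=None):
    """Pick a random start among the first k items, then every kth item."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random starting index in 0..k-1
    return items[start::k]

customers = list(range(1, 101))  # hypothetical: 100 customers in arrival order
sample = systematic_sample(customers, k=10, seed=7)
print(sample)
```

If the list were sorted by the variable being studied, the choice of starting point would strongly influence the sample mean, which is the situation the caution above warns against.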
Stratified Random Sampling
When the population can be divided into groups based on some characteristic, use stratified random sampling
Example
A study of 50 of the 352 largest US firms’ ad spending
Begin by identifying the strata, then use random sampling within each group based on relative frequencies to collect the sample
STRATIFIED RANDOM SAMPLE A population is divided into subgroups, called strata, and a sample is randomly selected from each stratum.
This example of stratified random sampling groups the firms by profitability, measured by the percent return on equity. The advantage of stratified random sampling is that it may more accurately reflect the population characteristics: at least one firm is selected from each stratum, even strata with few firms, which might not happen with simple random sampling or systematic random sampling.
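The allocation step ("based on relative frequencies") can be sketched as below. The stratum labels and counts are assumed values for illustration only; they were chosen to sum to the 352 firms mentioned on the slide and are not reproduced from the source table. The function name `proportional_allocation` is also hypothetical.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Allocate the sample across strata in proportion to stratum size."""
    total = sum(strata_sizes.values())
    allocation = {}
    for name, size in strata_sizes.items():
        share = sample_size * size / total
        # At least one unit per stratum so that small strata are represented
        allocation[name] = max(1, round(share))
    return allocation

# Assumed stratum counts for illustration (they sum to 352 firms).
strata = {
    "30% and over": 8,
    "20% up to 30%": 35,
    "10% up to 20%": 189,
    "0% up to 10%": 115,
    "deficit": 5,
}
allocation = proportional_allocation(strata, sample_size=50)
print(allocation)
```

With these assumed counts the rounded allocations happen to sum to 50. In general, rounding can leave the total one or two units off, and an adjustment step would be needed before drawing a random sample within each stratum.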
Cluster Sampling
Cluster sampling is a common type of sampling, used to reduce the cost of sampling over large geographic areas
Example
Suppose we wish to sample residents of the 12 counties in the greater Chicago area about government policy. Randomly select 3 counties and then select a random sample of the residents in each of the 3 counties.
CLUSTER SAMPLING A population is divided into clusters using naturally occurring geographic or other boundaries. Then clusters are randomly selected and a sample is collected by randomly selecting from each cluster.
In this example, we wish to determine the views of residents in the greater Chicago area about state and federal environmental protection policies. Selecting a random sample of residents in the region and personally contacting each one would be time-consuming and expensive. Instead, let the counties serve as the primary units: select a random sample of the counties, and then select a random sample of the residents in each selected county. This is a combination of cluster sampling and simple random sampling.
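The two stages described here (pick clusters, then sample within each) might look like the following in Python. The county names, resident IDs, and per-county sample size are hypothetical placeholders, and `two_stage_cluster_sample` is an illustrative name.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly pick clusters, then randomly sample members within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: pick clusters
    # Stage 2: simple random sample of members inside each chosen cluster
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical data: 12 counties with 200 resident IDs each.
counties = {
    f"County {i}": [f"c{i}-r{j}" for j in range(200)] for i in range(1, 13)
}
sample = two_stage_cluster_sample(counties, n_clusters=3, n_per_cluster=5, seed=1)
print(sample)
```

Only the three selected counties need to be visited, which is where the cost saving comes from.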
Sampling Error
It is unlikely the mean of a sample will be exactly equal to the mean of the population
Example
The Foxtrot Inn tracks the number of rooms rented each day in June. The population mean number of rooms rented per day, \(\mu\), is 3.13
Taking three random samples of size 5, we find sample means, \(\bar{x}\), of 3.80, 3.40, and 1.80. The sampling error is the difference between each sample mean and the population mean
SAMPLING ERROR The difference between a sample statistic and its corresponding population parameter.
In this example, the population is the number of rooms rented on each of the 30 days in June. The sampling errors are 3.80 - 3.13 = 0.67; 3.40 - 3.13 = 0.27; and 1.80 - 3.13 = -1.33. Notice that the sampling error may be positive or negative. Using the combination formula, we find there are 142,506 possible samples of size 5 from a population of 30 values. If you summed the sampling errors for all 142,506 samples, the result would equal 0. This is because the sample mean is an unbiased estimator of the population mean.
We can expect there to be sampling error between sample standard deviations and the corresponding population standard deviation as well.
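The arithmetic in these notes is easy to check. The sketch below recomputes the three sampling errors and the number of possible samples of size 5, using only the figures given in the example; the variable names are illustrative.

```python
import math

population_mean = 3.13             # mean rooms rented per day in June
sample_means = [3.80, 3.40, 1.80]  # the three sample means from the example

# Sampling error = sample statistic - population parameter
errors = [round(x_bar - population_mean, 2) for x_bar in sample_means]
print(errors)            # [0.67, 0.27, -1.33]

# Number of distinct samples of size 5 from the 30 daily values
print(math.comb(30, 5))  # 142506
```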
Sampling Distribution of the Sample Mean
How can we determine how accurate the sample mean is?
For a given sample size, the mean of all possible sample means selected from a population is equal to the population mean
There is less variation in the distribution of the sample mean than in the population distribution
The sampling distribution of the sample mean tends to become bell-shaped
SAMPLING DISTRIBUTION OF THE SAMPLE MEAN A probability distribution of all possible sample means of a given sample size.
In the next few slides, we’ll use the sampling distribution of the sample mean to help explain how we can rely on sample estimates.
Sampling Distribution Example
Tartus Industries has seven production employees (the population). The hourly earnings of each employee are given in the table.
What is the population mean?
What is the sampling distribution of the sample mean for samples of size 2?
What is the mean of the sampling distribution?
What observations can be made about the population and the sampling distribution?
Here is an example using a small population of just 7 to highlight the relationship between the population mean and the various sample means.
Sampling Distribution Example (2 of 5)
Tartus Industries has seven production employees (the population). The hourly earnings of each employee are given in the table.
What is the population mean?
\( \mu = \dfrac{\sum x}{N} = \$15.43 \)
We learned how to calculate a population mean in chapter 3; it is designated with the Greek letter mu, \(\mu\).
Sampling Distribution Example (3 of 5)
What is the sampling distribution of the sample mean for samples of size 2?
This is Table 8-3 in the text and is a table of all samples of size 2 taken from the population of the 7 Tartus Industries production employees. There are 21 possible samples and the sample mean of hourly earnings has been calculated for each.
Sampling Distribution Example (4 of 5)
Tartus Industries has seven production employees (the population).
3. What is the mean of the sampling distribution?
This is Table 8-4 in the textbook and is a probability distribution called the sampling distribution of the sample mean for n=2.
Sampling Distribution Example (5 of 5)
4. What observations can be made about the population and the sampling distribution?
The mean of the distribution of the sample mean ($15.43) is equal to the mean of the population: \(\mu_{\bar{x}} = \mu\)
The spread in the distribution of the sample mean is less than the spread in the population values
The shapes of the population distribution and the sampling distribution are different
The mean of the population is exactly equal to the mean of the sample means. The sample means range from $14 to $17 while the population values range from $14 to $18. The shape of the sampling distribution of the sample mean and the shape of the frequency distribution of the population values are different; as we see in Chart 8-2, the distribution of the sample mean tends to be more bell-shaped and to approximate the normal distribution.
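These observations can be reproduced by listing every sample of size 2. The earnings table is not reproduced in these notes, so the seven values below are assumed; they were chosen to be consistent with the stated population mean of $15.43 and the $14 to $18 range, and should be replaced with the actual table values.

```python
from itertools import combinations
from statistics import mean, pstdev

# Assumed hourly earnings (the table is not reproduced here); chosen to match
# the stated population mean of $15.43 and the $14 to $18 range.
earnings = [14, 14, 16, 16, 14, 16, 18]

population_mean = mean(earnings)

# All samples of size 2 without replacement, and the mean of each
sample_means = [mean(pair) for pair in combinations(earnings, 2)]

print(len(sample_means))                        # number of possible samples
print(round(population_mean, 2))                # population mean
print(round(mean(sample_means), 2))             # mean of the sample means
print(min(sample_means), max(sample_means))     # range of the sample means
print(pstdev(sample_means) < pstdev(earnings))  # is the spread smaller?
```

Under these assumed values the enumeration gives 21 samples, a mean of sample means equal to the population mean, and sample means that stay within $14 to $17.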
Central Limit Theorem
If the population follows a normal probability distribution, then for any sample size the sampling distribution of the sample mean will also be normal
If the population distribution is symmetrical, you will see the normal shape of the distribution of the sample mean emerge with samples as small as 10
If the distribution is skewed or has thick tails, it may require samples of 30 or more to observe the normality feature
THE CENTRAL LIMIT THEOREM If samples of a particular size are selected from any population, the sampling distribution of the sample mean is approximately a normal distribution. The approximation improves with larger samples.
The Central Limit Theorem is one of the most useful conclusions in statistics. It holds for populations of any shape, provided the population has a finite variance. This allows us to use the normal probability distribution to create confidence intervals for the population mean (chapter 9) and perform tests of hypothesis (chapter 10).
Central Limit Theorem Results
This is Chart 8-3 from the textbook; at the top of the chart there are 4 different population distributions. Following each of these downward, we observe that the sample distributions appear more normal as sample size increases. In other words, we observe the convergence to a normal distribution regardless of the shape of the population distribution.
Central Limit Theorem Conclusions
The mean of the distribution of sample means will be exactly equal to the population mean if we select all possible samples of the same size from the population
\(\mu_{\bar{x}} = \mu\)
The standard deviation of the sampling distribution of the sample mean is also called the standard error of the mean
There will be less dispersion in the sampling distribution of the sample mean, \(\sigma_{\bar{x}} = \sigma / \sqrt{n}\), than in the population
Even if we do not select all samples, we can expect the mean of the distribution of the sample means to be close to the population mean. If the population standard deviation is sigma, the standard deviation of the distribution of sample means is sigma divided by the square root of n. Note that when the sample size is increased, the standard error of the mean decreases.
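A small simulation can make the sigma over root n relationship concrete. The sketch below is only an illustration: it assumes a skewed exponential population with mean 1 and standard deviation 1 (not a population from the text), and the function name, number of samples, and seed are arbitrary choices.

```python
import random
import statistics

def simulate_standard_error(n, num_samples=5000, seed=8):
    """Estimate the standard deviation of sample means by simulation."""
    rng = random.Random(seed)
    # Population assumed for illustration: exponential with mean 1 and sigma 1
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]
    return statistics.pstdev(means)

sigma = 1.0  # standard deviation of the assumed exponential population
for n in (5, 30, 100):
    simulated = simulate_standard_error(n)
    theoretical = sigma / n ** 0.5  # sigma divided by the square root of n
    print(n, round(simulated, 3), round(theoretical, 3))
```

The simulated and theoretical columns should track each other closely, and both shrink as the sample size grows.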
Normal Distribution
If the population follows a normal distribution, the sampling distribution of the sample mean will also follow the normal distribution for samples of any size
If the population is not normally distributed, the sampling distribution of the sample mean will approach a normal distribution when the sample size is at least 30
Assume the population standard deviation is known
To determine the probability that a sample mean falls in a particular region, use the following formula:
\( z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} \)
Use this formula when the population standard deviation is known or is assumed to be known.
Using the Sampling Distribution Example
The Quality Assurance Dept. for Cola, Inc. maintains records regarding the amount of cola in its jumbo bottle. The actual amount of cola in each bottle varies a small amount from one bottle to another. Records indicate the amounts of cola follow the normal distribution, the mean amount of cola in the bottles is 31.2 ounces, and the standard deviation is 0.4 ounces. At 8 a.m. today, the quality technician randomly selected 16 bottles from the filling line. The mean amount was 31.38 ounces. Is this an unlikely result? Is it likely the process is putting too much soda in the bottle? Is the sampling error of 0.18 ounce unusual?
\( z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \dfrac{31.38 - 31.20}{0.4 / \sqrt{16}} = 1.80 \)
We conclude that it is unlikely; there is less than a 4% chance. The process is putting too much soda in the bottles.
First we find z, then we use the table in Appendix B.3 or statistical software to determine probability. In this example, we find that it is unlikely, less than a 4% chance, we could select a sample of 16 observations from a normal population with a mean of 31.2 ounces and a population standard deviation of 0.4 ounce and find the sample mean equal to or greater than 31.38 ounces. We conclude the process is putting too much cola in the bottles. The quality technician should see the production supervisor about reducing the amount of soda in each bottle.
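The same calculation can be done with statistical software instead of the table in Appendix B.3. Here is a minimal sketch using Python's standard library; the variable names are illustrative, and the inputs are the figures given in the example.

```python
from math import sqrt
from statistics import NormalDist

mu = 31.2      # population mean, ounces
sigma = 0.4    # population standard deviation, ounces
n = 16         # bottles sampled
x_bar = 31.38  # observed sample mean, ounces

standard_error = sigma / sqrt(n)      # 0.4 / 4 = 0.10
z = (x_bar - mu) / standard_error     # about 1.80
upper_tail = 1 - NormalDist().cdf(z)  # P(sample mean >= 31.38)

print(round(z, 2), round(upper_tail, 4))  # 1.8 0.0359
```

The upper-tail probability of roughly 0.036 is the "less than a 4% chance" quoted in the notes.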
Chapter 8 Practice Problems
Question 1
The following is a list of 24 Marco’s Pizza stores in Lucas County. The stores are identified by numbering them 00 through 23. Also noted is whether the store is corporate owned (C) or manager owned (M). A sample of four locations is to be selected and inspected for customer convenience, safety, cleanliness, and other features.
LO8-1
Question 1 (continued)
The random numbers selected are 08, 18, 11, 54, 02, 41, and 54. Which stores are selected?
Using a random number table (Appendix B.4) or statistical software, select your own sample of locations.
Using systematic random sampling, every seventh location is selected starting with the third store in the list. Which locations will be included in the sample?
Using stratified random sampling, select three locations. Two should be corporate owned and one should be manager owned.
LO8-1
Question 7
A population consists of the following five values: 12, 12, 14, 15, and 20.
List all samples of size 3, and compute the mean of each sample.
Compute the mean of the distribution of sample means and the population mean. Compare the two values.
Compare the dispersion in the population with that of the sample means.
LO8-2
Question 11
Appendix B.4 is a table of random numbers that are uniformly distributed. Hence, each digit from 0 to 9 has the same likelihood of occurrence.
Draw a graph showing the population distribution of random numbers. What is the population mean?
Following are the first 10 rows of five digits from the table of random numbers in Appendix B.4. Assume that these are 10 random samples of five values each. Determine the mean of each sample and plot the means on a chart similar to Chart 8–4. Compare the mean of the sampling distribution of the sample mean with the population mean.
LO8-3
Question 15
A normal population has a mean of $60 and standard deviation of $12. You select random samples of nine.
Apply the central limit theorem to describe the sampling distribution of the sample mean with n = 9. With the small sample size, what condition is necessary to apply the central limit theorem?
What is the standard error of the sampling distribution of sample means?
What is the probability that a sample mean is greater than $63?
What is the probability that a sample mean is less than $56?
What is the probability that a sample mean is between $56 and $63?
What is the probability that the sampling error (\(\bar{x} - \mu\)) would be $9 or more? That is, what is the probability that the estimate of the population mean is less than $51 or more than $69?
LO8-5