MTH 245 Lesson 17 Notes: Sampling Distributions

Sample statistics (such as $\bar{x}$) are themselves random variables. The reason is that no two samples of the same size $n$ from a given population will contain exactly the same set of data values, except by (extremely rare) coincidence. Because the samples are chosen randomly, their statistics will generally differ due to random variation.

Since sample statistics are random variables, they have their own probability distributions, which are called sampling distributions. A sampling distribution defines the probabilities associated with values of a statistic for all possible samples of size $n$ drawn from the same population.

Theoretically, a sampling distribution for a particular statistic can be constructed by listing all possible samples of size $n$ from the population, calculating the value of the statistic for each sample, and then constructing a grouped frequency histogram of the values. This only works in practice for populations whose size $N$ is relatively small. Therefore, statisticians use a set of standard sampling distributions. One of the more commonly used of these is the normal distribution, which we will use in a number of cases in Lessons 18-31.
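The construction described above can be sketched in a few lines of Python for a very small population. The population values and the sample size here are made up purely for illustration; they do not come from the lesson.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Hypothetical small population (N = 5); the values are made up for illustration.
population = [2, 4, 6, 8, 10]
n = 2  # sample size

# List every possible sample of size n (drawn without replacement).
samples = list(combinations(population, n))

# Compute the statistic (here, the sample mean) for each sample.
sample_means = [mean(s) for s in samples]

# Tabulate the sampling distribution: each distinct mean and its probability.
counts = Counter(sample_means)
sampling_distribution = {m: c / len(samples) for m, c in sorted(counts.items())}

for m, prob in sampling_distribution.items():
    print(f"sample mean = {m}: probability = {prob:.1f}")

# The mean of all the sample means equals the population mean.
print("mean of sample means:", mean(sample_means))
print("population mean:     ", mean(population))
```

With only 5 values there are just 10 samples to list. The number of possible samples grows very quickly with $N$, which is why this approach is impractical for large populations.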

The Sampling Distribution of the Sample Proportion ($\hat{p}$)

The population proportion $p$ is the proportion of "favorable" observations (or "successes") associated with a binomial random variable $x$. Note: this is the same $p$ (the probability of observing a "success" in an individual binomial trial) that we were introduced to in Chapter 5.

Up to this point, we have accepted the value of $p$ as a given. However, since it is itself a population parameter, it usually needs to be estimated using a sample proportion $\hat{p}$. The sample statistic $\hat{p}$ is the proportion of "successes" in a given sample of $n$ binomial trials.

The true sampling distribution of $\hat{p}$ is binomial (strictly, the number of successes $n\hat{p}$ follows a binomial distribution). However, it can be shown that when $n\hat{p} > 5$ and $n(1 - \hat{p}) > 5$, the sampling distribution of $\hat{p}$ can be closely approximated by a normal distribution. We won't explore the math behind this approximation here but will make use of it in later chapters.
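As a rough numerical check of this approximation, the sketch below compares an exact binomial probability with its normal approximation. The values $n = 50$ and $p = 0.3$ are hypothetical. The standard deviation $\sqrt{p(1-p)/n}$ and the continuity correction are standard results that are not stated in this lesson, so treat them as assumptions of the sketch.

```python
from math import comb, sqrt
from statistics import NormalDist

# Hypothetical values: n*p = 15 and n*(1 - p) = 35, both greater than 5.
n, p = 50, 0.3

# Exact: P(p_hat <= 0.36) is P(number of successes <= 18) under the binomial model.
k = 18
exact = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Normal approximation: p_hat is roughly Normal(p, sqrt(p(1 - p)/n)).
# Evaluating at (k + 0.5)/n applies the usual continuity correction.
approx = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n)).cdf((k + 0.5) / n)

print(f"exact binomial probability: {exact:.4f}")
print(f"normal approximation:       {approx:.4f}")
```

The two printed values should agree closely, which is what the condition on $n\hat{p}$ and $n(1 - \hat{p})$ is meant to ensure.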

The Sampling Distribution of the Sample Mean ($\bar{x}$)

Consider a random variable $x$ whose probability distribution has mean $\mu$ and standard deviation $\sigma$.

The Central Limit Theorem (CLT) states that for random samples of $x$ of the same size $n$, with $n \geq 30$, the sampling distribution of $\bar{x}$ can be approximated by a normal distribution with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, regardless of the probability distribution of $x$.

The value $\sigma_{\bar{x}}$ is called the standard error of $\bar{x}$ to distinguish it from $\sigma$, the standard deviation of the original random variable $x$.

Note that if $x$ is normal to begin with, then $\bar{x}$ is normally distributed regardless of sample size.
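The mean and standard error given by the Central Limit Theorem can be checked with a small simulation. This is only a sketch: the exponential population, the sample size, the number of samples, and the seed are all assumptions chosen for illustration, not part of the lesson.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population: exponential with rate 1, a strongly skewed distribution
# whose mean and standard deviation are both 1.
mu, sigma = 1.0, 1.0
n = 50                # sample size (at least 30)
num_samples = 20_000  # number of simulated samples

# Draw many samples of size n and record the mean of each one.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

print(f"mean of sample means:      {mean(sample_means):.3f} (theory: {mu:.3f})")
print(f"std. dev. of sample means: {stdev(sample_means):.3f} "
      f"(theory: {sigma / sqrt(n):.3f})")
```

Plotting a histogram of `sample_means` would show the roughly bell-shaped curve, even though the population itself is far from normal.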

Example 1: Complete the Sampling Distribution activity (located in the Canvas module for this lesson). After 100,000 simulated samples of size $n = 50$, what is the shape of the distribution of the $\bar{x}$ values?

Applying the Central Limit Theorem

If the original random variable $x$ is normal, or if it's not normal and $n \geq 30$, then we can use the methods presented in Lesson 15 to calculate probabilities associated with $\bar{x}$, as well as determine which values of $\bar{x}$ (if any) are significant.

If $x$ is not normal and $n < 30$, this normal approximation doesn't work, and we need to use an alternative method (such as bootstrapping, introduced in Lesson 23).

Example 2: For a certain brand of gumdrops, $x$, the weight of an individual candy, has a non-normal distribution with a mean of 0.41 oz and a standard deviation of 0.05 oz. Suppose the gumdrops are sold in bulk packages containing 200 candies each.

a. What is the distribution of $\bar{x}$, the mean weight per candy in a bag of 200 candies?

Since $n \geq 30$, we can invoke the Central Limit Theorem and assume $\bar{x}$ has an approximately normal distribution.

b. What are the mean and standard error of that distribution?

The mean is $\mu_{\bar{x}} = \mu = 0.41$ oz and the standard error is $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 0.05/\sqrt{200} \approx 0.0035$ oz, which rounds to 0.004 oz.

c. What is the probability that a bag of 200 candies will have a mean weight per candy of at most 0.40 oz?

Using the StatCrunch normal distribution calculator with $\mu = 0.41$ and $\sigma_{\bar{x}} = 0.004$, we get $P(\bar{x} \leq 0.40) = 0.006$.

We use $\bar{x}$ instead of $x$ because we're calculating a probability related to the mean weight of 200 candies rather than the weight of an individual candy.

Note: StatCrunch's normal distribution calculator does not distinguish between standard deviation and standard error, so we need to insert $\sigma_{\bar{x}} = 0.004$ in the "Std. Dev." field when solving Central Limit Theorem problems.

d. Assuming significance level $\alpha = 0.05$, is 0.40 oz a significantly low value of $\bar{x}$?

Using the probability method for determining significant values of a normal distribution (Lesson 15), since $P(\bar{x} \leq 0.40) = 0.006 < 0.025$ (that is, less than $\alpha/2$), 0.40 oz is a significantly low value of $\bar{x}$.
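The calculations in Example 2 can be reproduced outside StatCrunch. The sketch below uses Python's standard-library `statistics.NormalDist` as a stand-in for the StatCrunch calculator (an assumption of this sketch, not the tool used in the course). It also shows how much the answer depends on rounding the standard error to 0.004.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 0.41, 0.05, 200  # values from Example 2
alpha = 0.05

se = sigma / sqrt(n)       # unrounded standard error, about 0.0035
se_rounded = round(se, 3)  # 0.004, the rounded standard error

# P(x-bar <= 0.40) with the rounded and the unrounded standard error.
p_rounded = NormalDist(mu, se_rounded).cdf(0.40)
p_unrounded = NormalDist(mu, se).cdf(0.40)

print(f"standard error:                     {se:.4f}")
print(f"P(x-bar <= 0.40), SE rounded:       {p_rounded:.3f}")
print(f"P(x-bar <= 0.40), SE not rounded:   {p_unrounded:.3f}")
print("significantly low at alpha = 0.05:", p_unrounded < alpha / 2)
```

Either way the probability falls below 0.025, so the conclusion in part d does not depend on the rounding.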

Example 3: Suppose an elevator is rated with a capacity of 16 passengers and a maximum load of 2,500 lb, for a maximum weight per person of 156.25 lb. The random variable $x$, the weight of an individual adult, is distributed as shown in the table below. In a worst-case scenario, an elevator could be loaded with 16 men. If so, would the resulting load exceed rated capacity?

              Males      Females
$\mu$         182.9 lb   165.0 lb
$\sigma$      40.8 lb    45.6 lb
Distribution  Normal     Normal

a. Find the probability that a single randomly selected adult male has a weight greater than 156.25 lb.

To calculate this probability, we need to use the distribution of individual weights. Plugging the parameter values from the "Males" column of the table above into the StatCrunch normal distribution calculator gives $P(x \geq 156.25) = 0.743$.

b. Find the probability that a random sample of 16 adult males has a mean weight greater than 156.25 lb. What does the result suggest about the posted maximum capacity of 16?

Here we need the probability that the mean weight of a random sample of 16 adult males exceeds 156.25 lb. This requires us to find the standard error: $\sigma_{\bar{x}} = 40.8/\sqrt{16} = 10.2$. Inputting this value and $\mu = 182.9$ into the normal distribution calculator, we get $P(\bar{x} \geq 156.25) = 0.996$. Since the elevator is very likely to exceed its maximum load if it carries 16 men, the passenger capacity needs to be reduced.

c. Suppose the maximum capacity is reduced to 10 passengers. (Assume the maximum load remains at 2,500 lb.) How likely is the elevator to fail?

If the passenger capacity decreases to 10 and the load remains at 2,500 lb, the maximum weight per person is now 2500/10 = 250 lb/person. The standard error is now $\sigma_{\bar{x}} = 40.8/\sqrt{10} = 12.9$, and the probability of failure is now $P(\bar{x} \geq 250) \approx 0.000$. The elevator is now unlikely to fail.
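The three parts of Example 3 differ only in the weight limit per person and the sample size, so they can be checked with one small helper. This is a sketch that uses Python's `statistics.NormalDist` in place of StatCrunch; the function name `prob_mean_exceeds` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 182.9, 40.8  # adult male weights in lb, from the table
max_load = 2500          # rated maximum load in lb


def prob_mean_exceeds(limit, n):
    """Return P(x-bar >= limit) for samples of size n (n = 1 is one person)."""
    standard_error = sigma / sqrt(n)
    return 1 - NormalDist(mu, standard_error).cdf(limit)


p_a = prob_mean_exceeds(max_load / 16, 1)   # part a: one man over 156.25 lb
p_b = prob_mean_exceeds(max_load / 16, 16)  # part b: mean of 16 men over 156.25 lb
p_c = prob_mean_exceeds(max_load / 10, 10)  # part c: mean of 10 men over 250 lb

print(f"a. P(x >= 156.25)            = {p_a:.3f}")
print(f"b. P(x-bar >= 156.25), n=16  = {p_b:.3f}")
print(f"c. P(x-bar >= 250), n=10     = {p_c:.3f}")
```

Note how the standard error shrinks from 40.8 to 10.2 between parts a and b, which is what pushes the probability from about 0.74 to nearly 1.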

Example 4: A school needs to purchase 25 desks for a kindergarten classroom. These desks must accommodate the sitting heights of five-year-old kindergarten students, the distribution of which is described in the table below.

              Boys      Girls
$\mu$         61.8 cm   61.2 cm
$\sigma$      2.9 cm    3.1 cm
Distribution  Normal    Normal

a. What sitting height will accommodate 95% of individual boys?

We need to find the 95th percentile of the distribution of individual sitting heights. For that, we simply use the parameter values in the above table, which yields a 95th percentile of 66.6 cm.

b. What is the 95th percentile of the mean sitting heights of random samples of 25 boys?

To answer this part of the problem, we need the 95th percentile of the distribution of average sitting heights for random samples of 25 boys. This requires us to find the standard error: $\sigma_{\bar{x}} = 2.9/\sqrt{25} = 0.58$. Inputting this value and $\mu = 61.8$ into the normal distribution calculator, we get a 95th percentile for samples of 25 boys of 62.8 cm.

c. What percentage of the population of kindergarten boys would be able to fit in desks designed with the sitting height from Part a but would not fit in desks designed with the sitting height from Part b?

The boys who would fit into the bigger desks but not the smaller ones have sitting heights of between 62.8 and 66.6 cm. To find the percentage, we need to calculate $P(62.8 \leq x \leq 66.6)$ using the distribution of individual sitting heights ($\mu = 61.8$, $\sigma = 2.9$). Using the normal distribution calculator, we get $P(62.8 \leq x \leq 66.6) = 0.316 = 31.6\%$.
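The percentile calculations in Example 4 can be sketched with the inverse CDF of the normal distribution. Python's `statistics.NormalDist.inv_cdf` is used here as a stand-in for the StatCrunch calculator, and the cutoffs are rounded to one decimal place before part c so that the result lines up with the worked answer.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 61.8, 2.9, 25  # sitting heights of boys in cm, from the table

individual = NormalDist(mu, sigma)             # distribution of x
sample_mean = NormalDist(mu, sigma / sqrt(n))  # distribution of x-bar (SE = 0.58)

# Parts a and b: 95th percentiles, rounded to one decimal place.
height_a = round(individual.inv_cdf(0.95), 1)
height_b = round(sample_mean.inv_cdf(0.95), 1)

# Part c: share of individual boys between the two cutoffs.
share = individual.cdf(height_a) - individual.cdf(height_b)

print(f"a. 95th percentile, individual boys:  {height_a} cm")
print(f"b. 95th percentile, means of 25 boys: {height_b} cm")
print(f"c. P({height_b} <= x <= {height_a}) = {share:.3f}")
```

Part c uses the distribution of individual heights for both cutoffs, because the question is about individual boys, not sample means.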

Example 5: A certain drink container has a labeled weight of 12 fluid ounces. The manufacturer claims that $x$, the weight of an individual container, has mean $\mu = 12.00$ and standard deviation $\sigma = 0.11$. Suppose a random sample of $n = 36$ containers is weighed, and the mean weight is $\bar{x} = 12.19$ fl oz.

a. Assuming the manufacturer's claim is true, what is the probability that a random sample of 36 containers has a mean weight of 12.19 fl oz or greater?

The standard error is $\sigma_{\bar{x}} = 0.11/\sqrt{36} = 0.018$. Inputting this value and $\mu = 12.00$ into the normal distribution calculator, we get $P(\bar{x} \geq 12.19) \approx 0.000$.

b. Assuming the manufacturer's claim is true, is a mean weight of 12.19 fl oz or greater a significantly high value of $\bar{x}$? (Assume $\alpha = 0.01$.)

Since $P(\bar{x} \geq 12.19) < 0.005$, 12.19 is a significantly high value. In fact, its z-score is 10.56 (you should be able to verify this using the formula in Lesson 14).

c. Is it likely that a sample of 36 containers would have a mean weight of 12.19 fl oz due to random chance? What does this result suggest about the manufacturer's claim that $\mu = 12.00$?

If the manufacturer's claim is true, it is extremely unlikely that a sample of 36 containers would have a mean weight of 12.19 fl oz simply by random chance. This suggests that the manufacturer's claim is probably false (and the true value of $\mu$ is likely greater than 12 fl oz).
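The z-score quoted in part b can be verified with a short calculation. The sketch below assumes the z-score formula is $z = (\bar{x} - \mu)/\sigma_{\bar{x}}$, and it uses Python's `statistics.NormalDist` in place of StatCrunch. It also shows that the value 10.56 comes from using the rounded standard error 0.018.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 12.00, 0.11, 36  # manufacturer's claim and sample size
x_bar = 12.19                   # observed sample mean in fl oz
alpha = 0.01

se = sigma / sqrt(n)                     # unrounded standard error, about 0.0183
z = (x_bar - mu) / se                    # z-score with the unrounded standard error
z_rounded = (x_bar - mu) / round(se, 3)  # z-score with SE rounded to 0.018

# Right-tail probability P(x-bar >= 12.19) under the manufacturer's claim.
p_value = 1 - NormalDist(mu, se).cdf(x_bar)

print(f"z with unrounded SE: {z:.2f}")
print(f"z with SE = 0.018:   {z_rounded:.2f}")
print("significantly high at alpha = 0.01:", p_value < alpha / 2)
```

Both versions of the z-score are above 10, so the conclusion is the same whichever standard error is used.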