MTH 245 Lesson 13 Notes
The Binomial Trial
An experiment with exactly two possible outcomes, each of which has a fixed probability of occurring, is referred to as a binomial trial (aka a Bernoulli trial). The outcomes of such an experiment may take any format— 0 or 1, yes or no, pass or fail, etc. Typically, one of the outcomes—the one of interest—is referred to as a "success" and the other as a "failure." The probability distribution of a single Bernoulli trial is
$x$          $P(x)$
"success"    $p$
"failure"    $1 - p$
The probability $p$ may be calculated as either an empirical or a theoretical probability, or its value may be assumed based on other information.
Example 1: Consider an experiment consisting of a single flip of a U.S. quarter. Suppose we know from experimentation that this particular quarter will turn up "tails" 494 times out of every 1,000 flips. If we view "tails" as a success, what is the probability distribution for this experiment?
Since the experiment has only two possible outcomes, and we can safely assume that the probability of "tails" won't change between replications, the experiment is a Bernoulli trial with the following distribution:
$x$        $P(x)$
"tails"    $494/1000 = 0.494$
"heads"    $1 - (494/1000) = 1 - 0.494 = 0.506$
Note that in the previous example, we assumed that the theoretical probability of success ("tails") matched the empirical probability we observed during our experimentation. This assumption may not be justified; in Lesson 18, we'll learn how to tell if it is.
The Binomial Probability Process
A process consisting of a sequence of binomial trials is a binomial process if all four of the following conditions hold:
1. There are a fixed, finite number of trials.
2. The trials must be identical; that is, the same outcome must be defined as a "success" for all trials in the process.
3. Each trial must be independent of all the others. Note: this assumption can be relaxed if the Five Percent Condition applies.
4. The probability of success must remain constant over all trials.
Example 2: Determine if the following processes are binomial.
a. A fair coin is flipped four times and the result of each flip is recorded. The outcome "heads" is considered a success. (Assume the method of flipping the coin does not influence the outcome of any one trial.)
Binomial. There are a total of four trials, success is defined as "heads" for each trial, the trials are independent, and the probability of success doesn't change from trial to trial.
b. Four cards are drawn with replacement from a standard 52-card poker deck, and the label (e.g., ace, 2, etc.) of each card is recorded. A trial is considered a success if a jack is drawn.
Not binomial. The trials are not binomial trials because each has more than two possible outcomes.
c. Four cards are drawn without replacement from a standard 52-card poker deck, and the color of each card is recorded.
Not binomial. There are only four trials, and they're binomial because each has only two possible outcomes (red or black). However, because the cards are drawn without replacement, the trials are dependent, and the probability of drawing red changes with each trial.
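To see why drawing without replacement breaks the constant-probability condition, here is a minimal Python sketch (Python is our choice for illustration; these notes otherwise rely on StatCrunch). It tracks the probability of drawing a red card on each of four draws, assuming for illustration that every earlier draw came up red.

```python
from fractions import Fraction

# Standard 52-card deck: 26 red cards and 26 black cards.
RED, TOTAL = 26, 52

# With replacement, the deck is restored before every draw,
# so P(red) is the same on all four trials.
p_with_replacement = [Fraction(RED, TOTAL) for _ in range(4)]

# Without replacement, assume (for illustration only) that every
# earlier draw was red: each draw removes one red card from the deck.
p_without_replacement = [Fraction(RED - i, TOTAL - i) for i in range(4)]

print([str(p) for p in p_with_replacement])     # ['1/2', '1/2', '1/2', '1/2']
print([str(p) for p in p_without_replacement])  # ['1/2', '25/51', '12/25', '23/49']
```

The second list changes from draw to draw, which is exactly what conditions 3 and 4 rule out.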
The Binomial Probability Formula
If $p$ is the probability of success for any one binomial trial and $x$ is the number of successes in $n$ trials, then the probability that there will be exactly $k$ successes in $n$ trials is
$$P(x = k) = C_k^n \cdot p^k \cdot (1 - p)^{n-k}$$
where $C_k^n$ is the number of possible sequences of $n$ trials that contain $k$ successes. A distribution with this probability function is referred to as a binomial distribution with $n$ trials and probability of success $p$.
To calculate the probability that $x$ will take on a range of values, evaluate the above formula for each value of $x$ in the range, then add the probabilities. For example, $P(x \le 2) = P(x = 0) + P(x = 1) + P(x = 2)$.
We will not calculate binomial probabilities by hand. StatCrunch contains a calculator that will evaluate the Binomial Formula for us.
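For readers who want to see the formula at work outside StatCrunch, the following is a minimal Python sketch, not part of the course software. The function names `binomial_pmf` and `binomial_cdf` are our own, and `math.comb` (Python 3.8 or later) supplies the count of sequences with $k$ successes.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(x = k): probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_cdf(k, n, p):
    """P(x <= k): add the probabilities for x = 0, 1, ..., k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

# A fair coin flipped four times: probability of at most two "heads".
print(round(binomial_cdf(2, 4, 0.5), 4))  # 0.6875
```

The second function simply adds the individual probabilities over the range, as described above.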
Shape of the Binomial Distribution
The shape of a binomial distribution is determined by $p$. The closer $p$ is to 0.5, the closer the distribution is to symmetric. As $p$ approaches 0, the distribution grows more and more right-skewed; as $p$ approaches 1, the opposite is true, and the distribution grows more and more left-skewed. The three graphs below illustrate this effect.
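The skew can also be seen in rough text histograms. The Python sketch below is illustrative only: the choice of $n = 10$, the values of $p$, and the helper name `text_histogram` are our own assumptions, not taken from the original graphs.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(x = k) for a binomial distribution with n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def text_histogram(n, p, width=40):
    """Return one line per value of x, with a bar proportional to P(x)."""
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    peak = max(probs)
    return [f"{k:2d} {'#' * round(width * pr / peak)}"
            for k, pr in enumerate(probs)]

# Small p: bars pile up on the left (right-skewed).
# p = 0.5: symmetric. Large p: bars pile up on the right (left-skewed).
for p in (0.1, 0.5, 0.9):
    print(f"n = 10, p = {p}")
    print("\n".join(text_histogram(10, p)))
```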
Calculating Binomial Probabilities Using StatCrunch
1. Select Stats > Calculators > Binomial.
2. Click "Standard" for one-sided inequalities or "Between" for two-sided.
3. Input $n$, $p$, and $x$.
4. Use the pull-down to adjust the operator as needed.
5. Click "Compute".
Example 3: An unfair coin is flipped four times in a row ($n = 4$) and the results recorded. Suppose we define "tails" as a success, and further suppose the probability of observing "tails" on any given flip is $p = 0.6$.
a. What is the probability that exactly one of the four flips results in "tails"?
$P(x = 1) = 0.154$
b. What is the probability that at most one of the four flips results in "tails"?
$P(x \le 1) = 0.179$
c. What is the probability that at least two of the four flips result in "tails"?
$P(x \ge 2) = 0.821$
d. What is the probability that the four flips will result in between one and three "tails"?
$P(1 \le x \le 3) = 0.845$
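The four answers in Example 3 can be cross-checked with a short Python sketch. This is an independent check under our own naming (`prob_between` is a hypothetical helper), not a description of how StatCrunch computes its results.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(x = k) for a binomial distribution with n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_between(lo, hi, n, p):
    """P(lo <= x <= hi): add P(x = k) for every k from lo to hi."""
    return sum(binomial_pmf(k, n, p) for k in range(lo, hi + 1))

n, p = 4, 0.6
print(round(prob_between(1, 1, n, p), 3))  # a. P(x = 1)       -> 0.154
print(round(prob_between(0, 1, n, p), 3))  # b. P(x <= 1)      -> 0.179
print(round(prob_between(2, n, n, p), 3))  # c. P(x >= 2)      -> 0.821
print(round(prob_between(1, 3, n, p), 3))  # d. P(1 <= x <= 3) -> 0.845
```

Note that parts b and c are complements, so their probabilities add to 1.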
Parameters of the Binomial Probability Distribution
In contrast to the cumbersome process we learned in Lesson 12, the parameters of the binomial probability distribution are easy to calculate:
$\mu = n \cdot p$
$\sigma^2 = n \cdot p \cdot (1 - p)$
$\sigma = \sqrt{\sigma^2} = \sqrt{n \cdot p \cdot (1 - p)}$
Example 4: A coin is flipped four times in a row and results recorded. Define "tails" as a "success."
a. Assume the coin is fair; that is, the probability of observing "tails" on any given flip is $p = 0.5$. Calculate $\mu$, $\sigma^2$, and $\sigma$.
$\mu = 4 \cdot 0.5 = 2.0$
$\sigma^2 = 4 \cdot 0.5 \cdot (1 - 0.5) = 1.0$
$\sigma = \sqrt{4 \cdot 0.5 \cdot (1 - 0.5)} = \sqrt{1.0} = 1.0$
b. Calculate $\mu$, $\sigma^2$, and $\sigma$ assuming the coin is biased toward "tails." Use $p = 0.7$.
$\mu = 4 \cdot 0.7 = 2.8$
$\sigma^2 = 4 \cdot 0.7 \cdot (1 - 0.7) = 0.84 \approx 0.8$
$\sigma = \sqrt{4 \cdot 0.7 \cdot (1 - 0.7)} = \sqrt{0.84} \approx 0.9$
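The three parameter formulas translate directly into code. Here is a minimal Python sketch that applies them to both parts of Example 4; the function name `binomial_parameters` is our own.

```python
from math import sqrt

def binomial_parameters(n, p):
    """Return (mean, variance, standard deviation) of a binomial distribution."""
    mean = n * p
    variance = n * p * (1 - p)
    return mean, variance, sqrt(variance)

# Example 4: four flips, first with a fair coin, then with p = 0.7.
for p in (0.5, 0.7):
    mean, variance, sd = binomial_parameters(4, p)
    print(f"p = {p}: mean = {mean:.1f}, variance = {variance:.2f}, sd = {sd:.2f}")
```

Printing two decimal places for the variance and standard deviation shows the values before they are rounded to one decimal place.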
Statistical Significance and the Binomial Distribution
To determine which values of a binomial random variable are significant, we use the probability method from Lesson 12.
Example 5: Suppose a fair coin ($p = 0.5$) is flipped eight times ($n = 8$) and the number of "tails" ($x$) is recorded. Which values of $x$ are significant?
Note that the mean of this distribution is $\mu = 8 \cdot 0.5 = 4.0$.
Significantly low values of $x$:
$x = 0$: $P(x \le 0) = 0.004 \le 0.025$, so $x = 0$ is significantly low.
$x = 1$: $P(x \le 1) = 0.035 > 0.025$, so $x = 1$ is not significant.
$x \in \{2, 3\}$: Since $x = 1$ is not significant, $x = 2$ and $x = 3$, which are closer to $\mu$, are also not significant.
Significantly high values of $x$:
$x = 8$: $P(x \ge 8) = 0.004 \le 0.025$, so $x = 8$ is significantly high.
$x = 7$: $P(x \ge 7) = 0.035 > 0.025$, so $x = 7$ is not significant.
$x \in \{5, 6\}$: Since $x = 7$ is not significant, $x = 5$ and $x = 6$, which are closer to $\mu$, are also not significant.
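The value-by-value check in Example 5 can be automated. The Python sketch below applies the 0.025 cutoff to every possible value of $x$; the function name `significant_values` is our own.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(x = k) for a binomial distribution with n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def significant_values(n, p, cutoff=0.025):
    """Return (low, high): the values of x that are significantly low or high."""
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    # k is significantly low when P(x <= k) is at most the cutoff.
    low = [k for k in range(n + 1) if sum(pmf[:k + 1]) <= cutoff]
    # k is significantly high when P(x >= k) is at most the cutoff.
    high = [k for k in range(n + 1) if sum(pmf[k:]) <= cutoff]
    return low, high

low, high = significant_values(8, 0.5)
print(low, high)  # [0] [8]
```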
Example 6: According to the U.S. Department of Labor, workers at American companies who are subject to workplace drug testing fail those tests (i.e., test positive) at a rate of 4%. Suppose a company's screening policy is to select 14 employees at random for each round of screening.
a. What are the mean and standard deviation of the number of positives during a screening?
$n = 14$, $p = 0.04$
$\mu = 14 \cdot 0.04 \approx 0.6$
$\sigma = \sqrt{14 \cdot 0.04 \cdot (1 - 0.04)} \approx 0.7$
b. Suppose that during the most recent screening, three employees tested positive. Is that a significantly high number of positives?
For a binomial distribution with $n = 14$ and $p = 0.04$, $P(x \ge 3) = 0.017 \le 0.025$, so three is a significantly high number of positives.
c. Suppose that during the next screening, none of the employees test positive. Is that a significantly low number of positives?
$P(x \le 0) = 0.565 > 0.025$, so zero positives is not a significantly low result.
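All three parts of Example 6 can be reproduced in Python. This is an illustrative cross-check under the same assumptions as the example ($n = 14$, $p = 0.04$, cutoff 0.025), with variable names of our own choosing.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """P(x = k) for a binomial distribution with n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p, cutoff = 14, 0.04, 0.025

# Part a: mean and standard deviation.
mean = n * p
sd = sqrt(n * p * (1 - p))
print(round(mean, 2), round(sd, 2))                      # 0.56 0.73

# Part b: is three positives significantly high?
p_at_least_3 = sum(binomial_pmf(k, n, p) for k in range(3, n + 1))
print(round(p_at_least_3, 3), p_at_least_3 <= cutoff)    # 0.017 True

# Part c: is zero positives significantly low?
p_at_most_0 = binomial_pmf(0, n, p)
print(round(p_at_most_0, 3), p_at_most_0 <= cutoff)      # 0.565 False
```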
Example 7: In a clinical trial of a certain cholesterol drug, 6 out of 94 subjects in the treatment group developed headaches as a side effect. If the manufacturer claims that the incidence of headaches is 4 percent or less, is 6 a significantly high value?
Suppose we define our experiment to be the selection of a patient from among the 94 in the sample, and the outcomes of that experiment to be whether they developed headaches ("success") or not ("failure").
Also, assume that the probability any one patient develops headaches is 0.04, per the manufacturer's claim (we'll accept this claim as fact for now, but we'll demonstrate how to test if it's reasonable in Lesson 19).
Then $x$, the number of patients out of 94 who will develop headaches, has a binomial distribution with $n = 94$ and $p = 0.04$.
Using the StatCrunch binomial calculator, we find that for this particular binomial distribution, $P(x \ge 6) = 0.175$. Since $P(x \ge 6) > 0.025$, six is not a significantly high number of patients with headaches.
What would be a significantly high number? (Hint: calculate $P(x \ge k)$ for increasing values of $k$ to find the smallest value whose probability is at most 0.025.)
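The search suggested in the hint can also be carried out in Python rather than by repeated StatCrunch queries. The sketch below is one way to do it; the helper names are our own, and it relies on the fact that $P(x \ge k)$ can only shrink as $k$ grows, so the first value that meets the cutoff is the smallest one.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(x = k) for a binomial distribution with n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(k, n, p):
    """P(x >= k): add P(x = i) for i = k, ..., n."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

def smallest_significantly_high(n, p, cutoff=0.025):
    """Return the smallest k with P(x >= k) <= cutoff, or None if none exists."""
    for k in range(n + 1):
        if prob_at_least(k, n, p) <= cutoff:
            return k
    return None

n, p = 94, 0.04
print(round(prob_at_least(6, n, p), 3))    # 0.175, matching the value above
print(smallest_significantly_high(n, p))   # run to check your answer
```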