Provide Reflection Of This Week That Contains 1 Paragraph, Which Address At Least One Or Two Of The Following Topics
Chapter 5
The Role of Probability
Learning Objectives (1 of 3)
Define the terms “equally likely” and “at random”
Compute and interpret unconditional and conditional probabilities
Evaluate and interpret independence of events
Explain the key features of the binomial distribution model
Learning Objectives (2 of 3)
Calculate probabilities using the binomial formula
Explain the key features of the normal distribution model
Calculate probabilities using the standard normal distribution table
Compute and interpret percentiles of the normal distribution
Learning Objectives (3 of 3)
Define and interpret the standard error
Explain sampling variability
Apply and interpret the results of the Central Limit Theorem
Two Areas of Biostatistics
Goal: Statistical Inference
Descriptive Statistics
Sampling from a Population
Sampling: Population Size = N, Sample Size = n (1 of 2)
Simple random sample
Enumerate all N members of the population (the sampling frame), then select n individuals at random so that each has the same probability of being selected.
Systematic sample
Start with the sampling frame; determine the sampling interval (N/n); select the first person at random from the first N/n members, then select every (N/n)th person thereafter.
Stratified sample
Organize population into mutually exclusive strata; select individuals at random within each stratum.
Convenience sample
Non-probability sample (not for inference)
Quota sample
Select a predetermined number of individuals into sample from groups of interest.
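The probability-based schemes above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production sampler: the function names, the toy sampling frame of 100 IDs, and the two strata are all hypothetical, and the systematic sampler assumes n divides N evenly.

```python
import random

def simple_random_sample(frame, n, rng):
    """Select n members so that each has the same chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Random start within the first N/n members, then every (N/n)th member."""
    interval = len(frame) // n        # sampling interval N/n (assumes n divides N)
    start = rng.randrange(interval)   # random position within the first interval
    return frame[start::interval][:n]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of fixed size within each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

rng = random.Random(0)                # seeded only so the sketch is repeatable
frame = list(range(1, 101))           # hypothetical sampling frame, N = 100
srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
strata = {"low": frame[:50], "high": frame[50:]}   # hypothetical strata
stratified = stratified_sample(strata, 5, rng)
```

Convenience and quota samples are left out because they do not rely on random selection.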
Sampling: Population Size = N, Sample Size = n (2 of 2)
Basics
Probability reflects the likelihood that an outcome will occur.
0 ≤ Probability ≤ 1
P(Select any child) = 1/5290 = 0.0002
Example 5.1. Basic Probability (1 of 2)
Example 5.1. Basic Probability (2 of 2)
P(Select a boy) = 2560/5290 = 0.484
P(Select boy age 10) = 418/5290 = 0.079
P(Select child at least 8 years of age)
= (846 + 881 + 918)/5290
= 2645/5290 = 0.500
Conditional Probability
Probability of outcome in a specific subpopulation
Example 5.1.
P(Select 9-year-old from among girls) = P(Select 9-year-old | girl)
= 461/2730 = 0.169
P(Select boy | 6 years of age)
= 379/892=0.425
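A conditional probability can be computed either by restricting the denominator to the subpopulation or from the definition P(A | B) = P(A and B) / P(B). The sketch below checks that both routes agree for the 9-year-old girls in Example 5.1; the variable names are illustrative only.

```python
# Counts quoted in Example 5.1 (5,290 children in total).
total_children = 5290
girls = 2730
girls_age_9 = 461

# Direct form: restrict attention to the subpopulation of girls.
p_age9_given_girl = girls_age_9 / girls

# Definition form: P(A | B) = P(A and B) / P(B).
p_age9_and_girl = girls_age_9 / total_children
p_girl = girls / total_children
p_from_definition = p_age9_and_girl / p_girl
```

Both values round to 0.169, matching the slide.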
Example 5.2. Conditional Probability (1 of 2)
Example 5.2. Conditional Probability (2 of 2)
P(Prostate cancer | Low PSA)
= 3/64 = 0.047
P(Prostate cancer | Moderate PSA)
= 13/41 = 0.317
P(Prostate cancer | High PSA)
= 12/15 = 0.80
Sensitivity and Specificity
Sensitivity = True positive fraction
= P(test+ | disease)
Specificity = True negative fraction
= P(test– | disease free)
False negative fraction = P(test– | disease)
False positive fraction = P(test+ | disease free)
Example 5.4. Sensitivity and Specificity
Sensitivity = P(test+ | disease) = 9/10 = 0.90
Specificity = P(test– | disease free)
= 4449/4800 = 0.927
False negative fraction = P(test– | disease)
= 1/10 = 0.10
False positive fraction = P(test+ | disease free)
= 351/4800 = 0.073
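The four fractions in Example 5.4 all come from the same two-by-two table of counts. A minimal sketch, assuming a hypothetical helper named `screening_metrics` that takes the true positive, false negative, false positive, and true negative counts:

```python
def screening_metrics(tp, fn, fp, tn):
    """Return the four test-performance fractions from 2x2 table counts."""
    diseased = tp + fn          # everyone who has the disease
    disease_free = fp + tn      # everyone who does not
    return {
        "sensitivity": tp / diseased,
        "specificity": tn / disease_free,
        "false_negative_fraction": fn / diseased,
        "false_positive_fraction": fp / disease_free,
    }

# Counts implied by Example 5.4: 9 of 10 diseased test positive,
# 4449 of 4800 disease-free test negative.
metrics = screening_metrics(tp=9, fn=1, fp=351, tn=4449)
```

Sensitivity and the false negative fraction share a denominator and sum to 1, as do specificity and the false positive fraction.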
Independence
Two events, A and B, are independent if P(A | B) = P(A) or if P(B | A) = P(B)
Example 5.2.
Is the screening test independent of prostate cancer diagnosis?
P(Prostate cancer) = 28/120 = 0.233
P(Prostate cancer | Low PSA) = 0.047
P(Prostate cancer | Moderate PSA) = 0.317
P(Prostate cancer | High PSA) = 0.80
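Independence requires each conditional probability to equal the unconditional one. The sketch below recomputes both from the counts in Example 5.2 (3 of 64, 13 of 41, and 12 of 15 men with cancer at low, moderate, and high PSA). The 0.01 tolerance is an arbitrary assumption for the illustration, not a statistical test.

```python
# (men with prostate cancer, men in PSA group), from Example 5.2
psa = {"low": (3, 64), "moderate": (13, 41), "high": (12, 15)}

total_cancer = sum(cancer for cancer, _ in psa.values())
total_men = sum(size for _, size in psa.values())
p_cancer = total_cancer / total_men                 # unconditional probability

conditional = {level: cancer / size for level, (cancer, size) in psa.items()}

# Independent only if every P(cancer | PSA level) matches P(cancer).
independent = all(abs(p - p_cancer) < 0.01 for p in conditional.values())
```

The unconditional probability is 28/120, about 0.233, while the conditional probabilities range from 0.047 to 0.80, so the screening result and the diagnosis are not independent.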
Bayes’ Theorem (1 of 2)
Using Bayes’ Theorem we revise or update a probability based on additional information.
Prior probability is an initial probability.
Posterior probability is a probability that is revised or updated based on additional information.
Bayes’ Theorem (2 of 2)
Example (1 of 2)
In Boston, 51% of adults are male.
One adult is randomly selected to participate in a study.
Prior probability of selecting a male = 0.51
Example (2 of 2)
Selected participant is a smoker.
9.5% of males in Boston smoke as compared to 1.7% of females.
Find the probability that the selected participant is male, given that the participant is a smoker.
Example: Find P(M | S)
P(M) = 0.51, P(M') = 0.49
P(S | M) = 0.095, P(S | M') = 0.017
Bayes’ Theorem
Knowing that the participant smokes increases the probability that the participant is male: P(M | S) > P(M).
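The Boston example can be worked through with Bayes' theorem in its two-group form, P(M | S) = P(S | M)P(M) / [P(S | M)P(M) + P(S | M')P(M')]. The function name `bayes_posterior` and its parameter names are assumptions made for this sketch.

```python
def bayes_posterior(prior, likelihood, likelihood_complement):
    """P(A | B) from P(A), P(B | A) and P(B | A') via Bayes' theorem."""
    evidence = likelihood * prior + likelihood_complement * (1 - prior)  # P(B)
    return likelihood * prior / evidence

# P(M) = 0.51, P(S | M) = 0.095, P(S | M') = 0.017
p_male_given_smoker = bayes_posterior(
    prior=0.51, likelihood=0.095, likelihood_complement=0.017
)
```

The posterior is roughly 0.853, well above the prior of 0.51.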
Example 5.8. Bayes’ Theorem (1 of 3)
What is P(disease | test+)?
P(disease) = 0.002
Sensitivity = 0.85 = P(test+ | disease)
P(test+) = 0.08 and P(test–) = 0.92
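Because P(test+) is given directly here, Bayes' theorem reduces to P(disease | test+) = P(test+ | disease) P(disease) / P(test+). A minimal sketch, with a hypothetical helper name:

```python
def posterior_from_marginal(prior, likelihood, marginal):
    """Bayes' theorem when P(B) is known: P(A | B) = P(B | A) P(A) / P(B)."""
    return likelihood * prior / marginal

# P(disease) = 0.002, sensitivity = 0.85, P(test+) = 0.08
p_disease_given_positive = posterior_from_marginal(
    prior=0.002, likelihood=0.85, marginal=0.08
)
```

The result is 0.85 × 0.002 / 0.08 = 0.02125, so even after a positive test the probability of disease is only about 2%, because the disease is rare.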
Example 5.8. Bayes’ Theorem (2 of 3)
Example 5.8. Bayes’ Theorem (3 of 3)
Model for discrete outcome
Process or experiment has two possible outcomes: success and failure.
Replications of process are independent.
P(success) is constant for each replication.
Binomial Distribution (1 of 2)
Binomial Distribution (2 of 2)
Notation
n = number of times process is replicated
p = P(success)
x = number of successes of interest
0 ≤ x ≤ n
Example 5.9. Binomial Distribution
Medication for allergies is effective in reducing symptoms in 80% of patients. If medication is given to 10 patients, what is the probability it is effective in 7?
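Success = Medication is effective: n = 10, p = 0.8, x = 7
P(7 successes) = [10!/(7!3!)](0.8)⁷(0.2)³ = 120(0.2097)(0.008) = 0.2013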
Antibiotic is claimed to be effective in 70% of the patients. If antibiotic is given to five patients, what is the probability it is effective on exactly three?
Success = Antibiotic is effective: n = 5, p = 0.7, x = 3
P(3 successes) = [5!/(3!2!)](0.7)³(0.3)² = 10(0.343)(0.09) = 0.3087
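Both calculations apply the binomial formula P(x successes) = [n!/(x!(n – x)!)] pˣ(1 – p)ⁿ⁻ˣ. A minimal sketch using `math.comb` from the standard library; the function name `binomial_pmf` is an assumption for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent replications."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

p_allergy = binomial_pmf(7, 10, 0.8)     # allergy medication, Example 5.9
p_antibiotic = binomial_pmf(3, 5, 0.7)   # antibiotic effective on exactly three
```

Rounded to four decimals these give 0.2013 and 0.3087, matching the hand calculations.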
Binomial Distribution (1 of 4)
What is the probability that the antibiotic is effective on all five?
Binomial Distribution (2 of 4)
What is the probability that the antibiotic is effective on at least three?
P(X ≥ 3) = P(3) + P(4) + P(5)
= 0.3087 + 0.3601 + 0.1681 = 0.8369
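Cumulative questions such as "all five" or "at least three" are sums of individual binomial probabilities. The sketch below repeats a hypothetical `binomial_pmf` helper so that it runs on its own, and also checks the sum against the complement rule.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent replications."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 5, 0.7
p_all_five = binomial_pmf(5, n, p)                                   # 0.7 ** 5
p_at_least_three = sum(binomial_pmf(x, n, p) for x in range(3, n + 1))
p_complement = 1 - sum(binomial_pmf(x, n, p) for x in range(0, 3))   # 1 - P(X <= 2)
```

The probability for all five is about 0.1681, and both routes give about 0.8369 for at least three.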
Binomial Distribution (3 of 4)
Binomial Distribution (4 of 4)
Mean and variance of the binomial distribution
μ = np
σ² = np(1 – p)
For example, the mean (or expected) number of patients in whom the antibiotic is effective is 5 × 0.7 = 3.5
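The closed-form mean and variance can be checked against the definitions E[X] = Σ x·P(x) and Var(X) = Σ (x – E[X])²·P(x). This is a sketch for the antibiotic example (n = 5, p = 0.7); the helper and variable names are illustrative.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent replications."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 5, 0.7
mean_formula = n * p                      # np
variance_formula = n * p * (1 - p)        # np(1 - p)

# Same quantities computed directly from the probability distribution.
mean_from_pmf = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
variance_from_pmf = sum((x - mean_from_pmf) ** 2 * binomial_pmf(x, n, p)
                        for x in range(n + 1))
sd = sqrt(variance_formula)
```

Both approaches give a mean of 3.5 and a variance of 1.05, so the standard deviation is a little over 1 patient.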
Model for continuous outcome
Mean = median = mode
Normal Distribution (1 of 3)
Notation: μ = mean and σ = standard deviation
[Figure: normal curve with horizontal axis marked at μ – 3σ, μ – 2σ, μ – σ, μ, μ + σ, μ + 2σ, μ + 3σ]
Normal Distribution (2 of 3)
Normal Distribution (3 of 3)
Properties of normal distribution
i) The normal distribution is symmetric about the mean (i.e., P(X > μ) = P(X < μ) = 0.5).
ii) The mean and variance, μ and σ², completely characterize the normal distribution.
iii) The mean = the median = the mode.
iv) Approximately 68%, 95%, and 99.7% of values fall within one, two, and three standard deviations of the mean:
P(μ – σ < X < μ + σ) ≈ 0.68
P(μ – 2σ < X < μ + 2σ) ≈ 0.95
P(μ – 3σ < X < μ + 3σ) ≈ 0.997
v) P(a < X < b) = the area under the normal curve from a to b.
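The areas within one, two, and three standard deviations of the mean can be verified numerically. A minimal sketch using `statistics.NormalDist` from the Python standard library (3.8 or later); the helper name `prob_within` is an assumption.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma < X < mu + k*sigma) for a normal random variable X."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

within = {k: prob_within(k) for k in (1, 2, 3)}
```

The values are about 0.683, 0.954, and 0.997, and they do not depend on the particular mean and standard deviation chosen.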
Body mass index (BMI) for men age 60 is normally distributed with a mean of 29 and standard deviation of 6.
What is the probability that a male has BMI less than 29?
Example 5.11. Normal Distribution (1 of 10)
Example 5.11. Normal Distribution (2 of 10)
[Figure: normal curve for BMI with axis ticks at 11, 17, 23, 29, 35, 41, 47; the areas below and above 29 are each 0.5]
P(X < 29) = 0.5
Example 5.11. Normal Distribution (3 of 10)
Body mass index (BMI) for men age 60 is normally distributed with a mean of 29 and standard deviation of 6.
What is the probability that a male has BMI less than 35?
Example 5.11. Normal Distribution (4 of 10)
[Figure: normal curve for BMI with axis ticks at 11, 17, 23, 29, 35, 41, 47; area below 35 shaded]
P(X < 35) = ?
Example 5.11. Normal Distribution (5 of 10)
Example 5.11. Normal Distribution (6 of 10)
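[Figure: normal curve for BMI with axis ticks at 11, 17, 23, 29, 35, 41, 47; area 0.5 below 29 and area 0.34 between 29 and 35]
A BMI of 35 is one standard deviation above the mean, and about 68% of values lie within one standard deviation of the mean, so about 34% lie between 29 and 35.
P(X < 35) = 0.5 + 0.34 = 0.84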
Standard Normal Distribution Z
Normal distribution with μ = 0 and σ = 1
Z = (X – μ)/σ
[Figure: Z axis from –3 to 3 aligned with the BMI axis from 11 to 47]
P(X < 35) = P(Z < (35 – 29)/6) = P(Z < 1) = ?
Example 5.11. Normal Distribution (7 of 10)
P(X < 35) = P(Z < 1).
Using Table 1, P(Z < 1.00) = 0.8413
Table 1. Probabilities of Z
Table entries represent P(Z < Zi)
| Zi | .00 | .01 | .02 | .03 | .04 | … |
| 0.0 | 0.5000 | 0.5040 | 0.5080 | 0.5120 | 0.5160 | … |
| 0.1 | 0.5398 | 0.5438 | 0.5478 | 0.5517 | 0.5557 | … |
| … | … | … | … | … | … | … |
| 1.0 | 0.8413 | 0.8438 | 0.8461 | 0.8485 | 0.8508 | … |
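The table lookup corresponds to evaluating the standard normal cumulative distribution function at Z = (X – μ)/σ. A minimal sketch with `statistics.NormalDist`, shown for the BMI example (μ = 29, σ = 6, X = 35); the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 29, 6
x = 35

z = (x - mu) / sigma                        # standardize: Z = (X - mu) / sigma
p_from_z = NormalDist().cdf(z)              # P(Z < z) on the standard normal
p_direct = NormalDist(mu, sigma).cdf(x)     # same probability without standardizing
```

Both routes give about 0.8413, the Table 1 entry for Z = 1.00.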
Example 5.11. Normal Distribution (8 of 10)
What is the probability that a male has BMI less than 30?
[Figure: normal curve for BMI with axis ticks at 11, 17, 23, 29, 35, 41, 47; area below 30 shaded]
P(X < 30) = ?
Example 5.11. Normal Distribution (9 of 10)
Example 5.11. Normal Distribution (10 of 10)
Z = (30 – 29)/6 = 0.17
P(X < 30) = P(Z < 0.17) = 0.5675
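Table lookups require rounding Z to two decimals, which shifts the answer slightly. The sketch below compares the table-style calculation with the unrounded one for P(X < 30); the variable names are assumptions.

```python
from statistics import NormalDist

mu, sigma = 29, 6
z_exact = (30 - mu) / sigma            # 0.1666...
z_table = round(z_exact, 2)            # 0.17, the value looked up in Table 1

p_table = NormalDist().cdf(z_table)    # table-style answer
p_exact = NormalDist().cdf(z_exact)    # answer without rounding Z
```

The table-style answer is about 0.5675, while the unrounded Z gives about 0.566. The difference comes only from rounding Z before the lookup.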
Percentiles of the Normal Distribution
The kth percentile is defined as the score that holds k percent of the scores below it (e.g., the 90th percentile is the score that holds 90% of the scores below it).
Q1 = 25th percentile
Median = 50th percentile
Q3 = 75th percentile
Percentiles (1 of 2)
For the normal distribution, the following is used to compute percentiles:
X = μ + Zσ
where
μ = mean of the random variable X,
σ = standard deviation, and
Z = value from the standard normal distribution for the desired percentile (See Table 1A, next slide).
Percentiles of the standard normal distribution
(Table 1A)
| Percentile | Z |
| 1st | –2.326 |
| 2.5th | –1.960 |
| 5th | –1.645 |
| 10th | –1.282 |
| 50th | 0 |
| 90th | 1.282 |
| 95th | 1.645 |
| 97.5th | 1.960 |
| 99th | 2.326 |
Percentiles (2 of 2)
Example 5.12. Percentiles of the Normal Distribution
BMI in men follows a normal distribution with μ = 29, σ = 6; BMI in women follows a normal distribution with μ = 28, σ = 7.
The 90th percentile of BMI for men
X = 29 + 1.282(6) = 36.69.
The 90th percentile of BMI for women
X = 28 + 1.282(7) = 36.97.
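Percentiles can also be obtained from the inverse of the normal cumulative distribution function instead of a Z table. A minimal sketch using `NormalDist.inv_cdf`; the helper name `normal_percentile` is an assumption.

```python
from statistics import NormalDist

def normal_percentile(k, mu, sigma):
    """Value below which k percent of a normal(mu, sigma) population falls."""
    z = NormalDist().inv_cdf(k / 100)   # Z for the desired percentile
    return mu + z * sigma               # X = mu + Z * sigma

z_90 = NormalDist().inv_cdf(0.90)
bmi_90_men = normal_percentile(90, mu=29, sigma=6)
bmi_90_women = normal_percentile(90, mu=28, sigma=7)
```

The computed Z rounds to 1.282, and the two percentiles round to 36.69 and 36.97, in line with the hand calculations.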
Central Limit Theorem
Suppose we have a population with known mean μ and standard deviation σ. If we take simple random samples of size n with replacement, then for large n, the sampling distribution of the sample means is approximately normal with mean μ and standard deviation σ/√n (the standard error).
Application
Non-normal population
Take samples of size n, where n is sufficiently large (usually n ≥ 30 suffices).
The distribution of the sample mean is then approximately normal, so Z can be used to compute probabilities.
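A small simulation can illustrate the idea for a clearly non-normal population. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (a choice made only for this illustration), draws 5,000 samples of size 30, and compares the spread of the sample means with σ/√n.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)            # seeded only so the sketch is repeatable
n, replications = 30, 5000

# Each replication: draw n values from the skewed population, keep the mean.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(replications)
]

mean_of_means = mean(sample_means)     # should be close to mu = 1
sd_of_means = stdev(sample_means)      # should be close to sigma / sqrt(n)
expected_se = 1.0 / n ** 0.5
```

With these settings the mean of the sample means lands near 1 and their standard deviation near 0.18, which is what σ/√n predicts for n = 30.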
HDL cholesterol has a mean of 54 and standard deviation of 17 in patients over 50. A physician has 40 patients over age 50 and wants to know the probability that their mean cholesterol is above 60.
Example 5.18. Central Limit Theorem (1 of 2)
Example 5.18. Central Limit Theorem (2 of 2)
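The HDL question can be worked through by standardizing the sample mean with the standard error σ/√n. This is a sketch of that calculation, assuming the Central Limit Theorem applies to the 40 patients; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 54, 17, 40
threshold = 60

standard_error = sigma / sqrt(n)              # sigma / sqrt(n)
z = (threshold - mu) / standard_error         # Z for a sample mean of 60
p_mean_above_60 = 1 - NormalDist().cdf(z)     # P(sample mean > 60)
```

The standard error is about 2.69 and Z is about 2.23, so the probability that the mean HDL of the 40 patients exceeds 60 is roughly 0.013.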
Example
Suppose we wish to estimate the mean of a population (μ) whose standard deviation is known and equal to 12, and a simple random sample of 100 individuals is selected from the population.
Find the probability that the sample mean is no more than 2 units from the population mean.
Sampling Distribution of Sample Mean
Central Limit Theorem
P(μ – 2 < X̄ < μ + 2) = ?
Z = {(μ – 2) – μ}/(12/√100) = –2/1.2 = –1.67
Z = {(μ + 2) – μ}/(12/√100) = 2/1.2 = 1.67
Then: P(–1.67 < Z < 1.67) = 0.9525 – 0.0475 = 0.905
The probability that the sample mean is no more than 2 units from the population mean is 0.905, or 90.5%.
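The same calculation can be reproduced numerically. The sketch below rounds Z to two decimals to mirror the table lookup and also reports the unrounded result; the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, margin = 12, 100, 2

standard_error = sigma / sqrt(n)          # 12 / 10 = 1.2
z_exact = margin / standard_error         # 1.666...
z_table = round(z_exact, 2)               # 1.67, as used with the Z table

std_normal = NormalDist()
p_table = std_normal.cdf(z_table) - std_normal.cdf(-z_table)
p_exact = std_normal.cdf(z_exact) - std_normal.cdf(-z_exact)
```

With Z rounded to 1.67 the probability is about 0.905. Without rounding it is slightly lower, about 0.904.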