Biostatistics_CH05_Probability-2.pptx

Chapter 5

The Role of Probability

Learning Objectives (1 of 3)

Define the terms “equally likely” and “at random”

Compute and interpret unconditional and conditional probabilities

Evaluate and interpret independence of events

Explain the key features of the binomial distribution model

Learning Objectives (2 of 3)

Calculate probabilities using the binomial formula

Explain the key features of the normal distribution model

Calculate probabilities using the standard normal distribution table

Compute and interpret percentiles of the normal distribution

Learning Objectives (3 of 3)

Define and interpret the standard error

Explain sampling variability

Apply and interpret the results of the Central Limit Theorem

Two Areas of Biostatistics

Goal: Statistical Inference

Descriptive Statistics

Sampling from a Population

Sampling: Population Size = N, Sample Size = n (1 of 2)

Simple random sample

Enumerate all members of population N (sampling frame), select n individuals at random (each has same probability of being selected).

Systematic sample

Start with the sampling frame; determine the sampling interval (N/n); select the first person at random from the first N/n members, then select every (N/n)th person thereafter.

Stratified sample

Organize population into mutually exclusive strata; select individuals at random within each stratum.

Convenience sample

Non-probability sample (not for inference)

Quota sample

Select a predetermined number of individuals into sample from groups of interest.
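The three probability sampling schemes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame of 100 ID numbers, the two strata, and all function names are hypothetical, and the systematic sampler assumes N is a multiple of n.

```python
import random

def simple_random_sample(frame, n, rng):
    """Select n members so that each has the same chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Pick a random start in the first interval, then every k-th member."""
    k = len(frame) // n              # sampling interval N/n (integer part)
    start = rng.randrange(k)         # random position within the first interval
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of fixed size within each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

rng = random.Random(42)              # fixed seed so the sketch is repeatable
frame = list(range(1, 101))          # hypothetical sampling frame, N = 100
srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)
strata = {"low": frame[:50], "high": frame[50:]}   # hypothetical strata
strat = stratified_sample(strata, 5, rng)
```

Convenience and quota samples are left out on purpose: they are not driven by a random mechanism, so there is no selection probability to simulate.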

Sampling: Population Size = N, Sample Size = n (2 of 2)

Basics

Probability reflects the likelihood that an outcome will occur.

0 ≤ Probability ≤ 1

P(Select any child) = 1/5290 = 0.0002

Example 5.1. Basic Probability (1 of 2)

Example 5.1. Basic Probability (2 of 2)

P(Select a boy) = 2560/5290 = 0.484

P(Select boy age 10) = 418/5290 = 0.079

P(Select child at least 8 years of age)

= (846 + 881 + 918)/5290

= 2645/5290 = 0.500

Conditional Probability

Probability of outcome in a specific subpopulation

Example 5.1.

P(Select 9-year-old from among girls) = P(Select 9-year-old | girl)

= 461/2730 = 0.169

P(Select boy | 6 years of age)

= 379/892 = 0.425

Example 5.2. Conditional Probability (1 of 2)

Example 5.2. Conditional Probability (2 of 2)

P(Prostate cancer | Low PSA)

= 3/64 = 0.047

P(Prostate cancer | Moderate PSA)

= 13/41 = 0.317

P(Prostate cancer | High PSA)

= 12/15 = 0.80

Sensitivity and Specificity

Sensitivity = True positive fraction

= P(test+ | disease)

Specificity = True negative fraction

= P(test– | disease free)

False negative fraction = P(test– | disease)

False positive fraction = P(test+ | disease free)

Example 5.4. Sensitivity and Specificity

Sensitivity and Specificity

Sensitivity = P(test+ | disease) = 9/10 = 0.90

Specificity = P(test– | disease free)

= 4449/4800 = 0.927

False negative fraction = P(test– | disease)

= 1/10 = 0.10

False positive fraction = P(test+ | disease free)

= 351/4800 = 0.073
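As a rough check of Example 5.4, the four fractions can be computed from the cell counts of a 2x2 screening table. The counts below are the ones implied by the fractions on the slide (9 and 1 among the 10 with disease, 351 and 4449 among the 4800 disease free); the function name and dictionary keys are illustrative choices.

```python
def screening_measures(tp, fn, fp, tn):
    """Return the four conditional fractions from a 2x2 screening table."""
    diseased = tp + fn
    disease_free = fp + tn
    return {
        "sensitivity": tp / diseased,          # P(test+ | disease)
        "specificity": tn / disease_free,      # P(test- | disease free)
        "false_negative": fn / diseased,       # P(test- | disease)
        "false_positive": fp / disease_free,   # P(test+ | disease free)
    }

# Counts implied by the fractions in Example 5.4
measures = screening_measures(tp=9, fn=1, fp=351, tn=4449)
```

Sensitivity and the false negative fraction share a denominator and therefore sum to 1; the same holds for specificity and the false positive fraction.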

Independence

Two events, A and B, are independent if P(A | B) = P(A) or if P(B | A) = P(B)

Example 5.2.

Is the screening test independent of the prostate cancer diagnosis?

P(Prostate cancer) = 28/120 = 0.233

P(Prostate cancer | Low PSA) = 0.047

P(Prostate cancer | Moderate PSA) = 0.317

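P(Prostate cancer | High PSA) = 0.80

The conditional probabilities differ from the unconditional probability, so the screening test result and the prostate cancer diagnosis are not independent.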
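The independence check for Example 5.2 can be reproduced from the counts quoted in the slides (3 of 64, 13 of 41, and 12 of 15). The sketch below is a minimal illustration; the variable names are arbitrary.

```python
# (men with prostate cancer, total men) at each PSA level, from Example 5.2
counts = {"low": (3, 64), "moderate": (13, 41), "high": (12, 15)}

total_cancer = sum(c for c, _ in counts.values())
total_men = sum(n for _, n in counts.values())
p_cancer = total_cancer / total_men      # unconditional probability, 28/120

conditional = {level: c / n for level, (c, n) in counts.items()}

# Independence would require every conditional probability to equal p_cancer.
independent = all(abs(p - p_cancer) < 1e-9 for p in conditional.values())
```

Here `independent` evaluates to `False`, because the probability of prostate cancer changes markedly with the PSA level.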

Bayes’ Theorem (1 of 2)

Using Bayes’ Theorem we revise or update a probability based on additional information.

Prior probability is an initial probability.

Posterior probability is a probability that is revised or updated based on additional information.

Bayes’ Theorem (2 of 2)

Example (1 of 2)

In Boston, 51% of adults are male.

One adult is randomly selected to participate in a study.

Prior probability of selecting a male = 0.51

Example (2 of 2)

Selected participant is a smoker.

9.5% of males in Boston smoke as compared to 1.7% of females.

Find the probability that the selected participant is male, given that the participant is a smoker.

Example: Find P(M | S)

P(M) = 0.51, P(M') = 0.49

P(S | M) = 0.095, P(S | M') = 0.017

Bayes’ Theorem

Knowing that the participant smokes increases the probability that the participant is male above the prior probability P(M) = 0.51.
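The Boston example can be worked numerically with the two-event form of Bayes' Theorem, P(M | S) = P(S | M)P(M) / [P(S | M)P(M) + P(S | M')P(M')]. The helper below is a sketch; its name and parameter names are illustrative and do not come from the slides.

```python
def posterior(prior, likelihood, likelihood_complement):
    """Bayes' Theorem for an event A and its complement A'.

    prior                 = P(A)
    likelihood            = P(B | A)
    likelihood_complement = P(B | A')
    returns               P(A | B)
    """
    numerator = likelihood * prior
    denominator = numerator + likelihood_complement * (1 - prior)
    return numerator / denominator

# P(M) = 0.51, P(S | M) = 0.095, P(S | M') = 0.017
p_male_given_smoker = posterior(prior=0.51, likelihood=0.095,
                                likelihood_complement=0.017)
```

The posterior probability comes out near 0.853, well above the prior of 0.51.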

Example 5.8. Bayes’ Theorem

P(disease) = 0.002

Sensitivity = 0.85 = P(test+ | disease)

P(test+) = 0.08 and P(test–) = 0.92

What is P(disease | test+)?
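Because P(test+) is given directly in Example 5.8, Bayes' Theorem reduces to a single ratio, P(disease | test+) = P(test+ | disease)P(disease) / P(test+). A minimal sketch with the values quoted above (variable names are illustrative):

```python
p_disease = 0.002                 # prior P(disease)
sensitivity = 0.85                # P(test+ | disease)
p_test_pos = 0.08                 # P(test+)

# Bayes' Theorem: P(disease | test+) = P(test+ | disease) * P(disease) / P(test+)
p_disease_given_pos = sensitivity * p_disease / p_test_pos
```

The result is about 0.021: a positive test raises the probability of disease roughly tenfold over the prior of 0.002, yet most people who test positive are still disease free.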

Model for discrete outcome

Process or experiment has two possible outcomes: success and failure.

Replications of process are independent.

P(success) is constant for each replication.

Binomial Distribution (1 of 2)

Binomial Distribution (2 of 2)

Notation

n = number of times process is replicated

p = P(success)

x = number of successes of interest

0 ≤ x ≤ n

Example 5.9. Binomial Distribution

Medication for allergies is effective in reducing symptoms in 80% of patients. If medication is given to 10 patients, what is the probability it is effective in 7?

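Success = medication is effective: n = 10, p = 0.8, x = 7

P(7 successes) = [10!/(7!·3!)](0.8)⁷(0.2)³ = 120(0.2097)(0.008) = 0.2013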

Antibiotic is claimed to be effective in 70% of the patients. If antibiotic is given to five patients, what is the probability it is effective on exactly three?

Success = Antibiotic is effective: n = 5, p = 0.7, x = 3

P(3 successes) = [5!/(3!·2!)](0.7)³(0.3)² = 10(0.343)(0.09) = 0.3087

Binomial Distribution (1 of 4)

What is the probability that the antibiotic is effective on all five?

P(5 successes) = (0.7)⁵ = 0.1681

Binomial Distribution (2 of 4)

What is the probability that the antibiotic is effective on at least three?

P(X ≥ 3) = P(3) + P(4) + P(5)

= 0.3087 + 0.3601 + 0.1681 = 0.8369
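The binomial calculations in these examples all follow the formula P(x successes) = [n!/(x!(n – x)!)] pˣ(1 – p)ⁿ⁻ˣ. The sketch below reproduces them with Python's standard library; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent replications)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Example 5.9: allergy medication, n = 10, p = 0.8, x = 7
p_allergy = binomial_pmf(7, 10, 0.8)

# Antibiotic example: n = 5, p = 0.7
p_exactly_3 = binomial_pmf(3, 5, 0.7)
p_all_5 = binomial_pmf(5, 5, 0.7)
p_at_least_3 = sum(binomial_pmf(x, 5, 0.7) for x in range(3, 6))
```

An "at least" probability is the sum of the individual probabilities from the threshold up to n, which is what the last line computes.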

Binomial Distribution (3 of 4)

Binomial Distribution (4 of 4)

Mean and variance of the binomial distribution

μ = np

σ² = np(1 – p)

For example, the mean (or expected) number of patients in whom the antibiotic is effective is 5 × 0.7 = 3.5.
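One way to see where np and np(1 – p) come from is to compute the mean and variance directly from the binomial probabilities and compare them with the shortcut formulas. The sketch below does this for the antibiotic example (n = 5, p = 0.7); variable names are illustrative.

```python
from math import comb

n, p = 5, 0.7                                  # antibiotic example

# Probability of each possible number of successes, x = 0, 1, ..., n
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

# Mean and variance from the definition (probability-weighted sums)
mean_from_pmf = sum(x * px for x, px in enumerate(pmf))
var_from_pmf = sum((x - mean_from_pmf)**2 * px for x, px in enumerate(pmf))

# Shortcut formulas for the binomial distribution
mean_formula = n * p
var_formula = n * p * (1 - p)
```

Both routes give a mean of 3.5 and a variance of 1.05, up to floating-point rounding.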

Model for continuous outcome

Mean = median = mode

Normal Distribution (1 of 3)

Notation: μ = mean and σ = standard deviation

[Figure: normal curve with the horizontal axis marked at μ – 3σ, μ – 2σ, μ – σ, μ, μ + σ, μ + 2σ, μ + 3σ]

Normal Distribution (2 of 3)

Normal Distribution (3 of 3)

Properties of normal distribution

i) The normal distribution is symmetric about the mean (i.e., P(X > μ) = P(X < μ) = 0.5).

ii) The mean and variance, μ and σ², completely characterize the normal distribution.

iii) The mean = the median = the mode.

P(μ – σ < X < μ + σ) ≈ 0.68

P(μ – 2σ < X < μ + 2σ) ≈ 0.95

P(μ – 3σ < X < μ + 3σ) ≈ 0.997

iv) P(a < X < b) = the area under the normal curve from a to b.
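The areas within one, two, and three standard deviations of the mean can be checked with `statistics.NormalDist` from the Python standard library (available from Python 3.8). Because the areas do not depend on the particular mean and standard deviation, the sketch uses the standard normal distribution.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
```

The areas come out near 0.683, 0.954, and 0.997.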

Body mass index (BMI) for men age 60 is normally distributed with a mean of 29 and standard deviation of 6.

What is the probability that a male has BMI less than 29?

Example 5.11. Normal Distribution (1 of 10)

Example 5.11. Normal Distribution (2 of 10)

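[Figure: normal curve for BMI with the horizontal axis marked at 11, 17, 23, 29, 35, 41, 47; the area on each side of the mean is 0.5]

P(X < 29) = 0.5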

Example 5.11. Normal Distribution (3 of 10)

Body mass index (BMI) for men age 60 is normally distributed with a mean of 29 and standard deviation of 6.

What is the probability that a male has BMI less than 35?

Example 5.11. Normal Distribution (4 of 10)

[Figure: normal curve for BMI with the horizontal axis marked at 11, 17, 23, 29, 35, 41, 47]

P(X < 35) = ?

Example 5.11. Normal Distribution (5 of 10)

Example 5.11. Normal Distribution (6 of 10)

[Figure: normal curve for BMI with area 0.5 below 29 and area 0.34 between 29 and 35]

P(X < 35) = 0.5 + 0.34 = 0.84

Standard Normal Distribution Z

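Normal distribution with μ = 0 and σ = 1

[Figure: standard normal curve with the Z axis marked from –3 to 3, aligned with the BMI axis marked at 11, 17, 23, 29, 35, 41, 47]

P(X < 35) = P(Z < (35 – 29)/6) = P(Z < 1) = ?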

Example 5.11. Normal Distribution (7 of 10)

P(X < 35) = P(Z < 1).

Using Table 1, P(Z < 1.00) = 0.8413

Table 1. Probabilities of Z

Table entries represent P(Z < Zi)

Zi    .00     .01     .02     .03     .04     …
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  …
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  …
…
1.0   0.8413  0.8438  0.8461  0.8485  0.8508  …

Example 5.11. Normal Distribution (8 of 10)

What is the probability that a male has BMI less than 30?

[Figure: normal curve for BMI with the horizontal axis marked at 11, 17, 23, 29, 35, 41, 47]

P(X < 30) = ?

Example 5.11. Normal Distribution (9 of 10)

Example 5.11. Normal Distribution (10 of 10)

P(X < 30) = P(Z < (30 – 29)/6) = P(Z < 0.17) = 0.5675
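The three BMI probabilities in Example 5.11 can be cross-checked in software instead of Table 1. The sketch below uses `statistics.NormalDist` from the Python standard library; variable names are illustrative. For BMI below 30, the table lookup rounds Z to 0.17 before reading the probability, so the unrounded software answer differs slightly in the third decimal place.

```python
from statistics import NormalDist

bmi = NormalDist(mu=29, sigma=6)       # BMI for men age 60

p_below_29 = bmi.cdf(29)               # exactly half the area lies below the mean
p_below_35 = bmi.cdf(35)               # same as P(Z < 1)
p_below_30 = bmi.cdf(30)               # uses the unrounded Z = 1/6

# Mimic the table lookup: round Z to two decimals first
z_30 = round((30 - 29) / 6, 2)
p_below_30_table = NormalDist().cdf(z_30)
```

With rounding, the result matches the table value of 0.5675; without it, the probability is about 0.566.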

Percentiles of the Normal Distribution

The kth percentile is defined as the score that holds k percent of the scores below it (e.g., the 90th percentile is the score that holds 90% of the scores below it).

Q1 = 25th percentile

Median = 50th percentile

Q3 = 75th percentile

Percentiles (1 of 2)

For the normal distribution, the following is used to compute percentiles:

X = μ + Zσ

where

μ = mean of the random variable X,

σ = standard deviation, and

Z = value from the standard normal distribution for the desired percentile (See Table 1A, next slide).

Percentiles of the standard normal distribution

(Table 1A)

Percentile Z
1st –2.326
2.5th –1.960
5th –1.645
10th –1.282
50th 0
90th 1.282
95th 1.645
97.5th 1.960
99th 2.326

Percentiles (2 of 2)

Example 5.12. Percentiles of the Normal Distribution

BMI in men follows a normal distribution with μ = 29, σ = 6; BMI in women follows a normal distribution with μ = 28, σ = 7.

The 90th percentile of BMI for men

X = 29 + 1.282(6) = 36.69.

The 90th percentile of BMI for women

X = 28 + 1.282(7) = 36.97.
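The percentile calculations in Example 5.12 can also be obtained from the inverse of the normal cumulative distribution function, which plays the role of Table 1A. A minimal sketch using `statistics.NormalDist` (variable names are illustrative):

```python
from statistics import NormalDist

z_90 = NormalDist().inv_cdf(0.90)        # about 1.282, as in Table 1A

men = NormalDist(mu=29, sigma=6)
women = NormalDist(mu=28, sigma=7)

p90_men = men.inv_cdf(0.90)              # same as 29 + z_90 * 6
p90_women = women.inv_cdf(0.90)          # same as 28 + z_90 * 7
```

Although women have the lower mean BMI in this example, their larger standard deviation pushes their 90th percentile slightly above that of men.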

Central Limit Theorem

Suppose we have a population with known mean μ and standard deviation σ. If we take simple random samples of size n with replacement, then for large n, the sampling distribution of the sample mean is approximately normal with mean μ and standard deviation σ/√n (the standard error).

Application

Non-normal population

Take samples of size n, as long as n is sufficiently large (usually n ≥ 30 suffices).

The distribution of the sample mean is approximately normal, so Z can be used to compute probabilities.
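A small simulation can make the Central Limit Theorem concrete. The sketch below assumes a hypothetical right-skewed population (exponential with mean 1 and standard deviation 1), draws 2000 samples of size n = 30, and summarizes the resulting sample means. The population, the seed, and the number of samples are illustrative choices, not values from the slides.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)                   # fixed seed for repeatability
n, n_samples = 30, 2000

# Population: exponential with mean 1 and standard deviation 1 (right-skewed)
sample_means = [mean([rng.expovariate(1.0) for _ in range(n)])
                for _ in range(n_samples)]

center = mean(sample_means)              # should sit near the population mean, 1
spread = stdev(sample_means)             # should sit near 1 / sqrt(30), about 0.18
```

Even though individual observations are far from normal, the sample means cluster around the population mean with a spread close to the standard error, and a histogram of `sample_means` would look roughly bell-shaped.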

Example 5.18. Central Limit Theorem

HDL cholesterol has a mean of 54 and a standard deviation of 17 in patients over 50. A physician has 40 patients over age 50 and wants to know the probability that their mean HDL cholesterol is above 60.
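The HDL question in Example 5.18 can be worked through by applying the Central Limit Theorem: with n = 40, the sample mean is treated as approximately normal with mean 54 and standard error 17/√40. The sketch below is one way to carry out that calculation; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 54, 17, 40                 # population mean, SD, sample size

standard_error = sigma / sqrt(n)          # standard deviation of the sample mean
z = (60 - mu) / standard_error            # Z score for a sample mean of 60
p_mean_above_60 = 1 - NormalDist().cdf(z) # upper-tail probability
```

Z comes out near 2.23 and the probability near 0.013, so a mean HDL above 60 in 40 such patients would be unusual.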

Example

Suppose we wish to estimate the mean of a population (μ) whose standard deviation is known and equal to 12, and a simple random sample of 100 individuals is selected from the population.

Find the probability that the sample mean is no more than 2 units from the population mean.

Sampling Distribution of Sample Mean

Central Limit Theorem

P(μ – 2 < X̄ < μ + 2) = ?

Z = {(μ – 2) – μ}/(12/√100) = –2/1.2 = –1.67

Z = {(μ + 2) – μ}/(12/√100) = 2/1.2 = 1.67

Then: P(–1.67 < Z < 1.67) = 0.9525 – 0.0475 = 0.905

The probability that the sample mean is no more than 2 units from the population mean is 0.905, or 90.5%.
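The same probability can be computed without the table. The unknown population mean cancels when the sample mean is standardized, so only the standard error 12/√100 = 1.2 is needed. A minimal sketch (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 12, 100
standard_error = sigma / sqrt(n)          # 12 / 10 = 1.2
z = 2 / standard_error                    # distance of 2 units in standard errors

# P(-z < Z < z): area within z standard errors of the population mean
p_within_2 = NormalDist().cdf(z) - NormalDist().cdf(-z)
```

The unrounded answer is about 0.904; the table-based value of 0.905 differs only because Z is rounded to 1.67 before the lookup.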
