Statistics

Chapter 18

Cross-Tabulated Counts

Dr. Bochi McKinney

3/22/2017

Basic Biostat

Devotional

“I pray that you, being rooted and established in love, may have the power, together with all the saints, to grasp how wide and long and high and deep is the love of Christ” (Ephesians 3:17-18).

In Chapter 18:

18.1 Types of Samples

18.2 Naturalistic and Cohort Samples

18.3 Chi-Square Test of Association

18.4 Test for Trend

18.5 Case-Control Samples

18.6 Matched Pairs

§18.1 Types of Samples

The prior chapter considered categorical response variables with two possible outcomes

This chapter considers categorical variables with any number of possible outcomes

Types of Samples, cont.

Data may be generated by:

I. Naturalistic Samples. An SRS with data then cross-classified according to the explanatory variable and response variable.

II. Purposive Cohort Samples. Fixed numbers of individuals selected according to the explanatory factor.

III. Case-Control Samples. Fixed numbers of individuals selected according to the outcome variable.

Naturalistic Samples: Illustrative Example

Take an SRS from the population; then cross-classify individuals with respect to explanatory and response variables.

Zhou et al. (1996). Association between prior cytomegalovirus infection and the risk of restenosis after coronary atherectomy. New England Journal of Medicine, 335(9), 624–630.

Purposive Cohort Samples: Illustrative Example

Select predetermined numbers of exposed and nonexposed individuals; then ascertain outcomes in individuals.

Writing Group for the Women’s Health Initiative Investigators. (2002). Risks and benefits of estrogen plus progestin in healthy postmenopausal women: Principal results from the Women’s Health Initiative randomized controlled trial. JAMA, 288(3), 321–333.

Case-Control Samples: Illustrative Examples

Identify individuals who are positive for the outcome (cases); then sample the population for individuals who are negative for the outcome (controls).

Lesko, S. M., Rosenberg, L., & Shapiro, S. (1993). A case-control study of baldness in relation to myocardial infarction in men. JAMA, 269(8), 998–1003.

§18.2 Naturalistic and Cohort Samples

Data from a naturalistic sample are shown in this 5-by-2 table

Let us always put the explanatory variable in the rows of such tables (for uniformity)

Totals are tallied in table margins

                 Smoke +   Smoke −   Total
High school           12        38      50
Assoc. degree         18        67      85
Some college          27        95     122
UG degree             32       239     271
Grad degree            5        52      57
Total                 94       491     585

Marginal Distributions

For naturalistic samples (only) describe marginal distributions

These may be reported graphically or in terms of percentages

Top figure: column marginal distribution

Bottom figure: row marginal distribution
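
The percentages behind the two marginal distributions can be recomputed directly from the table margins. The following is a minimal Python sketch using the education-by-smoking counts above; the variable names are illustrative and not taken from the text.

```python
# Education-by-smoking counts from the table above: (smoke+, smoke-) per row.
counts = {
    "High school": (12, 38),
    "Assoc. degree": (18, 67),
    "Some college": (27, 95),
    "UG degree": (32, 239),
    "Grad degree": (5, 52),
}

grand_total = sum(a + b for a, b in counts.values())  # N

# Row marginal distribution: share of the sample at each education level.
row_marginal = {
    level: 100 * (a + b) / grand_total for level, (a, b) in counts.items()
}

# Column marginal distribution: share of the sample that smokes / does not smoke.
smokers = sum(a for a, _ in counts.values())
col_marginal = {
    "Smoke +": 100 * smokers / grand_total,
    "Smoke -": 100 * (grand_total - smokers) / grand_total,
}

for level, pct in row_marginal.items():
    print(f"{level:<14}{pct:5.1f}%")
for label, pct in col_marginal.items():
    print(f"{label:<14}{pct:5.1f}%")
```

Each marginal distribution sums to 100% because it divides a set of margin totals by the grand total.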

Conditional Percents

The relationship between the row variable and column variable is explored with conditional percents. There are two types of conditional percents:

Row percents: use in cohort and naturalistic samples (they describe prevalence and incidence)

Column percents: use in case-control samples
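
Both kinds of conditional percent divide a cell count by a marginal total: the row total for row percents, the column total for column percents. A minimal Python sketch, assuming the table is held as a list of rows; the function names are hypothetical helpers, not from the text.

```python
def row_percents(table):
    """Return each cell as a percentage of its row total."""
    return [[100 * cell / sum(row) for cell in row] for row in table]


def column_percents(table):
    """Return each cell as a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [100 * cell / col_totals[j] for j, cell in enumerate(row)]
        for row in table
    ]


# Rows: education levels; columns: smoke+, smoke- (illustrative data above).
table = [[12, 38], [18, 67], [27, 95], [32, 239], [5, 52]]

row_pct = row_percents(table)
col_pct = column_percents(table)

# Row percents in the first column are the smoking prevalences by group.
print([round(r[0], 1) for r in row_pct])
```

Row percents sum to 100% across each row; column percents sum to 100% down each column.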

Incidence and Prevalence (Naturalistic and Cohort Samples only)

The top table demonstrates R-by-C table notation (R rows and C columns)

For naturalistic and cohort samples, row percents in column 1 represent group incidences or prevalences

           Smoke+   Smoke-   Total
Group 1    a1       b1       n1
Group 2    a2       b2       n2
⋮          ⋮        ⋮        ⋮
Group R    aR       bR       nR
Total      m1       m2       N

Prevalences - Example

This table shows prevalence by education level

Example of calculation, prevalence group 1:

Relative Risks, R-by-2 Tables

Let group 1 represent the least exposed group

Relative risks are calculated as follows:

RRs, R-by-2 Tables, Example

This table lists RR for the illustrative data

Example of calculation

Notice the downward dose-response in RRs
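
The prevalence and relative-risk calculations for all five groups can be reproduced in a few lines. This Python sketch assumes group 1 (high school) is the reference group, as stated above; the list names are illustrative.

```python
groups = ["High school", "Assoc. degree", "Some college", "UG degree", "Grad degree"]
smokers = [12, 18, 27, 32, 5]     # a_i: smokers in group i
totals = [50, 85, 122, 271, 57]   # n_i: size of group i

# Prevalence in each group: p_i = a_i / n_i
prevalence = [a / n for a, n in zip(smokers, totals)]

# Relative risk of each group compared with group 1: RR_i = p_i / p_1
relative_risk = [p / prevalence[0] for p in prevalence]

for name, p, rr in zip(groups, prevalence, relative_risk):
    print(f"{name:<14} prevalence = {p:.4f}  RR = {rr:.2f}")
```

The reference group always has RR = 1, and the second group reproduces the RR of 0.88 shown in the example.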

Odds Ratios, R-by-2 Tables

The odds of an event is the ratio of successes to failures:

The odds ratio associated with exposure level i in an R-by-2 table is:

Interpretation. ORs are interpreted similarly to RRs; e.g., OR ≈ 1 implies no association (see chapter for details)

Table 18.4 Odds ratios, education, and smoking illustrative data
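
The odds ratios for the illustrative data can be recomputed from the cell counts, taking the odds in each group as a_i / b_i and dividing by the odds in group 1. A minimal Python sketch; the variable names are illustrative and the printed values are computed here rather than copied from Table 18.4.

```python
groups = ["High school", "Assoc. degree", "Some college", "UG degree", "Grad degree"]
smokers = [12, 18, 27, 32, 5]        # a_i
nonsmokers = [38, 67, 95, 239, 52]   # b_i

# Odds of smoking in each group: o_i = a_i / b_i
odds = [a / b for a, b in zip(smokers, nonsmokers)]

# Odds ratio of each group relative to group 1: OR_i = o_i / o_1
odds_ratio = [o / odds[0] for o in odds]

for name, o, ratio in zip(groups, odds, odds_ratio):
    print(f"{name:<14} odds = {o:.4f}  OR = {ratio:.2f}")
```

Because smoking is not rare in these data, the odds ratios sit somewhat further from 1 than the corresponding relative risks.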

Responses with More than Two Levels of Outcome

Efficacy of Echinacea. A randomized controlled clinical trial pitted echinacea vs. placebo in the treatment of upper respiratory symptoms in children. The response variable was severity of illness classified as: mild, moderate or severe.

Echinacea, Conditional Distributions

Row percents are calculated to determine the incidence of each outcome.

Example of calculation, top right table cell (data on prior slide): % severe w/echinacea = 48 / 329 × 100% = 14.6%

Conclusion: the treatment group fared slightly worse than the control group: 14.6% of treatment group experienced severe symptoms compared to 10.9% of the control group.

§18.3 Chi-Square Test of Association

A. Hypotheses. H0: no association in population versus Ha: association in population

B. Test statistic.

C. P-value. Convert the X²stat to a P-value with Table E or a software program.

Chi-Square Test - Example

Data below reveal a negative association between smoking and education level. Let us test

H0: no association in the population vs.

Ha: association in the population.

χ², Expected Frequencies

Chi-Square Statistic - Example

Chi-Square Test, P-value

X²stat = 13.20 with 4 df

Using Table E, find the row for 4 df

Find the chi-square values in this row that bracket 13.20

Bracketing values are 11.14 (P = .025) and 13.28 (P = .01).

Thus, .01 < P < .025 (closer to .01)

      Probability in right tail
df    0.975   0.25   0.20   0.15   0.10   0.05   0.025   0.01    0.005
4     0.48    5.39   5.99   6.74   7.78   9.49   11.14   13.28   14.86

The P-value = AUC in the tail beyond X²stat

Illustrative example: X²stat = 13.20 with 4 df
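
The whole test can be checked numerically: expected counts from the margins, the chi-square statistic, its degrees of freedom, and the right-tail area. The Python sketch below uses only the standard library. The helper `chi2_right_tail_even_df` is a hypothetical name; it relies on a closed form for the chi-square tail that holds only when df is even, which happens to cover the 4 df in this example. Statistical software would normally be used instead.

```python
import math

# Observed counts: rows = education levels, columns = smoke+, smoke-.
observed = [[12, 38], [18, 67], [27, 95], [32, 239], [5, 52]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count in each cell = row total * column total / table total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square statistic: sum over all cells of (O - E)^2 / E.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_right_tail_even_df(x, df):
    """Right-tail area of a chi-square distribution (closed form, even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


p_value = chi2_right_tail_even_df(chi_sq, df)
print(f"X2 = {chi_sq:.2f}, df = {df}, P = {p_value:.4f}")
```

The computed P-value falls between .01 and .025, in agreement with the Table E bracketing.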

Chi-Square By Computer

Here are results for the illustrative data from WinPepi > Compare2.exe > Program F Categorical Data

Yates’ Continuity Corrected Chi-Square Statistic

Two different chi-square statistics are used in practice

Pearson’s chi-square statistic (covered) is

Yates’ continuity-corrected chi-square statistic is:

The continuity-corrected method produces smaller chi-square statistics and larger P-values.

Both chi-square statistics are used in practice.

Chi-Square, cont.

1. How the chi-square works. When observed values = expected values, the chi-square statistic is 0. As the differences between observed and expected values get larger, the chi-square statistic grows and evidence against H0 mounts.

2. Avoid chi-square tests in small samples. Do not use a chi-square test when more than 20% of the cells have expected values that are less than 5.

Chi-Square, cont.

3. Supplement chi-squares with measures of association. Chi-square statistics do not measure the strength of association. Use descriptive statistics or RRs to quantify “strength”.

4. For 2-by-2 tables, chi-square and z tests (Ch 17) produce identical P-values, because the chi-square statistic with 1 df equals the square of the z statistic.
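
These points can be illustrated on a 2-by-2 table. The Python sketch below borrows the alcohol and esophageal cancer counts used in §18.5 (96, 109, 104, 666) and computes Pearson's statistic, the Yates continuity-corrected statistic, and a pooled two-proportion z statistic. The proportions are formed here only to build the test statistic; variable names are illustrative.

```python
import math

# 2-by-2 counts: rows = exposed / nonexposed, columns = cases / controls.
observed = [[96, 109], [104, 666]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pair each observed count with its expected count.
cells = [
    (o, e)
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
]

# Pearson: sum of (O - E)^2 / E.  Yates: sum of (|O - E| - 1/2)^2 / E.
pearson = sum((o - e) ** 2 / e for o, e in cells)
yates = sum((abs(o - e) - 0.5) ** 2 / e for o, e in cells)

# Pooled two-proportion z statistic comparing the first-column proportion
# in row 1 against row 2.
p1 = observed[0][0] / row_totals[0]
p2 = observed[1][0] / row_totals[1]
pooled = col_totals[0] / n
z = (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / row_totals[0] + 1 / row_totals[1]))

print(f"Pearson X2 = {pearson:.2f}, Yates X2 = {yates:.2f}, z^2 = {z ** 2:.2f}")
```

The squared z statistic matches Pearson's chi-square, and the continuity-corrected statistic is the smaller of the two chi-squares.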

§18.4 Test for Trend

See pp. 483 – 488

§18.5 Case-Control Samples

Case-control sampling method

Identify all cases in the population

From the same source population, randomly select a series of non-cases (controls)

Ascertain the exposure status of cases and controls

Cross-tabulate the exposure status of cases and controls

This provides an efficient way to study rare outcomes

Figure 18.8 Case–control sampling by the incidence density method

As cases are identified in the population, select at random one or more noncases (controls) for each case at the time of occurrence.

This advanced concept allows students to see that case-control studies are a type of longitudinal “time-failure” design.

Odds Ratio

              Cases   Controls   Total
Exposed       a1      b1         n1
Nonexposed    a2      b2         n2
Total         m1      m2         N

With incidence density sampling, the OR is a direct estimate of the rate ratio in the population!

Cross-tabulate the count of cases and controls according to their exposure status:

cross-product ratio

Case-Control Illustrative Example

Cases: men diagnosed with esophageal cancer

Controls: noncases selected at random from electoral lists in same region

Exposure = alcohol consumption dichotomized at 80 g/day

Interpretation: The rate ratio associated with high-alcohol consumption is about 5.6

Tuyns, A. J., Pequignot, G., & Jensen, O. M. (1977). [Esophageal cancer in Ille-et-Vilaine in relation to levels of alcohol and tobacco consumption. Risks are multiplying]. Bulletin du Cancer, 64(1), 45–60. Data stored online as individual records in the file bd1.* as variables alc2 and case. Data set from APHA data exchange, November 1988.

(1 − α)100% CI for the OR

Note use of the natural logarithmic scale

90% CI for the OR – Example

       Cases   Controls
E+        96        109
E−       104        666
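
The odds ratio and its 90% confidence interval for these counts can be reproduced as follows. This is a minimal Python sketch of the log-scale interval described above, with z = 1.645 for 90% confidence; variable names follow the a1, b1, a2, b2 table notation.

```python
import math

a1, b1 = 96, 109    # exposed: cases, controls
a2, b2 = 104, 666   # nonexposed: cases, controls

# Cross-product ratio.
odds_ratio = (a1 * b2) / (b1 * a2)

# Standard error of ln(OR).
se_ln_or = math.sqrt(1 / a1 + 1 / b1 + 1 / a2 + 1 / b2)

z = 1.645  # z value for 90% confidence
lower = math.exp(math.log(odds_ratio) - z * se_ln_or)
upper = math.exp(math.log(odds_ratio) + z * se_ln_or)

print(f"OR = {odds_ratio:.2f}, SE(ln OR) = {se_ln_or:.4f}")
print(f"90% CI = ({lower:.2f}, {upper:.2f})")
```

Working on the natural log scale keeps the interval asymmetric around the OR, so the lower limit can never fall below zero.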

Case-Control - Example

Results from WinPepi > Compare2.exe > A.

WinPepi uses a slightly different formula than ours; the Mid-P results are similar to ours.

Table 18.17 Case–control data of alcohol and esophageal cancer with alcohol consumption recorded according to four levels.

With an ordinal exposure, compare each exposure level to the non-exposed group (next slide):

Case-Control, Ordinal Levels of Exposure

Note dose-response relationship

§18.6 Matched Pairs

With matched-pair samples, each participant is carefully matched to a unique individual as part of the selection process

This technique is used to mitigate confounding by the matching factor

Both cohort and case-control samples may avail themselves of matching

Here’s the notation for matched-pair case-control data:

The odds ratio associated with exposure is:

The confidence interval is:

              Case E+   Case E−
Control E+    a         b
Control E−    c         d

Matched Pairs - Example

A matched case-control study found 45 pairs in which the case but not the control had a low fruit/veg diet; it found 24 pairs in which the control but not the case had a low fruit/veg diet.

              Case E+   Case E−
Control E+    unknown   24
Control E−    45        unknown

The odds ratio suggests 88% higher risk in low fruit/veg consumers.

Matched Pair Example, cont.

Data are compatible with ORs between 1.14 and 3.07

WinPepi’s PairEtc.exe program A calculates exact confidence intervals for ORs from matched-pair data. Hand calculated limits will be similar except in small samples.
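
The hand calculation for the matched-pair example uses only the two discordant counts. The Python sketch below assumes a 95% confidence level (z = 1.96), which is an assumption on my part; it gives limits close to those quoted above, with small differences in the last digit possible from rounding.

```python
import math

c = 45  # pairs in which the case was exposed and the control was not
b = 24  # pairs in which the control was exposed and the case was not

# Matched-pair odds ratio uses only the discordant pairs.
odds_ratio = c / b

# Standard error of ln(OR) for matched pairs.
se_ln_or = math.sqrt(1 / c + 1 / b)

z = 1.96  # assumed 95% confidence
lower = math.exp(math.log(odds_ratio) - z * se_ln_or)
upper = math.exp(math.log(odds_ratio) + z * se_ln_or)

print(f"OR = {odds_ratio:.2f}, CI = ({lower:.2f}, {upper:.2f})")
```

The concordant pairs (both exposed or both unexposed) do not enter the calculation, which is why they can remain unknown in the table.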

Hypothesis Test, Matched Pairs

A. H0: OR = 1

B. McNemar’s test statistic.

C. P-values. Convert zstat to P-value with Table B or Table F

If fewer than 5 discordant pairs are expected, use an exact binomial procedure (see text).

Hypothesis Test, Example

              Case E+   Case E−
Control E+    unknown   24
Control E−    45        unknown
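
A sketch of the test for these data in Python, assuming the usual uncorrected form of McNemar's statistic, z = (c − b) / √(c + b); the text may use a continuity-corrected variant, in which case the numbers will differ slightly. The two-sided P-value is taken from the standard normal distribution via `math.erfc`.

```python
import math

c = 45  # pairs: case exposed, control not exposed
b = 24  # pairs: control exposed, case not exposed

# McNemar's z statistic (uncorrected form, assumed here).
z_stat = (c - b) / math.sqrt(c + b)

# Two-sided P-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
p_two_sided = math.erfc(abs(z_stat) / math.sqrt(2))

print(f"z = {z_stat:.2f}, two-sided P = {p_two_sided:.4f}")
```

Under H0 (OR = 1) the discordant pairs are expected to split evenly, so about 34.5 of the 69 are expected in each cell, well above the threshold of 5 for using the z approximation.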

Questions and/or concerns?

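Formulas (in slide order)

Conditional percents:

$$\text{column percent} = \frac{\text{cell count}}{\text{column total}} \times 100\%$$

$$\text{row percent} = \frac{\text{cell count}}{\text{row total}} \times 100\%$$

Incidence or prevalence proportion, group i:

$$\hat{p}_i = \frac{a_i}{n_i}$$

Example, group 1:

$$\hat{p}_1 = \frac{a_1}{n_1} = \frac{12}{50} = 0.24$$

Relative risk, group i:

$$\widehat{RR}_i = \frac{\hat{p}_i}{\hat{p}_1}$$

Example, group 2:

$$\widehat{RR}_2 = \frac{\hat{p}_2}{\hat{p}_1} = \frac{0.2118}{0.2400} = 0.88$$

Odds, group i:

$$o_i = \frac{a_i}{b_i}$$

Odds ratio, group i:

$$\widehat{OR}_i = \frac{a_i / b_i}{a_1 / b_1}$$

Chi-square test statistic (Pearson):

$$X^2_{\text{stat}} = \sum_{\text{all cells}} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed count in cell i and $E_i$ is the expected count in cell i, calculated as

$$E_i = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$

$$df = (R - 1)(C - 1)$$

Yates' continuity-corrected chi-square statistic:

$$X^2_{\text{stat},c} = \sum_{\text{all cells}} \frac{\left(|O_i - E_i| - \tfrac{1}{2}\right)^2}{E_i}$$

Relationship between the z and chi-square statistics:

$$z^2_{\text{stat}} = X^2_{\text{stat}} \text{ with 1 df}$$

Odds ratio, case-control table:

$$\widehat{OR} = \frac{a_1 b_2}{b_1 a_2}$$

Example:

$$\widehat{OR} = \frac{a_1 b_2}{b_1 a_2} = \frac{96 \times 666}{109 \times 104} = 5.64$$

(1 − α)100% CI for the OR:

$$e^{\,\ln \widehat{OR} \,\pm\, z_{1-\alpha/2} \cdot SE_{\ln \widehat{OR}}}$$

where

$$SE_{\ln \widehat{OR}} = \sqrt{\frac{1}{a_1} + \frac{1}{b_1} + \frac{1}{a_2} + \frac{1}{b_2}}$$

90% CI for the OR, example:

$$\ln \widehat{OR} = \ln(5.640) = 1.7299$$

$$SE_{\ln \widehat{OR}} = \sqrt{\frac{1}{96} + \frac{1}{109} + \frac{1}{104} + \frac{1}{666}} = 0.1752$$

For 90% confidence use $z = 1.645$:

$$e^{\,1.7299 \pm (1.645)(0.1752)} = e^{\,1.7299 \pm 0.2882} = e^{\,1.4417,\; 2.0181} = (4.23,\; 7.52)$$

Matched pairs, odds ratio:

$$\widehat{OR} = \frac{c}{b}$$

Matched pairs, confidence interval, where

$$SE_{\ln \widehat{OR}} = \sqrt{\frac{1}{c} + \frac{1}{b}}$$

Matched pairs, example:

$$\widehat{OR} = \frac{c}{b} = \frac{45}{24} = 1.88$$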