Confounding
1
NOTE: For those of you who had difficulty with standardization, it is IMPERATIVE that you go back, review, and get the concepts solidified. You will need them for the following sections.
OVERVIEW
Explanation of confounding
Assessing confounding
Conditions for confounding
Controlling for confounding
Adjusting for confounders
2
Overview of Confounding
Recall: Purpose of epi studies is to determine the “truth” about the relationship between E and D.
Most Ds have many Es.
Sometimes Es are related to each other
We start this lesson by recalling that the whole purpose of epi studies is to determine the “truth” about the relationship between E and D. So far, we’ve looked at determining point estimates of one exposure’s relationship to the disease. Recall, too, though, that most diseases have more than one exposure (think back to webs of causation). Sometimes these different exposures are also related to one another, which can lead to something known as confounding.
3
Confounding is…
A distortion in the point estimate due to the presence of a second exposure.
This “other E” is responsible for, or somehow related to, D.
As such, the magnitude of the association between E and D is unclear
4
Confounding is a distortion in the point estimate (measure of association, RR, OR, etc.) due to the presence of a second exposure. When something other than the E of interest is responsible for, or somehow related to D, the magnitude of the association between E and D becomes unclear. That’s because the presence of this "third" variable – the "other E" -- gets in the way… In more scientific terms, we say the results are confounded by this other exposure.
Illustrating a point…
A follow-up study reveals that people who drink alcohol are 5 times as likely to have lung cancer as non-drinkers (RR=5.1)
Should we thus conclude that there is a strong relationship between drinking alcohol and lung cancer ?
5
Before we move on to the concepts, let’s think about the issue of determining the “truth” between E and D.
I’ll use an example to illustrate:
Let’s say we did a follow-up study of alcohol consumption and lung cancer. We find that people who drink alcohol are 5 times as likely to have lung cancer as non-drinkers (RR=5.1). Should we thus conclude that there is a strong relationship between drinking alcohol and lung cancer and recommend that people refrain from drinking?
Not yet.
Even if we’ve addressed issues of validity (bias) and precision (chance), we still need to rule out confounding before we can say it’s the “truth.”
6
There are four possible explanations for RR results:
1) bias (systematic error)
2) chance (random error)
3) confounding
4) “the truth”
Assuming we've addressed precision and validity issues, let's now look at confounding before we assume this RR is "the truth".
Assessing Confounding
Could the results I found be explained by the presence of another E that is different than the E of interest?
7
When we look at confounding, we ask the question "Could the results I found be explained by the presence of some other E than the primary E of interest?"
To answer this question, we must ask ourselves two things:
1) what are the known and highly suspected risk factors for the D of interest?
2) Do our groups differ much in their histories of exposure to these other risk factors? That is, do some of the people in the primary E of interest group also have this other suspected E (written as E*) and some do not? If so, do these exposures account for some (or all) of the effect (RR)?
If we know #1, we can measure history of exposure to these factors so that we can answer #2.
Class Example: Lung Cancer and Alcohol
1. What are the other known or highly suspected risk factors for lung cancer?
2. Do our groups differ in their histories of exposure to these other risk factors?
8
Back to the example…
Question 1: What are the other known risk factors for lung cancer? Anyone? =) I think I heard someone say “smoking,” which is a great answer! We are fully aware that cigarette smoking is a known risk factor for lung cancer.
Question 2: Do our alcohol groups (E+ and E-) differ in their history of exposure to smoking?
Well, we can measure history of smoking among our alcohol+ and alcohol- groups in order to assess potential confounding…. Let’s say we do that and find that many or most of our alcohol+ subjects (the drinkers) are also in the smoking+ group (smokers), while very few of our alcohol- subjects (the non-drinkers) are in the smoking+ group.
Uh - oh
After adjusting:
RR = 1.2
9
So, after adjusting for the presence of our second E* (smoking), we find that RR=1.2, or that there really isn’t any significant difference in lung cancer between people who drink and those who don’t (remember, null = 1).
Thus, the crude (unadjusted) RR=5, which suggested a very strong relationship between alcohol consumption and lung cancer, was quite misleading in that the alcohol itself was not responsible for the results observed. MOST of the effect was due to the fact that many/most of the drinkers smoked and very few of the non-drinkers did. Therefore, it was the smoking, not the drinking, that was responsible for most of the excess lung cancer among drinkers.
As the adjusted RR is quite different from the crude RR, we can conclude quite confidently that smoking is a confounder in the relationship between alcohol and lung cancer.
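To see how a crude RR near 5 can shrink toward the null once smoking is taken into account, here is a minimal sketch. The cohort counts are made up for illustration (they are NOT data from this lesson). In this hypothetical cohort, smokers get lung cancer at 10% and non-smokers at 1% whether or not they drink; the only difference between the groups is that most drinkers smoke and most non-drinkers do not.

```python
# Hypothetical cohort counts (NOT from the lecture): (cases, people at risk)
cohort = {
    ("drinker", "smoker"): (90, 900),
    ("drinker", "non-smoker"): (1, 100),
    ("non-drinker", "smoker"): (10, 100),
    ("non-drinker", "non-smoker"): (9, 900),
}


def risk(cases, total):
    """Cumulative incidence: cases divided by people at risk."""
    return cases / total


def crude_rr(data):
    """RR for drinking, ignoring smoking altogether."""
    def pooled_risk(drink):
        cases = sum(c for (d, _), (c, n) in data.items() if d == drink)
        total = sum(n for (d, _), (c, n) in data.items() if d == drink)
        return risk(cases, total)
    return pooled_risk("drinker") / pooled_risk("non-drinker")


def stratum_rr(data, smoke):
    """RR for drinking within a single smoking stratum."""
    return risk(*data[("drinker", smoke)]) / risk(*data[("non-drinker", smoke)])


crude = crude_rr(cohort)
rr_smokers = stratum_rr(cohort, "smoker")
rr_nonsmokers = stratum_rr(cohort, "non-smoker")

print(f"crude RR = {crude:.1f}")
print(f"RR among smokers = {rr_smokers:.1f}")
print(f"RR among non-smokers = {rr_nonsmokers:.1f}")
```

With these invented counts the crude RR comes out close to 5, while the RR within each smoking group is 1.0, which is the same pattern as the class example.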
Still with me?
So let’s go back to the original alcohol-lung cancer scenario and pretend that when I asked the question “What are the other known or highly suspected risk factors for lung cancer?,” one of you answered “popcorn” instead of smoking.
We all know which one of you gave that answer.
10
WHAT?!?!?!
Well, sure enough, the drinkers and non-drinkers differed on their popcorn consumption, so we adjusted for popcorn consumption. Lo and behold, after adjusting for popcorn consumption, our RR fell to 1.3.
Does this mean we conclude that popcorn consumption is a confounder in the relationship between alcohol use and lung cancer….. which then means popcorn consumption is a risk factor for lung cancer???
11
Is eating popcorn really associated with lung cancer?
Well, the crude RR and adjusted RR are very different…..
12
At first glance, you may say yes because of the significant difference in the crude and adjusted RRs. However, in order for an exposure to qualify as a confounder, two conditions must be met. If you think back to the two questions for assessing confounding, you should know what they are. But I will tell them to you again, in a simpler form. And when I do, I would highly recommend memorizing them as I predict that you will be asked about them later. Perhaps during a quiz or a test. Yes, this is a hint.
Two Conditions for Confounding
In order for there to be confounding, these two conditions must exist:
1. E* must be associated with D independent of its association with E
2. E* must be associated with E
Note – the confounding E is written as E*
13
First Example: Alcohol & Cigarettes
This slide shows how in the first example, smoking (our suspected confounding exposure) meets both of the two criteria for confounding:
there is clear evidence that smoking and lung cancer are associated, independent of alcohol use, AND
most of our drinkers smoked, therefore E* is associated with E
Therefore, we would want to adjust for smoking to determine if it affects the relationship between drinking and lung cancer.
14
Second Example: Alcohol & Popcorn
In our second example, only the second condition has been met.
2. Popcorn is associated with drinking.
However, the first condition is not met insomuch as there is no independent association between eating popcorn and getting lung cancer and no biological plausibility (to date!) that such a relationship exists.
Thus, popcorn cannot be a confounder, and the RR that we got when adjusting for popcorn is misleading: what really happened is that the smokers also ate a bunch of popcorn while drinking.
NOTE: Even though I adjusted for popcorn consumption for this example, in real life we would not do so because both conditions were not met.
15
Controlling Confounding
Good Planning is Necessary!
Design Stage
Subject characteristic restriction
Matching
Analysis Stage
Stratified analysis
Modeling/multivariate analysis
16
Confounding is much more of a problem in observational studies than in experimental studies. Because we do not manipulate exposure in observational studies, it is usually NOT true that our two groups (E+ and E- or D+ and D-) are alike with respect to other known and suspected risk factors of diseases. Therefore, we must be cautious and vigilant about the potential of confounding when we study any E-D relationship. We must always question whether a non-null point estimate is due to differences in an exposure other than the E of interest.
We must always be prepared to control for potential confounders. To control for a confounder you must:
1. know WHAT to measure
2. measure it in your study subjects
To limit the effects of confounding, good planning is necessary. In the design stage, we can do two things to avoid confounding:
1. Subject characteristic restriction: you can choose to study only those subjects free of the potential confounder. For instance, in our alcohol-lung cancer study, you could exclude all smokers from the sample. You would then have to locate non-smoking drinkers and non-drinkers for your study. The drawbacks are that you cut down on the pool of potential subjects and it may be more difficult to generalize results.
2. Matching: select subjects in your two groups (E+ and E- or cases and controls) that have a similar profile with regard to the potential confounders. Matching can be done one-on-one, where you select an appropriate control for each case you use, or it can be done as "frequency matching," where you make sure your summary statistics are the same in one group as in the other. Regardless of which method you use, if we are successful in matching, then E* will not qualify as a confounder because E* will be equally common in each group. What we have done here, in essence, is make the groups "the same" with regard to the potential confounder, thus equally distributing the threat of E*. The drawbacks with matching are that you must search very hard for just the right subjects; many people have to be interviewed or examined to find them, which takes an enormous amount of time and effort and can become cost prohibitive.
We can also limit confounding in the analysis stage in two primary ways:
1. Stratified analysis: calculate RR estimates in narrow strata of the confounding factor. For example, we could look at the alcohol-lung cancer relationship in 3 groups: light smokers, moderate smokers, and heavy smokers. An RR can be calculated for each of the groups using 2 x 2 tables. You'll learn how to do this later!! I can feel your excitement!!!!!
2. Modeling/multivariate analysis: a mathematical model can quantify the effects of the exposure of interest E and the potential confounder E*. "Pieces" of the RR can then be attributed to each of E and E*. You will not learn how to do this in this class. I'm so sorry. I know that must disappoint you greatly. =)
Note: In the example used earlier, we magically controlled for smoking and popcorn in the analysis stage. You’re about to learn how that magic happens. =)
Example
A case-control study was done to determine the relationship between stomach cancer and h pylori infection. Among the 130 stomach cancer cases, 62 had h pylori infection as compared to 35 of the 130 controls. Of those who had h pylori infection, 44 stomach cancer cases and 15 controls had a high nitrate diet. Among those without h pylori infection, 26 stomach cancer cases and 10 controls had a high nitrate diet.
17
To determine if there’s confounding, we need to look at the crude and adjusted point estimates, as we did in the earlier examples. These next several slides will walk you through the process, using the example on this slide. For clarity, I will bold the part of the example that is being illustrated in each slide.
Before we begin, let’s determine what our exposures of interest are for developing stomach cancer by reading through the case together.
A case-control study was done to determine the relationship between stomach cancer and h pylori infection. Among the 130 stomach cancer cases, 62 had h pylori infection as compared to 35 of the 130 controls. Of those who had h pylori infection, 44 stomach cancer cases and 15 controls had a high nitrate diet. Among those without h pylori infection, 26 stomach cancer cases and 10 controls had a high nitrate diet
Typically, the primary exposure will be mentioned first. Thus, h pylori infection is our primary exposure of interest. The second exposure of interest is a diet high in nitrates. Dietary nitrates are independently associated with stomach cancer, and are also associated with h pylori infections.
| | Ca (D+) | Co (D-) | Total |
| H pylori (E+) | 62 | 35 | 97 |
| No H pylori (E-) | 68 | 95 | 163 |
| Total | 130 | 130 | 260 |
Step 1: Set up Data Table for E and Calculate Crude OR
18
We first construct our 2X2 data table with the primary E of interest and D and then calculate our crude measure of effect – in this case, OR. You’ve done this all semester, so this should be a piece of cake. However, I’ll review…
Example: A case-control study was done to determine the relationship between stomach cancer and h pylori infection. Among the 130 stomach cancer cases, 62 had h pylori infection as compared to 35 of the 130 controls. Of those who had h pylori infection, 44 stomach cancer cases and 15 controls had a high nitrate diet. Among those without h pylori infection, 26 stomach cancer cases and 10 controls had a high nitrate diet
We plug the known numbers into the table and calculate the rest. To review, let’s take the cases (D+) column total to illustrate. We know that we have a total of 130 cases and we know that 62 of them (our “a cell” of E+D+) had exposure to h pylori infection. To find out how many cases did not have exposure to h pylori infection (our “c cell” of E-D+), we subtract our “a cell” from the total cases: 130 - 62 = 68.
OR = (a x d) / (b x c) = (62 x 95) / (35 x 68) = 5890/2380 = 2.5. This is our crude (unadjusted) OR and what we will use for comparisons.
Interpretation: The odds of h pylori infection among stomach cancer cases were 2.5 times the odds among controls.
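As a quick check on Step 1, here is a small sketch that fills in the missing cells by subtraction and computes the crude OR from the cross-product. The function name `odds_ratio` is just an illustrative choice.

```python
def odds_ratio(a, b, c, d):
    """Cross-product OR for a 2x2 table.

    a = E+D+, b = E+D-, c = E-D+, d = E-D-
    """
    return (a * d) / (b * c)


total_cases, total_controls = 130, 130
a, b = 62, 35              # h pylori among cases and among controls
c = total_cases - a        # cases without h pylori
d = total_controls - b     # controls without h pylori

crude_or = odds_ratio(a, b, c, d)
print(c, d, round(crude_or, 1))  # 68 95 2.5
```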
| | Ca | Co | Total |
| High Nitrate Diet (E*+) | |||
| H Pylori | 44 | 15 | 59 |
| No H Pylori | 26 | 10 | 36 |
| E*+ Total | 70 | 25 | 95 |
| Low Nitrate Diet (E*-) | |||
| H pylori | |||
| No H pylori | |||
| E*- Total | |||
| Grand Total | 130 | 130 | 260 |
Step 2: Stratify E by E*
19
Next, we will stratify this data table on the suspected confounding exposure (E*) – a high nitrate diet. To do this, we are going to create two 2x2s, one for each level of E* :
1. High nitrate diet (E*+); and,
2. Low nitrate diet ( E*- )
If you look at the table, you’ll see it’s just two 2x2 tables stacked on top of each other… I’ve plugged the “known” numbers in from original 2x2, and recalculated totals.
Example: A case-control study was done to determine the relationship between stomach cancer and h pylori infection. Among the 130 cases, 62 had h pylori infection as compared to 35 of the 130 controls. Of those who had h pylori infection, 44 cases and 15 controls had a high nitrate diet. Among those without h pylori infection, 26 cases and 10 controls had a high nitrate diet
Now, here’s where our power of epi magic comes into play. Drum roll please!
| | Ca | Co | Total |
| High Nitrate Diet (E*+) | |||
| H Pylori | 44 | 15 | 59 |
| No H Pylori | 26 | 10 | 36 |
| E*+ Total | 70 | 25 | 95 |
| Low Nitrate Diet (E*-) | |||
| H pylori | |||
| No H pylori | |||
| E*- Total | 130 – 70 = 60 | 130 – 25 = 105 | 60+105 = 165 |
| Grand Total | 130 | 130 | 260 |
Still Step 2: Stratify E by E*
20
We’re going to go slowly here… Let’s add in the case and control column totals for low nitrate.
As a math check, let’s make sure that the E*+ total (95) and the E*- total (165) add up to the grand total (260). They do. YAY!
| | Ca | Co | Total |
| High Nitrate Diet (E*+) | |||
| H Pylori | 44 | 15 | 59 |
| No H Pylori | 26 | 10 | 36 |
| E*+ Total | 70 | 25 | 95 |
| Low Nitrate Diet (E*-) | |||
| H pylori | 62- 44 = 18 | 35-15 =20 | 18+20 =38 |
| No H pylori | 68 – 26 =42 | 95-10 =85 | 42+85=127 |
| E*- Total | 60 | 105 | 165 |
| Grand Total | 130 | 130 | 260 |
More Step 2
21
To fill in the next part, we have to refer back to our primary E 2x2 in step 1. There we had the following:
Cell a (E+D+) = 62
Cell b (E+D-) = 35
Cell c (E-D+) = 68
Cell d (E-D-) = 95
This is where thinking and math come into play. Let’s look at the Case (D+) column in the low-nitrate 2x2. From Step 1, we know that there are 62 cases of stomach cancer among those with h pylori infections. Therefore, the “a cell” in the low nitrate diet is what’s left over after subtracting the known number of high-nitrate (E*+) cases with h pylori (44) from the total h pylori cases: 62 – 44 = 18. Similarly, there were 68 cases of stomach cancer among those who DID NOT have h pylori. Thus, to get the c cell value for the low nitrate diet, we subtract the high nitrate c cell from the total: 68 - 26 = 42.
You do the same for the control (D-) cells, and then calculate your totals. Make sure to double check that all of your totals give you the grand total.
See how that works!! Just take your time and think it through. This is the hardest part, I promise. =)
Note: a mediasite lecture is posted that walks you through this process in greater detail, so don’t panic if this doesn’t quite make sense yet.
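The subtraction logic above can be sketched in a few lines: each low nitrate cell is the overall cell from Step 1 minus the matching high nitrate cell. The tuple layout (a, b, c, d) is an assumption made for this sketch.

```python
# Cells are (a, b, c, d) = (E+D+, E+D-, E-D+, E-D-)
overall = (62, 35, 68, 95)        # Step 1 table, all subjects
high_nitrate = (44, 15, 26, 10)   # E*+ stratum, given in the example

# Each low nitrate cell is the overall cell minus the high nitrate cell
low_nitrate = tuple(o - h for o, h in zip(overall, high_nitrate))

a, b, c, d = low_nitrate
low_cases = a + c        # low nitrate case column total
low_controls = b + d     # low nitrate control column total

print(low_nitrate)                                        # (18, 20, 42, 85)
print(low_cases, low_controls, low_cases + low_controls)  # 60 105 165

# Math check: the two strata must add back up to the grand total
assert sum(high_nitrate) + sum(low_nitrate) == 260
```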
Calculate E* Stratum-Specific ORs
OR E*+ = (44 x 10) / (15 x 26)
= 440/390
= 1.1
OR E*- = (18 x 85) / (20 x 42)
= 1530/840
= 1.8
22
All we’re doing here is calculating the OR as normal for each E* 2x2
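Here is a short sketch that recomputes the two stratum-specific ORs from the stacked 2x2 tables, so you can check the arithmetic yourself. The dictionary keys are illustrative labels only.

```python
def odds_ratio(a, b, c, d):
    """Cross-product OR for a 2x2 table: a=E+D+, b=E+D-, c=E-D+, d=E-D-."""
    return (a * d) / (b * c)


# One 2x2 table per level of E* (dietary nitrate)
strata = {
    "high nitrate (E*+)": (44, 15, 26, 10),
    "low nitrate (E*-)": (18, 20, 42, 85),
}

stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}

for name, value in stratum_ors.items():
    print(f"OR {name} = {value:.2f}")
```

Printing two decimals rather than one makes it easier to see how each value rounds.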
Compare Crude OR to S-S ORs
Crude E OR = 2.5
Stratified E* ORs = 1.1 (high nitrate), 1.8 (low nitrate)
So, let’s take a moment and look at what we know: The crude OR for h pylori and stomach cancer was 2.5, which suggests the odds of h pylori exposure are greater among those who have stomach cancer. Both stratum-specific odds ratios (within the high nitrate and low nitrate diet groups) are lower than the crude odds ratio based on the entire sample.
Hmm….Wouldn't you expect to find the crude odds ratio to be a weighted average of the stratified odds ratios? Think about it….
When the stratified E* ORs are similar to the crude E OR, then the presence of E* makes no difference; thus, E* is not confounding the relationship between E and D. However, when the stratified E* ORs are different than the crude E OR, it means that the presence of E* has an effect on the relationship between E and D.
The next several slides go over three general rules on determining if E* is a confounder in the relationship between E and D.
23
General Rule #1
Suspect confounding if:
1. E*+ OR < Crude OR > E*- OR
or
2. E*+ OR > Crude OR < E*- OR
General rule 1: When both of the s-s E* ORs are either higher than or lower than the crude E OR, then we suspect confounding.
(this holds true for RR also).
24
Our Data
OR high nitrate < Crude OR > OR low nitrate
1.1 < 2.5 > 1.8
So, in our example, both of the E* stratum-specific ORs (high nitrate diet, low nitrate diet) are less than the crude OR. So, by virtue of general rule 1, we can say confounding is suspected.
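General rule 1 is easy to express as a check: confounding is suspected when every stratum-specific estimate falls on the same side of the crude estimate. The sketch below applies it to our data; `suspect_confounding` is a made-up helper name, and the second call uses invented numbers purely as a contrast.

```python
def odds_ratio(a, b, c, d):
    """Cross-product OR for a 2x2 table: a=E+D+, b=E+D-, c=E-D+, d=E-D-."""
    return (a * d) / (b * c)


def suspect_confounding(crude, stratum_estimates):
    """General rule 1: all stratum estimates lie above, or all lie below, the crude."""
    all_lower = all(s < crude for s in stratum_estimates)
    all_higher = all(s > crude for s in stratum_estimates)
    return all_lower or all_higher


crude = odds_ratio(62, 35, 68, 95)
strata = [odds_ratio(44, 15, 26, 10), odds_ratio(18, 20, 42, 85)]

suspected = suspect_confounding(crude, strata)
print(suspected)  # True: both stratum ORs are below the crude OR

# Invented contrast: the crude sits between the stratum estimates
print(suspect_confounding(2.0, [1.5, 2.5]))  # False
```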
25
General Rule #2
Adjust for confounding only if:
E*+ OR ≈ E*- OR
General rule 2: Control (adjust) for confounding if our two stratum-specific E* ORs are similar to one another. However, if the stratum-specific E* estimates are sufficiently different from one another, they should not be combined, as this would obscure useful information. You’ll learn about this in your next lesson.
If the E* stratum-specific ORs are similar to one another, they can be combined to obtain an un-confounded (adjusted) OR so that we can know the truth about the relationship of E and D without the effect of E*. Another way of saying this is that we “control for” the dietary nitrate level.
In our example, the s-s E* ORs are similar (1.1 and 1.8); thus, conditions for general rule #2 are met.
26
Step 3: Calculate Adjusted E* OR
OR Mantel-Haenszel = Σ(ad/n) / Σ(bc/n)
27
The way we adjust for confounding is by calculating the Mantel-Haenszel (M-H) adjusted odds ratio, the summary estimate that goes with the Cochran-Mantel-Haenszel test.
OR Mantel-Haenszel = Σ(ad/n) / Σ(bc/n)
Scary as it looks, all it’s really doing is multiplying the concordant cells (a, d) of each E* stratum, dividing by that stratum’s n (total), and adding the results together. This becomes the numerator. Then, we do the same thing with the discordant cells (b, c) of each E* stratum and make that our denominator. Thus, it looks like this:
OR Mantel-Haenszel = [(high nitrate ad/n) + (low nitrate ad/n)] / [(high nitrate bc/n) + (low nitrate bc/n)]
= [(44 x 10)/95 + (18 x 85)/165] ÷ [(15 x 26)/95 + (20 x 42)/165]
= (440/95 + 1530/165) ÷ (390/95 + 840/165)
= (4.6 + 9.3) ÷ (4.1 + 5.1)
= 13.9 ÷ 9.2
= 1.5
(Note: Σ is the symbol that means “sum of”)
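The hand calculation above can be sketched as a short function that sums ad/n over the strata for the numerator and bc/n for the denominator. The function name `mantel_haenszel_or` and the tuple layout are illustrative choices for this sketch, not part of any particular statistics package.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled OR.

    strata: iterable of (a, b, c, d) tuples, one 2x2 table per level of E*,
    with a=E+D+, b=E+D-, c=E-D+, d=E-D-.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d          # stratum total
        numerator += a * d / n     # concordant cells
        denominator += b * c / n   # discordant cells
    return numerator / denominator


nitrate_strata = [
    (44, 15, 26, 10),   # high nitrate diet (E*+), n = 95
    (18, 20, 42, 85),   # low nitrate diet (E*-), n = 165
]

adjusted_or = mantel_haenszel_or(nitrate_strata)
print(round(adjusted_or, 1))  # 1.5
```

Because the function loops over whatever strata it is given, the same sketch would work if E* had three or more levels.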
Step 4: Compare Crude OR to Adjusted OR
OR crude = 2.5
OR adjusted = 1.5
28
So…. Is there meaningful confounding? Did adjusting for dietary nitrate level “clean up” the data (remove the effects of E*) so we have a better knowledge of the relationship between E and D?
General Rule # 3
Confounding exists when:
1. Crude E OR ≥ 110% of adjusted OR
or
2. Crude E OR ≤ 90% of adjusted OR
To answer this, we go to general rule 3, which evaluates the magnitude of confounding by observing the degree of discrepancy between the crude and adjusted ORs. Generally, when the crude estimate changes by at least 10%, meaningful confounding exists.
So, we look at the difference between the two estimates to determine the degree of discrepancy. If there’s at least a 10% difference in the two estimates, whether it’s higher or lower, we conclude that there’s meaningful confounding. If it’s less than 10%, the confounding is not meaningful, meaning that the presence of E* does not obscure the E-D truth, and we’ve basically done all of this work for nothing. Haha. Actually, it’s not for nothing. It’s for E-D truth!!
29
Step 5: State Conclusion
Dietary nitrate level is a meaningful confounder in the relationship between h pylori infection and stomach cancer.
So in order to state our conclusion, we need to determine if the adjusted OR is at least 10% different from the crude OR. We can do that in a couple of ways. First, we could calculate the upper and lower cut-off points for meaningful confounding using ± 10% of the crude:
Upper Limit: 2.5 x 1.10 = 2.8
Lower Limit: 2.5 x .90 = 2.3
Then, we’d look to see if the adjusted OR is either equal to or more than the upper limit, or equal to or less than the lower limit. Here, the adjusted OR (1.5) is below the lower limit (2.3).
Or, you could just do it the simple way through division: OR crude / OR adjusted = 2.5/1.5 = 1.7, a 70% difference between crude and adjusted (null = 1, so it’s whatever is above or below the null). Either way, the difference is well over 10%.
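Both ways of checking the 10% rule can be sketched in a few lines. This follows the cut-off approach described above (limits at ± 10% of the crude OR); the helper name `meaningful_confounding` and its `threshold` parameter are assumptions made for this sketch.

```python
def meaningful_confounding(crude, adjusted, threshold=0.10):
    """True if the adjusted estimate falls at or outside crude +/- threshold."""
    lower_limit = crude * (1 - threshold)
    upper_limit = crude * (1 + threshold)
    return adjusted <= lower_limit or adjusted >= upper_limit


crude_or = 2.5
adjusted_or = 1.5

# Way 1: compare the adjusted OR against the cut-off points
is_meaningful = meaningful_confounding(crude_or, adjusted_or)

# Way 2: divide crude by adjusted and see how far the ratio is from the null (1)
ratio = crude_or / adjusted_or

print(is_meaningful)    # True
print(round(ratio, 1))  # 1.7
```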
Now, just so you’ll know, there are ways of controlling (adjusting) for confounding other than M-H. I won’t force you to learn multivariate analysis though. You’re welcome. =)
That’s it for this lesson.
30