Epidemiology quiz
Reliability
Jennifer Deal, PhD Johns Hopkins University
The material in this video is subject to the copyright of the owners of the material and is being provided for educational purposes under rules of fair use for registered students in this course only. No additional copies of the copyrighted work may be made or distributed.
Defining Reliability
Remember Validity?
► Validity: the ability of a test to distinguish between individuals who have a disease and individuals who do not have the disease
1. Sensitivity
2. Specificity
What Is Reliability?
► Reliability: the degree to which the results from a test can be replicated
Validity vs. Reliability—1
► Validity
● Are the test results correct?
● Analogous to accuracy
► Reliability
● Are the test results the same when the same test is repeated in the same individual under similar conditions?
● Analogous to precision
Validity vs. Reliability—2
► Valid test results: Are the results correct?
► Reliable test results: Are the results repeatable?
Source: Gordis. (5th ed.).
Valid and Reliable
Factors Affecting Reliability
► The same test in the same person may not always yield the same result
► Why? Because of variation related to…
● The patient
Intra-subject = within the same patient (Do physiologic or environmental factors within the patient affect the test results?)
● Laboratory procedures and persons reading the test
Intra-observer = within the same observer (Does the same observer interpret the test result in the same way every time?)
Inter-observer = between two or more observers (Do two different observers interpret the same test result in the same way?)
Intrasubject (within Subject) Variation: Blood Pressure Changes in a 73-Year-Old Woman over a 24-Hour Period
Source: Harvard Health Publishing. (2009). Experts call for home blood pressure monitoring. Accessed December 17, 2018.
Intrasubject (within Subject) Variation: Cortisol Release over 24 Hours
Source: Warde Medical Laboratory. Accessed December 17, 2018.
Interobserver (between Observers) Variation: Reliability of Lumbar Spine MRI Findings
Figure panels: similar findings across observers; findings differed by observer
Estimating Interobserver Reliability
Measuring Interobserver Reliability for Categorical Outcomes
► Percent agreement
► Percent positive agreement
► Kappa statistic
Percent Agreement—1
Percent Agreement—2
Percent Agreement—3
► $\text{Percent agreement} = \frac{A + F + K + P}{\text{Total readings}} \times 100\%$
Percent Agreement—4
► $\text{Percent agreement} = \frac{a + d}{a + b + c + d} \times 100\%$
Percent Agreement—5
► $\text{Percent agreement} = \frac{a + d}{a + b + c + d} \times 100\%$
► In this case, a high percent agreement is due solely to observer agreement about who is negative
Percent Agreement—6
► Percent positive agreement (percent agreement when at least one test is positive) = $\frac{a}{a + b + c} \times 100\%$
► Tests that both observers classified as negative are removed from the calculation (both the numerator and the denominator!)
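As a minimal sketch of the two formulas above, assuming the usual 2x2 layout (a = both observers positive, b and c = the observers disagree, d = both observers negative). The counts are hypothetical and do not come from the lecture; they are chosen so that most tests are negative for both observers.

```python
def percent_agreement(a, b, c, d):
    """Overall percent agreement: (a + d) / (a + b + c + d) x 100."""
    return (a + d) / (a + b + c + d) * 100


def percent_positive_agreement(a, b, c):
    """Agreement when at least one reading is positive: a / (a + b + c) x 100."""
    return a / (a + b + c) * 100


# Hypothetical counts (not from the lecture): 90 of 100 tests are
# negative for both observers.
a, b, c, d = 2, 4, 4, 90
print(round(percent_agreement(a, b, c, d), 1))        # 92.0
print(round(percent_positive_agreement(a, b, c), 1))  # 20.0
```

With these made-up counts, overall agreement looks high only because of the 90 tests both observers called negative. Once those are dropped from the numerator and the denominator, agreement falls sharply.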
Kappa Statistic—1
► Some agreement happens by chance…
Source: BBC News. Accessed December 17, 2018, at https://www.bbc.com/news/technology-15060310
Kappa Statistic—2
► Sometimes observers will agree solely by chance
► The kappa statistic allows us to calculate the level of agreement independent of chance
► What is the agreement between observers beyond what would be expected by chance alone?
First Step to Calculate the Kappa Statistic
► Numerator:
► How much better is the observed agreement than the agreement expected by chance alone?
► $\text{Percent agreement observed} - \text{Percent agreement expected by chance alone}$
Second Step to Calculate the Kappa Statistic
► Denominator:
► What is the maximum the observers could have improved upon the agreement expected by chance alone?
► $100\% - \text{Percent agreement expected by chance alone}$
Third Step to Calculate the Kappa Statistic
► $\frac{\text{Numerator}}{\text{Denominator}}$:
► Of the maximum possible improvement in agreement beyond that expected by chance alone, what proportion has in fact occurred?
► $\kappa = \frac{\text{Percent agreement observed} - \text{Percent agreement expected by chance alone}}{100\% - \text{Percent agreement expected by chance alone}}$
How to Calculate the Percent Observed Agreement (a Reminder)
► $\text{Percent agreement observed} = \frac{41 + 27}{75} \times 100 = 90.7\%$
How to Calculate the Percent Agreement Expected by Chance Alone—1
How to Calculate the Percent Agreement Expected by Chance Alone—2
How to Calculate the Percent Agreement Expected by Chance Alone—3
How to Calculate the Percent Agreement Expected by Chance Alone—4
How to Calculate the Percent Agreement Expected by Chance Alone—5
► Observer A classified 45 tests as YES
► Observer B classified 58.7% of tests as YES
► If Observer B were to classify the 45 tests that Observer A classified as YES, we expect that s/he would classify 58.7% of those 45 tests as YES
► 45 * 0.587 = 26.4 tests would be classified as YES
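The same reasoning applies to the tests Observer A classified as NO. The sketch below works through both halves using the slide's figures (75 tests, 45 YES for Observer A, 58.7% YES for Observer B). The function and variable names are illustrative and not from the lecture.

```python
def expected_agreement_by_chance(n_total, a_yes, b_yes_fraction):
    """Return (expected YES agreements, expected NO agreements, percent).

    Applies Observer B's overall YES/NO proportions to the tests that
    Observer A called YES and NO, respectively.
    """
    a_no = n_total - a_yes                 # tests Observer A called NO
    b_no_fraction = 1 - b_yes_fraction     # Observer B's NO proportion
    expected_yes = a_yes * b_yes_fraction  # both YES by chance
    expected_no = a_no * b_no_fraction     # both NO by chance
    percent = (expected_yes + expected_no) / n_total * 100
    return expected_yes, expected_no, percent


yes, no, pct = expected_agreement_by_chance(75, 45, 0.587)
print(round(yes, 1), round(no, 1), round(pct, 1))  # 26.4 12.4 51.7
```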
How to Calculate the Percent Agreement Expected by Chance Alone—6
How to Calculate the Percent Agreement Expected by Chance Alone—7
► $\text{Percent agreement expected by chance alone} = \frac{26.4 + 12.4}{75} \times 100 = 51.7\%$
And Now, the Kappa
► Percent agreement observed = 90.7%
► Percent agreement expected by chance alone = 51.7%
► $\kappa = \frac{\text{Percent agreement observed} - \text{Percent agreement expected by chance alone}}{100\% - \text{Percent agreement expected by chance alone}}$
► $\kappa = \frac{90.7\% - 51.7\%}{100\% - 51.7\%} = \frac{39.0\%}{48.3\%} = 0.81$
Interpretation of Kappa
Value of kappa    Strength of agreement
0.0               No agreement better than chance alone
<0.20             Poor
0.21–0.40         Fair
0.41–0.60         Moderate
0.61–0.80         Good
0.81–1.00         Very good
Source: Altman. (1991). Practical statistics for medical research. Chapman and Hall: London.
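The kappa calculation and the table above can be sketched in a few lines of Python. The function names are illustrative, and the handling of band edges is an assumption: each band is taken to include its upper bound, and values at or below 0 are reported as no better than chance.

```python
def kappa(observed_pct, expected_pct):
    """Kappa = (observed - expected) / (100 - expected), inputs in percent."""
    return (observed_pct - expected_pct) / (100 - expected_pct)


def interpret_kappa(k):
    """Map a kappa value to the Altman (1991) labels.

    Assumption: each band includes its upper bound; k <= 0 is treated
    as no agreement better than chance alone.
    """
    if k <= 0:
        return "No agreement better than chance alone"
    bands = [(0.20, "Poor"), (0.40, "Fair"), (0.60, "Moderate"), (0.80, "Good")]
    for upper, label in bands:
        if k <= upper:
            return label
    return "Very good"


k = round(kappa(90.7, 51.7), 2)
print(k, interpret_kappa(k))  # 0.81 Very good
```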
Reliability for Continuous Variables
Reliability for Categorical Variables
► The measures of reliability we have discussed (percent agreement, kappa) are for categorical variables
► Examples
● Two categories: having hypertension vs. not having hypertension
● More than two categories: normotensive, hypertension stage 1, hypertension stage 2
Reliability for Continuous Variables
► But measures of reliability can also be calculated for continuous variables
● Example: systolic blood pressure (mmHg)
► Biostatistics courses →
► Reliability can be assessed visually
● Scatterplots
● Bland-Altman plots
► Reliability can be quantified by calculating
● Correlation coefficient
● Coefficient of variation
● And using regression
Scatterplots to Assess Reliability
Summary of Reliability Measures—Categorical Variables
► Compare repeated measures
► 2x2 tables
► Percent agreement, percent positive agreement, kappa statistic
Summary of Reliability Measures—Continuous Variables
► Compare repeated measures
► Scatterplots, Bland-Altman plots
► Correlation coefficient, coefficient of variation, regression
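The slides defer the details of these measures to biostatistics courses. As a rough sketch only, the measures named above could be computed as follows. The blood pressure readings are hypothetical, the function names are illustrative, and the Bland-Altman limits use the common convention of mean difference ± 1.96 standard deviations of the differences, which is not stated in the lecture.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between paired measurements."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bland_altman(x, y):
    """Return (bias, lower limit, upper limit) of agreement."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100


# Hypothetical systolic blood pressure (mmHg): two readings per person.
first = [120, 132, 118, 145, 127]
second = [122, 130, 121, 143, 129]
print(pearson_r(first, second))
print(bland_altman(first, second))

# Hypothetical repeated readings on a single person.
print(coefficient_of_variation([120, 124, 118, 122]))
```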
When we discussed validity, we discussed how you could categorize continuous variables, and how the choice of that cutpoint affects sensitivity and specificity. …
… But is there a way to assess validity of a continuous variable?
Scatterplots to Assess Validity (and Reliability)
Scatterplots to Assess Validity and Reliability—1
Scatterplots to Assess Validity and Reliability—2
Scatterplots to Assess Validity and Reliability—3
Scatterplots to Assess Validity and Reliability—4
Scatterplots to Assess Validity and Reliability—5
Scatterplots to Assess Validity and Reliability—6
We have discussed validity and reliability in the context of screening or diagnostic tests, but these concepts are more broadly applicable to other measurement tools that may be used in a research study.
Quality Assurance and Quality Control
► Quality assurance (QA)
● Any method, procedure, or approach for collecting, processing, or analyzing data aimed at maintaining or improving their reliability or validity
● E.g., development and pilot testing of study protocol
► Quality control (QC)
● An aggregate of sampling and testing procedures based on statistical theory and analysis designed to ensure adequate quality in relation to a finished product
● E.g., site visit to observe how study staff administer the protocol
Source: Clinical trials dictionary: Terminology and usage recommendations.
Minimizing Error (Maximizing Validity and Reliability) in a Research Study
► Rigorous and detailed protocol (documented!) using standardized procedures and measurements
► Use reliable and accurate instruments and methods
► Rigorously train and re-train staff
► Conduct measurements consistently and correctly
► QA and QC to identify and address problems
► Incorporate reliability and validity studies into the protocol
Lessons Learned
► Reliability is the degree to which the results from a test can be replicated
► Reliability is analogous to precision (validity is analogous to accuracy)
► Factors affecting reliability
● Intrasubject
● Intra-observer
● Interobserver
► Calculating reliability for categorical variables
● Percent agreement
● Percent positive agreement
● Kappa statistic (accounts for chance)