Reliability

Jennifer Deal, PhD, Johns Hopkins University

The material in this video is subject to the copyright of the owners of the material and is being provided for educational purposes under rules of fair use for registered students in this course only. No additional copies of the copyrighted work may be made or distributed.

Defining Reliability

Remember Validity?

► Validity: the ability of a test to distinguish between individuals who have a disease and individuals who do not have the disease
1. Sensitivity
2. Specificity

What Is Reliability?

► Reliability: the degree to which the results from a test can be replicated

Validity vs. Reliability—1

► Validity
● Are the test results correct?
► Reliability
● Are the test results the same when the same test is repeated in the same individual under similar conditions?

Validity vs. Reliability—2

► Validity
● Are the test results correct?
● Analogous to accuracy
► Reliability
● Are the test results the same when the same test is repeated in the same individual under similar conditions?
● Analogous to precision

Validity vs. Reliability—3

► Valid test results: Are the results correct?
► Reliable test results: Are the results repeatable?

Source: Gordis. (5th ed.).

Valid and Reliable

Factors Affecting Reliability—1

► The same test in the same person may not always yield the same result

Factors Affecting Reliability—2

► The same test in the same person may not always yield the same result
► Why? Because of variation related to…
● The patient
● Laboratory procedures
● Persons reading the test

Factors Affecting Reliability—3

► The same test in the same person may not always yield the same result
► Why? Because of variation related to…
● The patient
  ▪ Intra-subject = within the same patient: Do factors (physiologic, environmental) within the patient affect the test results?

Factors Affecting Reliability—4

► The same test in the same person may not always yield the same result
► Why? Because of variation related to…
● The patient
  ▪ Intra-subject = within the same patient: Do factors (physiologic, environmental) within the patient affect the test results?
● Laboratory procedures and persons reading the test
  ▪ Intra-observer = within the same observer (Does the same observer interpret the test result in the same way every time?)
  ▪ Inter-observer = between two or more observers (Do two different observers interpret the same test result in the same way?)

Intrasubject (within Subject) Variation: Blood Pressure Changes in a 73-Year-Old Woman over a 24-Hour Period

Source: Harvard Health Publishing. (2009). Experts call for home blood pressure monitoring. Accessed December 17, 2018.

Intrasubject (within Subject) Variation: Cortisol Release over 24 Hours

Source: Warde Medical Laboratory. Accessed December 17, 2018.

Interobserver (between Observers) Variation: Reliability of Lumbar Spine MRI Findings

Figure panels: similar findings across observers; findings differed by observer

Estimating Interobserver Reliability

Measuring Interobserver Reliability for Categorical Outcomes

► Percent agreement

► Percent positive agreement

► Kappa statistic

Percent Agreement—1

Percent Agreement—2

Percent Agreement—3

► $\text{Percent agreement} = \dfrac{A + F + K + P}{\text{Total readings}} \times 100\%$

Percent Agreement—4

► $\text{Percent agreement} = \dfrac{a + d}{a + b + c + d}$

Percent Agreement—5

► $\text{Percent agreement} = \dfrac{a + d}{a + b + c + d}$

► In this case, a high percent agreement is due solely to observer agreement about who is negative

Percent Agreement—6

► Percent agreement when at least one test is positive = $\dfrac{a}{a + b + c}$

► Tests that both observers classified as negative are removed from the calculation (both the numerator and the denominator!)
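
As a minimal sketch of the two formulas above, the following Python computes overall percent agreement and percent positive agreement from the four cells of a 2x2 table. The cell counts are hypothetical (they do not come from the slides) and were chosen so that most of the agreement comes from tests both observers called negative. Both results are scaled to a percentage.

```python
def percent_agreement(a, b, c, d):
    """Overall percent agreement: (a + d) / (a + b + c + d), as a percentage."""
    total = a + b + c + d
    if total == 0:
        raise ValueError("table is empty")
    return 100.0 * (a + d) / total


def percent_positive_agreement(a, b, c):
    """Agreement when at least one reading is positive: a / (a + b + c), as a percentage.

    Cell d (both observers negative) is left out of the numerator and the denominator.
    """
    denom = a + b + c
    if denom == 0:
        raise ValueError("no reading was positive")
    return 100.0 * a / denom


# Hypothetical counts, for illustration only:
# a = both positive, b and c = the observers disagree, d = both negative.
a, b, c, d = 2, 4, 4, 90

overall = percent_agreement(a, b, c, d)
positive = percent_positive_agreement(a, b, c)
print(f"Percent agreement: {overall:.1f}%")
print(f"Percent positive agreement: {positive:.1f}%")
```

With these made-up counts the overall figure is high while the positive-only figure is low, which is the situation the slide warns about.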

Kappa Statistic—1

► Some agreement happens by chance…

Source: BBC News. Accessed December 17, 2018, at https://www.bbc.com/news/technology-15060310

Kappa Statistic—2

► Sometimes observers will agree solely by chance

► The kappa statistic allows us to calculate the level of agreement independent of chance

► What is the agreement between observers beyond what would be expected by chance alone?

First Step to Calculate the Kappa Statistic

► Numerator:

► How much better is the observed agreement than the agreement expected by chance alone?
► $\text{Percent agreement observed} - \text{Percent agreement expected by chance alone}$

Second Step to Calculate the Kappa Statistic

► Denominator:

► What is the maximum the observers could have improved upon the agreement expected by chance alone?
► $100\% - \text{Percent agreement expected by chance alone}$

Third Step to Calculate the Kappa Statistic

► $\dfrac{\text{Numerator}}{\text{Denominator}}$:
► Of the maximum possible improvement in agreement expected beyond chance alone, what proportion has in fact occurred?
► $\kappa = \dfrac{\text{Percent agreement observed} - \text{Percent agreement expected by chance alone}}{100\% - \text{Percent agreement expected by chance alone}}$

How to Calculate the Percent Observed Agreement (a Reminder)

► $\text{Percent agreement observed} = \dfrac{41 + 27}{75} \times 100 = 90.7\%$

How to Calculate the Percent Agreement Expected by Chance Alone—1

How to Calculate the Percent Agreement Expected by Chance Alone— 2

How to Calculate the Percent Agreement Expected by Chance Alone— 3

How to Calculate the Percent Agreement Expected by Chance Alone— 4

How to Calculate the Percent Agreement Expected by Chance Alone— 5

► Observer A classified 45 tests as YES

► Observer B classified 58.7% of tests as YES

► If Observer B were to classify the 45 tests that Observer A classified as YES, we expect that s/he would classify 58.7% of those 45 tests as YES

► 45 × 0.587 = 26.4 tests would be expected to be classified as YES by both observers by chance alone

How to Calculate the Percent Agreement Expected by Chance Alone— 6

How to Calculate the Percent Agreement Expected by Chance Alone— 7

► $\text{Percent agreement expected by chance alone} = \dfrac{26.4 + 12.4}{75} \times 100 = 51.7\%$
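
The chance-expected agreement can be sketched in Python from each observer's YES total. The figures 75 tests and 45 YES for Observer A are from the slides. Observer B's count of 44 YES is an assumption, inferred from the stated 58.7% of 75 tests. The function and variable names are illustrative only.

```python
def expected_agreement(a_yes, b_yes, n):
    """Return (expected YES-YES count, expected NO-NO count,
    percent agreement expected by chance alone)."""
    if n <= 0:
        raise ValueError("n must be positive")
    a_no = n - a_yes
    b_no = n - b_yes
    # Apply Observer B's YES proportion to Observer A's YES tests, and
    # Observer B's NO proportion to Observer A's NO tests.
    exp_yes_yes = a_yes * b_yes / n
    exp_no_no = a_no * b_no / n
    exp_pct = 100.0 * (exp_yes_yes + exp_no_no) / n
    return exp_yes_yes, exp_no_no, exp_pct


N_TESTS = 75  # from the slides
A_YES = 45    # from the slides
B_YES = 44    # assumption: inferred from 58.7% of 75 tests

exp_yes_yes, exp_no_no, exp_pct = expected_agreement(A_YES, B_YES, N_TESTS)
print(f"Expected YES-YES: {exp_yes_yes:.1f}")
print(f"Expected NO-NO: {exp_no_no:.1f}")
print(f"Percent agreement expected by chance alone: {exp_pct:.1f}%")
```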

And Now, the Kappa

► Percent agreement observed = 90.7%

► Percent agreement expected by chance alone = 51.7%

► $\kappa = \dfrac{\text{Percent agreement observed} - \text{Percent agreement expected by chance alone}}{100\% - \text{Percent agreement expected by chance alone}}$
► $\kappa = \dfrac{90.7\% - 51.7\%}{100\% - 51.7\%} = \dfrac{39.0\%}{48.3\%} = 0.81$
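
The same arithmetic as a short Python sketch. The function name is illustrative; the inputs are the two percentages from the slide.

```python
def kappa(observed_pct, expected_pct):
    """Kappa = (observed - expected) / (100 - expected), inputs in percent."""
    if expected_pct >= 100.0:
        raise ValueError("expected agreement must be below 100%")
    return (observed_pct - expected_pct) / (100.0 - expected_pct)


k = kappa(90.7, 51.7)
print(f"kappa = {k:.2f}")
```

The guard on the denominator reflects that kappa is undefined when chance alone would already give 100% agreement.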

Interpretation of Kappa

Value of kappa    Strength of agreement
0.0               No agreement better than chance alone
<0.20             Poor
0.21–0.40         Fair
0.41–0.60         Moderate
0.61–0.80         Good
0.81–1.00         Very good

Source: Altman. (1991). Practical statistics for medical research. Chapman and Hall: London.
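
A minimal lookup for the bands above might look like the sketch below. Two choices are assumptions rather than part of the table: each upper bound is treated as inclusive (so the gaps between printed bands, such as 0.20 to 0.21, fall into the lower band), and any value at or below zero is reported as no better than chance.

```python
def interpret_kappa(k):
    """Map a kappa value to the strength-of-agreement labels in the table."""
    if k > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if k <= 0.0:
        # Assumption: negative values are grouped with 0.0.
        return "No agreement better than chance alone"
    # Assumption: upper bounds are inclusive.
    bands = [
        (0.20, "Poor"),
        (0.40, "Fair"),
        (0.60, "Moderate"),
        (0.80, "Good"),
        (1.00, "Very good"),
    ]
    for upper, label in bands:
        if k <= upper:
            return label
    return "Very good"


print(interpret_kappa(0.81))
```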

Reliability for Continuous Variables

Reliability for Categorical Variables

► The measures of reliability we have discussed (percent agreement, kappa) are for categorical variables

► Examples
● Two categories: having hypertension vs. not having hypertension
● More than two categories: normotensive, hypertension stage 1, hypertension stage 2

Reliability for Continuous Variables—1

► But measures of reliability can also be calculated for continuous variables
● Example: systolic blood pressure (mmHg)

Reliability for Continuous Variables—2

► But measures of reliability can also be calculated for continuous variables
● Example: systolic blood pressure (mmHg)
► Reliability can be assessed visually
● Scatterplots
● Bland-Altman plots

Reliability for Continuous Variables—3

► But measures of reliability can also be calculated for continuous variables
● Example: systolic blood pressure (mmHg)
► Reliability can be assessed visually
● Scatterplots
● Bland-Altman plots
► Reliability can be quantified by calculating
● Correlation coefficient
● Coefficient of variation
● And using regression
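
The slides name these measures without defining them, so the following is only a sketch using the conventional textbook definitions: the Pearson correlation coefficient, the Bland-Altman bias with limits of agreement taken as mean difference ± 1.96 standard deviations, and the coefficient of variation as standard deviation divided by mean. All blood pressure readings are hypothetical.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between paired measurements."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(x, y):
    """Return (bias, lower limit, upper limit) for paired differences x - y."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)


# Hypothetical systolic blood pressure (mmHg), two readings per person.
first = [120, 134, 118, 142, 128, 150]
second = [122, 130, 121, 140, 131, 147]

r = pearson_r(first, second)
bias, lower, upper = bland_altman(first, second)

# Hypothetical repeated readings in a single person.
cv = coefficient_of_variation([120, 122, 119, 121])

print(f"r = {r:.3f}")
print(f"bias = {bias:.2f} mmHg, limits of agreement = ({lower:.2f}, {upper:.2f})")
print(f"CV = {cv:.2f}%")
```

A high correlation alone does not rule out a systematic offset between the two readings, which is one reason the bias from the Bland-Altman calculation is worth reporting alongside it.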

Reliability for Continuous Variables—4

► Biostatistics courses →
► Reliability can be assessed visually
● Scatterplots
● Bland-Altman plots
► Reliability can be quantified by calculating
● Correlation coefficient
● Coefficient of variation
● And using regression

Scatterplots to Assess Reliability

Summary of Reliability Measures—Categorical Variables

► Compare repeated measures

► 2x2 tables

► Percent agreement, percent positive agreement, kappa statistic

Summary of Reliability Measures—Continuous Variables

► Compare repeated measures

► Scatterplots, Bland-Altman plots

► Correlation coefficient, coefficient of variation, regression

When we discussed validity, we discussed how you could categorize continuous variables, and how the choice of that cutpoint affects sensitivity and specificity. …

… But is there a way to assess validity of a continuous variable?

Scatterplots to Assess Validity (and Reliability)

Scatterplots to Assess Validity and Reliability—1

Scatterplots to Assess Validity and Reliability—2

Scatterplots to Assess Validity and Reliability—3

Scatterplots to Assess Validity and Reliability—4

Scatterplots to Assess Validity and Reliability—5

Scatterplots to Assess Validity and Reliability—6

We have discussed validity and reliability in the context of screening or diagnostic tests, but these concepts are more broadly applicable to other measurement tools that may be used in a research study

Quality Assurance and Quality Control—1

► Quality assurance (QA)
● Any method, procedure, or approach for collecting, processing, or analyzing data aimed at maintaining or improving their reliability or validity
► Quality control (QC)
● An aggregate of sampling and testing procedures based on statistical theory and analysis designed to ensure adequate quality in relation to a finished product

Source: Clinical trials dictionary: Terminology and usage recommendations.

Quality Assurance and Quality Control—2

► Quality assurance (QA)
● Any method, procedure, or approach for collecting, processing, or analyzing data aimed at maintaining or improving their reliability or validity
● E.g., development and pilot testing of study protocol
► Quality control (QC)
● An aggregate of sampling and testing procedures based on statistical theory and analysis designed to ensure adequate quality in relation to a finished product
● E.g., site visit to observe how study staff administer the protocol

Source: Clinical trials dictionary: Terminology and usage recommendations.

Minimizing Error (Maximizing Validity and Reliability) in a Research Study

► Rigorous and detailed protocol (documented!) using standardized procedures and measurements

► Use reliable and accurate instruments and methods

► Rigorously train and re-train staff

► Conduct measurements consistently and correctly

► QA and QC to identify and address problems

► Incorporate reliability and validity studies into the protocol

Lessons Learned

► Reliability is the degree to which the results from a test can be replicated

► Reliability is analogous to precision (validity is analogous to accuracy)

► Factors affecting reliability
● Intrasubject
● Intra-observer
● Interobserver

► Calculating reliability for categorical variables
● Percent agreement
● Percent positive agreement
● Kappa statistic (accounts for chance)
