Applied Biostatistics Essay


COMMUNICATIONS IN STATISTICS: CASE STUDIES, DATA ANALYSIS AND APPLICATIONS 2020, VOL. 6, NO. 2, 109–122 https://doi.org/10.1080/23737484.2020.1719559

A non-inferiority test and sample size determination for comparing two predictive values of diagnostic tests

Kanae Takahashi^a and Kouji Yamamoto^b

a Department of Medical Statistics, Osaka City University Graduate School of Medicine, Osaka, Japan; b Department of Biostatistics, School of Medicine, Yokohama City University, Yokohama, Japan

ABSTRACT Positive and negative predictive values describe the clinical performance of the diagnostic test when it is applied to a cohort of individuals. Testing non-inferiority of predictive values may be adequate in some cases due to the simplicity of the examination and the invasiveness of the test, but no non-inferiority test for predictive values has been proposed yet. In this study, we propose a non-inferiority test that investigates the non-inferiority of predictive values in paired designs, and formulae for the calculation of sample size to execute the non-inferiority test of two predictive values. Simulation studies showed that the proposed non-inferiority test is useful for comparing two predictive values in clinical trials.

ARTICLE HISTORY Received 21 May 2019 Accepted 19 January 2020

KEYWORDS Negative predictive value; paired design; positive predictive value; score test

1. Introduction

Diagnostic tests are important for the early detection and treatment of disease in medicine. Sensitivity (SE) and specificity (SP) are commonly used for quantifying the diagnostic ability of the test. The SE is the probability that a diseased individual has a positive result, and the SP is the probability that a non-diseased individual has a negative result. On the other hand, the positive predictive value (PPV) and negative predictive value (NPV) describe how well a test predicts abnormality, and are also used to quantify the diagnostic ability of the test. The PPV is the probability of disease when the diagnostic test result is positive, and the NPV is the probability of no disease when the diagnostic test result is negative. Although the predictive values observed in one study do not apply universally because these values depend on the prevalence of the condition being diagnosed, the predictive values inform patients about the probability that the diagnostic test will give the correct diagnosis. Compared to SE and SP, the predictive values are more patient focused and are often more relevant in patient cases (Altman and Bland 1994; Guggenmoos-Holzmann and van Houwelingen 2000; Pepe 2004; Zhou et al. 2011).

CONTACT Kanae Takahashi [email protected] cu.ac.jp Department of Medical Statistics, Osaka City University Graduate School of Medicine, 1-4-3 Asahi-machi, Abeno-ku, Osaka 545-8585, Japan.

Supplemental data for this article can be accessed on the publisher’s website. © 2020 Taylor & Francis Group, LLC


There are several methods to test the equality of predictive values in paired designs (Leisenring, Alonzo, and Pepe 2000; Moskowitz and Pepe 2006; Wang, Davis, and Soong 2006; Nofuentes, del Castillo, and Alonso 2012; Kosinski 2013). Some researchers utilize the multivariate central limit theorem and the delta method, and others use regression frameworks. These methods were developed for superiority trials. However, testing of non-inferiority may be adequate in some cases. For example, when a new diagnostic test is easier to implement and less invasive than the existing test, it may not be necessary to test the superiority of predictive values. There are some non-inferiority studies for SE and SP (Dupuy et al. 2007; Cuevas et al. 2011; Bisser et al. 2016). However, no non-inferiority test for predictive values has been proposed yet, and there are almost no non-inferiority studies for PPV and NPV. In this article, we propose a non-inferiority test for conducting a clinical trial that investigates the non-inferiority of predictive values in paired designs. In addition, we provide formulae for the calculation of sample size to achieve statistical significance for testing the non-inferiority of two predictive values.

We organize the article as follows. We propose a non-inferiority test in Sec. 2, and sample size formulae in Sec. 3. We conduct simulation studies to evaluate the performance of the proposed test and sample size formulae in Sec. 4. We present an example in Sec. 5, and conclude with a discussion in Sec. 6.

2. Proposed non-inferiority test

This section introduces notations and presents a non-inferiority test for comparing predictive values of two diagnostic tests in a paired design. Table 1 shows general notations for cell probabilities. Using these notations, we can represent the PPV for test 1 (PPV1) and for test 2 (PPV2) as below.

$$
\mathrm{PPV}_1 = \frac{p_{1\cdot 1}}{p_{1\cdot 1} + p_{1\cdot 2}} = \frac{p_{1\cdot 1}}{p_{1\cdot\cdot}}, \qquad
\mathrm{PPV}_2 = \frac{p_{\cdot 11}}{p_{\cdot 11} + p_{\cdot 12}} = \frac{p_{\cdot 11}}{p_{\cdot 1\cdot}}.
$$

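Similarly, we can express the NPV for test 1 (NPV1) and for test 2 (NPV2) as below.

$$
\mathrm{NPV}_1 = \frac{p_{2\cdot 2}}{p_{2\cdot 1} + p_{2\cdot 2}} = \frac{p_{2\cdot 2}}{p_{2\cdot\cdot}}, \qquad
\mathrm{NPV}_2 = \frac{p_{\cdot 22}}{p_{\cdot 21} + p_{\cdot 22}} = \frac{p_{\cdot 22}}{p_{\cdot 2\cdot}}.
$$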
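As a concrete illustration of these definitions, the following minimal Python sketch computes the four predictive values from a 2 × 2 × 2 table of counts. The nested-list layout `n[i][j][k]` and the function name `predictive_values` are assumptions made for this illustration only; the counts are those of the CAD example data (Table 6), with test 1 taken as the clinical history and test 2 as EST.

```python
def predictive_values(n):
    """Return (PPV1, PPV2, NPV1, NPV2) from counts n[i][j][k].

    Index convention (an assumption of this sketch, mirroring Table 1):
    i = test 1 result, j = test 2 result (0 = positive, 1 = negative),
    k = disease status (0 = diseased, 1 = non-diseased).
    """
    def total(i=None, j=None, k=None):
        # Sum over every index left as None (the "dot" in the notation).
        return sum(
            n[a][b][c]
            for a in range(2) for b in range(2) for c in range(2)
            if (i is None or a == i)
            and (j is None or b == j)
            and (k is None or c == k)
        )

    ppv1 = total(i=0, k=0) / total(i=0)   # n_{1.1} / n_{1..}
    ppv2 = total(j=0, k=0) / total(j=0)   # n_{.11} / n_{.1.}
    npv1 = total(i=1, k=1) / total(i=1)   # n_{2.2} / n_{2..}
    npv2 = total(j=1, k=1) / total(j=1)   # n_{.22} / n_{.2.}
    return ppv1, ppv2, npv1, npv2


# Counts from Table 6; innermost pairs are (diseased, non-diseased).
cad_counts = [
    [[786, 69], [183, 176]],   # clinical history +: EST +, EST -
    [[29, 46], [25, 151]],     # clinical history -: EST +, EST -
]
ppv_ch, ppv_est, npv_ch, npv_est = predictive_values(cad_counts)
print(round(ppv_ch, 3), round(ppv_est, 3))   # 0.798 0.876
```

The two printed values agree with the PPV estimates quoted in the example of Sec. 5.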

Without loss of generality, we can set test 1 as a new screening test and test 2 as an existing test. The null hypotheses of the non-inferiority test are H0: PPV1 = PPV2 − Δ1 and H0: NPV1 = NPV2 − Δ2, and the alternative hypotheses are H1: PPV1 > PPV2 − Δ1 and H1: NPV1 > NPV2 − Δ2, where Δ1 (> 0) and Δ2 (> 0) are the non-inferiority margins.

Table 1. General notations.

              Diseased subjects               Non-diseased subjects
              Test 2 +   Test 2 −   Total     Test 2 +   Test 2 −   Total
Test 1 +      p111       p121       p1•1      p112       p122       p1•2
Test 1 −      p211       p221       p2•1      p212       p222       p2•2
Total         p•11       p•21       p••1      p•12       p•22       p••2

To test the difference between two PPVs, Wang, Davis, and Soong (2006), Nofuentes, del Castillo, and Alonso (2012), and Kosinski (2013) proposed multinomial-based Wald statistics. They considered the observed frequencies in Table 2 to have a multinomial distribution with p = (p111, p121, p211, p221, p112, p122, p212, p222)T.

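Let $\mathbf{PPV} = (\mathrm{PPV}_1, \mathrm{PPV}_2)^T$ be a vector whose components are the PPVs of the two diagnostic tests, and let $\widehat{\mathbf{PPV}}$ be the maximum likelihood estimate (MLE) of $\mathbf{PPV}$. The MLE of $\{p_{ijk}\}$ is $\hat{p}_{ijk} = n_{ijk}/N$, where N is the total sample size, and $\widehat{\mathbf{PPV}} = (\widehat{\mathrm{PPV}}_1, \widehat{\mathrm{PPV}}_2)^T$ is as below.

$$
\widehat{\mathrm{PPV}}_1 = \frac{\hat{p}_{1\cdot 1}}{\hat{p}_{1\cdot\cdot}} = \frac{n_{1\cdot 1}}{n_{1\cdot 1} + n_{1\cdot 2}} = \frac{n_{1\cdot 1}}{n_{1\cdot\cdot}}, \qquad
\widehat{\mathrm{PPV}}_2 = \frac{\hat{p}_{\cdot 11}}{\hat{p}_{\cdot 1\cdot}} = \frac{n_{\cdot 11}}{n_{\cdot 11} + n_{\cdot 12}} = \frac{n_{\cdot 11}}{n_{\cdot 1\cdot}}.
$$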

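Using the delta-method and the multivariate central limit theorem, we have

$$
\sqrt{N}\,\bigl(\widehat{\mathbf{PPV}} - \mathbf{PPV}\bigr) \approx \mathcal{N}\!\left(\mathbf{0},\;
\left[\frac{\partial \mathbf{PPV}}{\partial \mathbf{p}^T}\right]
\bigl(\mathrm{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^T\bigr)
\left[\frac{\partial \mathbf{PPV}}{\partial \mathbf{p}^T}\right]^T\right),
$$

where diag(p) is an 8 × 8 diagonal matrix whose ith diagonal element is the ith element of p.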

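The Wald statistic for testing H0: PPV1 = PPV2 − Δ1 is as below.

$$
T^{W}_{\mathrm{PPV}} = \frac{\sqrt{N}\,\bigl(\widehat{\mathrm{PPV}}_1 - \widehat{\mathrm{PPV}}_2 + \Delta_1\bigr)}
{\sqrt{\dfrac{\widehat{\mathrm{PPV}}_1\bigl(1-\widehat{\mathrm{PPV}}_1\bigr)}{\hat{p}_{1\cdot\cdot}}
+ \dfrac{\widehat{\mathrm{PPV}}_2\bigl(1-\widehat{\mathrm{PPV}}_2\bigr)}{\hat{p}_{\cdot 1\cdot}}
- 2C^{W}_{\mathrm{PPV}}}},
$$

with

$$
C^{W}_{\mathrm{PPV}} = \frac{\hat{p}_{111}\bigl(1-\widehat{\mathrm{PPV}}_1\bigr)\bigl(1-\widehat{\mathrm{PPV}}_2\bigr) + \hat{p}_{112}\,\widehat{\mathrm{PPV}}_1\,\widehat{\mathrm{PPV}}_2}
{\hat{p}_{1\cdot\cdot} \times \hat{p}_{\cdot 1\cdot}}.
$$

The test statistic is distributed asymptotically as a standard normal distribution under the null hypothesis.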
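The Wald statistic above can be evaluated directly from the observed counts. The following Python sketch is one possible transcription of the formula, not the authors' software; the function name `wald_ppv` and its argument names are hypothetical. It is applied to the CAD counts of Table 6 (test 1 = clinical history, test 2 = EST) for the two margins considered in Sec. 5.

```python
from math import sqrt


def wald_ppv(n111, n121, n211, n112, n122, n212, total_n, margin):
    """Wald statistic for H0: PPV1 = PPV2 - margin (paired design).

    n_ijk follows Table 2: i = test 1 result, j = test 2 result
    (1 = positive, 2 = negative), k = 1 diseased, 2 non-diseased.
    Cells with both tests negative enter only through total_n.
    """
    n1_pos = n111 + n121 + n112 + n122   # n_{1..}: test 1 positive
    n2_pos = n111 + n211 + n112 + n212   # n_{.1.}: test 2 positive
    ppv1 = (n111 + n121) / n1_pos
    ppv2 = (n111 + n211) / n2_pos
    p1, p2 = n1_pos / total_n, n2_pos / total_n
    p111, p112 = n111 / total_n, n112 / total_n

    cov = (p111 * (1 - ppv1) * (1 - ppv2) + p112 * ppv1 * ppv2) / (p1 * p2)
    var = ppv1 * (1 - ppv1) / p1 + ppv2 * (1 - ppv2) / p2 - 2 * cov
    return sqrt(total_n) * (ppv1 - ppv2 + margin) / sqrt(var)


# Table 6 counts: test 1 = clinical history, test 2 = EST, N = 1,465.
cells = dict(n111=786, n121=183, n211=29, n112=69, n122=176, n212=46)
t_margin_10 = wald_ppv(**cells, total_n=1465, margin=0.10)
t_margin_05 = wald_ppv(**cells, total_n=1465, margin=0.05)
print(round(t_margin_10, 2), round(t_margin_05, 2))   # 1.92 -2.48
```

Up to rounding, these values agree with the Wald statistics 1.923 and −2.479 reported for the example in Sec. 5.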

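Similarly, let $\mathbf{NPV} = (\mathrm{NPV}_1, \mathrm{NPV}_2)^T$, and let $\widehat{\mathbf{NPV}}$ be the MLE of $\mathbf{NPV}$. Then $\widehat{\mathbf{NPV}} = (\widehat{\mathrm{NPV}}_1, \widehat{\mathrm{NPV}}_2)^T$, with

$$
\widehat{\mathrm{NPV}}_1 = \frac{\hat{p}_{2\cdot 2}}{\hat{p}_{2\cdot\cdot}} = \frac{n_{2\cdot 2}}{n_{2\cdot 1} + n_{2\cdot 2}} = \frac{n_{2\cdot 2}}{n_{2\cdot\cdot}}, \qquad
\widehat{\mathrm{NPV}}_2 = \frac{\hat{p}_{\cdot 22}}{\hat{p}_{\cdot 2\cdot}} = \frac{n_{\cdot 22}}{n_{\cdot 21} + n_{\cdot 22}} = \frac{n_{\cdot 22}}{n_{\cdot 2\cdot}}.
$$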

Table 2. Observed frequency notations.

              Diseased subjects               Non-diseased subjects
              Test 2 +   Test 2 −   Total     Test 2 +   Test 2 −   Total
Test 1 +      n111       n121       n1•1      n112       n122       n1•2
Test 1 −      n211       n221       n2•1      n212       n222       n2•2
Total         n•11       n•21       n••1      n•12       n•22       n••2


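Then the Wald statistic for testing H0: NPV1 = NPV2 − Δ2 is

$$
T^{W}_{\mathrm{NPV}} = \frac{\sqrt{N}\,\bigl(\widehat{\mathrm{NPV}}_1 - \widehat{\mathrm{NPV}}_2 + \Delta_2\bigr)}
{\sqrt{\dfrac{\widehat{\mathrm{NPV}}_1\bigl(1-\widehat{\mathrm{NPV}}_1\bigr)}{\hat{p}_{2\cdot\cdot}}
+ \dfrac{\widehat{\mathrm{NPV}}_2\bigl(1-\widehat{\mathrm{NPV}}_2\bigr)}{\hat{p}_{\cdot 2\cdot}}
- 2C^{W}_{\mathrm{NPV}}}},
$$

with

$$
C^{W}_{\mathrm{NPV}} = \frac{\hat{p}_{221}\,\widehat{\mathrm{NPV}}_1\,\widehat{\mathrm{NPV}}_2 + \hat{p}_{222}\bigl(1-\widehat{\mathrm{NPV}}_1\bigr)\bigl(1-\widehat{\mathrm{NPV}}_2\bigr)}
{\hat{p}_{2\cdot\cdot} \times \hat{p}_{\cdot 2\cdot}}.
$$

The test statistic is also distributed asymptotically as a standard normal distribution under the null hypothesis. The variance-covariance matrices of $\widehat{\mathbf{PPV}}$ and $\widehat{\mathbf{NPV}}$ are given in Appendix A.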

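On the other hand, for the score statistics, we consider the MLE of $\{p_{ijk}\}$ under each null hypothesis, which can be obtained, for example, by applying the Newton–Raphson method to the log-likelihood equations. The score statistic for testing H0: PPV1 = PPV2 − Δ1 is

$$
T^{S}_{\mathrm{PPV}} = \frac{\sqrt{N}\,\bigl(\widehat{\mathrm{PPV}}_1 - \widehat{\mathrm{PPV}}_2 + \Delta_1\bigr)}
{\sqrt{\dfrac{\bigl(\widetilde{\mathrm{PPV}}_2-\Delta_1\bigr)\bigl(1-\widetilde{\mathrm{PPV}}_2+\Delta_1\bigr)}{\tilde{p}_{1\cdot\cdot}}
+ \dfrac{\widetilde{\mathrm{PPV}}_2\bigl(1-\widetilde{\mathrm{PPV}}_2\bigr)}{\tilde{p}_{\cdot 1\cdot}}
- 2C^{S}_{\mathrm{PPV}}}},
$$

with

$$
C^{S}_{\mathrm{PPV}} = \frac{\tilde{p}_{111}\bigl(1-\widetilde{\mathrm{PPV}}_2+\Delta_1\bigr)\bigl(1-\widetilde{\mathrm{PPV}}_2\bigr) + \tilde{p}_{112}\bigl(\widetilde{\mathrm{PPV}}_2-\Delta_1\bigr)\widetilde{\mathrm{PPV}}_2}
{\tilde{p}_{1\cdot\cdot} \times \tilde{p}_{\cdot 1\cdot}},
$$

where $\tilde{p}_{ijk}$ and $\widetilde{\mathrm{PPV}}_2$ are calculated from the MLE of $p_{ijk}$ under the null hypothesis.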

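Similarly, the score statistic for testing H0: NPV1 = NPV2 − Δ2 is

$$
T^{S}_{\mathrm{NPV}} = \frac{\sqrt{N}\,\bigl(\widehat{\mathrm{NPV}}_1 - \widehat{\mathrm{NPV}}_2 + \Delta_2\bigr)}
{\sqrt{\dfrac{\bigl(\widetilde{\mathrm{NPV}}_2-\Delta_2\bigr)\bigl(1-\widetilde{\mathrm{NPV}}_2+\Delta_2\bigr)}{\tilde{p}_{2\cdot\cdot}}
+ \dfrac{\widetilde{\mathrm{NPV}}_2\bigl(1-\widetilde{\mathrm{NPV}}_2\bigr)}{\tilde{p}_{\cdot 2\cdot}}
- 2C^{S}_{\mathrm{NPV}}}},
$$

with

$$
C^{S}_{\mathrm{NPV}} = \frac{\tilde{p}_{221}\bigl(\widetilde{\mathrm{NPV}}_2-\Delta_2\bigr)\widetilde{\mathrm{NPV}}_2 + \tilde{p}_{222}\bigl(1-\widetilde{\mathrm{NPV}}_2+\Delta_2\bigr)\bigl(1-\widetilde{\mathrm{NPV}}_2\bigr)}
{\tilde{p}_{2\cdot\cdot} \times \tilde{p}_{\cdot 2\cdot}},
$$

where $\tilde{p}_{ijk}$ and $\widetilde{\mathrm{NPV}}_2$ are calculated from the MLE of $p_{ijk}$ under the null hypothesis. Each score statistic is also distributed asymptotically as a standard normal distribution under the corresponding null hypothesis.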

3. Sample size formulae for non-inferiority tests

This section presents sample size formulae for the non-inferiority tests. For instance, let us assume that we plan a non-inferiority trial for PPVs of two diagnostic tests and have to calculate the sample size needed for specific significance level α and power 1 − β. Suppose we know the disease prevalence π and the conditional correlation between the two tests (ρD+ for diseased individuals and ρD− for non-diseased individuals). Then we can express {pijk} by PPV1, PPV2, NPV1, NPV2, π, ρD+, and ρD−. The details are shown in Appendix B. For testing H0: PPV1 = PPV2 − Δ1, the required sample size that satisfies the desired power of at least 1 − β for a given significance level α is

$$
N_{\mathrm{PPV}} = \left(\frac{Z_{\alpha}\sqrt{V^{P}_{H_0}} + Z_{\beta}\sqrt{V^{P}_{H_1}}}{\mathrm{PPV}_1 - \mathrm{PPV}_2 + \Delta_1}\right)^{2}.
$$

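$V^{P}_{H_0}$ is the variance of the difference between two PPVs under the null hypothesis, and $V^{P}_{H_1}$ is the variance of the difference between two PPVs under the alternative hypothesis. $V^{P}_{H_1}$ is

$$
V^{P}_{H_1} = \frac{\mathrm{PPV}_1(1-\mathrm{PPV}_1)}{p_{1\cdot\cdot}} + \frac{\mathrm{PPV}_2(1-\mathrm{PPV}_2)}{p_{\cdot 1\cdot}}
- 2\,\frac{p_{111}(1-\mathrm{PPV}_1)(1-\mathrm{PPV}_2) + p_{112}\,\mathrm{PPV}_1\,\mathrm{PPV}_2}{p_{1\cdot\cdot} \times p_{\cdot 1\cdot}}.
$$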

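For $V^{P}_{H_0}$, we can apply Dunnett and Gent’s estimate (Dunnett and Gent 1977), because we cannot obtain the closed-form MLE of $p_{ijk}$ under the null hypothesis. The estimate of PPV2 is

$$
\mathrm{PPV}^{*}_{2} = \frac{n_{1\cdot 1} + n_{\cdot 11} + n_{1\cdot\cdot}\,\Delta_1}{n_{1\cdot\cdot} + n_{\cdot 1\cdot}}.
$$

$V^{P}_{H_0}$ and the sample size formula with application of Dunnett and Gent’s estimate are

$$
V^{P*}_{H_0} = \frac{(\mathrm{PPV}^{*}_{2}-\Delta_1)(1-\mathrm{PPV}^{*}_{2}+\Delta_1)}{p_{1\cdot\cdot}} + \frac{\mathrm{PPV}^{*}_{2}(1-\mathrm{PPV}^{*}_{2})}{p_{\cdot 1\cdot}}
- 2\,\frac{p_{111}(1-\mathrm{PPV}^{*}_{2}+\Delta_1)(1-\mathrm{PPV}^{*}_{2}) + p_{112}(\mathrm{PPV}^{*}_{2}-\Delta_1)\,\mathrm{PPV}^{*}_{2}}{p_{1\cdot\cdot} \times p_{\cdot 1\cdot}},
$$

$$
N^{D}_{\mathrm{PPV}} = \left(\frac{Z_{\alpha}\sqrt{V^{P*}_{H_0}} + Z_{\beta}\sqrt{V^{P}_{H_1}}}{\mathrm{PPV}_1 - \mathrm{PPV}_2 + \Delta_1}\right)^{2}. \tag{1}
$$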
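Formula (1) can be sketched in a few lines of Python once the cell probabilities are obtained from the Appendix B equations. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the counts in the Dunnett and Gent estimate are replaced by the corresponding cell probabilities (the planning-stage analogue), Zα and Zβ are taken as upper standard normal quantiles, and the result is rounded up to the next integer.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cell_probs(ppv, npv, prev, rho_pos, rho_neg):
    """Cell probabilities needed for the PPV comparison (Appendix B).

    ppv and npv are (test 1, test 2) pairs; keys are the ijk subscripts.
    """
    se, sp = [], []
    for t in range(2):
        denom = ppv[t] + npv[t] - 1
        se.append(ppv[t] * (npv[t] - 1 + prev) / (prev * denom))
        sp.append(npv[t] * (ppv[t] - prev) / ((1 - prev) * denom))
    p111 = prev * (rho_pos * sqrt(se[0] * (1 - se[0]) * se[1] * (1 - se[1]))
                   + se[0] * se[1])
    p222 = (1 - prev) * (rho_neg * sqrt(sp[0] * (1 - sp[0]) * sp[1] * (1 - sp[1]))
                         + sp[0] * sp[1])
    p212 = (1 - prev) * sp[0] - p222
    p122 = (1 - prev) * sp[1] - p222
    return {
        "111": p111,
        "121": prev * se[0] - p111,
        "211": prev * se[1] - p111,
        "112": (1 - prev) - p122 - p212 - p222,
        "122": p122,
        "212": p212,
    }


def ppv_diff_variance(p, ppv1, ppv2):
    """Variance of sqrt(N) * (PPV1_hat - PPV2_hat) evaluated at (ppv1, ppv2)."""
    p1 = p["111"] + p["121"] + p["112"] + p["122"]   # P(test 1 positive)
    p2 = p["111"] + p["211"] + p["112"] + p["212"]   # P(test 2 positive)
    cov = (p["111"] * (1 - ppv1) * (1 - ppv2) + p["112"] * ppv1 * ppv2) / (p1 * p2)
    return ppv1 * (1 - ppv1) / p1 + ppv2 * (1 - ppv2) / p2 - 2 * cov


def sample_size_ppv(ppv, npv, prev, rho_pos, rho_neg, margin, alpha=0.05, power=0.8):
    p = cell_probs(ppv, npv, prev, rho_pos, rho_neg)
    p1 = p["111"] + p["121"] + p["112"] + p["122"]
    p2 = p["111"] + p["211"] + p["112"] + p["212"]
    # Dunnett and Gent-type estimate with counts replaced by proportions (assumption).
    ppv2_star = ((p["111"] + p["121"]) + (p["111"] + p["211"]) + p1 * margin) / (p1 + p2)
    v_h0 = ppv_diff_variance(p, ppv2_star - margin, ppv2_star)
    v_h1 = ppv_diff_variance(p, ppv[0], ppv[1])
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    n = ((z_alpha * sqrt(v_h0) + z_beta * sqrt(v_h1)) / (ppv[0] - ppv[1] + margin)) ** 2
    return ceil(n)   # rounding up is a choice of this sketch


n_d_ppv = sample_size_ppv((0.75, 0.75), (0.80, 0.80), prev=0.5,
                          rho_pos=0.3, rho_neg=0.5, margin=0.10)
print(n_d_ppv)   # 129
```

For PPV1 = PPV2 = 0.75, NPV1 = NPV2 = 0.80, a margin of 0.1, π = 0.5, ρD+ = 0.3, and ρD− = 0.5, the sketch returns 129, the value listed for this setting in Table 5.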

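Similarly, we can calculate the required sample size for testing H0: NPV1 = NPV2 − Δ2 based on the same information as above. The sample size formula is

$$
N_{\mathrm{NPV}} = \left(\frac{Z_{\alpha}\sqrt{V^{N}_{H_0}} + Z_{\beta}\sqrt{V^{N}_{H_1}}}{\mathrm{NPV}_1 - \mathrm{NPV}_2 + \Delta_2}\right)^{2},
$$

where

$$
V^{N}_{H_1} = \frac{\mathrm{NPV}_1(1-\mathrm{NPV}_1)}{p_{2\cdot\cdot}} + \frac{\mathrm{NPV}_2(1-\mathrm{NPV}_2)}{p_{\cdot 2\cdot}}
- 2\,\frac{p_{221}\,\mathrm{NPV}_1\,\mathrm{NPV}_2 + p_{222}(1-\mathrm{NPV}_1)(1-\mathrm{NPV}_2)}{p_{2\cdot\cdot} \times p_{\cdot 2\cdot}},
$$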

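and we can apply Dunnett and Gent’s estimate for $V^{N}_{H_0}$. The estimate of NPV2 is

$$
\mathrm{NPV}^{*}_{2} = \frac{n_{2\cdot 2} + n_{\cdot 22} + n_{2\cdot\cdot}\,\Delta_2}{n_{2\cdot\cdot} + n_{\cdot 2\cdot}}.
$$

$V^{N}_{H_0}$ and the sample size formula with application of Dunnett and Gent’s estimate are

$$
V^{N*}_{H_0} = \frac{(\mathrm{NPV}^{*}_{2}-\Delta_2)(1-\mathrm{NPV}^{*}_{2}+\Delta_2)}{p_{2\cdot\cdot}} + \frac{\mathrm{NPV}^{*}_{2}(1-\mathrm{NPV}^{*}_{2})}{p_{\cdot 2\cdot}}
- 2\,\frac{p_{221}(\mathrm{NPV}^{*}_{2}-\Delta_2)\,\mathrm{NPV}^{*}_{2} + p_{222}(1-\mathrm{NPV}^{*}_{2}+\Delta_2)(1-\mathrm{NPV}^{*}_{2})}{p_{2\cdot\cdot} \times p_{\cdot 2\cdot}},
$$

$$
N^{D}_{\mathrm{NPV}} = \left(\frac{Z_{\alpha}\sqrt{V^{N*}_{H_0}} + Z_{\beta}\sqrt{V^{N}_{H_1}}}{\mathrm{NPV}_1 - \mathrm{NPV}_2 + \Delta_2}\right)^{2}. \tag{2}
$$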
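Because formula (2) depends on the prevalence through every cell probability, it can be useful to evaluate it over a range of π. The following Python sketch does this for the NPV comparison. It is a hedged illustration: the function name `sample_size_npv` is hypothetical, the counts in the estimate of NPV2 are replaced by cell probabilities, only the quantities that formula (2) actually needs are derived from the Appendix B equations, and the result is rounded up.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_npv(ppv, npv, prev, rho_pos, rho_neg, margin, alpha=0.05, power=0.8):
    """Sample size from formula (2); ppv and npv are (test 1, test 2) pairs."""
    se, sp = [], []
    for t in range(2):
        denom = ppv[t] + npv[t] - 1
        se.append(ppv[t] * (npv[t] - 1 + prev) / (prev * denom))
        sp.append(npv[t] * (ppv[t] - prev) / ((1 - prev) * denom))
    p111 = prev * (rho_pos * sqrt(se[0] * (1 - se[0]) * se[1] * (1 - se[1]))
                   + se[0] * se[1])
    p221 = prev * (1 - se[0] - se[1]) + p111          # both tests negative, diseased
    p222 = (1 - prev) * (rho_neg * sqrt(sp[0] * (1 - sp[0]) * sp[1] * (1 - sp[1]))
                         + sp[0] * sp[1])             # both tests negative, non-diseased
    q1 = prev * (1 - se[0]) + (1 - prev) * sp[0]      # p_{2..}: test 1 negative
    q2 = prev * (1 - se[1]) + (1 - prev) * sp[1]      # p_{.2.}: test 2 negative

    def variance(npv1, npv2):
        cov = (p221 * npv1 * npv2 + p222 * (1 - npv1) * (1 - npv2)) / (q1 * q2)
        return npv1 * (1 - npv1) / q1 + npv2 * (1 - npv2) / q2 - 2 * cov

    # NPV2* with counts replaced by proportions (assumption of this sketch):
    # p_{2.2} = (1 - prev) * SP1 and p_{.22} = (1 - prev) * SP2.
    npv2_star = ((1 - prev) * (sp[0] + sp[1]) + q1 * margin) / (q1 + q2)
    v_h0 = variance(npv2_star - margin, npv2_star)
    v_h1 = variance(npv[0], npv[1])
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    n = ((z_alpha * sqrt(v_h0) + z_beta * sqrt(v_h1)) / (npv[0] - npv[1] + margin)) ** 2
    return ceil(n)   # rounding up is a choice of this sketch


# Prevalence sweep for PPV1 = PPV2 = 0.75, NPV1 = NPV2 = 0.80, margin 0.1.
sizes = {prev: sample_size_npv((0.75, 0.75), (0.80, 0.80), prev, 0.3, 0.5, margin=0.10)
         for prev in (0.4, 0.5, 0.6)}
print(sizes)
```

Under these assumptions the required size grows with the prevalence, and the values for π = 0.4 and π = 0.5 (123 and 211) match the corresponding entries of Table 5.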

4. Simulation

We perform simulation studies to assess the performance of the proposed non-inferiority test statistics and sample size formulae introduced in Secs. 2 and 3. The performance measures are the empirical type I error rate and empirical power. We execute 100,000 repeated simulations for each method. The simulation is conducted in the same way as Kosinski (2013), except for setting ρD+, ρD−, and p as in Appendix B.

The eight true parameters are set for each simulation scenario (PPV1, PPV2, NPV1, NPV2, the non-inferiority margins Δ1 and Δ2, π, ρD+, and ρD−), and data with total sample size N are generated from these parameters.

These parameters define the corresponding true multinomial distribution with p = (p111, p121, p211, p221, p112, p122, p212, p222)T. To derive probabilities of this multinomial distribution, we first calculate SE1, SE2, SP1, and SP2 of each scenario based on the π and predictive values. Then, we compute p by using ρD+, ρD−. The details of the calculation method for p are given in Appendix B. We add 0.5 to each cell of the generated data-set when the data-set leads to an inability to compute one of the considered statistics because of cells with a value of zero or predictive values equal to zero or one.
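The data-generation step can be sketched as follows. This is a minimal illustration using only the Python standard library; the helper names are hypothetical, and the rule for the 0.5 correction is simplified to "any empty cell", whereas the text above applies it whenever a statistic cannot be computed. The cell probabilities are rounded values obtained from the Appendix B equations for PPV1 = PPV2 = 0.75, NPV1 = NPV2 = 0.80, π = 0.5, ρD+ = 0.3, and ρD− = 0.5.

```python
import random

# Rounded cell probabilities p_ijk (keys are the ijk subscripts).
PROBS = {"111": 0.357025, "121": 0.052066, "211": 0.052066, "221": 0.038843,
         "112": 0.086777, "122": 0.049587, "212": 0.049587, "222": 0.314050}


def simulate_counts(probs, n, rng):
    """Draw one multinomial sample of size n over the eight cells."""
    cells = list(probs)
    draws = rng.choices(cells, weights=[probs[c] for c in cells], k=n)
    counts = {c: 0 for c in cells}
    for c in draws:
        counts[c] += 1
    return counts


def add_half_if_needed(counts):
    """Add 0.5 to every cell when any cell is empty (simplified trigger)."""
    if any(v == 0 for v in counts.values()):
        return {c: v + 0.5 for c, v in counts.items()}
    return dict(counts)


rng = random.Random(2020)   # fixed seed so the sketch is reproducible
sample = add_half_if_needed(simulate_counts(PROBS, 50, rng))
print(sample)
```

Repeating this draw many times and applying a test statistic to each sample gives the empirical rejection rate for a scenario.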

We set PPV for the new diagnostic test (PPV1) = 0.65; PPV for the existing diagnostic test (PPV2) = 0.75; NPV for the new diagnostic test (NPV1) = 0.70; NPV for the existing diagnostic test (NPV2) = 0.80; Δ1 = Δ2 = 0.10; π = 0.4, 0.5, and 0.6; ρD+ = 0.3 and 0.5; and ρD− = 0.3 and 0.5. The total sample size N is assumed to be 50, 100, and 500. The nominal type I error rate is set to 0.05 (one-sided test).

Table 3 shows the empirical type I error rates for the proposed non-inferiority test. The empirical type I error rates of $T^{S}_{\mathrm{PPV}}$ and $T^{S}_{\mathrm{NPV}}$ are closer to the nominal type I error rate of 0.05 than those of $T^{W}_{\mathrm{PPV}}$ and $T^{W}_{\mathrm{NPV}}$ when the sample size is large, and the empirical type I error rates of $T^{W}_{\mathrm{PPV}}$ and $T^{W}_{\mathrm{NPV}}$ exceed the nominal type I error rate of 0.05 in most simulation settings.

Table 4 shows the empirical power of the proposed non-inferiority test. We set PPV1 = PPV2 = 0.75; NPV1 = NPV2 = 0.80; Δ1 = Δ2 = 0.10; π = 0.4, 0.5, and 0.6; ρD+ = 0.3 and 0.5; and ρD− = 0.3 and 0.5. $T^{W}_{\mathrm{PPV}}$ and $T^{W}_{\mathrm{NPV}}$ are omitted here because the empirical type I error rates of these test statistics exceed the nominal type I error rate.

As shown in Table 4, the empirical powers of $T^{S}_{\mathrm{PPV}}$ and $T^{S}_{\mathrm{NPV}}$ are affected by π, ρD+, and ρD−, because the variances of $T^{S}_{\mathrm{PPV}}$ and $T^{S}_{\mathrm{NPV}}$ contain p111, p221, and p222.

In addition, to assess the performance of the proposed sample size formulae, we use the same settings as Table 4 to calculate the required sample size that satisfies the desired power of at least 0.8 for a given significance level of 0.05 (one-sided test). The results are shown in Table 5. We calculate the required sample sizes ($N_{\mathrm{PPV}}$ and $N_{\mathrm{NPV}}$) through simulation (10,000 replications) using score statistics. On the other hand, $N^{D}_{\mathrm{PPV}}$ and $N^{D}_{\mathrm{NPV}}$ are calculated using formulae (1) and (2).

With any of the settings, $N^{D}_{\mathrm{PPV}}$ and $N^{D}_{\mathrm{NPV}}$ are about 10% lower than $N_{\mathrm{PPV}}$ and $N_{\mathrm{NPV}}$. Additionally, the sample size tends to change greatly depending on the prevalence.

5. Example

As an example, we used the data on coronary artery disease (CAD) (Weiner et al. 1979). The study from which the data were derived used a paired design, and information on both exercise stress testing (EST) results and clinical history is available. The data, obtained from 1,023 subjects with CAD and 442 CAD-free subjects, are shown in Table 6. We are interested in the non-inferiority of the clinical history compared to EST, and apply the proposed non-inferiority test to the data. We set the non-inferiority margin Δ1 = 0.1 and test the non-inferiority of the PPV of the clinical history against that of EST.

The PPV of EST is estimated as $\widehat{\mathrm{PPV}}_{EST}$ = 815/(815 + 115) = 0.876, and the PPV of the clinical history is estimated as $\widehat{\mathrm{PPV}}_{ch}$ = 969/(969 + 245) = 0.798. The PPV of EST and that of clinical history under H0: PPV_ch = PPV_EST − Δ1 are estimated as $\widetilde{\mathrm{PPV}}_{EST}$ = 0.885 and $\widetilde{\mathrm{PPV}}_{ch}$ = 0.785, respectively. Subsequently, we calculate $T^{W}_{\mathrm{PPV}}$ = 1.923 and $T^{S}_{\mathrm{PPV}}$ = 1.890. Here, we have


Table 3. Empirical type I error rate.

                               Δ1 = Δ2 = 0.05                          Δ1 = Δ2 = 0.1
π    ρD+  ρD−  Sample size     T^W_PPV  T^S_PPV  T^W_NPV  T^S_NPV      T^W_PPV  T^S_PPV  T^W_NPV  T^S_NPV
0.4  0.3  0.5   50             0.076    0.044    0.067    0.047        0.080    0.042    0.074    0.044
               100             0.061    0.049    0.058    0.047        0.063    0.046    0.064    0.047
               200             0.056    0.049    0.055    0.048        0.058    0.048    0.059    0.047
               500             0.052    0.048    0.053    0.049        0.055    0.049    0.055    0.049
     0.5  0.3   50             0.069    0.048    0.075    0.043        0.074    0.047    0.088    0.040
               100             0.059    0.050    0.064    0.045        0.062    0.049    0.075    0.045
               200             0.055    0.050    0.059    0.048        0.057    0.049    0.066    0.046
               500             0.052    0.050    0.055    0.048        0.053    0.049    0.057    0.046
0.5  0.3  0.5   50             0.070    0.047    0.070    0.048        0.075    0.043    0.072    0.046
               100             0.060    0.048    0.058    0.048        0.064    0.045    0.062    0.048
               200             0.056    0.049    0.055    0.049        0.060    0.048    0.056    0.048
               500             0.053    0.049    0.051    0.048        0.054    0.047    0.053    0.048
     0.5  0.3   50             0.061    0.048    0.080    0.043        0.066    0.046    0.082    0.042
               100             0.057    0.049    0.061    0.046        0.061    0.048    0.067    0.045
               200             0.053    0.049    0.058    0.049        0.055    0.048    0.062    0.048
               500             0.051    0.049    0.053    0.049        0.053    0.049    0.056    0.048
0.6  0.3  0.5   50             0.078    0.043    0.105    0.038        0.095    0.037    0.150    0.036
               100             0.066    0.045    0.067    0.048        0.080    0.042    0.085    0.049
               200             0.062    0.047    0.059    0.049        0.069    0.045    0.066    0.048
               500             0.055    0.048    0.055    0.050        0.061    0.046    0.059    0.049
     0.5  0.3   50             0.068    0.046    0.112    0.029        0.081    0.041    0.157    0.023
               100             0.060    0.048    0.072    0.045        0.069    0.045    0.085    0.041
               200             0.057    0.049    0.061    0.048        0.063    0.046    0.069    0.046
               500             0.054    0.049    0.055    0.049        0.058    0.049    0.059    0.047


Table 4. Empirical power.

                               Δ1 = Δ2 = 0.05        Δ1 = Δ2 = 0.1
π    ρD+  ρD−  Sample size     T^S_PPV  T^S_NPV      T^S_PPV  T^S_NPV
0.4  0.3  0.5   50             0.117    0.175        0.245    0.416
               100             0.187    0.283        0.442    0.685
               200             0.294    0.460        0.701    0.921
               500             0.547    0.796        0.969    0.999
     0.5  0.3   50             0.115    0.193        0.229    0.477
               100             0.167    0.330        0.386    0.768
               200             0.255    0.547        0.622    0.966
               500             0.467    0.884        0.930    1.000
0.5  0.3  0.5   50             0.168    0.131        0.389    0.288
               100             0.269    0.204        0.653    0.493
               200             0.442    0.321        0.905    0.758
               500             0.776    0.595        0.999    0.982
     0.5  0.3   50             0.155    0.135        0.346    0.314
               100             0.233    0.233        0.570    0.565
               200             0.371    0.380        0.832    0.838
               500             0.677    0.698        0.994    0.995
0.6  0.3  0.5   50             0.233    0.087        0.572    0.170
               100             0.399    0.141        0.857    0.304
               200             0.645    0.212        0.989    0.515
               500             0.945    0.390        1.000    0.851
     0.5  0.3   50             0.207    0.078        0.500    0.169
               100             0.337    0.147        0.776    0.339
               200             0.545    0.240        0.965    0.585
               500             0.880    0.457        1.000    0.911


Table 5. Sample size.

Δ1, Δ2   π     ρD+   ρD−   N_PPV   N^D_PPV   N_NPV   N^D_NPV
0.05     0.4   0.3   0.5     996       952     513       494
               0.5   0.3    1282      1236     388       366
         0.5   0.3   0.5     542       518     870       845
               0.5   0.3     694       688     666       636
         0.6   0.3   0.5     301       281    1871      1607
               0.5   0.3     395       380    1397      1238
0.1      0.4   0.5   0.3     106        95     347       308
               0.3   0.5     260       237     136       123
         0.5   0.5   0.3     322       308     108        91
               0.3   0.5     147       129     221       211
         0.6   0.5   0.3     183       172     179       159
               0.3   0.5      87        70     431       400

Table 6. Example data.

                      CAD+                        CAD−
                      EST +   EST −   Total       EST +   EST −   Total
Clinical history +    786     183      969         69     176     245
Clinical history −     29      25       54         46     151     197
Total                 815     208     1023        115     327     442

p = 0.027 for $T^{W}_{\mathrm{PPV}}$ and p = 0.029 for $T^{S}_{\mathrm{PPV}}$, and we conclude that the non-inferiority of the PPV of the clinical history against that of EST is confirmed at significance level α = 0.05.
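These p-values are the upper-tail probabilities of the standard normal distribution at the reported statistics. A short Python check, using only the standard library (the helper name `one_sided_p` is an assumption of this sketch), is:

```python
from statistics import NormalDist


def one_sided_p(statistic):
    """Upper-tail p-value under the standard normal null distribution."""
    return 1 - NormalDist().cdf(statistic)


reported = {"Wald": 1.923, "score": 1.890}   # statistics reported for margin 0.1
p_values = {name: round(one_sided_p(t), 3) for name, t in reported.items()}
print(p_values)   # {'Wald': 0.027, 'score': 0.029}
```

Both p-values fall below 0.05, which is the basis for the non-inferiority conclusion at the margin of 0.1.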

The required sample size for testing H0: PPV_ch = PPV_EST − Δ1 can be calculated based on the data shown in Table 6. The parameters necessary for calculating the sample size are estimated from Table 6 as follows: π = 0.698, PPV_EST = 0.876, PPV_ch = 0.798, SE_EST = 0.797, SE_ch = 0.947, ρD+ = 0.152, and Δ1 = 0.1. Using this information, we can calculate $N^{D}_{\mathrm{PPV}}$ = 2,369. From the results of simulations in Sec. 4, $N^{D}_{\mathrm{PPV}}$ tends to be about 10% lower than $N_{\mathrm{PPV}}$. Therefore, it is better to add about 237 samples (10% of 2,369) to $N^{D}_{\mathrm{PPV}}$.

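In addition, we applied the non-inferiority test to the CAD data with Δ1 = 0.05. The PPV of EST and that of clinical history under H0: PPV_ch = PPV_EST − Δ1 are estimated as $\widetilde{\mathrm{PPV}}_{EST}$ = 0.862 and $\widetilde{\mathrm{PPV}}_{ch}$ = 0.812, respectively. We calculate $T^{W}_{\mathrm{PPV}}$ = −2.479 and $T^{S}_{\mathrm{PPV}}$ = −2.500. Here, we have p = 0.993 for $T^{W}_{\mathrm{PPV}}$ and p = 0.994 for $T^{S}_{\mathrm{PPV}}$, so non-inferiority cannot be concluded at this margin.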

6. Discussion

Based on the simulation results in this study, the score statistics perform better for testing non-inferiority because their empirical type I error rate is close to the nominal type I error rate of 0.05, regardless of sample size. On the other hand, the Wald statistics for testing non-inferiority are easy to implement, but their empirical type I error rate tends to exceed the nominal type I error rate, even when the sample size is large. Similar results were obtained when the simulation was performed with different parameter settings (Tables S3 and S4 in the supplementary materials). However, the score statistics are often incomputable, so that 0.5 has to be added to each cell of the generated data-set, when the prevalence and sample size are small (Tables S1 and S2 in the supplementary materials). Therefore, it is not recommended to use the score statistics when the prevalence and sample size are small.

We have proposed sample size formulae for the differences between two PPVs and two NPVs. Simulations are necessary to calculate $N_{\mathrm{PPV}}$ and $N_{\mathrm{NPV}}$, whereas $N^{D}_{\mathrm{PPV}}$ and $N^{D}_{\mathrm{NPV}}$ can be calculated without simulations, and there is not much difference between $N_{\mathrm{PPV}}$ and $N^{D}_{\mathrm{PPV}}$ or between $N_{\mathrm{NPV}}$ and $N^{D}_{\mathrm{NPV}}$. Similar results were obtained with different parameter settings (Table S5 in the supplementary materials). Therefore, the proposed formulae are considered useful. In the simulation, we assumed that we already had information about the disease prevalence π and the conditional correlations between the two tests (ρD+ and ρD−). Since the required sample size is influenced by π and the conditional correlations, it is recommended to use information from a previous study or a pilot sample, and to conduct a sensitivity analysis to evaluate the effect of the assumptions about π and the conditional correlations, especially when the estimated sample size may be small.

Although Moskowitz and Pepe (2006) proposed sample size formulae for comparing the PPVs and NPVs of two tests, which can also be used to test for non-inferiority, the intended use of those formulae is to compare the relative predictive values of two tests (PPV1/PPV2, NPV1/NPV2), while the formulae we propose are intended to compare the difference between two tests (PPV1 − PPV2, NPV1 − NPV2). In addition, to use Moskowitz and Pepe’s formulae, it is necessary to set each cell probability (pijk), but the formulae we propose require only that ρD+ and ρD− be set.

In this study we propose non-inferiority tests for comparing two predictive values in clinical trials, as well as corresponding sample size formulae. We evaluate the performances of these tests and their sample size formulae in clinical trials through simulation studies and with real-life data. We consider that the proposed score statistics for testing non-inferiority have good performance and can be widely used regardless of sample size and prevalence.

Acknowledgments

We are grateful to the associate editor and two reviewers for their helpful suggestions.

Funding

This research was partially supported by Grant-in-Aid for Scientific Research (C) No. 18K11195 and Grant-in-Aid for Young Scientists No. 18K17325.


References

Altman, D. G., and J. M. Bland. 1994. “Diagnostic Tests 2—Predictive Values.” British Medical Journal 309 (6947):102. doi:10.1136/bmj.309.6947.102.

Berry, G., C. L. Smith, P. Macaskill, and L. Irwig. 2002. “Analytic Methods for Comparing Two Dichotomous Screening or Diagnostic Tests Applied to Two Populations of Differing Disease Prevalence When Individuals Negative on Both Tests Are Unverified.” Statistics in Medicine 21 (6):853–862. doi:10.1002/sim.1066.

Bisser, S., C. Lumbala, E. Nguertoum, V. Kande, L. Flevaud, G. Vatunga, M. Boelaert, P. Buscher, T. Josenando, P. R. Bessell, et al. 2016. “Sensitivity and Specificity of a Prototype Rapid Diagnostic Test for the Detection of Trypanosoma brucei gambiense Infection: A Multi-Centric Prospective Study.” PLoS Neglected Tropical Diseases 10 (4):16. doi:10.1371/journal.pntd.0004608.

Cuevas, L. E., M. A. Yassin, N. Al-Sonboli, L. Lawson, I. Arbide, N. Al-Aghbari, J. B. Sherchand, A. Al-Absi, E. N. Emenyonu, Y. Merid, et al. 2011. “A Multi-Country Non-Inferiority Cluster Randomized Trial of Frontloaded Smear Microscopy for the Diagnosis of Pulmonary Tuberculosis.” PLoS Medicine 8 (7):11. doi:10.1371/journal.pmed.1000443.

Dunnett, C. W., and M. Gent. 1977. “Significance Testing to Establish Equivalence Between Treatments, With Special Reference to Data in Form of 2 × 2 Tables.” Biometrics 33 (4):593–602. doi:10.2307/2529457.

Dupuy, A., L. Dehen, E. Bourrat, C. Lacroix, M. Benderdouche, L. Dubertret, P. Morel, M. F. de Chauvin, and A. Petit. 2007. “Accuracy of Standard Dermoscopy for Diagnosing Scabies.” Journal of the American Academy of Dermatology 56 (1):53–62. doi:10.1016/j.jaad.2006.07.025.

Georgiadis, M. P., W. O. Johnson, I. A. Gardner, and R. Singh. 2003. “Correlation-Adjusted Estimation of Sensitivity and Specificity of Two Diagnostic Tests.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 52:63–76. doi:10.1111/1467-9876.00389.

Guggenmoos-Holzmann, I., and H. C. van Houwelingen. 2000. “The (In)validity of Sensitivity and Specificity.” Statistics in Medicine 19 (13):1783–1792. doi:10.1002/1097-0258(20000715)19:13<1783::aid-sim497>3.0.co;2-b.

Kosinski, A. S. 2013. “A Weighted Generalized Score Statistic for Comparison of Predictive Values of Diagnostic Tests.” Statistics in Medicine 32 (6):964–977. doi:10.1002/sim.5587.

Leisenring, W., T. Alonzo, and M. S. Pepe. 2000. “Comparisons of Predictive Values of Binary Medical Diagnostic Tests for Paired Designs.” Biometrics 56 (2):345–351. doi:10.1111/j.0006-341X.2000.00345.x.

Moskowitz, C. S., and M. S. Pepe. 2006. “Comparing the Predictive Values of Diagnostic Tests: Sample Size and Analysis for Paired Study Designs.” Clinical Trials 3 (3):272–279. doi:10.1191/1740774506cn147oa.

Nofuentes, J. A. R., J. D. L. del Castillo, and M. A. M. Alonso. 2012. “Global Hypothesis Test to Simultaneously Compare the Predictive Values of Two Binary Diagnostic Tests.” Computational Statistics & Data Analysis 56 (5):1161–1173. doi:10.1016/j.csda.2011.06.003.

Pepe, M. S. 2004. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford Statistical Science Series. Oxford: Oxford University Press.

Vacek, P. M. 1985. “The Effect of Conditional Dependence on the Evaluation of Diagnostic-Test.” Biometrics 41 (4):959–968. doi:10.2307/2530967.

Wang, W., C. S. Davis, and S. J. Soong. 2006. “Comparison of Predictive Values of Two Diagnostic Tests from the Same Sample of Subjects Using Weighted Least Squares.” Statistics in Medicine 25 (13):2215–2229. doi:10.1002/sim.2332.

Weiner, D. A., T. J. Ryan, C. H. McCabe, J. W. Kennedy, M. Schloss, F. Tristani, B. R. Chaitman, and L. D. Fisher. 1979. “Exercise Stress Testing—Correlations Among History of Angina, ST-Segment Response and Prevalence of Coronary-Artery Disease in the Coronary-Artery Surgery Study (CASS).” New England Journal of Medicine 301 (5):230–235. doi:10.1056/nejm197908023010502.

Zhou, X., D. K. McClish, N. A. Obuchowski, and ebrary, Inc. 2011. Statistical Methods in Diagnostic Medicine. Wiley Series in Probability and Statistics. 2nd ed. Hoboken, NJ: Wiley.


Appendix A. The multinomial-based variance-covariance matrix of predictive values

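The vector of positive predictive values is

$$
\mathbf{PPV} = \begin{pmatrix} \mathrm{PPV}_1 \\ \mathrm{PPV}_2 \end{pmatrix},
$$

and the derivative of positive predictive values is

$$
\frac{\partial \mathbf{PPV}}{\partial \mathbf{p}^T} =
\begin{pmatrix}
\dfrac{1-\mathrm{PPV}_1}{p_{1\cdot\cdot}} & \dfrac{1-\mathrm{PPV}_1}{p_{1\cdot\cdot}} & 0 & 0 & \dfrac{-\mathrm{PPV}_1}{p_{1\cdot\cdot}} & \dfrac{-\mathrm{PPV}_1}{p_{1\cdot\cdot}} & 0 & 0 \\[2ex]
\dfrac{1-\mathrm{PPV}_2}{p_{\cdot 1\cdot}} & 0 & \dfrac{1-\mathrm{PPV}_2}{p_{\cdot 1\cdot}} & 0 & \dfrac{-\mathrm{PPV}_2}{p_{\cdot 1\cdot}} & 0 & \dfrac{-\mathrm{PPV}_2}{p_{\cdot 1\cdot}} & 0
\end{pmatrix}.
$$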

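Using the delta-method, the multinomial-based variance-covariance matrix of positive predictive values is

$$
V(\mathbf{PPV}) \approx
\left[\frac{\partial \mathbf{PPV}}{\partial \mathbf{p}^T}\right]
\bigl(\mathrm{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^T\bigr)
\left[\frac{\partial \mathbf{PPV}}{\partial \mathbf{p}^T}\right]^T
=
\left[\frac{\partial \mathbf{PPV}}{\partial \mathbf{p}^T}\right]
\mathrm{diag}(\mathbf{p})
\left[\frac{\partial \mathbf{PPV}}{\partial \mathbf{p}^T}\right]^T
$$

$$
=
\begin{pmatrix}
\dfrac{\mathrm{PPV}_1(1-\mathrm{PPV}_1)}{p_{1\cdot\cdot}} &
\dfrac{p_{111}(1-\mathrm{PPV}_1)(1-\mathrm{PPV}_2) + p_{112}\,\mathrm{PPV}_1\,\mathrm{PPV}_2}{p_{1\cdot\cdot} \times p_{\cdot 1\cdot}} \\[2ex]
\dfrac{p_{111}(1-\mathrm{PPV}_1)(1-\mathrm{PPV}_2) + p_{112}\,\mathrm{PPV}_1\,\mathrm{PPV}_2}{p_{1\cdot\cdot} \times p_{\cdot 1\cdot}} &
\dfrac{\mathrm{PPV}_2(1-\mathrm{PPV}_2)}{p_{\cdot 1\cdot}}
\end{pmatrix}.
$$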
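The second equality holds because the derivative matrix annihilates p, so the term involving pp^T vanishes. The following Python sketch checks this numerically and compares the matrix product with the closed form; it uses plain lists rather than a linear algebra library, and the function names are hypothetical. The probabilities are the observed proportions of Table 6, in the cell order (p111, p121, p211, p221, p112, p122, p212, p222).

```python
def ppv_jacobian(p):
    """Gradients of PPV1 and PPV2 with respect to the eight cell probabilities."""
    p111, p121, p211, p221, p112, p122, p212, p222 = p
    p1 = p111 + p121 + p112 + p122   # p_{1..}
    p2 = p111 + p211 + p112 + p212   # p_{.1.}
    ppv1 = (p111 + p121) / p1
    ppv2 = (p111 + p211) / p2
    row1 = [(1 - ppv1) / p1, (1 - ppv1) / p1, 0, 0, -ppv1 / p1, -ppv1 / p1, 0, 0]
    row2 = [(1 - ppv2) / p2, 0, (1 - ppv2) / p2, 0, -ppv2 / p2, 0, -ppv2 / p2, 0]
    return [row1, row2]


def delta_method_cov(p):
    """J (diag(p) - p p^T) J^T, together with the vector J p."""
    jac = ppv_jacobian(p)
    jp = [sum(jac[r][i] * p[i] for i in range(8)) for r in range(2)]
    cov = [[sum(jac[r][i] * p[i] * jac[s][i] for i in range(8)) - jp[r] * jp[s]
            for s in range(2)] for r in range(2)]
    return cov, jp


def closed_form_cov(p):
    """Closed-form variance-covariance matrix stated above."""
    p111, p121, p211, p221, p112, p122, p212, p222 = p
    p1 = p111 + p121 + p112 + p122
    p2 = p111 + p211 + p112 + p212
    ppv1 = (p111 + p121) / p1
    ppv2 = (p111 + p211) / p2
    c = (p111 * (1 - ppv1) * (1 - ppv2) + p112 * ppv1 * ppv2) / (p1 * p2)
    return [[ppv1 * (1 - ppv1) / p1, c], [c, ppv2 * (1 - ppv2) / p2]]


counts = [786, 183, 29, 25, 69, 176, 46, 151]   # Table 6, same cell order as p
p_hat = [c / sum(counts) for c in counts]
numeric, jp = delta_method_cov(p_hat)
exact = closed_form_cov(p_hat)
print(jp)                 # both entries are zero up to rounding error
print(numeric, exact)     # the two matrices agree up to rounding error
```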

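Similarly, the vector of negative predictive values, the derivative of negative predictive values, and the multinomial-based variance-covariance matrix of negative predictive values are

$$
\mathbf{NPV} = \begin{pmatrix} \mathrm{NPV}_1 \\ \mathrm{NPV}_2 \end{pmatrix},
$$

$$
\frac{\partial \mathbf{NPV}}{\partial \mathbf{p}^T} =
\begin{pmatrix}
0 & 0 & \dfrac{-\mathrm{NPV}_1}{p_{2\cdot\cdot}} & \dfrac{-\mathrm{NPV}_1}{p_{2\cdot\cdot}} & 0 & 0 & \dfrac{1-\mathrm{NPV}_1}{p_{2\cdot\cdot}} & \dfrac{1-\mathrm{NPV}_1}{p_{2\cdot\cdot}} \\[2ex]
0 & \dfrac{-\mathrm{NPV}_2}{p_{\cdot 2\cdot}} & 0 & \dfrac{-\mathrm{NPV}_2}{p_{\cdot 2\cdot}} & 0 & \dfrac{1-\mathrm{NPV}_2}{p_{\cdot 2\cdot}} & 0 & \dfrac{1-\mathrm{NPV}_2}{p_{\cdot 2\cdot}}
\end{pmatrix},
$$

and

$$
V(\mathbf{NPV}) \approx
\left[\frac{\partial \mathbf{NPV}}{\partial \mathbf{p}^T}\right]
\bigl(\mathrm{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^T\bigr)
\left[\frac{\partial \mathbf{NPV}}{\partial \mathbf{p}^T}\right]^T
=
\left[\frac{\partial \mathbf{NPV}}{\partial \mathbf{p}^T}\right]
\mathrm{diag}(\mathbf{p})
\left[\frac{\partial \mathbf{NPV}}{\partial \mathbf{p}^T}\right]^T
$$

$$
=
\begin{pmatrix}
\dfrac{\mathrm{NPV}_1(1-\mathrm{NPV}_1)}{p_{2\cdot\cdot}} &
\dfrac{p_{221}\,\mathrm{NPV}_1\,\mathrm{NPV}_2 + p_{222}(1-\mathrm{NPV}_1)(1-\mathrm{NPV}_2)}{p_{2\cdot\cdot} \times p_{\cdot 2\cdot}} \\[2ex]
\dfrac{p_{221}\,\mathrm{NPV}_1\,\mathrm{NPV}_2 + p_{222}(1-\mathrm{NPV}_1)(1-\mathrm{NPV}_2)}{p_{2\cdot\cdot} \times p_{\cdot 2\cdot}} &
\dfrac{\mathrm{NPV}_2(1-\mathrm{NPV}_2)}{p_{\cdot 2\cdot}}
\end{pmatrix}.
$$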

Appendix B. The equations for the cell probabilities

The sensitivity for test 1 (SE1) and for test 2 (SE2), and the specificity for test 1 (SP1) and for test 2 (SP2) can be solved from the prevalence of disease π , PPV1, PPV2, NPV1, and NPV2 as below.

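$$
SE_1 = \frac{\mathrm{PPV}_1(\mathrm{NPV}_1 - 1 + \pi)}{\pi(\mathrm{PPV}_1 + \mathrm{NPV}_1 - 1)}, \qquad
SE_2 = \frac{\mathrm{PPV}_2(\mathrm{NPV}_2 - 1 + \pi)}{\pi(\mathrm{PPV}_2 + \mathrm{NPV}_2 - 1)},
$$

$$
SP_1 = \frac{\mathrm{NPV}_1(\mathrm{PPV}_1 - \pi)}{(1-\pi)(\mathrm{PPV}_1 + \mathrm{NPV}_1 - 1)}, \qquad
SP_2 = \frac{\mathrm{NPV}_2(\mathrm{PPV}_2 - \pi)}{(1-\pi)(\mathrm{PPV}_2 + \mathrm{NPV}_2 - 1)}.
$$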


The conditional correlations between test outcomes (Vacek 1985; Berry et al. 2002; Georgiadis et al. 2003) are as below.

$$
\rho_{D+} = \mathrm{corr}(\text{test 1}, \text{test 2} \mid \text{Diseased subjects})
= \frac{\pi\,p_{111} - (p_{111} + p_{121})(p_{111} + p_{211})}{\sqrt{(p_{111} + p_{121})(p_{211} + p_{221})(p_{111} + p_{211})(p_{121} + p_{221})}},
$$

$$
\rho_{D-} = \mathrm{corr}(\text{test 1}, \text{test 2} \mid \text{Non-diseased subjects})
= \frac{(1-\pi)\,p_{222} - (p_{212} + p_{222})(p_{122} + p_{222})}{\sqrt{(p_{212} + p_{222})(p_{112} + p_{122})(p_{122} + p_{222})(p_{112} + p_{212})}}.
$$

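From SE1, SE2, SP1, SP2, ρD+, and ρD−, we can solve for the cell probabilities as below.

$$
\begin{aligned}
p_{111} &= \pi\left(\rho_{D+}\sqrt{SE_1(1-SE_1)\,SE_2(1-SE_2)} + SE_1 SE_2\right), \\
p_{121} &= \pi\,SE_1 - p_{111}, \\
p_{211} &= \pi\,SE_2 - p_{111}, \\
p_{221} &= \pi - p_{111} - p_{121} - p_{211}, \\
p_{222} &= (1-\pi)\left(\rho_{D-}\sqrt{SP_1(1-SP_1)\,SP_2(1-SP_2)} + SP_1 SP_2\right), \\
p_{212} &= (1-\pi)\,SP_1 - p_{222}, \\
p_{122} &= (1-\pi)\,SP_2 - p_{222}, \\
p_{112} &= (1-\pi) - p_{122} - p_{212} - p_{222}.
\end{aligned}
$$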
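The equations of this appendix can be checked with a round trip: start from the predictive values, the prevalence, and the conditional correlations, compute the eight cell probabilities, and recover the inputs from them. The Python sketch below does this for the type I error setting of Sec. 4 (PPV1 = 0.65, PPV2 = 0.75, NPV1 = 0.70, NPV2 = 0.80) with π = 0.5, ρD+ = 0.3, and ρD− = 0.5. The function names are assumptions of this illustration.

```python
from math import sqrt


def cell_probabilities(ppv, npv, prev, rho_pos, rho_neg):
    """All eight p_ijk from the Appendix B equations; keys are the ijk subscripts."""
    se, sp = [], []
    for t in range(2):
        denom = ppv[t] + npv[t] - 1
        se.append(ppv[t] * (npv[t] - 1 + prev) / (prev * denom))
        sp.append(npv[t] * (ppv[t] - prev) / ((1 - prev) * denom))
    p = {}
    p["111"] = prev * (rho_pos * sqrt(se[0] * (1 - se[0]) * se[1] * (1 - se[1]))
                       + se[0] * se[1])
    p["121"] = prev * se[0] - p["111"]
    p["211"] = prev * se[1] - p["111"]
    p["221"] = prev - p["111"] - p["121"] - p["211"]
    p["222"] = (1 - prev) * (rho_neg * sqrt(sp[0] * (1 - sp[0]) * sp[1] * (1 - sp[1]))
                             + sp[0] * sp[1])
    p["212"] = (1 - prev) * sp[0] - p["222"]
    p["122"] = (1 - prev) * sp[1] - p["222"]
    p["112"] = (1 - prev) - p["122"] - p["212"] - p["222"]
    return p


def conditional_correlations(p, prev):
    """Recover (rho_D+, rho_D-) from the cell probabilities."""
    a, b = p["111"] + p["121"], p["111"] + p["211"]
    rho_pos = (prev * p["111"] - a * b) / sqrt(
        a * (p["211"] + p["221"]) * b * (p["121"] + p["221"]))
    c, d = p["212"] + p["222"], p["122"] + p["222"]
    rho_neg = ((1 - prev) * p["222"] - c * d) / sqrt(
        c * (p["112"] + p["122"]) * d * (p["112"] + p["212"]))
    return rho_pos, rho_neg


p = cell_probabilities((0.65, 0.75), (0.70, 0.80), prev=0.5, rho_pos=0.3, rho_neg=0.5)
ppv1_back = (p["111"] + p["121"]) / (p["111"] + p["121"] + p["112"] + p["122"])
npv2_back = (p["122"] + p["222"]) / (p["121"] + p["221"] + p["122"] + p["222"])
rho_back = conditional_correlations(p, prev=0.5)
print(round(sum(p.values()), 10), round(ppv1_back, 10), round(npv2_back, 10))
print(tuple(round(r, 10) for r in rho_back))
```

A parameter combination is only admissible if every resulting cell probability is non-negative, which is worth checking before a scenario is used for simulation or sample size planning.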
