Evaluate Sampling Method and Sample Size


G*Power 3.1 manual, January 21, 2021

This manual is not yet complete. We will be adding help on more tests in the future. If you cannot find help for your test in this version of the manual, then please check the G*Power website to see if a more up-to-date version of the manual has been made available.

Contents

1 Introduction

2 The G*Power calculator

3 Exact: Correlation - Difference from constant (one sample case)

4 Exact: Proportion - difference from constant (one sample case)

5 Exact: Proportion - inequality, two dependent groups (McNemar)

6 Exact: Proportions - inequality of two independent groups (Fisher’s exact-test)

7 Exact test: Multiple Regression - random model

8 Exact: Proportion - sign test

9 Exact: Generic binomial test

10 F test: Fixed effects ANOVA - one way

11 F test: Fixed effects ANOVA - special, main effects and interactions

12 t test: Linear Regression (size of slope, one group)

13 F test: Multiple Regression - omnibus (deviation of R² from zero), fixed model

14 F test: Multiple Regression - special (increase of R²), fixed model

15 F test: Inequality of two Variances

16 t test: Correlation - point biserial model

17 t test: Linear Regression (two groups)

18 t test: Linear Regression (two groups)

19 t test: Means - difference between two dependent means (matched pairs)

20 t test: Means - difference from constant (one sample case)

21 t test: Means - difference between two independent means (two groups)

22 Wilcoxon signed-rank test: Means - difference from constant (one sample case)

23 Wilcoxon signed-rank test: (matched pairs)

24 Wilcoxon-Mann-Whitney test of a difference between two independent means

25 t test: Generic case

26 χ² test: Variance - difference from constant (one sample case)

27 z test: Correlation - inequality of two independent Pearson r’s

28 z test: Correlation - inequality of two dependent Pearson r’s

29 Z test: Multiple Logistic Regression

30 Z test: Poisson Regression

31 Z test: Tetrachoric Correlation

References


1 Introduction

G*Power (Fig. 1 shows the main window of the program) covers statistical power analyses for many different statistical tests from the following families:

• F tests,

• t tests,

• χ² tests,

• z tests, and

• some exact tests.

G*Power provides effect size calculators and graphics options. G*Power supports both a distribution-based and a design-based input mode. It also contains a calculator that supports many central and noncentral probability distributions.

G*Power is free software and is available for Mac OS X and Windows XP/Vista/7/8.

1.1 Types of analysis

G * Power offers five different types of statistical power analysis:

1. A priori (sample size N is computed as a function of power level 1 − β, significance level α, and the to-be-detected population effect size)

2. Compromise (both α and 1 − β are computed as functions of effect size, N, and an error probability ratio q = β/α)

3. Criterion (α and the associated decision criterion are computed as a function of 1 − β, the effect size, and N)

4. Post hoc (1 − β is computed as a function of α, the population effect size, and N)

5. Sensitivity (population effect size is computed as a function of α, 1 − β, and N)
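All five analysis types solve the same relation between α, 1 − β, the effect size, and N for whichever quantity is unknown. To make that concrete, the sketch below uses a deliberately simple case that is not one of G*Power's procedures: a one-sided one-sample z test with known σ, for which 1 − β = 1 − Φ(z(1−α) − d·√N). The function names are hypothetical, and this is an illustration under that assumption, not how G*Power computes its results.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def post_hoc_power(alpha: float, d: float, n: int) -> float:
    """Post hoc: power 1 - beta as a function of alpha, effect size d and N."""
    crit = Z.inv_cdf(1 - alpha)  # decision criterion (critical z)
    return 1 - Z.cdf(crit - d * n ** 0.5)


def a_priori_n(alpha: float, d: float, target: float) -> int:
    """A priori: smallest integer N whose power is not less than the target."""
    n = 1
    while post_hoc_power(alpha, d, n) < target:
        n += 1
    return n


def sensitivity_d(alpha: float, target: float, n: int) -> float:
    """Sensitivity: effect size that is detected with probability `target`."""
    return (Z.inv_cdf(1 - alpha) + Z.inv_cdf(target)) / n ** 0.5


def criterion_alpha(target: float, d: float, n: int) -> float:
    """Criterion: alpha (via the critical z) that yields the target power."""
    crit = d * n ** 0.5 - Z.inv_cdf(target)
    return 1 - Z.cdf(crit)


def compromise(q: float, d: float, n: int) -> tuple[float, float]:
    """Compromise: alpha and beta with beta / alpha = q, by bisection on the critical z."""
    delta = d * n ** 0.5
    lo, hi = -10.0, delta + 10.0
    for _ in range(100):
        crit = (lo + hi) / 2
        alpha, beta = 1 - Z.cdf(crit), Z.cdf(crit - delta)
        if beta < q * alpha:  # beta/alpha still below q: move the criterion up
            lo = crit
        else:
            hi = crit
    return alpha, beta


print(a_priori_n(0.05, 0.5, 0.95))  # a priori N for d = 0.5, alpha = .05, power = .95
print(compromise(1.0, 0.5, 30))     # alpha and beta, approximately equal for q = 1
```

Comparing `beta < q * alpha` instead of dividing avoids a division by zero when α underflows to 0 far out in the tail.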

1.2 Program handling

Performing a power analysis using G*Power typically involves the following three steps:

1. Select the statistical test appropriate for your problem.

2. Choose one of the five types of power analysis available.

3. Provide the input parameters required for the analysis and click "Calculate".

Plot parameters. To help you explore the parameter space relevant to your power analysis, one parameter (α, power (1 − β), effect size, or sample size) can be plotted as a function of another parameter.

1.2.1 Select the statistical test appropriate for your problem

In Step 1, the statistical test is chosen using the distribution-based or the design-based approach.

Distribution-based approach to test selection. First select the family of the test statistic (i.e., exact, F, t, χ², or z test) using the Test family menu in the main window. The Statistical test menu adapts accordingly, showing a list of all tests available for the test family.

Example: For the two-groups t test, first select the test family based on the t distribution.

Then select the Means: Difference between two independent means (two groups) option in the Statistical test menu.

Design-based approach to test selection. Alternatively, one might use the design-based approach. With the Tests pull-down menu in the top row, it is possible to select

• the parameter class the statistical test refers to (i.e., correlations and regression coefficients, means, proportions, or variances), and

• the design of the study (e.g., number of groups, independent vs. dependent samples, etc.).

The design-based approach has the advantage that test options referring to the same parameter class (e.g., means) are located in close proximity, whereas they may be scattered across different distribution families in the distribution-based approach.

Example: In the Tests menu, select Means, then select Two independent groups to specify the two-groups t test.


Figure 1: The main window of G*Power

1.2.2 Choose one of the five types of power analysis available

In Step 2, the Type of power analysis menu in the center of the main window is used to choose the appropriate analysis type; the input and output parameters in the window change accordingly.

Example: If you choose the first item from the Type of power analysis menu the main window will display input and output parameters appropriate for an a priori power analysis (for t tests for independent groups if you followed the example provided in Step 1).

In an a priori power analysis, sample size N is computed as a function of

• the required power level (1 − β),

• the pre-specified significance level α, and

• the population effect size to be detected with probability (1 − β).

In a criterion power analysis, α (and the associated decision criterion) is computed as a function of

• 1 − β,

• the effect size, and

• a given sample size.

In a compromise power analysis both α and 1 − β are computed as functions of

• the effect size,

• N, and

• an error probability ratio q = β/α.

In a post-hoc power analysis the power (1 − β) is computed as a function of


• α,

• the population effect size parameter, and

• the sample size(s) used in a study.

In a sensitivity power analysis the critical population effect size is computed as a function of

• α,

• 1 − β, and

• N.

1.2.3 Provide the input parameters required for the analysis

In Step 3, you specify the power analysis input parameters in the lower left of the main window.

Example: An a priori power analysis for a two-groups t test would require a decision between a one-tailed and a two-tailed test, a specification of Cohen’s (1988) effect size measure d under H1, the significance level α, the required power (1 − β) of the test, and the preferred group size allocation ratio n2/n1.

Let us specify input parameters for

• a one-tailed t test,

• a medium effect size of d = .5,

• α = .05,

• (1 − β) = .95, and

• an allocation ratio of n2/n1 = 1.

This would result in a total sample size of N = 176 (i.e., 88 observation units in each group). The noncentrality parameter δ defining the t distribution under H1, the decision criterion to be used (i.e., the critical value of the t statistic), the degrees of freedom of the t test and the actual power value are also displayed.
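To show what lies behind this a priori result, here is our own sketch (not G*Power's algorithm) for equal group sizes: the critical value comes from the central t distribution with df = 2n − 2, power is the upper tail of the noncentral t distribution with δ = d·√(n/2), and n is increased until the power reaches the target. It uses only the Python standard library and evaluates both t distributions by numerically integrating over the chi-square distributed variance term; the helper names and the integration settings (Simpson's rule, ±20 SD, 1000 steps) are assumptions of this sketch.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def _chi2_pdf(v: float, df: int) -> float:
    """Density of the chi-square distribution with df >= 2 degrees of freedom."""
    if v <= 0.0:
        return 0.5 if df == 2 else 0.0  # density at 0 is 1/2 for df = 2, 0 for df > 2
    k = df / 2.0
    return math.exp((k - 1.0) * math.log(v) - v / 2.0 - k * math.log(2.0) - math.lgamma(k))


def t_sf(c: float, df: int, delta: float = 0.0, steps: int = 1000) -> float:
    """P(T > c) for T = (Z + delta) / sqrt(V / df), Z ~ N(0, 1), V ~ chi-square(df).

    delta = 0 gives the central t distribution, delta != 0 the noncentral one.
    The expectation over V uses composite Simpson's rule (steps must be even)
    over the chi-square mean +/- 20 standard deviations.
    """
    sd = math.sqrt(2.0 * df)
    lo, hi = max(0.0, df - 20.0 * sd), df + 20.0 * sd
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        v = lo + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        tail = 1.0 - _Z.cdf(c * math.sqrt(v / df) - delta)  # P(Z + delta > c*sqrt(v/df))
        total += weight * tail * _chi2_pdf(v, df)
    return total * h / 3.0


def t_crit(alpha: float, df: int) -> float:
    """Upper-tail critical value of the central t distribution, by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2.0
        if t_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


def power_two_groups(d: float, n: int, alpha: float) -> float:
    """Power of the one-tailed two-groups t test with n observations per group."""
    df = 2 * n - 2
    delta = d * math.sqrt(n / 2.0)  # d * sqrt(n1 * n2 / (n1 + n2)) with n1 = n2 = n
    return t_sf(t_crit(alpha, df), df, delta)


def a_priori_n(d: float, alpha: float, target: float) -> int:
    """Smallest n per group with power >= target (doubling, then bisection on n)."""
    hi = 2
    while power_two_groups(d, hi, alpha) < target:
        hi *= 2
    lo = hi // 2  # last size found too small (never evaluated when it is 1)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if power_two_groups(d, mid, alpha) >= target:
            hi = mid
        else:
            lo = mid
    return hi


n = a_priori_n(0.5, 0.05, 0.95)
print(n, 2 * n, round(power_two_groups(0.5, n, 0.05), 4))  # per group, total N, actual power
```

With the inputs listed above, this sketch should find 88 observations per group (N = 176), with an actual power slightly above .95. Bisecting on n is fine here because the power of the t test increases monotonically with n.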

Note that the actual power will often be slightly larger than the pre-specified power in a priori power analyses. The reason is that non-integer sample sizes are always rounded up by G*Power to obtain integer values consistent with a power level not less than the pre-specified one.

Because Cohen’s (1988) book on power analysis appears to be well known in the social and behavioral sciences, we made use of his effect size measures whenever possible. In addition, wherever available, G*Power provides his definitions of "small", "medium", and "large" effects as tool tips. The tool tips may be obtained by moving the cursor over the "effect size" input parameter field (see below). However, note that these conventions may have different meanings for different tests.

Example: The tooltip showing Cohen’s measures for the effect size d used in the two groups t test

If you are not familiar with Cohen’s measures, if you think they are inadequate for your test problem, or if you have more detailed information about the size of the to-be-expected effect (e.g., the results of similar prior studies), then you may want to compute Cohen’s measures from more basic parameters. In this case, click on the Determine button to the left of the effect size input field. A drawer will open next to the main window and provide access to an effect size calculator tailored to the selected test.

Example: For the two-group t test, users can, for instance, specify the means µ1, µ2 and the common standard deviation (σ = σ1 = σ2) in the populations underlying the groups to calculate Cohen’s d = |µ1 − µ2|/σ. Clicking the Calculate and transfer to main window button copies the computed effect size to the appropriate field in the main window.
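The conversion performed by the drawer in this example is a one-line formula. A minimal sketch, with a hypothetical function name:

```python
def cohens_d(mu1: float, mu2: float, sigma: float) -> float:
    """Cohen's d = |mu1 - mu2| / sigma for two populations with common SD sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return abs(mu1 - mu2) / sigma


print(cohens_d(100.0, 105.0, 10.0))  # a mean difference of half an SD
```

A difference of half a standard deviation gives d = 0.5, the "medium" effect used in the a priori example above.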

In addition to the numerical output, G*Power displays the central (H0) and the noncentral (H1) test statistic distributions along with the decision criterion and the associated error probabilities in the upper part of the main window. This supports understanding the effects of the input parameters and is likely to be a useful visualization tool in the teaching of, or the learning about, inferential statistics. The distributions plot may be copied, saved, or printed by clicking the right mouse button inside the plot area.

Example: The menu appearing in the distribution plot for the t-test after right clicking into the plot.

The input and output of each power calculation in a G*Power session are automatically written to a protocol that can be displayed by selecting the "Protocol of power analyses" tab in the main window. You can clear the protocol, or save, print, and copy it in the same way as the distributions plot.

(Part of) the protocol window.

1.2.4 Plotting of parameters

G*Power can generate plots of one of the parameters α, effect size, power, and sample size, depending on a range of values of the remaining parameters.

The Power Plot window (see Fig. 2) is opened by clicking the X-Y plot for a range of values button located in the lower right corner of the main window. To ensure that all relevant parameters have valid values, this button is only enabled if an analysis has successfully been computed (by clicking Calculate).

The main output parameter of the type of analysis selected in the main window is by default selected as the dependent variable y. In an a priori analysis, for instance, this is the sample size.

The button X-Y plot for a range of values at the bottom of the main window opens the plot window.

By selecting the appropriate parameters for the y and the x axis, one parameter (α, power (1 − β), effect size, or sample size) can be plotted as a function of another parameter. Of the remaining two parameters, one can be chosen to draw a family of graphs, while the fourth parameter is kept constant. For instance, power (1 − β) can be drawn as a function of the sample size for several different population effect sizes, keeping α at a particular value.
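The data behind such a family of graphs can be sketched as a table: power is the y variable, total sample size the x variable, each effect size defines one graph, and α stays fixed. Purely as an assumption for this illustration, power is computed with a normal approximation to the one-tailed two-groups test with equal group sizes, so the values differ slightly from G*Power's exact t-based results; the function names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def approx_power(d: float, n_total: int, alpha: float) -> float:
    """Normal approximation: 1 - Phi(z_(1-alpha) - d * sqrt(n1 * n2 / N)), n1 = n2 = N/2."""
    n_per_group = n_total / 2
    delta = d * (n_per_group / 2) ** 0.5
    return 1 - Z.cdf(Z.inv_cdf(1 - alpha) - delta)


def power_table(effect_sizes, sample_sizes, alpha=0.05):
    """One row per total sample size: (N, [power for each effect size]); alpha is held constant."""
    return [(n, [approx_power(d, n, alpha) for d in effect_sizes]) for n in sample_sizes]


for n, row in power_table([0.2, 0.5, 0.8], range(20, 201, 20)):
    print(n, "  ".join(f"{p:.3f}" for p in row))
```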

The plot may be printed, saved, or copied by clicking the right mouse button inside the plot area.

Selecting the Table tab reveals the data underlying the plot (see Fig. 3); they may be copied to other applications by selecting them and using copy and paste.

Note: The Power Plot window inherits all input parameters of the analysis that is active when the X-Y plot for a range of values button is pressed. Only some of these parameters can be directly manipulated in the Power Plot window. For instance, switching from a plot of a two-tailed test to that of a one-tailed test requires choosing the Tail(s): one option in the main window, followed by pressing the X-Y plot for a range of values button.


Figure 2: The plot window of G*Power

Figure 3: The table view of the data for the graphs shown in Fig. 2


2 The G*Power calculator

G*Power contains a simple but powerful calculator that can be opened by selecting the menu label "Calculator" in the main window. Figure 4 shows an example session. This small example script calculates the power for the one-tailed t test for matched pairs and demonstrates most of the available features:

• There can be any number of expressions

• The result is set to the value of the last expression in the script

• Several expressions on a line are separated by a semicolon

• Expressions can be assigned to variables that can be used in following expressions

• The character # starts a comment. The rest of the line following # is ignored

• Many standard mathematical functions like square root, sin, cos, etc. are supported (for a list, see below)

• Many important statistical distributions are supported (see list below)

• The script can be easily saved and loaded. In this way a number of useful helper scripts can be created.

The calculator supports the following arithmetic operations (shown in descending precedence):

• Power: ^ (2^3 = 8)

• Multiply: ∗ (2 ∗ 2 = 4)

• Divide: / (6/2 = 3)

• Plus: + (2 + 3 = 5)

• Minus: - (3 − 2 = 1)

Supported general functions

• abs(x) - Absolute value |x|

• sin(x) - Sine of x

• asin(x) - Arc sine of x

• cos(x) - Cosine of x

• acos(x) - Arc cosine of x

• tan(x) - Tangent of x

• atan(x) - Arc tangent of x

• atan2(x,y) - Arc tangent of y/x

• exp(x) - Exponential e^x

• log(x) - Natural logarithm ln(x)

• sqrt(x) - Square root √x

• sqr(x) - Square x²

• sign(x) - Sign of x: x < 0 → −1, x = 0 → 0, x > 0 → 1.

• lngamma(x) - Natural logarithm of the gamma function ln(Γ(x))

• frac(x) - Fractional part of floating point x: frac(1.56) is 0.56.

• int(x) - Integer part of floating point x: int(1.56) is 1.

• min(x,y) - Minimum of x and y

• max(x,y) - Maximum of x and y

• uround(x,m) - Round x up to a multiple of m: uround(2.3, 1) is 3, uround(2.3, 2) is 4.
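To cross-check calculator scripts elsewhere, some of these helper functions have short Python counterparts. This is a sketch under our own assumptions: int and frac are taken to truncate toward zero for negative x (the examples above only cover positive x), and int is renamed int_part to avoid shadowing Python's built-in. Note also that the calculator's power operator ^ is written ** in Python, where ^ means bitwise XOR.

```python
import math


def frac(x: float) -> float:
    """Fractional part: frac(1.56) is 0.56, up to floating-point rounding."""
    return x - math.trunc(x)


def int_part(x: float) -> int:
    """Integer part, assumed to truncate toward zero: int_part(1.56) is 1."""
    return math.trunc(x)


def uround(x: float, m: float) -> float:
    """Round x up to a multiple of m: uround(2.3, 1) is 3, uround(2.3, 2) is 4."""
    return math.ceil(x / m) * m


def sign(x: float) -> int:
    """-1 for x < 0, 0 for x == 0, 1 for x > 0."""
    return (x > 0) - (x < 0)


print(2 ** 3, frac(1.56), int_part(1.56), uround(2.3, 1), uround(2.3, 2), sign(-0.5))
```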

Supported distribution functions (CDF = cumulative distribution function, PDF = probability density function, Quantile = inverse of the CDF). For information about the properties of these distributions, check http://mathworld.wolfram.com/.

• zcdf(x) - CDF, zpdf(x) - PDF, zinv(p) - Quantile of the standard normal distribution.

• normcdf(x,m,s) - CDF, normpdf(x,m,s) - PDF, norminv(p,m,s) - Quantile of the normal distribution with mean m and standard deviation s.

• chi2cdf(x,df) - CDF, chi2pdf(x,df) - PDF, chi2inv(p,df) - Quantile of the chi-square distribution with df degrees of freedom: χ²_df(x).

• fcdf(x,df1,df2) - CDF, fpdf(x,df1,df2) - PDF, finv(p,df1,df2) - Quantile of the F distribution with df1 numerator and df2 denominator degrees of freedom: F_df1,df2(x).

• tcdf(x,df) - CDF, tpdf(x,df) - PDF, tinv(p,df) - Quantile of the Student t distribution with df degrees of freedom: t_df(x).

• ncx2cdf(x,df,nc) - CDF, ncx2pdf(x,df,nc) - PDF, ncx2inv(p,df,nc) - Quantile of the noncentral chi-square distribution with df degrees of freedom and noncentrality parameter nc.

• ncfcdf(x,df1,df2,nc) - CDF, ncfpdf(x,df1,df2,nc) - PDF, ncfinv(p,df1,df2,nc) - Quantile of the noncentral F distribution with df1 numerator and df2 denominator degrees of freedom and noncentrality parameter nc.


Figure 4: The G*Power calculator

• nctcdf(x,df,nc) - CDF, nctpdf(x,df,nc) - PDF, nctinv(p,df,nc) - Quantile of the noncentral Student t distribution with df degrees of freedom and noncentrality parameter nc.

• betacdf(x,a,b) - CDF, betapdf(x,a,b) - PDF, betainv(p,a,b) - Quantile of the beta distribution with shape parameters a and b.

• poisscdf(x,λ) - CDF, poisspdf(x,λ) - PDF, poissinv(p,λ) - Quantile, poissmean(x,λ) - Mean of the Poisson distribution with mean λ.

• binocdf(x,N,π) - CDF, binopdf(x,N,π) - PDF, binoinv(p,N,π) - Quantile of the binomial distribution for sample size N and success probability π.

• hygecdf(x,N,ns,nt) - CDF, hygepdf(x,N,ns,nt) - PDF, hygeinv(p,N,ns,nt) - Quantile of the hypergeometric distribution for samples of size N from a population of total size nt with ns successes.

• corrcdf(r,ρ,N) - CDF, corrpdf(r,ρ,N) - PDF, corrinv(p,ρ,N) - Quantile of the distribution of the sample correlation coefficient r for population correlation ρ and samples of size N.

• mr2cdf(R²,ρ²,k,N) - CDF, mr2pdf(R²,ρ²,k,N) - PDF, mr2inv(p,ρ²,k,N) - Quantile of the distribution of the sample squared multiple correlation coefficient R² for population squared multiple correlation coefficient ρ², k − 1 predictors, and samples of size N.

• logncdf(x,m,s) - CDF, lognpdf(x,m,s) - PDF, logninv(p,m,s) - Quantile of the log-normal distribution, where m, s denote mean and standard deviation of the associated normal distribution.

• laplcdf(x,m,s) - CDF, laplpdf(x,m,s) - PDF, laplinv(p,m,s) - Quantile of the Laplace distribution, where m, s denote location and scale parameter.

• expcdf(x,λ) - CDF, exppdf(x,λ) - PDF, expinv(p,λ) - Quantile of the exponential distribution with parameter λ.

• unicdf(x,a,b) - CDF, unipdf(x,a,b) - PDF, uniinv(p,a,b) - Quantile of the uniform distribution on the interval [a, b].
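Several of the distribution functions listed above can be reproduced with the Python standard library, which is handy for checking calculator results. The sketch below covers only the normal, binomial, Poisson, and hypergeometric cases and follows the calculator's argument order; it assumes integer counts x, λ > 0, and valid hypergeometric arguments, and it is not G*Power's implementation (the noncentral distributions need numerical methods beyond a short example).

```python
import math
from statistics import NormalDist


def zcdf(x: float) -> float:
    return NormalDist().cdf(x)


def zinv(p: float) -> float:
    return NormalDist().inv_cdf(p)


def normcdf(x: float, m: float, s: float) -> float:
    return NormalDist(m, s).cdf(x)


def binopdf(x: int, n: int, pi: float) -> float:
    return math.comb(n, x) * pi**x * (1 - pi) ** (n - x)


def binocdf(x: int, n: int, pi: float) -> float:
    return sum(binopdf(i, n, pi) for i in range(x + 1))


def poisspdf(x: int, lam: float) -> float:
    # evaluated on the log scale so that large x does not overflow
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def poisscdf(x: int, lam: float) -> float:
    return sum(poisspdf(i, lam) for i in range(x + 1))


def hygepdf(x: int, n: int, ns: int, nt: int) -> float:
    """P(x successes) in a sample of size n from nt items, ns of which are successes."""
    return math.comb(ns, x) * math.comb(nt - ns, n - x) / math.comb(nt, n)


print(zinv(0.975), binocdf(2, 4, 0.5), poisscdf(0, 2.5), hygepdf(1, 2, 2, 4))
```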


3 Exact: Correlation - Difference from constant (one sample case)

The null hypothesis is that in the population the true correlation ρ between two bivariate normally distributed random variables has the fixed value ρ0. The (two-sided) alternative hypothesis is that the correlation coefficient has a different value, ρ ≠ ρ0:

H0: ρ − ρ0 = 0
H1: ρ − ρ0 ≠ 0.

A common special case is ρ0 = 0 (see e.g. Cohen, 1969, Chap. 3). The two-sided test (“two tails”) should be used if there is no restriction on the direction of the deviation of the sample r from ρ0. Otherwise use the one-sided test (“one tail”).

3.1 Effect size index

To specify the effect size, the conjectured alternative correlation coefficient ρ should be given. ρ must conform to the following restrictions: −1 + ε < ρ < 1 − ε, with ε = 10⁻⁶. The proper effect size is the difference between ρ and ρ0: ρ − ρ0. Zero effect sizes are not allowed in a priori analyses. G*Power therefore imposes the additional restriction that |ρ − ρ0| > ε in this case.

For the special case ρ0 = 0, Cohen (1969, p.76) defines the following effect size conventions:

• small ρ = 0.1

• medium ρ = 0.3

• large ρ = 0.5

Pressing the Determine button on the left side of the effect size label opens the effect size drawer (see Fig. 5). You can use it to calculate |ρ| from the coefficient of determination r².

Figure 5: Effect size dialog to determine the coefficient of determination from the correlation coefficient ρ.

3.2 Options

The procedure uses either the exact distribution of the correlation coefficient or a large sample approximation based on the z distribution. The options dialog offers the following choices:

1. Use exact distribution if N < x. The computation time of the exact distribution increases with N, whereas that of the approximation does not. Both procedures are asymptotically identical, that is, they produce essentially the same results if N is large. Therefore, a threshold value x for N can be specified that determines the transition between both procedures. The exact procedure is used if N < x, the approximation otherwise.

2. Use large sample approximation (Fisher Z). With this option, the approximation is always used.

There are two properties of the output that can be used to discern which of the procedures was actually used: The option field of the output in the protocol, and the naming of the critical values in the main window, in the distribution plot, and in the protocol (r is used for the exact distribution and z for the approximation).

3.3 Examples

In the null hypothesis we assume ρ0 = 0.60 to be the correlation coefficient in the population. We further assume that our treatment increases the correlation to ρ = 0.65. If we require α = β = 0.05, how many subjects do we need in a two-sided test?

• Select Type of power analysis: A priori

• Options
  Use exact distribution if N <: 10000

• Input
  Tail(s): Two
  Correlation ρ H1: 0.65
  α err prob: 0.05
  Power (1-β err prob): 0.95
  Correlation ρ H0: 0.60

• Output
  Lower critical r: 0.570748
  Upper critical r: 0.627920
  Total sample size: 1928
  Actual power: 0.950028

In this case we would reject the null hypothesis if we observed a sample correlation coefficient outside the interval [0.571, 0.628]. The total sample size required to ensure a power (1 − β) ≥ 0.95 is 1928; the actual power for this N is 0.950028.

In the example just discussed, using the large sample approximation leads to almost the same sample size, N = 1929. In fact, the approximation is very good in most cases. We now consider a small sample case, where the deviation is more pronounced: in a post hoc analysis of a two-sided test with ρ0 = 0.8, ρ = 0.3, sample size 8, and α = 0.05, the exact power is 0.482927, whereas the approximation gives the lower value 0.422599.

3.4 Related tests

Similar tests in G*Power 3.0:

• Correlation: Point biserial model

• Correlations: Two independent Pearson r’s (two samples)


3.5 Implementation notes

Exact distribution. The H0-distribution is the sample correlation coefficient distribution sr(ρ0, N), and the H1-distribution is sr(ρ, N), where N denotes the total sample size, ρ0 denotes the value of the baseline correlation assumed in the null hypothesis, and ρ denotes the ‘alternative correlation’. The (implicit) effect size is ρ − ρ0. The algorithm described in Barabesi and Greco (2002) is used to calculate the CDF of the sample correlation coefficient distribution.

Large sample approximation. The H0-distribution is the standard normal distribution N(0, 1), and the H1-distribution is N((Fz(ρ) − Fz(ρ0))/σ, 1), with Fz(r) = ln((1 + r)/(1 − r))/2 (Fisher z transformation) and σ = √(1/(N − 3)).
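The large sample approximation is simple enough to reproduce directly. The following sketch (our own code with hypothetical function names, not G*Power's) computes the two-sided power from the two normal distributions given above and searches upward for the smallest N that reaches the target power. For the example in Section 3.3 it should give N = 1929, and for the small sample case (ρ0 = 0.8, ρ = 0.3, N = 8, α = 0.05) a power of about 0.4226, in line with the approximate values reported there.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def fisher_z(r: float) -> float:
    """Fisher z transformation Fz(r) = ln((1 + r) / (1 - r)) / 2, i.e. atanh(r)."""
    return math.atanh(r)


def power_fisher(rho0: float, rho: float, n: int, alpha: float) -> float:
    """Two-sided power with H0 ~ N(0, 1) and H1 ~ N((Fz(rho) - Fz(rho0)) / sigma, 1)."""
    mu = (fisher_z(rho) - fisher_z(rho0)) * math.sqrt(n - 3)  # division by sigma = sqrt(1/(N-3))
    crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(mu - crit) + Z.cdf(-mu - crit)  # upper plus lower rejection region


def a_priori_n(rho0: float, rho: float, alpha: float, target: float) -> int:
    """Smallest total sample size N (N > 3) whose approximate power reaches the target."""
    n = 4
    while power_fisher(rho0, rho, n, alpha) < target:
        n += 1
    return n


print(a_priori_n(0.60, 0.65, 0.05, 0.95))
print(round(power_fisher(0.8, 0.3, 8, 0.05), 6))
```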

3.6 Validation

The results in the special case of ρ0 = 0 were compared with the tabulated values published in Cohen (1969). The results in the general case were checked against the values produced by PASS (Hintze, 2006).


4 Exact: Proportion - difference from constant (one sample case)

The problem considered in this case is whether the probability π of an event in a given population has the constant value π0 (null hypothesis). The null and the alternative hypothesis can be stated as:

H0: π − π0 = 0
H1: π − π0 ≠ 0.

A two-tailed binomial test should be performed to test this undirected hypothesis. If it is possible to predict a priori the direction of the deviation of sample proportions p from π0, e.g. p − π0 < 0, then a one-tailed binomial test should be chosen.

4.1 Effect size index

The effect size g is defined as the deviation from the constant probability π0, that is, g = π − π0.

The definition of g implies the following restriction: ε ≤ (π0 + g) ≤ 1 − ε. In an a priori analysis we need to respect the additional restriction |g| > ε (this is in accordance with the general rule that zero effect hypotheses are undefined in a priori analyses). In both constraints, G*Power sets ε = 10⁻⁶.

Pressing the Determine button on the left side of the effect size label opens the effect size drawer:

You can use this dialog to calculate the effect size g from π0 (called P1 in the dialog above) and π (called P2 in the dialog above) or from several relations between them. If you open the effect dialog, the value of P1 is set to the value in the constant proportion input field in the main window. There are four different ways to specify P2:

1. Direct input: Specify P2 in the corresponding input field below P1

2. Difference: Choose difference P2-P1 and insert the difference into the text field on the left side (the difference is identical to g).

3. Ratio: Choose ratio P2/P1 and insert the ratio value into the text field on the left side

4. Odds ratio: Choose odds ratio and insert the odds ratio (P2/(1 − P2))/(P1/(1 − P1)) between P1 and P2 into the text field on the left side.
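The four input modes amount to simple conversions from P1 and the relational value to P2, and from there to g = P2 − P1. A sketch with a hypothetical function name and mode strings of our own choosing; with P1 = 0.65, all three relational modes below describe the same P2 = 0.8:

```python
def p2_from(p1: float, mode: str, value: float) -> float:
    """Return P2 given P1 and one of the four input modes (mode names are our own)."""
    if mode == "direct":  # value is P2 itself
        return value
    if mode == "difference":  # value = P2 - P1, identical to g
        return p1 + value
    if mode == "ratio":  # value = P2 / P1
        return p1 * value
    if mode == "odds ratio":  # value = (P2 / (1 - P2)) / (P1 / (1 - P1))
        odds2 = value * p1 / (1 - p1)
        return odds2 / (1 + odds2)
    raise ValueError(f"unknown mode: {mode!r}")


p1 = 0.65
for mode, value in [("difference", 0.15), ("ratio", 0.8 / 0.65), ("odds ratio", 4.0 / (0.65 / 0.35))]:
    p2 = p2_from(p1, mode, value)
    print(mode, round(p2, 6), "g =", round(p2 - p1, 6))
```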

The relational value given in the input field on the left side and the two proportions given in the two input fields on the right side are automatically synchronized if you leave one of the input fields. You may also press the Sync values button to synchronize manually.

Press the Calculate button to preview the effect size g resulting from your input values. Press the Transfer to main window button to (1) calculate the effect size g = π − π0 = P2 − P1 and (2) change, in the main window, the Constant proportion field to P1 and the Effect size g field to g as calculated.

4.2 Options

The binomial distribution is discrete. It is thus not normally possible to arrive exactly at the nominal α-level. For two-sided tests, this raises the problem of how to "distribute" α between the two sides. G*Power offers the three options listed here, the first option being selected by default:

1. Assign α/2 to both sides: Both sides are handled independently in exactly the same way as in a one-sided test. The only difference is that α/2 is used instead of α. Of the three options offered by G*Power, this one leads to the greatest deviation of the actual α from the nominal α (in post hoc analyses).

2. Assign to minor tail α/2, then rest to major tail (α2 = α/2, α1 = α − α2): First α/2 is applied to the side of the central distribution that is farther away from the noncentral distribution (minor tail). The criterion used for the other side is then α − α1, where α1 is the actual α found on the minor side. Since α1 ≤ α/2, one can conclude that (in post hoc analyses) the sum of the actual values α1 + α2 is in general closer to the nominal α-level than it would be if α/2 were assigned to both sides (see Option 1).

3. Assign α/2 to both sides, then increase to minimize the difference of α1 + α2 to α: The first step is exactly the same as in Option 1. Then, in the second step, the critical values on both sides of the distribution are increased (using the lower of the two potential incremental α values) until the sum of both actual α values is as close as possible to the nominal α.

Press the Options button in the main window to select one of these options.
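Options 1 and 2 can be sketched for the two-sided binomial test as follows. This is our own illustration, not G*Power's code: the function names are hypothetical, Option 3 is omitted, and the minor tail is taken to be the lower tail when π > π0 and the upper tail otherwise. The sketch returns the lower and upper critical counts (reject H0 if X ≤ lower or X ≥ upper), the actual α, and the power.

```python
from math import comb


def pmf(i: int, n: int, p: float) -> float:
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def lower_crit(n: int, p: float, a: float) -> tuple[int, float]:
    """Largest k with P(X <= k) <= a (k = -1 if none); returns (k, actual tail probability)."""
    k, tail = -1, 0.0
    while k + 1 <= n and tail + pmf(k + 1, n, p) <= a:
        k += 1
        tail += pmf(k, n, p)
    return k, tail


def upper_crit(n: int, p: float, a: float) -> tuple[int, float]:
    """Smallest k with P(X >= k) <= a (k = n + 1 if none); returns (k, actual tail probability)."""
    k, tail = n + 1, 0.0
    while k - 1 >= 0 and tail + pmf(k - 1, n, p) <= a:
        k -= 1
        tail += pmf(k, n, p)
    return k, tail


def two_sided(n: int, pi0: float, pi1: float, alpha: float, option: int = 1):
    if option == 1:  # alpha/2 on each side, handled independently
        kl, al = lower_crit(n, pi0, alpha / 2)
        ku, au = upper_crit(n, pi0, alpha / 2)
    elif pi1 > pi0:  # option 2: the lower tail is the minor one
        kl, al = lower_crit(n, pi0, alpha / 2)
        ku, au = upper_crit(n, pi0, alpha - al)
    else:  # option 2: the upper tail is the minor one
        ku, au = upper_crit(n, pi0, alpha / 2)
        kl, al = lower_crit(n, pi0, alpha - au)
    power = sum(pmf(i, n, pi1) for i in range(kl + 1)) + sum(pmf(i, n, pi1) for i in range(ku, n + 1))
    return kl, ku, al + au, power


for option in (1, 2):
    print(option, two_sided(20, 0.65, 0.80, 0.05, option))
```

Because Option 2 gives the major tail α − α1 ≥ α/2, its actual α (and power) can only match or exceed that of Option 1 while staying within the nominal α.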

4.3 Examples

We assume a constant proportion π0 = 0.65 in the population and an effect size g = 0.15, i.e. π = 0.65 + 0.15 = 0.8. We want to know the power of a one-sided test given α = .05 and a total sample size of N = 20.

• Select Type of power analysis: Post hoc

• Options
  Alpha balancing in two-sided tests: Assign α/2 on both sides


Figure 6: Distribution plot for the example (see text)

• Input
  Tail(s): One
  Effect size g: 0.15
  α err prob: 0.05
  Total sample size: 20
  Constant proportion: 0.65

• Output
  Lower critical N: 17
  Upper critical N: 17
  Power (1-β err prob): 0.411449
  Actual α: 0.044376

The results show that we should reject the null hypothesis of π = 0.65 if the relevant event is observed in at least 17 of the 20 cases. Using this criterion, the actual α is 0.044, that is, it is slightly lower than the requested α of 5%. The power is 0.41.
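The numbers in this example can be reproduced with exact binomial arithmetic. In the sketch below (our own code, hypothetical function names), the critical count is the smallest k whose upper tail probability under H0 does not exceed α, the actual α is that tail probability, and the power is the same tail probability under H1 with π = π0 + g:

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); equals 0 for k > n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def one_sided_binomial(n: int, pi0: float, g: float, alpha: float) -> tuple[int, float, float]:
    """Upper one-sided exact binomial test of pi0 against pi0 + g (g > 0).

    Returns (critical count, actual alpha, power). The actual alpha never
    exceeds the nominal one because k is the smallest count whose H0 tail
    probability is <= alpha.
    """
    k = next(k for k in range(n + 2) if upper_tail(k, n, pi0) <= alpha)
    return k, upper_tail(k, n, pi0), upper_tail(k, n, pi0 + g)


print(one_sided_binomial(20, 0.65, 0.15, 0.05))
```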

Figure 6 shows the distribution plots for the example. The red and blue curves show the binomial distribution under H0 and H1, respectively. The vertical line is positioned at the critical value N = 17. The horizontal portions of the graph should be interpreted as the tops of bars ranging from N − 0.5 to N + 0.5 around an integer N, where the height of each bar corresponds to p(N).

We now use the graphics window to plot power values for a range of sample sizes. Press the X-Y plot for a range of values button at the bottom of the main window to open the Power Plot window. We select to plot the power as a function of total sample size. We choose a range of sample sizes from 10 through 50 in steps of 1. Next, we select to plot just one graph with α = 0.05 and effect size g = 0.15. Pressing the Draw Plot button produces the plot shown in Fig. 7. It can be seen that the power does not increase monotonically but in a zig-zag fashion. This behavior is due to the discrete nature of the binomial distribution, which prevents arbitrary α values from being realized. Thus, the curve should not be interpreted to show that the power for a fixed α sometimes decreases with increasing sample size. The real reason for the non-monotonic behavior is that the actual α level that can be realized deviates more or less from the nominal α level for different sample sizes.

This non-monotonic behavior of the power curve poses a problem if we want to determine, in an a priori analysis, the minimal sample size needed to achieve a certain power. In these cases G*Power always tries to find the lowest sample size for which the power is not less than the specified value. In the case depicted in Fig. 7, for instance, G*Power would choose N = 16 as the result of a search for the sample size that leads to a power of at least 0.3. All types of power analyses except post hoc are confronted with similar problems. To ensure that the intended result has been found, we recommend checking the results from these types of power analysis with a power vs. sample size plot.
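Because of the zig-zag shape, such a search cannot simply bisect on N; it has to scan sample sizes in increasing order and stop at the first one whose power reaches the target. A sketch of that scan for the example above (π0 = 0.65, π = 0.8, one-sided, α = 0.05), using our own hypothetical helper names; it also lists the zig-zag points where the plotted power at N + 1 is lower than at N:

```python
from math import comb
from typing import Optional


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def power_upper(n: int, pi0: float, pi1: float, alpha: float) -> float:
    """Power of the upper one-sided exact binomial test with actual alpha <= alpha."""
    k = next(k for k in range(n + 2) if upper_tail(k, n, pi0) <= alpha)
    return upper_tail(k, n, pi1)


def a_priori_n(pi0: float, pi1: float, alpha: float, target: float, n_max: int = 10_000) -> Optional[int]:
    """Lowest N whose power is not less than the target, found by a linear scan.

    Bisection on N would assume that power increases monotonically with N,
    which the zig-zag curve does not guarantee.
    """
    for n in range(1, n_max + 1):
        if power_upper(n, pi0, pi1, alpha) >= target:
            return n
    return None


powers = {n: power_upper(n, 0.65, 0.80, 0.05) for n in range(10, 51)}
# drops reflect changes in the realizable actual alpha, not a loss of power at fixed alpha
zigzag = [n for n in range(10, 50) if powers[n + 1] < powers[n]]
print("power at N + 1 lower than at N for N =", zigzag)
print("a priori N for power >= 0.3:", a_priori_n(0.65, 0.80, 0.05, 0.3))
```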

4.4 Related tests

Similar tests in G*Power 3.0:

• Proportions: Sign test.

4.5 Implementation notes

The H0-distribution is the binomial distribution B(N, π0), and the H1-distribution is the binomial distribution B(N, g + π0). N denotes the total sample size, π0 the constant proportion assumed in the null hypothesis, and g the effect size index as defined above.

4.6 Validation

The results of G*Power for the special case of the sign test, that is π0 = 0.5, were checked against the tabulated values given in Cohen (1969, chapter 5). Cohen always chose from the realizable α values the one that is closest to the nominal value, even if it is larger than the nominal value. G*Power, in contrast, always requires the actual α not to exceed the nominal value. In cases where the α value chosen by Cohen happens to be lower than the nominal α, the results computed with G*Power were very similar to the tabulated values. In the other cases, the power values computed by G*Power were lower than the tabulated ones.

In the general case (π0 ≠ 0.5) the results of post hoc analyses for a number of parameters were checked against the results produced by PASS (Hintze, 2006). No differences were found in one-sided tests. The results for two-sided tests were also identical if the alpha balancing method "Assign α/2 to both sides" was chosen in G*Power.


Figure 7: Plot of power vs. sample size in the binomial test (see text)


5 Exact: Proportion - inequality, two dependent groups (McNemar)

This procedure relates to tests of paired binary responses. Such data can be represented in a 2 × 2 table:

                    Standard
                    Yes      No
Treatment   Yes     π11      π12       πt
            No      π21      π22       1 − πt
                    πs       1 − πs    1

where πij denotes the probability of the respective response. The probability πD of discordant pairs, that is, the probability of yes/no-response pairs, is given by πD = π12 + π21. The hypothesis of interest is that πs = πt, which is formally identical to the statement π12 = π21.

Using this fact, the null hypothesis states (in a ratio notation) that π12 is identical to π21, and the alternative hypothesis states that π12 and π21 are different:

H0: π12/π21 = 1
H1: π12/π21 ≠ 1.

In the context of the McNemar test the term odds ratio (OR) denotes the ratio π12/π21 that is used in the formulation of H0 and H1.

5.1 Effect size index

The odds ratio π12/π21 is used to specify the effect size. The odds ratio must lie inside the interval [10^−6, 10^6]. An odds ratio of 1 corresponds to a null effect; therefore, this value must not be used in a priori analyses.

In addition to the odds ratio, the proportion of discordant pairs, i.e. πD, must be given in the input parameter field called Prop discordant pairs. The values for this proportion must lie inside the interval [ε, 1 − ε], with ε = 10^−6.

If πD and d = π12 − π21 are given, then the odds ratio may be calculated as OR = (πD + d)/(πD − d), since πD + d = 2π12 and πD − d = 2π21.
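Because πD = π12 + π21 and d = π12 − π21, the cell probabilities π12 and π21 can be moved back and forth between the (OR, πD) and (πD, d) parameterizations. The sketch below uses hypothetical helper names and simply solves these two linear relations.

```python
def odds_ratio_from_discordant(pi_d: float, d: float) -> float:
    """OR = pi12 / pi21 from pi_D = pi12 + pi21 and d = pi12 - pi21."""
    return (pi_d + d) / (pi_d - d)


def cell_probs_from_odds_ratio(odds_ratio: float, pi_d: float):
    """Recover (pi12, pi21) from OR = pi12 / pi21 and pi_D = pi12 + pi21."""
    pi21 = pi_d / (1 + odds_ratio)
    pi12 = odds_ratio * pi21
    return pi12, pi21


print(cell_probs_from_odds_ratio(0.25, 0.4))
```

With OR = 0.25 and πD = 0.4 this gives π12 ≈ 0.08 and π21 ≈ 0.32, the values reported as Proportion p12 and p21 in the example in 5.3.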

5.2 Options

Press the Options button in the main window to select one of the following options.

5.2.1 Alpha balancing in two-sided tests

The binomial distribution is discrete. It is therefore not normally possible to arrive at the exact nominal α-level. For two-sided tests this raises the question of how to “distribute” α to the two sides. G*Power offers the three options listed here, the first option being selected by default:

1. Assign α/2 to both sides: Both sides are handled independently in exactly the same way as in a one-sided test. The only difference is that α/2 is used instead of α. Of the three options offered by G*Power, this one leads to the greatest deviation of the actual α from the nominal α (in post hoc analyses).

2. Assign to minor tail α/2, then rest to major tail (α2 = α/2, α1 = α − α2): First α/2 is applied to the side of the central distribution that is farther away from the noncentral distribution (minor tail). The criterion used on the other side is then α − α2, where α2 is the actual α found on the minor side. Since α2 ≤ α/2, one can conclude that (in post hoc analyses) the sum of the actual values α1 + α2 is in general closer to the nominal α-level than it would be if α/2 were assigned to both sides (see Option 1).

3. Assign α/2 to both sides, then increase to minimize the difference of α1 + α2 to α: The first step is exactly the same as in Option 1. Then, in the second step, the critical values on both sides of the distribution are increased (using the lower of the two potential incremental α-values) until the sum of both actual α values is as close as possible to the nominal α.
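Options 1 and 2 can be sketched for a single exact binomial test with null distribution B(n, p0) and alternative proportion p1. This is a minimal illustration, not G*Power's code: the function names are hypothetical, the minor tail is assumed to be the one on the opposite side of p0 from p1, and Option 3 is not sketched because its incremental search is only described informally above.

```python
from math import comb


def pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def lower_region(n: int, p0: float, a: float):
    """Largest c with P(X <= c | p0) <= a; returns (c, actual tail). c = -1 means empty."""
    c, tail = -1, 0.0
    while c < n and tail + pmf(c + 1, n, p0) <= a:
        c += 1
        tail += pmf(c, n, p0)
    return c, tail


def upper_region(n: int, p0: float, a: float):
    """Smallest c with P(X >= c | p0) <= a; returns (c, actual tail). c = n + 1 means empty."""
    c, tail = n + 1, 0.0
    while c > 0 and tail + pmf(c - 1, n, p0) <= a:
        c -= 1
        tail += pmf(c, n, p0)
    return c, tail


def two_sided_test(n: int, p0: float, p1: float, alpha: float, option: int):
    """Critical values, total actual alpha and power for balancing options 1 and 2."""
    if option == 1:  # alpha/2 on each side, independently
        lo, a_lo = lower_region(n, p0, alpha / 2)
        hi, a_hi = upper_region(n, p0, alpha / 2)
    elif option == 2 and p1 > p0:  # the lower tail is the minor tail
        lo, a_lo = lower_region(n, p0, alpha / 2)
        hi, a_hi = upper_region(n, p0, alpha - a_lo)
    elif option == 2:  # the upper tail is the minor tail
        hi, a_hi = upper_region(n, p0, alpha / 2)
        lo, a_lo = lower_region(n, p0, alpha - a_hi)
    else:
        raise ValueError("only options 1 and 2 are sketched here")
    power = sum(pmf(k, n, p1) for k in range(lo + 1)) + sum(
        pmf(k, n, p1) for k in range(hi, n + 1)
    )
    return lo, hi, a_lo + a_hi, power
```

With n = 20, p0 = 0.5 and Option 1, the critical values are 5 and 15 and the total actual α is 43400/1048576 ≈ 0.0414. Because the major tail in Option 2 receives a budget of at least α/2, its total actual α is never smaller than that of Option 1.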

5.2.2 Computation

You may choose between an exact procedure and a faster approximation (see implementation notes for details):

1. Exact (unconditional) power if N < x. The computation time of the exact procedure increases much faster with sample size N than that of the approximation. Given that both procedures usually produce very similar results for large sample sizes, a threshold value x for N can be specified which determines the transition between both procedures. The exact procedure is used if N < x; the approximation is used otherwise. Note: G*Power does not show distribution plots for exact computations.

2. Faster approximation (assumes number of discordant pairs to be constant). Choosing this option instructs G*Power to always use the approximation.

5.3 Examples

As an example we replicate the computations in O’Brien (2002, pp. 161–163). The assumed table is:

                          Standard
                          Yes       No
Treatment     Yes         .54       .08        .62
              No          .32       .06        .38
                          .86       .14        1

In this table the proportion of discordant pairs is πD = .32 + .08 = 0.4 and the odds ratio OR = π12/π21 = 0.08/0.32 = 0.25. We want to compute the exact power for a one-sided test. The sample size N, that is, the number of pairs, is 50 and α = 0.05.

• Select Type of power analysis: Post hoc

• Options Computation: Exact

• Input
  Tail(s): One
  Odds ratio: 0.25
  α err prob: 0.05
  Total sample size: 50
  Prop discordant pairs: 0.4


• Output
  Power (1-β err prob): 0.839343
  Actual α: 0.032578
  Proportion p12: 0.08
  Proportion p21: 0.32

The power calculated by G*Power (0.839343) corresponds within the given precision to the result computed by O’Brien (0.839). Now we use the Power Plot window to calculate the power for several other sample sizes and to generate a graph that gives us an overview of a section of the parameter space. The Power Plot window can be opened by pressing the X-Y plot for a range of values button in the lower part of the main window.

In the Power Plot window we choose to plot the power on the Y-axis (with markers and displaying the values in the plot) as a function of total sample size. The sample sizes range from 50 to 150 in steps of 25. We choose to draw a single plot. We specify α = 0.05 and Odds ratio = 0.25.

The results shown in Figure 8 replicate exactly the values in the table in O’Brien (2002, p. 163).

To replicate the values for the two-sided case, we must decide how the α error should be distributed to the two sides. The method chosen by O’Brien corresponds to Option 2 in G*Power (“Assign to minor tail α/2, then rest to major tail”, see above). In the main window, we select Tail(s) "Two" and set the other input parameters exactly as shown in the example above. For sample sizes 50, 75, 100, 125, 150 we get power values 0.798241, 0.930639, 0.980441, 0.994839, and 0.998658, respectively, which are again equal to the values given in O’Brien’s table.

5.4 Related tests

5.5 Implementation notes

Exact (unconditional) test. In this case G*Power calculates the unconditional power for the exact conditional test: The number of discordant pairs ND is a random variable with binomial distribution B(N, πD), where N denotes the total number of pairs, and πD = π12 + π21 denotes the probability of discordant pairs. Thus

$$P(N_D) = \binom{N}{N_D} (\pi_{12} + \pi_{21})^{N_D} (\pi_{11} + \pi_{22})^{N - N_D}.$$

Conditional on ND, the frequency f12 has a binomial distribution B(ND, π0 = π12/πD), and we test H0: π0 = 0.5. Given the conditional ‘binomial power’ Pow(ND, π0 | ND = i), the exact unconditional power is

$$\sum_{i=0}^{N} P(N_D = i)\, \mathrm{Pow}(N_D, \pi_0 \mid N_D = i).$$

The summation starts at the most probable value for ND and then steps outward until the values are small enough to be ignored.
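The unconditional power can be sketched by weighting the conditional binomial power with P(ND = i). This minimal Python version (hypothetical function name) covers one-sided tests only, builds the rejection region from the tail that points toward H1, and sums over all i instead of stepping outward from the mode, which should agree with the truncated sum up to negligible terms.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mcnemar_exact_unconditional(n_pairs: int, odds_ratio: float, pi_d: float, alpha: float):
    """Unconditional actual alpha and power of the one-sided exact conditional McNemar test.

    Given N_D = i discordant pairs, f12 ~ B(i, 0.5) under H0 and
    f12 ~ B(i, OR / (1 + OR)) under H1. The rejection region is grown from the
    tail that points toward H1 (lower tail if OR < 1) while it stays <= alpha.
    Both results are averaged over N_D ~ B(n_pairs, pi_d).
    """
    p1 = odds_ratio / (1 + odds_ratio)  # pi12 / pi_D under H1
    actual_alpha = power = 0.0
    for i in range(n_pairs + 1):
        weight = binom_pmf(i, n_pairs, pi_d)  # P(N_D = i)
        tail = range(i + 1) if odds_ratio < 1 else range(i, -1, -1)
        a_i = pow_i = 0.0
        for k in tail:
            p_h0 = binom_pmf(k, i, 0.5)
            if a_i + p_h0 > alpha:
                break
            a_i += p_h0
            pow_i += binom_pmf(k, i, p1)
        actual_alpha += weight * a_i
        power += weight * pow_i
    return actual_alpha, power


# O'Brien example: 50 pairs, OR = 0.25, pi_D = 0.4, one-sided alpha = 0.05
print(mcnemar_exact_unconditional(50, 0.25, 0.4, 0.05))
```

For the O’Brien example the result should be close to the Actual α and Power values listed in 5.3.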

Fast approximation. In this case an ordinary one-sample binomial power calculation is performed with H0-distribution B(NπD, 0.5) and H1-distribution B(NπD, OR/(OR + 1)).

5.6 Validation

The results of the exact procedure were checked against the values given on pages 161–163 in O’Brien (2002). Complete correspondence was found in the one-tailed case and also in the two-tailed case when the alpha balancing Option 2 (“Assign to minor tail α/2, then rest to major tail”, see above) was chosen in G*Power.

We also compared the exact results of G*Power generated for a large range of parameters to the results produced by PASS (Hintze, 2006) for the same scenarios. We found complete correspondence in one-sided tests. In two-sided tests PASS uses an alpha balancing strategy corresponding to Option 1 in G*Power (“Assign α/2 to both sides”, see above). With two-sided tests we found small deviations between G*Power and PASS (about ±1 in the third decimal place), especially for small sample sizes. These deviations were always much smaller than those resulting from a change of the balancing strategy. All comparisons with PASS were restricted to N < 2000, since for larger N the exact routine in PASS sometimes produced nonsensical values (this restriction is noted in the PASS manual).


Figure 8: Result of the sample McNemar test (see text for details).


6 Exact: Proportions - inequality of two independent groups (Fisher’s exact-test)

6.1 Introduction

This procedure calculates power and sample size for tests comparing two independent binomial populations with probabilities π1 and π2, respectively. The results of sampling from these two populations can be given in a 2 × 2 contingency table X:

              Group 1      Group 2      Total
Success       x1           x2           m
Failure       n1 − x1      n2 − x2      N − m
Total         n1           n2           N

Here, n1, n2 are the sample sizes, and x1, x2 the observed number of successes in the two populations. N = n1 + n2 is the total sample size, and m = x1 + x2 the total number of successes.

The null hypothesis states that π1 = π2, whereas the alternative hypothesis assumes different probabilities in both populations:

H0 : π1 − π2 = 0

H1 : π1 − π2 ≠ 0.

6.2 Effect size index

The effect size is determined by directly specifying the two proportions π1 and π2.

6.3 Options

This test has no options.

6.4 Examples

6.5 Related tests

6.6 Implementation notes

6.6.1 Exact unconditional power

The procedure computes the exact unconditional power of the (conditional) test.

The exact probability of table X (see introduction) under H0, conditional on x1 + x2 = m, is given by:

$$\Pr(X \mid m, H_0) = \frac{\binom{n_1}{x_1}\binom{n_2}{x_2}}{\binom{N}{m}}$$

Let T be a test statistic, t a possible value of T, and M the set of all tables X with a total number of successes equal to m. We define Mt = {X ∈ M : T ≥ t}, i.e. Mt is the subset of M containing all tables for which the value of the test statistic is equal to or exceeds the value t. The exact null distribution of T is obtained by calculating Pr(T ≥ t|m, H0) = ∑X∈Mt Pr(X|m, H0) for all possible t. The critical value tα is the smallest value such that Pr(T ≥ tα|m, H0) ≤ α. The power is then defined as:

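$$1 - \beta = \sum_{m=0}^{N} P(m)\, \Pr(T \ge t_\alpha \mid m, H_1),$$

where the critical value tα is determined separately for each m,

$$\Pr(T \ge t_\alpha \mid m, H_1) = \frac{\sum_{X \in M_{t_\alpha}} B_{12}}{\sum_{X \in M} B_{12}}, \qquad P(m) = \Pr(x_1 + x_2 = m \mid H_1) = \sum_{X \in M} B_{12},$$

and

$$B_{12} = \binom{n_1}{x_1} \pi_1^{x_1} (1 - \pi_1)^{n_1 - x_1} \binom{n_2}{x_2} \pi_2^{x_2} (1 - \pi_2)^{n_2 - x_2}.$$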

For two-sided tests G * Power provides three common test statistics that are asymptotically equivalent:

1. Fisher’s exact test:

$$T = -\ln\left[\frac{\binom{n_1}{x_1}\binom{n_2}{x_2}}{\binom{N}{m}}\right]$$

2. Pearson’s exact test:

$$T = \sum_{j=1}^{2}\left(\frac{(x_j - m n_j/N)^2}{m n_j/N} + \frac{[(n_j - x_j) - (N - m) n_j/N]^2}{(N - m) n_j/N}\right)$$

3. Likelihood ratio exact test:

$$T = 2\sum_{j=1}^{2}\left(x_j \ln\left[\frac{x_j}{m n_j/N}\right] + (n_j - x_j)\ln\left[\frac{n_j - x_j}{(N - m) n_j/N}\right]\right)$$

The choice of the test statistic only influences the way in which α is distributed on both sides of the null distribution.
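The three two-sided statistics can be evaluated for a single table X as a sketch. The function name is hypothetical; it assumes 0 < m < N so that no expected count is zero, and uses the usual convention 0 · ln 0 = 0 in the likelihood ratio statistic.

```python
from math import comb, log


def _x_log_ratio(x: float, expected: float) -> float:
    """x * ln(x / expected), with the convention 0 * ln 0 = 0."""
    return 0.0 if x == 0 else x * log(x / expected)


def two_by_two_statistics(x1: int, n1: int, x2: int, n2: int):
    """Return (Fisher, Pearson, likelihood-ratio) statistics T for a 2 x 2 table."""
    N, m = n1 + n2, x1 + x2
    fisher = -log(comb(n1, x1) * comb(n2, x2) / comb(N, m))
    pearson = lr = 0.0
    for x, n in ((x1, n1), (x2, n2)):
        e_succ = m * n / N          # expected successes in group j
        e_fail = (N - m) * n / N    # expected failures in group j
        pearson += (x - e_succ) ** 2 / e_succ + ((n - x) - e_fail) ** 2 / e_fail
        lr += _x_log_ratio(x, e_succ) + _x_log_ratio(n - x, e_fail)
    return fisher, pearson, 2 * lr


print(two_by_two_statistics(3, 10, 7, 10))
```

For the table with x1 = 3, n1 = 10, x2 = 7, n2 = 10, the Pearson statistic equals 3.2, the familiar χ² value for this table.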

For one-sided tests the test statistic is T = x2.
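For the one-sided case with T = x2 (assuming H1: π2 > π1), the power formula can be sketched directly. Since P(m) equals the denominator of Pr(T ≥ tα | m, H1), the two cancel and the power is simply the sum of B12 over all tables in the rejection regions. The function name is hypothetical and the code favors clarity over speed.

```python
from math import comb


def fisher_one_sided_power(n1: int, n2: int, p1: float, p2: float, alpha: float) -> float:
    """Exact unconditional power of the one-sided conditional test with T = x2.

    For each margin m, the rejection region {x2 >= t_alpha} is grown from the
    largest admissible x2 downward while the hypergeometric H0 tail stays <= alpha.
    Because P(m) cancels the denominator of Pr(T >= t_alpha | m, H1), the power
    is the sum of B12 over all tables in the rejection regions.
    """
    N = n1 + n2
    power = 0.0
    for m in range(N + 1):
        lo, hi = max(0, m - n1), min(n2, m)  # admissible x2 for this margin
        tail_h0 = 0.0
        for x2 in range(hi, lo - 1, -1):
            x1 = m - x2
            p_h0 = comb(n1, x1) * comb(n2, x2) / comb(N, m)  # Pr(X | m, H0)
            if tail_h0 + p_h0 > alpha:
                break
            tail_h0 += p_h0
            power += (comb(n1, x1) * p1**x1 * (1 - p1) ** (n1 - x1)
                      * comb(n2, x2) * p2**x2 * (1 - p2) ** (n2 - x2))  # B12
    return power


print(fisher_one_sided_power(10, 10, 0.2, 0.8, 0.05))
```

With π1 = π2 the function returns the unconditional actual α, which cannot exceed the nominal α because every conditional test respects it.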

6.6.2 Large sample approximation

The large sample approximation is based on a continuity corrected χ² test with pooled variances. To permit a two-sided test, a z test version is used: The H0 distribution is the standard normal distribution N(0, 1), and the H1 distribution is given by the normal distribution N(m(k), σ), with

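$$\sigma = \frac{1}{\sigma_0}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}},$$

$$m(k) = \frac{1}{\sigma_0}\left[p_2 - p_1 - k\,\frac{1/n_1 + 1/n_2}{2}\right], \quad \text{with}$$

$$\sigma_0 = \sqrt{\frac{n_1(1-p_1) + n_2(1-p_2)}{n_1 n_2} \cdot \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}},$$

$$k = \begin{cases} -1 & p_1 < p_2 \\ +1 & p_1 \ge p_2 \end{cases}$$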

6.7 Validation

The results were checked against the values produced by GPower 2.0.


7 Exact test: Multiple Regression - random model

In multiple regression analyses, the relation of a dependent variable Y to m independent factors X = (X1, ..., Xm) is studied. The present procedure refers to the so-called unconditional or random factors model of multiple regression (Gatsonis & Sampson, 1989; Sampson, 1974); that is, it is assumed that Y and X1, ..., Xm are random variables, where (Y, X1, ..., Xm) have a joint multivariate normal distribution with a positive definite covariance matrix

$$\begin{pmatrix} \sigma_Y^2 & \Sigma_{YX}' \\ \Sigma_{YX} & \Sigma_X \end{pmatrix}$$

and mean (µY, µX). Without loss of generality we may assume centered variables with µY = 0, µXi = 0.

The squared population multiple correlation coefficient between Y and X is given by

$$\rho_{YX}^2 = \Sigma_{YX}' \Sigma_X^{-1} \Sigma_{YX} / \sigma_Y^2,$$

and the regression coefficient vector γ, the analog of β in the fixed model, by γ = ΣX⁻¹ΣYX. The maximum likelihood estimates of the regression coefficient vector and the residuals are the same under both the fixed and the random model (see theorem 1 in Sampson (1974)); the models differ, however, with respect to power.

The present procedure allows power analyses for the test that the population squared multiple correlation coefficient ρ²YX has the value ρ²0. The null and alternative hypotheses are:

H0 : ρ²YX = ρ²0

H1 : ρ²YX ≠ ρ²0.

An important special case is ρ0 = 0 (corresponding to the assumption ΣYX = 0). A commonly used test statistic for this case is F = [(N − m − 1)/m] R²YX/(1 − R²YX), which has a central F distribution with df1 = m and df2 = N − m − 1. This is the same test statistic that is used in the fixed model. The power, however, differs between the two models.

7.1 Effect size index

The effect size is the population squared correlation coefficient H1 ρ² under the alternative hypothesis. To fully specify the effect size, you also need to give the population squared correlation coefficient H0 ρ² under the null hypothesis.

Pressing the button Determine on the left of the effect size label in the main window opens the effect size drawer (see Fig. 9) that may be used to calculate ρ² either from the confidence interval for the population ρ²YX given an observed squared multiple correlation R²YX or from predictor correlations.

Effect size from C.I. Figure 9 shows an example of how the H1 ρ² can be determined from the confidence interval computed for an observed R². You have to input the sample size, the number of predictors, the observed R², and the confidence level of the confidence interval. In the remaining input field a relative position inside the confidence interval can be given that determines the H1 ρ² value. The value can range from 0 to 1, where 0, 0.5, and 1 correspond to the left, central, and right position inside the interval, respectively. The output fields C.I. lower ρ² and C.I. upper ρ² contain the left and right border of the two-sided 100(1 − α) percent confidence interval for ρ². The output fields Statistical lower bound and Statistical upper bound show the one-sided (0, R) and (L, 1) intervals, respectively.

Figure 9: Effect size drawer to calculate ρ² from a confidence interval or via predictor correlations (see text).

Effect size from predictor correlations By choosing the option "From predictor correlation matrix" (see Fig. 9) one may compute ρ² from the matrix of correlations among the predictor variables and the correlations between predictors and the dependent variable Y. Pressing the "Insert/edit matrix" button opens a window in which one can specify (1) the row vector u containing the correlations between each of the m predictors Xi and the dependent variable Y and (2) the m × m matrix B of correlations among the predictors. The squared multiple correlation coefficient is then given by ρ² = uB⁻¹u′. Each input correlation must lie in the interval [−1, 1], the matrix B must be positive-definite, and the resulting ρ² must lie in the interval [0, 1]. Pressing the button "Calc ρ²" tries to calculate ρ² from the input and checks the positive-definiteness of matrix B.

Relation of ρ² to effect size f² The relation between ρ² and effect size f² used in the fixed factors model is:

$$f^2 = \frac{\rho^2}{1 - \rho^2}$$

and conversely:

$$\rho^2 = \frac{f^2}{1 + f^2}$$

Figure 10: Input of correlations between predictors and Y (top) and the matrix of correlations among the predictors (bottom).

Cohen (1988, p. 412) defines the following conventional values for the effect size f²:

• small f² = 0.02

• medium f² = 0.15

• large f² = 0.35

which translate into the following values for ρ²:

• small ρ² = 0.02

• medium ρ² = 0.13

• large ρ² = 0.26
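The translation between the two sets of conventions follows directly from ρ² = f²/(1 + f²). A minimal sketch with hypothetical helper names:

```python
def rho2_from_f2(f2: float) -> float:
    """rho^2 = f^2 / (1 + f^2)."""
    return f2 / (1 + f2)


def f2_from_rho2(rho2: float) -> float:
    """f^2 = rho^2 / (1 - rho^2)."""
    return rho2 / (1 - rho2)


for label, f2 in (("small", 0.02), ("medium", 0.15), ("large", 0.35)):
    print(f"{label}: f2 = {f2}, rho2 = {rho2_from_f2(f2):.4f}")
```

The printed values 0.0196, 0.1304 and 0.2593 round to the ρ² conventions listed above.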

7.2 Options

You can switch between an exact procedure for the calculation of the sampling distribution of the squared multiple correlation coefficient R² and a three-moment F approximation suggested by Lee (1971, p. 123). The latter is slightly faster and may be used to check the results of the exact routine.

7.3 Examples

7.3.1 Power and sample size

Example 1 We replicate an example given for the procedure for the fixed model, but now under the assumption that the predictors are not fixed but random samples: We assume that a dependent variable Y is predicted by a set B of 5 predictors and that ρ²YX is 0.10, that is, that the 5 predictors account for 10% of the variance of Y. The sample size is N = 95 subjects. What is the power of the F test that ρ²YX = 0 at α = 0.05?

We choose the following settings in G*Power to calculate the power:

• Select Type of power analysis: Post hoc

• Input
  Tail(s): One
  H1 ρ²: 0.1
  α err prob: 0.05
  Total sample size: 95
  Number of predictors: 5
  H0 ρ²: 0.0


• Output
  Lower critical R²: 0.115170
  Upper critical R²: 0.115170
  Power (1-β): 0.662627

The output shows that the power of this test is about 0.663, which is slightly lower than the power of 0.674 found in the fixed model. This observation holds in general: the power in the random model is never larger than that found for the same scenario in the fixed model.

Example 2 We now replicate the test of the hypotheses H0 : ρ² ≤ 0.3 versus H1 : ρ² > 0.3 given in Shieh and Kung (2007, p. 733), for N = 100, α = 0.05, and m = 5 predictors. We assume that H1 ρ² = 0.4. The settings and output in this case are:

• Select Type of power analysis: Post hoc

• Input
  Tail(s): One
  H1 ρ²: 0.4
  α err prob: 0.05
  Total sample size: 100
  Number of predictors: 5
  H0 ρ²: 0.3

• Output
  Lower critical R²: 0.456625
  Upper critical R²: 0.456625
  Power (1-β): 0.346482

The results show that H0 should be rejected if the observed R² is larger than 0.457. The power of the test is about 0.346. Assume we observed R² = 0.5. To calculate the associated p-value we may use the G*Power calculator. The syntax of the CDF of the squared sample multiple correlation coefficient is mr2cdf(R2,ρ2,m+1,N). Thus, for the present case we insert 1-mr2cdf(0.5,0.3,6,100) in the calculator, and pressing Calculate gives 0.01278. These values replicate those given in Shieh and Kung (2007).

Example 3 We now ask for the minimum sample size required for testing the hypothesis H0 : ρ² ≥ 0.2 vs. the specific alternative hypothesis H1 : ρ² = 0.05 with 5 predictors to achieve power = 0.9 and α = 0.05 (Example 2 in Shieh and Kung (2007)). The inputs and outputs are:

• Select Type of power analysis: A priori

• Input
  Tail(s): One
  H1 ρ²: 0.05
  α err prob: 0.05
  Power (1-β): 0.9
  Number of predictors: 5
  H0 ρ²: 0.2

• Output
  Lower critical R²: 0.132309
  Upper critical R²: 0.132309
  Total sample size: 153
  Actual power: 0.901051

The results show that N should not be less than 153. This confirms the results in Shieh and Kung (2007).

7.3.2 Using confidence intervals to determine the effect size

Suppose that in a regression analysis with 5 predictors and N = 50 we observed a squared multiple correlation coefficient R² = 0.3 and we want to use the lower boundary of the 95% confidence interval for ρ² as H1 ρ². Pressing the Determine button next to the effect size field in the main window opens the effect size drawer. After selecting input mode "From confidence interval" we insert the above values (50, 5, 0.3, 0.95) in the corresponding input fields and set Rel C.I. pos to use (0=left, 1=right) to 0 to select the left interval border. Pressing calculate computes the lower, upper, and 95% two-sided confidence intervals: [0, 0.4245], [0.0589, 1], and [0.0337, 0.4606]. The left boundary of the two-sided interval (0.0337) is transferred to the field H1 ρ².

7.3.3 Using predictor correlations to determine effect size

We may use assumptions about the (m × m) correlation matrix between a set of m predictors, and the m correlations between predictor variables and the dependent variable Y, to determine ρ². Pressing the Determine button next to the effect size field in the main window opens the effect size drawer. After selecting input mode "From predictor correlations" we insert the number of predictors in the corresponding field and press "Insert/edit matrix". This opens an input dialog (see Fig. 10). Suppose that we have 4 predictors and that the 4 correlations between Xi and Y are u = (0.3, 0.1, −0.2, 0.2). We insert these values in the tab "Corr between predictors and outcome". Assume further that the correlations between X1 and X3 and between X2 and X4 are 0.5 and 0.2, respectively, whereas all other predictor pairs are uncorrelated. We insert the correlation matrix

$$B = \begin{pmatrix} 1 & 0 & 0.5 & 0 \\ 0 & 1 & 0 & 0.2 \\ 0.5 & 0 & 1 & 0 \\ 0 & 0.2 & 0 & 1 \end{pmatrix}$$

under the "Corr between predictors" tab. Pressing the "Calc ρ²" button computes ρ² = uB⁻¹u′ = 0.297083, which also confirms that B is positive-definite and thus a correct correlation matrix.
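One way to compute ρ² = uB⁻¹u′ while also checking positive-definiteness is a Cholesky factorization B = LL′: it fails exactly when B is not positive-definite, and ρ² = ‖L⁻¹u′‖². This is a sketch of that technique, not necessarily how G*Power implements "Calc ρ²"; the function names are hypothetical.

```python
from math import sqrt


def cholesky(B):
    """Lower-triangular L with B = L L'; raises ValueError if B is not positive-definite."""
    n = len(B)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = B[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive-definite")
                L[i][i] = sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def rho2_from_correlations(u, B):
    """rho^2 = u B^-1 u' computed as ||z||^2, where L z = u' and B = L L'."""
    L = cholesky(B)
    z = []
    for i in range(len(u)):  # forward substitution
        z.append((u[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return sum(v * v for v in z)


u = [0.3, 0.1, -0.2, 0.2]
B = [[1, 0, 0.5, 0], [0, 1, 0, 0.2], [0.5, 0, 1, 0], [0, 0.2, 0, 1]]
print(rho2_from_correlations(u, B))
```

For the example above the sketch gives approximately 0.297083, the value reported by "Calc ρ²". A further check that the result lies in [0, 1] would mirror the input constraints described earlier.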

7.4 Related tests

Similar tests in G*Power 3.0:

• Linear Multiple Regression: Deviation of R2 from zero.

• Linear Multiple Regression: Increase of R2.

7.5 Implementation notes

The procedure uses the exact sampling distribution of the squared multiple correlation coefficient (MRC-distribution) (Lee, 1971, 1972). The parameters of this distribution are the population squared multiple correlation coefficient ρ², the number of predictors m, and the sample size N. The only difference between the H0 and H1 distribution is that the population multiple correlation coefficient is set to "H0 ρ²" in the former and to "H1 ρ²" in the latter case.

Several algorithms for the computation of the exact or approximate CDF of the distribution have been proposed (Benton & Krishnamoorthy, 2003; Ding, 1996; Ding & Bargmann, 1991; Lee, 1971, 1972). Benton and Krishnamoorthy (2003) have shown that the implementation proposed by Ding and Bargmann (1991) (that is used in Dunlap, Xin, and Myers (2004)) may produce grossly false results in some cases. The implementation of Ding (1996) has the disadvantage that it overflows for large sample sizes, because factorials occurring in ratios are explicitly evaluated. This can easily be avoided by using the log of the gamma function in the computation instead.

In G*Power we use the procedure of Benton and Krishnamoorthy (2003) to compute the exact CDF and a modified version of the procedure given in Ding (1996) to compute the exact PDF of the distribution. Optionally, one can choose to use the 3-moment noncentral F approximation proposed by Lee (1971) to compute the CDF. The latter procedure has also been used by Steiger and Fouladi (1992) in their R2 program, which provides similar functionality.

7.6 Validation

The power and sample size results were checked against the values produced by R2 (Steiger & Fouladi, 1992), the tables in Gatsonis and Sampson (1989), and results reported in Dunlap et al. (2004) and Shieh and Kung (2007). Slight deviations from the values computed with R2 were found, which are due to the approximation used in R2, whereas complete correspondence was found in all other tests made. The confidence intervals were checked against values computed in R2, the results reported in Shieh and Kung (2007), and the tables given in Mendoza and Stafford (2001).

7.7 References

See Chapter 9 in Cohen (1988) for a description of the fixed model. The random model is described in Gatsonis and Sampson (1989) and Sampson (1974).


8 Exact: Proportion - sign test

The sign test is equivalent to a test that the probability π of an event in the population has the value π0 = 0.5. It is identical to the special case π0 = 0.5 of the test Exact: Proportion - difference from constant (one sample case). For a more thorough description see the comments for that test.

8.1 Effect size index

The effect size index is g = π − 0.5. Cohen (1969, p. 142) defines the following effect size conventions:

• small g = 0.05

• medium g = 0.15

• large g = 0.25

8.2 Options

See comments for Exact: Proportion - difference from constant (one sample case) in chapter 4 (page 11).

8.3 Examples

8.4 Related tests

Similar tests in G*Power 3.0:

• Exact: Proportion - difference from constant (one sample case).

8.5 Implementation notes

See comments for Exact: Proportion - difference from constant (one sample case) in chapter 4 (page 11).

8.6 Validation

The results were checked against the tabulated values in Cohen (1969, chap. 5). For more information see comments for Exact: Proportion - difference from constant (one sample case) in chapter 4 (page 11).


9 Exact: Generic binomial test

9.1 Effect size index

9.2 Options

Since the binomial distribution is discrete, it is normally not possible to achieve exactly the nominal α-level. For two-sided tests this raises the question of how to “distribute” α on the two sides. G*Power offers three options (case 1 is the default):

1. Assign α/2 on both sides: Both sides are handled independently in exactly the same way as in a one-sided test. The only difference is that here α/2 is used instead of α. Of the three options, this one leads to the greatest deviation of the actual α from the nominal α (in post hoc analyses).

2. Assign to minor tail α/2, then rest to major tail (α2 = α/2, α1 = α − α2): First α/2 is applied on the side of the central distribution that is farther away from the noncentral distribution (minor tail). The criterion used on the other side is then α − α2, where α2 is the actual α found on the minor side. Since α2 ≤ α/2, one can conclude that (in post hoc analyses) the sum of the actual values α1 + α2 is in general closer to the nominal α-level than in case 1.

3. Assign α/2 on both sides, then increase to minimize the difference of α1 + α2 to α: The first step is exactly the same as in case 1. Then, in the second step, the critical values on both sides of the distribution are increased (using the lower of the two potential incremental α-values) until the sum of both actual α’s is as close as possible to the nominal α.

9.3 Examples

9.4 Related tests

9.5 Implementation notes

9.6 Validation

The results were checked against the values produced by GPower 2.0.

9.7 References

Cohen...


10 F test: Fixed effects ANOVA - one way

The fixed effects one-way ANOVA tests whether there are any differences between the means µi of k ≥ 2 normally distributed random variables with equal variance σ². The random variables represent measurements of a variable X in k fixed populations. The one-way ANOVA can be viewed as an extension of the two-group t test for a difference of means to more than two groups.

The null hypothesis is that all k means are identical, H0 : µ1 = µ2 = . . . = µk. The alternative hypothesis states that at least two of the k means differ, H1 : µi ≠ µj for at least one pair i, j with 1 ≤ i, j ≤ k.

10.1 Effect size index

The effect size f is defined as f = σm/σ. In this equation σm is the standard deviation of the group means µi and σ the common standard deviation within each of the k groups. The total variance is then σ²t = σ²m + σ². A different but equivalent way to specify the effect size is in terms of η², which is defined as η² = σ²m/σ²t. That is, η² is the ratio between the between-groups variance σ²m and the total variance σ²t and can be interpreted as “proportion of variance explained by group membership”. The relationship between η² and f is η² = f²/(1 + f²) or, solved for f, f = √(η²/(1 − η²)).

Cohen (1969, p. 348) defines the following effect size conventions:

• small f = 0.10

• medium f = 0.25

• large f = 0.40

If the mean µi and size ni of all k groups are known, then the standard deviation σm can be calculated in the following way:

$$\bar{\mu} = \sum_{i=1}^{k} w_i \mu_i \quad \text{(grand mean)},$$

$$\sigma_m = \sqrt{\sum_{i=1}^{k} w_i (\mu_i - \bar{\mu})^2},$$

where wi = ni/(n1 + n2 + · · · + nk) stands for the relative size of group i.
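The two formulas above, together with the η² relation, can be sketched as follows; the function names are hypothetical and only restate the definitions in this section.

```python
from math import sqrt


def effect_size_f(means, sizes, sd_within):
    """Cohen's f = sigma_m / sigma from group means, group sizes and common SD."""
    total = sum(sizes)
    weights = [n / total for n in sizes]  # w_i = n_i / (n_1 + ... + n_k)
    grand_mean = sum(w * mu for w, mu in zip(weights, means))
    sigma_m = sqrt(sum(w * (mu - grand_mean) ** 2 for w, mu in zip(weights, means)))
    return sigma_m / sd_within


def eta2_from_f(f):
    """eta^2 = f^2 / (1 + f^2)."""
    return f * f / (1 + f * f)


def f_from_eta2(eta2):
    """f = sqrt(eta^2 / (1 - eta^2))."""
    return sqrt(eta2 / (1 - eta2))


f = effect_size_f([1, 2, 3], [10, 10, 10], 2.0)
print(f, eta2_from_f(f))
```

For three equally sized groups with means 1, 2, 3 and σ = 2, σm = √(2/3), so f ≈ 0.408 and η² = 1/7.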

Pressing the Determine button to the left of the effect size label opens the effect size drawer. You can use this drawer to calculate the effect size f from variances, from η², or from the group means and group sizes. The drawer essentially contains two different dialogs and you can use the Select procedure selection field to choose one of them.

10.1.1 Effect size from means

In this dialog (see left side of Fig. 11) you normally start by setting the number of groups. G*Power then provides you with a mean and group size table of appropriate size. Insert the standard deviation σ common to all groups in the SD σ within each group field. Then you need to specify the mean µi and size ni for each group. If all group sizes are equal, then you may insert the common group size in the input field to the right of the Equal n button. Clicking on this button fills the size column of the table with the chosen value.

Figure 11: Effect size dialogs to calculate f

Clicking on the Calculate button provides a preview of the effect size that results from your inputs. If you click on the Calculate and transfer to main window button, then G*Power calculates the effect size and transfers the result into the effect size field in the main window. If the number of groups or the total sample size given in the effect size drawer differ from the corresponding values in the main window, you will be asked whether you want to adjust the values in the main window to the ones in the effect size drawer.


10.1.2 Effect size from variance

This dialog offers two ways to specify f. If you choose From Variances, then you need to insert the variance of the group means, that is σ²m, into the Variance explained by special effect field, and the square of the common standard deviation within each group, that is σ², into the Variance within groups field. Alternatively, you may choose the option Direct and then specify the effect size f via η².

10.2 Options

This test has no options.

10.3 Examples

We compare 10 groups, and we have reason to expect a "medium" effect size ( f = .25). How many subjects do we need in a test with α = 0.05 to achieve a power of 0.95?

• Select Type of power analysis: A priori

• Input
  Effect size f: 0.25
  α err prob: 0.05
  Power (1-β err prob): 0.95
  Number of groups: 10

• Output
  Noncentrality parameter λ: 24.375000
  Critical F: 1.904538
  Numerator df: 9
  Denominator df: 380
  Total sample size: 390
  Actual Power: 0.952363

Thus, we need 39 subjects in each of the 10 groups. What if we had only 200 subjects available? Assuming that α and β errors are equally costly (i.e., the ratio q := β/α = 1), which is probably the default in basic research, we can compute the following compromise power analysis:

• Select Type of power analysis: Compromise

• Input
  Effect size f: 0.25
  β/α ratio: 1
  Total sample size: 200
  Number of groups: 10

• Output
  Noncentrality parameter λ: 12.500000
  Critical F: 1.476210
  Numerator df: 9
  Denominator df: 190
  α err prob: 0.159194
  β err prob: 0.159194
  Power (1-β err prob): 0.840806

10.4 Related tests

• ANOVA: Fixed effects, special, main effects and interactions

• ANOVA: Repeated measures, between factors

10.5 Implementation notes

The distribution under H0 is the central F(k − 1, N − k) distribution with numerator df1 = k − 1 and denominator df2 = N − k. The distribution under H1 is the noncentral F(k − 1, N − k, λ) distribution with the same df’s and noncentrality parameter λ = f²N. (k is the number of groups, N is the total sample size.)
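As a quick consistency check, the distribution parameters can be computed directly from f, k and N. This minimal sketch (hypothetical function name) stops at the parameters; the power itself additionally needs a noncentral F CDF, such as the one provided by `scipy.stats.ncf` in SciPy, which is not used here.

```python
def anova_f_parameters(f: float, k: int, n_total: int):
    """Return (df1, df2, lambda) for the one-way fixed effects ANOVA.

    Under H1 the test statistic follows the noncentral F(k - 1, N - k, lambda)
    distribution with lambda = f^2 * N.
    """
    return k - 1, n_total - k, f * f * n_total


print(anova_f_parameters(0.25, 10, 390))  # a priori example
print(anova_f_parameters(0.25, 10, 200))  # compromise example
```

The two calls reproduce the Numerator df, Denominator df and λ output fields of the examples in 10.3: (9, 380, 24.375) and (9, 190, 12.5).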

10.6 Validation

The results were checked against the values produced by GPower 2.0.


11 F test: Fixed effects ANOVA - special, main effects and interactions

This procedure may be used to calculate the power of main effects and interactions in fixed effects ANOVAs with factorial designs. It can also be used to compute power for planned comparisons. We will discuss both applications in turn.

11.0.1 Main effects and interactions

To illustrate the concepts underlying tests of main effects and interactions we will consider the specific example of an A × B × C factorial design, with i = 3 levels of A, j = 3 levels of B, and k = 4 levels of C. This design has a total number of 3 × 3 × 4 = 36 groups. A general assumption is that all groups have the same size and that in each group the dependent variable is normally distributed with identical variance.

In a three factor design we may test three main effects of the factors A, B, C, three two-factor interactions A × B, A × C, B × C, and one three-factor interaction A × B × C. We write µijk for the mean of group A = i, B = j, C = k. To indicate the mean of means across a dimension we write a star (⋆) in the corresponding index. Thus, in the example µij⋆ is the mean of the groups A = i, B = j, C = 1, 2, 3, 4. To simplify the discussion we assume that the grand mean µ⋆⋆⋆ over all groups is zero. This can always be achieved by subtracting a given non-zero grand mean from each group mean.

In testing the main effects, the null hypothesis is that all means of the corresponding factor are identical. For the main effect of factor A the hypotheses are, for instance:

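$$H_0: \mu_{1\star\star} = \mu_{2\star\star} = \mu_{3\star\star}$$

$$H_1: \mu_{i\star\star} \neq \mu_{j\star\star} \text{ for at least one index pair } i, j.$$

The assumption that the grand mean is zero implies that $\sum_i \mu_{i\star\star} = \sum_j \mu_{\star j\star} = \sum_k \mu_{\star\star k} = 0$. The above hypotheses are therefore equivalent to

$$H_0: \mu_{i\star\star} = 0 \text{ for all } i$$

$$H_1: \mu_{i\star\star} \neq 0 \text{ for at least one } i.$$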

In testing two-factor interactions, the residuals $\delta_{ij\star}$, $\delta_{i\star k}$, and $\delta_{\star jk}$ of the group means after subtraction of the main effects are considered. For the A × B interaction of the example, the 3 × 3 = 9 relevant residuals are $\delta_{ij\star} = \mu_{ij\star} - \mu_{i\star\star} - \mu_{\star j\star}$. The null hypothesis of no interaction effect states that all residuals are identical. The hypotheses for the A × B interaction are, for example:

$$H_0: \delta_{ij\star} = \delta_{kl\star} \text{ for all index pairs } i, j \text{ and } k, l.$$

$$H_1: \delta_{ij\star} \neq \delta_{kl\star} \text{ for at least one combination of } i, j \text{ and } k, l.$$

The assumption that the grand mean is zero implies that $\sum_{i,j} \delta_{ij\star} = \sum_{i,k} \delta_{i\star k} = \sum_{j,k} \delta_{\star jk} = 0$. The above hypotheses are therefore equivalent to

$$H_0: \delta_{ij\star} = 0 \text{ for all } i, j$$

$$H_1: \delta_{ij\star} \neq 0 \text{ for at least one } i, j.$$

In testing the three-factor interaction, the residuals $\delta_{ijk}$ of the group means after subtraction of all main effects and all two-factor interactions are considered. In a three-factor design there is only one possible three-factor interaction. The 3 × 3 × 4 = 36 residuals in the example are calculated as $\delta_{ijk} = \mu_{ijk} - \mu_{i\star\star} - \mu_{\star j\star} - \mu_{\star\star k} - \delta_{ij\star} - \delta_{i\star k} - \delta_{\star jk}$. The null hypothesis of no interaction states that all residuals are equal. Thus,

$$H_0: \delta_{ijk} = \delta_{lmn} \text{ for all combinations of index triples } i, j, k \text{ and } l, m, n.$$

$$H_1: \delta_{ijk} \neq \delta_{lmn} \text{ for at least one combination of index triples } i, j, k \text{ and } l, m, n.$$

The assumption that the grand mean is zero implies that $\sum_{i,j,k} \delta_{ijk} = 0$. The above hypotheses are therefore equivalent to

$$H_0: \delta_{ijk} = 0 \text{ for all } i, j, k$$

$$H_1: \delta_{ijk} \neq 0 \text{ for at least one } i, j, k.$$

The same reasoning generalizes directly to designs with four or more factors.
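For equal group sizes, the decomposition described above can be computed directly from an array of cell means. The NumPy sketch below is illustrative: `factorial_effects` and `effect_variance` are hypothetical helper names, and the cell means are random numbers, not the values of Table 13. It centres the means, forms the main-effect means and the two- and three-factor residuals exactly as defined above, and takes $\sigma_m^2$ of any effect as the mean of its squared components.

```python
import numpy as np


def factorial_effects(cell_means: np.ndarray) -> dict:
    """Decompose equal-n cell means mu[i, j, k] of an A x B x C design (sketch)."""
    mu = cell_means - cell_means.mean()              # make the grand mean zero
    a = mu.mean(axis=(1, 2))                         # mu_i**
    b = mu.mean(axis=(0, 2))                         # mu_*j*
    c = mu.mean(axis=(0, 1))                         # mu_**k
    ab = mu.mean(axis=2) - a[:, None] - b[None, :]   # delta_ij*
    ac = mu.mean(axis=1) - a[:, None] - c[None, :]   # delta_i*k
    bc = mu.mean(axis=0) - b[:, None] - c[None, :]   # delta_*jk
    abc = (mu - a[:, None, None] - b[None, :, None] - c[None, None, :]
           - ab[:, :, None] - ac[:, None, :] - bc[None, :, :])  # delta_ijk
    return {"A": a, "B": b, "C": c, "AB": ab, "AC": ac, "BC": bc, "ABC": abc}


def effect_variance(component: np.ndarray) -> float:
    """sigma_m^2: mean square of zero-mean effect components."""
    return float(np.mean(component ** 2))


rng = np.random.default_rng(0)
means = rng.normal(loc=3.0, scale=1.0, size=(3, 3, 4))  # made-up 3 x 3 x 4 cell means
effects = factorial_effects(means)
for name, comp in effects.items():
    print(name, round(effect_variance(comp), 4))
```

Because every component sums to zero over each of its indices, the mean square equals the variance, which is why the text can compute $\sigma_m^2$ as a mean square.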

11.0.2 Planned comparisons

Planned comparisons are specific tests between levels of a factor that are planned before the experiment is conducted.

One application is the comparison between two sets of levels. The general idea is to subtract the means across the two sets of levels that should be compared from each other and to test whether the difference is zero. Formally this is done by calculating the sum of the component-wise product of the mean vector $\vec{\mu}$ and a nonzero contrast vector $\vec{c}$ (i.e. the scalar product of $\vec{\mu}$ and $\vec{c}$): $C = \sum_{i=1}^{k} c_i \mu_i$. The contrast vector $\vec{c}$ contains negative weights for levels on one side of the comparison, positive weights for the levels on the other side of the comparison and zero for levels that are not part of the comparison. The sum of weights is always zero. Assume, for instance, that we have a factor with 4 levels and mean vector $\vec{\mu} = (2, 3, 1, 2)$ and that we want to test whether the means in the first two levels are identical to the means in the last two levels. In this case we define $\vec{c} = (-1/2, -1/2, 1/2, 1/2)$ and get $C = \sum_i \mu_i c_i = -1 - 3/2 + 1/2 + 1 = -1$.
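The contrast value for this example can be checked in a few lines. This Python sketch uses `fractions.Fraction` so that the weights are exact and the zero-sum check is not disturbed by floating-point rounding; `contrast_value` is an illustrative helper, not a G*Power function.

```python
from fractions import Fraction


def contrast_value(means, weights):
    """C = sum_i c_i * mu_i; the contrast weights must sum to zero."""
    if sum(weights) != 0:
        raise ValueError("contrast weights must sum to zero")
    return sum(c * m for c, m in zip(weights, means))


half = Fraction(1, 2)
C = contrast_value([2, 3, 1, 2], [-half, -half, half, half])
print(C)  # the contrast value of the example
```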

A second application is testing polynomial contrasts in a trend analysis. In this case it is normally assumed that the factor represents a quantitative variable and that the levels of the factor that correspond to specific values of this quantitative variable are equally spaced (for more details, see e.g. Hays (1988, p. 706ff)). In a factor with k levels, k − 1 orthogonal polynomial trends can be tested.

In planned comparisons the null hypothesis is $H_0: C = 0$ and the alternative hypothesis is $H_1: C \neq 0$.

11.1 Effect size index

The effect size f is defined as $f = \sigma_m / \sigma$. In this equation $\sigma_m$ is the standard deviation of the effects that we want to test and σ the common standard deviation within each of the groups in the design. The total variance is then $\sigma_t^2 = \sigma_m^2 + \sigma^2$. A different but equivalent way to specify the effect size is in terms of $\eta^2$, which is defined as $\eta^2 = \sigma_m^2 / \sigma_t^2$. That is, $\eta^2$ is the ratio between the between-groups variance $\sigma_m^2$ and the total variance $\sigma_t^2$ and can be interpreted as the "proportion of variance explained by the effect under consideration". The relationship between $\eta^2$ and f is

$$\eta^2 = \frac{f^2}{1 + f^2} \quad \text{or, solved for } f, \quad f = \sqrt{\frac{\eta^2}{1 - \eta^2}}.$$
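The conversions between f, η² and the two variances are one-liners. The sketch below uses only the Python standard library; the function names are our own and are not G*Power identifiers.

```python
import math


def f_from_variances(var_effect: float, var_within: float) -> float:
    """f = sigma_m / sigma, from the effect variance sigma_m^2 and the within-group variance sigma^2."""
    return math.sqrt(var_effect / var_within)


def eta2_from_f(f: float) -> float:
    """eta^2 = f^2 / (1 + f^2)."""
    return f ** 2 / (1 + f ** 2)


def f_from_eta2(eta2: float) -> float:
    """f = sqrt(eta^2 / (1 - eta^2)); requires 0 <= eta^2 < 1."""
    return math.sqrt(eta2 / (1 - eta2))


# eta^2 corresponding to Cohen's conventional f values
for label, f in [("small", 0.10), ("medium", 0.25), ("large", 0.40)]:
    print(label, f, round(eta2_from_f(f), 4))
```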

Cohen (1969, p. 348) defines the following effect size conventions:

• small f = 0.10

• medium f = 0.25

• large f = 0.40

Figure 12: Effect size dialog to calculate f

Clicking on the Determine button to the left of the effect size label opens the effect size drawer (see Fig. 12). You can use this drawer to calculate the effect size f from variances or from η². If you choose From Variances then you need to insert the variance explained by the effect under consideration, that is $\sigma_m^2$, into the Variance explained by special effect field, and the square of the common standard deviation within each group, that is $\sigma^2$, into the Variance within groups field. Alternatively, you may choose the option Direct and then specify the effect size f via η².

See the examples section below for information on how to calculate the effect size f in tests of main effects and interactions and in tests of planned comparisons.

11.2 Options

This test has no options.

11.3 Examples

11.3.1 Effect sizes from means and standard deviations

To illustrate the test of main effects and interactions we assume the specific values for our A × B × C example shown in Table 13. Table 14 shows the results of an SPSS analysis (GLM univariate) done for these data. In the following we will show how to reproduce the values in the Observed Power column of the SPSS output with G*Power.

As a first step we calculate the grand mean of the data. Since all groups have the same size (n = 3), this is just the arithmetic mean of all 36 group means: $m_g = 3.1382$. We then subtract this grand mean from all cells (this step is not essential but makes the calculation and discussion easier). Next, we estimate the common variance $\sigma^2$ within each group by calculating the mean variance of all cells, that is,

$$\sigma^2 = \frac{1}{36} \sum_i s_i^2 = 1.71296.$$

Main effects To calculate the power for the A, B, and C main effects we need to know the effect size $f = \sigma_m / \sigma$. We already know $\sigma^2$ to be 1.71296 but need to calculate the variance of the means $\sigma_m^2$ for each factor. The procedure is analogous for all three main effects. We therefore demonstrate only the calculations necessary for the main effect of factor A.

We first calculate the three means for factor A: $\mu_{i\star\star} = \{-0.722231, 1.30556, -0.583331\}$. Because we have first subtracted the grand mean from each cell, $\sum_i \mu_{i\star\star} = 0$, and we can easily compute the variance of these means as a mean square:

$$\sigma_m^2 = \frac{1}{3} \sum_i \mu_{i\star\star}^2 = 0.85546.$$

With these values we calculate

$$f = \frac{\sigma_m}{\sigma} = \sqrt{\frac{0.85546}{1.71296}} = 0.7066856.$$

The effect size drawer in G*Power can be used to do the last calculation: we choose From Variances and insert 0.85546 in the Variance explained by special effect field and 1.71296 in the Error Variance field. Pressing the Calculate button gives the above value for f and a partial η² of 0.3330686. Note that the partial η² given by G*Power is calculated from f according to the formula $\eta^2 = f^2/(1 + f^2)$ and is not identical to the SPSS partial η², which is based on sample estimates. The relation between the two is

$$\eta'^2_{\text{SPSS}} = \frac{\eta^2 N}{N + k(\eta^2 - 1)},$$

where N denotes the total sample size, k the total number of groups in the design and η² the G*Power value. Thus $\eta'^2 = 0.3330686 \cdot 108 / (108 - 36 + 0.3330686 \cdot 36) = 0.42828$, which is the value given in the SPSS output.
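The main-effect calculation above can be reproduced in plain Python. This sketch takes the three centred factor-A means and σ² = 1.71296 from the text; `spss_partial_eta2` is an illustrative name for the conversion formula just given, not an SPSS or G*Power function.

```python
import math

a_means = [-0.722231, 1.30556, -0.583331]   # mu_i** after centring (from the text)
sigma2 = 1.71296                             # mean within-cell variance
n_total, n_groups = 108, 36                  # 3 x 3 x 4 cells with n = 3 each

sigma2_m = sum(m * m for m in a_means) / len(a_means)   # variance of the means as a mean square
f = math.sqrt(sigma2_m / sigma2)                         # f = sigma_m / sigma
eta2 = f ** 2 / (1 + f ** 2)                             # G*Power's partial eta^2


def spss_partial_eta2(eta2, n_total, n_groups):
    """Sample-based ("SPSS") partial eta^2 from G*Power's eta^2, per the relation in the text."""
    return eta2 * n_total / (n_total + n_groups * (eta2 - 1))


eta2_spss = spss_partial_eta2(eta2, n_total, n_groups)
print(sigma2_m, f, eta2, eta2_spss)
```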

We now use G*Power to calculate the power for α = 0.05 and a total sample size of 3 × 3 × 4 × 3 = 108. We set

• Select Type of power analysis: Post hoc

• Input: Effect size f: 0.7066856; α err prob: 0.05; Total sample size: 108; Numerator df: 2 (number of factor levels − 1; A has 3 levels); Number of groups: 36 (total number of groups in the design)

• Output: Noncentrality parameter λ: 53.935690; Critical F: 3.123907; Denominator df: 72; Power (1−β err prob): 0.99999

The values of the noncentrality parameter and the power computed by G*Power are identical to the values in the SPSS output.
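The post hoc computation G*Power performs here can be sketched with SciPy, assuming `scipy.stats` is available: λ = f²N, the critical value from the central F(df1, N − k), and the power as the upper tail of the noncentral F. The helper name `anova_special_power` is our own. The f values for A × B and A × B × C are the ones derived in the following paragraphs.

```python
from scipy import stats


def anova_special_power(f, n_total, df1, n_groups, alpha=0.05):
    """Post hoc power for a main effect, interaction or contrast in a fixed-effects ANOVA (sketch)."""
    df2 = n_total - n_groups                     # denominator df = N - k (k = total number of groups)
    lam = f ** 2 * n_total                       # lambda = f^2 * N
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    power = stats.ncf.sf(f_crit, df1, df2, lam)
    return lam, f_crit, power


tests = {                                        # effect: (f, numerator df)
    "A": (0.7066856, 2),
    "A x B": (0.2450722, 4),
    "A x B x C": (0.3288016, 12),
}
results = {name: anova_special_power(f, 108, df1, 36) for name, (f, df1) in tests.items()}
for name, (lam, f_crit, power) in results.items():
    print(f"{name}: lambda={lam:.6f}, F_crit={f_crit:.6f}, power={power:.6f}")
```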

Two-factor interactions To calculate the power for the two-factor interactions A × B, A × C, and B × C we need to calculate the effect size f corresponding to the values given in Table 13. The procedure is analogous for each of the three two-factor interactions and we thus restrict ourselves to the A × B interaction.

Figure 13: Hypothetical means (m) and standard deviations (s) of a 3 × 3 × 4 design.

Figure 14: Results computed with SPSS for the values given in Table 13

The values needed to calculate $\sigma_m^2$ are the 3 × 3 = 9 residuals $\delta_{ij\star}$. They are given by $\delta_{ij\star} = \mu_{ij\star} - \mu_{i\star\star} - \mu_{\star j\star}$ = {0.555564, -0.361111, -0.194453, -0.388903, 0.444447, -0.0555444, -0.166661, -0.0833361, 0.249997}. The mean of these values is zero (as a consequence of subtracting the grand mean). Thus, the variance $\sigma_m^2$ is given by

$$\sigma_m^2 = \frac{1}{9} \sum_{i,j} \delta_{ij\star}^2 = 0.102881.$$

This results in an effect size $f = \sqrt{0.102881/1.71296} = 0.2450722$ and a partial $\eta^2 = f^2/(1 + f^2) = 0.0566575$. Using the formula given in the previous section on main effects, this corresponds to an "SPSS η′²" of about 0.0826; equivalently, $\eta'^2 = \lambda/(\lambda + df_2) = 6.486521/(6.486521 + 72) = 0.0826$, with λ and the denominator df taken from the output below.

We use G*Power to calculate the power for α = 0.05 and a total sample size of 3 × 3 × 4 × 3 = 108. We set:

• Select Type of power analysis: Post hoc

• Input: Effect size f: 0.2450722; α err prob: 0.05; Total sample size: 108; Numerator df: 4 ((#A − 1)(#B − 1) = (3 − 1)(3 − 1)); Number of groups: 36 (total number of groups in the design)

• Output: Noncentrality parameter λ: 6.486521; Critical F: 2.498919; Denominator df: 72; Power (1−β err prob): 0.475635

(The notation #A in the comment above means the number of levels in factor A.) A check reveals that the values of the noncentrality parameter and the power computed by G*Power are identical to the values for (A * B) in the SPSS output.

Three-factor interactions To calculate the effect size of the three-factor interaction corresponding to the values given in Table 13 we need the variance of the 36 residuals $\delta_{ijk} = \mu_{ijk} - \mu_{i\star\star} - \mu_{\star j\star} - \mu_{\star\star k} - \delta_{ij\star} - \delta_{i\star k} - \delta_{\star jk}$ = {0.333336, 0.777792, -0.555564, -0.555564, -0.416656, -0.305567, 0.361111, 0.361111, 0.0833194, -0.472225, 0.194453, 0.194453, 0.166669, -0.944475, 0.388903, 0.388903, 0.666653, 0.222242, -0.444447, -0.444447, -0.833322, 0.722233, 0.0555444, 0.0555444, -0.500006, 0.166683, 0.166661, 0.166661, -0.249997, 0.083325, 0.0833361, 0.0833361, 0.750003, -0.250008, -0.249997, -0.249997}. The mean of these values is zero (as a consequence of subtracting the grand mean). Thus, the variance $\sigma_m^2$ is given by

$$\sigma_m^2 = \frac{1}{36} \sum_{i,j,k} \delta_{ijk}^2 = 0.185189.$$

This results in an effect size $f = \sqrt{0.185189/1.71296} = 0.3288016$ and a partial $\eta^2 = 0.09756294$. Using the formula given in the previous section on main effects it can be checked that this corresponds to an "SPSS η′²" of 0.140, which is identical to that given in the SPSS output.
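Both interaction effect sizes follow from the same mean-square computation applied to the residual lists given in the text. The sketch below uses only the standard library; `effect_size_from_residuals` is a hypothetical helper, and σ² = 1.71296 is the mean within-cell variance from the main-effects example. The partial η² is computed as f²/(1 + f²).

```python
import math

SIGMA2_WITHIN = 1.71296   # mean within-cell variance sigma^2 from the text


def effect_size_from_residuals(residuals, sigma2=SIGMA2_WITHIN):
    """Return (sigma_m^2, f, partial eta^2) for zero-mean interaction residuals."""
    sigma2_m = sum(d * d for d in residuals) / len(residuals)  # mean square = variance (mean is 0)
    f = math.sqrt(sigma2_m / sigma2)
    return sigma2_m, f, f ** 2 / (1 + f ** 2)


ab_residuals = [0.555564, -0.361111, -0.194453, -0.388903, 0.444447,
                -0.0555444, -0.166661, -0.0833361, 0.249997]
abc_residuals = [
    0.333336, 0.777792, -0.555564, -0.555564, -0.416656, -0.305567, 0.361111, 0.361111,
    0.0833194, -0.472225, 0.194453, 0.194453, 0.166669, -0.944475, 0.388903, 0.388903,
    0.666653, 0.222242, -0.444447, -0.444447, -0.833322, 0.722233, 0.0555444, 0.0555444,
    -0.500006, 0.166683, 0.166661, 0.166661, -0.249997, 0.083325, 0.0833361, 0.0833361,
    0.750003, -0.250008, -0.249997, -0.249997,
]
ab = effect_size_from_residuals(ab_residuals)
abc = effect_size_from_residuals(abc_residuals)
print("A x B:", ab)
print("A x B x C:", abc)
```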

We use G*Power to calculate the power for α = 0.05 and a total sample size of 3 × 3 × 4 × 3 = 108. We therefore choose

• Select Type of power analysis: Post hoc

• Input: Effect size f: 0.3288016; α err prob: 0.05; Total sample size: 108; Numerator df: 12 ((#A − 1)(#B − 1)(#C − 1) = (3 − 1)(3 − 1)(4 − 1)); Number of groups: 36 (total number of groups in the design)

• Output: Noncentrality parameter λ: 11.675933; Critical F: 1.889242; Denominator df: 72; Power (1−β err prob): 0.513442

(The notation #A in the comment above means the number of levels in factor A.) Again a check reveals that the values of the noncentrality parameter and the power computed by G*Power are identical to the values for (A * B * C) in the SPSS output.

11.3.2 Using conventional effect sizes

In the example given in the previous section, we assumed that we know the true values of the means and variances in all groups. We are, however, seldom in that position. Instead, we usually only have rough estimates of the expected effect sizes. In these cases we may resort to the conventional effect sizes proposed by Cohen.

Assume that we want to calculate the total sample size needed to achieve a power of 0.95 in testing the A × C two-factor interaction at α level 0.05. Assume further that the total design in this scenario is A × B × C with 3 × 2 × 5 factor levels, that is, 30 groups. Theoretical considerations suggest that there should be a small interaction. We thus use the conventional value f = 0.1 defined by Cohen (1969) as a small effect. The inputs into and outputs of G*Power for this scenario are:

• Select Type of power analysis: A priori

• Input: Effect size f: 0.1; α err prob: 0.05; Power (1−β err prob): 0.95; Numerator df: 8 ((#A − 1)(#C − 1) = (3 − 1)(5 − 1)); Number of groups: 30 (total number of groups in the design)

• Output: Noncentrality parameter λ: 22.830000; Critical F: 1.942507; Denominator df: 2253; Total sample size: 2283; Actual power: 0.950078

G*Power calculates a total sample size of 2283. Please note that this sample size is not a multiple of the number of groups, 30 (2283/30 = 76.1)! If you want to ensure that you have equal group sizes, round this value up to a multiple of 30 by choosing a total sample size of 30 × 77 = 2310. A post hoc analysis with this sample size reveals that this increases the power to 0.952674.
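An a priori analysis amounts to finding the smallest total N whose power reaches the target, and rounding up to equal group sizes is a ceiling to the next multiple of the number of groups. The sketch below assumes SciPy is available; the function names are hypothetical, and the search (doubling, then bisection) is our own choice that relies on power increasing with N, not necessarily the search G*Power uses.

```python
import math
from scipy import stats


def special_anova_power(f, n_total, df1, n_groups, alpha):
    """Power of the F test with lambda = f^2 * N and df = (df1, N - k)."""
    df2 = n_total - n_groups
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    return stats.ncf.sf(f_crit, df1, df2, f ** 2 * n_total)


def min_total_n(f, df1, n_groups, alpha, target):
    """Smallest total N reaching the target power (power is increasing in N)."""
    lo = n_groups + 1                  # smallest N with df2 = N - k >= 1
    hi = lo
    while special_anova_power(f, hi, df1, n_groups, alpha) < target:
        hi *= 2                        # grow the upper bound until the target is reached
    while lo < hi:                     # binary search for the first N with enough power
        mid = (lo + hi) // 2
        if special_anova_power(f, mid, df1, n_groups, alpha) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


n_min = min_total_n(0.1, df1=8, n_groups=30, alpha=0.05, target=0.95)
n_equal = math.ceil(n_min / 30) * 30   # round up to a multiple of the number of groups
power_equal = special_anova_power(0.1, n_equal, 8, 30, 0.05)
print(n_min, n_equal, round(power_equal, 6))
```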

11.3.3 Power for planned comparisons

To calculate the effect size $f = \sigma_m / \sigma$ for a given comparison $C = \sum_{i=1}^{k} \mu_i c_i$ we need to know, besides the standard deviation σ within each group, the standard deviation $\sigma_m$ of the effect. It is given by:

$$\sigma_m = \frac{|C|}{\sqrt{N \sum_{i=1}^{k} c_i^2 / n_i}}$$

where N and $n_i$ denote the total sample size and the sample size in group i, respectively.

Given the mean vector µ = (1.5, 2, 3, 4), sample size ni = 5 in each group, and standard deviation σ = 2 within each group, we want to calculate the power for the following contrasts:

contrast       weights c                 σm      f       η²
1,2 vs. 3,4    -1/2   -1/2   1/2   1/2   0.875   0.438   0.161
lin. trend     -3     -1     1     3     0.950   0.475   0.184
quad. trend     1     -1    -1     1     0.125   0.063   0.004

Each contrast has a numerator df = 1. The denominator dfs are N − k, where k is the number of levels (4 in the example).

To calculate the power of the linear trend at α = 0.05 we specify:

• Select Type of power analysis: Post hoc

• Input: Effect size f: 0.475164; α err prob: 0.05; Total sample size: 20; Numerator df: 1; Number of groups: 4

• Output: Noncentrality parameter λ: 4.515617; Critical F: 4.493998; Denominator df: 16; Power (1−β err prob): 0.514736

Inserting the f ’s for the other two contrasts yields a power of 0.451898 for the comparison of “1,2 vs. 3,4”, and a power of 0.057970 for the test of a quadratic trend.
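The whole contrast table and the three powers can be reproduced from the σm formula above. This is a sketch assuming SciPy; `contrast_effect_size` and `contrast_power` are illustrative names. The power uses F(1, N − k) with λ = f²N, as in the implementation notes; small differences in the last digits can arise if G*Power was fed the rounded f values of the table.

```python
import math
from scipy import stats


def contrast_effect_size(means, weights, sizes, sigma):
    """Return (sigma_m, f, eta^2) for the contrast C = sum_i c_i * mu_i."""
    n_total = sum(sizes)
    c_value = sum(c * m for c, m in zip(weights, means))
    sigma_m = abs(c_value) / math.sqrt(n_total * sum(c ** 2 / n for c, n in zip(weights, sizes)))
    f = sigma_m / sigma
    return sigma_m, f, f ** 2 / (1 + f ** 2)


def contrast_power(f, n_total, k, alpha=0.05):
    """Power of a single-df contrast: F(1, N - k) with lambda = f^2 * N."""
    df2 = n_total - k
    f_crit = stats.f.ppf(1 - alpha, 1, df2)
    return stats.ncf.sf(f_crit, 1, df2, f ** 2 * n_total)


means = [1.5, 2, 3, 4]
sizes = [5, 5, 5, 5]
contrasts = {
    "1,2 vs. 3,4": [-0.5, -0.5, 0.5, 0.5],
    "lin. trend": [-3, -1, 1, 3],
    "quad. trend": [1, -1, -1, 1],
}
table = {}
for name, weights in contrasts.items():
    sigma_m, f, eta2 = contrast_effect_size(means, weights, sizes, sigma=2)
    table[name] = (sigma_m, f, eta2, contrast_power(f, sum(sizes), len(means)))
    print(f"{name:12s} sigma_m={sigma_m:.3f} f={f:.3f} eta2={eta2:.3f} power={table[name][3]:.6f}")
```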


11.4 Related tests

• ANOVA: One-way

11.5 Implementation notes

The distribution under H0 is the central F(df1, N − k) distribution. The numerator df1 is specified in the input and the denominator df is $df_2 = N - k$, where N is the total sample size and k the total number of groups in the design. The distribution under H1 is the noncentral F(df1, N − k, λ) distribution with the same df's and noncentrality parameter $\lambda = f^2 N$.

11.6 Validation

The results were checked against the values produced by GPower 2.0.


12 t test: Linear Regression (size of slope, one group)

A linear regression is used to estimate the parameters a, b of a linear relationship Y = a + bX between the dependent variable Y and the independent variable X. X is assumed to be a set of fixed values, whereas $Y_i$ is modeled as a random variable: $Y_i = a + bX_i + \varepsilon_i$, where $\varepsilon_i$ denotes normally distributed random errors with mean 0 and standard deviation $\sigma_i$. A common assumption, also adopted here, is that all $\sigma_i$'s are identical, that is, $\sigma_i = \sigma$. The standard deviation of the error is also called the standard deviation of the residuals.

A common task in linear regression analysis is to test whether the slope b is identical to a fixed value b0 or not. The null and the two-sided alternative hypotheses are:

$$H_0: b - b_0 = 0 \qquad H_1: b - b_0 \neq 0.$$

12.1 Effect size index

Slope H1, the slope b of the linear relationship assumed under H1 is used as effect size measure. To fully specify the effect size, the following additional inputs must be given:

• Slope H0

This is the slope b0 assumed under H0.

• Std dev σ_x

The standard deviation $\sigma_x$ of the values in X:

$$\sigma_x = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (X_i - \bar{X})^2}.$$

The standard deviation must be > 0.

• Std dev σ_y

The standard deviation $\sigma_y > 0$ of the Y-values. Important relationships of $\sigma_y$ to other relevant measures are:

$$\sigma_y = \frac{b\,\sigma_x}{\rho} \qquad (1)$$

$$\sigma_y = \frac{\sigma}{\sqrt{1 - \rho^2}} \qquad (2)$$

where σ denotes the standard deviation of the residuals $Y_i - (a + bX_i)$ and ρ the correlation coefficient between X and Y.

The effect size dialog may be used to determine Std dev σ_y and/or Slope H1 from other values based on Eqns (1) and (2) given above.
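Eqns (1) and (2) and the restriction on ρ can be written as small helpers. This standard-library sketch uses illustrative function names (not G*Power's) and made-up X values; it computes $\sigma_x$ with the 1/N convention of the definition above, which is `statistics.pstdev` rather than `statistics.stdev`.

```python
import math
import statistics


def sigma_y_from_slope(b: float, sigma_x: float, rho: float) -> float:
    """Eq. (1): sigma_y = b * sigma_x / rho (rho nonzero, with the same sign as b)."""
    return b * sigma_x / rho


def sigma_y_from_residual_sd(sigma: float, rho: float) -> float:
    """Eq. (2): sigma_y = sigma / sqrt(1 - rho^2)."""
    return sigma / math.sqrt(1 - rho ** 2)


def rho_from_slope(b: float, sigma_x: float, sigma_y: float) -> float:
    """Invert Eq. (1) and enforce the restriction -1 < b * sigma_x / sigma_y < 1."""
    rho = b * sigma_x / sigma_y
    if not -1 < rho < 1:
        raise ValueError("b * sigma_x / sigma_y must lie strictly between -1 and 1")
    return rho


sigma_x = statistics.pstdev([10, 20, 30, 40])          # 1/N convention, as in the text
sigma_y = sigma_y_from_slope(b=0.5, sigma_x=sigma_x, rho=0.6)
residual_sd = sigma_y * math.sqrt(1 - 0.6 ** 2)        # sigma implied by Eq. (2)
print(sigma_x, sigma_y, residual_sd)
```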

Pressing the button Determine on the left side of the effect size label in the main window opens the effect size drawer (see Fig. 15).

The right panel in Fig. 15 shows the combinations of input and output values in different input modes. The input variables stand on the left side of the arrow '=>', the output variables on the right side. The input values must conform to the usual restrictions, that is, σ > 0, $\sigma_x > 0$, $\sigma_y > 0$, −1 < ρ < 1. In addition, Eqn. (1) together with the restriction on ρ implies the additional restriction $-1 < b \cdot \sigma_x / \sigma_y < 1$.

Clicking on the button Calculate and transfer to main window copies the values given in Slope H1, Std dev σ_y and Std dev σ_x to the corresponding input fields in the main window.

12.2 Options

This test has no options.

12.3 Examples

We replicate an example given on page 593 in Dupont and Plummer (1998). The question investigated in this example is whether the actual average time spent per day exercising is related to the body mass index (BMI) after 6 months on a training program. The estimated standard deviation of exercise time of participants is $\sigma_x$ = 7.5 minutes. From a previous study the standard deviation of the BMI in the group of participants is estimated to be $\sigma_y$ = 4. The sample size is N = 100 and α = 0.05. We want to determine the power with which a slope b = −0.0667 of the regression line (corresponding to a drop in BMI of 2 per 30 min/day of exercise) can be detected against $b_0 = 0$.

• Select Type of power analysis: Post hoc

• Input: Tail(s): Two; Effect size Slope H1: -0.0667; α err prob: 0.05; Total sample size: 100; Slope H0: 0; Std dev σ_x: 7.5; Std dev σ_y: 4

• Output: Noncentrality parameter δ: 1.260522; Critical t: -1.984467; Df: 98; Power (1−β): 0.238969

The output shows that the power of this test is about 0.24. This confirms the value estimated by Dupont and Plummer (1998, p. 596) for this example.
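The example can be reproduced with the noncentral t distribution. This is a sketch under stated assumptions: SciPy is available, `slope_power` is our own name, and the noncentrality parameter divides by the residual standard deviation σ = σy√(1 − ρ²) (with ρ from Eqn. (1) under the H1 slope), which is the form we found reproduces δ = 1.260522 above; dividing by σy instead would give about 1.2506.

```python
import math
from scipy import stats


def slope_power(b1, b0, sigma_x, sigma_y, n, alpha=0.05):
    """Two-sided post hoc power for H0: b = b0 in simple linear regression (sketch)."""
    rho = b1 * sigma_x / sigma_y                          # Eq. (1) with the H1 slope
    sigma = sigma_y * math.sqrt(1 - rho ** 2)             # residual SD, Eq. (2)
    delta = math.sqrt(n) * sigma_x * (b1 - b0) / sigma    # noncentrality parameter
    df = n - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # probability mass of the noncentral t beyond either critical value
    power = stats.nct.sf(t_crit, df, delta) + stats.nct.cdf(-t_crit, df, delta)
    return delta, t_crit, power


delta, t_crit, power = slope_power(b1=-0.0667, b0=0.0, sigma_x=7.5, sigma_y=4.0, n=100)
print(delta, t_crit, power)
```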

12.3.1 Relation to Multiple Regression: Omnibus

The present procedure is a special case of multiple regression, or better: a different interface to the same procedure using more convenient variables. To show this, we demonstrate how the MRC procedure can be used to compute the example above. First, we determine $R^2 = \rho^2$ from the relation $b = \rho \cdot \sigma_y / \sigma_x$, which implies $\rho^2 = (b \cdot \sigma_x / \sigma_y)^2$. Entering (-0.0667*7.5/4)^2 into the G*Power calculator gives $\rho^2 = 0.01564062$. We enter this value in the effect size dialog of the MRC procedure and compute an effect size $f^2 = 0.0158891$. Selecting a post hoc analysis and setting α err prob to 0.05, Total sample size to 100 and Number of predictors to 1, we get exactly the same power as given above.
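The MRC route can be checked in a few lines, assuming SciPy is available. The F test with one predictor uses λ = f²N and df = (1, N − 2); since λ equals δ² of the t test, its power matches the two-sided slope test.

```python
from scipy import stats

b, sigma_x, sigma_y, n = -0.0667, 7.5, 4.0, 100
rho2 = (b * sigma_x / sigma_y) ** 2        # R^2 = rho^2 = (b * sigma_x / sigma_y)^2
f2 = rho2 / (1 - rho2)                     # MRC effect size f^2
df1, df2 = 1, n - 1 - 1                    # one predictor: df2 = N - m - 1
lam = f2 * n                               # lambda = f^2 * N
f_crit = stats.f.ppf(0.95, df1, df2)
power_mrc = stats.ncf.sf(f_crit, df1, df2, lam)
print(rho2, f2, power_mrc)
```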

12.4 Related tests

Similar tests in G*Power 3.0:

• Multiple Regression: Omnibus (R2 deviation from zero).

• Correlation: Point biserial model


Figure 15: Effect size drawer to calculate σy and/or slope b from various input constellations (see right panel).

12.5 Implementation notes

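The H0-distribution is the central t distribution with $df_2 = N - 2$ degrees of freedom, where N is the sample size. The H1-distribution is the noncentral t distribution with the same degrees of freedom and noncentrality parameter

$$\delta = \frac{\sqrt{N}\,\sigma_x (b - b_0)}{\sigma},$$

where $\sigma = \sigma_y \sqrt{1 - \rho^2}$ is the residual standard deviation from Eqn. (2), with $\rho = b\,\sigma_x / \sigma_y$ from Eqn. (1). With these values the formula reproduces δ = 1.260522 in the example above.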

12.6 Validation

The results were checked against the values produced by PASS (Hintze, 2006) and perfect correspondence was found.


13 F test: Multiple Regression - omnibus (deviation of R² from zero), fixed model

In multiple regression analyses the relation of a dependent variable Y to m independent factors $X_1, \ldots, X_m$ is studied. The present procedure refers to the so-called conditional or fixed factors model of multiple regression (Gatsonis & Sampson, 1989; Sampson, 1974), that is, it is assumed that

Y = Xβ + ε

where $X = (1\; X_1\; X_2 \cdots X_m)$ is an N × (m + 1) matrix of a constant term and fixed and known predictor variables $X_i$. The elements of the column vector β of length m + 1 are the regression weights, and the column vector ε of length N contains error terms, with $\varepsilon_i \sim N(0, \sigma)$.

This procedure allows power analyses for the test that the proportion of variance of a dependent variable Y explained by a set of predictors B, that is $R^2_{Y\cdot B}$, is zero. The null and alternative hypotheses are:

$$H_0: R^2_{Y\cdot B} = 0 \qquad H_1: R^2_{Y\cdot B} > 0.$$

As will be shown in the examples section, the MRC procedure is quite flexible and can be used as a substitute for some other tests.

13.1 Effect size index

The general definition of the effect size index $f^2$ used in this procedure is $f^2 = V_S / V_E$, where $V_S$ is the proportion of variance explained by a set of predictors, and $V_E$ the residual or error variance ($V_E + V_S = 1$). In the special case considered here (case 0 in Cohen (1988, p. 407ff.)) the proportion of variance explained is given by $V_S = R^2_{Y\cdot B}$ and the residual variance by $V_E = 1 - R^2_{Y\cdot B}$. Thus:

$$f^2 = \frac{R^2_{Y\cdot B}}{1 - R^2_{Y\cdot B}}$$

and conversely:

$$R^2_{Y\cdot B} = \frac{f^2}{1 + f^2}$$

Cohen (1988, p. 412) defines the following conventional values for the effect size f²:

• small f² = 0.02

• medium f² = 0.15

• large f² = 0.35

Pressing the button Determine on the left side of the effect size label in the main window opens the effect size drawer (see Fig. 16).

Effect size from squared multiple correlation coefficient Choosing input mode "From correlation coefficient" allows one to calculate the effect size f² from the squared multiple correlation coefficient $R^2_{Y\cdot B}$.

Figure 16: Effect size drawer to calculate f² from either R² or from predictor correlations.

Effect size from predictor correlations By choosing the option "From predictor correlations" (see Fig. 17) one may compute ρ² from the matrix of correlations among the predictor variables and the correlations between the predictors and the dependent variable Y. Pressing the "Insert/edit matrix" button opens a window, in which one can specify (a) the row vector u containing the correlations between each of the m predictors $X_i$ and the dependent variable Y, and (b) the m × m matrix B of correlations among the predictors. The squared multiple correlation coefficient is then given by $\rho^2 = u B^{-1} u'$. Each input correlation must lie in the interval [−1, 1], the matrix B must be positive-definite, and the resulting ρ² must lie in the interval [0, 1]. Pressing the button "Calc ρ²" tries to calculate ρ² from the input and checks the positive-definiteness of matrix B and the restriction on ρ².
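The matrix formula and its checks can be sketched with NumPy. Here `np.linalg.cholesky`, which raises `LinAlgError` when B is not positive-definite, serves as the positive-definiteness check, and $u B^{-1} u'$ is computed with `np.linalg.solve` instead of an explicit inverse. The function name and the example correlations are illustrative, not values from the manual, and this is not claimed to be how G*Power performs the check internally.

```python
import numpy as np


def rho2_from_correlations(u, B):
    """rho^2 = u B^{-1} u' with the input checks described in the text (sketch)."""
    u = np.asarray(u, dtype=float)
    B = np.asarray(B, dtype=float)
    if np.any(np.abs(u) > 1) or np.any(np.abs(B) > 1):
        raise ValueError("correlations must lie in [-1, 1]")
    np.linalg.cholesky(B)                      # raises LinAlgError if B is not positive-definite
    rho2 = float(u @ np.linalg.solve(B, u))    # u B^{-1} u'
    if not 0 <= rho2 <= 1:
        raise ValueError("resulting rho^2 lies outside [0, 1]")
    return rho2


# two predictors correlating 0.3 and 0.4 with Y and 0.5 with each other
print(rho2_from_correlations([0.3, 0.4], [[1.0, 0.5], [0.5, 1.0]]))
```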

13.2 Options

This test has no options.

13.3 Examples

13.3.1 Basic example

We assume that a dependent variable Y is predicted by a set B of 5 predictors and that the population $R^2_{Y\cdot B}$ is 0.10, that is, that the 5 predictors account for 10% of the variance of Y. The sample size is N = 95 subjects. What is the power of the F test at α = 0.05?

First, by inserting R² = 0.10 in the effect size dialog we calculate the corresponding effect size f² = 0.1111111. We then use the following settings in G*Power to calculate the power:

• Select Type of power analysis: Post hoc

• Input: Effect size f²: 0.1111111; α err prob: 0.05; Total sample size: 95; Number of predictors: 5

Figure 17: Input of correlations between predictors and Y (top) and the matrix of the correlations among predictors (see text).

• Output: Noncentrality parameter λ: 10.555555; Critical F: 2.316858; Numerator df: 5; Denominator df: 89; Power (1−β): 0.673586

The output shows that the power of this test is about 0.67. This confirms the value estimated by Cohen (1988, p. 424) in his example 9.1, which uses identical values.
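The basic example follows directly from the definitions: f² = R²/(1 − R²), λ = f²N and df = (m, N − m − 1). This is a sketch assuming SciPy; `mrc_omnibus_power` is a hypothetical name.

```python
from scipy import stats


def mrc_omnibus_power(f2, n, m, alpha=0.05):
    """Post hoc power of the test H0: R^2_{Y.B} = 0 with m predictors (sketch)."""
    df1, df2 = m, n - m - 1
    lam = f2 * n                                # lambda = f^2 * N
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    return lam, f_crit, stats.ncf.sf(f_crit, df1, df2, lam)


r2 = 0.10
f2 = r2 / (1 - r2)                              # f^2 = R^2 / (1 - R^2)
lam, f_crit, power = mrc_omnibus_power(f2, n=95, m=5)
print(f2, lam, f_crit, power)
```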

13.3.2 Example showing relations to a one-way ANOVA and the two-sample t-test

We assume the means 2, 3, 2, 5 for the k = 4 experimental groups in a one-factor design. The sample sizes in the groups are 5, 6, 6, 5, respectively, and the common standard deviation is assumed to be σ = 2. Using the effect size dialog of the one-way ANOVA procedure we calculate from these values the effect size f = 0.5930904. With α = 0.05, 4 groups, and a total sample size of 22, a power of 0.536011 is computed.

An equivalent analysis could be done using the MRC procedure. To this end we set the effect size of the MRC procedure to f² = 0.5930904² = 0.351756 and the number of predictors to (number of groups − 1), in the example k − 1 = 3. Choosing the remaining parameters α and total sample size exactly as in the one-way ANOVA case leads to an identical result.
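The equivalence can be verified numerically. In this sketch (SciPy assumed; names are our own) σm is taken as the sample-size-weighted standard deviation of the group means around the weighted grand mean; that weighting is our assumption, chosen because it reproduces f = 0.5930904. The ANOVA uses df = (k − 1, N − k) and the MRC procedure df = (m, N − m − 1) with m = k − 1, which are the same numbers, so both routes yield the same power.

```python
import math
from scipy import stats

means = [2, 3, 2, 5]
sizes = [5, 6, 6, 5]
sigma = 2.0
n_total, k = sum(sizes), len(means)

grand = sum(n * m for n, m in zip(sizes, means)) / n_total      # weighted grand mean
sigma_m = math.sqrt(sum(n * (m - grand) ** 2 for n, m in zip(sizes, means)) / n_total)
f = sigma_m / sigma


def f_test_power(lam, df1, df2, alpha=0.05):
    """Upper-tail probability of the noncentral F beyond the central critical value."""
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    return stats.ncf.sf(f_crit, df1, df2, lam)


lam = f ** 2 * n_total
power_anova = f_test_power(lam, k - 1, n_total - k)              # one-way ANOVA
power_mrc = f_test_power(lam, k - 1, n_total - (k - 1) - 1)      # MRC with m = k - 1 predictors
print(f, f ** 2, power_anova, power_mrc)
```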

Since the two-sided t-test for the difference in means of two independent groups is a special case of the one-way ANOVA, it can also be regarded as a special case of the MRC procedure. With equal group sizes, the relation between the effect size d of the t-test and f² is $f^2 = (d/2)^2$.

13.3.3 Example showing the relation to two-sided tests of point-biserial correlations

For testing whether a point biserial correlation r is different from zero, using the special procedure provided in G*Power is recommended. But the power analysis of the two-sided test can also be done with the current MRC procedure. We just need to set R² = r² and Number of predictors = 1.

Given the correlation r = 0.5 (r² = 0.25) we get f² = 0.25/(1 − 0.25) = 0.333. For α = 0.05 and total sample size N = 12 a power of 0.439627 is computed from both procedures.
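The agreement of the two procedures can be seen numerically. This SciPy-based sketch computes the MRC power with λ = f²N and df = (1, N − 2), and the two-sided t power with δ = √λ; squaring a noncentral t with df2 degrees of freedom gives a noncentral F(1, df2, δ²), so the two numbers coincide.

```python
import math
from scipy import stats

r, n = 0.5, 12
f2 = r ** 2 / (1 - r ** 2)          # f^2 = R^2 / (1 - R^2) with R^2 = r^2
df1, df2 = 1, n - 2                 # one predictor: df2 = N - m - 1 = N - 2
lam = f2 * n

# MRC route: noncentral F
f_crit = stats.f.ppf(0.95, df1, df2)
power_mrc = stats.ncf.sf(f_crit, df1, df2, lam)

# two-sided t route with delta^2 = lambda
delta = math.sqrt(lam)
t_crit = stats.t.ppf(0.975, df2)
power_t = stats.nct.sf(t_crit, df2, delta) + stats.nct.cdf(-t_crit, df2, delta)
print(f2, power_mrc, power_t)
```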

13.4 Related tests

Similar tests in G*Power 3.0:

• Multiple Regression: Special (R2 increase).

• ANOVA: Fixed effects, omnibus, one-way.

• Means: Difference between two independent means (two groups)

• Correlation: Point biserial model

13.5 Implementation notes

The H0-distribution is the central F distribution with numerator degrees of freedom $df_1 = m$ and denominator degrees of freedom $df_2 = N - m - 1$, where N is the sample size and m the number of predictors in the set B explaining the proportion of variance given by $R^2_{Y\cdot B}$. The H1-distribution is the noncentral F distribution with the same degrees of freedom and noncentrality parameter $\lambda = f^2 N$.

13.6 Validation

The results were checked against the values produced by GPower 2.0 and those produced by PASS (Hintze, 2006). Slight deviations were found from the values tabulated in Cohen (1988). This is due to an approximation used by Cohen (1988) that underestimates the noncentrality parameter λ and therefore also the power. This issue is discussed more thoroughly in Erdfelder, Faul, and Buchner (1996).

13.7 References

See Chapter 9 in Cohen (1988).


14 F test: Multiple Regression - special (increase of R²), fixed model

In multiple regression analyses the relation of a dependent variable Y to m independent factors X1, ..., Xm is studied. The present procedure refers to the so-called conditional or fixed factors model of multiple regression (Gatsonis & Sampson, 1989; Sampson, 1974), that is, it is assumed that

Y = Xβ + ε

where $X = (1\; X_1\; X_2 \cdots X_m)$ is an N × (m + 1) matrix of a constant term and fixed and known predictor variables $X_i$. The elements of the column vector β of length m + 1 are the regression weights, and the column vector ε of length N contains error terms, with $\varepsilon_i \sim N(0, \sigma)$.

This procedure allows power analyses for the test of whether the proportion of variance of variable Y explained by a set of predictors A is increased if an additional nonempty predictor set B is considered. The variance explained by predictor sets A, B, and A ∪ B is denoted by $R^2_{Y\cdot A}$, $R^2_{Y\cdot B}$, and $R^2_{Y\cdot A,B}$, respectively.

Using this notation, the null and alternative hypotheses are:

$$H_0: R^2_{Y\cdot A,B} - R^2_{Y\cdot A} = 0$$

$$H_1: R^2_{Y\cdot A,B} - R^2_{Y\cdot A} > 0.$$

The directional form of H1 is due to the fact that $R^2_{Y\cdot A,B}$, that is, the proportion of variance explained by sets A and B combined, cannot be lower than the proportion $R^2_{Y\cdot A}$ explained by A alone.

As will be shown in the examples section, the MRC procedure is quite flexible and can be used as a substitute for some other tests.

14.1 Effect size index

The general definition of the effect size index f 2 used in this procedure is: f 2 = VS /VE, where VS is the proportion of variance explained by a set of predictors, and VE is the residual or error variance.

1. In the first special case considered here (case 1 in Cohen (1988, p. 407ff.)), the proportion of variance explained by the additional predictor set B is given by $V_S = R^2_{Y\cdot A,B} - R^2_{Y\cdot A}$ and the residual variance by $V_E = 1 - R^2_{Y\cdot A,B}$. Thus:

$$f^2 = \frac{R^2_{Y\cdot A,B} - R^2_{Y\cdot A}}{1 - R^2_{Y\cdot A,B}}$$

The quantity $R^2_{Y\cdot A,B} - R^2_{Y\cdot A}$ is also called the squared semipartial multiple correlation and symbolized by $R^2_{Y\cdot(B\cdot A)}$. A further interesting quantity is the squared partial multiple correlation coefficient, which is defined as

$$R^2_{YB\cdot A} := \frac{R^2_{Y\cdot A,B} - R^2_{Y\cdot A}}{1 - R^2_{Y\cdot A}} = \frac{R^2_{Y\cdot(B\cdot A)}}{1 - R^2_{Y\cdot A}}$$

Using this definition, $f^2$ can alternatively be written in terms of the partial R²:

$$f^2 = \frac{R^2_{YB\cdot A}}{1 - R^2_{YB\cdot A}}$$

2. In a second special case (case 2 in Cohen (1988, p. 407ff.)), the same effect variance $V_S$ is considered, but it is assumed that there is a third set of predictors C that also accounts for parts of the variance of Y and thus reduces the error variance: $V_E = 1 - R^2_{Y\cdot A,B,C}$. In this case, the effect size is

$$f^2 = \frac{R^2_{Y\cdot A,B} - R^2_{Y\cdot A}}{1 - R^2_{Y\cdot A,B,C}}$$

We may again define a partial $R^2_x$ as:

$$R^2_x := \frac{R^2_{Y\cdot(B\cdot A)}}{1 - (R^2_{Y\cdot A,B,C} - R^2_{Y\cdot(B\cdot A)})}$$

and with this quantity we get

$$f^2 = \frac{R^2_x}{1 - R^2_x}$$

Note: Case 1 is the special case of case 2, where C is the empty set.
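The effect size formulas for both cases translate directly into code. This standard-library sketch uses hypothetical function names; the example inputs are the R² values of the two examples in Section 14.3. The case 2 partial $R^2_x$ is built from the semipartial $R^2_{Y\cdot(B\cdot A)}$, which is the form for which $f^2 = R^2_x/(1 - R^2_x)$ holds and which gives the partial R² of 0.06976744 reported in example 14.3.2.

```python
def f2_case1(r2_a, r2_ab):
    """Case 1: f^2 = (R^2_{Y.A,B} - R^2_{Y.A}) / (1 - R^2_{Y.A,B})."""
    return (r2_ab - r2_a) / (1 - r2_ab)


def f2_case2(r2_a, r2_ab, r2_abc):
    """Case 2: set C is also removed from the residual variance."""
    return (r2_ab - r2_a) / (1 - r2_abc)


def partial_r2_case1(r2_a, r2_ab):
    """R^2_{YB.A} = (R^2_{Y.A,B} - R^2_{Y.A}) / (1 - R^2_{Y.A})."""
    return (r2_ab - r2_a) / (1 - r2_a)


def partial_r2_case2(r2_a, r2_ab, r2_abc):
    """R^2_x from the semipartial R^2_{Y.(B.A)} = R^2_{Y.A,B} - R^2_{Y.A}."""
    semipartial = r2_ab - r2_a
    return semipartial / (1 - (r2_abc - semipartial))


f2_1 = f2_case1(0.25, 0.30)          # values of example 14.3.1
f2_2 = f2_case2(0.10, 0.16, 0.20)    # values of example 14.3.2
print(f2_1, partial_r2_case1(0.25, 0.30))
print(f2_2, partial_r2_case2(0.10, 0.16, 0.20))
```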

Pressing the button Determine on the left side of the effect size label in the main window opens the effect size drawer (see Fig. 18) that may be used to calculate f² from the variances $V_S$ and $V_E$, or alternatively from the partial R².

Figure 18: Effect size drawer to calculate f² from variances or from the partial R².

Cohen (1988, p. 412) defines the following conventional values for the effect size f²:

• small f² = 0.02

• medium f² = 0.15

• large f² = 0.35

14.2 Options

This test has no options.

14.3 Examples

14.3.1 Basic example for case 1

We make the following assumptions: A dependent variable Y is predicted by two sets of predictors A and B. The 5 predictors in A alone account for 25% of the variation of Y, thus $R^2_{Y\cdot A} = 0.25$. Including the 4 predictors in set B increases the proportion of variance explained to 0.3, thus $R^2_{Y\cdot A,B} = 0.3$. We want to calculate the power of a test for the increase due to the inclusion of B, given α = 0.01 and a total sample size of 90.

First we use the option From variances in the effect size drawer to calculate the effect size. In the input field Variance explained by special effect we insert $R^2_{Y\cdot A,B} - R^2_{Y\cdot A} = 0.3 - 0.25 = 0.05$, and as Residual variance we insert $1 - R^2_{Y\cdot A,B} = 1 - 0.3 = 0.7$. After clicking on Calculate and transfer to main window we see that this corresponds to a partial R² of about 0.0667 and to an effect size f² = 0.07142857. We then set the input field Number of tested predictors in the main window to 4, the number of predictors in set B, and Total number of predictors to the total number of predictors in A and B, that is, to 4 + 5 = 9.

This leads to the following analysis in G*Power:

• Select Type of power analysis: Post hoc

• Input: Effect size f²: 0.0714286; α err prob: 0.01; Total sample size: 90; Number of tested predictors: 4; Total number of predictors: 9

• Output Noncentrality parameter λ: 6.428574 Critical F: 3.563110 Numerator df: 4 Denominator df: 80 Power (1- β): 0.241297

We find that the power of this test is very low, namely about 0.24. This confirms the result estimated by Cohen (1988, p. 434) in his example 9.10, which uses identical values. It should be noted, however, that Cohen (1988) uses an approximation to the correct formula for the noncentrality parameter λ that in general underestimates the true λ and thus also the true power. In this particular case, Cohen estimates λ = 6.1, which is only slightly lower than the correct value 6.429 given in the output above.

By using an a priori analysis, we can compute how large the sample size must be to achieve a power of 0.80. We find that the required sample size is N = 242.

14.3.2 Basic example for case 2

Here we make the following assumptions: A dependent variable Y is predicted by three sets of predictors A, B and C, which stand in the following causal relationship A ⇒ B ⇒ C. The 5 predictors in A alone account for 10% of the variation of Y, thus $R^2_{Y\cdot A} = 0.10$. Including the 3 predictors in set B increases the proportion of variance explained to 0.16, thus $R^2_{Y\cdot A,B} = 0.16$. Considering in addition the 4 predictors in set C increases the explained variance further to 0.2, thus $R^2_{Y\cdot A,B,C} = 0.2$. We want to calculate the power of a test for the increase in variance explained by the inclusion of B in addition to A, given α = 0.01 and a total sample size of 200. This is a case 2 scenario, because the hypothesis only involves sets A and B, whereas set C should be included in the calculation of the residual variance.

We use the option From variances in the effect size drawer to calculate the effect size. In the input field Variance explained by special effect we insert $R^2_{Y\cdot A,B} - R^2_{Y\cdot A} = 0.16 - 0.1 = 0.06$, and as Residual variance we insert $1 - R^2_{Y\cdot A,B,C} = 1 - 0.2 = 0.8$. Clicking on Calculate and transfer to main window shows that this corresponds to a partial R² = 0.06976744 and to an effect size f² = 0.075. We then set the input field Numerator df in the main window to 3, the number of predictors in set B (which are responsible for a potential increase in variance explained), and Number of predictors to the total number of predictors in A, B and C (which all influence the residual variance), that is, to 5 + 3 + 4 = 12.

This leads to the following analysis in G*Power:

• Select Type of power analysis: Post hoc

• Input Effect size f²: 0.075 α err prob: 0.01 Total sample size: 200 Number of tested predictors: 3 Total number of predictors: 12

• Output Noncentrality parameter λ: 15.000000 Critical F: 3.888052 Numerator df: 3 Denominator df: 187 Power (1- β): 0.766990

We find that the power of this test is about 0.767. In this case the power is slightly larger than the power value 0.74 estimated by Cohen (1988, p. 439) in his example 9.13, which uses identical values. This is due to the fact that his approximation λ = 14.3 underestimates the true value λ = 15 given in the output above.

14.3.3 Example showing the relation to factorial ANOVA designs

We assume a 2 × 3 × 4 design with three factors U, V, W. We want to test main effects, two-way interactions (U × V, U × W, V × W) and the three-way interaction (U × V × W). We may use the procedure “ANOVA: Fixed effects, special, main effects and interactions” in G*Power to do this analysis (see the corresponding entry in the manual for details). As an example, we consider the test of the V × W interaction. Assuming that Variance explained by the interaction = 0.422 and Error variance = 6.75 leads to an effect size f = 0.25 (a medium effect size according to Cohen (1988)). Numerator df = 6 corresponds to (levels of V − 1)(levels of W − 1), and Number of groups = 24 is the total number of cells (2 · 3 · 4 = 24) in the design. With α = 0.05 and a total sample size of 120 we compute a power of 0.470.

We now demonstrate how these analyses can be done with the MRC procedure. A main factor with k levels corresponds to k − 1 predictors in the MRC analysis, thus the number of predictors is 1, 2, and 3 for the three factors U, V, and W. The number of predictors in an interaction is the product of the numbers of predictors of the factors involved. The V × W interaction, for instance, corresponds to a set of (3 − 1)(4 − 1) = (2)(3) = 6 predictors.

To test an effect with MRC we need to isolate the relative contribution of this source to the total variance, that is, we need to determine VS. We illustrate this for the V × W interaction. In this case we find $R^2_{Y\cdot V\times W}$ by removing the contribution of the main effects from the proportion of variance explained by V, W and V × W together, $R^2_{Y\cdot V,W,V\times W}$:

$$R^2_{Y\cdot V\times W} = R^2_{Y\cdot V,W,V\times W} - R^2_{Y\cdot V,W}.$$

The residual variance VE is the variance of Y from which the variance of all sources in the design has been removed.

This is a case 2 scenario, in which V × W corresponds to set B with 2 · 3 = 6 predictors, V ∪ W corresponds to set A with 2 + 3 = 5 predictors, and all other sources of variance, that is, U, U × V, U × W and U × V × W, correspond to set C with 1 + (1 · 2) + (1 · 3) + (1 · 2 · 3) = 1 + 2 + 3 + 6 = 12 predictors. Thus, the total number of predictors is 6 + 5 + 12 = 23. Note: The total number of predictors is always the number of cells in the design minus 1 (here 24 − 1 = 23).
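The bookkeeping of predictor sets can be checked mechanically. The sketch below enumerates all sources of the 2 × 3 × 4 design, codes each factor with k − 1 predictors and each interaction with the product of its factors' predictor counts, and assigns the sources to sets A, B and C as in the text. The dictionary `levels` and the helper `n_predictors` are illustrative names, not part of G*Power.

```python
from itertools import combinations
from math import prod
from typing import Tuple

# Factor levels of the 2 x 3 x 4 design (U, V, W as in the text).
levels = {"U": 2, "V": 3, "W": 4}


def n_predictors(effect: Tuple[str, ...]) -> int:
    """k - 1 predictors per factor; an interaction multiplies the counts of its factors."""
    return prod(levels[f] - 1 for f in effect)


# Every source of variance: main effects, two-way and three-way interactions.
sources = [c for r in range(1, len(levels) + 1) for c in combinations(levels, r)]

set_b = [("V", "W")]      # effect under test
set_a = [("V",), ("W",)]  # effects partialled out
set_c = [s for s in sources if s not in set_a + set_b]  # all remaining sources

q = sum(n_predictors(s) for s in set_b)
w = sum(n_predictors(s) for s in set_a)
v = sum(n_predictors(s) for s in set_c)
cells = prod(levels.values())
print(q, w, v, q + w + v, cells - 1)  # 6 5 12 23 23
```

Changing the entries of `set_b` and `set_a` gives the predictor counts for testing any other main effect or interaction of the same design.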

We now specify these contributions numerically: $R^2_{Y\cdot A,B,C} = 0.325$ and $R^2_{Y\cdot A,B} - R^2_{Y\cdot A} = R^2_{Y\cdot V\times W} = 0.0422$. Inserting these values in the effect size dialog (Variance explained by special effect = 0.0422, Residual variance = 1 − 0.325 = 0.675) yields an effect size f² = 0.06251852 (compare these values with those chosen in the ANOVA analysis and note that f² = 0.25² = 0.0625). To calculate power, we set Number of tested predictors = 6 (the number of predictors in set B) and Total number of predictors = 23. With α = 0.05 and N = 120, we get, as expected, the same power 0.470 as from the equivalent ANOVA analysis.

14.4 Related tests

Similar tests in G*Power 3.0:

• Multiple Regression: Omnibus (R2 deviation from zero).

• ANOVA: Fixed effects, special, main effect and interactions.

14.5 Implementation notes

The H0-distribution is the central F distribution with numerator degrees of freedom $df_1 = q$ and denominator degrees of freedom $df_2 = N - m - 1$, where N is the sample size, q the number of tested predictors in set B, which is responsible for a potential increase in explained variance, and m the total number of predictors. In case 1 (see section “Effect size index”), m = q + w; in case 2, m = q + w + v, where w is the number of predictors in set A and v the number of predictors in set C. The H1-distribution is the noncentral F distribution with the same degrees of freedom and noncentrality parameter $\lambda = f^2 N$.
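These distributional statements translate directly into a power computation. The following Python sketch is not G*Power's implementation: it uses only the standard library, evaluates the regularized incomplete beta function with the usual continued-fraction expansion, writes the noncentral F distribution function as a Poisson mixture of beta distribution functions, and finds the critical value by bisection. All function names are our own, and the code assumes a moderate λ (the first Poisson weight e^(−λ/2) underflows for very large λ). For the case 1 example in Section 14.3.1 it should give values close to the reported λ = 6.428574, critical F = 3.563110 and power 0.241297, and the linear search for an a priori N should stop at or next to N = 242.

```python
from math import exp, lgamma, log, log1p


def _beta_cf(a, b, x, max_iter=500, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            step = d * c
            h *= step
        if abs(step - 1.0) < eps:  # last (odd) factor close to 1: converged
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b  # I_x(a,b) = 1 - I_{1-x}(b,a)


def ncf_cdf(x, df1, df2, lam, max_terms=2000, tol=1e-14):
    """P(F <= x) for the noncentral F(df1, df2, lam): Poisson(lam/2) mixture of beta CDFs."""
    if x <= 0.0:
        return 0.0
    y = df1 * x / (df1 * x + df2)
    half = lam / 2.0
    weight = exp(-half)  # Poisson probability of j = 0 (underflows for very large lam)
    total = mass = 0.0
    for j in range(max_terms):
        total += weight * reg_inc_beta(df1 / 2.0 + j, df2 / 2.0, y)
        mass += weight
        if 1.0 - mass < tol:  # remaining Poisson mass bounds the truncation error
            break
        weight *= half / (j + 1)
    return total


def f_critical(alpha, df1, df2):
    """Upper-alpha critical value of the central F distribution, by bisection."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while ncf_cdf(hi, df1, df2, 0.0) < target:
        lo, hi = hi, 2.0 * hi
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if ncf_cdf(mid, df1, df2, 0.0) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power_r2_increase(f2, n, q, m, alpha):
    """Return (lambda, critical F, power) for df1 = q, df2 = N - m - 1, lambda = f2 * N."""
    df1, df2 = q, n - m - 1
    lam = f2 * n
    crit = f_critical(alpha, df1, df2)
    return lam, crit, 1.0 - ncf_cdf(crit, df1, df2, lam)


def a_priori_n(f2, q, m, alpha, target_power):
    """Smallest total N whose power reaches target_power (plain linear search)."""
    n = m + 2  # smallest N with df2 >= 1
    while power_r2_increase(f2, n, q, m, alpha)[2] < target_power:
        n += 1
    return n


# Case 1 example (Section 14.3.1)
lam, crit, power = power_r2_increase(0.0714286, 90, q=4, m=9, alpha=0.01)
print(f"lambda = {lam:.6f}, critical F = {crit:.6f}, power = {power:.6f}")
n_needed = a_priori_n(0.0714286, q=4, m=9, alpha=0.01, target_power=0.80)
print("N for power 0.80:", n_needed)
```

The same function can be checked against the case 2 example (f² = 0.075, N = 200, q = 3, m = 12) and the factorial design example (f² = 0.0625, N = 120, q = 6, m = 23). A linear search over N is slow but transparent; a bracketing search would be the obvious refinement.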

14.6 Validation

The results were checked against the values produced by GPower 2.0 and those produced by PASS (Hintze, 2006).

Slight deviations were found from the values tabulated in Cohen (1988). This is due to an approximation used by Cohen (1988) that underestimates the noncentrality parameter λ and therefore also the power. This issue is discussed more thoroughly in Erdfelder et al. (1996).

14.7 References

See Chapter 9 in Cohen (1988).


15 F test: Inequality of two Variances

This procedure allows power analyses for the test that the population variances $\sigma_0^2$ and $\sigma_1^2$ of two normally distributed random variables are identical. The null and (two-sided) alternate hypotheses of this test are:

$$H_0: \sigma_1^2 - \sigma_0^2 = 0 \qquad H_1: \sigma_1^2 - \sigma_0^2 \neq 0.$$

The two-sided test (“two tails”) should be used if there is no a priori restriction on the sign of the deviation assumed in the alternate hypothesis. Otherwise use the one-sided test (“one tail”).

15.1 Effect size index

The ratio $\sigma_1^2/\sigma_0^2$ of the two variances is used as effect size measure. This ratio is 1 if H0 is true, that is, if both variances are identical. In an a priori analysis a ratio close or even identical to 1 would imply an exceedingly large sample size. Thus, G*Power prohibits inputs in the range [0.999, 1.001] in this case.

Pressing the button Determine on the left side of the effect size label in the main window opens the effect size drawer (see Fig. 19), which may be used to calculate the ratio from two variances. Insert the variances $\sigma_0^2$ and $\sigma_1^2$ in the corresponding input fields.

Figure 19: Effect size drawer to calculate variance ratios.

15.2 Options

This test has no options.

15.3 Examples

We want to test whether the variance $\sigma_1^2$ in population B is different from the variance $\sigma_0^2$ in population A. We here regard a ratio $\sigma_1^2/\sigma_0^2 > 1.5$ (or, the other way around, $< 1/1.5 \approx 0.6667$) as a substantial difference. We want to realize identical sample sizes in both groups.

How many subjects are needed to achieve the error levels α = 0.05 and β = 0.2 in this test? This question can be answered by using the following settings in G*Power:

• Select Type of power analysis: A priori

• Input Tail(s): Two Ratio var1/var0: 1.5 α err prob: 0.05 Power (1- β): 0.80 Allocation ratio N2/N1: 1

• Output Lower critical F: 0.752964 Upper critical F: 1.328085 Numerator df: 192 Denominator df: 192 Sample size group 1: 193 Sample size group 2: 193 Actual power : 0.800105

The output shows that we need at least 386 subjects (193 in each group) in order to achieve the desired levels of the α and β error. To apply the test, we would estimate both variances $s_1^2$ and $s_0^2$ from samples of size N1 and N0, respectively. The two-sided test would be significant at α = 0.05 if the statistic $x = s_1^2/s_0^2$ were either smaller than the lower critical value 0.753 or greater than the upper critical value 1.328.

By setting “Allocation ratio N2/N1 = 2”, we can easily check that a considerably larger total sample size, namely N = 443 (148 and 295 in groups 1 and 2, respectively), would be required if the sample sizes in the two groups were clearly unequal.

15.4 Related tests

Similar tests in G*Power 3.0:

• Variance: Difference from constant (one sample case).

15.5 Implementation notes

It is assumed that both populations are normally distributed and that the means are not known in advance but estimated from samples of size N1 and N0, respectively. Under these assumptions, the H0-distribution of $s_1^2/s_0^2$ is the central F distribution with N1 − 1 numerator and N0 − 1 denominator degrees of freedom ($F_{N_1-1,N_0-1}$), and the H1-distribution is the same central F distribution scaled with the variance ratio, that is, $(\sigma_1^2/\sigma_0^2) \cdot F_{N_1-1,N_0-1}$.
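Because both distributions are central F distributions, the power of the two-sided test follows from the F distribution function alone: with critical values c_lo and c_hi of F(N1 − 1, N0 − 1) at α/2 and 1 − α/2 and the ratio r = σ1²/σ0², the power is P(F < c_lo/r) + P(F > c_hi/r). The Python sketch below (standard library only, hypothetical function names, not G*Power's code) applies this to the example in Section 15.3; the critical values and power should come out close to the reported 0.752964, 1.328085 and 0.800105, and the search over equal group sizes should end at or next to 193.

```python
from math import exp, lgamma, log, log1p


def _beta_cf(a, b, x, max_iter=500, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            step = d * c
            h *= step
        if abs(step - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_cdf(x, df1, df2):
    """Central F distribution function: I_y(df1/2, df2/2) with y = df1*x / (df1*x + df2)."""
    if x <= 0.0:
        return 0.0
    return reg_inc_beta(df1 / 2.0, df2 / 2.0, df1 * x / (df1 * x + df2))


def f_quantile(p, df1, df2):
    """Inverse of f_cdf, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df1, df2) < p:
        lo, hi = hi, 2.0 * hi
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, df1, df2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def variance_ratio_power(ratio, n1, n0, alpha):
    """Two-sided test of sigma1^2/sigma0^2 = 1; under H1, s1^2/s0^2 ~ ratio * F(n1-1, n0-1)."""
    df1, df0 = n1 - 1, n0 - 1
    lower = f_quantile(alpha / 2.0, df1, df0)
    upper = f_quantile(1.0 - alpha / 2.0, df1, df0)
    # P(ratio * F < lower) + P(ratio * F > upper)
    power = f_cdf(lower / ratio, df1, df0) + 1.0 - f_cdf(upper / ratio, df1, df0)
    return lower, upper, power


lower, upper, power = variance_ratio_power(1.5, 193, 193, alpha=0.05)
print(f"critical F: {lower:.6f} and {upper:.6f}, power = {power:.6f}")

# Smallest equal group size with power >= 0.80 (plain linear search)
n_per_group = 2
while variance_ratio_power(1.5, n_per_group, n_per_group, alpha=0.05)[2] < 0.80:
    n_per_group += 1
print("n per group:", n_per_group)
```

For equal group sizes the two critical values are reciprocals of each other (here 0.753 ≈ 1/1.328), which is a quick plausibility check on the quantile routine.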

15.6 Validation

The results were successfully checked against values produced by PASS (Hintze, 2006) and in a Monte Carlo simulation.


16 t test: Correlation - point biserial model

The point biserial correlation is a measure of association between a continuous variable X and a binary variable Y, the latter of which takes on values 0 and 1. It is assumed that the continuous variable X at Y = 0 and Y = 1 is normally distributed with means µ0, µ1 and equal variance σ². If π is the proportion of values with Y = 1, then the point biserial correlation coefficient is defined as:

$$\rho = \frac{(\mu_1 - \mu_0)\sqrt{\pi(1-\pi)}}{\sigma_x}$$

where $\sigma_x = \sqrt{\sigma^2 + \pi(1-\pi)(\mu_1 - \mu_0)^2}$ is the standard deviation of X; for equal group sizes (π = 1/2) this reduces to $\sqrt{\sigma^2 + (\mu_1 - \mu_0)^2/4}$. The point biserial correlation is identical to a Pearson correlation between two vectors x, y, where $x_i$ contains a value from X at Y = j, and $y_i = j$ codes the group from which the X was taken.
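The identity between the point biserial coefficient and the Pearson correlation with a 0/1-coded group variable can be checked numerically. In the Python sketch below, the data values are made up for illustration; group means, the proportion of ones and the standard deviation of X (normalized by N) play the roles of µ0, µ1, π and σx, and both routes give the same coefficient up to rounding error.

```python
from math import sqrt
from statistics import fmean, pstdev


def pearson(x, y):
    """Ordinary Pearson product-moment correlation."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def point_biserial(x, y):
    """(m1 - m0) * sqrt(p * (1 - p)) / s_x, with s_x the SD of X normalized by N."""
    x0 = [a for a, g in zip(x, y) if g == 0]
    x1 = [a for a, g in zip(x, y) if g == 1]
    p = len(x1) / len(x)
    return (fmean(x1) - fmean(x0)) * sqrt(p * (1.0 - p)) / pstdev(x)


# Hypothetical data: X measured in two groups coded 0 and 1 (unequal group sizes).
x = [2.1, 3.4, 1.9, 4.2, 5.0, 3.8, 4.9, 5.6, 4.4]
y = [0, 0, 0, 1, 1, 1, 1, 1, 1]
print(pearson(x, y), point_biserial(x, y))
```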

The statistical model is the same as that underlying a test for a difference in means µ0 and µ1 in two independent groups. The relation between the effect size d = (µ1 − µ0)/σ used in that test and the point biserial correlation ρ considered here is given by:

$$\rho = \frac{d}{\sqrt{d^2 + \frac{N^2}{n_0 n_1}}}$$

where n0, n1 denote the sizes of the two groups and N = n0 + n1.

The power procedure refers to a t-test used to evaluate the null hypothesis that there is no (point-biserial) correlation in the population (ρ = 0). The alternative hypothesis is that the correlation coefficient has a non-zero value r.

$$H_0: \rho = 0 \qquad H_1: \rho = r.$$

The two-sided (“two tailed”) test should be used if there is no restriction on the sign of ρ under the alternative hypothesis. Otherwise use the one-sided (“one tailed”) test.

16.1 Effect size index

The effect size index |ρ| is the absolute value of the correlation coefficient in the population as postulated in the alternative hypothesis. From this definition it follows that 0 ≤ |ρ| < 1.

Cohen (1969, p. 79) defines the following effect size conventions for |ρ|:

• small ρ = 0.1

• medium ρ = 0.3

• large ρ = 0.5

Pressing the Determine button on the left side of the effect size label opens the effect size drawer (see Fig. 20). You can use it to calculate |ρ| from the coefficient of determination ρ².

16.2 Options

This test has no options.

Figure 20: Effect size dialog to compute ρ from the coefficient of determination ρ².

16.3 Examples

We want to know how many subjects it takes to detect r = .25 in the population, given α = β = .05. Thus, H0: ρ = 0, H1: ρ = 0.25.

• Select Type of power analysis: A priori

• Input Tail(s): One Effect size |ρ|: 0.25 α err prob: 0.05 Power (1-β err prob): 0.95

• Output noncentrality parameter δ: 3.306559 Critical t: 1.654314 df: 162 Total sample size: 164 Actual power: 0.950308

The results indicate that we need at least N = 164 subjects to ensure a power > 0.95. The actual power achieved with this N (0.950308) is slightly higher than the requested power.

To illustrate the connection to the two groups t test, we calculate the corresponding effect size d for equal sample sizes n0 = n1 = 82:

$$d = \frac{N r}{\sqrt{n_0 n_1 (1 - r^2)}} = \frac{164 \cdot 0.25}{\sqrt{82 \cdot 82 \cdot (1 - 0.25^2)}} = 0.51639778$$

Performing a power analysis for the one-tailed two group t test with this d, n0 = n1 = 82, and α = 0.05 leads to exactly the same power 0.950308 as in the example above. If we assume unequal sample sizes in both groups, for example n0 = 64, n1 = 100, then we would compute a different value for d:

$$d = \frac{N r}{\sqrt{n_0 n_1 (1 - r^2)}} = \frac{164 \cdot 0.25}{\sqrt{64 \cdot 100 \cdot (1 - 0.25^2)}} = 0.52930772$$

but we would again arrive at the same power. It thus poses no restriction of generality that we only input the total sample size and not the individual group sizes in the t test for correlation procedure.
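Why the group split does not matter can be made explicit. The noncentrality parameter of the two-group t test is d·√(n0 n1/N); substituting d = N r/√(n0 n1 (1 − r²)) gives r√N/√(1 − r²), which no longer depends on n0 and n1 and equals the noncentrality parameter δ of the correlation test in the implementation notes. Since both tests also use df = N − 2, the power is the same. A small Python check (the function names are hypothetical):

```python
from math import sqrt


def d_from_rho(r, n0, n1):
    """Cohen's d corresponding to a point biserial correlation r for group sizes n0, n1."""
    n = n0 + n1
    return n * r / sqrt(n0 * n1 * (1.0 - r * r))


def rho_from_d(d, n0, n1):
    """Inverse relation: rho = d / sqrt(d^2 + N^2 / (n0 * n1))."""
    n = n0 + n1
    return d / sqrt(d * d + n * n / (n0 * n1))


def delta_two_group_t(d, n0, n1):
    """Noncentrality parameter of the two-group t test: d * sqrt(n0 * n1 / N)."""
    return d * sqrt(n0 * n1 / (n0 + n1))


def delta_correlation(r, n):
    """Noncentrality parameter of the correlation t test: sqrt(r^2 N / (1 - r^2))."""
    return sqrt(r * r * n / (1.0 - r * r))


for n0, n1 in [(82, 82), (64, 100)]:
    d = d_from_rho(0.25, n0, n1)
    print(f"n0={n0}, n1={n1}: d = {d:.8f}, delta = {delta_two_group_t(d, n0, n1):.6f}")
print(f"correlation test: delta = {delta_correlation(0.25, 164):.6f}")
```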

16.4 Related tests

Similar tests in G*Power 3.0:

• Exact test for the difference of one (Pearson) correlation from a constant

• Test for the difference of two (Pearson) correlations


16.5 Implementation notes

The H0-distribution is the central t-distribution with df = N − 2 degrees of freedom. The H1-distribution is the noncentral t-distribution with df = N − 2 and noncentrality parameter δ, where

$$\delta = \sqrt{\frac{|\rho|^2 N}{1 - |\rho|^2}}$$

N represents the total sample size and |ρ| represents the effect size index as defined above.

16.6 Validation

The results were checked against the values produced by GPower 2.0 (Faul & Erdfelder, 1992).


17 t test: Linear Regression (two groups)

A linear regression is used to estimate the parameters a, b of a linear relationship Y = a + bX between the dependent variable Y and the independent variable X. X is assumed to be a set of fixed values, whereas Yi is modeled as a random variable: Yi = a + bXi + εi, where εi denotes normally distributed random errors with mean 0 and standard deviation σi. A common assumption also adopted here is that all σi’s are identical, that is, σi = σ. The standard deviation of the error is also called the standard deviation of the residuals.

If we have determined the linear relationships between X and Y in two groups: Y1 = a1 + b1 X1, Y2 = a2 + b2 X2, we may ask whether the slopes b1, b2 are identical.

The null and the two-sided alternative hypotheses are

$$H_0: b_1 - b_2 = 0 \qquad H_1: b_1 - b_2 \neq 0.$$

17.1 Effect size index

The absolute value of the difference between the slopes |∆slope| = |b1 − b2| is used as effect size. To fully specify the effect size, the following additional inputs must be given:

• Std dev residual σ

The standard deviation σ > 0 of the residuals in the combined data set (i.e., the square root of the weighted mean of the residual variances in the two data sets): If $\sigma^2_{r1}$ and $\sigma^2_{r2}$ denote the variances of the residuals $r_1 = (a_1 + b_1 X_1) - Y_1$ and $r_2 = (a_2 + b_2 X_2) - Y_2$ in the two groups, and n1, n2 the respective sample sizes, then

$$\sigma = \sqrt{\frac{n_1 \sigma^2_{r1} + n_2 \sigma^2_{r2}}{n_1 + n_2}} \qquad (3)$$

• Std dev σ_x1

The standard deviation σx1 > 0 of the X-values in group 1.

• Std dev σ_x2

The standard deviation σx2 > 0 of the X-values in group 2.

Important relationships between the standard deviations σxi of Xi, σyi of Yi, the slopes bi of the regression lines, and the correlation coefficient ρi between Xi and Yi are:

$$\sigma_{yi} = \frac{b_i \sigma_{xi}}{\rho_i} \qquad (4)$$

$$\sigma_{yi} = \frac{\sigma_{ri}}{\sqrt{1 - \rho_i^2}} \qquad (5)$$

where $\sigma_{ri}$ denotes the standard deviation of the residuals $Y_i - (b_i X_i + a_i)$.
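For the input mode “σ_x, σ_y, slope => residual σ, ρ”, Eq. (4) gives ρi = bi σxi/σyi, Eq. (5) then gives σri = σyi √(1 − ρi²), and Eq. (3) pools the two residual variances; dividing numerator and denominator of Eq. (3) by n1 shows that only the allocation ratio n2/n1 is needed. A minimal Python sketch of this chain (the helper names are our own, not G*Power's), fed with the summary statistics of Fig. 22; the pooled value agrees with σ = 0.5578413 used in the example to about four decimals.

```python
from math import sqrt


def residual_sd(sd_x, sd_y, slope):
    """Return (rho, residual SD): Eq. (4) solved for rho, then Eq. (5)."""
    rho = slope * sd_x / sd_y
    if not -1.0 < rho < 1.0:
        raise ValueError("inputs imply |rho| >= 1")
    return rho, sd_y * sqrt(1.0 - rho * rho)


def pooled_residual_sd(sd_r1, sd_r2, ratio):
    """Eq. (3) with n2 = ratio * n1: sqrt((s_r1^2 + ratio * s_r2^2) / (1 + ratio))."""
    return sqrt((sd_r1 ** 2 + ratio * sd_r2 ** 2) / (1.0 + ratio))


rho1, sr1 = residual_sd(9.029147, 0.669424, -0.046532)   # group 1 (Fig. 22)
rho2, sr2 = residual_sd(11.867794, 0.684350, -0.030613)  # group 2 (Fig. 22)
sigma = pooled_residual_sd(sr1, sr2, 44 / 28)
print(f"rho1 = {rho1:.6f}, rho2 = {rho2:.6f}, pooled sigma = {sigma:.7f}")
```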

The effect size dialog may be used to determine Std dev residual σ and |∆ slope| from other values based on Eqns (3), (4) and (5) given above. Pressing the button Determine on the left side of the effect size label in the main window opens the effect size drawer (see Fig. 21).

The left panel in Fig. 21 shows the combinations of input and output values in different input modes. The input variables stand on the left side of the arrow “=>”, the output variables on the right side. The input values must conform to the usual restrictions, that is, σxi > 0, σyi > 0, −1 < ρi < 1. In addition, Eq. (4) together with the restriction on ρi implies the additional restriction −1 < bi · σxi/σyi < 1.

Clicking on the button Calculate and transfer to main window copies the values given in Std dev σ_x1, Std dev σ_x2, Std dev residual σ, Allocation ratio N2/N1, and |∆ slope| to the corresponding input fields in the main window.

17.2 Options

This test has no options.

17.3 Examples

We replicate an example given on page 594 in Dupont and Plummer (1998) that refers to an example in Armitage, Berry, and Matthews (2002, p. 325). The data and relevant statistics are shown in Fig. 22. Note: Contrary to Dupont and Plummer (1998), we here consider the data as hypothesized true values and normalize the variance by N, not N − 1.

The relation of age and vital capacity for two groups of men working in the cadmium industry is investigated. Group 1 includes n1 = 28 workers with less than 10 years of cadmium exposure, and Group 2 n2 = 44 workers never exposed to cadmium. The standard deviations of the ages in the two groups are σx1 = 9.029 and σx2 = 11.87. Regressing vital capacity on age gives the slopes b1 = −0.04653 and b2 = −0.03061 of the regression lines. To calculate the pooled standard deviation of the residuals we use the effect size dialog: We use input mode “σ_x, σ_y, slope => residual σ, ρ” and insert the values given above, the standard deviations σy1, σy2 of y (capacity) as given in Fig. 22, and the allocation ratio n2/n1 = 44/28 = 1.571428. This results in a pooled standard deviation of the residuals of σ = 0.5578413 (compare the right panel in Fig. 21).

We want to recruit enough workers to detect a true difference in slope of |(−0.03061) − (−0.04653)| = 0.01592 with 80% power, α = 0.05 and the same allocation ratio to the two groups as in the sample data.

• Select Type of power analysis: A priori

• Input Tail(s): Two |∆ slope|: 0.01592 α err prob: 0.05 Power (1- β): 0.80 Allocation ratio N2/N1: 1.571428 Std dev residual σ: 0.5578413 Std dev σ_x1: 9.02914 Std dev σ_x2: 11.86779

• Output Noncentrality parameter δ: 2.811598 Critical t: 1.965697 Df: 415 Sample size group 1: 163 Sample size group 2: 256 Total sample size: 419 Actual power: 0.800980


Figure 21: Effect size drawer to calculate the pooled standard deviation of the residuals σ and the effect size |∆ slope| from various input constellations (see left panel). The right panel shows the inputs for the example discussed in the text.

Group 1 (exposure), n1 = 28, pairs (age, capacity):

(29, 5.21), (29, 5.17), (33, 4.88), (32, 4.50), (31, 4.47), (29, 5.12), (29, 4.51), (30, 4.85), (21, 5.22), (28, 4.62), (23, 5.07), (35, 3.64), (38, 3.64), (38, 5.09), (43, 4.61), (39, 4.73), (38, 4.58), (42, 5.12), (43, 3.89), (43, 4.62), (37, 4.30), (50, 2.70), (50, 3.50), (45, 5.06), (48, 4.06), (51, 4.51), (46, 4.66), (58, 2.88)

Group 2 (no exposure), n2 = 44, pairs (age, capacity):

(27, 5.29), (25, 3.67), (24, 5.82), (32, 4.77), (23, 5.71), (25, 4.47), (32, 4.55), (18, 4.61), (19, 5.86), (26, 5.20), (33, 4.44), (27, 5.52), (33, 4.97), (25, 4.99), (42, 4.89), (35, 4.09), (35, 4.24), (41, 3.88), (38, 4.85), (41, 4.79), (36, 4.36), (36, 4.02), (41, 3.77), (41, 4.22), (37, 4.94), (42, 4.04), (39, 4.51), (41, 4.06), (43, 4.02), (41, 4.99), (48, 3.86), (47, 4.68), (53, 4.74), (49, 3.76), (54, 3.98), (48, 5.00), (49, 3.31), (47, 3.11), (52, 4.76), (58, 3.95), (62, 4.60), (65, 4.83), (62, 3.18), (59, 3.03)

Summary statistics:

Group 1: n1 = 28, sx1 = 9.029147, sy1 = 0.669424, b1 = -0.046532, a1 = 6.230031, r1 = -0.627620

Group 2: n2 = 44, sx2 = 11.867794, sy2 = 0.684350, b2 = -0.030613, a2 = 5.680291, r2 = -0.530876

Figure 22: Data for the example discussed in Dupont and Plummer (1998).
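The summary statistics in Fig. 22 can be recomputed from the listed pairs. The Python sketch below (standard library only; variable names are illustrative) computes standard deviations with divisor N, least-squares slopes and intercepts, correlations, and the pooled residual standard deviation of Eq. (3). Run on the listed pairs, it reproduces the group 1 statistics to the printed precision, whereas group 2's slope and correlation come out slightly smaller in magnitude (about −0.0305 and −0.528 instead of −0.030613 and −0.530876), and the pooled residual standard deviation accordingly about 0.5585 instead of 0.5578; the printed summary values remain the reference for the example.

```python
from math import sqrt
from statistics import fmean

# (age, vital capacity) pairs transcribed from Fig. 22.
group1 = [
    (29, 5.21), (29, 5.17), (33, 4.88), (32, 4.50), (31, 4.47), (29, 5.12),
    (29, 4.51), (30, 4.85), (21, 5.22), (28, 4.62), (23, 5.07), (35, 3.64),
    (38, 3.64), (38, 5.09), (43, 4.61), (39, 4.73), (38, 4.58), (42, 5.12),
    (43, 3.89), (43, 4.62), (37, 4.30), (50, 2.70), (50, 3.50), (45, 5.06),
    (48, 4.06), (51, 4.51), (46, 4.66), (58, 2.88),
]
group2 = [
    (27, 5.29), (25, 3.67), (24, 5.82), (32, 4.77), (23, 5.71), (25, 4.47),
    (32, 4.55), (18, 4.61), (19, 5.86), (26, 5.20), (33, 4.44), (27, 5.52),
    (33, 4.97), (25, 4.99), (42, 4.89), (35, 4.09), (35, 4.24), (41, 3.88),
    (38, 4.85), (41, 4.79), (36, 4.36), (36, 4.02), (41, 3.77), (41, 4.22),
    (37, 4.94), (42, 4.04), (39, 4.51), (41, 4.06), (43, 4.02), (41, 4.99),
    (48, 3.86), (47, 4.68), (53, 4.74), (49, 3.76), (54, 3.98), (48, 5.00),
    (49, 3.31), (47, 3.11), (52, 4.76), (58, 3.95), (62, 4.60), (65, 4.83),
    (62, 3.18), (59, 3.03),
]


def summary(pairs):
    """SDs with divisor N, least-squares slope/intercept, correlation, residual variance."""
    n = len(pairs)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    b = sxy / sxx
    return {
        "n": n,
        "sx": sqrt(sxx / n),
        "sy": sqrt(syy / n),
        "b": b,
        "a": my - b * mx,
        "r": sxy / sqrt(sxx * syy),
        "var_res": (syy - b * sxy) / n,  # residual sum of squares / N
    }


s1, s2 = summary(group1), summary(group2)
# Eq. (3): weighted mean of the residual variances, then square root
pooled = sqrt((s1["n"] * s1["var_res"] + s2["n"] * s2["var_res"]) / (s1["n"] + s2["n"]))
for name, s in (("group 1", s1), ("group 2", s2)):
    print(name, {k: round(v, 6) for k, v in s.items()})
print("pooled residual SD:", round(pooled, 6))
```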


The output shows that we need 419 workers in total, with 163 in group 1 and 256 in group 2. These values are close to those reported in Dupont and Plummer (1998, p. 596) for this example (166 + 261 = 427). The slight difference is due to the fact that they normalize the variances by N − 1, and use shifted central t-distributions instead of non-central t distributions.

17.3.1 Relation to Multiple Regression: Special

The present procedure is essentially a special case of Multiple regression, but provides a more convenient interface. To show this, we demonstrate how the MRC procedure can be used to compute the example above (see also Dupont and Plummer (1998, p. 597)).

First, the data are combined into a data set of size n1 + n2 = 28 + 44 = 72. With respect to this combined data set, we define the following variables (vectors of length 72):

• y contains the measured vital capacity

• x1 contains the age data

• x2 codes group membership (0 = not exposed, 1 = exposed)

• x3 contains the element-wise product of x1 and x2

The multiple regression model:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$$

reduces to $y = \beta_0 + \beta_1 x_1$ and $y = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_1$ for unexposed and exposed workers, respectively. In this model β3 represents the difference in slope between both groups, which is assumed to be zero under the null hypothesis. Thus, the above model reduces to

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

if the null hypothesis is true. Performing a multiple regression analysis with the full model leads to β3 = −0.01592 and $R_1^2 = 0.3243$. With the reduced model assumed in the null hypothesis one finds $R_0^2 = 0.3115$. From these values we compute the following effect size:

$$f^2 = \frac{R_1^2 - R_0^2}{1 - R_1^2} = \frac{0.3243 - 0.3115}{1 - 0.3243} = 0.018870$$

Selecting an a priori analysis and setting α err prob to 0.05, Power (1-β err prob) to 0.80, Numerator df to 1 and Number of predictors to 3, we get N = 418, that is almost the same result as in the example above.

17.4 Related tests

Similar tests in G*Power 3.0:

• Multiple Regression: Omnibus (R2 deviation from zero).

• Correlation: Point biserial model

17.5 Implementation notes

The procedure implements a slight variant of the algorithm proposed in Dupont and Plummer (1998). The only difference is that we replaced the approximation by shifted central t-distributions used in their paper with noncentral t-distributions. In most cases this makes little difference.

The H0-distribution is the central t distribution with df = n1 + n2 − 4 degrees of freedom, where n1 and n2 denote the sample sizes in the two groups, and the H1 distribution is the noncentral t distribution with the same degrees of freedom and the noncentrality parameter $\delta = \Delta\sqrt{n_2}$.

Statistical test. The power is calculated for the t-test for equal slopes as described in Armitage et al. (2002) in chapter 11. The test statistic is (see Eqns 11.18, 11.19, 11.20):

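$$t = \frac{\hat b_1 - \hat b_2}{s_{\hat b_1 - \hat b_2}}$$

with df = n1 + n2 − 4 degrees of freedom. Let, for group i ∈ {1, 2}, $S_{xi}$ denote the sum of squares in X (i.e., the variance times $n_i$) and $S_{ri}$ the residual sum of squares of Y about the regression line of group i. Then a pooled estimate of the residual variance is obtained by $s_r^2 = (S_{r1} + S_{r2})/(n_1 + n_2 - 4)$. The standard error of the difference of slopes is

$$s_{\hat b_1 - \hat b_2} = \sqrt{s_r^2 \left(\frac{1}{S_{x1}} + \frac{1}{S_{x2}}\right)}$$

Power of the test: In the procedure for equal slopes the noncentrality parameter is $\delta = \Delta\sqrt{n_2}$, with $\Delta = |\Delta\mathrm{slope}|/\sigma_R$ and

$$\sigma_R = \sqrt{\sigma^2 \left(\frac{1}{m\,\sigma_{x1}^2} + \frac{1}{\sigma_{x2}^2}\right)} \qquad (6)$$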

where m = n1/n2, σx1 and σx2 are the standard deviations of X in groups 1 and 2, respectively, and σ the common standard deviation of the residuals.
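Plugging the example of Section 17.3 into Eq. (6) is a quick consistency check. The Python sketch below is our own, not G*Power's; it takes the group sizes from the output (163 and 256) and reads n in δ = Δ√n as the size n2 of group 2, consistent with m = n1/n2.

```python
from math import sqrt


def slope_noncentrality(delta_slope, sigma, sd_x1, sd_x2, n2, m):
    """delta = |delta_slope| / sigma_R * sqrt(n2), with sigma_R from Eq. (6)."""
    sigma_r = sqrt(sigma ** 2 * (1.0 / (m * sd_x1 ** 2) + 1.0 / sd_x2 ** 2))
    return abs(delta_slope) / sigma_r * sqrt(n2)


n1, n2 = 163, 256
inputs = (0.01592, 0.5578413, 9.02914, 11.86779, n2)
df = n1 + n2 - 4
delta_rounded = slope_noncentrality(*inputs, m=n1 / n2)         # m from the rounded group sizes
delta_nominal = slope_noncentrality(*inputs, m=1 / 1.571428)    # m from the allocation ratio input
print("df =", df)
print(f"delta (m = n1/n2): {delta_rounded:.6f}")
print(f"delta (m = 1/1.571428): {delta_nominal:.6f}")
```

The first value (about 2.812) is close to the printed δ = 2.811598; using the nominal allocation ratio for m instead of the rounded group sizes (about 2.8116) matches it even more closely, which suggests, though the manual does not state it, that δ is evaluated with the allocation ratio entered in the main window.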

17.6 Validation

The results were checked for a range of input scenarios against the values produced by the program PS published by Dupont and Plummer (1998). Only slight deviations were found that are probably due to the use of the noncentral t-distribution in G*Power instead of the shifted central t-distributions that are used in PS.


18 t test: Linear Regression (two groups)

A linear regression is used to estimate the parameters a, b of a linear relationship Y = a + bX between the dependent variable Y and the independent variable X. X is assumed to be a set of fixed values, whereas Yi is modeled as a random variable: Yi = a + bXi + εi, where εi denotes normally distributed random errors with mean 0 and standard deviation σi. A common assumption also adopted here is that all σi’s are identical, that is, σi = σ. The standard deviation of the error is also called the standard deviation of the residuals.

If we have determined the linear relationships between X and Y in two groups: Y1 = a1 + b1 X1, Y2 = a2 + b2 X2, we may ask whether the slopes b1, b2 are identical.

The null and the two-sided alternative hypotheses are

$$H_0: b_1 - b_2 = 0 \qquad H_1: b_1 - b_2 \neq 0.$$

18.1 Effect size index

The absolute value of the difference between the slopes |∆slope| = |b1 − b2| is used as effect size. To fully specify the effect size, the following additional inputs must be given:

• Std dev residual σ

The standard deviation σ > 0 of the residuals in the combined data set (i.e., the square root of the weighted mean of the residual variances in the two data sets): If $\sigma^2_{r1}$ and $\sigma^2_{r2}$ denote the variances of the residuals $r_1 = (a_1 + b_1 X_1) - Y_1$ and $r_2 = (a_2 + b_2 X_2) - Y_2$ in the two groups, and n1, n2 the respective sample sizes, then

$$\sigma = \sqrt{\frac{n_1 \sigma^2_{r1} + n_2 \sigma^2_{r2}}{n_1 + n_2}} \qquad (7)$$

• Std dev σ_x1

The standard deviation σx1 > 0 of the X-values in group 1.

• Std dev σ_x2

The standard deviation σx2 > 0 of the X-values in group 2.

Important relationships between the standard deviations σxi of Xi, σyi of Yi, the slopes bi of the regression lines, and the correlation coefficient ρi between Xi and Yi are:

$$\sigma_{yi} = \frac{b_i \sigma_{xi}}{\rho_i} \qquad (8)$$

$$\sigma_{yi} = \frac{\sigma_{ri}}{\sqrt{1 - \rho_i^2}} \qquad (9)$$

where $\sigma_{ri}$ denotes the standard deviation of the residuals $Y_i - (b_i X_i + a_i)$.

The effect size dialog may be used to determine Std dev residual σ and |∆ slope| from other values based on Eqns (7), (8) and (9) given above. Pressing the button Determine on the left side of the effect size label in the main window opens the effect size drawer (see Fig. 23).

The left panel in Fig. 23 shows the combinations of input and output values in different input modes. The input variables stand on the left side of the arrow “=>”, the output variables on the right side. The input values must conform to the usual restrictions, that is, σxi > 0, σyi > 0, −1 < ρi < 1. In addition, Eq. (8) together with the restriction on ρi implies the additional restriction −1 < bi · σxi/σyi < 1.

Clicking on the button Calculate and transfer to main window copies the values given in Std dev σ_x1, Std dev σ_x2, Std dev residual σ, Allocation ratio N2/N1, and |∆ slope| to the corresponding input fields in the main window.

18.2 Options

This test has no options.

18.3 Examples

We replicate an example given on page 594 in Dupont and Plummer (1998) that refers to an example in Armitage et al. (2002, p. 325). The data and relevant statistics are shown in Fig. 24. Note: Contrary to Dupont and Plummer (1998), we here consider the data as hypothesized true values and normalize the variance by N, not N − 1.

The relation of age and vital capacity for two groups of men working in the cadmium industry is investigated. Group 1 includes n1 = 28 workers with less than 10 years of cadmium exposure, and Group 2 n2 = 44 workers never exposed to cadmium. The standard deviations of the ages in the two groups are σx1 = 9.029 and σx2 = 11.87. Regressing vital capacity on age gives the slopes b1 = −0.04653 and b2 = −0.03061 of the regression lines. To calculate the pooled standard deviation of the residuals we use the effect size dialog: We use input mode “σ_x, σ_y, slope => residual σ, ρ” and insert the values given above, the standard deviations σy1, σy2 of y (capacity) as given in Fig. 24, and the allocation ratio n2/n1 = 44/28 = 1.571428. This results in a pooled standard deviation of the residuals of σ = 0.5578413 (compare the right panel in Fig. 23).

We want to recruit enough workers to detect a true difference in slope of |(−0.03061) − (−0.04653)| = 0.01592 with 80% power, α = 0.05 and the same allocation ratio to the two groups as in the sample data.

• Select Type of power analysis: A priori

• Input Tail(s): Two |∆ slope|: 0.01592 α err prob: 0.05 Power (1- β): 0.80 Allocation ratio N2/N1: 1.571428 Std dev residual σ: 0.5578413 Std dev σ_x1: 9.02914 Std dev σ_x2: 11.86779

• Output Noncentrality parameter δ: 2.811598 Critical t: 1.965697 Df: 415 Sample size group 1: 163 Sample size group 2: 256 Total sample size: 419 Actual power: 0.800980


Figure 23: Effect size drawer to calculate the pooled standard deviation of the residuals σ and the effect size |∆ slope| from various input constellations (see left panel). The right panel shows the inputs for the example discussed in the text.

Group 1 (exposure), n1 = 28, pairs (age, capacity):

(29, 5.21), (29, 5.17), (33, 4.88), (32, 4.50), (31, 4.47), (29, 5.12), (29, 4.51), (30, 4.85), (21, 5.22), (28, 4.62), (23, 5.07), (35, 3.64), (38, 3.64), (38, 5.09), (43, 4.61), (39, 4.73), (38, 4.58), (42, 5.12), (43, 3.89), (43, 4.62), (37, 4.30), (50, 2.70), (50, 3.50), (45, 5.06), (48, 4.06), (51, 4.51), (46, 4.66), (58, 2.88)

Group 2 (no exposure), n2 = 44, pairs (age, capacity):

(27, 5.29), (25, 3.67), (24, 5.82), (32, 4.77), (23, 5.71), (25, 4.47), (32, 4.55), (18, 4.61), (19, 5.86), (26, 5.20), (33, 4.44), (27, 5.52), (33, 4.97), (25, 4.99), (42, 4.89), (35, 4.09), (35, 4.24), (41, 3.88), (38, 4.85), (41, 4.79), (36, 4.36), (36, 4.02), (41, 3.77), (41, 4.22), (37, 4.94), (42, 4.04), (39, 4.51), (41, 4.06), (43, 4.02), (41, 4.99), (48, 3.86), (47, 4.68), (53, 4.74), (49, 3.76), (54, 3.98), (48, 5.00), (49, 3.31), (47, 3.11), (52, 4.76), (58, 3.95), (62, 4.60), (65, 4.83), (62, 3.18), (59, 3.03)

Summary statistics:

Group 1: n1 = 28, sx1 = 9.029147, sy1 = 0.669424, b1 = -0.046532, a1 = 6.230031, r1 = -0.627620

Group 2: n2 = 44, sx2 = 11.867794, sy2 = 0.684350, b2 = -0.030613, a2 = 5.680291, r2 = -0.530876

Figure 24: Data for the example discussed in Dupont and Plummer (1998).


The output shows that we need 419 workers in total, with 163 in group 1 and 256 in group 2. These values are close to those reported in Dupont and Plummer (1998, p. 596) for this example (166 + 261 = 427). The slight difference is due to the fact that they normalize the variances by N − 1, and use shifted central t-distributions instead of non-central t distributions.

18.3.1 Relation to Multiple Regression: Special

The present procedure is essentially a special case of Multiple regression, but provides a more convenient interface. To show this, we demonstrate how the MRC procedure can be used to compute the example above (see also Dupont and Plummer (1998, p. 597)).

First, the data are combined into a data set of size n1 + n2 = 28 + 44 = 72. With respect to this combined data set, we define the following variables (vectors of length 72):

• y contains the measured vital capacity

• x1 contains the age data

• x2 codes group membership (0 = not exposed, 1 = exposed)

• x3 contains the element-wise product of x1 and x2

The multiple regression model:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$$

reduces to $y = \beta_0 + \beta_1 x_1$ and $y = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_1$ for unexposed and exposed workers, respectively. In this model β3 represents the difference in slope between both groups, which is assumed to be zero under the null hypothesis. Thus, the above model reduces to

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

if the null hypothesis is true. Performing a multiple regression analysis with the full model yields β3 = −0.01592 and $R_1^2 = 0.3243$. With the reduced model assumed in the null hypothesis, one finds $R_0^2 = 0.3115$. From these values we compute the following effect size:

$$f^2 = \frac{R_1^2 - R_0^2}{1 - R_1^2} = \frac{0.3243 - 0.3115}{1 - 0.3243} = 0.018870$$

Selecting an a priori analysis and setting α err prob to 0.05, Power (1-β err prob) to 0.80, Numerator df to 1 and Number of predictors to 3, we get N = 418, which is almost the same result as in the example above.
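As a cross-check of the steps above, the interaction-model coefficients can be read off the separate group fits listed with Figure 24: with x2 = 1 for the exposed group, an ordinary least-squares fit of the full model reproduces each group's own regression line, so β0 = a2, β1 = b2, β2 = a1 − a2 and β3 = b1 − b2. The Python sketch below (standard library only; the helper name `cohen_f2` is ours, not part of G * Power) does this arithmetic and evaluates f² from the rounded R² values quoted in the text. With these rounded inputs f² comes out at about 0.01894; the small gap to the 0.018870 quoted above most likely reflects rounding of the R² values.

```python
# Interaction model y = b0 + b1*x1 + b2*x2 + b3*(x1*x2), with x2 = 1 for exposed workers.
# Per-group estimates from Figure 24 (group 1 = exposed, group 2 = not exposed).
a1, b1 = 6.230031, -0.046532  # exposed: intercept, slope
a2, b2 = 5.680291, -0.030613  # not exposed: intercept, slope

beta0 = a2       # intercept for x2 = 0
beta1 = b2       # slope for x2 = 0
beta2 = a1 - a2  # intercept shift for x2 = 1
beta3 = b1 - b2  # slope difference; zero under H0


def cohen_f2(r2_full: float, r2_reduced: float) -> float:
    """Effect size f^2 for the R^2 increase from the reduced to the full model."""
    return (r2_full - r2_reduced) / (1.0 - r2_full)


f2 = cohen_f2(0.3243, 0.3115)
print(f"beta3 = {beta3:.5f}, f2 = {f2:.5f}")  # beta3 = -0.01592, f2 = 0.01894
```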

18.4 Related tests

Similar tests in G * Power 3.0:

• Multiple Regression: Omnibus (R2 deviation from zero).

• Correlation: Point biserial model

18.5 Implementation notes

The procedure implements a slight variant of the algorithm proposed in Dupont and Plummer (1998). The only difference is that we replaced the approximation by shifted central t distributions used in their paper with noncentral t distributions. In most cases this makes no great difference.

The H0 distribution is the central t distribution with df = n1 + n2 − 4 degrees of freedom, where n1 and n2 denote the sample sizes in the two groups, and the H1 distribution is the noncentral t distribution with the same degrees of freedom and the noncentrality parameter $\delta = \Delta\sqrt{n_2}$.

Statistical test. The power is calculated for the t test for equal slopes as described in Armitage et al. (2002) in chapter 11. The test statistic is (see Eqns 11.18, 11.19, 11.20):

$$t = \frac{\hat{b}_1 - \hat{b}_2}{s_{\hat{b}_1 - \hat{b}_2}}$$

with df = n1 + n2 − 4 degrees of freedom. For group i ∈ {1, 2}, let Sxi and Syi denote the sums of squares in X and Y (i.e., the variance times ni). Then a pooled estimate of the residual variance can be obtained by $s_r^2 = (S_{y1} + S_{y2})/(n_1 + n_2 - 4)$. The standard error of the difference of slopes is

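$$s_{\hat{b}_1 - \hat{b}_2} = \sqrt{s_r^2\left(\frac{1}{S_{x1}} + \frac{1}{S_{x2}}\right)}$$

Power of the test. In the procedure for equal slopes, the noncentrality parameter is $\delta = \Delta\sqrt{n_2}$, with $\Delta = |\Delta_{\text{slope}}|/\sigma_R$ and

$$\sigma_R = \sqrt{\sigma^2\left(\frac{1}{m\,\sigma_{x1}^2} + \frac{1}{\sigma_{x2}^2}\right)} \qquad (10)$$

where m = n1/n2, σx1 and σx2 are the standard deviations of X in groups 1 and 2, respectively, and σ is the common standard deviation of the residuals.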
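The noncentrality parameter above can be sketched in Python as follows. This is an illustration of the formulas in this section, not G * Power's source, and the function names are hypothetical. The second function writes the same quantity as |Δslope| divided by the standard error, with Sxi = ni σ²xi and σ in place of sr, which shows why the factor is √n2 once m = n1/n2 is pulled out of σR. The residual standard deviation 0.5 in the usage line is a made-up value for illustration only.

```python
import math


def sigma_r(sigma: float, sd_x1: float, sd_x2: float, m: float) -> float:
    """sigma_R = sqrt(sigma^2 * (1/(m*sd_x1^2) + 1/sd_x2^2)), with m = n1/n2."""
    return math.sqrt(sigma**2 * (1.0 / (m * sd_x1**2) + 1.0 / sd_x2**2))


def delta_slopes(delta_slope, sigma, sd_x1, sd_x2, n1, n2):
    """delta = Delta * sqrt(n2), where Delta = |delta_slope| / sigma_R."""
    big_delta = abs(delta_slope) / sigma_r(sigma, sd_x1, sd_x2, n1 / n2)
    return big_delta * math.sqrt(n2)


def delta_via_se(delta_slope, sigma, sd_x1, sd_x2, n1, n2):
    """Same quantity as |delta_slope| / SE, using S_xi = n_i * sd_xi^2."""
    se = math.sqrt(sigma**2 * (1.0 / (n1 * sd_x1**2) + 1.0 / (n2 * sd_x2**2)))
    return abs(delta_slope) / se


# Slope difference and SDs of age from the example; residual SD 0.5 is hypothetical.
args = (-0.015919, 0.5, 9.029147, 11.867794, 28, 44)
print(delta_slopes(*args), delta_via_se(*args))  # the two values agree
```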

18.6 Validation

The results were checked for a range of input scenarios against the values produced by the program PS published by Dupont and Plummer (1998). Only slight deviations were found that are probably due to the use of the noncentral t distribution in G * Power instead of the shifted central t distributions that are used in PS.


19 t test: Means - difference between two dependent means (matched pairs)

The null hypothesis of this test is that the population means µx, µy of two matched samples x, y are identical. The sampling method leads to N pairs (xi, yi) of matched observations.

The null hypothesis that µx = µy can be reformulated in terms of the difference zi = xi − yi. The null hypothesis is then given by µz = 0. The alternative hypothesis states that µz has a value different from zero:

$$H_0: \mu_z = 0 \qquad H_1: \mu_z \neq 0.$$

If the sign of µz cannot be predicted a priori, then a two-sided test should be used. Otherwise use the one-sided test.

19.1 Effect size index

The effect size index dz is defined as:

$$d_z = \frac{|\mu_z|}{\sigma_z} = \frac{|\mu_x - \mu_y|}{\sqrt{\sigma_x^2 + \sigma_y^2 - 2\rho_{xy}\sigma_x\sigma_y}}$$

where µx, µy denote the population means, σx and σy denote the standard deviations in the two populations, and ρxy denotes the correlation between the two random variables. µz and σz are the population mean and standard deviation of the difference z.

A click on the Determine button to the left of the effect size label in the main window opens the effect size drawer (see Fig. 25). You can use this drawer to calculate dz from the mean and standard deviation of the differences zi. Alternatively, you can calculate dz from the means µx and µy and the standard deviations σx and σy of the two random variables x, y as well as the correlation between the random variables x and y.

19.2 Options

This test has no options.

19.3 Examples

Let us try to replicate the example in Cohen (1969, p. 48). The effects of two teaching methods on algebra achievement are compared between 50 IQ-matched pairs of pupils (i.e., 100 pupils). The effect size that should be detected is d = (m0 − m1)/σ = 0.4. Note that this is the effect size index representing differences between two independent means (two groups). We want to use this effect size as a basis for a matched-pairs study. A sample estimate of the correlation between IQ-matched pairs in the population has been calculated to be r = 0.55. We thus assume ρxy = 0.55. What is the power of a two-sided test at an α level of 0.05?

To compute the effect size dz we open the effect size drawer and choose "from group parameters". We only know the ratio d = |µx − µy|/σ = 0.4. We are thus free to choose any values for the means and (equal) standard deviations that lead to this ratio. We set “Mean group 1 = 0”,

Figure 25: Effect size dialog to calculate effect size d from the parameters of two correlated random variables

“Mean group 2 = 0.4”, “SD group 1 = 1”, and “SD group 2 = 1”. Finally, we set the “Correlation between groups” to be 0.55. Pressing the "Calculate and transfer to main window" button copies the resulting effect size dz = 0.421637 to the main window. We supply the remaining input values in the main window and press "Calculate":

• Select
– Type of power analysis: Post hoc

• Input
– Tail(s): Two
– Effect size dz: 0.421637
– α err prob: 0.05
– Total sample size: 50

• Output
– Noncentrality parameter δ: 2.981424
– Critical t: 2.009575
– df: 49
– Power (1-β): 0.832114

The computed power of 0.832114 is close to the value 0.84 estimated by Cohen using his tables. To estimate the increase in power due to the correlation between pairs (i.e., due to the shift from a two-group design to a matched-pairs design), we enter "Correlation between groups = 0" in the effect size drawer. This leads to dz = 0.2828427. Repeating the above analysis with this effect size leads to a power of only 0.500352.
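The effect sizes in this example follow directly from the dz formula in Section 19.1. A small Python sketch (the helper names are ours) reproduces dz for ρxy = 0.55 and ρxy = 0, together with the noncentrality δ = dz√N for N = 50 pairs from the implementation notes. Computing the power itself additionally requires the noncentral t distribution, which this sketch deliberately leaves out.

```python
import math


def dz_from_groups(mu_x, mu_y, sd_x, sd_y, rho):
    """d_z = |mu_x - mu_y| / sqrt(sd_x^2 + sd_y^2 - 2*rho*sd_x*sd_y)."""
    sd_z = math.sqrt(sd_x**2 + sd_y**2 - 2.0 * rho * sd_x * sd_y)
    return abs(mu_x - mu_y) / sd_z


def noncentrality(dz, n_pairs):
    """delta = d_z * sqrt(N) for the matched-pairs t test with N pairs."""
    return dz * math.sqrt(n_pairs)


dz_matched = dz_from_groups(0.0, 0.4, 1.0, 1.0, 0.55)  # correlated pairs
dz_indep = dz_from_groups(0.0, 0.4, 1.0, 1.0, 0.0)     # correlation set to 0
print(round(dz_matched, 6), round(noncentrality(dz_matched, 50), 6))  # 0.421637 2.981424
print(round(dz_indep, 7))                                             # 0.2828427
```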

How many subjects would we need to arrive at a power of about 0.832114 in a two-group design? We click X-Y plot for a range of values to open the Power Plot window. Let us plot (on the y axis) the power (with markers and displaying the values in the plot) as a function of the total sample size. We want to plot just 1 graph with α err prob set to 0.05 and effect size dz fixed at 0.2828427.


Figure 26: Power vs. sample size plot for the example.

Clicking Draw plot yields a graph in which we can see that we would need about 110 pairs, that is, 220 subjects (see Fig 26 on page 49). Thus, by matching pupils we cut down the required size of the sample by more than 50%.

19.4 Related tests

19.5 Implementation notes

The H0 distribution is the central Student t distribution t(N − 1, 0), the H1 distribution is the noncentral Student t distribution t(N − 1, δ), with noncentrality $\delta = d_z\sqrt{N}$.

19.6 Validation

The results were checked against the values produced by GPower 2.0.


20 t test: Means - difference from constant (one sample case)

The one-sample t test is used to determine whether the population mean µ equals some specified value µ0. The data are from a random sample of size N drawn from a normally distributed population. The true standard deviation in the population is unknown and must be estimated from the data. The null and alternate hypotheses of the t test state:

$$H_0: \mu - \mu_0 = 0 \qquad H_1: \mu - \mu_0 \neq 0.$$

The two-sided (“two tailed”) test should be used if there is no restriction on the sign of the deviation from µ0 assumed in the alternate hypothesis. Otherwise use the one-sided (“one tailed”) test.

20.1 Effect size index

The effect size index d is defined as:

$$d = \frac{\mu - \mu_0}{\sigma}$$

where σ denotes the (unknown) standard deviation in the population. Thus, if µ and µ0 deviate by one standard deviation, then d = 1.

Cohen (1969, p. 38) defines the following conventional values for d:

• small d = 0.2

• medium d = 0.5

• large d = 0.8

Pressing the Determine button on the left side of the effect size label opens the effect size drawer (see Fig. 27). You can use this dialog to calculate d from µ, µ0 and the standard deviation σ.

Figure 27: Effect size dialog to calculate effect size d from means and standard deviation.

20.2 Options

This test has no options.

20.3 Examples

We want to test the null hypothesis that the population mean is µ = µ0 = 10 against the alternative hypothesis that µ = 15. The standard deviation in the population is estimated to be σ = 8. We enter these values in the effect size dialog: Mean H0 = 10, Mean H1 = 15, SD σ = 8 to calculate the effect size d = 0.625.

Next we want to know how many subjects it takes to detect the effect d = 0.625, given α = β = .05. We are only interested in increases in the mean and thus choose a one-tailed test.

• Select
– Type of power analysis: A priori

• Input
– Tail(s): One
– Effect size d: 0.625
– α err prob: 0.05
– Power (1-β err prob): 0.95

• Output
– Noncentrality parameter δ: 3.423266
– Critical t: 1.699127
– df: 29
– Total sample size: 30
– Actual power: 0.955144

The result indicates that we need at least N = 30 subjects to ensure a power > 0.95. The actual power achieved with this N (0.955144) is slightly higher than the requested one.

Cohen (1969, p.59) calculates the sample size needed in a two-tailed test that the departure from the population mean is at least 10% of the standard deviation, that is d = 0.1, given α = 0.01 and β ≤ 0.1. The input and output values for this analysis are:

• Select
– Type of power analysis: A priori

• Input
– Tail(s): Two
– Effect size d: 0.1
– α err prob: 0.01
– Power (1-β err prob): 0.90

• Output
– Noncentrality parameter δ: 3.862642
– Critical t: 2.579131
– df: 1491
– Total sample size: 1492
– Actual power: 0.900169

G * Power outputs a sample size of n = 1492, which is slightly higher than the value 1490 estimated by Cohen using his tables.
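Both examples can be checked in a few lines of Python. The sketch computes d and the noncentrality δ = d√N from the implementation notes, which matches the δ values printed above. The function `n_normal_approx` is a swapped-in textbook normal approximation, N ≈ ((z₁₋α/tails + z₁₋β)/d)², and not the t-based search that G * Power performs; it lands slightly below the exact answers (28 vs. 30 and 1488 vs. 1492) because it ignores the extra uncertainty from estimating σ. All function names are ours.

```python
import math
from statistics import NormalDist


def cohen_d(mu_h1: float, mu_h0: float, sd: float) -> float:
    """d = (mu - mu0) / sigma."""
    return (mu_h1 - mu_h0) / sd


def delta_one_sample(d: float, n: int) -> float:
    """Noncentrality of the one-sample t test: delta = d * sqrt(N)."""
    return d * math.sqrt(n)


def n_normal_approx(d: float, alpha: float, power: float, tails: int) -> int:
    """Rough z-based guess N = ceil(((z_{1-alpha/tails} + z_{power}) / d)^2)."""
    z = NormalDist().inv_cdf
    return math.ceil(((z(1 - alpha / tails) + z(power)) / abs(d)) ** 2)


d = cohen_d(15, 10, 8)
print(d, round(delta_one_sample(d, 30), 6))   # 0.625 3.423266
print(round(delta_one_sample(0.1, 1492), 6))  # 3.862642
print(n_normal_approx(d, 0.05, 0.95, 1), n_normal_approx(0.1, 0.01, 0.90, 2))  # 28 1488
```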

20.4 Related tests

20.5 Implementation notes

The H0 distribution is the central Student t distribution t(N − 1, 0), the H1 distribution is the noncentral Student t distribution t(N − 1, δ), with noncentrality $\delta = d\sqrt{N}$.


20.6 Validation

The results were checked against the values produced by GPower 2.0.


21 t test: Means - difference between two independent means (two groups)

The two-sample t test is used to determine if two population means µ1, µ2 are equal. The data are two samples of size n1 and n2 from two independent and normally distributed populations. The true standard deviations in the two populations are unknown and must be estimated from the data. The null and alternate hypotheses of this t test are:

$$H_0: \mu_1 - \mu_2 = 0 \qquad H_1: \mu_1 - \mu_2 \neq 0.$$

The two-sided test (“two tails”) should be used if there is no restriction on the sign of the deviation assumed in the alternate hypothesis. Otherwise use the one-sided test (“one tail”).

21.1 Effect size index

The effect size index d is defined as:

$$d = \frac{\mu_1 - \mu_2}{\sigma}$$

Cohen (1969, p. 38) defines the following conventional values for d:

• small d = 0.2

• medium d = 0.5

• large d = 0.8

Pressing the button Determine on the left side of the effect size label opens the effect size drawer (see Fig. 28). You can use it to calculate d from the means and standard deviations in the two populations. The t test assumes the variances in both populations to be equal. However, the test is relatively robust against violations of this assumption if the sample sizes are equal (n1 = n2). In this case a mean σ′ may be used as the common within-population σ (Cohen, 1969, p.42):

$$\sigma' = \sqrt{\frac{\sigma_1^2 + \sigma_2^2}{2}}$$

In the case of substantially different sample sizes n1 ≠ n2, this correction should not be used because it may lead to power values that differ greatly from the true values (Cohen, 1969, p.42).

21.2 Options

This test has no options.

21.3 Examples

21.4 Related tests

21.5 Implementation notes

The H0 distribution is the central Student t distribution t(N − 2, 0); the H1 distribution is the noncentral Student t distribution t(N − 2, δ), where N = n1 + n2 and δ is the noncentrality parameter, which is defined as:

$$\delta = d\sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$

Figure 28: Effect size dialog to calculate effect size d from the parameters of two independent random variables.
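A short Python sketch of the two formulas in this chapter, the noncentrality δ and Cohen's averaged σ′ (the function names are ours). The usage lines, with hypothetical values, show that for a fixed total N = 64 a balanced split gives a larger δ than an unbalanced one, because n1n2/(n1 + n2) is largest when n1 = n2.

```python
import math


def delta_two_sample(d: float, n1: int, n2: int) -> float:
    """delta = d * sqrt(n1 * n2 / (n1 + n2)); the t distributions have df = n1 + n2 - 2."""
    return d * math.sqrt(n1 * n2 / (n1 + n2))


def sigma_prime(sd1: float, sd2: float) -> float:
    """Cohen's common sigma' = sqrt((sd1^2 + sd2^2) / 2), intended for n1 = n2 only."""
    return math.sqrt((sd1**2 + sd2**2) / 2.0)


# Hypothetical values: d = 0.5 with N = 64 split evenly or as 16/48.
print(delta_two_sample(0.5, 32, 32))  # 2.0
print(delta_two_sample(0.5, 16, 48))  # about 1.732
print(sigma_prime(3.0, 4.0))          # about 3.536
```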

21.6 Validation

The results were checked against the values produced by GPower 2.0.


22 Wilcoxon signed-rank test: Means - difference from constant (one sample case)

This test is essentially identical to the Wilcoxon signed-rank test for matched pairs described in chapter 23. The only difference is in the calculation of the effect size d (or the corresponding moment p1). The effect size used here is analogous to the one-sample t test.

22.1 Examples

We replicate some power values from table 4 given in Shieh, Jan, and Randles (2007) (see Fig. 29).

To compare these values with the power computed by G * Power, we choose the Lehmann method without continuity correction. For effect size d = 0.1 and a normal distribution as parent we get:

• Select
– Type of power analysis: Post hoc

• Input
– Tail(s): One
– Parent distribution: Normal
– Effect size d: 0.1
– α err prob: 0.05
– Total sample size: 649

• Output
– critical z: 1.6448536
– Power (1-β): 0.8000589
– Moment p1: 0.5398278
– Moment p2: 0.5562315
– Moment p3: 0.3913924

For d = 0.8 and the Laplace parent we get:

• Select
– Type of power analysis: Post hoc

• Input
– Tail(s): One
– Parent distribution: Laplace
– Effect size dz: 0.8
– α err prob: 0.05
– Total sample size: 11

• Output
– critical z: 1.6448536
– Power (1-β): 0.8308341
– Moment p1: 0.8387046
– Moment p2: 0.8890997
– Moment p3: 0.8206572

In both cases, the values computed with G * Power using the “Lehmann method without continuity correction” correspond closely to those given in the table for the “Exact variance” method, and both values correspond closely to simulated power.
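For the normal parent, the moments reported in the first example can be reproduced under the shift model with X, Y, Z independent N(d, 1), taking p1 = P(X > 0), p2 = P(X + Y > 0) and p3 = P(X + Y > 0 and X + Z > 0). The first two have closed forms; for p3 the sketch conditions on X, which gives p3 = ∫ φ(x − d) Φ(x + d)² dx, and evaluates that integral with Simpson's rule. This is our own numerical illustration using Python's `statistics.NormalDist`, not G * Power's routine; it agrees with the listed moments to about six decimals.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def normal_moments(d: float, steps: int = 4000, width: float = 10.0):
    """Return (p1, p2, p3) for X, Y, Z independent N(d, 1)."""
    p1 = STD.cdf(d)             # P(X > 0) = Phi(d)
    p2 = STD.cdf(d * 2 ** 0.5)  # X + Y ~ N(2d, 2), so P(X + Y > 0) = Phi(sqrt(2) * d)

    # p3 = E[P(Y > -X)^2] = integral of phi(x - d) * Phi(x + d)^2 dx, by Simpson's rule
    lo, hi = d - width, d + width
    h = (hi - lo) / steps  # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * STD.pdf(x - d) * STD.cdf(x + d) ** 2
    p3 = total * h / 3
    return p1, p2, p3


print(normal_moments(0.1))  # close to the moments listed in the first output above
```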

In the latter case we would get a power of 0.923 with the A.R.E. method if the Laplace parent is chosen. The lower bound for the power that results if min ARE is chosen is 0.7319.

22.2 Related tests

Related tests in G * Power 3.0:

• Means: Difference from constant.

• Proportions: Sign test.

• Means: Wilcoxon rank sum test.

22.3 Validation

The results were checked against the results given in Shieh et al. (2007), analogous computations in PASS (Hintze, 2006), and results produced by unifyPow (O’Brien, 1998). There was complete correspondence with the values given in O’Brien, while there were slight differences to those produced by PASS. The reason for these differences seems to be that PASS truncates the weighted sample sizes to integer values.


Figure 29: A part of table 4 in Shieh et al. (2007) that compares nominal power values computed using different methods with simulated power (columns: d = 0.1, 0.2, 0.4, 0.6, 0.8, 1.0)


23 Wilcoxon signed-rank test: (matched pairs)

The Wilcoxon signed-rank test is a nonparametric alternative to the one-sample t test. Its use is mainly motivated by uncertainty concerning the assumption of normality made in the t test.

The Wilcoxon signed-rank test can be used to test whether a given distribution H is symmetric about zero. The power routines implemented in G * Power refer to the important special case of a “shift model”, which states that H is obtained by subtracting two symmetric distributions F and G, where G is obtained by shifting F by an amount ∆: G(x) = F(x − ∆) for all x. The relation of this shift model to the one-sample t test is obvious if we assume that F is the fixed distribution with mean µ0 stated in the null hypothesis and G the distribution of the test group with mean µ. Under these assumptions H(x) = F(x) − G(x) is symmetric about zero under H0, that is, if ∆ = µ − µ0 = 0 or, equivalently, F(x) = G(x), and asymmetric under H1, that is, if ∆ ≠ 0.

The Wilcoxon signed-rank test. The signed-rank test is based on ranks. Assume that a sample of size N is drawn from a distribution H(x). To each sample value xi a rank S between 1 and N is assigned that corresponds to the position of |xi| in an increasingly ordered list of all absolute sample values. The general idea of the test is to calculate the sum of the ranks assigned to positive sample values (x > 0) and the sum of the ranks assigned to negative sample values (x < 0), and to reject the hypothesis that H is symmetric if these two rank sums are clearly different.

The actual procedure is as follows: Since the rank sum of negative values is known if that of positive values is given, it suffices to consider the rank sum Vs of positive values. The positive ranks can be specified by an n-tuple (S1, . . . , Sn), where 0 ≤ n ≤ N. There are $\binom{N}{n}$ possible n-tuples for a given n. Since n can take on the values 0, 1, . . . , N, the total number of possible choices for the S's is $\sum_{i=0}^{N}\binom{N}{i} = 2^N$. (We here assume a continuous distribution H for which the probability of ties, that is, the occurrence of two identical |x|, is zero.) Therefore, if the null hypothesis is true, then the probability to observe a particular n and a certain n-tuple is $P(N_+ = n; S_1 = s_1, \ldots, S_n = s_n) = 1/2^N$. To calculate the probability to observe (under H0) a particular positive rank sum Vs = S1 + . . . + Sn, we just need to count the number k of all tuples with rank sum Vs and add their probabilities, thus $P(V_s = v) = k/2^N$. Doing this for all possible Vs between the minimal value 0, corresponding to the case n = 0, and the maximal value N(N + 1)/2, corresponding to the n = N tuple (1, 2, . . . , N), gives the discrete probability distribution of Vs under H0. This distribution is symmetric about N(N + 1)/4. Referring to this probability distribution, we choose in a one-sided test a critical value c with P(Vs ≥ c) ≤ α and reject the null hypothesis if a rank sum Vs ≥ c is observed. With increasing sample size the exact distribution converges rapidly to the normal distribution with mean E(Vs) = N(N + 1)/4 and variance Var(Vs) = N(N + 1)(2N + 1)/24.
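The counting argument above can be sketched in Python. Counting the n-tuples with a given rank sum amounts to counting subsets of {1, …, N} with that sum, which a standard subset-sum recurrence does without listing all 2^N subsets. The function names are ours; exact fractions are used so that the mean and variance can be compared with N(N + 1)/4 and N(N + 1)(2N + 1)/24 without rounding. For N = 10 and α = 0.05 the one-sided critical value comes out as c = 45, since P(Vs ≥ 45) = 43/1024 ≈ 0.042 while P(Vs ≥ 44) = 54/1024 ≈ 0.053.

```python
from fractions import Fraction


def signed_rank_null_counts(n):
    """counts[v] = number of subsets of {1, ..., n} with sum v (= 2^n * P(Vs = v) under H0)."""
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1  # the empty tuple: no positive values
    for rank in range(1, n + 1):
        # iterate downward so that each rank is used at most once
        for v in range(max_sum, rank - 1, -1):
            counts[v] += counts[v - rank]
    return counts


def null_mean_var(n):
    """Exact mean and variance of Vs under H0."""
    counts = signed_rank_null_counts(n)
    total = 2**n
    mean = Fraction(sum(v * k for v, k in enumerate(counts)), total)
    second = Fraction(sum(v * v * k for v, k in enumerate(counts)), total)
    return mean, second - mean**2


def upper_critical_value(n, alpha):
    """Smallest c with P(Vs >= c) <= alpha under H0."""
    counts = signed_rank_null_counts(n)
    tail = 0
    for c in range(len(counts) - 1, -1, -1):
        tail += counts[c]  # tail = 2^n * P(Vs >= c)
        if Fraction(tail, 2**n) > alpha:
            return c + 1
    return 0


print(null_mean_var(10))                          # (Fraction(55, 2), Fraction(385, 4))
print(upper_critical_value(10, Fraction(1, 20)))  # 45
```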

Figure 30: Densities f(x) of the Normal, Laplace, and Logistic distributions (with µ = 0, σ = 1)

Power of the Wilcoxon signed-rank test. The signed-rank test as described above is distribution free in the sense that its validity does not depend on the specific form of the response distribution H. This distribution independence no longer holds, however, if one wants to estimate numerical values for the power of the test. The reason is that the effect of a certain shift ∆ on the deviation from symmetry, and therefore on the distribution of Vs, depends on the specific form of F (and G). For power calculations it is therefore necessary to specify the response distribution F. G * Power provides three predefined continuous and symmetric response functions that differ with respect to kurtosis, that is, the “peakedness” of the distribution:

• Normal distribution N(µ, σ²)

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$

• Laplace or Double Exponential distribution:

$$p(x) = \frac{1}{2}\, e^{-|x|}$$

• Logistic distribution

$$p(x) = \frac{e^{-x}}{(1 + e^{-x})^2}$$

Scaled and/or shifted versions of the Laplace and Logistic densities that can be calculated by applying the transformation $\frac{1}{a}\,p\left(\frac{x - b}{a}\right)$, a > 0, are again probability densities and are referred to by the same name.

Approaches to the power analysis. G * Power implements two different methods to estimate the power for the signed-rank Wilcoxon test: A) the asymptotic relative efficiency (A.R.E.) method that defines power relative to the one-sample t test, and B) a normal approximation to the power proposed by Lehmann (1975, pp. 164-166). We describe the general idea of both methods in turn. More specific information can be found in the implementation section below.


• A.R.E. method: The A.R.E. method assumes the shift model described in the introduction. It relates normal approximations to the power of the one-sample t test (Lehmann, 1975, Eq. (4.44), p. 172) and the Wilcoxon test for a specified H (Lehmann, 1975, Eq. (4.15), p. 160). If for a model with fixed H and ∆ a sample size N is required to achieve a specified power for the Wilcoxon signed-rank test and a sample size N′ is required in the t test to achieve the same power, then the ratio N′/N is called the efficiency of the Wilcoxon signed-rank test relative to the one-sample t test. The limiting efficiency as sample size N tends to infinity is called the asymptotic relative efficiency (A.R.E. or Pitman efficiency) of the Wilcoxon signed-rank test relative to the t test. It is given by (Hettmansperger, 1984, p. 71):

$$12\sigma_H^2\left[\int_{-\infty}^{+\infty} H^2(x)\,dx\right]^2 = 12\left[\int_{-\infty}^{+\infty} x^2 H(x)\,dx\right]\left[\int_{-\infty}^{+\infty} H^2(x)\,dx\right]^2$$

Note that the A.R.E. of the Wilcoxon signed-rank test to the one-sample t test is identical to the A.R.E. of the Wilcoxon rank-sum test to the two-sample t test (if H = F; for the meaning of F see the documentation of the Wilcoxon rank-sum test).

If H is a normal distribution, then the A.R.E. is 3/π ≈ 0.955. This shows that the efficiency of the Wilcoxon test relative to the t test is rather high even if the assumption of normality made in the t test is true. It can be shown that the minimal A.R.E. (for H with finite variance) is 0.864. For non-normal distributions the Wilcoxon test can be much more efficient than the t test. The A.R.E.s for some specific distributions are given in the implementation notes. To estimate the power of the Wilcoxon test for a given H with the A.R.E. method, one basically scales the sample size with the corresponding A.R.E. value and then performs the procedure for the t test for two independent means.

• Lehmann method: The computation of the power requires the distribution of Vs for the non-null case, that is, for cases where H is not symmetric about zero. The Lehmann method uses the fact that

$$\frac{V_s - E(V_s)}{\sqrt{\mathrm{Var}(V_s)}}$$

tends to the standard normal distribution as N tends to infinity for any fixed distributions H for which 0 < P(X < 0) < 1. The problem is then to compute expectation and variance of Vs. These values depend on three “moments” p1, p2, p3, which are defined as:

– p1 = P(X > 0).

– p2 = P(X + Y > 0).

– p3 = P(X + Y > 0 and X + Z > 0).

where X, Y, Z are independent random variables with distribution H. The expectation and variance are given as:

$$E(V_s) = N(N - 1)\,p_2/2 + N p_1$$

$$\mathrm{Var}(V_s) = N(N - 1)(N - 2)(p_3 - p_2^2) + N(N - 1)\left[2(p_1 - p_2)^2 + 3p_2(1 - p_2)\right]/2 + N p_1(1 - p_1)$$

The value p1 is easy to interpret: If H is continuous and shifted by an amount ∆ > 0 to larger values, then p1 is the probability to observe a positive value. For a null shift (no treatment effect, ∆ = 0, i.e., H symmetric about zero) we get p1 = 1/2.

If c denotes the critical value of a level α test and Φ the CDF of the standard normal distribution, then the normal approximation of the power of the (one-sided) test is given by:

$$\Pi(H) \approx 1 - \Phi\left[\frac{c - a - E(V_s)}{\sqrt{\mathrm{Var}(V_s)}}\right]$$

where a = 0.5 if a continuity correction is applied, and a = 0 otherwise.

The formulas for p1, p2, p3 for the predefined distributions are given in the implementation section.
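The moment-based approximation can be sketched in Python as follows (function names are ours). The sketch takes p1 = P(X > 0), so that E(Vs) counts positive values, and writes the first variance term as N(N − 1)(N − 2)(p3 − p2²), the covariance contribution of the pairwise-sum indicators I(Xi + Xj > 0). The critical value c is taken from the normal approximation to the null distribution of Vs; that choice is an assumption of this sketch, since the text leaves it open. Two sanity checks follow from the formulas: under H0 (p1 = p2 = 1/2, p3 = 1/3) the mean and variance collapse to N(N + 1)/4 and N(N + 1)(2N + 1)/24, and the approximate power without continuity correction then equals α.

```python
import math
from statistics import NormalDist


def lehmann_mean_var(n, p1, p2, p3):
    """E(Vs) and Var(Vs) from p1 = P(X>0), p2 = P(X+Y>0), p3 = P(X+Y>0, X+Z>0)."""
    mean = n * (n - 1) * p2 / 2 + n * p1
    var = (n * (n - 1) * (n - 2) * (p3 - p2**2)
           + n * (n - 1) * (2 * (p1 - p2) ** 2 + 3 * p2 * (1 - p2)) / 2
           + n * p1 * (1 - p1))
    return mean, var


def lehmann_power(n, p1, p2, p3, alpha, a=0.0):
    """One-sided approximation 1 - Phi[(c - a - E(Vs)) / sqrt(Var(Vs))]."""
    z = NormalDist()
    # assumed critical value: upper alpha point of the normal approximation under H0
    c = n * (n + 1) / 4 + z.inv_cdf(1 - alpha) * math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    mean, var = lehmann_mean_var(n, p1, p2, p3)
    return 1 - z.cdf((c - a - mean) / math.sqrt(var))


print(lehmann_mean_var(10, 0.5, 0.5, 1 / 3))     # approximately (27.5, 96.25)
print(lehmann_power(10, 0.5, 0.5, 1 / 3, 0.05))  # approximately 0.05
```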

23.1 Effect size index

The conventional values proposed by Cohen (1969, p. 38) for the t test are applicable. He defines the following conventional values for d:

• small d = 0.2

• medium d = 0.5

• large d = 0.8

Pressing the button Determine on the left side of the effect size label opens the effect size dialog (see Fig. 31). You can use this dialog to calculate d from the means and a common standard deviation in the two populations.

If N1 = N2 but σ1 ≠ σ2, you may use a mean σ′ as common within-population σ (Cohen, 1969, p.42):

$$\sigma' = \sqrt{\frac{\sigma_1^2 + \sigma_2^2}{2}}$$

If N1 ≠ N2, you should not use this correction, since this may lead to power values that differ greatly from the true values (Cohen, 1969, p.42).

23.2 Options

This test has several options (see Fig. 32). First, you may choose between the A.R.E. method and the Lehmann method. Given the A.R.E. method, you may either choose the value for k (see implementation notes) implicitly via the corresponding parent distribution or specify it manually. Analogously, you may specify the three moments by choosing the distribution or “manually”. There are further options to choose between two different types of effect size parameter (d vs. Moment p1), the type of distribution shown in the plot window, whether both d and p2 are given in the output, and finally, whether to use a continuity correction.


Figure 31: Effect size dialog to calculate effect size d from the parameters of two matched random variables.

23.3 Examples

We replicate the example in O’Brien (2002, p. 141): The question is whether women report eating different amounts of food on pre- vs. post-menstrual days. The null hypothesis is that there is no difference, that is, 50% report eating more on pre-menstrual days and 50% report eating more on post-menstrual days. The alternative hypothesis assumes that 80% of the women report eating more on pre-menstrual days.

We want to know the power of a two-sided test under the assumptions that the responses are distributed according to a Laplace parent distribution, that N = 10, and that α = 0.05. We use the effect size dialog to compute dz. In the options we select the Lehmann method, the use of predefined parent distributions, the effect size measure d, and the continuity correction.

• Select
– Type of power analysis: Post hoc

• Input
– Tail(s): Two
– Parent distribution: Laplace
– Effect size dz: 1.13842
– α err prob: 0.05
– Total sample size: 10

• Output
– critical z: 1.9599640
– Power (1-β): 0.8534958
– Moment p1: 0.9000531
– Moment p2: 0.9478560
– Moment p3: 0.9122280

23.4 Related tests

Related tests in G * Power 3.0:

• Means: Difference from constant.

• Proportions: Sign test.

• Means: Wilcoxon rank sum test.

23.5 Implementation notes

The H0 distribution is the central Student t distribution t(Nk − 2, 0); the H1 distribution is the noncentral Student t distribution t(Nk − 2, δ), where the noncentrality parameter δ is given by:

$$\delta = d\sqrt{\frac{N_1 N_2 k}{N_1 + N_2}}$$

The parameter k represents the asymptotic relative efficiency vs. corresponding t tests (Lehmann, 1975, p. 371ff) and depends in the following way on the parent distribution:

Parent        Value of k (ARE)
Uniform       1.0
Normal        3/π
Logistic      π²/9
Laplace       3/2
min ARE       0.864

Min ARE is a limiting case that gives a theoretical minimum of the power for the Wilcoxon-Mann-Whitney test.
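The tabulated values of k can be reproduced from the A.R.E. formula given for the A.R.E. method, 12σ²[∫H²(x)dx]², reading H as the parent density with mean zero. The sketch below is our own numerical check in plain Python using Simpson's rule; since the A.R.E. does not depend on location or scale, unit-scale versions of the densities suffice. It is not part of G * Power.

```python
import math


def are_wilcoxon(pdf, lo: float, hi: float, steps: int = 20000) -> float:
    """A.R.E. = 12 * variance * (integral of pdf^2)^2 for a mean-zero density on [lo, hi]."""
    h = (hi - lo) / steps  # steps even; with lo = -hi a kink at 0 sits on a panel boundary

    def simpson(f) -> float:
        s = f(lo) + f(hi)
        for i in range(1, steps):
            s += (4 if i % 2 else 2) * f(lo + i * h)
        return s * h / 3

    variance = simpson(lambda x: x * x * pdf(x))
    int_sq = simpson(lambda x: pdf(x) ** 2)
    return 12 * variance * int_sq**2


def normal(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def laplace(x):
    return 0.5 * math.exp(-abs(x))


def logistic(x):
    # written with |x| to avoid overflow; the density is symmetric, so this is exact
    return math.exp(-abs(x)) / (1 + math.exp(-abs(x))) ** 2


def uniform(x):
    return 1.0  # density on [-1/2, 1/2]


print(round(are_wilcoxon(uniform, -0.5, 0.5), 4))  # 1.0
print(round(are_wilcoxon(normal, -40, 40), 4))     # 0.9549  (3/pi)
print(round(are_wilcoxon(logistic, -40, 40), 4))   # 1.0966  (pi^2/9)
print(round(are_wilcoxon(laplace, -40, 40), 4))    # 1.5
```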

23.6 Validation

The results were checked against the values produced by PASS (Hintze, 2006) and those produced by unifyPow (O’Brien, 1998). There was complete correspondence with the values given in O’Brien, while there were slight differences to those produced by PASS. The reason for these differences seems to be that PASS truncates the weighted sample sizes to integer values.


Figure 32: Options for Wilcoxon tests


24 Wilcoxon-Mann-Whitney test of a difference between two independent means

The Wilcoxon-Mann-Whitney (WMW) test (or U-test) is a nonparametric alternative to the two-group t test. Its use is mainly motivated by uncertainty concerning the assumption of normality made in the t test.

It refers to a general two-sample model, in which F and G characterize response distributions under two different conditions. The null effects hypothesis states F = G, while the alternative is F ≠ G. The power routines implemented in G * Power refer to the important special case of a “shift model”, which states that G is obtained by shifting F by an amount ∆: G(x) = F(x − ∆) for all x. The shift model expresses the assumption that the treatment adds a certain amount ∆ to the response x (additivity).

The WMW test. The WMW test is based on ranks. Assume m subjects are randomly assigned to control group A and n other subjects to treatment group B. After the experiment, all N = n + m subjects together are ranked according to some response measure, i.e., each subject is assigned a unique number between 1 and N. The general idea of the test is to calculate the sum of the ranks assigned to subjects in either group and to reject the “no effect hypothesis” if the rank sums in the two groups are clearly different. The actual procedure is as follows: Since the m ranks of control group A are known if those of treatment group B are given, it suffices to consider the n ranks of group B. They can be specified by an n-tuple (S1, . . . , Sn). There are $\binom{N}{n}$ possible n-tuples. Therefore, if the null hypothesis is true, then the probability to observe a particular n-tuple is $P(S_1 = s_1, \ldots, S_n = s_n) = 1/\binom{N}{n}$. To calculate the probability to observe (under H0) a particular rank sum Ws = S1 + . . . + Sn, we just need to count the number k of all tuples with rank sum Ws and add their probabilities, thus $P(W_s = w) = k/\binom{N}{n}$. Doing this for all possible Ws between the minimal value n(n + 1)/2, corresponding to the tuple (1, 2, . . . , n), and the maximal value n(1 + 2m + n)/2, corresponding to the tuple (N − n + 1, N − n + 2, . . . , N), gives the discrete probability distribution of Ws under H0. This distribution is symmetric about n(N + 1)/2. Referring to this probability distribution, we choose a critical value c with P(Ws ≥ c) ≤ α and reject the null hypothesis if a rank sum Ws ≥ c is observed. With increasing sample sizes the exact distribution converges rapidly to the normal distribution with mean E(Ws) = n(N + 1)/2 and variance Var(Ws) = mn(N + 1)/12.
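The same counting idea, now restricted to tuples of exactly n ranks, gives the exact null distribution of Ws. The Python sketch below (names are ours) builds the counts with a dynamic-programming table indexed by the number of ranks chosen and their sum, then checks the properties stated above for a small hypothetical design with m = 4 and n = 3.

```python
from fractions import Fraction
from math import comb


def rank_sum_null_counts(m, n):
    """counts[w] = number of n-subsets of {1, ..., m+n} with rank sum w."""
    big_n = m + n
    max_sum = n * (2 * big_n - n + 1) // 2  # = n(1 + 2m + n)/2
    # table[k][w] = number of k-subsets of the ranks processed so far with sum w
    table = [[0] * (max_sum + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for rank in range(1, big_n + 1):
        # go from larger k to smaller so table[k - 1] does not yet include this rank
        for k in range(min(rank, n), 0, -1):
            row, prev = table[k], table[k - 1]
            for w in range(max_sum, rank - 1, -1):
                row[w] += prev[w - rank]
    return table[n]


m, n = 4, 3
counts = rank_sum_null_counts(m, n)
total = comb(m + n, n)
mean = Fraction(sum(w * k for w, k in enumerate(counts)), total)
var = Fraction(sum(w * w * k for w, k in enumerate(counts)), total) - mean**2
print(sum(counts) == total, mean, var)  # True 12 8
```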

A drawback of using Ws is that the minimal value of Ws depends on n. It is often more convenient to subtract the minimal value and use WXY = Ws − n(n + 1)/2 instead. The statistic WXY (also known as the Mann-Whitney statistic) can also be derived from a slightly different perspective: If X1, . . . , Xm and Y1, . . . , Yn denote the observations in the control and treatment group, respectively, then WXY is the number of pairs (Xi, Yj) with Xi < Yj. The approximating normal distribution of WXY has mean E(WXY) = mn/2 and Var(WXY) = Var(Ws) = mn(N + 1)/12.
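A small Python sketch (function names and data are hypothetical) confirms the two views of the Mann-Whitney statistic for tie-free data: counting the pairs with Xi < Yj gives the same number as Ws − n(n + 1)/2.

```python
def rank_sum(x, y):
    """W_s: sum of the ranks of the y sample within the pooled sample (no ties assumed)."""
    pooled = sorted([(v, "x") for v in x] + [(v, "y") for v in y])
    return sum(rank for rank, (_, group) in enumerate(pooled, start=1) if group == "y")


def mann_whitney(x, y):
    """W_XY: number of pairs (X_i, Y_j) with X_i < Y_j."""
    return sum(1 for xi in x for yj in y if xi < yj)


x = [1.2, 3.4, 0.5]  # control group, m = 3
y = [2.0, 4.1]       # treatment group, n = 2
n = len(y)
print(mann_whitney(x, y), rank_sum(x, y) - n * (n + 1) // 2)  # 5 5
```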

Figure 33: Densities f(x) of the Normal, Laplace, and Logistic distributions (with µ = 0, σ = 1)

Power of the WMW test. The WMW test as described above is distribution free in the sense that its validity does not depend on the specific form of the response distribution F. This distribution independence no longer holds, however, if one wants to estimate numerical values for the power of the test. The reason is that the effect of a certain shift ∆ on the rank distribution depends on the specific form of F (and G). For power calculations it is therefore necessary to specify the response distribution F. G * Power provides three predefined continuous and symmetric response functions that differ with respect to kurtosis, that is, the “peakedness” of the distribution:

• Normal distribution N(µ, σ²)

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$

• Laplace or Double Exponential distribution:

$$p(x) = \frac{1}{2}\, e^{-|x|}$$

• Logistic distribution

$$p(x) = \frac{e^{-x}}{(1 + e^{-x})^2}$$

Scaled and/or shifted versions of the Laplace and Logistic densities that can be calculated by applying the transformation $\frac{1}{a}\,p\left(\frac{x - b}{a}\right)$, a > 0, are again probability densities and are referred to by the same name.

Approaches to the power analysis. G * Power implements two different methods to estimate the power for the WMW test: A) the asymptotic relative efficiency (A.R.E.) method that defines power relative to the two-groups t test, and B) a normal approximation to the power proposed by Lehmann (1975, pp. 69-71). We describe the general idea of both methods in turn. More specific information can be found in the implementation section below.


• A.R.E. method: The A.R.E. method assumes the shift model described in the introduction. It relates normal approximations to the power of the t test (Lehmann, 1975, Eq. (2.42), p. 78) and the Wilcoxon test for a specified F (Lehmann, 1975, Eq. (2.29), p. 72). If for a model with fixed F and ∆ the sample sizes m = n are required to achieve a specified power for the Wilcoxon test and sample sizes m′ = n′ are required in the t test to achieve the same power, then the ratio n′/n is called the efficiency of the Wilcoxon test relative to the t test. The limiting efficiency as sample sizes m and n tend to infinity is called the asymptotic relative efficiency (A.R.E. or Pitman efficiency) of the Wilcoxon test relative to the t test. It is given by (Hettmansperger, 1984, p. 71):

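$$\mathrm{A.R.E.} = 12\,\sigma_F^2 \left( \int_{-\infty}^{+\infty} f^2(x)\,dx \right)^2 = 12 \int_{-\infty}^{+\infty} x^2 f(x)\,dx \left( \int_{-\infty}^{+\infty} f^2(x)\,dx \right)^2,$$

where f denotes the density of F and σ²_F its variance (the second form assumes that F is centered at zero).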

If F is a normal distribution, then the A.R.E. is 3/π ≈ 0.955. This shows that the efficiency of the Wilcoxon test relative to the t test is rather high even if the assumption of normality made in the t test is true. It can be shown that the minimal A.R.E. (for F with finite variance) is 0.864. For non-normal distributions the Wilcoxon test can be much more efficient than the t test. The A.R.E.s for some specific distributions are given in the implementation notes. To estimate the power of the Wilcoxon test for a given F with the A.R.E. method, one basically scales the sample size by the corresponding A.R.E. value and then performs the procedure for the t test for two independent means.

• Lehmann method: The computation of the power requires the distribution of W_XY for the non-null case F ≠ G. The Lehmann method uses the fact that

$$\frac{W_{XY} - E(W_{XY})}{\sqrt{\mathrm{Var}(W_{XY})}}$$

tends to the standard normal distribution as m and n tend to infinity, for any fixed distributions F and G for X and Y for which 0 < P(X < Y) < 1. The problem is then to compute the expectation and variance of W_XY. These values depend on three “moments” p1, p2, p3, which are defined as:

– p1 = P(X < Y).
– p2 = P(X < Y and X < Y′).
– p3 = P(X < Y and X′ < Y).

Here X, X′ denote independent observations from F and Y, Y′ independent observations from G.

Note that p2 = p3 for symmetric F in the shift model. The expectation and variance are given as:

$$E(W_{XY}) = mn\,p_1$$

$$\mathrm{Var}(W_{XY}) = mn\,p_1(1 - p_1) + mn(n - 1)(p_2 - p_1^2) + nm(m - 1)(p_3 - p_1^2)$$

The value p1 is easy to interpret: if the response distribution G of the treatment group is shifted to larger values, then p1 is the probability of observing a lower value in the control condition than in the test condition. For a null shift (no treatment effect, ∆ = 0) we get p1 = 1/2.

If c denotes the critical value of a level α test and Φ the CDF of the standard normal distribution, then the normal approximation of the power of the (one-sided) test is given by:

$$\Pi(F, G) \approx 1 - \Phi\left[ \frac{c - a - mn\,p_1}{\sqrt{\mathrm{Var}(W_{XY})}} \right]$$

where a = 0.5 if a continuity correction is applied, and a = 0 otherwise.

The formulas for p1, p2, p3 for the predefined distributions are given in the implementation section.
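The Lehmann approximation above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not G*Power's code: the helper names are hypothetical, and the critical value c is an assumption of this sketch, taken from the same normal approximation evaluated under H0 (p1 = 1/2, p2 = p3 = 1/3, which gives Var(W_XY) = mn(m + n + 1)/12).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def wxy_moments(m, n, p1, p2, p3):
    """Expectation and variance of W_XY from the moments p1, p2, p3."""
    mean = m * n * p1
    var = (m * n * p1 * (1 - p1)
           + m * n * (n - 1) * (p2 - p1 ** 2)
           + n * m * (m - 1) * (p3 - p1 ** 2))
    return mean, var


def lehmann_power_one_sided(m, n, p1, p2, p3, alpha=0.05, continuity=True):
    """Normal approximation 1 - Phi[(c - a - mn p1) / sqrt(Var W_XY)].

    Assumption of this sketch: c is the upper critical value of the normal
    approximation under H0 (p1 = 1/2, p2 = p3 = 1/3).
    """
    mean0, var0 = wxy_moments(m, n, 0.5, 1 / 3, 1 / 3)
    c = mean0 + STD_NORMAL.inv_cdf(1 - alpha) * var0 ** 0.5
    a = 0.5 if continuity else 0.0  # continuity correction
    mean1, var1 = wxy_moments(m, n, p1, p2, p3)
    return 1 - STD_NORMAL.cdf((c - a - mean1) / var1 ** 0.5)


# Sanity check: under H0 and without continuity correction the power equals alpha.
print(lehmann_power_one_sided(20, 30, 0.5, 1 / 3, 1 / 3, continuity=False))
```

The moments must be mutually consistent (for any F and G, p2 ≥ p1² and p3 ≥ p1²); arbitrary combinations can make the variance negative.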

24.1 Effect size index

You may choose between the A.R.E. method and the Lehmann method in the options section.

A.R.E. method In the A.R.E. method the effect size d is defined as:

$$d = \frac{\mu_1 - \mu_2}{\sigma} = \frac{\Delta}{\sigma}$$

where µ1, µ2 are the means of the response functions F and G and σ the standard deviation of the response distribution F.

In addition to the effect size, you have to specify the A.R.E. for F. If you want to use one of the predefined distributions, you may choose the Normal, Logistic, or Laplace distribution or the minimal A.R.E. From this selection G * Power determines the A.R.E. automatically. Alternatively, you can choose the option to specify the A.R.E. value by hand. In this case you must first calculate the A.R.E. for your F using the formula given above.
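As a numerical check of the A.R.E. formula, the sketch below integrates the standardized Normal, Logistic, and Laplace densities with a plain trapezoid rule and compares the results with the closed-form values listed in the implementation notes (3/π, π²/9, and 3/2). It is a sketch under stated assumptions: the density is centered at zero, the integration range [−30, 30] and the step count are arbitrary choices that are adequate for these three densities, and the function names are hypothetical. Because the A.R.E. is scale invariant, the unscaled densities suffice.

```python
import math


def trapezoid(f, lo, hi, n=30000):
    """Composite trapezoid rule for f on [lo, hi] with n sub-intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


def are_wilcoxon_vs_t(pdf, lo=-30.0, hi=30.0):
    """A.R.E. = 12 * sigma^2 * (integral of f^2)^2 for a density f centered at 0."""
    variance = trapezoid(lambda x: x * x * pdf(x), lo, hi)
    integral_f2 = trapezoid(lambda x: pdf(x) ** 2, lo, hi)
    return 12.0 * variance * integral_f2 ** 2


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def logistic_pdf(x):
    e = math.exp(-abs(x))  # symmetric form avoids overflow for large |x|
    return e / (1.0 + e) ** 2


def laplace_pdf(x):
    return 0.5 * math.exp(-abs(x))


for name, pdf, closed_form in [("Normal", normal_pdf, 3 / math.pi),
                               ("Logistic", logistic_pdf, math.pi ** 2 / 9),
                               ("Laplace", laplace_pdf, 1.5)]:
    print(f"{name:8s} numeric {are_wilcoxon_vs_t(pdf):.5f}  closed form {closed_form:.5f}")
```

For a user-defined F, the same function can be called with any centered density that decays quickly enough to be captured by the integration range.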

The conventional values proposed by Cohen (1969, p. 38) for the t test are applicable. He defines the following conventional values for d:

• small d = 0.2

• medium d = 0.5

• large d = 0.8

Pressing the button Determine on the left side of the effect size label opens the effect size dialog (see Fig. 34). You can use this dialog to calculate d from the means and a common standard deviation in the two populations.

If N1 = N2 but σ1 ≠ σ2, you may use a mean σ′ as the common within-population σ (Cohen, 1969, p. 42):

$$\sigma' = \sqrt{\frac{\sigma_1^2 + \sigma_2^2}{2}}$$

If N1 ≠ N2 you should not use this correction, since this may lead to power values that differ greatly from the true values (Cohen, 1969, p. 42).
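For reference, the two formulas above translate directly into code. This is a minimal sketch with hypothetical helper names; the σ′ helper is only meant for the equal-n case, as the text cautions.

```python
import math


def cohen_d(mu1, mu2, sigma):
    """Effect size d = (mu1 - mu2) / sigma."""
    return (mu1 - mu2) / sigma


def mean_sigma_equal_n(sigma1, sigma2):
    """sigma' = sqrt((sigma1^2 + sigma2^2) / 2); intended only for N1 == N2."""
    return math.sqrt((sigma1 ** 2 + sigma2 ** 2) / 2)


# Means and common SD of O'Brien's shoe example (Example 1 in Sec. 24.3).
print(cohen_d(0.30, 0.27, 0.08))
```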


Figure 34: Effect size dialog to calculate effect size d from the parameters of two independent random variables with equal variance.


Lehmann method If the Lehmann method is chosen, then the effect size is given by the first moment p1 and the relative SD. If one of the three predefined parent distributions is chosen, then the remaining moments p2 and p3 are computed automatically; otherwise they must also be given.

24.2 Options

This test has several options (see Fig. 32). First, you may choose between the A.R.E. method and the Lehmann method. Given the A.R.E. method, you may either choose the value for k (see implementation notes) implicitly via the corresponding parent distribution or specify it manually. Analogously, you may specify the three moments by choosing the distribution or by entering all three values manually. There are further options to choose between two different types of effect size parameter (d vs. moment p1), the type of distribution shown in the plot window, whether both d and p2 are given in the output, and finally, whether to use a continuity correction.

24.3 Examples

Example 1 We replicate the example in O’Brien (2002, p. 130ff): the question is whether the injury rates in two types of shoes differ. The null hypothesis is that there is no difference. Means of 0.3 and 0.27 are assumed in groups 1 and 2, respectively, together with a common standard deviation sd = 0.08.

We want to know the power of a two-sided test under the assumptions that the responses are distributed according to a Laplace parent distribution, that N1 = 67, N2 = 134 (i.e. N1 + N2 = 201), and α = 0.05. We use the effect size dialog to compute the effect size d = 0.375. In the options we select the Lehmann method, the use of predefined parent distributions, the effect size measure d, and the continuity correction.

• Select
  – Type of power analysis: Post hoc

• Input
  – Tail(s): Two
  – Parent distribution: Laplace
  – Effect size d: 0.375
  – α err prob: 0.05
  – Sample size group 1: 67
  – Sample size group 2: 134

• Output
  – critical z: 1.9599640
  – Power (1-β): 0.8472103
  – Moment p1: 0.6277817
  – Moment p2: 0.6277817

The computed power of 0.847 corresponds exactly to the value given in O’Brien for this case. Changing the parent distribution to normal reduces the power to 0.68.

Example 2 This replicates the example described in O’Brien (2002, p. 134ff) and illustrates the specification of the effect size via the first moment p1. To enable this input method we select in the options the Lehmann method, the use of predefined parent distributions, the effect size measure p1, and the continuity correction.

The first moment is defined as the probability that a random observation from the first group is larger than a random observation from the second group. The parent distribution is assumed to be “logistic”, and we choose p1 = 0.6 relative to this parent distribution. An increase in SD in the second group can also be specified. Here a value of rel. SD = 1.15 is used.

• Select
  – Type of power analysis: Post hoc

• Input
  – Tail(s): One
  – Parent distribution: Logistic
  – Relative SD: 1.15
  – Moment p1: 0.6
  – α err prob: 0.05
  – Sample size group 1: 67
  – Sample size group 2: 134

• Output
  – critical z: 1.6448536
  – Power (1-β): 0.6461276
  – Effect size d: 0.2911838
  – Moment p2: 0.4251206

The computed power 0.646 is exactly the value given in O’Brien.

If the parent distribution is changed to “Laplace”, then a very similar power of 0.647 is computed. This deviates considerably from the value 0.755 given for this case in O’Brien’s table. The reason is that G * Power always uses a single parent; that is, if the parent is changed to “Laplace”, then the given p1 is interpreted relative to this parent. For the table in O’Brien, however, the effect size was always computed from the Logistic parent. To replicate this behavior, one can change to the effect size d input mode, use the effect size d given in the output for the logistic parent (d = 0.291183), and choose Laplace as the new parent:

• Select
  – Type of power analysis: Post hoc

• Input
  – Tail(s): One
  – Parent distribution: Laplace
  – Relative SD: 1.15
  – Effect size d: 0.2911838
  – α err prob: 0.05
  – Sample size group 1: 67
  – Sample size group 2: 134

• Output
  – critical z: 1.6448536
  – Power (1-β): 0.7551564
  – Moment p1: 0.6005708
  – Moment p2: 0.4405953

24.4 Related tests

24.5 Implementation notes

The H0 distribution is the central Student t distribution t(Nk − 2, 0); the H1 distribution is the noncentral Student t distribution t(Nk − 2, δ), where the noncentrality parameter δ is given by:

$$\delta = d \sqrt{\frac{N_1 N_2 k}{N_1 + N_2}}$$

The parameter k represents the asymptotic relative efficiency vs. the corresponding t tests (Lehmann, 1975, p. 371ff) and depends in the following way on the parent distribution:

Parent      Value of k (A.R.E.)
Uniform     1.0
Normal      3/π
Logistic    π²/9
Laplace     3/2
min ARE     0.864

Min ARE is a limiting case that gives a theoretical minimum of the power for the Wilcoxon-Mann-Whitney test.
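The A.R.E. scaling of the t test can be written down as follows. This is a sketch, assuming that N in t(Nk − 2, δ) denotes the total sample size N1 + N2 (our reading of the notation); the dictionary and function names are hypothetical. Computing the actual power would additionally require the noncentral t distribution, for example from a statistics library.

```python
import math

# A.R.E. values k for the predefined parents (table above).
ARE_K = {
    "uniform": 1.0,
    "normal": 3 / math.pi,
    "logistic": math.pi ** 2 / 9,
    "laplace": 1.5,
    "min_are": 0.864,
}


def are_t_parameters(d, n1, n2, parent="normal"):
    """Return (df, delta) of the noncentral t used by the A.R.E. method.

    df = (N1 + N2) * k - 2 and delta = d * sqrt(N1 * N2 * k / (N1 + N2)).
    Note that df is in general not an integer.
    """
    k = ARE_K[parent]
    df = (n1 + n2) * k - 2
    delta = d * math.sqrt(n1 * n2 * k / (n1 + n2))
    return df, delta


print(are_t_parameters(0.5, 50, 50, "laplace"))
```

Since k > 1 for the Logistic and Laplace parents, both df and δ grow relative to the plain t test, which is how the higher efficiency of the Wilcoxon test for these parents enters the power calculation.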

24.5.1 Conversion from effect size d to p1

Given effect size d, the first moment p1 is

• Normal distribution:

$$p_1 = \Phi\left(d/\sqrt{2}\right)$$

• Logistic distribution:

$$p_1 = \frac{1 - (1 + u)\exp(-u)}{(1 - \exp(-u))^2}, \qquad u = d\pi/\sqrt{3}$$

• Laplace distribution:

$$p_1 = 1 - \exp(-u)(1 + u/2)/2, \qquad u = d\sqrt{2}$$
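The three conversions can be sketched in Python as follows. The function names are hypothetical, the formulas are transcribed from above, and two details are assumptions of this sketch: negative d is handled through the symmetry p1(−d) = 1 − p1(d) of the shift model, and the Logistic formula is replaced by its series limit 1/2 + u/6 for very small u, where the closed form suffers from cancellation. With d = 0.375 the Laplace conversion should reproduce the value Moment p1 = 0.6277817 reported in Example 1.

```python
import math
from statistics import NormalDist


def p1_normal(d):
    return NormalDist().cdf(d / math.sqrt(2))


def p1_logistic(d):
    if d < 0:
        return 1 - p1_logistic(-d)  # symmetry of the shift model
    u = d * math.pi / math.sqrt(3)
    if u < 1e-6:
        return 0.5 + u / 6  # series limit, avoids 0/0 cancellation
    return (1 - (1 + u) * math.exp(-u)) / (1 - math.exp(-u)) ** 2


def p1_laplace(d):
    if d < 0:
        return 1 - p1_laplace(-d)  # symmetry of the shift model
    u = d * math.sqrt(2)
    return 1 - math.exp(-u) * (1 + u / 2) / 2


for d in (0.2, 0.375, 0.8):
    print(d, p1_normal(d), p1_logistic(d), p1_laplace(d))
```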

24.6 Validation

The results were checked against the values produced by PASS (Hintze, 2006) and those produced by unifyPow (O’Brien, 1998). There was complete correspondence with the values given in O’Brien, while there were slight differences from those produced by PASS. The reason for these differences seems to be that PASS truncates the weighted sample sizes to integer values.


25 t test: Generic case

With generic t tests you can perform power analyses for any test that depends on the t distribution. All parameters of the noncentral t distribution can be manipulated independently. Note that with generic t tests you cannot do a priori power analyses, the reason being that there is no definite association between N and df (the degrees of freedom). You need to tell G * Power the values of both N and df explicitly.

25.1 Effect size index

In the generic case, the noncentrality parameter δ of the noncentral t distribution may be interpreted as effect size.

25.2 Options

This test has no options.

25.3 Examples

To illustrate the usage, we calculate the power of a one-sample t test with the generic t test. We assume N = 25, µ0 = 0, µ1 = 1, and σ = 4 and get the effect size d = (µ0 − µ1)/σ = (0 − 1)/4 = −0.25, the noncentrality parameter δ = d√N = −0.25 · √25 = −1.25, and the degrees of freedom df = N − 1 = 24. We choose a post hoc analysis and a two-sided test. As a result we get the power (1 − β) = 0.224525; this is exactly the same value we get from the specialized routine for this case in G * Power.

25.4 Related tests

Related tests in G * Power are:

• Correlation: Point biserial model

• Means: Difference between two independent means (two groups)

• Means: Difference between two dependent means (matched pairs)

• Means: Difference from constant (one sample case)

25.5 Implementation notes

One distribution is fixed to the central Student t distribution t(df). The other distribution is a noncentral t distribution t(df, δ) with noncentrality parameter δ.

The specified α determines the critical value tc. In the case of a one-sided test the critical value tc has the same sign as the noncentrality parameter; otherwise there are two critical values tc1 = −tc2.

25.6 Validation

The results were checked against the values produced by GPower 2.0.


26 χ2 test: Variance - difference from constant (one sample case)

This procedure allows power analyses for the test that the population variance σ² of a normally distributed random variable has the specific value σ0². The null and (two-sided) alternative hypotheses of this test are:

H0 : σ − σ0 = 0
H1 : σ − σ0 ≠ 0.

The two-sided test (“two tails”) should be used if there is no restriction on the sign of the deviation assumed in the alternative hypothesis. Otherwise use the one-sided test (“one tail”).

26.1 Effect size index

The ratio σ²/σ0² of the variance assumed in H1 to the baseline variance is used as the effect size measure. This ratio is 1 if H0 is true, that is, if both variances are identical. In an a priori analysis a ratio close or even identical to 1 would imply an exceedingly large sample size. Thus, G * Power prohibits inputs in the range [0.999, 1.001] in this case.

Pressing the button Determine on the left side of the effect size label in the main window opens the effect size drawer (see Fig. 35) that may be used to calculate the ratio from the two variances. Insert the baseline variance σ0² in the field variance V0 and the alternate variance in the field variance V1.

Figure 35: Effect size drawer to calculate variance ratios.

26.2 Options

This test has no options.

26.3 Examples

We want to test whether the variance in a given population is clearly lower than σ0² = 1.5. In this application we use “σ² is less than 1” as the criterion for “clearly lower”. Inserting variance V0 = 1.5 and variance V1 = 1 in the effect size drawer, we calculate as effect size a Ratio var1/var0 of 0.6666667.

How many subjects are needed to achieve the error levels α = 0.05 and β = 0.2 in this test? This question can be answered by using the following settings in G * Power :

• Select
  – Type of power analysis: A priori

• Input
  – Tail(s): One
  – Ratio var1/var0: 0.6666667
  – α err prob: 0.05
  – Power (1-β): 0.80

• Output
  – Lower critical χ²: 60.391478
  – Upper critical χ²: 60.391478
  – Df: 80
  – Total sample size: 81
  – Actual power: 0.803686

The output shows that using a one-sided test we need at least 81 subjects in order to achieve the desired levels of the α and β error. To apply the test, we would estimate the variance s² from the sample of size N. The one-sided test would be significant at α = 0.05 if the statistic x = (N − 1) · s²/σ0² were lower than the critical value 60.39.

By setting “Tail(s) = Two”, we can easily check that a two-sided test under the same conditions would have required a much larger sample size, namely N = 103, to achieve the error criteria.
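The one-sided a priori result above can be checked with the following sketch, again restricted to the Python standard library. The regularized incomplete gamma function is computed with the textbook series / continued-fraction pair, the χ² quantile by bisection, and power as P((σ²/σ0²) · χ²_{N−1} < critical value) for the lower-tail test used in the example. Function names and numerical tolerances are assumptions of this sketch, not G*Power internals.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x), series / continued fraction."""
    if x <= 0.0:
        return 0.0
    log_front = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Power series, converges quickly for x < a + 1.
        ap, term = a, 1.0 / a
        total = term
        for _ in range(10000):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return total * math.exp(log_front)
    # Continued fraction for Q(a, x) = 1 - P(a, x), modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        step = d * c
        h *= step
        if abs(step - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_front) * h


def chi2_cdf(x, df):
    return reg_lower_gamma(0.5 * df, 0.5 * x)


def chi2_ppf(p, df):
    """Quantile of chi-square(df) by bisection."""
    lo, hi = 0.0, df + 20.0 * math.sqrt(2.0 * df) + 20.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def variance_test_lower(ratio, n, alpha=0.05):
    """One-sided test of H1: sigma^2/sigma0^2 = ratio < 1; returns (critical value, power)."""
    df = n - 1
    crit = chi2_ppf(alpha, df)  # reject H0 if (N-1) s^2 / sigma0^2 < crit
    # Under H1 the statistic is distributed as ratio * chi2(df).
    return crit, chi2_cdf(crit / ratio, df)


def a_priori_n(ratio, alpha=0.05, target_power=0.80):
    n = 2
    while variance_test_lower(ratio, n, alpha)[1] < target_power:
        n += 1
    return n


crit, power = variance_test_lower(1 / 1.5, 81)
print(f"critical chi2 = {crit:.6f}, power = {power:.6f}, a priori N = {a_priori_n(1 / 1.5)}")
```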

26.4 Related tests

Similar tests in G * Power 3.0:

• Variance: Test of equality (two sample case).

26.5 Implementation notes

It is assumed that the population is normally distributed and that the mean is not known in advance but estimated from a sample of size N. Under these assumptions the H0 distribution of s²(N − 1)/σ0² is the central χ² distribution with N − 1 degrees of freedom (χ²_{N−1}), and the H1 distribution is the same central χ² distribution scaled with the variance ratio, that is, (σ²/σ0²) · χ²_{N−1}.

26.6 Validation

The correctness of the results was checked against values produced by PASS (Hintze, 2006) and in a Monte Carlo simulation.


27 z test: Correlation - inequality of two independent Pearson r’s

This procedure refers to tests of hypotheses concerning differences between two independent population correlation coefficients. The null hypothesis states that both correlation coefficients are identical: ρ1 = ρ2. The (two-sided) alternative hypothesis is that the correlation coefficients are different: ρ1 ≠ ρ2:

H0 : ρ1 − ρ2 = 0
H1 : ρ1 − ρ2 ≠ 0.

If the direction of the deviation ρ1 − ρ2 cannot be predicted a priori, a two-sided (’two-tailed’) test should be used. Otherwise use a one-sided test.

27.1 Effect size index

The effect size index q is defined as a difference between two ’Fisher z’-transformed correlation coefficients: q = z1 − z2, with z1 = ln((1 + r1)/(1 − r1))/2, z2 = ln((1 + r2)/(1 − r2))/2. G * Power requires q to lie in the interval [−10, 10].

Cohen (1969, p. 109ff) defines the following effect size conventions for q:

• small q = 0.1

• medium q = 0.3

• large q = 0.5

Pressing the button Determine on the left side of the effect size label opens the effect size dialog, which can be used to calculate q from two correlation coefficients.

If q and r2 are given, then the formula r1 = (a − 1)/(a + 1), with a = exp[2q + ln((1 + r2)/(1 − r2))] may be used to calculate r1.

G * Power outputs critical values zc of the z distribution. To transform these values to critical values qc related to the effect size measure q, use the formula qc = zc √((N1 + N2 − 6)/((N1 − 3)(N2 − 3))) (see Cohen, 1969, p. 135).

27.2 Options

This test has no options.

27.3 Examples

Assume it is known that test A correlates with a criterion with r1 = 0.75 and that we want to test whether an alternative test B correlates higher, say at least with r2 = 0.88.

We have two data sets, one using A with N1 = 51, and a second using B with N2 = 260. What is the power of a two-sided test for a difference between these correlations, if we set α = 0.05?

We use the effect size drawer to calculate the effect size q from both correlations. Setting correlation coefficient r1 = 0.75 and correlation coefficient r2 = 0.88 yields q = −0.4028126. The input and outputs are as follows:

• Select
  – Type of power analysis: Post hoc

• Input
  – Tail(s): two
  – Effect size q: -0.4028126
  – α err prob: 0.05
  – Sample size: 260
  – Sample size: 51

• Output
  – Critical z: -1.959964
  – Power (1-β): 0.726352

The output shows that the power is about 0.726. This is very close to the result 0.72 given in Example 4.3 in Cohen (1988, p. 131), which uses the same input values. The small deviations are due to rounding errors in Cohen’s analysis.

If we instead assume N1 = N2, how many subjects do we then need to achieve the same power? To answer this question we use an a priori analysis with the power calculated above as input and an Allocation ratio N2/N1 = 1 to enforce equal sample sizes. All other parameters are chosen identical to those used in the above case. The result is that we now need 84 cases in each group. Thus choosing equal sample sizes reduces the total required sample size considerably, from (260 + 51) = 311 to 168 (84 + 84).
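The computations of this example can be sketched as follows, using the H0 distribution N(0, 1) and the H1 distribution N(q/s, 1) given in the implementation notes. The helper names are hypothetical; the one-sided branch assumes the test is run in the direction of q, and the equal-n search is a simple linear scan rather than G*Power's search routine.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def fisher_z(r):
    return math.atanh(r)  # equals ln((1 + r) / (1 - r)) / 2


def effect_size_q(r1, r2):
    return fisher_z(r1) - fisher_z(r2)


def r1_from_q(q, r2):
    """r1 = (a - 1)/(a + 1) with a = exp[2q + ln((1 + r2)/(1 - r2))]."""
    a = math.exp(2 * q + math.log((1 + r2) / (1 - r2)))
    return (a - 1) / (a + 1)


def power_two_independent_r(q, n1, n2, alpha=0.05, two_sided=True):
    """H0: N(0, 1); H1: N(q/s, 1) with s = sqrt(1/(N1-3) + 1/(N2-3))."""
    s = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    mean = q / s
    if two_sided:
        zc = N01.inv_cdf(1 - alpha / 2)
        return N01.cdf(-zc - mean) + (1 - N01.cdf(zc - mean))
    zc = N01.inv_cdf(1 - alpha)
    return N01.cdf(abs(mean) - zc)


def a_priori_equal_n(q, target_power, alpha=0.05):
    n = 4
    while power_two_independent_r(q, n, n, alpha) < target_power:
        n += 1
    return n


q = effect_size_q(0.75, 0.88)
print(q, power_two_independent_r(q, 51, 260), a_priori_equal_n(q, 0.726352))
```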

27.4 Related tests

• Correlations: Difference from constant (one sample case)

• Correlations: Point biserial model

27.5 Implementation notes

The H0 distribution is the standard normal distribution N(0, 1); the H1 distribution is the normal distribution N(q/s, 1), where q denotes the effect size as defined above, N1 and N2 the sample sizes in the two groups, and

$$s = \sqrt{1/(N_1 - 3) + 1/(N_2 - 3)}.$$

27.6 Validation

The results were checked against the table in Cohen (1969, chap. 4).


28 z test: Correlation - inequality of two dependent Pearson r’s

This procedure provides power analyses for tests of the hypothesis that two dependent Pearson correlation coefficients ρa,b and ρc,d are identical. The corresponding test statistics Z1* and Z2* were proposed by Dunn and Clark (1969) and are described in Eqns. (11) and (12) in Steiger (1980) (for more details on these tests, see implementation notes below).

Two correlation coefficients ρa,b and ρc,d are dependent if at least one of the four possible correlation coefficients ρa,c, ρa,d, ρb,c and ρb,d between other pairs of the four data sets a, b, c, d is non-zero. Thus, in the general case where a, b, c, and d are different data sets, we have to consider not only the two correlations under scrutiny but also four additional correlations.

In the special case where two of the data sets are identical, the two correlations are obviously always dependent, because at least one of the four additional correlations mentioned above is exactly 1. Two other of the four additional correlations are identical to the two correlations under test. Thus, there remains only one additional correlation that can be freely specified. In this special case we denote the H0 correlation ρa,b, the H1 correlation ρa,c and the additional correlation ρb,c.

It is convenient to describe these two cases by two corresponding correlation matrices (that may be sub-matrices of a larger correlation matrix): a 4 × 4 matrix in the general case of four different data sets (‘no common index’):

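$$C_1 = \begin{pmatrix} 1 & \rho_{a,b} & \rho_{a,c} & \rho_{a,d} \\ \rho_{a,b} & 1 & \rho_{b,c} & \rho_{b,d} \\ \rho_{a,c} & \rho_{b,c} & 1 & \rho_{c,d} \\ \rho_{a,d} & \rho_{b,d} & \rho_{c,d} & 1 \end{pmatrix},$$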

and a 3×3 matrix in the special case, where one of the data sets is identical in both correlations (‘common index’):

$$C_2 = \begin{pmatrix} 1 & \rho_{a,b} & \rho_{a,c} \\ \rho_{a,b} & 1 & \rho_{b,c} \\ \rho_{a,c} & \rho_{b,c} & 1 \end{pmatrix}$$

Note: The values for ρx,y in matrices C1 and C2 cannot be chosen arbitrarily between -1 and 1. This is easily illustrated by considering the matrix C2: it should be obvious that we cannot, for instance, choose ρa,b = −ρa,c and ρb,c = 1.0, because the latter choice implies that the other two correlations are identical. It is, however, generally not that easy to decide whether a given matrix is a valid correlation matrix. In more complex cases the following formal criterion can be used: a given symmetric matrix is a valid correlation matrix if and only if the matrix is positive semi-definite, that is, if all eigenvalues of the matrix are non-negative.
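A dependency-free way to apply this criterion is sketched below. Instead of computing eigenvalues, it uses the equivalent test that every principal minor (not only the leading ones) of a symmetric matrix is non-negative; this is a swapped-in technique chosen to avoid a linear-algebra library, and G*Power's own check may work differently. The function names and the tolerance are hypothetical.

```python
from itertools import combinations


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return result


def is_valid_correlation_matrix(c, tol=1e-12):
    """Unit diagonal, symmetric, and positive semi-definite.

    PSD is checked through all principal minors, which is equivalent to
    all eigenvalues being non-negative for a symmetric matrix.
    """
    n = len(c)
    if any(abs(c[i][i] - 1.0) > tol for i in range(n)):
        return False
    if any(abs(c[i][j] - c[j][i]) > tol for i in range(n) for j in range(n)):
        return False
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            sub = [[c[i][j] for j in idx] for i in idx]
            if det(sub) < -tol:
                return False
    return True


# Example from the text: rho_ab = -rho_ac together with rho_bc = 1 is impossible.
print(is_valid_correlation_matrix([[1, 0.5, -0.5], [0.5, 1, 1.0], [-0.5, 1.0, 1]]))
```

For large matrices the number of principal minors grows as 2^K, so an eigenvalue or Cholesky-based check is preferable there; for the 3 × 3 and 4 × 4 matrices of this procedure the minor test is cheap.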

The null hypothesis in the general case with ’no common index’ states that ρa,b = ρc,d. The (two-sided) alternative hypothesis is that these correlation coefficients are different: ρa,b ≠ ρc,d:

H0 : ρa,b − ρc,d = 0
H1 : ρa,b − ρc,d ≠ 0.

Here, G * Power refers to the test Z2* described in Eq. (12) in Steiger (1980).

The null hypothesis in the special case of a ‘common index’ states that ρa,b = ρa,c. The (two-sided) alternative hypothesis is that these correlation coefficients are different: ρa,b ≠ ρa,c:

H0 : ρa,b − ρa,c = 0
H1 : ρa,b − ρa,c ≠ 0.

Here, G * Power refers to the test Z1* described in Eq. (11) in Steiger (1980).

If the direction of the deviation ρa,b − ρc,d (or ρa,b − ρa,c in the ’common index’ case) cannot be predicted a priori, a two-sided (’two-tailed’) test should be used. Otherwise use a one-sided test.

28.1 Effect size index

In this procedure the correlation coefficient assumed under H1 is used as effect size, that is ρc,d in the general case of ‘no common index’ and ρa,c in the special case of a ’common index’.

To fully specify the effect size, the following additional inputs are required:

• ρa,b, the correlation coefficient assumed under H0, and

• all other relevant correlation coefficients that specify the dependency between the correlations assumed in H0 and H1: ρb,c in the ’common index’ case, and ρa,c, ρa,d, ρb,c, ρb,d in the general case of ’no common index’.

G * Power requires the correlations assumed under H0 and H1 to lie in the interval [−1 + ε, 1 − ε], with ε = 10⁻⁶, and the additional correlations to lie in the interval [−1, 1]. In a priori analyses zero effect sizes are not allowed, because this would imply an infinite sample size. In this case the additional restriction |ρa,b − ρc,d| > 10⁻⁶ (or |ρa,b − ρa,c| > 10⁻⁶) holds.

Why do we not use q, the effect size proposed by Cohen (1988) for the case of two independent correlations? The effect size q is defined as a difference between two ’Fisher z’-transformed correlation coefficients: q = z1 − z2, with z1 = ln((1 + r1)/(1 − r1))/2, z2 = ln((1 + r2)/(1 − r2))/2. The choice of q as effect size is sensible for tests of independent correlations, because in this case the power of the test does not depend on the absolute value of the correlation coefficient assumed under H0, but only on the difference q between the transformed correlations under H0 and H1. This is no longer true for dependent correlations, and we therefore used the effect size described above. (See the implementation section for a more thorough discussion of these issues.)

Although the power is not strictly independent of the value of the correlation coefficient under H0, the deviations are usually relatively small, and it may therefore be convenient to use the definition of q to specify the correlation under H1 for a given correlation under H0. In this way, one can relate to the effect size conventions for q defined by Cohen (1969, p. 109ff) for independent correlations:

• small q = 0.1

• medium q = 0.3

• large q = 0.5


The effect size drawer, which can be opened by pressing the button Determine on the left side of the effect size label, can be used to do this calculation:

The dialog performs the following transformation: r2 = (a − 1)/(a + 1), with a = exp[−2q + ln((1 + r1)/(1 − r1))], where q, r1 and r2 denote the effect size q and the correlation coefficients assumed under H0 and H1, respectively.

28.2 Options

This test has no options.

28.3 Examples

We assume the following correlation matrix in the population:

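$$C_p = \begin{pmatrix} 1 & \rho_{1,2} & \rho_{1,3} & \rho_{1,4} \\ \rho_{1,2} & 1 & \rho_{2,3} & \rho_{2,4} \\ \rho_{1,3} & \rho_{2,3} & 1 & \rho_{3,4} \\ \rho_{1,4} & \rho_{2,4} & \rho_{3,4} & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0.5 & 0.4 & 0.1 \\ 0.5 & 1 & 0.2 & -0.4 \\ 0.4 & 0.2 & 1 & 0.8 \\ 0.1 & -0.4 & 0.8 & 1 \end{pmatrix}$$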

28.3.1 General case: No common index

We want to perform an a priori analysis for a one-sided test whether ρ1,4 = ρ2,3 or ρ1,4 < ρ2,3 holds.

With respect to the notation used in G * Power we have the following identities: a = 1, b = 4, c = 2, d = 3. Thus we get: H0 correlation ρa,b = ρ1,4 = 0.1, H1 correlation ρc,d = ρ2,3 = 0.2, and ρa,c = ρ1,2 = 0.5, ρa,d = ρ1,3 = 0.4, ρb,c = ρ4,2 = ρ2,4 = −0.4, ρb,d = ρ4,3 = ρ3,4 = 0.8.

We want to know how large our samples need to be in order to achieve the error levels α = 0.05 and β = 0.2. We choose the procedure ‘Correlations: Two dependent Pearson r’s (no common index)’ and set:

• Select
  – Type of power analysis: A priori

• Input
  – Tail(s): one
  – H1 corr ρ_cd: 0.2
  – α err prob: 0.05
  – Power (1-β err prob): 0.8
  – H0 Corr ρ_ab: 0.1
  – Corr ρ_ac: 0.5
  – Corr ρ_ad: 0.4
  – Corr ρ_bc: -0.4
  – Corr ρ_bd: 0.8

• Output
  – Critical z: 1.644854
  – Sample Size: 886
  – Actual Power: 0.800093

We find that the sample size in each group needs to be N = 886. How large would N be if we instead assumed that ρ1,4 and ρ2,3 are independent, that is, that ρac = ρad = ρbc = ρbd = 0? To calculate this value, we may set the corresponding input fields to 0, or, alternatively, use the procedure for independent correlations with an allocation ratio N2/N1 = 1. In either case, we find a considerably larger sample size of N = 1183 per data set (i.e. we correlate data vectors of length N = 1183). This shows that the power of the test increases considerably if we take dependencies between correlation coefficients into account.

If we try to change the correlation ρbd from 0.8 to 0.9, G * Power shows an error message stating: ‘The correlation matrix is not valid, that is, not positive semi-definite’. This indicates that the matrix Cp with ρ3,4 changed to 0.9 is not a possible correlation matrix.

28.3.2 Special case: Common index

Assuming again the population correlation matrix Cp shown above, we want to do an a priori analysis for the test whether ρ1,3 = ρ2,3 or ρ1,3 > ρ2,3 holds. With respect to the notation used in G * Power we have the following identities: a = 3 (the common index), b = 1 (the index of the second data set entering in the correlation assumed under H0, here ρa,b = ρ3,1 = ρ1,3), and c = 2 (the index of the remaining data set).

Thus, we have: H0 correlation ρa,b = ρ3,1 = ρ1,3 = 0.4, H1 correlation ρa,c = ρ3,2 = ρ2,3 = 0.2, and ρb,c = ρ1,2 = 0.5

For this effect size we want to calculate how large our sample size needs to be in order to achieve the error levels α = 0.05 and β = 0.2. We choose the procedure ‘Correlations: Two dependent Pearson r’s (common index)’ and set:

• Select
  – Type of power analysis: A priori

• Input
  – Tail(s): one
  – H1 corr ρ_ac: 0.2
  – α err prob: 0.05
  – Power (1-β err prob): 0.8
  – H0 Corr ρ_ab: 0.4
  – Corr ρ_bc: 0.5

• Output
  – Critical z: 1.644854
  – Sample Size: 144
  – Actual Power: 0.801161

The answer is that we need sample sizes of 144 in each group (i.e. the correlations are calculated between data vectors of length N = 144).


28.3.3 Sensitivity analyses

We now assume a scenario that is identical to that described above, with the exception that ρb,c = −0.6. We want to know, what the minimal H1 correlation ρa,c is that we can detect with α = 0.05 and β = 0.2 given a sample size N = 144. In this sensitivity analysis, we have in general two possible solutions, namely one for ρa,c ≤ ρa,b and one for ρa,c ≥ ρa,b. The relevant settings for the former case are:

• Select
  – Type of power analysis: Sensitivity

• Input
  – Tail(s): one
  – Effect direction: ρ_ac ≤ ρ_ab
  – α err prob: 0.05
  – Power (1-β err prob): 0.8
  – Sample Size: 144
  – H0 Corr ρ_ab: 0.4
  – Corr ρ_bc: -0.6

• Output
  – Critical z: -1.644854
  – H1 corr ρ_ac: 0.047702

The result is that the error levels are as requested or lower if the H1 correlation ρa,c is equal to or lower than 0.047702.

We now try to find the corresponding H1 correlation that is larger than ρa,b = 0.4. To this end, we change the effect size direction in the settings shown above, that is, we choose ρa,c ≥ ρa,b. In this case, however, G * Power shows an error message indicating that no solution was found. The reason is that there is no H1 correlation ρa,c ≥ ρa,b that leads to a valid (i.e. positive semi-definite) correlation matrix and simultaneously ensures the requested error levels. To indicate a missing result, the output for the H1 correlation is set to the nonsensical value 2.

G * Power checks in both the ‘common index’ case and the general case with no common index whether the correlation matrix is valid and outputs a H1 correlation of 2 if no solution is found. This is also true in the XY-plot if the H1 correlation is the dependent variable: combinations of input values for which no solution can be found show up with a nonsensical value of 2.

28.3.4 Using the effect size dialog

28.4 Related tests

• Correlations: Difference from constant (one sample case)

• Correlations: Point biserial model

• Correlations: Two independent Pearson r’s (two sample case)

28.5 Implementation notes

28.5.1 Background

Let X1, . . . , XK denote multinormally distributed random variables with mean vector µ and covariance matrix C. A sample of size N from this K-dimensional distribution leads to an N × K data matrix, and pair-wise correlation of all columns to a K × K correlation matrix. By drawing M samples of size N one can compute M such correlation matrices, and one can determine the variances σ²a,b of the sample of M correlation coefficients ra,b, and the covariances σa,b;c,d between samples of size M of two different correlations ra,b, rc,d. For M → ∞, the elements of Ψ, which denotes the asymptotic variance-covariance matrix of the correlations times N, are given by [see Eqns (1) and (2) in Steiger (1980)]:

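$$\Psi_{a,b;a,b} = N\sigma^2_{a,b} = (1 - \rho^2_{a,b})^2 \tag{11}$$

$$\begin{aligned} \Psi_{a,b;c,d} = N\sigma_{a,b;c,d} = \big[ &(\rho_{a,c} - \rho_{a,b}\rho_{b,c})(\rho_{b,d} - \rho_{b,c}\rho_{c,d}) \\ +\, &(\rho_{a,d} - \rho_{a,c}\rho_{c,d})(\rho_{b,c} - \rho_{a,b}\rho_{a,c}) \\ +\, &(\rho_{a,c} - \rho_{a,d}\rho_{c,d})(\rho_{b,d} - \rho_{a,b}\rho_{a,d}) \\ +\, &(\rho_{a,d} - \rho_{a,b}\rho_{b,d})(\rho_{b,c} - \rho_{b,d}\rho_{c,d}) \big] / 2 \end{aligned} \tag{12}$$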

When two correlations have an index in common, the expression given in Eq (12) simplifies to [see Eq (3) in Steiger (1980)]:

$$\Psi_{a,b;a,c} = N\sigma_{a,b;a,c} = \rho_{b,c}(1 - \rho^2_{a,b} - \rho^2_{a,c}) - \rho_{a,b}\rho_{a,c}(1 - \rho^2_{a,c} - \rho^2_{a,b} - \rho^2_{b,c})/2 \tag{13}$$

If the raw sample correlations ra,b are transformed by the Fisher r-to-z transform

$$z_{a,b} = \frac{1}{2}\ln\left(\frac{1 + r_{a,b}}{1 - r_{a,b}}\right),$$

then the elements of the variance-covariance matrix times (N − 3) of the transformed raw correlations are [see Eqs. (9)-(11) in Steiger (1980)]:

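$$c_{a,b;a,b} = (N - 3)\,\sigma_{z_{a,b};z_{a,b}} = 1$$

$$c_{a,b;c,d} = (N - 3)\,\sigma_{z_{a,b};z_{c,d}} = \frac{\Psi_{a,b;c,d}}{(1 - \rho^2_{a,b})(1 - \rho^2_{c,d})}$$

$$c_{a,b;a,c} = (N - 3)\,\sigma_{z_{a,b};z_{a,c}} = \frac{\Psi_{a,b;a,c}}{(1 - \rho^2_{a,b})(1 - \rho^2_{a,c})}$$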

28.5.2 Test statistics

The test statistics proposed by Dunn and Clark (1969) are [see Eqns (12) and (13) in Steiger (1980)]:

$$Z_1^* = (z_{a,b} - z_{a,c}) \Big/ \sqrt{\frac{2 - 2s_{a,b;a,c}}{N - 3}}$$

$$Z_2^* = (z_{a,b} - z_{c,d}) \Big/ \sqrt{\frac{2 - 2s_{a,b;c,d}}{N - 3}}$$

where sa,b;a,c and sa,b;c,d denote sample estimates of the covariances ca,b;a,c and ca,b;c,d between the transformed correlations, respectively.

Note:
1. The SD of the difference za,b − za,c given in the denominator of the formula for Z1* depends on the values of ρa,b and ρa,c; the same holds analogously for Z2*.
2. The only difference between Z2* and the z statistic used for independent correlations is that in the latter the covariance sa,b;c,d is assumed to be zero.


28.5.3 Central and noncentral distributions in power calculations

For the special case with a ‘common index’ the H0 distribution is the standard normal distribution N(0, 1) and the H1 distribution is the normal distribution N(m1, s1), with

$$s_0 = \sqrt{(2 - 2c_0)/(N - 3)}, \quad \text{with } c_0 = c_{a,b;a,c} \text{ for H0, i.e. } \rho_{a,c} = \rho_{a,b},$$

$$m_1 = (z_{a,b} - z_{a,c})/s_0,$$

$$s_1 = \left[\sqrt{(2 - 2c_1)/(N - 3)}\right] \Big/ s_0, \quad \text{with } c_1 = c_{a,b;a,c} \text{ for H1}.$$

In the general case ‘without a common index’, the H0 distribution is also the standard normal distribution N(0, 1). The H1 distribution is the normal distribution N(m1, s1), with

$$s_0 = \sqrt{(2-2c_0)/(N-3)},$$

where $c_0 = c_{a,b;c,d}$ for H0, i.e. $\rho_{c,d} = \rho_{a,b}$,

$$m_1 = (z_{a,b}-z_{c,d})/s_0,$$

$$s_1 = \sqrt{(2-2c_1)/(N-3)}\,\big/\,s_0,$$

where $c_1 = c_{a,b;c,d}$ for H1.
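The common-index power computation can be sketched in a few lines. Following the text, c0 is evaluated with $\rho_{a,c}$ set equal to $\rho_{a,b}$. Two-sided power is then the H1 probability mass beyond $\pm z_{1-\alpha/2}$. This is an illustrative reimplementation under those assumptions, with hypothetical function names and illustrative population values; it is not G * Power's code.

When $\rho_{a,c} = \rho_{a,b}$, the H1 distribution collapses to N(0, 1), so the returned power equals α. This gives a convenient sanity check.

```python
import math
from statistics import NormalDist


def c_common(rab, rac, rbc):
    """(N - 3) * cov(z_ab, z_ac), from Eq (13)."""
    psi = rbc * (1 - rab**2 - rac**2) - rab * rac * (1 - rac**2 - rab**2 - rbc**2) / 2
    return psi / ((1 - rab**2) * (1 - rac**2))


def power_common_index(rho_ab, rho_ac, rho_bc, n, alpha=0.05):
    """Two-sided power for H0: rho_ab = rho_ac, with m1 and s1 as defined above."""
    c0 = c_common(rho_ab, rho_ab, rho_bc)      # H0: rho_ac set equal to rho_ab
    c1 = c_common(rho_ab, rho_ac, rho_bc)      # H1 values
    s0 = math.sqrt((2 - 2 * c0) / (n - 3))
    m1 = (math.atanh(rho_ab) - math.atanh(rho_ac)) / s0
    s1 = math.sqrt((2 - 2 * c1) / (n - 3)) / s0
    h1 = NormalDist(m1, s1)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return h1.cdf(-z_crit) + (1 - h1.cdf(z_crit))


print(power_common_index(0.5, 0.3, 0.4, 103))
print(power_common_index(0.5, 0.5, 0.4, 100))  # no effect: equals alpha
```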

28.6 Validation

The results were checked against Monte-Carlo simulations.


29 Z test: Multiple Logistic Regression

A logistic regression model describes the relationship between a binary response variable Y and one or more independent variables (covariates or predictors) Xi. Y = 0 and Y = 1 denote non-occurrence and occurrence of an event, respectively. The variables Xi are themselves random variables with probability density function fX(x), or probability distribution fX(x) for discrete X.

In a simple logistic regression with one covariate X the assumption is that the probability of an event P = Pr(Y = 1) depends on X in the following way:

$$P = \frac{e^{\beta_0+\beta_1 x}}{1+e^{\beta_0+\beta_1 x}} = \frac{1}{1+e^{-(\beta_0+\beta_1 x)}}$$

For β1 ≠ 0 and continuous X, this formula describes a smooth S-shaped transition of the probability for Y = 1 as x increases. The transition runs from 0 to 1 when β1 > 0 and from 1 to 0 when β1 < 0. It gets steeper with increasing |β1|. Rearranging the formula leads to log(P/(1 − P)) = β0 + β1 X. The left side is the logarithm of the odds P/(1 − P), also called a logit, so this shows that the logit is linear in X. Here, β1 is the slope of this linear relationship.
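A minimal sketch can show that the logit recovers the linear predictor. The coefficients b0 = −1.0 and b1 = 0.8 below are hypothetical.

```python
import math


def logistic(b0, b1, x):
    """Event probability P = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def logit(p):
    """Log odds log(P / (1 - P))."""
    return math.log(p / (1.0 - p))


b0, b1 = -1.0, 0.8  # hypothetical coefficients
for x in (-2.0, 0.0, 2.0):
    p = logistic(b0, b1, x)
    # the logit recovers the linear predictor b0 + b1 * x (up to rounding)
    print(x, round(p, 4), round(logit(p), 4), b0 + b1 * x)
```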

The question of interest is whether covariate Xi is related to Y. In a simple logistic regression model, the null and alternative hypotheses for a two-sided test are therefore:

$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0.$$

For this case, the procedures implemented in G * Power estimate the power of the Wald test. The Wald test statistic is standard normally distributed:

$$z = \frac{\hat\beta_1}{\sqrt{\mathrm{var}(\hat\beta_1)/N}} = \frac{\hat\beta_1}{\mathrm{SE}(\hat\beta_1)}$$

Here β̂1 is the maximum likelihood estimator of parameter β1, and var(β̂1) is the variance of this estimate.
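The Wald statistic and its two-sided p-value can be sketched as follows. The inputs are hypothetical. In the notation above, `var_beta` is the quantity divided by N inside the square root.

```python
import math
from statistics import NormalDist


def wald_z(beta_hat, var_beta, n):
    """Wald statistic z = beta_hat / sqrt(var(beta_hat) / N), notation as above."""
    return beta_hat / math.sqrt(var_beta / n)


# Hypothetical values: estimate 0.405, var(beta_hat) = 4.5, N = 300
z = wald_z(0.405, 4.5, 300)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 3), round(p_two_sided, 5))
```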

A multiple logistic regression model, log(P/(1 − P)) = β0 + β1 x1 + · · · + βp xp, tests the effect of a specific covariate in the presence of other covariates. In this case the null hypothesis is H0 : [β1, β2, . . . , βp] = [0, β2, . . . , βp]. The alternative is H1 : [β1, β2, . . . , βp] = [β̂, β2, . . . , βp], where β̂ ≠ 0.

29.1 Effect size index

In the simple logistic model the effect of X on Y is given by the size of the parameter β1. Let p1 denote the probability of an event under H0, that is exp(β0) = p1/(1 − p1), and p2 the probability of an event under H1 at X = 1, that is exp(β0 + β1) = p2/(1− p2). Then exp(β0 + β1)/ exp(β0) = exp(β1) = [p2/(1 − p2)]/[p1/(1 − p1)] := odds ratio OR, which implies β1 = log[OR].
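The conversion between the two probabilities, the odds ratio and β1 can be sketched as follows. The helper names are hypothetical. With p1 = 0.5 and p2 = 0.6, the values used in the first example of Section 29.4, it yields OR = 1.5 and β1 ≈ 0.405.

```python
import math


def odds(p):
    return p / (1 - p)


def effect_from_probs(p1, p2):
    """Odds ratio OR = odds(p2) / odds(p1) and beta1 = log(OR)."""
    odds_ratio = odds(p2) / odds(p1)
    return odds_ratio, math.log(odds_ratio)


def p2_from_odds_ratio(p1, odds_ratio):
    """Invert the definition: the H1 event probability implied by p1 and OR."""
    o2 = odds_ratio * odds(p1)
    return o2 / (1 + o2)


or_, beta1 = effect_from_probs(0.5, 0.6)
print(round(or_, 4), round(beta1, 4))          # 1.5 0.4055
print(round(p2_from_odds_ratio(0.5, 1.5), 4))  # 0.6
```

Setting p2 = p1 gives OR = 1 and β1 = 0, which is the zero effect size that the next paragraph rules out for a priori analyses.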

The probability p1 is entered in the input field Pr(Y=1|X=1) H0. The effect size is then specified either directly by p2 (input field Pr(Y=1|X=1) H1) or, optionally, by the odds ratio OR (input field Odds ratio). Setting p2 = p1, or equivalently OR = 1, implies β1 = 0 and thus an effect size of zero. An effect size of zero must not be used in a priori analyses.

The following additional inputs are also needed:

• R2 other X.

In models with more than one covariate, a correction factor can take into account the influence of the other covariates X2, . . . , Xp on the power of the test. This factor depends on the proportion $R^2 = \rho^2_{1\cdot 23\ldots p}$ of the variance of X1 that is explained by the regression relationship with X2, . . . , Xp. If N is the sample size considering X1 alone, then the sample size in a setting with additional covariates is N′ = N/(1 − R²). This correction for the influence of other covariates was proposed by Hsieh, Bloch, and Larsen (1998). R² must lie in the interval [0, 1].

• X distribution:

Distribution of the Xi. There are 7 options:

1. Binomial [$P(k) = \binom{N}{k}\pi^k(1-\pi)^{N-k}$, where k is the number of successes (X = 1) in N trials of a Bernoulli process with probability of success π, 0 < π < 1]

2. Exponential [$f(x) = (1/\lambda)e^{-x/\lambda}$ for x ≥ 0, exponential distribution with parameter λ > 0]

3. Lognormal [$f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\exp[-(\ln x-\mu)^2/(2\sigma^2)]$, lognormal distribution with parameters µ and σ > 0]

4. Normal [$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp[-(x-\mu)^2/(2\sigma^2)]$, normal distribution with parameters µ and σ > 0]

5. Poisson [$P(X=k) = (\lambda^k/k!)e^{-\lambda}$, Poisson distribution with parameter λ > 0]

6. Uniform ( f (x) = 1/(b − a) for a ≤ x ≤ b, f (x) = 0 otherwise, continuous uniform distribution in the interval [a, b], a < b)

7. Manual (the variance of β̂ under H0 and H1 is specified manually)

G * Power provides two different types of procedure to calculate power: An enumeration procedure and large sample approximations. The Manual mode is only available in the large sample procedures.
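The R2 other X correction listed among the inputs above simply inflates the single-covariate sample size. Below is a minimal sketch. The helper name is hypothetical, and rounding up to a whole sample size is our assumption. R² = 1 is excluded because N′ would be infinite.

```python
import math


def n_with_other_covariates(n_single, r2_other):
    """N' = N / (1 - R^2), rounded up to a whole sample size (rounding is our assumption)."""
    if not 0.0 <= r2_other < 1.0:
        raise ValueError("R2 other X must be in [0, 1); R2 = 1 would require an infinite N")
    return math.ceil(n_single / (1.0 - r2_other))


print(n_with_other_covariates(300, 0.1))  # 334
print(n_with_other_covariates(300, 0.0))  # 300, no other covariates
```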

29.2 Options

Input mode: You can choose between two input modes for the effect size. You can specify the two probabilities p1 and p2 defined above, or you can specify p1 and the odds ratio OR.

Procedure: G * Power provides two types of procedure to estimate power. One is an "enumeration procedure" proposed by Lyles, Lin, and Williamson (2007); the other consists of large sample approximations.

- The enumeration procedure seems to provide reasonably accurate results over a wide range of situations. However, it can be rather slow and may need large amounts of memory.
- The large sample approximations are much faster. Monte-Carlo simulations indicate that for N > 200, the procedures proposed by Demidenko (2007) and Hsieh et al. (1998) are about as accurate as the enumeration procedure.
- The procedure based on Demidenko (2007) is more general and slightly more accurate than the one proposed by Hsieh et al. (1998). We therefore recommend it as the standard procedure.

The enumeration procedure of Lyles et al. (2007) may be used to validate the results if the sample size is not too large. It must also be used to compute the power of likelihood ratio tests.

1. The enumeration procedure provides power analyses for the Wald test and the likelihood ratio test. The general idea is to construct an exemplary data set with weights that represent response probabilities, given the assumed values of the parameters of the X distribution. A fit procedure for the generalized linear model is then applied to this data set. For Wald tests it estimates the variance of the regression weights; for likelihood ratio tests it estimates the likelihood ratio under H0 and H1.

The size of the exemplary data set increases with N. The enumeration procedure may therefore be rather slow for large sample sizes and may need large amounts of computer memory. It is especially slow for analysis types other than "post hoc", because these call the power routine several times internally. By specifying a threshold sample size N, you can restrict the enumeration procedure to sample sizes < N. For sample sizes ≥ N, the large sample approximation selected in the option dialog is used. Note: If a computation takes too long, you can abort it by pressing the ESC key.

2. G * Power provides two large sample approximations for a Wald-type test. Both rely on the asymptotic normal distribution of the maximum likelihood estimator of parameter β1, and both are related to the method described by Whittemore (1981).

The accuracy of these approximations increases with sample size. For small and moderate sample sizes, however, the deviation from the true power may be quite noticeable. This is especially true for X distributions that are not symmetric about the mean: the lognormal, exponential and Poisson distributions, and the binomial distribution with π ≠ 1/2.

The approach of Hsieh et al. (1998) is restricted to binary covariates and covariates with a standard normal distribution. The approach based on Demidenko (2007) is more general and usually more accurate, and it is recommended as the standard procedure. For this test, you can select a variance correction option that compensates for variance distortions in skewed X distributions (see implementation notes). If the Hsieh procedure is selected together with a distribution other than the standard normal or the binomial, the program automatically switches to the procedure of Demidenko.

29.3 Possible problems

As illustrated in Fig. 36, the power of the test does not always increase monotonically with effect size, and the maximum attainable power is sometimes less than 1. In particular, this means that the requested power cannot always be reached in a sensitivity analysis. From version 3.1.8 on, G * Power returns in these cases the effect size that maximizes the power in the selected direction (output field "Actual power"). To get an overview of possible problems, we recommend checking the dependence of power on effect size in the plot window.

Covariates with a lognormal distribution are especially problematic. For larger values of µ and σ, this distribution may have a very long tail (large positive skew), which can easily lead to numerical problems. Version 3.1.8 considerably improved the numerical stability of the procedure. In addition, power may behave in an unintuitive manner with highly skewed distributions, so you should check such cases carefully.

29.4 Examples

We first consider a model with a single predictor X, which is normally distributed with µ = 0 and σ = 1. We assume an event rate of p1 = 0.5 under H0 and an event rate of p2 = 0.6 for X = 1 under H1. The odds ratio is then OR = (0.6/0.4)/(0.5/0.5) = 1.5, and β1 = log(OR) ≈ 0.405. We want to estimate the sample size necessary to achieve a power of at least 0.95 in a two-sided test with α = 0.05, and we specify the effect size as an odds ratio. With the procedure of Hsieh et al. (1998), the input and output are as follows:

• Select
  Statistical test: Logistic Regression
  Type of power analysis: A priori

• Options
  Effect size input mode: Odds ratio
  Procedure: Hsieh et al. (1998)

• Input
  Tail(s): Two
  Odds ratio: 1.5
  Pr(Y=1) H0: 0.5
  α err prob: 0.05
  Power (1-β err prob): 0.95
  R2 other X: 0
  X distribution: Normal
  X parm µ: 0
  X parm σ: 1

• Output
  Critical z: 1.959964
  Total sample size: 317
  Actual power: 0.950486

The results indicate that the necessary sample size is 317. This result replicates the value in Table II in Hsieh et al. (1998) for the same scenario. Using the other large sample approximation proposed by Demidenko (2007) we instead get N = 337 with variance correction and N = 355 without.

In the enumeration procedure proposed by Lyles et al. (2007) the χ2-statistic is used and the output is

• Output
  Noncentrality parameter λ: 13.029675
  Critical χ2: 3.841459
  Df: 1
  Total sample size: 358
  Actual power: 0.950498


Thus, this routine estimates the minimum sample size in this case to be N = 358.

We ran a Monte-Carlo simulation of the Wald test in the above scenario with 50000 independent cases. The mean power was 0.940, 0.953, 0.962 and 0.963 for sample sizes 317, 337, 355 and 358, respectively. This indicates that, in this case, the method based on Demidenko (2007) with variance correction gives the best approximation.

We now assume that there are additional covariates, and that the squared multiple correlation with these other covariates is R² = 0.1. All other conditions are identical, so the only change needed is to set the input field R2 other X to 0.1. With the procedure of Demidenko (2007) with variance correction, the necessary sample size then increases from 337 to 395.

As an example of a model with one binary covariate X, we use the values of the fourth example in Table I of Hsieh et al. (1998). We assume an event rate of p1 = 0.05 under H0 and an event rate of p2 = 0.1 with X = 1 under H1. We further assume a balanced design (π = 0.5), with equal sample frequencies for X = 0 and X = 1. Again, we want the sample size necessary to achieve a power of at least 0.95 in a two-sided test with α = 0.05. This time we specify the effect size directly in terms of p1 and p2:

• Select
  Statistical test: Logistic Regression
  Type of power analysis: A priori

• Options
  Effect size input mode: Two probabilities

• Input
  Tail(s): Two
  Pr(Y=1|X=1) H1: 0.1
  Pr(Y=1|X=1) H0: 0.05
  α err prob: 0.05
  Power (1-β err prob): 0.95
  R2 other X: 0
  X Distribution: Binomial
  X parm π: 0.5

• Output
  Critical z: 1.959964
  Total sample size: 1437
  Actual power: 0.950068

According to these results, the necessary sample size is 1437. This replicates the sample size given for this example in Table I of Hsieh et al. (1998). The procedure proposed by Demidenko (2007) with variance correction confirms this result. The procedure without variance correction and the procedure of Lyles et al. (2007) for the Wald test both give N = 1498. In Monte-Carlo simulations of the Wald test (50000 independent cases), we found mean power values of 0.953 for N = 1437 and 0.961 for N = 1498. The procedure of Demidenko (2007) with variance correction therefore again gives the best power estimate for the tested scenario.

Changing only the parameter of the binomial distribution (the prevalence rate) to a lower value, π = 0.2, increases the sample size to 2158. Changing π to an equally unbalanced but higher value, 0.8, increases the required sample size further to 2368. Both cases used the Demidenko procedure with variance correction. These examples show that a balanced design requires a smaller sample size than an unbalanced one. They also show that a low prevalence rate requires a smaller sample size than a high one (Hsieh et al., 1998, p. 1625).

29.5 Related tests

• Poisson regression

29.6 Implementation notes

29.6.1 Enumeration procedure

The procedures for the Wald- and Likelihood ratio tests are implemented exactly as described in Lyles et al. (2007).

29.6.2 Large sample approximations

The large sample procedures for the univariate case are both related to the approach outlined in Whittemore (1981). The correction for additional covariates has been proposed by Hsieh et al. (1998). As large sample approximations they get more accurate for larger sample sizes.

Demidenko procedure: In the procedure based on Demidenko (2007, 2008), the H0 distribution is the standard normal distribution N(0, 1). The H1 distribution is the normal distribution N(m1, s1), with:

$$m_1 = \sqrt{\frac{N(1-R^2)}{v_1}}\;\beta_1 \qquad (14)$$

$$s_1 = \sqrt{\frac{a v_0 + (1-a) v_1}{v_1}} \qquad (15)$$

Here N is the sample size, and R² is the squared multiple correlation coefficient of the covariate of interest on the other covariates. v1 is the variance of β̂1 under H1. v0 is the variance of β̂1 under H0 with $\beta_0^* = \ln(\mu/(1-\mu))$, where

$$\mu = \int f_X(x)\,\frac{\exp(\beta_0+\beta_1 x)}{1+\exp(\beta_0+\beta_1 x)}\,dx.$$

The constant a depends on the procedure:

- Without variance correction, a = 0, that is, s1 = 1.
- With variance correction, a = 0.75 for the lognormal distribution and a = 0.85 for the binomial distribution.
- With variance correction, a = 1 for all other distributions, that is, $s_1 = \sqrt{v_0/v_1}$.

The motivation for this setting is as follows. Under H0, the Wald statistic is asymptotically standard normal. Under H1, the asymptotic normal distribution of the test statistic has mean $\beta_1/\mathrm{se}(\hat\beta_1) = \beta_1/\sqrt{v_1/N}$ and standard deviation 1. For finite N, however, the variance is biased, to a degree that depends on the X distribution. The quantity $\sqrt{(a v_0 + (1-a) v_1)/v_1}$ is close to 1 for symmetric X distributions but deviates from 1 for skewed ones. It often gives a much better estimate of the actual standard deviation of the distribution of $\hat\beta_1$ under H1. Extensive Monte-Carlo simulations with a wide range of parameters and X distributions confirmed this.

The procedure uses the result that the (m + 1) maximum likelihood estimators β0, β1, . . . , βm are asymptotically normally distributed. Their variance-covariance matrix is the inverse of the (m + 1) × (m + 1) Fisher information matrix I. The (i, j)th element of I is given by

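$$I_{ij} = -E\left[\frac{\partial^2 \log L}{\partial\beta_i\,\partial\beta_j}\right] = N\,E\left[X_i X_j\,\frac{\exp(\beta_0+\beta_1 X_1+\cdots+\beta_m X_m)}{\left(1+\exp(\beta_0+\beta_1 X_1+\cdots+\beta_m X_m)\right)^2}\right],$$

with $X_0 := 1$.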

Thus, with one continuous predictor, I is a 2 × 2 matrix with the following elements:

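$$I_{00} = \int_{-\infty}^{\infty} G_X(x)\,dx, \qquad I_{10} = I_{01} = \int_{-\infty}^{\infty} x\,G_X(x)\,dx, \qquad I_{11} = \int_{-\infty}^{\infty} x^2\,G_X(x)\,dx$$

with

$$G_X(x) := f_X(x)\,\frac{\exp(\beta_0+\beta_1 x)}{\left(1+\exp(\beta_0+\beta_1 x)\right)^2},$$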

where fX(x) is the PDF of the X distribution. For discrete predictors, the integrals must be replaced by the corresponding sums. Let $M = I^{-1}$. The element M11 is the variance of β̂1:

$$M_{11} = \mathrm{Var}(\hat\beta_1) = I_{00}/(I_{00}I_{11} - I_{01}^2).$$

G * Power uses numerical integration to compute these integrals.

To estimate the variance of β̂1 under H1, the parameters β0 and β1 in the equations for Iij are set to the values implied by the input: β0 = log[p1/(1 − p1)] and β1 = log[OR]. To estimate the variance under H0, one sets β1 = 0 and $\beta_0 = \beta_0^*$, with $\beta_0^*$ as defined above.
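The steps above (information integrals, $\beta_0^*$, and Eqs (14) and (15)) can be combined into a short sketch for a normally distributed covariate. The sketch rests on these assumptions:

- X ~ N(µ, σ).
- The integrals are truncated at ±12 SD.
- Simpson's rule stands in for whatever quadrature G * Power actually uses.
- The search returns the smallest N whose two-sided power reaches the target.

All function names are hypothetical. With p1 = 0.5 and OR = 1.5, the inputs of the first example in Section 29.4, the search should land close to the 337 (a = 1) and 355 (a = 0) reported there.

```python
import math
from statistics import NormalDist


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def var_beta1(b0, b1, pdf, lo, hi):
    """M11 = I00 / (I00*I11 - I01^2), the per-observation variance of beta1_hat."""
    def g(x):  # G_X(x) = f_X(x) * exp(b0 + b1 x) / (1 + exp(b0 + b1 x))^2
        p = expit(b0 + b1 * x)
        return pdf(x) * p * (1.0 - p)
    i00 = simpson(g, lo, hi)
    i01 = simpson(lambda x: x * g(x), lo, hi)
    i11 = simpson(lambda x: x * x * g(x), lo, hi)
    return i00 / (i00 * i11 - i01 ** 2)


def variances_normal_x(p1, odds_ratio, mu=0.0, sigma=1.0):
    """Return (v0, v1, beta1) for X ~ Normal(mu, sigma)."""
    pdf = NormalDist(mu, sigma).pdf
    lo, hi = mu - 12 * sigma, mu + 12 * sigma   # truncate the integrals at +/- 12 SD
    b0, b1 = math.log(p1 / (1 - p1)), math.log(odds_ratio)
    # beta0*: intercept reproducing the marginal event rate mu under beta1 = 0
    mean_p = simpson(lambda x: pdf(x) * expit(b0 + b1 * x), lo, hi)
    b0_star = math.log(mean_p / (1 - mean_p))
    v1 = var_beta1(b0, b1, pdf, lo, hi)
    v0 = var_beta1(b0_star, 0.0, pdf, lo, hi)
    return v0, v1, b1


def power_two_sided(n, v0, v1, b1, alpha=0.05, a=1.0, r2_other=0.0):
    m1 = math.sqrt(n * (1 - r2_other) / v1) * b1      # Eq (14)
    s1 = math.sqrt((a * v0 + (1 - a) * v1) / v1)      # Eq (15)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    h1 = NormalDist(m1, s1)
    return h1.cdf(-z) + (1.0 - h1.cdf(z))


def required_n(target, v0, v1, b1, **kwargs):
    """Smallest N whose power reaches the target (simple linear search)."""
    n = 2
    while power_two_sided(n, v0, v1, b1, **kwargs) < target:
        n += 1
    return n


v0, v1, b1 = variances_normal_x(p1=0.5, odds_ratio=1.5)
n_corrected = required_n(0.95, v0, v1, b1, a=1.0)    # variance correction, a = 1 for normal X
n_uncorrected = required_n(0.95, v0, v1, b1, a=0.0)  # no correction: s1 = 1
print(n_corrected, n_uncorrected)
```

v0 and v1 do not depend on N, so the sketch computes them once, outside the search loop.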

Hsieh et al. procedure: The procedures proposed in Hsieh et al. (1998) are used. For continuous, normally distributed covariates, the sample size formula is [Eqn (1) in Hsieh et al. (1998)]:

$$N = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{p_1(1-p_1)\hat\beta^2}$$

where β̂ = log([p1/(1 − p1)]/[p2/(1 − p2)]) is the tested effect size, and p1, p2 are the event rates at the mean of X and one SD above the mean, respectively.
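Eqn (1) can be evaluated directly. The sketch below assumes a two-sided test and rounds up to the next integer; `hsieh_n_normal` is a hypothetical helper name. With the inputs of the first example in Section 29.4 (p1 = 0.5, p2 = 0.6, α = 0.05, power 0.95), it returns 317, the value reported there.

```python
import math
from statistics import NormalDist


def hsieh_n_normal(p1, p2, alpha=0.05, power=0.95):
    """Eqn (1) of Hsieh et al. (1998) as given above (two-sided alpha), rounded up."""
    z = NormalDist().inv_cdf
    beta_hat = math.log((p1 / (1 - p1)) / (p2 / (1 - p2)))   # squared below, so the sign is irrelevant
    n = (z(1 - alpha / 2) + z(power)) ** 2 / (p1 * (1 - p1) * beta_hat ** 2)
    return math.ceil(n)


print(hsieh_n_normal(0.5, 0.6))  # 317
```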

For binary covariates the sample size formula is [Eqn (2) in Hsieh et al. (1998)]:

$$N = \frac{\left[z_{1-\alpha}\sqrt{pq/B} + z_{1-\beta}\sqrt{p_1q_1 + p_2q_2(1-B)/B}\right]^2}{(p_1-p_2)^2(1-B)}$$

with qi := 1 − pi, q := (1 − p), where p = (1 − B)p1 + B p2 is the overall event rate, B is the proportion of the sample with X = 1, and p1, p2 are the event rates at X = 0 and X = 1 respectively.
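Eqn (2) can be evaluated in the same way. This sketch uses $z_{1-\alpha/2}$ in place of $z_{1-\alpha}$ for a two-sided test. That is our reading of the formula, and it is what reproduces the binary-covariate example in Section 29.4: p1 = 0.05, p2 = 0.1, B = 0.5, two-sided α = 0.05 and power 0.95 give N = 1437. The helper name is hypothetical.

```python
import math
from statistics import NormalDist


def hsieh_n_binary(p1, p2, b, alpha=0.05, power=0.95, two_sided=True):
    """Eqn (2) of Hsieh et al. (1998); b is the proportion of the sample with X = 1."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    p = (1 - b) * p1 + b * p2          # overall event rate
    q, q1, q2 = 1 - p, 1 - p1, 1 - p2
    num = (z_alpha * math.sqrt(p * q / b)
           + z(power) * math.sqrt(p1 * q1 + p2 * q2 * (1 - b) / b)) ** 2
    return math.ceil(num / ((p1 - p2) ** 2 * (1 - b)))


print(hsieh_n_binary(0.05, 0.10, 0.5))  # 1437
```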

29.7 Validation

To check the implementation of the procedure proposed by Hsieh et al. (1998), we replicated all examples presented in Tables I and II of Hsieh et al. (1998). We found a single deviation, for the first example in Table I on p. 1626: a sample size of 1281 instead of 1282, which is probably due to rounding errors. Further checks against the corresponding routine in PASS (Hintze, 2006) usually showed complete correspondence. For multiple logistic regression models with R2 other X > 0, however, our values deviated slightly from the PASS results. We believe our results are correct. There are some indications that these deviations arise because PASS internally rounds or truncates sample sizes to integer values.

To validate the procedures of Demidenko (2007) and Lyles et al. (2007), we ran Monte-Carlo simulations of Wald tests for a range of scenarios. Each scenario used 150000 independent cases. A number this large is necessary to get about 3 digits of precision. In our experience, the common practice of using only 5000, 2000 or even 1000 independent cases (Hsieh, 1989; Lyles et al., 2007; Shieh, 2001) may lead to rather imprecise and thus misleading power estimates.

Table (2) shows the errors in the power estimates for different procedures. The labels are:

- "Dem(c)" and "Dem": the procedure of Demidenko (2007) with and without variance correction, respectively.
- "LLW(W)" and "LLW(L)": the procedure of Lyles et al. (2007) for the Wald test and the likelihood ratio test, respectively.

All six predefined distributions were tested; their parameters are given in the table head. The following 6 combinations of Pr(Y=1|X=1) H0 and sample size were used: (0.5,200), (0.2,300), (0.1,400), (0.05,600), (0.02,1000). These values were fully crossed with four odds ratios (1.3, 1.5, 1.7, 2.0) and two alpha values (0.01, 0.05). Max and mean errors were calculated for all power values < 0.999.

The results show that the precision of the procedures depends on the X distribution. The procedure of Demidenko (2007), with the variance correction proposed here, predicted the simulated power values best.


[Figure 36 plot: power (1−β) as a function of Pr(Y=1|X=1) H1 (x axis from 0 to 0.4); curves: Simulation (50000 samples), Demidenko (2007) with var. corr., Demidenko (2007)]

Figure 36: Results of a simulation study investigating the power as a function of effect size for Logistic regression with covariate X ∼ Lognormal(0,1), p1 = 0.2, N = 100, α = 0.05, one-sided. The plot demonstrates that the method of Demidenko (2007), especially if combined with variance correction, predicts the power values in the simulation rather well. Note also that here the power does not increase monotonically with effect size (on the left side from Pr(Y = 1|X = 0) = p1 = 0.2) and that the maximum attainable power may be restricted to values clearly below one.

Max error

Procedure  Tails  bin(0.3)  exp(1)  lognorm(0,1)  norm(0,1)  poisson(1)  uni(0,1)
Dem(c)     1      0.0132    0.0279  0.0309        0.0125     0.0185      0.0103
Dem(c)     2      0.0125    0.0326  0.0340        0.0149     0.0199      0.0101
Dem        1      0.0140    0.0879  0.1273        0.0314     0.0472      0.0109
Dem        2      0.0145    0.0929  0.1414        0.0358     0.0554      0.0106
LLW(W)     1      0.0144    0.0878  0.1267        0.0346     0.0448      0.0315
LLW(W)     2      0.0145    0.0927  0.1407        0.0399     0.0541      0.0259
LLW(L)     1      0.0174    0.0790  0.0946        0.0142     0.0359      0.0283
LLW(L)     2      0.0197    0.0828  0.1483        0.0155     0.0424      0.0232

Mean error

Procedure  Tails  bin(0.3)  exp(1)  lognorm(0,1)  norm(0,1)  poisson(1)  uni(0,1)
Dem(c)     1      0.0045    0.0113  0.0120        0.0049     0.0072      0.0031
Dem(c)     2      0.0064    0.0137  0.0156        0.0061     0.0083      0.0039
Dem        1      0.0052    0.0258  0.0465        0.0069     0.0155      0.0035
Dem        2      0.0049    0.0265  0.0521        0.0080     0.0154      0.0045
LLW(W)     1      0.0052    0.0246  0.0469        0.0078     0.0131      0.0111
LLW(W)     2      0.0049    0.0253  0.0522        0.0092     0.0139      0.0070
LLW(L)     1      0.0074    0.0174  0.0126        0.0040     0.0100      0.0115
LLW(L)     2      0.0079    0.0238  0.0209        0.0047     0.0131      0.0073

Table 1: Results of simulation. Shown are the maximum and mean error in power for different procedures (see text).


30 Z test: Poisson Regression

A Poisson regression model describes the relationship between a Poisson distributed response variable Y (a count) and one or more independent variables (covariates or predictors) Xi. The Xi are themselves random variables with probability density fX(x).

The probability of y events during a fixed ‘exposure time’ t is:

$$\Pr(Y=y \mid \lambda, t) = \frac{e^{-\lambda t}(\lambda t)^y}{y!}$$

The parameter λ of the Poisson distribution is the mean incidence rate of an event during exposure time t. It is assumed to be a function of the Xi’s. In the Poisson regression model considered here, λ depends on the covariates Xi as follows:

$$\lambda = \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m)$$

where β0, . . . , βm are regression coefficients estimated from the data (Frome, 1986).
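The two formulas above can be sketched together: the rate λ is computed from the coefficients, and the count probabilities follow from it. The coefficient values are hypothetical: a base rate of 0.85, a rate ratio of 1.3 and a single covariate X1 = 1. The helper names are our own.

```python
import math


def poisson_rate(betas, xs):
    """lambda = exp(beta0 + beta1*x1 + ... + betam*xm); betas[0] is the intercept."""
    return math.exp(betas[0] + sum(b * x for b, x in zip(betas[1:], xs)))


def poisson_pmf(y, lam, t=1.0):
    """Pr(Y = y | lambda, t) = exp(-lambda t) (lambda t)^y / y!."""
    mu = lam * t
    return math.exp(-mu) * mu ** y / math.factorial(y)


lam = poisson_rate([math.log(0.85), math.log(1.3)], [1])  # hypothetical coefficients, X1 = 1
total = sum(poisson_pmf(y, lam) for y in range(50))
print(round(lam, 4))   # 1.105
print(total)           # close to 1.0
```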

In a simple Poisson regression with just one covariate X1, the procedure implemented in G * Power estimates the power of the Wald test, applied to decide whether covariate X1 has an influence on the event rate or not. That is, the null and alternative hypothesis for a two-sided test are:

$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0.$$

The standard normally distributed test statistic of the Wald test is:

$$z = \frac{\hat\beta_1}{\sqrt{\mathrm{var}(\hat\beta_1)/N}} = \frac{\hat\beta_1}{\mathrm{SE}(\hat\beta_1)}$$

Here β̂1 is the maximum likelihood estimator of parameter β1, and var(β̂1) is the variance of this estimate.

In a multiple Poisson regression model: λ = exp(β0 + β1 X1 + · · ·+ βm Xm), m > 1, the effect of a specific covariate in the presence of other covariates is tested. In this case the null and alternative hypotheses are:

$$H_0: [\beta_1, \beta_2, \ldots, \beta_m] = [0, \beta_2, \ldots, \beta_m] \qquad H_1: [\beta_1, \beta_2, \ldots, \beta_m] = [\beta_1^*, \beta_2, \ldots, \beta_m]$$

where $\beta_1^* > 0$.

30.1 Effect size index

The effect size is specified by the ratio R of λ under H1 to λ under H0:

$$R = \frac{\exp(\beta_0 + \beta_1 X_1)}{\exp(\beta_0)} = \exp(\beta_1 X_1).$$

The following additional inputs are needed:

• Exp(β1).

This is the value of the λ-ratio R defined above for X = 1, that is the relative increase of the event rate over the base event rate exp(β0) assumed under H0, if X is increased one unit. If, for instance, a 10% increase over

the base rate is assumed if X is increased by one unit, this value is set to (100+10)/100 = 1.1.

An input of exp(β1) = 1 corresponds to "no effect" and must not be used in a priori calculations.

• Base rate exp(β0).

This is the mean event rate assumed under H0. It must be greater than 0.

• Mean exposure.

This is the time unit during which the events are counted. It must be greater than 0.

• R2 other X.

In models with more than one covariate, a correction factor can take into account the influence of the other covariates X2, . . . , Xp on the power of the test. This factor depends on the proportion $R^2 = \rho^2_{1\cdot 23\ldots p}$ of the variance of X1 that is explained by the regression relationship with X2, . . . , Xp. If N is the sample size considering X1 alone, then the sample size in a setting with additional covariates is N′ = N/(1 − R²). This correction is identical to the one proposed by Hsieh et al. (1998) for logistic regression.

Because R² is interpreted as a squared correlation, it must lie in the interval [0, 1].

• X distribution:

Distribution of the Xi. There are 7 options:

1. Binomial [$P(k) = \binom{N}{k}\pi^k(1-\pi)^{N-k}$, where k is the number of successes (X = 1) in N trials of a Bernoulli process with probability of success π, 0 < π < 1]

2. Exponential [$f(x) = (1/\lambda)e^{-x/\lambda}$ for x ≥ 0, exponential distribution with parameter λ > 0]

3. Lognormal [$f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\exp[-(\ln x-\mu)^2/(2\sigma^2)]$, lognormal distribution with parameters µ and σ > 0]

4. Normal [$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp[-(x-\mu)^2/(2\sigma^2)]$, normal distribution with parameters µ and σ > 0]

5. Poisson [$P(X=k) = (\lambda^k/k!)e^{-\lambda}$, Poisson distribution with parameter λ > 0]

6. Uniform ( f (x) = 1/(b − a) for a ≤ x ≤ b, f (x) = 0 otherwise, continuous uniform distribution in the interval [a, b], a < b)

7. Manual (the variance of β̂ under H0 and H1 is specified manually)
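The Exp(β1) input described in the list above is a relative rate. This sketch converts an assumed percentage increase per unit of X into Exp(β1) and β1. It also illustrates that R = exp(β1 X1) compounds over several units of X. The 10% figure repeats the example from the text; the helper name is hypothetical.

```python
import math


def exp_beta1_from_percent(increase_percent):
    """Relative rate R for a one-unit increase in X, e.g. 10 % -> 1.1."""
    return (100 + increase_percent) / 100


r = exp_beta1_from_percent(10)
beta1 = math.log(r)
# R = exp(beta1 * X1): a 2-unit increase multiplies the base rate by r**2
print(r, round(math.exp(beta1 * 2), 4))  # 1.1 1.21
```

An input of 0% gives Exp(β1) = 1 and β1 = 0, the "no effect" case that must not be used in a priori calculations.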

G * Power provides two different procedures to calculate power: an enumeration procedure and a large sample approximation. The manual mode is only available with the large sample procedure.


30.2 Options

G * Power provides two types of procedure to estimate power. One is an "enumeration procedure" proposed by Lyles et al. (2007); the other consists of large sample approximations.

- The enumeration procedure seems to provide reasonably accurate results over a wide range of situations. However, it can be rather slow and may need large amounts of memory.
- The large sample approximations are much faster. Monte-Carlo simulations indicate that for N > 200, the procedure based on the work of Demidenko (2007) is about as accurate as the enumeration procedure. The errors of the procedure proposed by Signorini (1991), by contrast, can be quite large.

We therefore recommend the procedure based on Demidenko (2007) as the standard procedure. The enumeration procedure of Lyles et al. (2007) may be used for small sample sizes. It may also be used to validate the results of the large sample procedure in an a priori analysis, if the sample size is not too large. It must be used to compute the power of likelihood ratio tests. The procedure of Signorini (1991) is problematic and should not be used. It is included only to allow checks of published results that refer to this widespread procedure.

1. The enumeration procedure provides power analyses for the Wald test and the likelihood ratio test. The general idea is to construct an exemplary data set with weights that represent response probabilities, given the assumed values of the parameters of the X distribution. A fit procedure for the generalized linear model is then applied to this data set. For Wald tests it estimates the variance of the regression weights; for likelihood ratio tests it estimates the likelihood ratio under H0 and H1.

The size of the exemplary data set increases with N. The enumeration procedure may therefore be rather slow for large sample sizes and may need large amounts of computer memory. It is especially slow for analysis types other than "post hoc", because these call the power routine several times internally. By specifying a threshold sample size N, you can restrict the enumeration procedure to sample sizes < N. For sample sizes ≥ N, the large sample approximation selected in the option dialog is used. Note: If a computation takes too long, you can abort it by pressing the ESC key.

2. G * Power provides two large sample approximations for a Wald-type test. Both rely on the asymptotic normal distribution of the maximum likelihood estimator β̂.

The accuracy of these approximations increases with sample size. For small and moderate sample sizes, however, the deviation from the true power may be quite noticeable. This is especially true for X distributions that are not symmetric about the mean: the lognormal, exponential and Poisson distributions, and the binomial distribution with π ≠ 1/2.

The procedure proposed by Signorini (1991), and its variants (Shieh, 2001, 2005), use the "null variance formula". This formula is not correct for the test statistic assumed here and used in existing software (Demidenko, 2007, 2008). The other procedure, based on Demidenko's (2007) work on logistic regression, is usually more accurate. For this test, you can select a variance correction option that compensates for variance distortions in skewed X distributions (see implementation notes).

30.3 Examples

We replicate the example given on page 449 of Signorini (1991). The test concerns the number of infections Y of swimmers (X = 1) vs. non-swimmers (X = 0) during a swimming season (exposure time = 1). The infection rate is modeled as a Poisson distributed random variable. X is assumed to be binomially distributed with π = 0.5, so equal numbers of swimmers and non-swimmers are sampled. The base rate, that is, the infection rate in non-swimmers, is estimated to be 0.85. The significance level is α = 0.05. We want the sample size needed to detect an increase in infection rate of 30% or more with a power of 0.95. A 30% increase implies a relative rate of 1.3 ([100%+30%]/100%).

We first use the procedure of Signorini (1991):

• Select
  Statistical test: Regression: Poisson Regression
  Type of power analysis: A priori

• Input
  Tail(s): One
  Exp(β1): 1.3
  α err prob: 0.05
  Power (1-β err prob): 0.95
  Base rate exp(β0): 0.85
  Mean exposure: 1.0
  R2 other X: 0
  X distribution: Binomial
  X parm π: 0.5

• Output
  Critical z: 1.644854
  Total sample size: 697
  Actual power: 0.950121

The result N = 697 replicates the value given in Signorini (1991). The other procedures give:

- N = 649 with Demidenko (2007) with variance correction, and with Lyles et al. (2007) for likelihood ratio tests.
- N = 655 with Demidenko (2007) without variance correction, and with Lyles et al. (2007) for Wald tests.

We ran Monte-Carlo simulations of the Wald test for this scenario with 150000 independent cases per test. The mean power was 0.94997, 0.95183 and 0.96207 for sample sizes of 649, 655 and 697, respectively. These simulation results suggest that, in this specific case, the procedure based on Demidenko (2007) with variance correction gives the most accurate estimate.
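The Demidenko-based numbers quoted above can be approximated with a short sketch for a binary (Bernoulli) covariate, where the Fisher information has a closed form. The sketch assumes that the Poisson version mirrors the logistic construction of Eqs (14) and (15):

- v1 is computed from the H1 parameters.
- v0 is computed at β1 = 0, with an intercept chosen to preserve the marginal mean rate.
- a = 1 in the variance correction.

These are assumptions of this sketch, not a description of G * Power's code, and the function names are hypothetical. Under these assumptions, the one-sided search should return 655 without and 649 with the correction.

```python
import math
from statistics import NormalDist


def var_beta1_bernoulli(b0, b1, pi, t=1.0):
    """Per-observation variance of beta1_hat, Poisson regression, X ~ Bernoulli(pi).

    Information per observation: I00 = t*E[exp(b0 + b1 X)] and
    I01 = I11 = t*E[X exp(b0 + b1 X)] (X is 0/1, so X^2 = X).
    """
    i00 = t * ((1 - pi) * math.exp(b0) + pi * math.exp(b0 + b1))
    i11 = t * pi * math.exp(b0 + b1)
    return i00 / (i00 * i11 - i11 ** 2)


def poisson_required_n(base_rate, rate_ratio, pi, alpha=0.05, power=0.95, a=1.0, t=1.0):
    """Smallest N reaching the target one-sided power (assumed Demidenko-style construction)."""
    b0, b1 = math.log(base_rate), math.log(rate_ratio)
    v1 = var_beta1_bernoulli(b0, b1, pi, t)
    # H0: beta1 = 0 with an intercept that keeps the marginal mean rate (assumption)
    mean_rate = (1 - pi) * math.exp(b0) + pi * math.exp(b0 + b1)
    v0 = var_beta1_bernoulli(math.log(mean_rate), 0.0, pi, t)
    s1 = math.sqrt((a * v0 + (1 - a) * v1) / v1)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    n = 2
    while 1 - NormalDist(math.sqrt(n / v1) * b1, s1).cdf(z_crit) < power:
        n += 1
    return n


print(poisson_required_n(0.85, 1.3, 0.5, a=0.0), poisson_required_n(0.85, 1.3, 0.5, a=1.0))
```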

We now assume that there are additional covariates, and that the squared multiple correlation with these other covariates is R² = 0.1. All other conditions are identical, so the only change needed is to set the input field R2 other X to 0.1. With the procedure of Signorini (1991), the necessary sample size then increases to 774.

Comparison between procedures: To compare the accuracy of the Wald test procedures, we replicated a number of test cases presented in Table III of Lyles et al. (2007). We also ran several additional tests for X distributions not considered in Lyles et al. (2007). All cases assume a two-sided test with N = 200, β0 = 0.5 and α = 0.05.

If we use the procedure of Demidenko (2007), then the complete G*Power input and output for the test in the first row of table (2) below would be:

• Select
  Statistical test: Regression: Poisson Regression
  Type of power analysis: Post hoc

• Input
  Tail(s): Two
  Exp(β1): =exp(-0.1)
  α err prob: 0.05
  Total sample size: 200
  Base rate exp(β0): =exp(0.5)
  Mean exposure: 1.0
  R2 other X: 0
  X distribution: Normal
  X parm µ: 0
  X parm σ: 1

• Output
  Critical z: -1.959964
  Power (1-β err prob): 0.444593

When using the enumeration procedure, the input is exactly the same, but in this case the test is based on the χ2 distribution, which leads to a different output.

• Output
  Noncentrality parameter λ: 3.254068
  Critical χ2: 3.841459
  Df: 1
  Power (1-β err prob): 0.438076

The rows in table (2) show power values for different test scenarios. The column "Dist. X" indicates the distribution of the predictor X and the parameters used, the column "β1" contains the values of the regression weight β1 under H1, "Sim LLW" contains the simulation results reported by Lyles et al. (2007) for the test (if available), and "Sim" contains the results of a simulation done by us with considerably more cases (150000 instead of 2000). The following columns contain the results of different procedures: "LLW" = Lyles et al. (2007), "Demi" = Demidenko (2007), "Demi(c)" = Demidenko (2007) with variance correction, and "Signorini" = Signorini (1991).

30.4 Related tests

• Logistic regression

30.5 Implementation notes

30.5.1 Enumeration procedure

The procedures for the Wald- and Likelihood ratio tests are implemented exactly as described in Lyles et al. (2007).

30.5.2 Large sample approximations

The large sample procedures for the univariate case use the general approach of Signorini (1991), which is based on the approximate method for logistic regression described in Whittemore (1981). These approximations become more accurate as the sample size increases. The correction for additional covariates was proposed by Hsieh et al. (1998).

The H0 distribution is the standard normal distribution N(0, 1), the H1 distribution the normal distribution N(m1, s1) with:

$$m_1 = \beta_1 \sqrt{N(1 - R^2) \cdot t / v_0} \tag{16}$$

$$s_1 = \sqrt{v_1 / v_0} \tag{17}$$

in the Signorini (1991) procedure, and

$$m_1 = \beta_1 \sqrt{N(1 - R^2) \cdot t / v_1} \tag{18}$$

$$s_1 = s^{*} \tag{19}$$

in the procedure based on Demidenko (2007). In these equations t denotes the mean exposure time, N the sample size, R2 the squared multiple correlation coefficient of the covariate of interest on the other covariates, and v0 and v1 the variance of β̂1 under H0 and H1, respectively. Without variance correction s∗ = 1; with variance correction

$$s^{*} = \sqrt{\frac{a\,v_0^{*} + (1 - a)\,v_1}{v_1}},$$

where v0∗ is the variance under H0 for β0∗ = log(µ), with $\mu = \int f_X(x) \exp(\beta_0 + \beta_1 x)\,dx$. For the lognormal distribution a = 0.75; in all other cases a = 1. With variance correction, the value of s∗ is often close to 1, but deviates from 1 for X distributions that are not symmetric about the mean. Simulations showed that this compensates for the variance inflation or deflation in the distribution of β̂1 under H1 that occurs with such distributions in finite samples.
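The two large-sample procedures differ only in the mean and standard deviation of the H1 distribution. The sketch below assumes a normally distributed predictor X ~ N(µ, σ), for which the variance of β̂1 per unit N·t has the closed form exp(−[β0 + β1µ + (β1σ)²/2])/σ². It computes two-sided power for both procedures, including the distant tail. The method labels and function names are hypothetical and chosen for illustration; G*Power's own implementation may differ in numerical detail.

```python
from math import exp, sqrt
from statistics import NormalDist


def var_beta1(beta0, beta, mu, sigma):
    """Var(beta1-hat) per unit N * t for X ~ N(mu, sigma), evaluated at
    intercept beta0 and slope beta (normal-X closed form)."""
    return exp(-(beta0 + beta * mu + (beta * sigma) ** 2 / 2)) / sigma ** 2


def two_sided_power(m1, s1, alpha):
    """Pr(|W| > z_{1-alpha/2}) for W ~ N(m1, s1); both tails are counted."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    h1 = NormalDist(m1, s1)
    return h1.cdf(-z) + (1 - h1.cdf(z))


def power_normal_x(n, beta0, beta1, mu=0.0, sigma=1.0, alpha=0.05,
                   t=1.0, r2=0.0, method="demidenko_c", a=1.0):
    v0 = var_beta1(beta0, 0.0, mu, sigma)     # H0: beta1 = 0, same beta0
    v1 = var_beta1(beta0, beta1, mu, sigma)   # H1
    if method == "signorini":
        m1 = beta1 * sqrt(n * (1 - r2) * t / v0)   # eq. (16)
        s1 = sqrt(v1 / v0)                         # eq. (17)
    else:
        m1 = beta1 * sqrt(n * (1 - r2) * t / v1)   # eq. (18)
        s1 = 1.0                                   # eq. (19), no correction
        if method == "demidenko_c":
            # beta0* = log(mu_rate) with mu_rate = E[exp(beta0 + beta1 X)]
            beta0_star = beta0 + beta1 * mu + (beta1 * sigma) ** 2 / 2
            v0_star = var_beta1(beta0_star, 0.0, mu, sigma)
            s1 = sqrt((a * v0_star + (1 - a) * v1) / v1)
    return two_sided_power(m1, s1, alpha)


for method in ("signorini", "demidenko", "demidenko_c"):
    print(method, round(power_normal_x(200, 0.5, -0.1, method=method), 6))
```

For the first row of table (2) (N = 200, β0 = 0.5, β1 = −0.1, X ~ N(0, 1)) the Demidenko variant gives about 0.4446, matching the post hoc output 0.444593 shown earlier, and the Signorini variant gives about 0.443. For a normal predictor, choosing β0∗ = log(µ) makes v0∗ equal to v1, so the correction leaves s∗ = 1; this is why the Demi and Demi(c) columns agree for N(0,1).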

The large sample approximations use the result that the (m + 1) maximum likelihood estimators β0, β1, . . . , βm are asymptotically (multivariate) normally distributed, where the variance-covariance matrix is given by the inverse of the (m + 1) × (m + 1) Fisher information matrix I. The (i, j)th element of I is given by

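$$I_{ij} = -E\left[\frac{\partial^2 \log L}{\partial \beta_i\, \partial \beta_j}\right] = N\,E\left[X_i X_j \exp(\beta_0 + \beta_1 X_1 + \dots + \beta_m X_m)\right],$$

with X0 ≡ 1.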

Thus, in the case of one continuous predictor, I is a 2 × 2 matrix with elements:

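$$I_{00} = \int_{-\infty}^{\infty} f(x) \exp(\beta_0 + \beta_1 x)\,dx$$

$$I_{10} = I_{01} = \int_{-\infty}^{\infty} f(x)\, x \exp(\beta_0 + \beta_1 x)\,dx$$

$$I_{11} = \int_{-\infty}^{\infty} f(x)\, x^2 \exp(\beta_0 + \beta_1 x)\,dx$$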

where f(x) is the PDF of the X distribution (for discrete predictors the integrals must be replaced by corresponding sums). The element M11 of the inverse of this matrix (M = I⁻¹), that is, the variance of β1, is given by $M_{11} = \mathrm{Var}(\beta_1) = I_{00}/(I_{00} I_{11} - I_{01}^2)$. In G*Power, numerical integration is used to compute these integrals.
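The Fisher-information route can be checked numerically. This sketch evaluates I00, I01, and I11 (per observation, unit exposure) with composite Simpson's rule for X ~ N(µ, σ) and compares M11 with the closed form exp(−[β0 + β1µ + (β1σ)²/2])/σ². Simpson's rule and the ±12σ integration range are our own choices; the manual does not say which quadrature G*Power uses.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def var_beta1_numeric(beta0, beta1, mu=0.0, sigma=1.0, width=12.0):
    """M11 = I00 / (I00 I11 - I01^2), with the I's integrated numerically
    over mu +/- width * sigma (per observation, unit exposure)."""
    lo, hi = mu - width * sigma, mu + width * sigma

    def w(x):
        return normal_pdf(x, mu, sigma) * exp(beta0 + beta1 * x)

    i00 = simpson(w, lo, hi)
    i01 = simpson(lambda x: x * w(x), lo, hi)
    i11 = simpson(lambda x: x * x * w(x), lo, hi)
    return i00 / (i00 * i11 - i01 ** 2)


def var_beta1_closed(beta0, beta1, mu=0.0, sigma=1.0):
    """Normal-X closed form exp(-[beta0 + beta1 mu + (beta1 sigma)^2/2]) / sigma^2."""
    return exp(-(beta0 + beta1 * mu + (beta1 * sigma) ** 2 / 2)) / sigma ** 2


for mu, sigma in ((0.0, 1.0), (1.0, 2.0)):
    print(mu, sigma, var_beta1_numeric(0.5, -0.1, mu, sigma),
          var_beta1_closed(0.5, -0.1, mu, sigma))
```

Including a case with σ ≠ 1 checks the 1/σ² factor of the closed form, which is invisible when σ = 1.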

To estimate the variance of β̂1 under H1, the parameters β0 and β1 in the equations for Iij are chosen as specified in the input. To estimate the variance under H0, one chooses β1 = 0 and β0 = β0∗. In the procedure proposed by Signorini (1991), β0∗ = β0. In the procedure with variance correction, β0∗ is chosen as defined above (β0∗ = log(µ)).


| Dist. X    | β1    | Sim LLW | Sim   | LLW   | Demi  | Demi(c) | Signorini |
|------------|-------|---------|-------|-------|-------|---------|-----------|
| N(0,1)     | -0.10 | 0.449   | 0.440 | 0.438 | 0.445 | 0.445   | 0.443     |
| N(0,1)     | -0.15 | 0.796   | 0.777 | 0.774 | 0.782 | 0.782   | 0.779     |
| N(0,1)     | -0.20 | 0.950   | 0.952 | 0.953 | 0.956 | 0.956   | 0.954     |
| Unif(0,1)  | -0.20 | 0.167   | 0.167 | 0.169 | 0.169 | 0.169   | 0.195     |
| Unif(0,1)  | -0.40 | 0.478   | 0.472 | 0.474 | 0.475 | 0.474   | 0.549     |
| Unif(0,1)  | -0.80 | 0.923   | 0.928 | 0.928 | 0.928 | 0.932   | 0.966     |
| Logn(0,1)  | -0.05 | 0.275   | 0.298 | 0.305 | 0.320 | 0.291   | 0.501     |
| Logn(0,1)  | -0.10 | 0.750   | 0.748 | 0.690 | 0.695 | 0.746   | 0.892     |
| Logn(0,1)  | -0.15 | 0.947   | 0.947 | 0.890 | 0.890 | 0.955   | 0.996     |
| Poiss(0.5) | -0.20 | -       | 0.614 | 0.599 | 0.603 | 0.613   | 0.701     |
| Poiss(0.5) | -0.40 | -       | 0.986 | 0.971 | 0.972 | 0.990   | 0.992     |
| Bin(0.2)   | -0.20 | -       | 0.254 | 0.268 | 0.268 | 0.254   | 0.321     |
| Bin(0.2)   | -0.40 | -       | 0.723 | 0.692 | 0.692 | 0.716   | 0.788     |
| Exp(3)     | -0.40 | -       | 0.524 | 0.511 | 0.518 | 0.521   | 0.649     |
| Exp(3)     | -0.70 | -       | 0.905 | 0.868 | 0.871 | 0.919   | 0.952     |

Table 2: Power for different test scenarios as estimated with different procedures (see text)

The integrals given above happen to be proportional to derivatives of the moment generating function g(t) := E(e^{tX}) of the X distribution, which is often known in closed form: I00 = exp(β0) g(β1), I01 = exp(β0) g′(β1), and I11 = exp(β0) g″(β1). Given g and its derivatives, the variance of β is calculated as:

$$\mathrm{Var}(\beta) = \frac{g(\beta)}{g(\beta)\,g''(\beta) - g'(\beta)^2} \cdot \frac{1}{\exp(\beta_0)}$$

With this definition, the variance of β̂1 under H0 is $v_0 = \lim_{\beta \to 0} \mathrm{Var}(\beta)$ and the variance of β̂1 under H1 is $v_1 = \mathrm{Var}(\beta_1)$. The moment generating function for the lognormal distribution does not exist. For the other five predefined distributions this leads to:

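1. Binomial distribution with parameter π

$$g(t) = (1 - \pi + \pi e^{t})^{n}, \qquad n = 1 \text{ for a binary predictor}$$

$$v_0 \exp(\beta_0) = \frac{1}{\pi(1 - \pi)}, \qquad v_1 \exp(\beta_0) = \frac{1}{1 - \pi} + \frac{1}{\pi \exp(\beta_1)}$$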

2. Exponential distribution with parameter λ

$$g(t) = (1 - t/\lambda)^{-1}$$

$$v_0 \exp(\beta_0) = \lambda^2, \qquad v_1 \exp(\beta_0) = (\lambda - \beta_1)^3/\lambda$$

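3. Normal distribution with parameters (µ, σ)

$$g(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)$$

$$v_0 \exp(\beta_0) = 1/\sigma^2, \qquad v_1 \exp(\beta_0) = \exp\left(-\left[\beta_1 \mu + (\beta_1 \sigma)^2/2\right]\right)/\sigma^2$$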

4. Poisson distribution with parameter λ

$$g(t) = \exp\left(\lambda(e^{t} - 1)\right)$$

$$v_0 \exp(\beta_0) = 1/\lambda, \qquad v_1 \exp(\beta_0) = \exp\left(-\beta_1 + \lambda - \exp(\beta_1)\lambda\right)/\lambda$$

5. Uniform distribution with parameters (u, v) corresponding to the interval borders.

$$g(t) = \frac{e^{tv} - e^{tu}}{t(v - u)}$$

$$v_0 \exp(\beta_0) = 12/(u - v)^2, \qquad v_1 \exp(\beta_0) = \frac{\beta_1^3 (h_u - h_v)(u - v)}{h_u^2 + h_v^2 - h_{u+v}\left(2 + \beta_1^2 (u - v)^2\right)}, \qquad h_x = \exp(\beta_1 x)$$
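The sketch below codes the closed forms for the five distributions and cross-checks each against the general expression Var(β) exp(β0) = g(β)/(g(β)g″(β) − g′(β)²), with g′ and g″ replaced by central finite differences (step h = 10⁻³). The binomial case uses a single Bernoulli trial (n = 1); the distribution parameters and β1 values are arbitrary test choices, and the helper names are our own.

```python
from math import exp


def var_from_mgf(g, beta, h=1e-3):
    """Var(beta) * exp(beta0) = g / (g g'' - g'^2), with g' and g''
    approximated by central finite differences of step h."""
    g0, gp, gm = g(beta), g(beta + h), g(beta - h)
    d1 = (gp - gm) / (2 * h)
    d2 = (gp - 2 * g0 + gm) / h ** 2
    return g0 / (g0 * d2 - d1 ** 2)


# Each factory returns (mgf, v0*exp(beta0), v1*exp(beta0) as a function of beta1).
def binomial(p):
    return (lambda t: 1 - p + p * exp(t),
            1 / (p * (1 - p)),
            lambda b: 1 / (1 - p) + 1 / (p * exp(b)))


def exponential(lam):
    return (lambda t: lam / (lam - t),
            lam ** 2,
            lambda b: (lam - b) ** 3 / lam)


def normal(mu, sigma):
    return (lambda t: exp(mu * t + (sigma * t) ** 2 / 2),
            1 / sigma ** 2,
            lambda b: exp(-(b * mu + (b * sigma) ** 2 / 2)) / sigma ** 2)


def poisson(lam):
    return (lambda t: exp(lam * (exp(t) - 1)),
            1 / lam,
            lambda b: exp(-b + lam - exp(b) * lam) / lam)


def uniform(u, v):
    def g(t):
        # the MGF has a removable singularity at t = 0, where g(0) = 1
        return 1.0 if t == 0 else (exp(t * v) - exp(t * u)) / (t * (v - u))

    def v1(b):
        hu, hv, huv = exp(b * u), exp(b * v), exp(b * (u + v))
        return (b ** 3 * (hu - hv) * (u - v)
                / (hu ** 2 + hv ** 2 - huv * (2 + b ** 2 * (u - v) ** 2)))

    return g, 12 / (u - v) ** 2, v1


cases = [
    ("Bin(0.5)", binomial(0.5), 0.26),
    ("Exp(3)", exponential(3.0), -0.4),
    ("N(1,2)", normal(1.0, 2.0), -0.3),
    ("Poiss(0.5)", poisson(0.5), -0.2),
    ("Unif(0,1)", uniform(0.0, 1.0), -0.4),
]
for label, (g, v0, v1), b1 in cases:
    print(f"{label:10s} v0: {v0:.6f} vs {var_from_mgf(g, 0.0):.6f}   "
          f"v1: {v1(b1):.6f} vs {var_from_mgf(g, b1):.6f}")
```

A useful sanity check on any closed form is that v1 must reduce to v0 as β1 → 0.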

In the manual mode you have to calculate the variances v0 and v1 yourself. To illustrate the necessary steps, we show the calculation for the exponential distribution (given above) in more detail:

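$$g(\beta) = (1 - \beta/\lambda)^{-1} = \frac{\lambda}{\lambda - \beta}$$

$$g'(\beta) = \frac{\lambda}{(\lambda - \beta)^2}$$

$$g''(\beta) = \frac{2\lambda}{(\lambda - \beta)^3}$$

$$\mathrm{Var}(\beta)\exp(\beta_0) = \frac{g(\beta)}{g(\beta)\,g''(\beta) - g'(\beta)^2} = \frac{(\lambda - \beta)^3}{\lambda}$$

$$v_0 \exp(\beta_0) = \mathrm{Var}(0)\exp(\beta_0) = \lambda^2$$

$$v_1 \exp(\beta_0) = \mathrm{Var}(\beta_1)\exp(\beta_0) = (\lambda - \beta_1)^3/\lambda$$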
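For manual mode, the derived expressions turn directly into numbers. The sketch below computes v0 = λ²/exp(β0) and v1 = (λ − β1)³/(λ exp(β0)) for the Exp(3) scenarios of table (2) (N = 200, β0 = 0.5, two-sided α = 0.05) and feeds them into the Signorini (1991) formulas, eqs. (16) and (17). This section does not say how the manual-mode input fields expect v0 and v1 to be scaled, so treat the printed variances as illustrations of the definitions rather than as field inputs; the function names are our own.

```python
from math import exp, sqrt
from statistics import NormalDist


def exp_variances(lam, beta0, beta1):
    """v0 = Var(0) and v1 = Var(beta1) for a predictor X ~ Exp(rate lam)."""
    v0 = lam ** 2 / exp(beta0)
    v1 = (lam - beta1) ** 3 / lam / exp(beta0)
    return v0, v1


def signorini_power_two_sided(n, beta1, v0, v1, alpha=0.05, t=1.0, r2=0.0):
    m1 = beta1 * sqrt(n * (1 - r2) * t / v0)   # eq. (16)
    s1 = sqrt(v1 / v0)                         # eq. (17)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    h1 = NormalDist(m1, s1)
    return h1.cdf(-z) + (1 - h1.cdf(z))        # both tails


results = {}
for beta1 in (-0.4, -0.7):
    v0, v1 = exp_variances(3.0, 0.5, beta1)
    results[beta1] = signorini_power_two_sided(200, beta1, v0, v1)
    print(f"beta1 = {beta1}: v0 = {v0:.5f}, v1 = {v1:.5f}, "
          f"power = {results[beta1]:.3f}")
```

The resulting powers, about 0.649 and 0.952, agree with the Signorini column of table (2).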

Note, however, that sensitivity analyses, in which the effect size is estimated, are not possible in the manual mode, because in this case the argument β of the function Var(β) is not constant under H1 but is the target of the search.

30.6 Validation

The procedure proposed by Signorini (1991) was checked against the corresponding routine in PASS (Hintze, 2006), and we found perfect agreement in one-sided tests. (The results of PASS 06 are wrong for normally distributed X with σ ≠ 1; this error has been corrected in the newest version, PASS 2008.) In two-sided tests we found small deviations from the results calculated with PASS. The reason for these deviations is that the (usually small) contribution to the power from the distant tail is ignored in PASS but not in G*Power. Given the generally low accuracy of the Signorini (1991) procedure, these small deviations are of no practical consequence.

The results of the other two procedures were compared to the results of Monte Carlo simulations and with each other. The results given in table (2) are quite representative of these more extensive tests. They indicate that the computed power values may deviate from the true values by up to ±0.05.


31 Z test: Tetrachoric Correlation

31.0.1 Background

In the model underlying the tetrachoric correlation, it is assumed that the frequency data in a 2 × 2 table stem from dichotomizing two continuous random variables X and Y that are bivariate normally distributed with mean m = (0, 0) and covariance matrix

$$\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.$$

The value ρ in this (normalized) covariance matrix is the tetrachoric correlation coefficient. The frequency data in the table depend on the criterion used to dichotomize the marginal distributions of X and Y and on the value of the tetrachoric correlation coefficient.

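From a theoretical perspective, a specific scenario is completely characterized by a 2 × 2 probability matrix:

$$\begin{array}{c|cc|c} & X = 0 & X = 1 & \\ \hline Y = 0 & p_{11} & p_{12} & p_{1*} \\ Y = 1 & p_{21} & p_{22} & p_{2*} \\ \hline & p_{*1} & p_{*2} & 1 \end{array}$$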

where the marginal probabilities are regarded as fixed. The whole matrix is completely specified if the marginal probabilities p∗2 = Pr(X = 1), p2∗ = Pr(Y = 1) and the table probability p11 are given. If zx and zy denote the quantiles of the standard normal distribution corresponding to the marginal probabilities p∗1 and p1∗, that is, Φ(zx) = p∗1 and Φ(zy) = p1∗, then p11 is the CDF of the bivariate normal distribution described above, evaluated at the upper limits zx and zy:

$$p_{11} = \int_{-\infty}^{z_y} \int_{-\infty}^{z_x} \varphi(x, y, \rho)\,dx\,dy$$

where φ(x, y, ρ) denotes the density of the bivariate normal distribution. The upper limits zx and zy are thus the values at which the variables X and Y are dichotomized.
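Numerically, p11 is a bivariate normal CDF value, and the other three cells follow from the fixed marginals. The sketch below evaluates it through the identity Φ2(h, k; ρ) = Φ(h)Φ(k) + ∫₀^ρ φ2(h, k; r) dr, which holds because the derivative of the bivariate normal CDF with respect to ρ equals the bivariate density. The Simpson quadrature and the function names are our own choices, not G*Power's routine; px and py stand for the marginal probabilities Pr(X = 1) and Pr(Y = 1).

```python
from math import exp, pi, sqrt
from statistics import NormalDist

N01 = NormalDist()  # standard normal


def bvn_density(h, k, r):
    """Standard bivariate normal density with correlation r."""
    q = (h * h - 2 * r * h * k + k * k) / (2 * (1 - r * r))
    return exp(-q) / (2 * pi * sqrt(1 - r * r))


def bvn_cdf(h, k, rho, n=200):
    """Pr(X <= h, Y <= k) for |rho| < 1, integrating the density over the
    correlation from 0 to rho with Simpson's rule (n even)."""
    step = rho / n
    s = bvn_density(h, k, 0.0) + bvn_density(h, k, rho)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * bvn_density(h, k, i * step)
    return N01.cdf(h) * N01.cdf(k) + s * step / 3


def cell_probabilities(rho, px, py):
    """2 x 2 matrix [[p11, p12], [p21, p22]]; rows Y = 0/1, columns X = 0/1."""
    zx = N01.inv_cdf(1 - px)   # Phi(zx) = p*1 = Pr(X = 0)
    zy = N01.inv_cdf(1 - py)   # Phi(zy) = p1* = Pr(Y = 0)
    p11 = bvn_cdf(zx, zy, rho)
    p12 = (1 - py) - p11       # row Y = 0 must sum to p1*
    p21 = (1 - px) - p11       # column X = 0 must sum to p*1
    p22 = 1 - p11 - p12 - p21
    return [[p11, p12], [p21, p22]]


table = cell_probabilities(0.334, 0.6019313, 0.5815451)
for row in table:
    print("  ".join(f"{p:.4f}" for p in row))
```

With ρ = 0 the routine reduces to independence, p11 = (1 − px)(1 − py), which is a quick check on the sign conventions for zx and zy.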

Observed frequency data are assumed to be random samples from this theoretical distribution. Thus, it is assumed that random vectors (xi, yi) are drawn from the bivariate normal distribution described above and afterwards assigned to one of the four cells according to the ‘column’ criterion xi ≤ zx vs. xi > zx and the ‘row’ criterion yi ≤ zy vs. yi > zy. Given a frequency table:

$$\begin{array}{c|cc|c} & X = 0 & X = 1 & \\ \hline Y = 0 & a & b & a + b \\ Y = 1 & c & d & c + d \\ \hline & a + c & b + d & N \end{array}$$

the central task in testing specific hypotheses about tetrachoric correlation coefficients is to estimate the correlation coefficient and its standard error from these frequency data. Two approaches have been proposed. One is to estimate the exact correlation coefficient (e.g., Brown & Benedetti, 1977), the other to use simple approximations ρ∗ of ρ that are easier to compute (e.g., Bonett & Price, 2005). G*Power provides power analyses for both approaches. (See, however, the implementation notes for a qualification of the term ‘exact’ used to distinguish between both approaches.)

The exact computation of the tetrachoric correlation coefficient is difficult. One reason is of a computational nature (see implementation notes). A more fundamental problem, however, is that frequency data are discrete, which implies that the estimation of a cell probability can be no more accurate than 1/(2N). The inaccuracies in estimating the true correlation ρ are especially severe when there are cell frequencies less than 5. In these cases caution is necessary in interpreting the estimated r. For a more thorough discussion of these issues see Brown and Benedetti (1977) and Bonett and Price (2005).

31.0.2 Testing the tetrachoric correlation coefficient

The implemented routines estimate the power of a test that the tetrachoric correlation ρ has a fixed value ρ0. That is, the null and alternative hypothesis for a two-sided test are:

$$H_0: \rho - \rho_0 = 0 \qquad H_1: \rho - \rho_0 \neq 0.$$

The hypotheses are identical for both the exact and the approximation mode.

In the power procedures, the use of the Wald test statistic W = (r − ρ0)/se0(r) is assumed, where se0(r) is the standard error computed at ρ = ρ0. As will be illustrated in the example section, the output of G*Power may also be used to perform the statistical test.

31.1 Effect size index

The correlation coefficient assumed under H1 (H1 corr ρ) is used as effect size.

The following additional inputs are needed to fully specify the effect size:

• H0 corr ρ.

This is the tetrachoric correlation coefficient assumed under H0. An input H1 corr ρ = H0 corr ρ corresponds to "no effect" and must not be used in a priori calculations.

• Marginal prop x.

This is the marginal probability that X > zx (i.e. p∗2)

• Marginal prop y.

This is the marginal probability that Y > zy (i.e. p2∗)

The correlations must lie in the interval ] − 1, 1[, and the probabilities in the interval ]0, 1[.

Effect size calculation The effect size dialog may be used to determine H1 corr ρ in two different ways:

• A first possibility is to specify for each cell of the 2 × 2 table the probability of the corresponding event assumed under H1. Pressing the "Calculate" button calculates the exact (Correlation ρ) and approximate (Approx. correlation ρ∗) tetrachoric correlation coefficient, and the marginal probabilities Marginal prob x = p12 + p22 and Marginal prob y = p21 + p22. The exact correlation coefficient is used as H1 corr ρ (see left panel in Fig. 37).


Note: The four cell probabilities must sum to 1. It therefore suffices to specify three of them explicitly. If you leave one of the four cells empty, G*Power computes the fourth value as (1 − sum of the three other p).

• A second possibility is to compute a confidence interval for the tetrachoric correlation in the population from the results of a previous investigation, and to choose a value from this interval as H1 corr ρ. In this case you specify four observed frequencies, the relative position 0 ≤ k ≤ 1 inside the confidence interval (0, 0.5, 1 corresponding to the left, central, and right position, respectively), and the confidence level (1 − α) of the confidence interval (see right panel in Fig. 37).

From these data G*Power computes the total sample size N = f11 + f12 + f21 + f22 and estimates the cell probabilities pij by pij = (fij + 0.5)/(N + 2). These are used to compute the sample correlation coefficient r, the estimated marginal probabilities, the borders (L, R) of the (1 − α) confidence interval for the population correlation coefficient ρ, and the standard error of r. The value L + (R − L) · k is used as H1 corr ρ. The computed correlation coefficient, the confidence interval, and the standard error of r depend on the choice between the exact Brown and Benedetti (1977) and the approximate Bonett and Price (2005) computation mode, made in the options dialog. In the exact mode, the labels of the output fields are Correlation r, C.I. ρ lwr, C.I. ρ upr, and Std. error of r; in the approximate mode an asterisk ∗ is appended after r and ρ.

Clicking on the button Calculate and transfer to main window copies the values given in H1 corr ρ, Margin prob x, Margin prob y, and - in frequency mode - Total sample size to the corresponding input fields in the main window.

31.2 Options

You can choose between the exact approach in which the procedure proposed by Brown and Benedetti (1977) is used and the approximation suggested by Bonett and Price (2005).

31.3 Examples

To illustrate the application of the procedure we refer to example 1 in Bonett and Price (2005): The Yes or No answers of 930 respondents to two questions in a personality inventory are recorded in a 2 × 2 table with the following result: f11 = 203, f12 = 186, f21 = 167, f22 = 374.

First we use the effect size dialog to compute from these data the confidence interval for the tetrachoric correlation in the population. In the effect size drawer we choose From C.I. calculated from observed freq. Next, we insert the above values in the corresponding fields and press Calculate. Using the ‘exact’ computation mode (selected in the Options dialog in the main window), we get an estimated correlation r = 0.334, a standard error of r = 0.0482, and the 95% confidence interval [0.240, 0.429] for the population ρ. We choose the left border of the C.I. (i.e., relative position 0, corresponding to 0.240) as the value of the tetrachoric correlation coefficient ρ under H1.
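The exact-mode estimate in this example can be approximated with a short search: estimate the cells as (fij + 0.5)/(N + 2), derive zx and zy from the marginals, and find the r for which the bivariate normal CDF at (zx, zy) equals p11. This sketch uses bisection and Simpson quadrature as stand-ins; the Brown and Benedetti (1977) algorithm that G*Power uses may differ in detail, and the function names are our own.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

N01 = NormalDist()  # standard normal


def bvn_cdf(h, k, rho, n=200):
    """Pr(X <= h, Y <= k) = Phi(h)Phi(k) + integral_0^rho phi2(h, k; r) dr."""
    def dens(r):
        return (exp(-(h * h - 2 * r * h * k + k * k) / (2 * (1 - r * r)))
                / (2 * pi * sqrt(1 - r * r)))

    step = rho / n
    s = dens(0.0) + dens(rho)
    s += sum((4 if i % 2 else 2) * dens(i * step) for i in range(1, n))
    return N01.cdf(h) * N01.cdf(k) + s * step / 3


def tetrachoric_r(f11, f12, f21, f22, tol=1e-10):
    """Return (r, marginal prob x, marginal prob y) from observed frequencies."""
    n = f11 + f12 + f21 + f22
    p11, p12, p21, p22 = ((f + 0.5) / (n + 2) for f in (f11, f12, f21, f22))
    px, py = p12 + p22, p21 + p22
    zx, zy = N01.inv_cdf(1 - px), N01.inv_cdf(1 - py)
    lo, hi = -0.9999, 0.9999          # the CDF increases monotonically in r
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bvn_cdf(zx, zy, mid) < p11:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2, px, py


r, px, py = tetrachoric_r(203, 186, 167, 374)
print(f"r = {r:.3f}, marginal prob x = {px:.7f}, marginal prob y = {py:.7f}")
```

With the frequencies above it returns r ≈ 0.334 and the marginal probabilities 0.6019313 and 0.5815451 that appear in the main window input below.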

We now want to know how many subjects we need to achieve a power of 0.95 in a one-sided test of the H0 that ρ = 0 vs. the H1 ρ = 0.24, given the same marginal probabilities and α = 0.05.

Clicking on ‘Calculate and transfer to main window’ copies the computed H1 corr ρ = 0.2399846 and the marginal probabilities px = 0.602 and py = 0.582 to the corresponding input fields in the main window. The complete input and output is as follows:

• Select
  Statistical test: Correlation: Tetrachoric model
  Type of power analysis: A priori

• Input
  Tail(s): One
  H1 corr ρ: 0.2399846
  α err prob: 0.05
  Power (1-β err prob): 0.95
  H0 corr ρ: 0
  Marginal prob x: 0.6019313
  Marginal prob y: 0.5815451

• Output
  Critical z: 1.644854
  Total sample size: 463
  Actual power: 0.950370
  H1 corr ρ: 0.239985
  H0 corr ρ: 0.0
  Critical r lwr: 0.122484
  Critical r upr: 0.122484
  Std err r: 0.074465

This shows that we need at least a sample size of 463 in this case (the Actual power output field shows the power for a sample size rounded to an integer value).

The output also contains the values for ρ under H0 and H1 used in the internal computation procedure. In the exact computation mode a deviation from the input values would indicate that the internal estimation procedure did not work correctly for the input values (this should only occur at extreme values of r or marginal probabilities). In the approximate mode, the output values correspond to the r values resulting from the approximation formula.

The remaining outputs show the critical value(s) for r under H0: In the Wald test assumed here, z = (r − ρ0)/se0(r) is approximately standard normally distributed under H0. The critical values of r under H0 are given (a) as a quantile of the standard normal distribution (z1−α/2 in two-sided tests, z1−α in one-sided tests), and (b) in the form of critical correlation coefficients r and the standard error se0(r). (In one-sided tests, the single critical value is reported twice, in Critical r lwr and Critical r upr.) In the example given above, the standard error of r under H0 is 0.074465, and the critical value for r is 0.122484. Thus, (r − ρ0)/se0(r) = (0.122484 − 0)/0.074465 = 1.64485 = z1−α, as expected.

Using G*Power to perform the statistical test of H0 G*Power may also be used to perform the statistical test of H0. Assume that we want to test the H0: ρ = ρ0 = 0.4 vs. the two-sided alternative H1: ρ ≠ 0.4 for α = 0.05. Assume further that we observed the frequencies f11 = 120, f12 = 45, f21 = 56, and f22 = 89. To perform the test we first use the option "From C.I. calculated from observed freq" in the effect size dialog to compute from the observed frequencies the correlation coefficient r and the estimated marginal probabilities. In the exact mode we find r = 0.513, "Est. marginal prob x" = 0.433, and "Est. marginal prob y" = 0.468. In the main window we then choose a "post hoc" analysis. Clicking on "Calculate and transfer to main window" in the effect size dialog copies the values for marginal x, marginal y, and the sample size 310 to the main window. We now set "H0 corr ρ" to 0.4 and "α err prob" to 0.05. After clicking on "Calculate" in the main window, the output section shows the critical values for the correlation coefficient ([0.244, 0.555]) and the standard error under H0 (0.079366). These values show that the test is not significant at the chosen α level, because the observed r = 0.513 lies inside the interval [0.244, 0.555]. We then use the G*Power calculator to compute the p-value. Inserting z = (0.513-0.4)/0.0794; 1-normcdf(z,0,1) and clicking on the "Calculate" button yields the one-sided tail probability p = 0.077; for the two-sided test considered here, the p-value is twice this value, p ≈ 0.155.

Figure 37: Effect size drawer to calculate H1 ρ, and the marginal probabilities (see text).
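The calculator steps in this example are plain normal-distribution arithmetic. The sketch below mirrors them, with Python's statistics.NormalDist standing in for the calculator's normcdf. The inputs are the example's r = 0.513, ρ0 = 0.4, and se0 = 0.079366, and the helper names are our own. A two-sided p-value is twice the one-sided tail probability, and the one-sided branch assumes the observed deviation lies in the direction of H1.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal


def wald_critical_r(rho0, se0, alpha, tails):
    """Lower and upper critical correlations for W = (r - rho0) / se0."""
    z = Z.inv_cdf(1 - alpha / tails)
    return rho0 - z * se0, rho0 + z * se0


def wald_p(r, rho0, se0, tails):
    """p-value of the Wald test; tails = 1 or 2."""
    z = (r - rho0) / se0
    return tails * (1 - Z.cdf(abs(z)))


lo, hi = wald_critical_r(0.4, 0.079366, 0.05, tails=2)
print(f"critical r: [{lo:.4f}, {hi:.4f}]")
print(f"one-sided p = {wald_p(0.513, 0.4, 0.079366, 1):.4f}")
print(f"two-sided p = {wald_p(0.513, 0.4, 0.079366, 2):.4f}")
# one-sided critical value from the a priori example (rho0 = 0, se0 = 0.074465)
print(f"{wald_critical_r(0.0, 0.074465, 0.05, tails=1)[1]:.6f}")
```

The critical interval agrees with [0.244, 0.555] to within rounding of se0, and the last line reproduces the critical value 0.122484 of the a priori example.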

If we instead want to use the approximate mode, we first choose this option in the "Options" dialog and then proceed in essentially the same way. In this case we find a very similar value for the correlation coefficient, r∗ = 0.509.

The critical values for r∗ given in the output section of the main window are [0.233, 0.541], and the standard error for r∗ is 0.0788. Note: To compute the p-value in the approximate mode, we should use the H0 corr ρ∗ given in the output and not the H0 corr ρ specified in the input. Accordingly, using the input z = (0.509-0.397)/0.0788; 1-normcdf(z,0,1) in the G*Power calculator yields the one-sided tail probability p = 0.0776 (two-sided p ≈ 0.155), a value very close to that given above for the exact mode.

31.4 Related tests

31.5 Implementation notes

Given ρ and the marginal probabilities px and py, the following procedures are used to calculate the value of ρ (in the exact mode) or ρ∗ (in the approximate mode) and to estimate the standard error of r and r∗.

31.5.1 Exact mode

In the exact mode the algorithms proposed by Brown and Benedetti (1977) are used to calculate r and to estimate the standard error s(r). Note that the latter is not the expected standard error σ(r)! Computing σ(r) would require enumerating all possible tables Ti for the given N. If p(Ti) and ri denote the probability and the correlation coefficient of table i, then σ²(r) = ∑i (ri − ρ)² p(Ti) (see Brown & Benedetti, 1977, p. 349, for details). The number of possible tables increases rapidly with N, and it is therefore, in general, computationally too expensive to compute this exact value. Thus, ‘exact’ does not mean that the exact standard error is used in the power calculations.

In the exact mode it is not necessary to estimate r in order to calculate power, because it is already given in the input. We nevertheless report the r calculated by the routine in the output to indicate possible limitations in the precision of the routine for |r| near 1. Thus, if the r’s reported in the output section deviate markedly from those given in the input, all results should be interpreted with care.

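To estimate s(r), the formula based on asymptotic theory proposed by Pearson in 1901 is used:

$$s(r) = \frac{1}{N^{3/2}\,\varphi(z_x, z_y, r)} \left\{ \frac{(a + d)(b + c)}{4} + (a + c)(b + d)\,\Phi_2^2 + (a + b)(c + d)\,\Phi_1^2 + 2(ad - bc)\,\Phi_1 \Phi_2 - (ab - cd)\,\Phi_2 - (ac - bd)\,\Phi_1 \right\}^{1/2}$$

or with respect to cell probabilities:

$$s(r) = \frac{1}{N^{1/2}\,\varphi(z_x, z_y, r)} \left\{ \frac{(p_{11} + p_{22})(p_{12} + p_{21})}{4} + (p_{11} + p_{21})(p_{12} + p_{22})\,\Phi_2^2 + (p_{11} + p_{12})(p_{21} + p_{22})\,\Phi_1^2 + 2(p_{11} p_{22} - p_{12} p_{21})\,\Phi_1 \Phi_2 - (p_{11} p_{12} - p_{21} p_{22})\,\Phi_2 - (p_{11} p_{21} - p_{12} p_{22})\,\Phi_1 \right\}^{1/2}$$

where Φ denotes the CDF of the standard normal distribution,

$$\Phi_1 = \Phi\!\left(\frac{z_x - r z_y}{(1 - r^2)^{1/2}}\right) - 0.5, \qquad \Phi_2 = \Phi\!\left(\frac{z_y - r z_x}{(1 - r^2)^{1/2}}\right) - 0.5,$$

and φ(zx, zy, r) is the density of the bivariate normal distribution:

$$\varphi(z_x, z_y, r) = \frac{1}{2\pi (1 - r^2)^{1/2}} \exp\left[-\frac{z_x^2 - 2 r z_x z_y + z_y^2}{2(1 - r^2)}\right].$$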

Brown and Benedetti (1977) show that this approximation is quite good if the minimal cell frequency is at least 5 (see tables 1 and 2 in Brown and Benedetti, 1977).
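The cell-probability form of Pearson's formula translates directly into code. This sketch applies it to the 930-respondent example from the examples section. It plugs in the reported r = 0.334 instead of re-estimating r, uses the cell estimates (fij + 0.5)/(N + 2), and uses N = 930 in the 1/N^{1/2} factor; the last two choices are our assumptions about how the dialog feeds the formula, and the function name is our own.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

N01 = NormalDist()  # standard normal


def pearson_se(p11, p12, p21, p22, r, n):
    """Pearson (1901) asymptotic standard error s(r) of the tetrachoric r."""
    zx = N01.inv_cdf(p11 + p21)   # Phi(zx) = p*1 = Pr(X = 0)
    zy = N01.inv_cdf(p11 + p12)   # Phi(zy) = p1* = Pr(Y = 0)
    root = sqrt(1 - r * r)
    dens = (exp(-(zx * zx - 2 * r * zx * zy + zy * zy) / (2 * (1 - r * r)))
            / (2 * pi * root))    # bivariate normal density at (zx, zy)
    phi1 = N01.cdf((zx - r * zy) / root) - 0.5
    phi2 = N01.cdf((zy - r * zx) / root) - 0.5
    brace = ((p11 + p22) * (p12 + p21) / 4      # (a + d)(b + c)/4 term
             + (p11 + p21) * (p12 + p22) * phi2 ** 2
             + (p11 + p12) * (p21 + p22) * phi1 ** 2
             + 2 * (p11 * p22 - p12 * p21) * phi1 * phi2
             - (p11 * p12 - p21 * p22) * phi2
             - (p11 * p21 - p12 * p22) * phi1)
    return sqrt(brace) / (sqrt(n) * dens)


freqs = (203, 186, 167, 374)
n = sum(freqs)
p = [(f + 0.5) / (n + 2) for f in freqs]
se = pearson_se(*p, r=0.334, n=n)
print(f"s(r) = {se:.4f}")
```

The result, s(r) ≈ 0.0482, matches the standard error reported for this example in the exact mode.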

31.5.2 Approximation mode

Bonett and Price (2005) propose the following approximations.

Correlation coefficient Their approximation ρ∗ of the tetrachoric correlation coefficient is

$$\rho^{*} = \cos\left(\frac{\pi}{1 + \omega^{c}}\right),$$

where $c = \left(1 - |p_{1*} - p_{*1}|/5 - (1/2 - p_m)^2\right)/2$, with p1∗ = p11 + p12, p∗1 = p11 + p21, pm = smallest marginal proportion, and ω = p11 p22/(p12 p21).

The same formulas are used to compute an estimator r∗ from frequency data fij. The only difference is that estimates p̂ij = (fij + 0.5)/N of the true probabilities are used.
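The approximation is closed-form. The sketch below computes r∗ for the frequencies of the hypothesis-test example (f11 = 120, f12 = 45, f21 = 56, f22 = 89). It adds 0.5 to every cell for ω; for the marginal proportions in c it uses (fij + 0.5)/(N + 2) so that they sum to 1, which is our assumption (with (fij + 0.5)/N, ω is unchanged and c shifts only in the fourth decimal). The function name is our own.

```python
from math import cos, pi


def bonett_price_r(f11, f12, f21, f22):
    """Bonett-Price approximation r* = cos(pi / (1 + omega**c))."""
    n = f11 + f12 + f21 + f22
    p11, p12, p21, p22 = ((f + 0.5) / (n + 2) for f in (f11, f12, f21, f22))
    omega = p11 * p22 / (p12 * p21)             # odds ratio
    row1 = p11 + p12                            # p1*
    col1 = p11 + p21                            # p*1
    pm = min(row1, 1 - row1, col1, 1 - col1)    # smallest marginal proportion
    c = (1 - abs(row1 - col1) / 5 - (0.5 - pm) ** 2) / 2
    return cos(pi / (1 + omega ** c)), omega, c


r_star, omega, c = bonett_price_r(120, 45, 56, 89)
print(f"r* = {r_star:.3f} (omega = {omega:.4f}, c = {c:.4f})")
```

It gives r∗ ≈ 0.509, the value reported for the approximate mode in the example.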

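Confidence interval The 100(1 − α)% confidence interval for r∗ is computed as follows:

$$CI = \left[\cos\left(\frac{\pi}{1 + L^{\hat c}}\right),\ \cos\left(\frac{\pi}{1 + U^{\hat c}}\right)\right],$$

where

$$L = \exp\left(\ln\hat\omega + z_{\alpha/2}\, s(\ln\hat\omega)\right), \qquad U = \exp\left(\ln\hat\omega - z_{\alpha/2}\, s(\ln\hat\omega)\right),$$

$$s(\ln\hat\omega) = \left\{\frac{1}{N}\left(\frac{1}{\hat p_{11}} + \frac{1}{\hat p_{12}} + \frac{1}{\hat p_{21}} + \frac{1}{\hat p_{22}}\right)\right\}^{1/2},$$

and z_{α/2} is the α/2 quantile of the standard normal distribution.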

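Asymptotic standard error for r∗ The standard error is given by

$$s(r^{*}) = k \left\{\frac{1}{N}\left(\frac{1}{\hat p_{11}} + \frac{1}{\hat p_{12}} + \frac{1}{\hat p_{21}} + \frac{1}{\hat p_{22}}\right)\right\}^{1/2}$$

with

$$k = \frac{\pi \hat c\, w \sin[\pi/(1 + w)]}{(1 + w)^2},$$

where $w = \hat\omega^{\hat c}$.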
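Here is a sketch of the confidence interval and the delta-method standard error, under the same assumptions as for r∗ (0.5 added to each cell, marginals in c from (fij + 0.5)/(N + 2)). With p̂ij = (fij + 0.5)/N, the term (1/N) Σ 1/p̂ij reduces to Σ 1/(fij + 0.5), which is what the code computes. The function name is our own.

```python
from math import cos, exp, log, pi, sin, sqrt
from statistics import NormalDist


def bonett_price_ci_se(f11, f12, f21, f22, alpha=0.05):
    """Return (r*, (lower, upper), s(r*)) following the formulas above."""
    n = f11 + f12 + f21 + f22
    g = [f + 0.5 for f in (f11, f12, f21, f22)]
    omega = g[0] * g[3] / (g[1] * g[2])
    row1, col1 = (g[0] + g[1]) / (n + 2), (g[0] + g[2]) / (n + 2)
    pm = min(row1, 1 - row1, col1, 1 - col1)
    c = (1 - abs(row1 - col1) / 5 - (0.5 - pm) ** 2) / 2
    s_log = sqrt(sum(1 / x for x in g))         # s(ln omega-hat)
    z = NormalDist().inv_cdf(alpha / 2)         # alpha/2 quantile, negative
    lower_w = exp(log(omega) + z * s_log)
    upper_w = exp(log(omega) - z * s_log)

    def to_r(x):
        return cos(pi / (1 + x ** c))

    w = omega ** c
    k = pi * c * w * sin(pi / (1 + w)) / (1 + w) ** 2
    return to_r(omega), (to_r(lower_w), to_r(upper_w)), k * s_log


r_star, (lo, hi), se = bonett_price_ci_se(120, 45, 56, 89)
print(f"r* = {r_star:.3f}, 95% C.I. [{lo:.3f}, {hi:.3f}], s(r*) = {se:.4f}")
```

For the example frequencies the interval contains r∗, and its half-width is close to 1.96 s(r∗), as expected for a delta-method standard error. These numbers are not reported in the text, so they serve only as an internal consistency check.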

31.5.3 Power calculation

The H0 distribution is the standard normal distribution N(0, 1). The H1 distribution is the normal distribution N(m1, s1), where

$$m_1 = (\rho - \rho_0)/s_{r0}, \qquad s_1 = s_{r1}/s_{r0}.$$

The values sr0 and sr1 denote the standard error (approximate or exact) under H0 and H1.
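Once the standard errors under H0 and H1 are available, the power computation is a normal tail probability. In this sketch sr0 and sr1 are inputs rather than being computed; the numbers in the usage line are illustrative, not G*Power output, and the one-sided branch assumes the test is in the direction of ρ − ρ0. The function name is our own.

```python
from statistics import NormalDist


def tetrachoric_power(rho, rho0, sr0, sr1, alpha=0.05, tails=2):
    """Power of the Wald test: the H1 statistic is N(m1, s1) with
    m1 = (rho - rho0) / sr0 and s1 = sr1 / sr0."""
    m1 = (rho - rho0) / sr0
    s1 = sr1 / sr0
    h1 = NormalDist(m1, s1)
    z = NormalDist().inv_cdf(1 - alpha / tails)
    if tails == 1:
        # one-sided test in the direction of rho - rho0
        return 1 - h1.cdf(z) if rho >= rho0 else h1.cdf(-z)
    return h1.cdf(-z) + (1 - h1.cdf(z))


# illustrative standard errors, not taken from G*Power output
print(round(tetrachoric_power(0.24, 0.0, 0.0745, 0.07, alpha=0.05, tails=1), 3))
```

When sr1 = sr0 and ρ = ρ0, the two-sided power reduces to α, which is a simple check on the tail bookkeeping.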

31.6 Validation

The correctness of the procedures to calculate r, s(r) and r∗, s(r∗) was checked by reproducing the examples in Brown and Benedetti (1977) and Bonett and Price (2005), respectively. The soundness of the power routines was checked by Monte Carlo simulations, in which we found good agreement between simulated and predicted power.


References

Armitage, P., Berry, G., & Matthews, J. (2002). Statistical methods in medical research (4th ed.). Blackwell Science Ltd.

Barabesi, L., & Greco, L. (2002). A note on the exact computation of the Student t, Snedecor F and sample correlation coefficient distribution functions. Journal of the Royal Statistical Society: Series D (The Statistician), 51, 105-110.

Benton, D., & Krishnamoorthy, K. (2003). Computing discrete mixtures of continuous distributions: noncentral chisquare, noncentral t and the distribution of the square of the sample multiple correlation coefficient. Computational Statistics & Data Analysis, 43, 249-267.

Bonett, D. G., & Price, R. M. (2005). Inferential method for the tetrachoric correlation coefficient. Journal of Educational and Behavioral Statistics, 30, 213-225.

Brown, M. B., & Benedetti, J. K. (1977). On the mean and variance of the tetrachoric correlation coefficient. Psychometrika, 42, 347-355.

Cohen, J. (1969). Statistical power analysis for the behavioural sciences. New York: Academic Press.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Demidenko, E. (2007). Sample size determination for logistic regression revisited. Statistics in Medicine, 26, 3385-3397.

Demidenko, E. (2008). Sample size and optimal design for logistic regression with binary interaction. Statistics in Medicine, 27, 36-46.

Ding, C. G. (1996). On the computation of the distribution of the square of the sample multiple correlation coefficient. Computational Statistics & Data Analysis, 22, 345-350.

Ding, C. G., & Bargmann, R. E. (1991). Algorithm AS 260: Evaluation of the distribution of the square of the sample multiple correlation coefficient. Applied Statistics, 40, 195-198.

Dunlap, W. P., Xin, X., & Myers, L. (2004). Computing aspects of power for multiple regression. Behavior Research Methods, Instruments, & Computer, 36, 695-701.

Dunn, O. J., & Clark, V. A. (1969). Correlation coefficients measured on the same individuals. Journal of the American Statistical Association, 64, 366-377.

Dupont, W. D., & Plummer, W. D. (1998). Power and sample size calculations for studies involving linear regression. Controlled Clinical Trials, 19, 589-601.

Erdfelder, E., Faul, F., & Buchner, A. (1996). Gpower: A general power analysis program. Behavior Research Methods, Instruments, & Computer, 28, 1-11.

Faul, F., & Erdfelder, E. (1992). Gpower 2.0. Bonn, Germany: Universität Bonn.

Frome, E. L. (1986). Multiple regression analysis: Applications in the health sciences. In D. Herbert & R. Myers (Eds.), (p. 84-123). The American Institute of Physics.

Gatsonis, C., & Sampson, A. R. (1989). Multiple correlation: Exact power and sample size calculations. Psychological Bulletin, 106, 516-524.

Hays, W. (1988). Statistics (4th ed.). Orlando, FL: Holt, Rinehart and Winston.

Hettmansperger, T. P. (1984). Statistical inference based on ranks. New York: Wiley.

Hintze, J. (2006). NCSS, PASS, and GESS. Kaysville, Utah: NCSS.

Hsieh, F. Y. (1989). Sample size tables for logistic regression. Statistics in Medicine, 8, 795-802.

Hsieh, F. Y., Bloch, D. A., & Larsen, M. D. (1998). A simple method of sample size calculation for linear and logistic regression. Statistics in Medicine, 17, 1623-1634.

Lee, Y. (1971). Some results on the sampling distribution of the multiple correlation coefficient. Journal of the Royal Statistical Society, Series B (Methodological), 33, 117-130.

Lee, Y. (1972). Tables of the upper percentage points of the multiple correlation coefficient. Biometrika, 59, 179-189.

Lehmann, E. (1975). Nonparametrics: Statistical methods based on ranks. New York: McGraw-Hill.

Lyles, R. H., Lin, H.-M., & Williamson, J. M. (2007). A practical approach to computing power for generalized linear models with nominal, count, or ordinal responses. Statistics in Medicine, 26, 1632-1648.

Mendoza, J., & Stafford, K. (2001). Confidence interval, power calculation, and sample size estimation for the squared multiple correlation coefficient under the fixed and random regression models: A computer program and useful standard tables. Educational & Psychological Measurement, 61, 650-667.

O’Brien, R. (1998). A tour of UnifyPow: A SAS module/macro for sample-size analysis. Proceedings of the 23rd SAS Users Group International Conference, Cary, NC, SAS Institute, 1346-1355.

O’Brien, R. (2002). Sample size analysis in study planning (using unifypow.sas). (available on the WWW: http://www.bio.ri.ccf.org/UnifyPow.all/UnifyPowNotes020811.pdf)

Sampson, A. R. (1974). A tale of two regressions. American Statistical Association, 69, 682-689.

Shieh, G. (2001). Sample size calculations for logistic and Poisson regression models. Biometrika, 88, 1193-1199.

Shieh, G. (2005). On power and sample size calculations for Wald tests in generalized linear models. Journal of Statistical Planning and Inference, 128, 43-59.

Shieh, G., Jan, S.-L., & Randles, R. H. (2007). Power and sample size determinations for the Wilcoxon signed-rank test. Journal of Statistical Computation and Simulation, 77(8), 717-724. doi: 10.1080/10629360600635245

Shieh, G., & Kung, C.-F. (2007). Methodological and computational considerations for multiple correlation analysis. Behavior Research Methods, Instruments, & Computer, 39, 731-734.

Signorini, D. F. (1991). Sample size for Poisson regression. Biometrika, 78, 446-450.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87, 245-251.

Steiger, J. H., & Fouladi, R. T. (1992). R2: A computer program for interval estimation, power calculations, sample size estimation, and hypothesis testing in multiple regression. Behavior Research Methods, Instruments, & Computer, 24, 581-582.

Whittemore, A. S. (1981). Sample size for logistic regression with small response probabilities. Journal of the American Statistical Association, 76, 27-32.
