Cohen's d Statistic

In: Encyclopedia of Research Design

By: Shayne B. Piasta & Laura M. Justice

Edited by: Neil J. Salkind

Book Title: Encyclopedia of Research Design

Chapter Title: "Cohen's d Statistic"

Pub. Date: 2012

Access Date: May 24, 2020

Publishing Company: SAGE Publications, Inc.

City: Thousand Oaks

Print ISBN: 9781412961271

Online ISBN: 9781412961288

DOI: https://dx.doi.org/10.4135/9781412961288

Print pages: 181-185

© 2010 SAGE Publications, Inc. All Rights Reserved.

This PDF has been generated from SAGE Research Methods. Please note that the pagination of the

online version will vary from the pagination of the print book.

Cohen's d statistic is a type of effect size. An effect size is a specific numerical nonzero value used to

represent the extent to which a null hypothesis is false. As an effect size, Cohen's d is typically used to

represent the magnitude of differences between two (or more) groups on a given variable, with larger values

representing a greater differentiation between the two groups on that variable. When comparing means

in a scientific study, the reporting of an effect size such as Cohen's d is considered complementary to

the reporting of results from a test of statistical significance. Whereas the test of statistical significance is

used to suggest whether a null hypothesis is true (no difference exists between Populations A and B for a

specific phenomenon) or false (a difference exists between Populations A and B for a specific phenomenon),

the calculation of an effect size estimate is used to represent the degree of difference between the two

populations in those instances for which the null hypothesis was deemed false. In cases for which the null

hypothesis is false (i.e., rejected), the results of a test of statistical significance imply that reliable differences

exist between two populations on the phenomenon of interest, but test outcomes do not provide any value

regarding the extent of that difference. The calculation of Cohen's d and its interpretation provide a way to

estimate the actual size of observed differences between two groups, namely, whether the differences are

small, medium, or large.

Calculation of Cohen's d Statistic

Cohen's d statistic is typically used to estimate between-subjects effects for grouped data, consistent with

an analysis of variance framework. Often, it is employed within experimental contexts to estimate the

differential impact of the experimental manipulation across conditions on the dependent variable of interest.

The dependent variable must represent continuous data; other effect size measures (e.g., Pearson family of

correlation coefficients, odds ratios) are appropriate for non-continuous data.

General Formulas

Cohen's d statistic represents the standardized mean differences between groups. Similar to other means of

standardization such as z scoring, the effect size is expressed in standard score units. In general, Cohen's d

is defined as

$$ d = \frac{\mu_1 - \mu_2}{\sigma} \tag{1} $$

where d represents the effect size, μ1 and μ2 represent the two population means, and σ represents the

pooled within-group population standard deviation. In practice, these population parameters are typically

unknown and estimated by means of sample statistics:

$$ d = \frac{\bar{X}_1 - \bar{X}_2}{s_p} \tag{2} $$

The population means are replaced with the sample means $\bar{X}_1$ and $\bar{X}_2$, and the

population standard deviation is replaced with $s_p$, the pooled standard deviation from the sample. The pooled

standard deviation is derived by weighting the variance around each sample mean by the respective sample

size.

Calculation of the Pooled Standard Deviation

Although computation of the difference in sample means is straightforward in Equation 2, the pooled standard

deviation may be calculated in a number of ways. Consistent with the traditional definition of a standard

deviation, this statistic may be computed as

$$ s_p = \sqrt{\frac{\sum_j (n_j - 1)\, s_j^2}{\sum_j n_j}} \tag{3} $$

where $n_j$ represents the sample sizes for the j groups and $s_j^2$ represents the variance (i.e., squared standard

deviation) of the j samples. Often, however, the pooled sample standard deviation is corrected for bias in its

estimation of the corresponding population parameter, σ. Equation 4 denotes this correction of bias in the

sample statistic (with the resulting effect size often referred to as Hedges's g):

$$ s_p = \sqrt{\frac{\sum_j (n_j - 1)\, s_j^2}{\sum_j (n_j - 1)}} \tag{4} $$

When simply computing the pooled standard deviation across two groups, this formula may be reexpressed

in a more common format. This formula is suitable for data analyzed with a two-group analysis of variance,

such as a treatment–control contrast:

$$ s_p = \sqrt{\frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}} \tag{5} $$

The formula may be further reduced to the average of the sample variances when sample sizes are equal:

$$ s_p = \sqrt{\frac{\sum_j s_j^2}{j}} \tag{6} $$

or

$$ s_p = \sqrt{\frac{s_1^2 + s_2^2}{2}} \tag{7} $$

in the case of two groups.
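
The pooled standard deviation and the resulting effect size can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it assumes that the uncorrected pooled standard deviation divides the within-group sum of squares by the total sample size and that the bias-corrected version divides it by the within-group degrees of freedom. The function names and the scores are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_sd(groups, bias_corrected=True):
    """Pooled standard deviation across two or more groups of raw scores."""
    # Within-group sum of squares: sum over groups of (n_j - 1) * s_j^2,
    # where s_j^2 is the usual sample variance (n_j - 1 in its denominator).
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    n_total = sum(len(g) for g in groups)
    # Assumption: the uncorrected form divides by the total N, the
    # bias-corrected form by the within-group degrees of freedom.
    denominator = n_total - len(groups) if bias_corrected else n_total
    return math.sqrt(ss_within / denominator)


def cohens_d(group1, group2, bias_corrected=True):
    """Standardized mean difference between two groups."""
    return (mean(group1) - mean(group2)) / pooled_sd([group1, group2], bias_corrected)


# Hypothetical scores, invented for illustration only.
treatment = [23.0, 25.0, 28.0, 30.0, 34.0]
control = [20.0, 22.0, 25.0, 26.0, 27.0]

d_uncorrected = cohens_d(treatment, control, bias_corrected=False)
d_corrected = cohens_d(treatment, control, bias_corrected=True)
print(round(d_uncorrected, 3), round(d_corrected, 3))
```

Because the two groups are of equal size, the bias-corrected pooled standard deviation here equals the square root of the average of the two sample variances (18.5 and 8.5). The uncorrected version uses a smaller denominator and therefore yields a somewhat larger d.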

Other means of specifying the denominator for Equation 2 are varied. Some formulas use the average

standard deviation across groups. This procedure disregards differences in sample size in cases of unequal

n when one is weighting sample variances and may or may not correct for sample bias in estimation of the

population standard deviation. Further formulas employ the standard deviation of the control or comparison

condition (an effect size referred to as Glass's Δ). This method is particularly suited when the introduction

of treatment or other experimental manipulation leads to large changes in group variance. Finally, more

complex formulas are appropriate when calculating Cohen's d from data involving cluster randomized or

nested research designs. The complication partially arises because of the three available variance statistics

from which the pooled standard deviation may be computed: the within-cluster variance, the between-cluster

variance, or the total variance (combined between- and within-cluster variance). Researchers must select the

variance statistic appropriate for the inferences they wish to draw.
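
Two of the alternative denominators described above can be sketched as follows. The example is illustrative only: the function names and data are hypothetical, and the standard deviations use the usual n − 1 denominator, which is an assumption rather than something the text specifies.

```python
import math
from statistics import mean, stdev


def glass_delta(treatment, control):
    """Mean difference standardized by the control-group SD only."""
    return (mean(treatment) - mean(control)) / stdev(control)


def d_average_sd(group1, group2):
    """Mean difference standardized by the unweighted average of the two SDs."""
    average_sd = (stdev(group1) + stdev(group2)) / 2
    return (mean(group1) - mean(group2)) / average_sd


# Hypothetical scores in which the treatment inflates the spread.
treatment = [10.0, 14.0, 18.0, 22.0, 26.0, 30.0]
control = [12.0, 14.0, 15.0, 17.0, 18.0, 20.0]

print(round(glass_delta(treatment, control), 3))
print(round(d_average_sd(treatment, control), 3))
```

With these scores the treatment group is far more variable than the control group, so the two denominators give noticeably different effect sizes. This is the situation in which the choice of standardizer matters most.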

Expansion beyond Two-Group Comparisons: Contrasts and Repeated Measures

Cohen's d always reflects the standardized difference between two means. The means, however, are not

restricted to comparisons of two independent groups. Cohen's d may also be calculated in multigroup designs

when a specific contrast is of interest. For example, the average effect across two alternative treatments may

be compared with a control. The value of the contrast becomes the numerator as specified in Equation 2, and

the pooled standard deviation is expanded to include all j groups specified in the contrast (Equation 4).
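
A contrast-based effect size of this kind might be computed as in the sketch below. It assumes contrast weights that sum to zero, scaled so that the contrast is itself a difference between means (here, the average of two treatments minus a control), and it pools the standard deviation across all groups with the bias-corrected denominator. The function name, weights, and scores are hypothetical.

```python
import math
from statistics import mean, variance


def contrast_d(groups, weights):
    """Standardized contrast: sum(w_j * mean_j) divided by the pooled SD of all groups."""
    if abs(sum(weights)) > 1e-12:
        raise ValueError("contrast weights should sum to zero")
    contrast = sum(w * mean(g) for w, g in zip(weights, groups))
    # Bias-corrected pooled SD across every group named in the contrast.
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    df_within = sum(len(g) - 1 for g in groups)
    return contrast / math.sqrt(ss_within / df_within)


# Hypothetical scores for two alternative treatments and a control.
treatment_a = [14.0, 16.0, 18.0, 20.0]
treatment_b = [13.0, 15.0, 19.0, 21.0]
control = [10.0, 12.0, 14.0, 16.0]

# Average of the two treatments compared with the control.
d = contrast_d([treatment_a, treatment_b, control], [0.5, 0.5, -1.0])
print(round(d, 3))
```

Scaling the positive weights to sum to 1 and the negative weights to sum to −1 keeps the numerator in the original score units, so the result remains interpretable as a difference between two means in standard deviation units.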

A similar extension of Equations 2 and 4 may be applied to repeated measures analyses. The difference

between two repeated measures is divided by the pooled standard deviation across the j repeated measures.

The same formula may also be applied to simple contrasts within repeated measures designs, as well as

interaction contrasts in mixed (between- and within-subjects factors) or split-plot designs. Note, however, that

the simple application of the pooled standard deviation formula does not take into account the correlation

between repeated measures. Researchers disagree as to whether these correlations ought to contribute to

effect size computation; one method of determining Cohen's d while accounting for the correlated nature of

repeated measures involves computing d from a paired t test.
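
One way of deriving d from a paired t test is sketched below. It assumes a commonly used correction, d = t·sqrt(2(1 − r)/n), where n is the number of pairs and r is the correlation between the two measures. This form is an assumption of the sketch rather than a formula quoted from this entry, and the function names and scores are hypothetical.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two paired lists of scores."""
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return covariance / (stdev(x) * stdev(y))


def paired_t(x, y):
    """Paired-samples t statistic for the mean of (y - x)."""
    diffs = [b - a for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def d_from_paired_t(t, n_pairs, r):
    """Assumed correction: removes the correlation that inflates the paired t."""
    return t * math.sqrt(2 * (1 - r) / n_pairs)


# Hypothetical pretest and posttest scores with equal variances.
pre = [10.0, 12.0, 14.0, 16.0, 18.0]
post = [14.0, 12.0, 16.0, 20.0, 18.0]

r = pearson_r(pre, post)
t = paired_t(pre, post)
d = d_from_paired_t(t, len(pre), r)
print(round(r, 2), round(t, 3), round(d, 3))
```

When the two measures have equal variances, as in this example, the corrected value equals the mean difference divided by the common standard deviation. With unequal variances the two quantities differ, and the result should be treated as an approximation.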

Additional Means of Calculation

Beyond the formulas presented above, Cohen's d may be derived from other statistics, including the Pearson

family of correlation coefficients (r), t tests, and F tests. Derivations from r are particularly useful, allowing

for translation among various effect size indices. Derivations from other statistics are often necessary when

raw data to compute Cohen's d are unavailable, such as when conducting a meta-analysis of published data.

When d is derived as in Equation 3, the following formulas apply:

$$ d = \frac{2r}{\sqrt{1 - r^2}} \tag{8} $$

$$ d = \frac{t\,(n_1 + n_2)}{\sqrt{df}\,\sqrt{n_1 n_2}} \tag{9} $$

and

$$ d = \frac{\sqrt{F}\,(n_1 + n_2)}{\sqrt{df}\,\sqrt{n_1 n_2}} \tag{10} $$

Note that Equation 10 applies only for F tests with 1 degree of freedom (df) in the numerator; further formulas

apply when df > 1.

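When d is derived as in Equation 4, the following formulas ought to be used:

$$ d = \frac{r}{\sqrt{1 - r^2}}\sqrt{\frac{df\,(n_1 + n_2)}{n_1 n_2}} \tag{11} $$

$$ d = t\sqrt{\frac{n_1 + n_2}{n_1 n_2}} \tag{12} $$

$$ d = \sqrt{\frac{F\,(n_1 + n_2)}{n_1 n_2}} \tag{13} $$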

Again, Equation 13 applies only to instances in which the numerator df = 1.
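
The conversions from test statistics can be sketched in Python using the standard algebraic relations for a two-group design, with df = n1 + n2 − 2 for the error term. The sketch assumes that the uncorrected and bias-corrected versions of d differ only in the denominator of the pooled standard deviation (total N versus df). The function names and summary statistics are hypothetical, and the conversion from r shown here additionally assumes equal group sizes.

```python
import math


def d_from_t(t, n1, n2, bias_corrected=True):
    """Cohen's d from an independent-samples t statistic."""
    if bias_corrected:
        # Pooled SD computed with n1 + n2 - 2 in the denominator.
        return t * math.sqrt(1 / n1 + 1 / n2)
    df = n1 + n2 - 2
    # Pooled SD computed with n1 + n2 in the denominator.
    return t * (n1 + n2) / (math.sqrt(df) * math.sqrt(n1 * n2))


def d_from_f(f, n1, n2, bias_corrected=True):
    """Absolute value of d from an F test with 1 numerator df, where F = t**2."""
    return d_from_t(math.sqrt(f), n1, n2, bias_corrected)


def d_from_r(r):
    """Uncorrected d from a point-biserial r, assuming equal group sizes."""
    return 2 * r / math.sqrt(1 - r ** 2)


# Hypothetical summary: mean difference of 4, bias-corrected pooled
# variance of 13.5, and 5 participants per group.
n1 = n2 = 5
t = 4 / (math.sqrt(13.5) * math.sqrt(1 / n1 + 1 / n2))
r = t / math.sqrt(t ** 2 + (n1 + n2 - 2))  # point-biserial r implied by t

print(round(d_from_t(t, n1, n2), 3), round(d_from_t(t, n1, n2, bias_corrected=False), 3))
print(round(d_from_f(t ** 2, n1, n2), 3), round(d_from_r(r), 3))
```

Because F loses the sign of the mean difference, the value recovered from F is an absolute value; the direction of the effect has to be taken from the reported means.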

These formulas must be corrected for the correlation (r) between dependent variables in repeated measures

designs. For example, Equation 12 is corrected as follows:

$$ d = t\sqrt{\frac{1 - r}{n_1} + \frac{1 - r}{n_2}} \tag{14} $$

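Finally, conversions between effect sizes computed with Equations 3 and 4 may be easily accomplished:

$$ d_{\text{Eq. 3}} = d_{\text{Eq. 4}}\sqrt{\frac{n_1 + n_2}{df}} \tag{15} $$

and

$$ d_{\text{Eq. 4}} = d_{\text{Eq. 3}}\sqrt{\frac{df}{n_1 + n_2}} \tag{16} $$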

Variance and Confidence Intervals

The estimated variance of Cohen's d depends on how the statistic was originally computed. When sample

bias in the estimation of the population pooled standard deviation remains uncorrected (Equation 3), the

variance is computed in the following manner:

$$ s_d^2 = \left[\frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2\,df}\right]\frac{n_1 + n_2}{df} \tag{17} $$

A simplified formula is employed when sample bias is corrected as in Equation 4:

$$ s_d^2 = \frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2\,(n_1 + n_2)} \tag{18} $$

Once calculated, the effect size variance may be used to compute a confidence interval (CI) for the statistic

to determine statistical significance:

$$ CI = d \pm z\, s_d \tag{19} $$

The z in the formula corresponds to the z-score value on the normal distribution corresponding to the desired

probability level (e.g., 1.96 for a 95% CI). Variances and CIs may also be obtained through bootstrapping

methods.
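
The sketch below illustrates both routes to a confidence interval. The analytic variances use widely cited large-sample approximations for the two versions of d; treating them as the formulas intended in this entry is an assumption. The bootstrap is a simple percentile bootstrap that resamples within each group. All function names, the seed, and the data are hypothetical.

```python
import math
import random
from statistics import mean, variance


def d_variance(d, n1, n2, bias_corrected=True):
    """Approximate sampling variance of d (assumed large-sample formulas)."""
    n_total = n1 + n2
    if bias_corrected:
        return n_total / (n1 * n2) + d ** 2 / (2 * n_total)
    df = n_total - 2
    return (n_total / (n1 * n2) + d ** 2 / (2 * df)) * (n_total / df)


def normal_ci(d, var_d, z=1.96):
    """Interval d +/- z * s_d; z = 1.96 gives a 95% interval."""
    half_width = z * math.sqrt(var_d)
    return d - half_width, d + half_width


def cohens_d(group1, group2):
    """d with the bias-corrected pooled SD (n1 + n2 - 2 in the denominator)."""
    ss_within = (len(group1) - 1) * variance(group1) + (len(group2) - 1) * variance(group2)
    return (mean(group1) - mean(group2)) / math.sqrt(ss_within / (len(group1) + len(group2) - 2))


def bootstrap_ci(group1, group2, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling with replacement within each group."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample1 = rng.choices(group1, k=len(group1))
        resample2 = rng.choices(group2, k=len(group2))
        if len(set(resample1)) == 1 and len(set(resample2)) == 1:
            continue  # zero pooled variance: d is undefined for this resample
        estimates.append(cohens_d(resample1, resample2))
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper


# Hypothetical scores, invented for illustration only.
treatment = [23.0, 25.0, 28.0, 30.0, 34.0, 27.0, 29.0, 31.0]
control = [20.0, 22.0, 25.0, 26.0, 27.0, 21.0, 24.0, 23.0]

d_hat = cohens_d(treatment, control)
print(normal_ci(d_hat, d_variance(d_hat, len(treatment), len(control))))
print(bootstrap_ci(treatment, control))
```

With samples this small the two intervals can differ appreciably, because the analytic formulas are large-sample approximations and the percentile bootstrap makes no normality assumption about the sampling distribution of d.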

Interpretation

Cohen's d, as a measure of effect size, describes the overlap in the distributions of the compared samples

on the dependent variable of interest. If the two distributions overlap completely, one would expect no mean

difference between them (i.e., $\bar{X}_1 - \bar{X}_2 = 0$). To the extent that the distributions do not overlap, the

difference ought to be greater than zero (assuming $\bar{X}_1 > \bar{X}_2$).

Cohen's d may be interpreted in terms of both statistical significance and magnitude, with the latter the more

common interpretation. Effect sizes are statistically significant when the computed CI does not contain zero.

This implies less than perfect overlap between the distributions of the two groups compared. Moreover, the

significance testing implies that this difference from zero is reliable, or not due to chance (excepting Type

I errors). While significance testing of effect sizes is often undertaken, however, interpretation based solely

on statistical significance is not recommended. Statistical significance is reliant not only on the size of the

effect but also on the size of the sample. Thus, even large effects may be deemed unreliable when insufficient

sample sizes are utilized.

Interpretation of Cohen's d based on the magnitude is more common than interpretation based on statistical

significance of the result. The magnitude of Cohen's d indicates the extent of nonoverlap between two

distributions, or the disparity of the mean difference from zero. Larger numeric values of Cohen's d indicate

larger effects or greater differences between the two means. Values may be positive or negative, although

the sign merely indicates whether the first or second mean in the numerator was of greater magnitude (see

Equation 2). Typically, researchers choose to subtract the smaller mean from the larger, resulting in a positive

effect size. As a standardized measure of effect, the numeric value of Cohen's d is interpreted in standard

deviation units. Thus, an effect size of d = 0.5 indicates that two group means are separated by one-half

standard deviation or that one group shows a one-half standard deviation advantage over the other.
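
The link between d and distributional overlap can be made concrete under an added assumption that this entry does not itself state: both populations are normal with equal standard deviations. Under that assumption, the proportion of the lower group falling below the mean of the higher group is Φ(d), and the shared area of the two density curves is 2Φ(−|d|/2), where Φ is the standard normal cumulative distribution function. The function names below are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def proportion_below_other_mean(d):
    """Share of the lower-scoring group below the higher group's mean (normal, equal SDs)."""
    return _STD_NORMAL.cdf(d)


def overlap_coefficient(d):
    """Area shared by the two density curves (normal, equal SDs)."""
    return 2 * _STD_NORMAL.cdf(-abs(d) / 2)


for d in (0.0, 0.2, 0.5, 0.8):
    print(d, round(proportion_below_other_mean(d), 3), round(overlap_coefficient(d), 3))
```

At d = 0 the distributions coincide and the overlap is complete; as d grows, the shared area shrinks steadily.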

The magnitude of effect sizes is often described nominally as well as numerically. Jacob Cohen defined

effects as small (d = 0.2), medium (d = 0.5), or large (d = 0.8). These rules of thumb were derived

after surveying the behavioral sciences literature, which included studies in various disciplines involving

diverse populations, interventions or content under study, and research designs. Cohen, in proposing these

benchmarks in a 1988 text, explicitly noted that they are arbitrary and thus ought not be viewed as absolute.

However, as occurred with use of .05 as an absolute criterion for establishing statistical significance, Cohen's

benchmarks are oftentimes interpreted as absolutes, and as a result, they have been criticized in recent

years as outdated, atheoretical, and inherently nonmeaningful. These criticisms are especially prevalent in

applied fields in which medium-to-large effects prove difficult to obtain and smaller effects are often of great

importance. The small effect of d = 0.07, for instance, was sufficient for physicians to begin recommending

aspirin as an effective method of preventing heart attacks. Similar small effects are often celebrated in

intervention and educational research, in which effect sizes of d = 0.3 to d = 0.4 are the norm. In these fields,

the practical importance of reliable effects is often weighed more heavily than simple magnitude, as may be

the case when adoption of a relatively simple educational approach (e.g., discussing vs. not discussing novel

vocabulary words when reading storybooks to children) results in effect sizes of d = 0.25 (consistent with

increases of one-fourth of a standard deviation unit on a standardized measure of vocabulary knowledge).

Critics of Cohen's benchmarks assert that such practical or substantive significance is an important

consideration beyond the magnitude and statistical significance of effects. Interpretation of effect sizes

requires an understanding of the context in which the effects are derived, including the particular

manipulation, population, and dependent measure(s) under study. Various alternatives to Cohen's rules of

thumb have been proposed. These include comparisons with effect sizes based on (a) normative data

concerning the typical growth, change, or differences between groups prior to experimental manipulation; (b)

those obtained in similar studies and available in the previous literature; (c) the gain necessary to attain an a

priori criterion; and (d) cost-benefit analyses.

Cohen's d in Meta-Analyses

Cohen's d, as a measure of effect size, is often used in individual studies to report and interpret the

magnitude of between-group differences. It is also a common tool used in meta-analyses to aggregate

effects across different studies, particularly in meta-analyses involving study of between-group differences,

such as treatment studies. A meta-analysis is a statistical synthesis of results from independent research

studies (selected for inclusion based on a set of predefined commonalities), and the unit of analysis in the

meta-analysis is the data used for the independent hypothesis test, including sample means and standard

deviations, extracted from each of the independent studies. The statistical analyses used in the

meta-analysis typically involve (a) calculating the Cohen's d effect size (standardized mean difference) on data

available within each independent study on the target variable(s) of interest and (b) combining these individual

summary values to create pooled estimates by means of any one of a variety of approaches (e.g., Rebecca

DerSimonian and Nan Laird's random effects model, which takes into account variations among studies on

certain parameters). Therefore, the methods of the meta-analysis may rely on use of Cohen's d as a way

to extract and combine data from individual studies. In such meta-analyses, the reporting of results involves

providing average d values (and CIs) as aggregated across studies.
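
The pooling step can be sketched as follows. The example shows an inverse-variance fixed-effect average and the DerSimonian and Laird random-effects estimate, which adds a method-of-moments estimate of between-study variance to each study's sampling variance before weighting. The function names and the study-level values are hypothetical.

```python
import math


def fixed_effect(effects, variances):
    """Inverse-variance weighted mean effect and its variance."""
    weights = [1 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    return pooled, 1 / sum(weights)


def dersimonian_laird(effects, variances):
    """Random-effects pooled effect, its variance, and the between-study variance tau^2."""
    k = len(effects)
    weights = [1 / v for v in variances]
    pooled_fe, _ = fixed_effect(effects, variances)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (d - pooled_fe) ** 2 for w, d in zip(weights, effects))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (k - 1)) / c)  # truncated at zero
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * d for w, d in zip(re_weights, effects)) / sum(re_weights)
    return pooled, 1 / sum(re_weights), tau2


# Hypothetical d values and sampling variances from three studies.
effects = [0.0, 0.4, 0.8]
variances = [0.04, 0.04, 0.04]

pooled, pooled_var, tau2 = dersimonian_laird(effects, variances)
half_width = 1.96 * math.sqrt(pooled_var)
print(round(pooled, 3), round(tau2, 3), (round(pooled - half_width, 3), round(pooled + half_width, 3)))
```

When the studies agree closely, the between-study variance estimate is truncated to zero and the random-effects result coincides with the fixed-effect average; when they disagree, the interval around the pooled d widens.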

In meta-analyses of treatment outcomes in the social and behavioral sciences, for instance, effect estimates

may compare outcomes attributable to a given treatment (Treatment X) as extracted from and pooled across

multiple studies in relation to an alternative treatment (Treatment Y) for Outcome Z using Cohen's d (e.g., d

= 0.21, CI = 0.06, 1.03). It is important to note that the meaningfulness of this result, in that Treatment X is,

on average, associated with an improvement of about one-fifth of a standard deviation unit for Outcome Z

relative to Treatment Y, must be interpreted in reference to many factors to determine the actual significance

of this outcome. Researchers must, at the least, consider whether the one-fifth of a standard deviation unit

improvement in the outcome attributable to Treatment X has any practical significance.

Shayne B. Piasta and Laura M. Justice

http://dx.doi.org/10.4135/9781412961288.n58

See also

• Analysis of Variance (ANOVA)

• Effect Size, Measures of

• Mean Comparisons

• Meta-Analysis

• Statistical Power Analysis for the Behavioral Sciences

Further Readings

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Mahwah, NJ: Lawrence

Erlbaum.

Cooper, H., & Hedges, L. V. (1994). The handbook of research synthesis. New York: Russell Sage

Foundation.

Hedges, L. V. (2007). Effect sizes in cluster-randomized designs. Journal of Educational and Behavioral

Statistics, 32, 341–370.

Hill, C. J., Bloom, H. S., Black, A. R., & Lipsey, M. W. (2007, July). Empirical benchmarks for interpreting

effect sizes in research. New York: MDRC.

Ray, J. W., & Shadish, W. R. (1996). How interchangeable are different estimators of effect size? Journal of

Consulting and Clinical Psychology, 64, 1316–1325.

Wilkinson, L., & APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals:

Guidelines and explanations. American Psychologist, 54, 594–604.
