Week 3
Cohen's d Statistic
In: Encyclopedia of Research Design
By: Shayne B. Piasta & Laura M. Justice
Edited by: Neil J. Salkind
Book Title: Encyclopedia of Research Design
Chapter Title: "Cohen's d Statistic"
Pub. Date: 2012
Access Date: May 24, 2020
Publishing Company: SAGE Publications, Inc.
City: Thousand Oaks
Print ISBN: 9781412961271
Online ISBN: 9781412961288
DOI: https://dx.doi.org/10.4135/9781412961288
Print pages: 181-185
© 2010 SAGE Publications, Inc. All Rights Reserved.
Cohen's d statistic is a type of effect size. An effect size is a specific numerical nonzero value used to
represent the extent to which a null hypothesis is false. As an effect size, Cohen's d is typically used to
represent the magnitude of differences between two (or more) groups on a given variable, with larger values
representing a greater differentiation between the two groups on that variable. When comparing means
in a scientific study, the reporting of an effect size such as Cohen's d is considered complementary to
the reporting of results from a test of statistical significance. Whereas the test of statistical significance is
used to suggest whether a null hypothesis is true (no difference exists between Populations A and B for a
specific phenomenon) or false (a difference exists between Populations A and B for a specific phenomenon),
the calculation of an effect size estimate is used to represent the degree of difference between the two
populations in those instances for which the null hypothesis was deemed false. In cases for which the null
hypothesis is false (i.e., rejected), the results of a test of statistical significance imply that reliable differences
exist between two populations on the phenomenon of interest, but test outcomes do not provide any value
regarding the extent of that difference. The calculation of Cohen's d and its interpretation provide a way to
estimate the actual size of observed differences between two groups, namely, whether the differences are
small, medium, or large.
Calculation of Cohen's d Statistic
Cohen's d statistic is typically used to estimate between-subjects effects for grouped data, consistent with
an analysis of variance framework. Often, it is employed within experimental contexts to estimate the
differential impact of the experimental manipulation across conditions on the dependent variable of interest.
The dependent variable must represent continuous data; other effect size measures (e.g., Pearson family of
correlation coefficients, odds ratios) are appropriate for non-continuous data.
General Formulas
Cohen's d statistic represents the standardized mean differences between groups. Similar to other means of
standardization such as z scoring, the effect size is expressed in standard score units. In general, Cohen's d
is defined as

$$d = \frac{\mu_1 - \mu_2}{\sigma_\varepsilon} \tag{1}$$

where d represents the effect size, $\mu_1$ and $\mu_2$ represent the two population means, and $\sigma_\varepsilon$ represents the
pooled within-group population standard deviation. In practice, these population parameters are typically
unknown and estimated by means of sample statistics:

$$d = \frac{\bar{X}_1 - \bar{X}_2}{s_p} \tag{2}$$

The population means are replaced with sample means ($\bar{X}_1$ and $\bar{X}_2$), and the
population standard deviation is replaced with $s_p$, the pooled standard deviation from the sample. The pooled
standard deviation is derived by weighting the variance around each sample mean by the respective sample
size.
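As a minimal sketch of Equation 2 computed from raw scores, the following Python function may help make the definition concrete. The function name, the hypothetical scores, and the choice of pooled standard deviation (the common two-group form with n1 + n2 - 2 in the denominator) are assumptions made for illustration, not part of the original entry.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group1, group2):
    """Standardized mean difference between two independent samples."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance: each sample variance is weighted by its degrees of
    # freedom (assumed here: the n1 + n2 - 2 denominator).
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)


# Hypothetical scores, for illustration only
treatment = [12, 14, 15, 17, 17]
control = [10, 11, 13, 14, 17]
print(round(cohens_d(treatment, control), 2))
```

Swapping the order of the arguments flips the sign of the result but not its magnitude.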
Calculation of the Pooled Standard Deviation
Although computation of the difference in sample means is straightforward in Equation 2, the pooled standard
deviation may be calculated in a number of ways. Consistent with the traditional definition of a standard
deviation, this statistic may be computed as

$$s_p = \sqrt{\frac{\sum_j (n_j - 1)\, s_j^2}{\sum_j n_j}} \tag{3}$$

where $n_j$ represents the sample sizes for the j groups and $s_j^2$ represents the variance (i.e., squared standard
deviation) of the j samples. Often, however, the pooled sample standard deviation is corrected for bias in its
estimation of the corresponding population parameter, $\sigma_\varepsilon$. Equation 4 denotes this correction of bias in the
sample statistic (with the resulting effect size often referred to as Hedges's g):

$$s_p = \sqrt{\frac{\sum_j (n_j - 1)\, s_j^2}{\sum_j (n_j - 1)}} \tag{4}$$

When simply computing the pooled standard deviation across two groups, this formula may be reexpressed
in a more common format. This formula is suitable for data analyzed with a two-level analysis of variance,
such as a treatment–control contrast:

$$s_p = \sqrt{\frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}} \tag{5}$$

The formula may be further reduced to the average of the sample variances when sample sizes are equal
(with J denoting the number of groups):

$$s_p = \sqrt{\frac{\sum_j s_j^2}{J}} \tag{6}$$

or

$$s_p = \sqrt{\frac{s_1^2 + s_2^2}{2}} \tag{7}$$

in the case of two groups.
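The pooling options described above can be compared side by side. The sketch below assumes that the sample variances were computed with n - 1, that the uncorrected form divides the pooled sum of squares by the total sample size, and that the bias-corrected form divides by the total degrees of freedom. The function names and summary values are hypothetical.

```python
from math import sqrt


def pooled_sd(ns, variances, correct_bias=True):
    """Pool sample variances (each computed with n - 1) across j groups.

    correct_bias=False divides by the total sample size (an assumed reading
    of the uncorrected form); correct_bias=True divides by the total degrees
    of freedom, the form associated with Hedges's g.
    """
    if not ns or len(ns) != len(variances):
        raise ValueError("ns and variances must be non-empty and equal in length")
    sum_of_squares = sum((n - 1) * v for n, v in zip(ns, variances))
    denominator = sum(n - 1 for n in ns) if correct_bias else sum(ns)
    return sqrt(sum_of_squares / denominator)


def average_variance_sd(variances):
    """Equal-n shortcut: square root of the mean of the sample variances."""
    return sqrt(sum(variances) / len(variances))


# Hypothetical summary statistics for two groups of equal size
ns = [20, 20]
variances = [9.0, 16.0]
print(pooled_sd(ns, variances, correct_bias=False))
print(pooled_sd(ns, variances, correct_bias=True))
print(average_variance_sd(variances))
```

With equal group sizes, the bias-corrected value and the averaged-variance shortcut coincide, while the uncorrected value is slightly smaller, so the resulting effect size is slightly larger.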
Other means of specifying the denominator for Equation 2 are varied. Some formulas use the average
standard deviation across groups. This procedure gives each group equal weight, disregarding differences in
sample size when n is unequal, and may or may not correct for sample bias in estimation of the
population standard deviation. Further formulas employ the standard deviation of the control or comparison
condition (an effect size referred to as Glass's Δ). This method is particularly suited when the introduction
of treatment or other experimental manipulation leads to large changes in group variance. Finally, more
complex formulas are appropriate when calculating Cohen's d from data involving cluster randomized or
nested research designs. The complication partially arises because of the three available variance statistics
from which the pooled standard deviation may be computed: the within-cluster variance, the between-cluster
variance, or the total variance (combined between- and within-cluster variance). Researchers must select the
variance statistic appropriate for the inferences they wish to draw.
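To illustrate the control-group denominator mentioned above, the following sketch computes Glass's Δ for a case in which the treatment inflates group variance. The function name and the scores are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import mean, stdev


def glass_delta(treatment, control):
    """Mean difference standardized by the control group's standard deviation."""
    return (mean(treatment) - mean(control)) / stdev(control)


# Hypothetical scores: the treatment group is far more variable than the control
treatment_scores = [8, 12, 16, 20, 24]
control_scores = [9, 10, 11, 12, 13]
print(round(glass_delta(treatment_scores, control_scores), 2))
```

Because only the control group's standard deviation enters the denominator, the estimate is unaffected by any change in spread that the manipulation itself produces.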
Expansion beyond Two-Group Comparisons: Contrasts and Repeated
Measures
Cohen's d always reflects the standardized difference between two means. The means, however, are not
restricted to comparisons of two independent groups. Cohen's d may also be calculated in multigroup designs
when a specific contrast is of interest. For example, the average effect across two alternative treatments may
be compared with a control. The value of the contrast becomes the numerator as specified in Equation 2, and
the pooled standard deviation is expanded to include all j groups specified in the contrast (Equation 4).
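A contrast-based effect size of this kind might be sketched as follows. The contrast weights (one-half for each treatment, minus one for the control), the function name, and the data are illustrative assumptions. The pooled standard deviation uses the bias-corrected form across all groups passed in.

```python
from math import sqrt
from statistics import mean, variance


def contrast_d(groups, weights):
    """Standardized contrast: weighted sum of group means over the pooled SD."""
    if len(groups) != len(weights):
        raise ValueError("one weight is needed per group")
    if abs(sum(weights)) > 1e-9:
        raise ValueError("contrast weights should sum to zero")
    contrast = sum(w * mean(g) for w, g in zip(weights, groups))
    # Pooled SD across all groups in the contrast (degrees-of-freedom weighting)
    sum_of_squares = sum((len(g) - 1) * variance(g) for g in groups)
    df_within = sum(len(g) - 1 for g in groups)
    return contrast / sqrt(sum_of_squares / df_within)


# Hypothetical data: two alternative treatments versus one control
treatment_a = [14, 16, 18]
treatment_b = [12, 14, 16]
control_group = [9, 11, 13]
d = contrast_d([treatment_a, treatment_b, control_group], [0.5, 0.5, -1.0])
print(d)
```

Here the average of the two treatment means (15) exceeds the control mean (11) by 4 points, and the pooled standard deviation is 2, so the standardized contrast is 2.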
A similar extension of Equations 2 and 4 may be applied to repeated measures analyses. The difference
between two repeated measures is divided by the pooled standard deviation across the j repeated measures.
The same formula may also be applied to simple contrasts within repeated measures designs, as well as
interaction contrasts in mixed (between- and within-subjects factors) or split-plot designs. Note, however, that
the simple application of the pooled standard deviation formula does not take into account the correlation
between repeated measures. Researchers disagree as to whether these correlations ought to contribute to
effect size computation; one method of determining Cohen's d while accounting for the correlated nature of
repeated measures involves computing d from a paired t test.
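One way to derive d from a paired t test is sketched below. It assumes the commonly used conversion d = t·sqrt(2(1 - r)/n), where n is the number of pairs and r is the correlation between the two measures. This specific formula, the function names, and the data are assumptions for illustration, and other conversions exist.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def paired_d(first, second):
    """d from a paired t statistic, adjusted for the correlation between measures."""
    n = len(first)
    diffs = [b - a for a, b in zip(first, second)]
    t = mean(diffs) / (stdev(diffs) / sqrt(n))  # paired t statistic
    r = pearson_r(first, second)
    return t * sqrt(2 * (1 - r) / n)


# Hypothetical pretest and posttest scores for five participants
pretest = [10, 12, 14, 16, 18]
posttest = [13, 15, 19, 17, 21]
print(round(paired_d(pretest, posttest), 3))
```

When the two measures have equal variances, as in this example, the result equals the mean difference divided by the common standard deviation. Without the correction, a high correlation between measures would inflate the effect size.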
Additional Means of Calculation
Beyond the formulas presented above, Cohen's d may be derived from other statistics, including the Pearson
family of correlation coefficients (r), t tests, and F tests. Derivations from r are particularly useful, allowing
for translation among various effect size indices. Derivations from other statistics are often necessary when
raw data to compute Cohen's d are unavailable, such as when conducting a meta-analysis of published data.
When d is derived as in Equation 3, the following formulas apply:

$$d = \frac{2r}{\sqrt{1 - r^2}} \tag{8}$$

$$d = \frac{t\,(n_1 + n_2)}{\sqrt{df_{\text{error}}}\,\sqrt{n_1 n_2}} \tag{9}$$

and

$$d = \frac{\sqrt{F}\,(n_1 + n_2)}{\sqrt{df_{\text{error}}}\,\sqrt{n_1 n_2}} \tag{10}$$

Note that Equation 10 applies only for F tests with 1 degree of freedom (df) in the numerator; further formulas
apply when df > 1.
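When d is derived as in Equation 4, the following formulas ought to be used:

$$d = \frac{2r}{\sqrt{1 - r^2}} \sqrt{\frac{df_{\text{error}}}{n_1 + n_2}} \tag{11}$$

$$d = t \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \tag{12}$$

$$d = \sqrt{F \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \tag{13}$$

Again, Equation 13 applies only to instances in which the numerator df = 1.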
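When only test statistics are available, as in a meta-analysis of published results, conversions of this kind can be scripted. The sketch below uses standard conversions for two independent groups. The function names and the `bias_corrected` flag are hypothetical, and the r-based conversion assumes equal group sizes and the uncorrected pooled standard deviation.

```python
from math import sqrt


def d_from_t(t, n1, n2, bias_corrected=True):
    """Convert an independent-samples t statistic to a standardized mean difference."""
    if bias_corrected:
        # Pooled SD with n1 + n2 - 2 in its denominator
        return t * sqrt(1 / n1 + 1 / n2)
    # Pooled SD with n1 + n2 in its denominator
    df_error = n1 + n2 - 2
    return t * (n1 + n2) / (sqrt(df_error) * sqrt(n1 * n2))


def d_from_f(f, n1, n2, bias_corrected=True):
    """Valid only with 1 numerator df, where F = t**2; the sign of the effect is lost."""
    return d_from_t(sqrt(f), n1, n2, bias_corrected)


def d_from_r(r):
    """Assumes equal group sizes and the uncorrected pooled SD."""
    return 2 * r / sqrt(1 - r ** 2)


# Hypothetical published result: t = 2.5 with 20 participants per group
print(round(d_from_t(2.5, 20, 20), 3))
print(round(d_from_t(2.5, 20, 20, bias_corrected=False), 3))
```

Because F loses the direction of the difference, the sign of d has to be recovered from the reported group means.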
These formulas must be corrected for the correlation (r) between dependent variables in repeated measures
designs. For example, Equation 12 is corrected as follows (with n denoting the number of paired observations):

$$d = t \sqrt{\frac{2(1 - r)}{n}} \tag{14}$$
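Finally, conversions between effect sizes computed with Equations 3 and 4 may be easily accomplished:

$$d_{\text{Eq. 3}} = d_{\text{Eq. 4}} \sqrt{\frac{n_1 + n_2}{df_{\text{error}}}} \tag{15}$$

and

$$d_{\text{Eq. 4}} = d_{\text{Eq. 3}} \sqrt{\frac{df_{\text{error}}}{n_1 + n_2}} \tag{16}$$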
Variance and Confidence Intervals
The estimated variance of Cohen's d depends on how the statistic was originally computed. When sample
bias in the estimation of the population pooled standard deviation remains uncorrected (Equation 3), the
variance is computed in the following manner:

$$s_d^2 = \left[ \frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2 - 2)} \right] \left[ \frac{n_1 + n_2}{n_1 + n_2 - 2} \right] \tag{17}$$
A simplified formula is employed when sample bias is corrected as in Equation 4:

$$s_d^2 = \frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2)} \tag{18}$$
Once calculated, the effect size variance may be used to compute a confidence interval (CI) for the statistic
to determine statistical significance:

$$CI = d \pm z\, s_d \tag{19}$$
The z in the formula corresponds to the z-score value on the normal distribution corresponding to the desired
probability level (e.g., 1.96 for a 95% CI). Variances and CIs may also be obtained through bootstrapping
methods.
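The interval computation can be sketched in a few lines. The variance formula used here, (n1 + n2)/(n1·n2) + d²/(2(n1 + n2)), is the widely used large-sample approximation for the bias-corrected case and is stated as an assumption. The function names and input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def d_variance(d, n1, n2):
    """Approximate sampling variance of d (assumed large-sample formula)."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def d_confidence_interval(d, n1, n2, level=0.95):
    """Normal-theory confidence interval: d plus or minus z times its standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    half_width = z * sqrt(d_variance(d, n1, n2))
    return d - half_width, d + half_width


# Hypothetical study: d = 0.5 with 50 participants per group
lower, upper = d_confidence_interval(0.5, 50, 50)
print(f"d = 0.50, 95% CI [{lower:.2f}, {upper:.2f}]")
```

For these inputs the interval runs from roughly 0.10 to 0.90. It excludes zero, so the effect would be deemed statistically significant at the .05 level.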
Interpretation
Cohen's d, as a measure of effect size, describes the overlap in the distributions of the compared samples
on the dependent variable of interest. If the two distributions overlap completely, one would expect no mean
difference between them (i.e., $\bar{X}_1 - \bar{X}_2 = 0$). To the extent that the distributions do not overlap, the
difference ought to be greater than zero (assuming $\bar{X}_1 > \bar{X}_2$).
Cohen's d may be interpreted in terms of both statistical significance and magnitude, with the latter the more
common interpretation. Effect sizes are statistically significant when the computed CI does not contain zero.
This implies less than perfect overlap between the distributions of the two groups compared. Moreover, the
significance testing implies that this difference from zero is reliable, or not due to chance (excepting Type
I errors). Although significance testing of effect sizes is often undertaken, interpretation based solely
on statistical significance is not recommended. Statistical significance is reliant not only on the size of the
effect but also on the size of the sample. Thus, even large effects may be deemed unreliable when insufficient
sample sizes are utilized.
Interpretation of Cohen's d based on the magnitude is more common than interpretation based on statistical
significance of the result. The magnitude of Cohen's d indicates the extent of nonoverlap between two
distributions, or the disparity of the mean difference from zero. Larger numeric values of Cohen's d indicate
larger effects or greater differences between the two means. Values may be positive or negative, although
the sign merely indicates whether the first or second mean in the numerator was of greater magnitude (see
Equation 2). Typically, researchers choose to subtract the smaller mean from the larger, resulting in a positive
effect size. As a standardized measure of effect, the numeric value of Cohen's d is interpreted in standard
deviation units. Thus, an effect size of d = 0.5 indicates that two group means are separated by one-half
standard deviation or that one group shows a one-half standard deviation advantage over the other.
The magnitude of effect sizes is often described nominally as well as numerically. Jacob Cohen defined
effects as small (d = 0.2), medium (d = 0.5), or large (d = 0.8). These rules of thumb were derived
after surveying the behavioral sciences literature, which included studies in various disciplines involving
diverse populations, interventions or content under study, and research designs. Cohen, in proposing these
benchmarks in a 1988 text, explicitly noted that they are arbitrary and thus ought not be viewed as absolute.
However, as occurred with use of .05 as an absolute criterion for establishing statistical significance, Cohen's
benchmarks are oftentimes interpreted as absolutes, and as a result, they have been criticized in recent
years as outdated, atheoretical, and inherently nonmeaningful. These criticisms are especially prevalent in
applied fields in which medium-to-large effects prove difficult to obtain and smaller effects are often of great
importance. The small effect of d = 0.07, for instance, was sufficient for physicians to begin recommending
aspirin as an effective method of preventing heart attacks. Similar small effects are often celebrated in
intervention and educational research, in which effect sizes of d = 0.3 to d = 0.4 are the norm. In these fields,
the practical importance of reliable effects is often weighed more heavily than simple magnitude, as may be
the case when adoption of a relatively simple educational approach (e.g., discussing vs. not discussing novel
vocabulary words when reading storybooks to children) results in effect sizes of d = 0.25 (consistent with
increases of one-fourth of a standard deviation unit on a standardized measure of vocabulary knowledge).
Critics of Cohen's benchmarks assert that such practical or substantive significance is an important
consideration beyond the magnitude and statistical significance of effects. Interpretation of effect sizes
requires an understanding of the context in which the effects are derived, including the particular
manipulation, population, and dependent measure(s) under study. Various alternatives to Cohen's rules of
thumb have been proposed. These include comparisons with effect sizes based on (a) normative data
concerning the typical growth, change, or differences between groups prior to experimental manipulation; (b)
those obtained in similar studies and available in the previous literature; (c) the gain necessary to attain an a
priori criterion; and (d) cost-benefit analyses.
Cohen's d in Meta-Analyses
Cohen's d, as a measure of effect size, is often used in individual studies to report and interpret the
magnitude of between-group differences. It is also a common tool used in meta-analyses to aggregate
effects across different studies, particularly in meta-analyses involving study of between-group differences,
such as treatment studies. A meta-analysis is a statistical synthesis of results from independent research
studies (selected for inclusion based on a set of predefined commonalities), and the unit of analysis in the
meta-analysis is the data used for the independent hypothesis test, including sample means and standard
deviations, extracted from each of the independent studies. The statistical analyses used in the meta-
analysis typically involve (a) calculating the Cohen's d effect size (standardized mean difference) on data
available within each independent study on the target variable(s) of interest and (b) combining these individual
summary values to create pooled estimates by means of any one of a variety of approaches (e.g., Rebecca
DerSimonian and Nan Laird's random effects model, which takes into account variations among studies on
certain parameters). Therefore, the methods of the meta-analysis may rely on use of Cohen's d as a way
to extract and combine data from individual studies. In such meta-analyses, the reporting of results involves
providing average d values (and CIs) as aggregated across studies.
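The two steps described above, computing d within each study and then pooling, can be sketched with a DerSimonian and Laird random-effects estimator. This is a minimal illustration that assumes each study supplies a d value and its sampling variance. The function name and the study values are hypothetical, and dedicated meta-analysis software should be preferred in practice.

```python
from math import sqrt


def dersimonian_laird(effects, variances, z=1.96):
    """Pool study-level effect sizes under a random-effects model.

    Returns (pooled effect, (CI lower, CI upper), tau squared).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies, one variance per effect")
    # Fixed-effect (inverse-variance) weights and mean
    w = [1 / v for v in variances]
    fixed_mean = sum(wi * d for wi, d in zip(w, effects)) / sum(w)
    # Cochran's Q measures heterogeneity among the studies
    q = sum(wi * (d - fixed_mean) ** 2 for wi, d in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Method-of-moments estimate of between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau squared to each study's variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * d for wi, d in zip(w_star, effects)) / sum(w_star)
    se = sqrt(1 / sum(w_star))
    return pooled, (pooled - z * se, pooled + z * se), tau2


# Hypothetical d values and sampling variances from four studies
study_effects = [0.10, 0.30, 0.35, 0.65]
study_variances = [0.03, 0.04, 0.05, 0.02]
pooled_d, ci, tau_squared = dersimonian_laird(study_effects, study_variances)
print(round(pooled_d, 3), tuple(round(x, 3) for x in ci), round(tau_squared, 4))
```

When the studies agree closely, the estimated between-study variance falls to zero and the result reduces to the ordinary inverse-variance weighted average.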
In meta-analyses of treatment outcomes in the social and behavioral sciences, for instance, effect estimates
may compare outcomes attributable to a given treatment (Treatment X) as extracted from and pooled across
multiple studies in relation to an alternative treatment (Treatment Y) for Outcome Z using Cohen's d (e.g., d
= 0.21, CI = 0.06, 1.03). It is important to note that the meaningfulness of this result, in that Treatment X is,
on average, associated with an improvement of about one-fifth of a standard deviation unit for Outcome Z
relative to Treatment Y, must be interpreted in reference to many factors to determine the actual significance
of this outcome. Researchers must, at the least, consider whether the one-fifth of a standard deviation unit
improvement in the outcome attributable to Treatment X has any practical significance.
Shayne B. Piasta and Laura M. Justice
http://dx.doi.org/10.4135/9781412961288.n58
See also
• Analysis of Variance (ANOVA)
• Effect Size, Measures of
• Mean Comparisons
• Meta-Analysis
• Statistical Power Analysis for the Behavioral Sciences
Further Readings
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Mahwah, NJ: Lawrence
Erlbaum.
Cooper, H., & Hedges, L. V. (Eds.). (1994). The handbook of research synthesis. New York: Russell Sage
Foundation.
Hedges, L. V. (2007). Effect sizes in cluster-randomized designs. Journal of Educational and Behavioral
Statistics, 32, 341–370.
Hill, C. J., Bloom, H. S., Black, A. R., & Lipsey, M. W. (2007, July). Empirical benchmarks for interpreting
effect sizes in research. New York: MDRC.
Ray, J. W., & Shadish, W. R. (1996). How interchangeable are different estimators of effect size? Journal of
Consulting and Clinical Psychology, 64, 1316–1325.
Wilkinson, L., & APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals:
Guidelines and explanations. American Psychologist, 54, 594–604.