Encyclopedia of Research Design
Cohen's d Statistic

Contributors: Shayne B. Piasta & Laura M. Justice
Editors: Neil J. Salkind
Book Title: Encyclopedia of Research Design
Chapter Title: "Cohen's d Statistic"
Publishing Company: SAGE Publications, Inc.
City: Thousand Oaks
Print ISBN: 9781412961271
Online ISBN: 9781412961288
DOI: http://dx.doi.org/10.4135/9781412961288.n58
Print pages: 181-186
©2010 SAGE Publications, Inc. All Rights Reserved.

Cohen's d statistic is a type of effect size. An effect size is a specific numerical nonzero value used to represent the extent to which a null hypothesis is false. As an effect size, Cohen's d is typically used to represent the magnitude of differences between two (or more) groups on a given variable, with larger values representing a greater differentiation between the two groups on that variable. When comparing means in a scientific study, the reporting of an effect size such as Cohen's d is considered complementary to the reporting of results from a test of statistical significance. Whereas the test of statistical significance is used to suggest whether a null hypothesis is true (no difference exists between Populations A and B for a specific phenomenon) or false (a difference exists between Populations A and B for a specific phenomenon), the calculation of an effect size estimate is used to represent the degree of difference between the two populations in those instances for which the null hypothesis was deemed false.
In cases for which the null hypothesis is false (i.e., rejected), the results of a test of statistical significance imply that reliable differences exist between two populations on the phenomenon of interest, but test outcomes do not provide any value regarding the extent of that difference. The calculation of Cohen's d and its interpretation provide a way to estimate the actual size of observed differences between two groups, namely, whether the differences are small, medium, or large.

Calculation of Cohen's d Statistic

Cohen's d statistic is typically used to estimate between-subjects effects for grouped data, consistent with an analysis of variance framework. Often, it is employed within experimental contexts to estimate the differential impact of the experimental manipulation across conditions on the dependent variable of interest. The dependent variable must represent continuous data; other effect size measures (e.g., Pearson family of correlation coefficients, odds ratios) are appropriate for non-continuous data.

General Formulas

Cohen's d statistic represents the standardized mean difference between groups. Similar to other means of standardization such as z scoring, the effect size is expressed in standard score units. In general, Cohen's d is defined as

$$d = \frac{\mu_1 - \mu_2}{\sigma_\varepsilon} \tag{1}$$

where d represents the effect size, $\mu_1$ and $\mu_2$ represent the two population means, and $\sigma_\varepsilon$ represents the pooled within-group population standard deviation. In practice, these population parameters are typically unknown and estimated by means of sample statistics:

$$d = \frac{\bar{X}_1 - \bar{X}_2}{s_p} \tag{2}$$

The population means are replaced with sample means ($\bar{X}_j$) and the population standard deviation is replaced with $s_p$, the pooled standard deviation from the sample. The pooled standard deviation is derived by weighting the variance around each sample mean by the respective sample size.

Calculation of the Pooled Standard Deviation

Although computation of the difference in sample means is straightforward in Equation 2, the pooled standard deviation may be calculated in a number of ways. Consistent with the traditional definition of a standard deviation, this statistic may be computed as

$$s_p = \sqrt{\frac{\sum_{j=1}^{J} (n_j - 1)\, s_j^2}{\sum_{j=1}^{J} n_j}} \tag{3}$$

where $n_j$ represents the sample sizes for the J groups and $s_j^2$ represents the variance (i.e., squared standard deviation) of the J samples. Often, however, the pooled sample standard deviation is corrected for bias in its estimation of the corresponding population parameter, $\sigma_\varepsilon$. Equation 4 denotes this correction of bias in the sample statistic (with the resulting effect size often referred to as Hedges's g):

$$s_p = \sqrt{\frac{\sum_{j=1}^{J} (n_j - 1)\, s_j^2}{\sum_{j=1}^{J} (n_j - 1)}} \tag{4}$$

When simply computing the pooled standard deviation across two groups, this formula may be reexpressed in a more common format. This formula is suitable for data analyzed with a two-group analysis of variance, such as a treatment-control contrast:

$$s_p = \sqrt{\frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}} \tag{5}$$

The formula may be further reduced to the average of the sample variances when sample sizes are equal:

$$s_p = \sqrt{\frac{\sum_{j=1}^{J} s_j^2}{J}} \tag{6}$$

or

$$s_p = \sqrt{\frac{s_1^2 + s_2^2}{2}} \tag{7}$$

in the case of two groups.

Other means of specifying the denominator for Equation 2 are varied. Some formulas use the average standard deviation across groups. This procedure disregards differences in sample size in cases of unequal n when one is weighting sample variances and may or may not correct for sample bias in estimation of the population standard deviation. Further formulas employ the standard deviation of the control or comparison condition (an effect size referred to as Glass's Δ).
This method is particularly suitable when the introduction of treatment or other experimental manipulation leads to large changes in group variance. Finally, more complex formulas are appropriate when calculating Cohen's d from data involving cluster randomized or nested research designs. The complication partially arises because of the three available variance statistics from which the pooled standard deviation may be computed: the within-cluster variance, the between-cluster variance, or the total variance (combined between- and within-cluster variance). Researchers must select the variance statistic appropriate for the inferences they wish to draw.

Expansion Beyond Two-Group Comparisons: Contrasts and Repeated Measures

Cohen's d always reflects the standardized difference between two means. The means, however, are not restricted to comparisons of two independent groups. Cohen's d may also be calculated in multigroup designs when a specific contrast is of interest. For example, the average effect across two alternative treatments may be compared with a control. The value of the contrast becomes the numerator as specified in Equation 2, and the pooled standard deviation is expanded to include all j groups specified in the contrast (Equation 4).

A similar extension of Equations 2 and 4 may be applied to repeated measures analyses. The difference between two repeated measures is divided by the pooled standard deviation across the j repeated measures. The same formula may also be applied to simple contrasts within repeated measures designs, as well as interaction contrasts in mixed (between- and within-subjects factors) or split-plot designs. Note, however, that the simple application of the pooled standard deviation formula does not take into account the correlation between repeated measures.
Researchers disagree as to whether these correlations ought to contribute to effect size computation; one method of determining Cohen's d while accounting for the correlated nature of repeated measures involves computing d from a paired t test.

Additional Means of Calculation

Beyond the formulas presented above, Cohen's d may be derived from other statistics, including the Pearson family of correlation coefficients (r), t tests, and F tests. Derivations from r are particularly useful, allowing for translation among various effect size indices. Derivations from other statistics are often necessary when raw data to compute Cohen's d are unavailable, such as when conducting a meta-analysis of published data. When d is derived as in Equation 3, the following formulas apply:

$$d = \frac{r\,(n_1 + n_2)}{\sqrt{1 - r^2}\,\sqrt{n_1 n_2}} \tag{8}$$

(which reduces to $2r / \sqrt{1 - r^2}$ when the two groups are of equal size),

$$d = \frac{t\,(n_1 + n_2)}{\sqrt{n_1 + n_2 - 2}\,\sqrt{n_1 n_2}} \tag{9}$$

and

$$d = \frac{\sqrt{F}\,(n_1 + n_2)}{\sqrt{n_1 + n_2 - 2}\,\sqrt{n_1 n_2}} \tag{10}$$

Note that Equation 10 applies only for F tests with 1 degree of freedom (df) in the numerator; further formulas apply when df > 1. When d is derived as in Equation 4, the following formulas ought to be used:

$$d = \frac{r}{\sqrt{1 - r^2}}\,\sqrt{\frac{(n_1 + n_2 - 2)(n_1 + n_2)}{n_1 n_2}} \tag{11}$$

$$d = t\,\sqrt{\frac{n_1 + n_2}{n_1 n_2}} \tag{12}$$

$$d = \sqrt{\frac{F\,(n_1 + n_2)}{n_1 n_2}} \tag{13}$$

Again, Equation 13 applies only to instances in which the numerator df = 1. These formulas must be corrected for the correlation (r) between dependent variables in repeated measures designs. For example, Equation 12 is corrected as follows, where t is the paired t statistic and n is the number of pairs:

$$d = t\,\sqrt{\frac{2\,(1 - r)}{n}} \tag{14}$$

Finally, conversions between effect sizes computed with Equations 3 and 4 may be easily accomplished:

$$d_{\text{Eq. 3}} = d_{\text{Eq. 4}}\,\sqrt{\frac{n_1 + n_2}{n_1 + n_2 - 2}} \tag{15}$$

and

$$d_{\text{Eq. 4}} = d_{\text{Eq. 3}}\,\sqrt{\frac{n_1 + n_2 - 2}{n_1 + n_2}} \tag{16}$$

Variance and Confidence Intervals

The estimated variance of Cohen's d depends on how the statistic was originally computed.
When sample bias in the estimation of the population pooled standard deviation remains uncorrected (Equation 3), the variance is computed in the following manner:

[Equation 17]

A simplified formula is employed when sample bias is corrected as in Equation 4:

[Equation 18]

Once calculated, the effect size variance may be used to compute a confidence interval (CI) for the statistic to determine statistical significance:

$$\mathrm{CI} = d \pm z\, s_d \tag{19}$$

The z in the formula corresponds to the z-score value on the normal distribution corresponding to the desired probability level (e.g., 1.96 for a 95% CI), and $s_d$ is the square root of the effect size variance. Variances and CIs may also be obtained through bootstrapping methods.

Interpretation

Cohen's d, as a measure of effect size, describes the overlap in the distributions of the compared samples on the dependent variable of interest. If the two distributions overlap completely, one would expect no mean difference between them. To the extent that the distributions do not overlap, the difference ought to be greater than zero (assuming $\bar{X}_1 > \bar{X}_2$).

Cohen's d may be interpreted in terms of both statistical significance and magnitude, with the latter the more common interpretation. Effect sizes are statistically significant when the computed CI does not contain zero. This implies less than perfect overlap between the distributions of the two groups compared. Moreover, the significance testing implies that this difference from zero is reliable, or not due to chance (excepting Type I errors). Although significance testing of effect sizes is often undertaken, interpretation based solely on statistical significance is not recommended. Statistical significance is reliant not only on the size of the effect but also on the size of the sample. Thus, even large effects may be deemed unreliable when insufficient sample sizes are utilized.
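
As a concrete illustration of the calculation and the CI-based significance check described above, the following Python sketch computes Cohen's d for two independent groups and a normal-theory confidence interval. It is a minimal sketch under stated assumptions, not the entry's own procedure: the function names and the example scores are hypothetical, the pooled standard deviation uses the common two-group form with an n1 + n2 - 2 denominator, and the variance is a widely used large-sample approximation, (n1 + n2)/(n1 n2) + d^2 / (2(n1 + n2)), which may differ from the variance formulas the entry intends.

```python
import math
from statistics import mean, variance


def pooled_sd(x1, x2):
    """Two-group pooled SD with an n1 + n2 - 2 denominator (assumed form)."""
    n1, n2 = len(x1), len(x2)
    # statistics.variance is the sample variance (n - 1 denominator).
    sum_sq = (n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)
    return math.sqrt(sum_sq / (n1 + n2 - 2))


def cohens_d(x1, x2):
    """Standardized mean difference: (mean1 - mean2) / pooled SD."""
    return (mean(x1) - mean(x2)) / pooled_sd(x1, x2)


def d_confidence_interval(d, n1, n2, z=1.96):
    """CI computed as d +/- z * sqrt(variance of d).

    Assumption: the variance is approximated by
    (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)).
    """
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    half_width = z * math.sqrt(var_d)
    return d - half_width, d + half_width


# Hypothetical scores, for illustration only.
treatment = [12, 14, 11, 15, 13, 16, 14, 12]
control = [10, 11, 9, 12, 10, 13, 11, 10]

d = cohens_d(treatment, control)
low, high = d_confidence_interval(d, len(treatment), len(control))
print(f"d = {d:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With these made-up scores the interval lies entirely above zero, so the effect would be deemed statistically significant under the rule stated above. The interval is nonetheless wide because each group contains only eight observations, which illustrates how sample size, and not only the size of the effect, drives the significance decision.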
Interpretation of Cohen's d based on the magnitude is more common than interpretation based on statistical significance of the result. The magnitude of Cohen's d indicates the extent of nonoverlap between two distributions, or the disparity of the mean difference from zero. Larger numeric values of Cohen's d indicate larger effects or greater differences between the two means. Values may be positive or negative, although the sign merely indicates whether the first or second mean in the numerator was of greater magnitude (see Equation 2). Typically, researchers choose to subtract the smaller mean from the larger, resulting in a positive effect size. As a standardized measure of effect, the numeric value of Cohen's d is interpreted in standard deviation units. Thus, an effect size of d = 0.5 indicates that two group means are separated by one-half standard deviation or that one group shows a one-half standard deviation advantage over the other.

The magnitude of effect sizes is often described nominally as well as numerically. Jacob Cohen defined effects as small (d = 0.2), medium (d = 0.5), or large (d = 0.8). These rules of thumb were derived after surveying the behavioral sciences literature, which included studies in various disciplines involving diverse populations, interventions or content under study, and research designs. Cohen, in proposing these benchmarks in a 1988 text, explicitly noted that they are arbitrary and thus ought not be viewed as absolute. However, as occurred with use of .05 as an absolute criterion for establishing statistical significance, Cohen's benchmarks are oftentimes interpreted as absolutes, and as a result, they have been criticized in recent years as outdated, atheoretical, and inherently nonmeaningful.
These criticisms are especially prevalent in applied fields in which medium-to-large effects prove difficult to obtain and smaller effects are often of great importance. The small effect of d = 0.07, for instance, was sufficient for physicians to begin recommending aspirin as an effective method of preventing heart attacks. Similar small effects are often celebrated in intervention and educational research, in which effect sizes of d = 0.3 to d = 0.4 are the norm. In these fields, the practical importance of reliable effects is often weighed more heavily than simple magnitude, as may be the case when adoption of a relatively simple educational approach (e.g., discussing vs. not discussing novel vocabulary words when reading storybooks to children) results in effect sizes of d = 0.25 (consistent with increases of one-fourth of a standard deviation unit on a standardized measure of vocabulary knowledge).

Critics of Cohen's benchmarks assert that such practical or substantive significance is an important consideration beyond the magnitude and statistical significance of effects. Interpretation of effect sizes requires an understanding of the context in which the effects are derived, including the particular manipulation, population, and dependent measure(s) under study. Various alternatives to Cohen's rules of thumb have been proposed. These include comparisons with effect sizes based on (a) normative data concerning the typical growth, change, or differences between groups prior to experimental manipulation; (b) those obtained in similar studies and available in the previous literature; (c) the gain necessary to attain an a priori criterion; and (d) cost-benefit analyses.

Cohen's d in Meta-Analyses

Cohen's d, as a measure of effect size, is often used in individual studies to report and interpret the magnitude of between-group differences.
It is also a common tool used in meta-analyses to aggregate effects across different studies, particularly in meta-analyses involving study of between-group differences, such as treatment studies. A meta-analysis is a statistical synthesis of results from independent research studies (selected for inclusion based on a set of predefined commonalities), and the unit of analysis in the meta-analysis is the data used for the independent hypothesis test, including sample means and standard deviations, extracted from each of the independent studies. The statistical analyses used in the meta-analysis typically involve (a) calculating the Cohen's d effect size (standardized mean difference) on data available within each independent study on the target variable(s) of interest and (b) combining these individual summary values to create pooled estimates by means of any one of a variety of approaches (e.g., Rebecca DerSimonian and Nan Laird's random effects model, which takes into account variations among studies on certain parameters). Therefore, the methods of the meta-analysis may rely on use of Cohen's d as a way to extract and combine data from individual studies. In such meta-analyses, the reporting of results involves providing average d values (and CIs) as aggregated across studies.

In meta-analyses of treatment outcomes in the social and behavioral sciences, for instance, effect estimates may compare outcomes attributable to a given treatment (Treatment X) as extracted from and pooled across multiple studies in relation to an alternative treatment (Treatment Y) for Outcome Z using Cohen's d (e.g., d = 0.21, CI = 0.06, 1.03). It is important to note that the meaningfulness of this result, in that Treatment X is, on average, associated with an improvement of about one-fifth of a standard deviation unit for Outcome Z relative to Treatment Y, must be interpreted in reference to many factors to determine the actual significance of this outcome. Researchers must, at the least, consider whether the one-fifth of a standard deviation unit improvement in the outcome attributable to Treatment X has any practical significance.

Shayne B. Piasta and Laura M. Justice

See also

• Analysis of Variance (ANOVA)
• Effect Size, Measures of
• Mean Comparisons
• Meta-Analysis
• Statistical Power Analysis for the Behavioral Sciences

Further Readings

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.
Cooper, H., & Hedges, L. V. (1994). The handbook of research synthesis. New York: Russell Sage Foundation.
Hedges, L. V. (2007). Effect sizes in cluster-randomized designs. Journal of Educational & Behavioral Statistics, 32, 341-370. http://dx.doi.org/10.3102/1076998606298043
Hill, C. J., Bloom, H. S., Black, A. R., & Lipsey, M. W. (2007, July). Empirical benchmarks for interpreting effect sizes in research. New York: MDRC.
Ray, J. W., & Shadish, W. R. (1996). How interchangeable are different estimators of effect size. Journal of Consulting & Clinical Psychology, 64, 1316-1325. http://dx.doi.org/10.1037/0022-006X.64.6.1316
Wilkinson, L., & APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604. http://dx.doi.org/10.1037/0003-066X.54.8.594
