Reading summary #2 short assignment
American Academy of Political and Social Science
Meta-Analytic Methods for Criminology Author(s): David B. Wilson Source: The Annals of the American Academy of Political and Social Science, Vol. 578, What Works in Preventing Crime? Systematic Reviews of Experimental and Quasi- Experimental Research (Nov., 2001), pp. 71-89 Published by: Sage Publications, Inc. in association with the American Academy of Political and Social Science Stable URL: http://www.jstor.org/stable/1049868 Accessed: 03-07-2017 21:16 UTC
Sage Publications, Inc., American Academy of Political and Social Science are collaborating with JSTOR to digitize, preserve and extend access to The Annals of the American Academy of Political and Social Science
This content downloaded from 155.33.16.124 on Mon, 03 Jul 2017 21:16:17 UTC All use subject to http://about.jstor.org/terms
ANNALS, AAPSS, 578, November 2001
Meta-Analytic Methods for Criminology
By DAVID B. WILSON
ABSTRACT: Meta-analysis was designed to synthesize empirical relationships across studies, such as the effects of a specific crime prevention intervention on criminal offending behavior. Meta-analysis focuses on the size and direction of effects across studies, examining the consistency of effects and the relationship between study features and observed effects. The findings from meta-analysis not only reveal robust empirical relationships but also identify existing weaknesses in the knowledge base. Furthermore, meta-analytic results can easily be translated into summary statistics useful for informing public policy regarding effective crime prevention efforts.
David B. Wilson is an assistant professor of the administration of justice at George Mason University. His research interests include program evaluation research methodology, meta-analysis, crime and general problem behavior prevention programs, and juvenile delinquency intervention effectiveness.
NOTE: This work was supported by the Jerry Lee Foundation.
THE ANNALS OF THE AMERICAN ACADEMY
Imagine you are given the task of synthesizing what is currently known about the effectiveness of correctional boot camps for reducing future criminal behavior among juvenile and adult offenders. An exhaustive search for all relevant evaluations of boot camp programs compared with more traditional forms of punishment and rehabilitation identifies 29 unique studies. The findings from these studies range from large positive to large negative statistically significant effects. To complicate matters, the studies vary in the evaluation methods used, including the definition of recidivism (for example, rearrest, reconviction, and reinstitutionalization), offender populations, and program characteristics. How will you meaningfully make sense of this array of information?
The statistical methods of meta-analysis were designed specifically to address this situation. Meta-analysis represents a statistical and systematic approach to reviewing research findings across multiple independent studies. As such, meta-analyses are systematic reviews (Petrosino et al. 2001 [this issue]). However, not all criminological intervention research literatures can be successfully meta-analyzed, and thus not all systematic reviews will use the statistical methods of meta-analysis.
The basic idea behind meta-analysis dates back almost 100 years and is simple. Karl Pearson, the developer of the Pearson product-moment correlation coefficient, synthesized the findings from multiple studies of the effectiveness of inoculation for typhoid fever (Pearson 1904). His method involved computing the correlation between inoculation and mortality within each study and then averaging the correlations across studies, producing a composite correlation. By today's standards, this was a meta-analysis, although the term was not introduced until the 1970s (Glass 1976).
The logical framework of meta-analysis is based on the assumption that the averaging of findings across studies will produce a more valid estimate of the effect of interest than that of any individual study. Typically, the finding from any individual study is imprecise due to sampling error. Thus some studies of a specific phenomenon, such as the effectiveness of correctional boot camps, will overestimate and others will underestimate the size of the true effect. Instability in observed effects due to sampling error is an assumption at the core of statistical inference testing, such as a t test between an intervention and comparison condition. Averaging across studies is analogous to averaging across individuals within a single study or averaging across multiple test items.
For a collection of pure replications, the logic behind meta-analysis is indisputable if one accepts the logic and assumptions of the standard statistical practices of the social and medical sciences. Meta-analysis as it is applied in criminology and the other social sciences extends this logic to collections of studies that are conceptual replications, that is, studies that examine the same relationship of interest but differ from one another in other respects, such as the research design or elements of the intervention. Conceptual replications are assumed to be estimating the same fundamental relationship, despite differences in methodology and other substantive features. This variability in study features can be viewed as a strength, however, because a synthesis of conceptual replications can show that a relationship is observed across a range of methodological and substantive variability. Unlike sampling error, however, errors in estimates of the relationship of interest that arise from poor study design will not necessarily cancel out as a result of aggregation. Therefore the meta-analyst must carefully assess the influence of methodological variation on observed effects (Wilson and Lipsey, in press).
WHY META-ANALYSIS?
Meta-analysis is not the only method of synthesizing or reviewing results across studies. Other approaches include the narrative and vote-count review. The narrative review relies on a researcher's ability to digest the array of findings across studies and arrive at a pronouncement regarding the evidence for or against a hypothesis using some unknown and unknowable (that is, subjective) mental calculus.
The vote-count method imposes discipline on this process by tallying the number of studies with statistically significant findings in favor of the hypothesis and the number contrary to the hypothesis (null findings). This approach is appealing, for it is objective and systematic, yet simple. Furthermore it upholds the long-standing tradition in the social sciences of allowing the statistical significance test to be the arbiter of the validity of a scientific hypothesis.
The intuitive appeal of the vote count obscures its weaknesses. First, the vote count fails to account for the differential precision of the studies being reviewed. Larger studies, all else being equal, provide more precise estimates of the relationship of interest and thus should be given greater weight in a review.
Second, the vote count fails to recognize the fundamental asymmetry of the statistical significance test. A statistically significant finding is a strong conclusion, whereas a statistically nonsignificant (null) finding is a weak conclusion. In the vote-count review, null findings are typically interpreted as evidence that the relationship of interest does not exist (for example, the intervention is not effective). This is an incorrect interpretation. Failure to reject a null hypothesis is not support for the null, merely suspended judgment. Enough null findings in the same direction are evidence that the null is false. This possibility was recognized by Fisher (1944), a strong proponent of significance testing.
Third, the vote count ignores the size of the observed effects. By focusing on statistical significance, and not the size and direction of the effect, a study with a small but statistically significant effect would be viewed as evidence favoring the hypothesis, and a study with a large nonsignificant effect would be viewed as evidence against the hypothesis. Both studies provide evidence that the relationship is nonzero, although the strength of that evidence is weak in one of the studies. The benefits of a null hypothesis statistical significance test for interpreting a finding from an individual study do not translate into benefits when evaluating a collection of related studies.
Furthermore a counterintuitive feature of the vote-count method is that the likelihood of arriving at an incorrect conclusion increases as the number of studies on a topic increases, if the typical statistical power of the studies in that area is low. This is a common situation in criminology. For example, Lipsey and colleagues (1985) estimated that the typical power of evaluations of juvenile delinquency interventions was less than .50. A vote-count review of that literature is sure to yield misleading conclusions.
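The counterintuitive behavior of the vote count under low power can be illustrated with a short calculation. The sketch below is not taken from the article: it assumes a true nonzero effect, independent studies with identical power, and a hypothetical decision rule under which the reviewer concludes the effect exists only if a strict majority of studies are statistically significant. The function name is illustrative.

```python
from math import comb


def prob_majority_significant(n_studies: int, power: float) -> float:
    """Probability that a strict majority of n independent studies is significant.

    Assumes a true effect exists and every study detects it with the same
    probability `power` (a simplifying assumption for illustration).
    """
    threshold = n_studies // 2 + 1  # smallest count that forms a strict majority
    return sum(
        comb(n_studies, k) * power**k * (1 - power) ** (n_studies - k)
        for k in range(threshold, n_studies + 1)
    )


# With power below .50, the chance of the correct verdict shrinks as the
# literature grows (odd study counts avoid ties).
for n in (5, 15, 45):
    print(n, round(prob_majority_significant(n, 0.40), 3))
```

Under these assumptions, adding more low-powered studies makes the majority rule less likely, not more likely, to detect a real effect.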
Meta-analysis avoids the pitfalls of the vote-count method by focusing on the size and direction of effects across studies, not whether the individual effects were statistically significant. The latter largely depends on the sample size of the study. Furthermore focusing on the size and direction of the effect makes better use of the data available in the primary studies, providing a mechanism for analyzing differences across studies and drawing inferences about the likely size of the true population effect of interest. The statistical methods of meta-analysis allow for an assessment of both the consistency of findings across studies and the relationship of study features with variability in effects.
As a method, meta-analysis includes all of the essential features of a systematic review (see Petrosino et al. 2001), including an exhaustive search for all relevant studies (published or not), explicit inclusion and exclusion criteria, and a coding protocol for extracting data from the studies. The distinctive feature of meta-analysis is the application of statistical techniques to the analysis of the study findings, where study findings are encoded on a common metric. The section below presents an overview of the analytic methods of meta-analysis. Several articles in this issue (MacKenzie, Wilson, and Kider 2001 [this issue]; Lipsey, Chapman, and Landenberger 2001 [this issue]) provide examples of meta-analytic methods. This article concludes with a discussion of the strengths and weaknesses of meta-analysis and guidance on when not to use meta-analysis.
A FRAMEWORK FOR META-ANALYSIS
A defining feature of meta-analysis is the effect size, that is, any index of the effect of interest that is comparable across studies. The effect size might index the effects of a treatment group relative to a comparison group or the relationship between two observed variables, such as gender and mathematical achievement or attachment to parents and delinquent behavior. In the analysis of meta-analytic data, the effect size is the dependent variable.
The need for an effect size places restrictions on what research can be meta-analyzed. The collection of studies of interest to the reviewer must examine the same basic relationship, even if at a broad level of abstraction. At the broad end of the continuum would be a group of studies examining the effects of school-based prevention programs on delinquent behavior. At the narrow end of the continuum would be a set of replications of a study on the effects of the drug Depo-Provera on the perpetration of sexual offenses. The research designs of a collection of studies would all need to be sufficiently similar such that a comparable effect size could be computed from each. Thus most meta-analyses of intervention studies will stipulate that eligible studies use a comparison group design.
The specific effect size index used in a given meta-analysis will depend on the nature of the research being synthesized. Commonly used effect size indices for intervention research are the standardized mean difference, odds ratio, and correlation coefficient. The standardized mean difference-type effect size is well suited to two group comparison studies (for example, a treatment versus a comparison condition) with continuous or dichotomous dependent measures. The odds ratio is well suited to these same research domains with the exception that the dependent measures must be dichotomous, such as whether the participants recidivated within 12 months of leaving the program. The correlation coefficient can be applied to the broadest range of research designs, including all designs for which standardized mean difference and odds ratio effect sizes can be computed. Because of this, it has been argued that the correlation coefficient is the ideal effect size (Rosenthal 1991). However, the standardized mean difference and odds ratio effect sizes have distinct statistical advantages over the correlation coefficient for intervention research and are more natural indices of program effects.
Standardized mean difference
The standardized mean difference, d, represents the effect of an intervention as the difference between the intervention and comparison group means on the dependent variable of interest, standardized by the pooled within-groups standard deviation. Thus findings based on different operationalizations of the dependent variable of interest (for example, delinquency) are standardized to a common metric: standard deviation units for the population. An advantage of d is that it can be computed from a wide range of statistical data, including means and standard deviations, t tests, F tests, correlation coefficients, and 2 x 2 contingency tables (see Lipsey and Wilson 2001). Although conceptualized as the difference between two groups on a continuous dependent variable, d can also be computed from dichotomous data.
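As a minimal sketch of the definition above, d can be computed from group means, standard deviations, and sample sizes. The function and argument names are illustrative, and the input values in the usage line are made up for demonstration rather than drawn from any study.

```python
from math import sqrt


def standardized_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """d = (intervention mean - comparison mean) / pooled within-groups SD."""
    pooled_variance = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / sqrt(pooled_variance)


# Hypothetical inputs: the groups differ by one raw unit and share an SD of 2,
# so the difference is half a standard deviation.
print(standardized_mean_difference(10.0, 2.0, 50, 9.0, 2.0, 50))
```

The sign of d depends on which group is subtracted from which and on whether higher scores are favorable, so a meta-analysis needs one coding convention applied to every study.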
Odds ratio
The odds ratio, o, represents the effect of an intervention as the odds of a favorable (or unfavorable) outcome for the intervention group relative to the comparison group. It is used when the outcome is measured dichotomously, such as is common in medicine and criminology. The odds ratio is easy to compute from either the raw frequencies of a 2 x 2 contingency table or the proportions of successes or failures in each condition. As a ratio of two odds, a value of 1 indicates an equal likelihood of a successful outcome, whereas values between 1 and 0 indicate a negative effect and values greater than 1 indicate a positive effect. Unlike the correlation coefficient, the odds ratio is unaffected by differential base rates (the marginal distribution) for the outcome across studies (see Farrington and Loeber 2000), thus eliminating a potential source of effect variability across studies.
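The two computational routes mentioned above, raw frequencies and proportions, can be sketched as follows. Function names and the example counts are hypothetical; the code assumes no cell of the table is zero.

```python
def odds_ratio_from_counts(success_t, failure_t, success_c, failure_c):
    """Odds of success in the intervention group relative to the comparison group."""
    return (success_t / failure_t) / (success_c / failure_c)


def odds_ratio_from_proportions(p_success_t, p_success_c):
    """Same quantity computed from the proportion of successes in each condition."""
    odds_t = p_success_t / (1 - p_success_t)
    odds_c = p_success_c / (1 - p_success_c)
    return odds_t / odds_c


# Hypothetical table: 30 of 50 intervention cases succeed versus 20 of 50
# comparison cases. Both routes give the same odds ratio.
print(odds_ratio_from_counts(30, 20, 20, 30))
print(odds_ratio_from_proportions(0.6, 0.4))
```

A value above 1 here favors the intervention because success was coded as the favorable outcome; coding failures instead would invert the ratio.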
Correlation coefficient
The correlation coefficient is a widely used and widely understood statistic within the social sciences. It can be used to represent the relationship between two dichotomous variables, a dichotomous and a continuous variable, and two continuous variables. The correlation coefficient has a distinct disadvantage, however, when one or both of the variables on which it is based are dichotomous (Farrington and Loeber 2000). For example, the correlation coefficient is restricted to less than 1 in absolute value if the percentage of participants in the intervention and comparison conditions is not split fifty-fifty. Thus it is recommended that it only be used for meta-analyses of correlational research and that meta-analyses of intervention studies use either the standardized mean difference, the odds ratio, or a more specialized effect size (for a discussion of other alternatives, see Lipsey and Wilson 2001).
ANALYSIS OF META-ANALYTIC DATA
A typical meta-analysis extracts one or more effect sizes per study and codes a variety of study characteristics to represent the important substantive and methodological differences across studies. Before analysis of the data, statistical transformations and adjustments may need to be applied to the effect size. If multiple effect sizes were extracted per study, then a method of including only a single effect size per study (or sample within a study) per analysis will need to be adopted. The analysis of effect size data typically examines the central tendency of the effect size distribution and the consistency of effects across studies. Additional analyses test for the ability of study features to explain inconsistencies in effects across studies. Meta-analytic methods for performing these analyses are summarized below.
Transformations and adjustments
There are standard adjustments and transformations that are routinely applied to effect sizes, and optional adjustments may be applied depending on the purpose of the meta-analysis. For example, Hedges (1982; Hedges and Olkin 1985) showed that the standardized mean difference effect size is positively biased when based on a small sample; that is, it is too large in absolute value, and the bias increases as sample size decreases. The size of bias is very modest for all but very small sample sizes, but the adjustment is easy to perform and routinely done when using d as the effect size index (for formulas, see the appendix).
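The appendix formulas are not reproduced in this section. As a hedged sketch, the code below assumes the widely used approximation to the Hedges correction, in which d is multiplied by 1 - 3/(4N - 9) with N the total sample size. The function name is illustrative.

```python
def small_sample_corrected_d(d: float, n_t: int, n_c: int) -> float:
    """Shrink d toward zero using the approximate Hedges correction factor.

    Assumes the common approximation 1 - 3 / (4N - 9), where N = n_t + n_c.
    """
    total_n = n_t + n_c
    return d * (1 - 3 / (4 * total_n - 9))


# The adjustment matters for very small studies and is negligible for large ones.
print(small_sample_corrected_d(0.50, 10, 10))
print(small_sample_corrected_d(0.50, 500, 500))
```

Because the factor is always slightly below 1 for realistic sample sizes, the corrected value keeps the sign of d and is smaller in absolute value.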
When using the odds ratio, one encounters a complication that is also easily rectified. The odds ratio is asymmetric, with negative relationships represented as values between 0 and 1 and positive relationships represented as values between 1 and infinity. This complicates analysis. Fortunately, the natural logarithm of the odds ratio is symmetric about 0 with a well-defined standard error. The importance of the latter is discussed below. Thus, for purposes of analysis, the odds ratio is transformed into the logged odds ratio. Results can be transformed back into odds ratios for purposes of interpretation using the antilogarithm.
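The logged odds ratio and its back-transformation can be sketched as below. The standard error shown is the usual large-sample formula based on the four cell frequencies, stated here as an assumption since the appendix is not part of this section. Cell labels and counts are hypothetical, and all cells are assumed to be nonzero.

```python
from math import exp, log, sqrt


def logged_odds_ratio(success_t, failure_t, success_c, failure_c):
    """Return (log odds ratio, standard error) from 2 x 2 cell frequencies."""
    log_or = log((success_t * failure_c) / (failure_t * success_c))
    se = sqrt(1 / success_t + 1 / failure_t + 1 / success_c + 1 / failure_c)
    return log_or, se


# Swapping the two groups flips the sign of the logged odds ratio but leaves
# its magnitude and standard error unchanged, which is the symmetry about 0.
favorable, se_favorable = logged_odds_ratio(30, 20, 20, 30)
unfavorable, se_unfavorable = logged_odds_ratio(20, 30, 30, 20)
print(favorable, unfavorable)
print(exp(favorable))  # antilogarithm recovers the odds ratio for interpretation
```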
Similarly the correlation coefficient has a distributional shape that is less than ideal for purposes of computing averages. Furthermore the standard error is asymmetric, particularly as the correlation approaches -1 or +1. This is easily solved by applying Fisher's Zr transformation, which normalizes the correlation and results in a standard error that is remarkably simple. As with the odds ratio, final results can be transformed back into correlation coefficients for interpretative purposes.
Hunter and Schmidt (1990) proposed adjusting effect sizes for measurement unreliability and invalidity, range restriction, and artificial dichotomization. These adjustments, however, depend on information that is rarely reported for outcome measures in crime and justice evaluation studies, such as reliability and validity coefficients. The logic of these adjustments is to estimate what would have been observed under more ideal research conditions. These adjustments, while common in meta-analyses of measurement generalizability studies, are rarely used in meta-analyses of intervention research. If they are used, it is recommended that a sensitivity analysis be performed to assess the effect the adjustments have on the results.
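Of the transformations in this subsection, Fisher's Zr for the correlation coefficient is easily sketched. The code assumes the standard definitions: the transform is the inverse hyperbolic tangent of r, and its standard error is 1 / sqrt(n - 3), which depends only on the sample size. Function names are illustrative.

```python
from math import atanh, sqrt, tanh


def fisher_zr(r: float) -> float:
    """Fisher's Zr transformation of a correlation (requires -1 < r < 1)."""
    return atanh(r)


def fisher_zr_standard_error(n: int) -> float:
    """Standard error of Zr; assumes the usual 1 / sqrt(n - 3) form, n > 3."""
    return 1 / sqrt(n - 3)


def zr_to_r(zr: float) -> float:
    """Back-transform a (mean) Zr to the correlation metric for interpretation."""
    return tanh(zr)


zr = fisher_zr(0.50)
print(zr, fisher_zr_standard_error(103), zr_to_r(zr))
```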
Statistical independence among effect sizes
A complication with effect size data is the often numerous effect sizes of interest available from each study. Effect sizes that are based on the same sample of individuals (or other units of analysis, such as city blocks and so forth) are statistically dependent, that is, correlated with each other. Meta-analytic analysis assumes that each data point (effect size in this case) is statistically independent of all other data points.¹ Thus we can include only one effect size per sample in any given analysis. An independent set of effect sizes can be obtained through several strategies. First, each major outcome construct of interest can, and should, be analyzed separately. For example, effect sizes representing employment success should be analyzed separately from those representing criminal behavior. Second, multiple effect sizes within each outcome construct can be averaged to produce one effect size per study or sample within a study. Alternatively, a meta-analyst may choose a single effect size based on an explicit criterion. That is, the meta-analyst may prefer rearrest data over reinstitutionalization data if the former are available. Finally, the meta-analyst may randomly select among those effect sizes that are of interest to a given analysis. Note that several analyses can be performed, each with a different set of independent effect sizes.
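The averaging strategy described above can be sketched as a simple grouping step. The data layout, a list of (study identifier, effect size) pairs for a single outcome construct, is an assumption made for illustration, as are the function name and the example values.

```python
from collections import defaultdict
from statistics import mean


def one_effect_size_per_study(effect_sizes):
    """Average multiple effect sizes within each study.

    `effect_sizes` is an iterable of (study_id, effect_size) pairs that all
    belong to ONE outcome construct (for example, recidivism).
    """
    by_study = defaultdict(list)
    for study_id, effect_size in effect_sizes:
        by_study[study_id].append(effect_size)
    return {study_id: mean(values) for study_id, values in by_study.items()}


# Hypothetical data: study A reports rearrest and reconviction effects,
# study B reports a single effect.
recidivism_effects = [("A", 0.25), ("A", 0.75), ("B", 0.10)]
print(one_effect_size_per_study(recidivism_effects))
```

Selecting by an explicit criterion or at random would replace the `mean` call with a selection rule while keeping the same one-value-per-study output.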
The inverse variance weight
An additional complication of meta-analytic data is the differential precision in effect sizes across studies. Effect sizes based on large samples, all other things being equal, are more precise than effect sizes based on small samples. A simple solution to this problem would be to weight each effect size by its sample size. Hedges (1982) showed, however, that the optimal weight is based on the variance (squared standard error) of each effect size. This is intuitively appealing as well, for the standard error is a statistical expression of the precision of a parameter estimate, such as an effect size. The smaller the standard error, the more precise is the effect size. Thus, in all meta-analytic analyses, weights are computed from the inverse of the squared standard error of the effect size. This is called the inverse variance weight method. Equations for the inverse variance weight for each of the three effect size indices discussed above are presented in the appendix.
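As an illustration for the standardized mean difference, the sketch below assumes the usual large-sample variance approximation for d, (n_t + n_c) / (n_t n_c) + d² / (2(n_t + n_c)), and takes its inverse as the weight. The appendix equations are not shown in this section, so this form is an assumption, and the names are illustrative.

```python
def variance_of_d(d: float, n_t: int, n_c: int) -> float:
    """Approximate sampling variance (squared standard error) of d."""
    return (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))


def inverse_variance_weight(d: float, n_t: int, n_c: int) -> float:
    """Weight for d: the inverse of its squared standard error."""
    return 1 / variance_of_d(d, n_t, n_c)


# The same effect size receives more weight when it comes from a larger study.
print(inverse_variance_weight(0.30, 25, 25))
print(inverse_variance_weight(0.30, 250, 250))
```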
The mean effect size and related statistics
A starting point for the analysis of effect size data is the computation of the overall mean effect size, computed as a weighted mean, weighting by the inverse variance weight. A z test can be performed to assess whether the mean effect size is statistically greater than (or less than) 0, and a confidence interval can be constructed around the mean effect size. Both statistics rely on the standard error of the mean effect size, computed from the sum of the weights. Thus both the precision and number of the individual effect sizes influence the precision of the mean effect size. (For equations, see the appendix.)
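The computations in this paragraph can be sketched as follows, assuming the standard fixed-effects forms: the mean is the weight-sum of effect sizes divided by the sum of weights, and its standard error is the square root of the inverse of the summed weights. The function name and example values are hypothetical.

```python
from math import sqrt


def weighted_mean_effect_size(effect_sizes, weights, z_critical=1.96):
    """Return (mean, standard error, z, (ci_lower, ci_upper)).

    `weights` are inverse variance weights; 1.96 gives a 95% interval.
    """
    sum_w = sum(weights)
    mean_es = sum(w * es for w, es in zip(weights, effect_sizes)) / sum_w
    se_mean = sqrt(1 / sum_w)  # depends on both the number and precision of effects
    z = mean_es / se_mean
    ci = (mean_es - z_critical * se_mean, mean_es + z_critical * se_mean)
    return mean_es, se_mean, z, ci


# Hypothetical effect sizes and weights from two studies.
print(weighted_mean_effect_size([0.20, 0.40], [25.0, 75.0]))
```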
The mean effect size is meaningful only if the effects are consistent across studies, that is, statistically homogeneous. If the effects are highly heterogeneous, then a single overall mean effect size does not adequately represent the effects observed by the collection of studies. In meta-analysis, consistency in effects is assessed with the homogeneity statistic Q. A statistically significant Q indicates that the observed variability in effect sizes exceeds statistical expectations regarding the variability that would be observed across pure replications, that is, if the collection of studies were indeed estimating a common population effect size. A statistically nonsignificant Q suggests that the variability in effects across studies is no greater than expected due to sampling error.
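A minimal sketch of the homogeneity statistic follows, assuming the standard definition of Q as the weighted sum of squared deviations of each effect size from the weighted mean, referred to a chi-square distribution with k - 1 degrees of freedom. The function name and data are illustrative.

```python
def homogeneity_q(effect_sizes, weights):
    """Return (Q, degrees of freedom) for a set of independent effect sizes."""
    sum_w = sum(weights)
    mean_es = sum(w * es for w, es in zip(weights, effect_sizes)) / sum_w
    q = sum(w * (es - mean_es) ** 2 for w, es in zip(weights, effect_sizes))
    return q, len(effect_sizes) - 1


# Hypothetical two-study example. Q is compared with a chi-square critical
# value for its degrees of freedom (3.84 for df = 1 at the .05 level).
q, df = homogeneity_q([0.20, 0.40], [25.0, 75.0])
print(q, df)
```

Identical effect sizes give Q = 0 regardless of the weights, which is the limiting case of perfect homogeneity.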
A heterogeneous distribution (a significant Q) is often the desired outcome of a homogeneity analysis. Heterogeneity justifies the exploration of the relationship between study features and effects, an important aspect of meta-analysis. The analytic approaches available to the meta-analyst for examining between study effects are an analysis of mean effect sizes by a categorical study feature, analogous to a one-way ANOVA, and a meta-analytic regression analysis approach. Both approaches rely on inverse variance weighting, and both can be implemented under the assumptions of a fixed- or random-effects model. The assumptions of these models will be discussed below.
Categorical analysis of effect sizes: The analog to the ANOVA
The analog to the ANOVA-type analysis is used to examine the relationship between a single categorical variable, such as treatment type or research method, and effect size. There may be as few as two categories, in which case the analysis is conceptually similar to a t test, or many categories. A separate mean effect size and associated statistics, such as a z test and confidence interval, are computed for each category of the variable of interest. To test whether the mean effect sizes differ across categories, a Q between groups is calculated (see the appendix). Although this statistic is distributed as a chi-square, it is interpreted in the same fashion as an F from a one-way ANOVA. A significant Q between groups indicates that the variability in the mean effect sizes across categories is greater than expected due to sampling error. Thus the category is related to effect size. Examination of confidence intervals provides evidence of the source of the important difference(s).
As with the overall distribution, the residual distribution of effects within categories may be homogeneous or heterogeneous. This is tested with the Q within statistic (see the appendix). A homogeneous Q within indicates that the categorical variable explained the excess variability detected by the overall homogeneity test. In this case, the categorical variable provides an explanation for the variability in effects across studies. Alternatively, additional sources of variability in effects exist if the Q within is significant.
The computation of the analog to the ANOVA can be tedious. Macros that work with existing statistical software packages exist for performing this analysis (for example, Lipsey and Wilson 2001; Wang and Bushman 1998). BioStat (2000) has created a meta-analysis program that among other features performs the analog to the ANOVA analysis.
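For readers who want to see the arithmetic, the partition of the total Q into between-groups and within-groups components can be sketched in a few lines. This is not the macro code cited above; it assumes the standard fixed-effects definitions, and the data structure (a dictionary mapping each category to a list of (effect size, weight) pairs) and all values are hypothetical.

```python
def weighted_mean_and_q(pairs):
    """Return (sum of weights, weighted mean, Q) for (effect_size, weight) pairs."""
    sum_w = sum(w for _, w in pairs)
    mean_es = sum(w * es for es, w in pairs) / sum_w
    q = sum(w * (es - mean_es) ** 2 for es, w in pairs)
    return sum_w, mean_es, q


def analog_to_anova(groups):
    """Return (Q between, Q within, Q total) for effect sizes grouped by category."""
    all_pairs = [pair for pairs in groups.values() for pair in pairs]
    _, grand_mean, q_total = weighted_mean_and_q(all_pairs)
    q_within = 0.0
    q_between = 0.0
    for pairs in groups.values():
        group_w, group_mean, group_q = weighted_mean_and_q(pairs)
        q_within += group_q  # pooled within-category heterogeneity, df = k - groups
        q_between += group_w * (group_mean - grand_mean) ** 2  # df = groups - 1
    return q_between, q_within, q_total


# Hypothetical treatment types with two studies each.
example = {
    "type A": [(0.20, 25.0), (0.40, 75.0)],
    "type B": [(0.00, 50.0), (0.10, 50.0)],
}
print(analog_to_anova(example))
```

Q between and Q within sum to the total Q, mirroring the partition of sums of squares in a one-way ANOVA.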
Meta-analytic regression analysis
The analog to the ANOVA is limited to a single categorical variable. A more flexible and general analytic strategy for assessing the relationship between study features and effect size is regression analysis. Regression analysis can incorporate multiple independent variables (study features) in a single analysis, including continuous variables and categorical variables (via dummy coding). The differences between ordinary least squares regression and meta-analytic regression are the weighting by the inverse variance and a modification to the standard error of the regression coefficients, necessitating the use of specialized software (for example, Lipsey and Wilson 2001; Wang and Bushman 1998). As with the analog to the ANOVA, two Q values are calculated as part of meta-analytic regression: a Q for the model and a Q for the residual or error variance. The former is a test of the predictive ability of the study features in explaining between-studies variability in effects. The regression model accounts for significant variability in the effect size distribution if the Q for the model is significant. As with the Q within for the analog to the ANOVA, a significant Q for the error variance indicates that excess variability remains in the effects across studies after accounting for the variability explained by the regression model. That is, the residual distribution in effect sizes is heterogeneous.
Recognizing the correlational nature of the above analyses of the relationship between study features and effect size is critical. Study features are often correlated with one another and, as such, a moderating relationship may be the result of confounded between-studies features. For example, the mean effect size for treatment type A may be higher than the mean effect size for treatment type B. The studies examining treatment type B, however, may have used a less sensitive measure of the outcome construct, thus confounding treatment type with characteristics of the dependent variable. Multivariate analyses can help assess the interrelationships between study features, but these analyses cannot account for unmeasured study characteristics.
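The two differences from ordinary least squares noted in this subsection, inverse variance weighting and the modified standard error, can be seen in a deliberately simplified sketch with a single study feature. This is an illustration under fixed-effects assumptions, not the specialized software cited above; a real analysis with several predictors would use matrix methods. All names and values are hypothetical.

```python
from math import sqrt


def weighted_simple_regression(feature, effect_sizes, weights):
    """Inverse-variance-weighted regression of effect size on ONE study feature."""
    sum_w = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, feature)) / sum_w
    y_bar = sum(w * y for w, y in zip(weights, effect_sizes)) / sum_w
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, feature))
    sxy = sum(
        w * (x - x_bar) * (y - y_bar)
        for w, x, y in zip(weights, feature, effect_sizes)
    )
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Meta-analytic standard error: the weights are treated as known inverse
    # variances, so it is NOT rescaled by the residual mean square as in
    # ordinary weighted least squares output.
    se_slope = sqrt(1 / sxx)
    q_total = sum(w * (y - y_bar) ** 2 for w, y in zip(weights, effect_sizes))
    q_model = slope**2 * sxx  # df = 1 (one predictor)
    q_residual = q_total - q_model  # df = k - 2
    return {
        "intercept": intercept,
        "slope": slope,
        "se_slope": se_slope,
        "q_model": q_model,
        "q_residual": q_residual,
    }


# Hypothetical data: effect size rises with a coded study feature.
print(weighted_simple_regression([0, 1, 2], [0.1, 0.3, 0.5], [10.0, 10.0, 10.0]))
```

As the text cautions, a significant Q for the model in such an analysis is correlational evidence only and may reflect confounded study features.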
Fixed and random effects models
The statistical model presented above assumes that the collection of effect sizes being analyzed is estimating a common population effect size. In statistical terms, this is a fixed-effects model. Stated differently, a fixed-effects model assumes that each effect size differs from the true population effect size solely due to subject-level sampling error. Each observed effect size is viewed as an imperfect estimate of the true, single population effect for the intervention of interest. This provides the theoretical basis for incorporating the standard error of the effect size (an estimate of subject-level sampling error) into the analysis as the inverse variance weight.
This assumption is restrictive and likely to be untenable in many syntheses of criminological intervention research where studies of a common research hypothesis differ on many dimensions, some of which are likely to be related to effect size. Thus each effect size has variability (that is, instability) due to subject-level sampling error and study-level variability. The random-effects model assumes that at least some portion of the study-level variability is unexplained by the study features included in the statistical models of effect size. These study differences may simply be unmeasured, or they may be unmeasurable. In both cases, each effect size is assumed to estimate a true population effect size for that study, and the collection of true population effect sizes represents a random distribution of effects. In statistical terms, this is a random-effects model.
Methods for estimating random- effects models in meta-analysis are well developed. The basic method involves modifying the definition of the inverse variance weight such that it incorporates both the subject- and study-level estimates of instabil- ity. The inverse variance weight is thus based on both the standard error of the effect size and an esti-
mate of the variability in the distri- bution of population effects. The lat- ter is computed from the observed distribution of effects. Random- effects models are more conservative
than fixed-effects models. Confi-
dence intervals will be larger, and regression coefficients that were sta- tistically significant under a fixed- effects model may no longer be signif- icant under a random-effects model.
It is recommended that meta-analy- ses ofcriminological literatures use a random-effects model of analysis unless a clear justification to do oth- erwise exists.
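A rough Python sketch of the random-effects adjustment follows. It applies equations 18 through 20 of the appendix: the homogeneity statistic Q yields a method-of-moments variance component, which is added to each study's variance before reweighting. The function name and the data are hypothetical. Setting a negative variance component to zero is a common convention assumed here, not something stated in the article.

```python
import math

def random_effects_summary(effect_sizes, variances):
    """Random-effects mean using a method-of-moments variance component."""
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    sum_w2 = sum(wi * wi for wi in w)
    sum_wes = sum(wi * es for wi, es in zip(w, effect_sizes))
    sum_wes2 = sum(wi * es * es for wi, es in zip(w, effect_sizes))

    q = sum_wes2 - sum_wes ** 2 / sum_w               # homogeneity statistic Q
    k = len(effect_sizes)
    v_theta = (q - (k - 1)) / (sum_w - sum_w2 / sum_w)
    v_theta = max(0.0, v_theta)                       # assumed convention: no negative variance

    w_re = [1.0 / (v + v_theta) for v in variances]   # random-effects weights
    sum_w_re = sum(w_re)
    mean_es = sum(wi * es for wi, es in zip(w_re, effect_sizes)) / sum_w_re
    se = math.sqrt(1.0 / sum_w_re)
    return {"q": q, "v_theta": v_theta, "mean": mean_es, "se": se}

# Hypothetical logged odds ratios and variances (not data from the article)
result = random_effects_summary([0.9, 0.4, 1.2, 0.1], [0.25, 0.10, 0.50, 0.05])
print(result)
```

Because the variance component is added to every study's variance, the random-effects standard error is never smaller than the fixed-effects one, which is the sense in which the model is more conservative.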
Sensitivity analysis
A final analytic issue is the sensitivity of the results to unusual study effects and decisions made by the meta-analyst. First, it is wise to examine the influence of outliers in the distribution of effect sizes and the distribution of inverse variance weights. A modest effect size outlier with a large weight can drive an analysis. Rerunning an important analysis with and without highly influential studies can help verify that the observed result is not solely a function of a single unusual study. Second, the method of selecting one effect size per study for any given analysis may also affect the meta-analytic findings. For example, in the boot camp systematic review by MacKenzie, Wilson, and Kider (2001), the analyses were performed on a single effect size selected from each study based on a set of decision rules. A sensitivity analysis showed that using a composite of all recidivism effect sizes produced the same results, bolstering the authors' confidence in the findings. Third, if the meta-analysis has included methodologically weak studies, analyses examining the relationship between method features and observed effects are essential.
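One simple way to carry out the first check is a leave-one-out analysis: recompute the weighted mean with each study removed in turn and see how far it moves. The sketch below is illustrative only. The function names are hypothetical and the data are invented so that the third study is a modest outlier with a very large weight.

```python
def weighted_mean(effect_sizes, weights):
    """Inverse-variance weighted mean effect size."""
    return sum(w * es for w, es in zip(weights, effect_sizes)) / sum(weights)

def leave_one_out(effect_sizes, variances):
    """Weighted mean recomputed with each study omitted in turn."""
    weights = [1.0 / v for v in variances]
    means = []
    for i in range(len(effect_sizes)):
        es_rest = effect_sizes[:i] + effect_sizes[i + 1:]
        w_rest = weights[:i] + weights[i + 1:]
        means.append(weighted_mean(es_rest, w_rest))
    return means

# Hypothetical data: the third study has a small variance, hence a large weight
effect_sizes = [0.2, 0.3, 1.0]
variances = [0.1, 0.1, 0.01]

full_mean = weighted_mean(effect_sizes, [1.0 / v for v in variances])
omitted_means = leave_one_out(effect_sizes, variances)
for i, m in enumerate(omitted_means):
    print(f"without study {i + 1}: mean = {m:.3f} (all studies: {full_mean:.3f})")
```

A large shift when a single study is omitted signals that the overall result depends heavily on that study.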
Illustration: Cognitive-behavioral programs for sex offenders
To illustrate the methods outlined above, I have selected a subset of studies included in a meta-analysis of sex offender programs (Gallagher, Wilson, and MacKenzie no date). Presented below are the programs based on cognitive-behavioral principles. Studies were included if they used a comparison group design and the comparison group received either no treatment or non-sex-offender-specific treatment. Studies also had to report a measure of sex offense recidivism at some point following termination of the program.
A total of 13 studies met the eligibility criteria for this meta-analysis. The recidivism data were dichotomous, and as such, the odds ratio was selected as the effect size index. The odds ratio and 95 percent confidence interval for these 13 studies are presented in Figure 1. Visual inspection of these odds ratios shows a distinct
FIGURE 1
ODDS RATIO AND 95 PERCENT CONFIDENCE INTERVAL FOR EACH OF THE 13 COGNITIVE-BEHAVIORAL SEX OFFENDER EVALUATION STUDIES
[Forest plot not reproduced. Horizontal axis: odds ratio on a logarithmic scale (.02, .1, .50, 1, 5, 25, 200); values below 1 favor the comparison condition and values above 1 favor the intervention. Rows, by author(s) and total sample size: Borduin, Henggeler, Blaske & Stein (N = 16); McGrath, Hoke & Vojtisek (N = 103); Hildebran & Pithers (N = 90); Marshall, Eccles & Barbaree (N = 38); Studer, Reddon, Roper & Estrada (N = 220); Nicholaichuk, Gordon, Andre & Gu (N = 579); Gordon & Nicholaichuk (N = 206); Guarino & Kimball (N = 75); Marques, Day, Nelson, & West (N = 229); Huot (N = 224); Gordon & Nicholaichuk (N = 1248); Song & Lieb (N = 278); Nicholaichuk (N = 65); Overall Mean Odds-Ratio.]
NOTE: Sources of programs are available from the author.
positive trend, with 12 of the 13 studies observing lower recidivism rates (and hence odds ratios greater than 1) for the sex offender treatment condition than the comparison condition. The sole study with a negative effect (an odds ratio between 0 and 1) had a large confidence interval that extended well into the positive range and was from a study of poor methodological quality.
The weighted mean odds ratio for this collection of 13 studies was 2.33, and the 95 percent confidence interval was 1.57 to 3.42. The z test indicates that this odds ratio was statistically significant at conventional levels, z = 4.26, p < .001. This collection of studies supports the conclusion that cognitive-behavioral programs for sex offenders reduce the risk of a sexual reoffense. The homogeneity statistic was significant, indicating that the findings are not consistent across studies and may be related to study features, Q = 21.99, df = 12, p < .05.
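The reported summary statistics can be checked against one another with a short calculation. The sketch below assumes, in line with appendix equations 5 and 7, that the analysis was run on logged odds ratios and the results were then converted back to odds ratios. It recovers the standard error from the reported confidence interval, recomputes z, and evaluates the chi-square tail probability for Q. The helper `chi2_sf_even_df` is a hypothetical name and uses a closed form that holds only for even degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total

# Values reported in the text
mean_or, ci_lower, ci_upper = 2.33, 1.57, 3.42
q, q_df = 21.99, 12

# Standard error on the log scale, recovered from the width of the 95% CI
se_log = (math.log(ci_upper) - math.log(ci_lower)) / (2 * 1.96)
z = math.log(mean_or) / se_log
p_q = chi2_sf_even_df(q, q_df)

print(f"z = {z:.2f}, p for Q = {p_q:.3f}")
```

Under these assumptions the recomputed z agrees with the reported value to two decimals, and the tail probability for Q falls below .05.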
This collection of studies differed in many ways, both in the research methods used and the specifics of the sex offender treatment program. Many of these 13 studies evaluated a cognitive-behavioral approach called relapse prevention. Relapse prevention programs may be more (or less) effective than other cognitive-behavioral programs. To explore this, the mean effect size for relapse prevention and other cognitive-behavioral programs was calculated (2.41 and 1.73, respectively). Also calculated were the Q between and Q within. The Q between was 0.87, p > .05, indicating that the observed difference between these two means was not statistically significant. The Q within was statistically significant, Q_WITHIN = 21.12, df = 11, p = .03, indicating that significant variability in effect sizes remained within the two groups after accounting for treatment type.
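The partition of Q used here can be sketched as follows. In the illustration above, the total Q of 21.99 splits into 0.87 between groups and 21.12 within groups. The code computes Q within as the sum of the homogeneity statistics for each group and Q between as the remainder, which is algebraically equivalent to appendix equations 21 and 22. The function names, the group labels, and the data are hypothetical.

```python
def q_statistic(effect_sizes, weights):
    """Homogeneity statistic Q for one set of effect sizes (appendix equation 18)."""
    sum_w = sum(weights)
    sum_wes = sum(w * es for w, es in zip(weights, effect_sizes))
    sum_wes2 = sum(w * es * es for w, es in zip(weights, effect_sizes))
    return sum_wes2 - sum_wes ** 2 / sum_w

def anova_analog(groups):
    """Partition total Q into between-groups and within-groups components.

    groups maps a category label to a list of (effect_size, variance) pairs.
    """
    all_es, all_w = [], []
    q_within = 0.0
    for studies in groups.values():
        es = [e for e, _ in studies]
        w = [1.0 / v for _, v in studies]
        q_within += q_statistic(es, w)   # pooled within-group heterogeneity
        all_es += es
        all_w += w
    q_total = q_statistic(all_es, all_w)
    return {"q_total": q_total, "q_between": q_total - q_within, "q_within": q_within}

# Hypothetical logged odds ratios and variances for two program types
groups = {
    "relapse prevention": [(0.0, 0.1), (0.2, 0.1)],
    "other cognitive-behavioral": [(1.0, 0.1), (1.2, 0.1)],
}
parts = anova_analog(groups)
print(parts)
```

Q between is referred to a chi-square distribution with the number of categories minus 1 degrees of freedom, and Q within to one with the number of effect sizes minus the number of categories.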
A regression analysis was performed to test whether the differential lengths of follow-up across studies and the different definitions of recidivism could account for the heterogeneity. The regression coefficient for whether the recidivism was measured at least five years posttreatment was statistically significant and positive, B = 1.58, p = .01, suggesting that studies with longer follow-up periods observed larger differences in the rates of sexual offending between the treated and nontreated groups. The effects of sex offender programs may increase over time, or the length of follow-up was related to an unmeasured program characteristic that led to greater effectiveness. The regression coefficient for whether the recidivism measure was an indicator of arrest or reconviction was also statistically significant, B = 1.25, p = .04, suggesting that arrest may be a more sensitive measure of the program effects. Significant variability in the effect size distribution was accounted for by this regression model, Q_MODEL = 7.05, df = 2, p = .03. Furthermore, the Q associated with the residual variability in effect sizes was not statistically significant, Q_RESIDUAL = 14.9, df = 10, p = .13, indicating that the residual variability in effects is not greater than would be expected due to sampling error.
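The logic of a meta-analytic regression can be illustrated with a deliberately simplified case. The analysis above used two predictors and, as the appendix notes, is normally run with specialized software. The sketch below instead fits a fixed-effects weighted regression with a single predictor in closed form, taking the standard error of the slope directly from the known sampling variances. The function name, the predictor coding, and the data are all hypothetical.

```python
import math

def meta_regression_one_predictor(effect_sizes, variances, x):
    """Fixed-effects weighted regression of effect size on one predictor.

    Weights are inverse variances; the predictor must take at least two values.
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, effect_sizes))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, effect_sizes))
    swyy = sum(wi * yi * yi for wi, yi in zip(w, effect_sizes))

    det = sw * swxx - swx ** 2                 # zero if the predictor never varies
    slope = (sw * swxy - swx * swy) / det
    intercept = (swy - slope * swx) / sw
    se_slope = math.sqrt(sw / det)             # from the inverse of X'WX

    q_total = swyy - swy ** 2 / sw
    q_model = (slope / se_slope) ** 2          # variability explained by the predictor
    return {
        "intercept": intercept,
        "slope": slope,
        "se_slope": se_slope,
        "q_model": q_model,
        "q_residual": q_total - q_model,
    }

# Hypothetical data: x = 1 if recidivism was measured at five years or more
effect_sizes = [0.0, 0.2, 1.0, 1.2]
variances = [0.1, 0.1, 0.1, 0.1]
long_follow_up = [0, 0, 1, 1]
fit = meta_regression_one_predictor(effect_sizes, variances, long_follow_up)
print(fit)
```

Q for the model and Q for the residual sum to the total Q, paralleling the between and within partition in the analog to the ANOVA.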
INTERPRETATION OF META-ANALYTIC FINDINGS
A researcher who finds a statistically significant effect is presented with the difficult task of deciding whether the effect is meaningful from a practical or clinical perspective. That is, is the effect "significant" in the everyday meaning of that word? Meta-analysts are confronted with the same problem. What is the practical significance of an observed mean effect size? A common approach to addressing this problem is the translation of the effect size into a success rate differential for the intervention and comparison conditions, such as using the binomial effect size display (Rosenthal and Rubin 1983). For example, a standardized mean difference effect size of .40 is equivalent to a success rate differential of 20 percent (that is, 40 percent recidivism in the intervention condition and 60 percent recidivism in the comparison condition). If the audience for the meta-analysis is not familiar with standardized mean difference effect sizes, then the success rate differential provides a useful method of understanding the practical significance of the observed findings.
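The arithmetic behind this example can be sketched briefly. The binomial effect size display expresses a correlation r as success rates of .50 plus and minus r/2. Converting d to r requires a formula not given in the article; the sketch assumes the standard conversion for equal group sizes, r = d / sqrt(d^2 + 4). The function names are hypothetical.

```python
import math

def d_to_r(d):
    """Convert a standardized mean difference to r, assuming equal group sizes."""
    return d / math.sqrt(d * d + 4.0)

def binomial_effect_size_display(r):
    """Success rates for the intervention and comparison conditions."""
    return 0.5 + r / 2.0, 0.5 - r / 2.0

r = d_to_r(0.40)
success_intervention, success_comparison = binomial_effect_size_display(r)
differential = success_intervention - success_comparison

print(f"r = {r:.3f}")
print(f"success rates: {success_intervention:.1%} vs {success_comparison:.1%}")
print(f"differential: {differential:.1%}")
```

The differential comes out just under 20 percentage points, which rounds to the 60 percent versus 40 percent split described above when recidivism is treated as failure.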
The odds ratio has a natural interpretation without transformation: the odds ratio is the odds of a successful outcome in the treated condition relative to the comparison condition. Thinking about odds is, however, odd for all but the more mathematically inclined. As with the standardized mean difference, a mean odds ratio can be translated into percentages of successes (or failures). This translation requires "fixing" the failure rate for one of the conditions. For example, if we assume a 50 percent recidivism rate for the comparison condition, then an odds ratio of 1.5 translates into a recidivism rate of 40 percent in the treatment condition.
Presenting the results of a meta-analysis of odds ratios as percentages provides a means of assessing the magnitude of the observed program effects.
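The translation from an odds ratio to a percentage can be written out as a small function. This is a sketch under the article's definition of the odds ratio as the odds of success in the treated condition relative to the comparison condition. The function name is hypothetical, and the comparison-group failure rate is a value the analyst must fix in advance.

```python
def treated_failure_rate(odds_ratio, comparison_failure_rate):
    """Failure (recidivism) rate in the treated condition implied by an odds ratio."""
    comparison_success_odds = (1.0 - comparison_failure_rate) / comparison_failure_rate
    treated_success_odds = odds_ratio * comparison_success_odds
    treated_success_rate = treated_success_odds / (1.0 + treated_success_odds)
    return 1.0 - treated_success_rate

# The example from the text: 50 percent comparison recidivism, odds ratio of 1.5
example_rate = treated_failure_rate(1.5, 0.50)
print(f"implied treated recidivism rate: {example_rate:.0%}")
```

The same function can be applied to any mean odds ratio, although the implied percentage depends on the comparison rate that is assumed.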
ADVANTAGES AND DISADVANTAGES OF META-ANALYSIS
Meta-analysis has several distinct advantages over alternative forms of reviewing empirical research. As a systematic method of review, meta-analysis is replicable by independent researchers. The methods are explicit and open to the scrutiny of other scholars, who may question the inclusion and exclusion criteria and critique the variables used to examine between-studies differences. This can lead to productive debates and competing analyses of the meta-analytic data. In addition, meta-analysis makes efficient use of the information contained in the primary studies. Focusing on the direction and magnitude of the findings across studies using a common statistical benchmark allows for the exploration of relationships between study features and effects that would not otherwise be observable. The statistical methods of meta-analysis help guard against interpreting the dispersion in results as meaningful when it can just as easily be explained as sampling error. Finally, meta-analysis can handle a much larger number of studies than could effectively be summarized with alternative methods. There is no theoretical limit to the number of studies that can be incorporated into a single meta-analysis, yet as a method it can also be applied to a small number of similar studies.
As a practitioner of meta-analysis, I see few justified disadvantages to the use of meta-analysis. This does not mean that meta-analysis does not have its disadvantages. On the practical side, meta-analysis is far more time-consuming than traditional forms of review and requires a moderate level of statistical sophistication. Meta-analysis also simplifies the findings of the individual studies, often representing each study as a single effect size and a small set of descriptor variables. Complex patterns of effects often found in individual studies do not lend themselves to synthesis, such as the results from individual growth-curve modeling. To accommodate this, a reviewer may wish to augment a meta-analytic review with narrative descriptions of important studies and interesting study-level findings obscured in the meta-analytic synthesis. Finally, the methods of meta-analysis cannot overcome weaknesses in the primary studies. If the research base that examines the hypothesis of interest is methodologically weak, then the findings from the meta-analysis will also be weak. In these situations, meta-analysis creates a solid foundation for the next generation of studies by clearly identifying the weaknesses of the current knowledge base on a given issue.
WHEN NOT TO DO META-ANALYSIS
Meta-analysis is the preferred method of systematically reviewing a collection of empirical studies
examining a common research hypothesis. However, meta-analysis is not appropriate for the synthesis of all empirical research literatures. First, meta-analysis cannot be used when a common effect size index cannot be computed across the studies of interest. For example, the appropriate effect size for area studies (that is, studies that have a geographic area as the unit of analysis) is currently being discussed among members of the Campbell Collaboration. Second, the research designs across a collection of studies examining the relationship of interest may be too disparate for meaningful synthesis. For example, studies with different units of analysis cannot be readily meta-analyzed unless sufficient data are presented to compute an effect size at a common level of analysis. Studies with fundamentally different research designs, such as one-group longitudinal studies and comparison group studies, also should not be combined in the same meta-analysis. Third, the research question for a meta-analysis may involve a multivariate relationship. Although methods have been developed for meta-analyzing multivariate research studies (for example, Becker 1992; Becker 1996; Premack and Hunter 1988), these methods have rarely been applied and are still not well developed. It is unlikely that the more elaborate research designs will ever easily lend themselves to synthesis. Thus some research questions addressed by primary studies are not easily meta-analyzed. Finally, meta-analysis does not address broad theoretical issues that may be important to a debate regarding the value of various crime prevention efforts. Meta-analysis is designed to synthesize the evidence regarding the strength of a relationship across distinct research studies. This is a very specific task that may be embedded in a larger scholarly endeavor.
CONCLUSIONS
Systematic reviews approach the task of summarizing findings of a collection of research studies as a research task. As a method of systematic reviewing, meta-analysis takes this a step further by quantifying the direction and magnitude of the findings of interest across studies and uses specialized statistical methods to analyze the relationship between findings and study features. Properly executed, meta-analysis provides a firm foundation for future research. That is, empirical relationships that are well established and areas that are underresearched or that have equivocal findings are identified through the meta-analytic process. In addition, meta-analysis provides a defensible strategy for summarizing crime prevention and intervention efforts for informing public policy. Although the methods are technical, the findings can be translated into summary statistics readily understandable by non-social science researchers.
APPENDIX
EQUATIONS FOR THE CALCULATION OF EFFECT SIZES AND META-ANALYTIC SUMMARY STATISTICS
No. Equation: Notes

Common effect size indices

(1) $d = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{pooled}}$: Standardized mean difference effect size; $\bar{X}_1$ is the mean of the intervention condition; $\bar{X}_2$ is the mean of the comparison condition; and $s_{pooled}$ is the pooled within-groups standard deviation.

(2) $o = \dfrac{ad}{bc}$: Odds ratio effect size; a and c are the number of successful outcomes in the intervention and comparison conditions, and b and d are the number of failures in the intervention and comparison conditions (based on a 2 x 2 contingency table).

(3) $r = r$: Correlation coefficient effect size; r is the Pearson product-moment correlation coefficient between the two variables of interest.

Common transformations of effect size

(4) $d' = \left[1 - \dfrac{3}{4N - 9}\right] d$: Small sample size bias correction; d is the standardized mean difference effect size and N is the total sample size.

(5) $lor = \log(o)$: Log transformation of the odds ratio.

(6) $z = .5 \log\left[\dfrac{1 + r}{1 - r}\right]$: Fisher's transformation of the correlation effect size.

(7) $o = e^{lor}$: Logged odds ratio (lor) transformed into an odds ratio (o); e is the constant 2.7183.

(8) $r = \dfrac{e^{2z} - 1}{e^{2z} + 1}$: Transforms the effect size z from equation 6 back into a correlation; e is the constant 2.7183.

Fixed effects model inverse variance weights

(9) $v_d = \dfrac{n_1 + n_2}{n_1 n_2} + \dfrac{d^2}{2(n_1 + n_2)}$: The variance for the standardized mean difference; $n_1$ and $n_2$ are the sample sizes for the intervention and comparison conditions.

(10) $v_{lor} = \dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}$: The variance for the logged odds ratio; a, b, c, and d are the cell frequencies of a 2 x 2 contingency table.

(11) $v_z = \dfrac{1}{N - 3}$: The variance for the Fisher's transformed correlation coefficient; N is the total sample size.

(12) $w = \dfrac{1}{v}$: The inverse variance weight; v is the variance from equation 9, 10, or 11.

Mean effect size and related statistics

(13) $\overline{ES} = \dfrac{\sum (ES \cdot w)}{\sum w}$: Weighted mean effect size, where ES is the effect size index (equations 4, 5, or 6) and w is the inverse variance weight (equation 12).

(14) $se_{\overline{ES}} = \sqrt{\dfrac{1}{\sum w}}$: The standard error of the mean effect size.

(15) $z = \dfrac{\overline{ES}}{se_{\overline{ES}}}$: A z test; tests whether $\overline{ES}$ is statistically greater than or less than 0.

(16) $LowerCI = \overline{ES} - 1.96\, se_{\overline{ES}}$: Lower bound of the 95 percent confidence interval.

(17) $UpperCI = \overline{ES} + 1.96\, se_{\overline{ES}}$: Upper bound of the 95 percent confidence interval.

Homogeneity test Q

(18) $Q = \sum (ES^2 \cdot w) - \dfrac{\left(\sum ES \cdot w\right)^2}{\sum w}$: Homogeneity test; distributed as a chi-square, degrees of freedom equals the number of effect sizes less 1.

Random effects variance component and weight

(19) $v_\theta = \dfrac{Q - (k - 1)}{\sum w - \dfrac{\sum w^2}{\sum w}}$: The random effects variance component, where k is the number of effect sizes; the random effects variance component has a more complex form when used as part of the analog to the ANOVA or regression models.

(20) $w = \dfrac{1}{v + v_\theta}$: The random effects inverse variance weight, where v is defined as in equations 9 through 11.

Analog to the ANOVA

(21) $Q_B = \sum \left(\overline{ES}_j^2 \cdot w_j\right) - \dfrac{\left(\sum \overline{ES}_j \cdot w_j\right)^2}{\sum w_j}$: Q between groups, where j is 1 to the number of categories for the independent variable, $\overline{ES}_j$ is the weighted mean effect size for category j, and $w_j$ is the sum of the weights in category j; distributed as a chi-square with the number of categories minus 1 degrees of freedom.

(22) $Q_W = Q - Q_B$: Q within groups, where Q is the overall homogeneity statistic defined in equation 18 and $Q_B$ is defined in equation 21; distributed as a chi-square with the number of effect sizes minus the number of categories in the independent variable as the degrees of freedom.

Meta-analytic regression analysis

(23) Use specialized software: For example, SAS, SPSS, or Stata macros by Lipsey and Wilson (2001); SAS macros by Wang and Bushman (1998).
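For readers who prefer code to notation, equations 1 through 11 above can be sketched as a handful of Python functions. This is an illustrative transcription, not the macros cited in equation 23. The function names are hypothetical. Computing the variance of d from the bias-corrected value follows common practice and is an assumption here, and the input values in the usage lines are invented.

```python
import math

def standardized_mean_difference(mean1, mean2, s_pooled, n1, n2):
    """Bias-corrected d (equations 1 and 4) and its variance (equation 9)."""
    d = (mean1 - mean2) / s_pooled
    n_total = n1 + n2
    d_adj = (1.0 - 3.0 / (4.0 * n_total - 9.0)) * d      # small sample correction
    # Assumption: the variance uses the bias-corrected d
    variance = (n1 + n2) / (n1 * n2) + d_adj ** 2 / (2.0 * (n1 + n2))
    return d_adj, variance

def log_odds_ratio(a, b, c, d):
    """Logged odds ratio (equations 2 and 5) and its variance (equation 10).

    a, b: successes and failures in the intervention condition;
    c, d: successes and failures in the comparison condition.
    """
    lor = math.log((a * d) / (b * c))
    variance = 1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d
    return lor, variance

def fisher_z(r, n_total):
    """Fisher's z transformation (equation 6) and its variance (equation 11)."""
    z = 0.5 * math.log((1.0 + r) / (1.0 - r))
    return z, 1.0 / (n_total - 3.0)

def z_to_r(z):
    """Back-transform Fisher's z to a correlation (equation 8)."""
    return (math.exp(2.0 * z) - 1.0) / (math.exp(2.0 * z) + 1.0)

# Hypothetical inputs
d_adj, v_d = standardized_mean_difference(10.0, 8.0, 4.0, 20, 20)
lor, v_lor = log_odds_ratio(30, 20, 20, 30)
z_r, v_z = fisher_z(0.5, 103)
print(d_adj, v_d, lor, v_lor, z_r, v_z, math.exp(lor), z_to_r(z_r))
```

Each function returns the effect size on the scale used for analysis together with its variance, so the outputs can be passed directly to an inverse variance weighting routine.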
Note
1. Methods have been developed for handling dependent effect sizes in a single analysis, but these methods are beyond the scope of this article. (For details, see Gleser and Olkin 1994; Kalaian and Raudenbush 1996.)
References
Becker, Betsy J. 1992. Models of Science Achievement: Forces Affecting Performance in School Science. In Meta-Analysis for Explanation: A Casebook, ed. Thomas D. Cook, Harris Cooper, David S. Cordray, Heidi Hartmann, Larry V. Hedges, Richard J. Light, Thomas A. Louis, and Frederick Mosteller. New York: Russell Sage.
Becker, G. 1996. The Meta-Analysis of Factor Analyses: An Illustration Based on the Cumulation of Correlation Matrices. Psychological Methods 1:341-53.
BioStat. 2000. Comprehensive Meta-Analysis (Software Program, Version 1.0.9). Englewood, NJ: BioStat. Available: www.metaanalysis.com.
Farrington, David P. and Rolf Loeber. 2000. Some Benefits of Dichotomization in Psychiatric and Criminological Research. Criminal Behaviour and Mental Health 10:100-122.
Fisher, Ronald A. 1944. Statistical Methods for Research Workers. 9th ed. London: Oliver and Boyd.
Gallagher, Catherine A., David B. Wilson, and Doris Layton MacKenzie. N.d. A Meta-Analysis of the Effectiveness of Sexual Offender Treatment Programs. Unpublished manuscript, University of Maryland at College Park.
Glass, Gene V. 1976. Primary, Secondary and Meta-Analysis of Research. Educational Researcher 5:3-8.
Gleser, Leon J. and Ingram Olkin. 1994. Stochastically Dependent Effect Sizes. In The Handbook of Research Synthesis, ed. Harris Cooper and Larry V. Hedges. New York: Russell Sage.
Hedges, Larry V. 1982. Estimating Effect Size from a Series of Independent Experiments. Psychological Bulletin 92:490-99.
Hedges, Larry V. and Ingram Olkin. 1985. Statistical Methods for Meta-Analysis. Orlando, FL: Academic Press.
Hunter, John E. and Frank L. Schmidt. 1990. Methods of Meta-Analysis: Correcting Error and Bias in Research Findings. Newbury Park, CA: Sage.
Kalaian, H. A. and Stephen W. Raudenbush. 1996. A Multivariate Mixed Linear Model For Meta-Analysis. Psychological Methods 1:227-35.
Lipsey, Mark W., Gabrielle L. Chapman, and Nana A. Landenberger. 2001. Cognitive-Behavioral Programs for Offenders. Annals of the American Academy of Political and Social Science 578:144-157.
Lipsey, Mark W., Scott Crosse, J. Dunkle, J. Pollard, and G. Stobart. 1985. Evaluation: The State of the Art and the Sorry State of the Science. New Directions for Program Evaluation 27:7-28.
Lipsey, Mark W. and David B. Wilson. 2001. Practical Meta-Analysis. Thousand Oaks, CA: Sage.
MacKenzie, Doris Layton, David B. Wilson, and Suzanne B. Kider. 2001. Effects of Correctional Boot Camps on Offending. Annals of the American Academy of Political and Social Science 578:126-143.
Pearson, Karl. 1904. Report on Certain Enteric Fever Inoculation Statistics. British Medical Journal 3:1243-46. Quoted in Morton Hunt, How Science Takes Stock: The Story of Meta-Analysis (New York: Russell Sage, 1997).
Petrosino, Anthony, Robert F. Boruch, Haluk Soydan, Lorna Duggan, and Julio Sanchez-Meca. 2001. Meeting the Challenges of Evidence-Based Policy: The Campbell Collaboration.
Annals of the American Academy of Political and Social Science 578:14-34.
Premack, Steven L. and John E. Hunter. 1988. Individual Unionization Decisions. Psychological Bulletin 103:223-34.
Rosenthal, Robert. 1991. Meta-Analytic Procedures for Social Research. Applied Social Research Methods Series. Vol. 6. Newbury Park, CA: Sage.
Rosenthal, Robert and Donald B. Rubin. 1983. A Simple, General Purpose Display of Magnitude of Experimental Effect. Journal of Educational Psychology 74:166-69.
Wang, Morgan C. and Brad J. Bushman. 1998. Integrating Results Through Meta-Analytic Review Using SAS Software. Cary, NC: SAS Institute.
Wilson, David B. and Mark W. Lipsey. In press. The Role of Method in Treatment Effect Estimates: Evidence from Psychological, Behavioral, and Educational Treatment Intervention Meta-Analyses. Psychological Methods.