Research Assignment
3/24/2019 Research and Evaluation in Education and Psychology: Integrating Diversity With Quantitative, Qualitative, and Mixed Methods
https://ncuone.ncu.edu/d2l/le/content/122307/viewContent/1252315/View?ou=122307 1/34
CHAPTER 2
Evaluation
We advocate for the importance of deepening the meaning of credible evaluation practice and findings by bringing multiple philosophical and theoretical lenses to the evaluation process as a basis for the use of mixed methods in evaluation, thus providing evaluators with strategies for garnering more complex and diverse perspectives on the creation of credible evidence.
—Mertens and Hesse Biber, 2013, p. 5
The diversity of cultures within the United States guarantees that virtually all evaluators will work outside familiar cultural contexts at some time in their careers. Cultural competence in evaluation theory and practice is critical for the profession and for the greater good of society.
—American Evaluation Association, 2011, p. 1
In This Chapter
• Evaluation is defined and distinguished from research in terms of purpose, method, and use.
• The history of evaluation and current theoretical models are explained, including the CIPP (Context, Input, Process, Product) model, responsive evaluation, theory-based evaluation, impact evaluation, participatory evaluation, utilization-focused evaluation, Real World Evaluation, developmental evaluation, empowerment evaluation, inclusive evaluation, feminist evaluation, culturally responsive evaluation, and postcolonial and Indigenous evaluation.
• Steps for planning an evaluation are described, including a description of what is to be evaluated, the purpose of the evaluation, the stakeholders in the evaluation, constraints affecting the evaluation, the evaluation questions, selection of an evaluation model, data collection specification, analysis and interpretation strategies, utilization, management of the evaluation, and meta-evaluation plans.
• Ethical guidelines from the American Evaluation Association (AEA) are also presented.
• Questions to guide critical analysis of evaluation studies are provided, with special reference to The Program Evaluation Standards (Yarbrough, Shulha, Hopson, & Caruthers, 2011).
In response to concerns about teenage pregnancies and escalating rates of sexually transmitted diseases and HIV/AIDS infection, the U.S. government enacted Title V, Section 510 of the Personal Responsibility and Work Opportunity Reconciliation Act of 1996 to promote an abstinence-only educational program for school-age youth. Because the federal funding for this program exceeded $1 billion, it seemed prudent to commission an evaluation to determine if abstinence-only programs were effective (McClelland & Fine, 2008). Mathematica Policy Research Institute received funds from the federal government to conduct an evaluation of four selected abstinence-only programs (Trenholm et al., 2007). “Findings indicate that youth in the program group were no more likely than control group youth to have abstained from sex and, among those who reported having had sex, they had similar numbers of sexual partners and had initiated sex at the same mean age” (p. xvii). The programs also did not have an impact on the number of pregnancies, births, or STDs. A summary of this study is presented as Sample Study 2.1.
Defining Evaluation
Evaluators have not been immune to the generally pluralistic, and sometimes contentious, spirit prevailing in the research community. This is partially because evaluations are often (but certainly not exclusively) conducted on programs designed to help oppressed and troubled people. The direct relationship between the evaluation of social and educational programs and access to resources sets the stage for tensions that can sometimes result in conflicts. Ernie House (1993) captures this spirit in his description of the evolution of evaluation:
Gradually, evaluators recognized that there were different interests to be served in an evaluation and that some of these interests might conflict with one another. The result was pluralist conceptions of evaluation in which multiple methods, measures, criteria, perspectives, audiences, and interests were recognized. Conceptually, evaluation moved from monolithic to pluralist conceptions, reflecting the pluralism that had emerged in the larger society. How to synthesize, resolve, and adjudicate all these multiple multiples remains a formidable question, as indeed it does for the larger society. Evaluation, which was invented to solve social problems, was ultimately afflicted with many of the problems it was meant to solve. (p. 11)
SAMPLE STUDY 2.1 Summary of an Evaluation Study
Evaluation Problem: Teenage pregnancy and sexually transmitted diseases are associated with high personal and societal costs. A billion dollars of taxpayer money was used to implement abstinence-only programs for school-age youth.
Sample Evaluation Questions: What impacts do abstinence-only programs have on behavioral outcomes (e.g., sexual abstinence, sexual activity, and risks of STDs and pregnancy)?
Method: The evaluation used an experimental design in which youth at four locations (two urban programs, in Florida and Wisconsin, and two rural programs, in Virginia and Mississippi) were randomly assigned either to the abstinence education program or to a control group that did not receive the program.
Treatment: The programs in each setting were different in some respects, but they were all required to adhere to principles such as the following: exclusively teach that abstinence from sexual activity is the only certain way to avoid out-of-wedlock pregnancy, sexually transmitted diseases, and other health problems. The students participated in the programs over a 3-year period.
Participants: The final evaluation was based on 2,057 youth, 1,209 in the experimental group and 848 in the control group. The evaluation was conducted 4 to 6 years after the students began participating in the program. Two of the programs were implemented in middle schools, and their students averaged 18 years of age at the time of the final evaluation; the other two were implemented in upper-elementary schools, and their students averaged 15 years of age at that time.
Data Collection: Evaluators used the Survey of Teen Activities and Attitudes; they administered it four times: baseline at the beginning of the program and three follow-up surveys either in school or by phone. During the implementation phase, survey data were supplemented by qualitative data collected during site visits. The final evaluation data were collected by a survey administered to youth in 2005 and 2006.
Results: The results are as stated at the beginning of this chapter: no differences in outcomes between experimental and control groups.
Discussion: Nationally, about half of all high school youth report having had sex, and more than one in five students report having had four or more sexual partners by the time they complete high school. In this study, 47 percent of sexually active youth had unprotected sex in the previous 12 months. One-quarter of sexually active adolescents nationwide have an STD, and many STDs are lifelong viral infections with no cure. “Findings from this study speak to the continued need for rigorous research on how to combat the high rate of teen sexual activity and its negative consequences” (p. 61).
SOURCE: Based on Trenholm et al. (2007).
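The Method section of Sample Study 2.1 rests on random assignment: each youth has a known chance of ending up in the program group or the control group, so that, on average, the two groups differ only in their exposure to the program. The short Python sketch below illustrates that general idea under stated assumptions; it is not the procedure Trenholm et al. (2007) actually used. The function name `randomly_assign`, the `program_share` parameter, the seed, and the generated participant IDs are all hypothetical, and the 60/40 split is arbitrary (the study's final analysis groups of 1,209 and 848 youth were simply unequal in size).

```python
import random


def randomly_assign(participant_ids, program_share=0.5, seed=None):
    """Randomly split participants into a program group and a control group.

    Hypothetical helper for illustration only. `program_share` is the
    fraction of participants placed in the program group.
    """
    if not 0.0 <= program_share <= 1.0:
        raise ValueError("program_share must be between 0 and 1")
    rng = random.Random(seed)          # seeded generator makes the split reproducible
    shuffled = list(participant_ids)   # copy so the caller's list is not reordered
    rng.shuffle(shuffled)              # every ordering is equally likely
    cutoff = round(len(shuffled) * program_share)
    return shuffled[:cutoff], shuffled[cutoff:]


# Hypothetical roster matching the study's total sample size of 2,057 youth.
youth = [f"youth-{i:04d}" for i in range(2057)]
program, control = randomly_assign(youth, program_share=0.6, seed=42)
print(len(program), len(control))
```

Shuffling a copy of the full roster and cutting it at a fixed point guarantees an exact number of youth in each group. An alternative is to flip an independent coin for each person, which is equally random but lets the group sizes vary from one run to the next.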
Given the tone of House’s words, it should come as no surprise that even the definition of evaluation has been contested. Many definitions of evaluation have been proposed. In the Encyclopedia of Evaluation (Mathison, 2005), Fournier (2005) provided this as a general definition of evaluation:
Evaluation is an applied inquiry process for collecting and synthesizing evidence that culminates in conclusions about the state of affairs, value, merit, worth, significance, or quality of a program, product, person, policy, proposal, or plan. Conclusions made in evaluations encompass both an empirical aspect (that something is the case) and a normative aspect (judgment about the value of something). It is the value feature that distinguishes evaluation from other types of inquiry, such as basic science research, clinical epidemiology, investigative journalism, or public polling. (p. 140)
The terms merit and worth used in this definition of evaluation also need clarification. In the same Encyclopedia of Evaluation (Mathison, 2005), we find these definitions:
• Merit is the absolute or relative quality of something, either overall or in regard to a particular criterion. To determine the merit of an evaluand in regard to a particular criterion, it is necessary to collect relevant performance data and to explicitly ascribe value to it; that is, to say how meritorious the evaluand is in that particular dimension. To determine the overall merit of the evaluand, a further step is required: synthesis of performances with multiple criteria. Merit determination and synthesis are two of the core methodological tasks that distinguish evaluation from the collection and reporting of descriptive data for interpretation by others. (Jane Davidson) (p. 247)
• Worth is an outcome of an evaluation and refers to the value of the evaluand in a particular context, as opposed to the evaluand’s intrinsic value, which is its merit. Worth and merit are not dependent on each other, and an evaluand (e.g., a doctor) may have merit (she is a highly skilled cardiologist) but have little worth (the hospital needs an anesthesiologist). The opposite is also the case (the hospital has found an anesthesiologist but not a very good one). The worth of an evaluand requires a thorough understanding of the particular context as well as the qualities and attributes of the evaluand. (Mathison) (p. 452)
Michael Patton (2008) makes this distinction between merit and worth:
Merit refers to the intrinsic value of a program, for example, how effective it is in meeting the needs of those it is intended to help. Worth refers to extrinsic value to those outside the program, for example, to the larger community or society. A welfare program that gets jobs for recipients has merit for those who move out of poverty and worth to society by reducing welfare costs. (p. 113)
It is possible for a program to have merit but not worth. For example, if an agricultural program improves tobacco production, this would have merit for the farmers who increase their income. However, it would be viewed as having less worth by those who know that smoking causes cancer and enormous medical costs.
In international development communities, a distinction is made between monitoring and evaluation. The United Nations Development Programme (2002, p. 6) provides the following definitions that are relevant in evaluations undertaken in international development:
• Monitoring is defined as “a continuing function that aims primarily to provide the management and main stakeholders of an ongoing intervention with early indications of progress, or lack thereof, in the achievement of results. An ongoing intervention might be a project, programme or other kind of support to an outcome.”
• Evaluation is defined as a “selective exercise that attempts to systematically and objectively assess progress towards and the achievement of an outcome. Evaluation is not a one-time event, but an exercise involving assessments of differing scope and depth carried out at several points in time in response to evolving needs for evaluative knowledge and learning during the effort to achieve an outcome.”
Shadish (1998) called for expanding the definition of evaluation in terms of the purposes for which evaluations are done. In his definition, evaluation uses feasible practices to construct knowledge of the value of the evaluand, knowledge that can be used to ameliorate the problems to which the evaluand is relevant. Even this purpose-oriented part of the definition has been debated and criticized. For example, evaluations are sometimes completed without any major decisions being based on their results. Patton (2008) notes that evaluations can be used to reduce uncertainty about decisions that have to be made but that many other factors, such as the availability of resources and the political climate, also influence program decisions. You can use the following list of questions to work through the various definitions and their implications for practice.
EXTENDING YOUR THINKING
Definitions of Evaluation
1. What do the definitions of program evaluation mean to you? Explain in your own words.
2. How do the concepts of merit and worth figure into your understanding of the meaning of evaluation?
3. Search for alternative definitions of program evaluation by examining other texts or web-based resources (see the American Evaluation Association’s Web page at www.eval.org for good sources). What similarities and differences do you see between/among the definitions? What distinguishes one definition from the others?
4. Why is it important which definition of evaluation you use? What kind of power rests in the definition to tell you what to do or how to do it?
5. What are the implications of the various definitions for working in culturally diverse communities? What are the advantages and disadvantages of adopting different definitions in a culturally complex setting?
6. What are your reflections on the power related to who gets to decide which definition of evaluation is used?
As already indicated, the definitions of evaluation contain some jargon from the evaluation community that requires explanation for you to really understand what evaluation is, such as the terms merit and worth. Additional terms are defined in Box 2.1. A more comprehensive listing of terms and their meanings can be found in the Encyclopedia of Evaluation (Mathison, 2005).
BOX 2.1 Definition of Terms in Evaluation Parlance
When evaluators talk about the evaluand or object of the evaluation, they are talking about what it is that will be evaluated. This can include a social or educational program, a product, a policy, or personnel. Examples of the kinds of programs that are evaluated include enrichment programs for deaf, gifted adolescents; drug and alcohol abuse programs for the homeless; and management programs for high-level radioactive waste.
Formative evaluations are conducted primarily for the purposes of program improvement. Typically, formative evaluations are conducted during the development and implementation of the program and are reported to in-house staff who can use the information to improve the program. A summative evaluation is an evaluation used to make decisions about the continuation, revision, elimination, or merger of a program. Typically, it is done on a program that has stabilized and is often reported to an external agency.
Internal evaluators work within the organization that operates the program; external evaluators are “experts” brought in from outside the organization for the express purpose of conducting or assisting with the evaluation.
Evaluators must respond to the concerns and interests of selected members of the setting being evaluated (often termed the stakeholders). These include the program funders, the administrators, staff members, and recipients of the services (and sometimes those who do not receive the services for various reasons). These are the audiences that the evaluators serve in the planning, conduct, and use of the evaluation study, compared with the scholarly, academic audience of the researcher.
Evaluations can be conducted on social and educational policies, programs, products, or personnel. For purposes of this chapter, I focus on the evaluation of social and educational programs. References for individuals interested in personnel evaluation are provided in Box 2.2.
BOX 2.2 References for Personnel Evaluation
Books
Darling-Hammond, L. (2013). Getting teacher evaluation right. New York, NY: Teachers College Press.
Joint Committee on Standards for Educational Evaluation. (2009). The personnel evaluation standards (2nd ed.). Thousand Oaks, CA: Corwin.
Stronge, J. H., & Tucker, P. D. (Eds.). (2003). Handbook on teacher evaluation: Assessing and improving performance. Larchmont, NY: Eye on Education.
U.S. Office of Personnel Management. (2011). A handbook for measuring employee performance. Washington, DC: Author.
Wilkerson, J. R., & Lang, W. S. (2007). Assessing teacher dispositions. Thousand Oaks, CA: Corwin.
Journals
Counselor Education and Supervision
Educational Assessment, Evaluation and Accountability (formerly the Journal of Personnel Evaluation in Education)
Educational Leadership
Journal of Applied Psychology
Journal of Counseling Psychology
Journal of Occupational and Organizational Psychology
Personnel Psychology
Phi Delta Kappan
Professional School Counseling
Distinguishing Research and Evaluation
Given the definitions of evaluation already discussed, you may have an inkling of how research and evaluation differ. While there is much overlap between the worlds of research and evaluation, evaluation occupies some unique territory (Mertens & Wilson, 2012). Greene (2000) writes about the commonalities that demarcate evaluation contexts and distinguish program evaluation from other forms of social inquiry (such as research). She argues that what distinguishes evaluation from other forms of social inquiry is its political inherency; that is, in evaluation, politics and science are inherently intertwined. Evaluations assess the merit and worth of programs in the public domain, and those programs are themselves responses to prioritized individual and community needs that resulted from political decisions. Program evaluation “is thus intertwined with political power and decision making about societal priorities and directions” (p. 982).
Trochim (2006) argues that evaluation is unique because of the organizational and political contexts in which it is conducted, which require skills in management, group processes, and political maneuvering that are not always needed in research. Mathison (2008) makes a strong claim that evaluation needs to be considered as a distinct discipline because of its historical emergence in the 1960s as a mechanism to examine valuing as a component of systematic inquiry, as well as the ensuing development of methodological approaches that focus on stakeholder input and use of defined criteria (see the American Evaluation Association’s, 2004, Guiding Principles for Evaluators, and The Program Evaluation Standards, Yarbrough, Shulha, Hopson, & Caruthers, 2011, discussed later in this chapter).
Scriven (2003) extends this line of thought by describing evaluation as a transdiscipline because it is used in so many other disciplines. He writes, “Evaluation is a discipline that serves other disciplines even as it is a discipline unto itself, thus its emergent transdisciplinary status” (p. 22). He likens evaluation to disciplines such as statistics and ethics, which have their own distinctive ways of approaching issues but are also used in other areas of inquiry such as education, health, and social work.
Mertens and Wilson (2012) recognize the uniqueness of evaluation, as well as its overlap with applied research in education and the social sciences. While evaluation has contributed to our understanding of how to bring people together to address critical social issues, parallel developments have also been occurring in applied social research. Hence, “There is a place at which research and evaluation intersect—when research provides information about the need for, improvement of, or effects of programs or policies” (Mertens, 2009, p. 2). This intersection provides an additional rationale for including evaluation in this textbook as a major genre of systematic inquiry that borrows from and enhances the methodologies developed in the research community.
This ties in (again) with the quotation that appeared earlier in this chapter. You might be wondering what time period is referenced in House’s (1993) remark that “gradually, evaluators recognized . . . ” (p. 11). When did evaluators think differently, and what did they think? I now present you with a brief history of evaluation that provides the context for understanding theories and methods in evaluation.
EXTENDING YOUR THINKING
Research Versus Evaluation
Locate a published evaluation study. Explain why that study should or should not be classified as an evaluation study (as opposed to a research study) based on the definition of evaluation. What do you derive from this exercise in terms of the difference between evaluation and research?
History and Models of Evaluation
The origins of evaluation can be traced back to the 1800s, when the government first asked for external inspectors to evaluate public programs such as prisons, schools, hospitals, and orphanages (Stufflebeam, Madaus, & Kellaghan, 2000). However, most writers peg the beginning of the profession of evaluation, as it is now known, to the 1960s, with the passage of Great Society legislation (e.g., Head Start programs and the Elementary and Secondary Education Act) that mandated evaluations as a part of the programs. The history of evaluation is also complicated by its pluralistic disciplinary roots, with educational evaluators coming from a testing, assessment, and objectives-based evaluation background and psychologists more closely aligned with applied social research traditions (Mark, Greene, & Shaw, 2006).
Alkin (2013) depicts the historical roots of evaluation theories through a tree with three branches that he named: use, methods, and valuing (see Figure 2.1). The tree is useful in some respects; however, it has limited representation of evaluation theorists outside the United States, theorists of color, or Indigenous evaluators. Hopson and Hood (2005) undertook the “Nobody Knows My Name” project in order to bring to light the contributions of African Americans to program evaluation. Hood and Hopson (2008) particularly note the work of African American scholars Asa Hilliard (1989, 2007), Aaron Brown (1944), Leander Boykin (1950), and Reid E. Jackson (1940), which serves as a basis for their own work extending responsive evaluation (Stake, 2006) to culturally responsive evaluation. This excerpt from Hood and Hopson’s (2008, p. 414) discussion of Hilliard’s work illustrates the potency of his contribution to evaluation from an Afrocentric perspective:
In 1989, Hilliard combined his Afrocentric perspective with his observations about evaluation at the annual meeting of the American Evaluation Association in his keynote address entitled “Kemetic (Egyptian) Historical Revision: Implications for Cross-Cultural Evaluation and Research in Education.” In particular, he reminded evaluators that they are engaged in the “manipulation of power and therefore politics in its truest sense. . . . Different approaches to evaluation can result in the painting of very different pictures of reality.” (p. 8)
In a rough way, Alkin’s three branches, with the addition of a social justice branch (Mertens & Wilson, 2012), can be mapped onto the major paradigms described in Chapter 1: the use branch corresponds to the pragmatic paradigm, the methods branch to the postpositivist paradigm, the valuing branch to the constructivist paradigm, and the social justice branch (not pictured here) to the transformative paradigm.
Figure 2.1 Historical Roots of Evaluation
SOURCE: Alkin (2013).
Postpositivist Paradigm: Methods Branch
Evaluation in both education and psychology began in the postpositivist paradigm. In education, evaluation emerged from a tradition of testing to assess student outcomes and progressed through an era of specification of objectives and measurement to determine if the objectives had been met. Ralph Tyler (cited in Stufflebeam et al., 2000) developed the objectives-based model for evaluation, and Malcolm Provus (cited in Stufflebeam et al., 2000) developed the discrepancy evaluation model. In psychology, the early years of evaluation were dominated by the work of Donald Campbell (see Shadish, Cook, & Campbell, 2002) in quasi-experimental design (a topic that is discussed extensively in Chapter 4 of this book). Contemporary evaluators have not abandoned the use of testing, objectives, and quasi-experimental designs, but they have modified and extended these strategies and added new approaches in the ensuing years. Lest we think that the postpositivist paradigm is a thing of the past, consider this statement from the U.S. Department of Education’s (2003) Identifying and Implementing Educational Practices Supported by Rigorous Evidence: A User Friendly Guide: “Well-designed and implemented randomized controlled trials are considered the ‘gold standard’ for evaluating an intervention’s effectiveness, in fields such as medicine, welfare and employment policy, and psychology” (p. 1). The U.S. Department of Education also maintains a website called the What Works Clearinghouse (http://ies.ed.gov/ncee/wwc/) that lists educational programs that are deemed to have been rigorously evaluated through the use of randomized experimental designs. Two of the sample studies described in this text provide examples of postpositivist approaches to evaluation: Sample Study 1.1 on the evaluation of the Success for All program (G. D. Borman et al., 2007) and Sample Study 2.1 on the evaluation of the abstinence-only programs (Trenholm et al., 2007).
In addition, the international development community has adopted the practice of using experimental designs in evaluation; see the work of the Poverty Action Lab for multiple examples of this approach (http://www.povertyactionlab.org/).
Additional contributions to postpositivist approaches to evaluation can be found in the extensions of Donald Campbell’s work in the writings on theory-based evaluation (Donaldson, 2007). Theory-based evaluation is an approach in which the evaluator constructs a model of how the program works, using stakeholders’ theories, available social science theory, or both, to guide question formation and data gathering. It offers one way to mitigate the problems that arise when a simplistic version of quasi-experimental design is applied in an evaluation setting. In this approach, the evaluator’s role is to bring a theoretical framework from existing social science theory to the evaluation setting, adding insights into the structure of the program and its effectiveness that might not be available to the stakeholders in the setting. Proponents of this approach warn against uncritically accepting the stakeholders’ viewpoints as the basis for understanding the program’s effectiveness. They acknowledge that qualitative methods can be used during program conceptualization and monitoring, but they advocate the use of randomized experiments for assessing program impact. Because theory-based evaluation is guided by a preference for structural modeling methods, the evaluation questions it formulates tend to be those that fit neatly into causal modeling.
Donaldson (2007) expanded understandings of theory-driven evaluation, suggesting that the role of the evaluator is to come to a thorough understanding of the social problem, program, and implementation context (i.e., to uncover the intended program theory) before deciding on the evaluation questions and methods. “Program theory-driven evaluation science is the systematic use of substantive knowledge about the phenomena under investigation and scientific methods to improve, to produce knowledge and feedback about, and to determine the merit, worth, and significance of evaluands such as social, educational, health, community, and organizational programs” (Donaldson & Lipsey, 2006, p. 67). Bledsoe and Graham’s (2005; Fitzpatrick & Bledsoe, 2007) evaluation of the Fun With Books program is a multi-approach study that illustrates theory-driven, empowerment, and inclusive practices; it is summarized as Sample Study 2.2.
Thus, a theory-driven program evaluation must first develop the program theory that guides the program in its definition of a solution to a social problem. The evaluator obtains information about the program and its underlying theory by reviewing prior research, making public the implicit theories held by those closest to the program, and observing the program in operation. After the program theory has been made explicit, the evaluator can then work with the stakeholders to make decisions about evaluation questions and methods. Bledsoe and Graham (2005) synthesized the stakeholders’ program logic with social science theory related to developmental psychology and emergent literacy to make visible the underlying causal assumptions of the program.
SAMPLE STUDY 2.2 Multiple Evaluation Approaches: Theory Driven, Empowerment, Inclusive
Evaluation Focus: Research on early literacy emphasizes the importance of family involvement; lack of early literacy development has been cited as one of the major contributors to academic underachievement for children from families who face challenges of poverty. Fun With Books (FWB) is a program designed to promote interactive family literacy by in-home reading of children’s literature and using related arts, crafts, and music to support the development of early literacy skills needed for success in school and later in life.
Sample Evaluation Questions: How was FWB implemented? How were cultural and socioeconomic status differences between the program staff and families addressed? What activities occurred and how were those activities tied to outcomes? Does FWB lead to an increase in (a) in-home reading between parent and child, (b) the child’s motivation to learn, (c) the child’s capacity to learn, and (d) the child’s opportunity to learn?
Methods/Design: The evaluators describe their methods as reflecting multiple approaches: theory-driven, empowerment, and inclusive evaluation approaches. They worked closely with program staff and families to ensure that the methods were appropriate to the cultural groups and responsive to the agency’s needs. The basic design was a pretest-posttest design, with activity concentrated at the beginning on conceptualizing the program and with an emphasis on capturing process variables during implementation.
Participants: The program was implemented in Trenton, New Jersey, a racially and ethnically diverse community (52% Black and 32% Latino, with large immigrant populations from Latin America, the Caribbean, and Eastern Europe) in an economically distressed area. Participants included staff of the project as well as family members of those in the program (10 staff members and administrators; 17 families and 31 children).
Instruments and Procedures: Evaluators used mixed methods: surveys, videotaped observations, interviews, and focus groups, as well as document reviews of the organization’s history and practices. They also used established reliable measures that had been developed for Head Start programs, such as a child social awareness assessment.
Results/Discussion: FWB did increase parents’ reading to their children; it improved access to books and provided time and encouragement for reading. The evaluators also made several recommendations for improving the program: discussing the effects of cultural and socioeconomic differences between volunteers and the families, reaching out to current and former participants to benefit from their experiences, extending the length of the program, and including formal testing of outcomes.
SOURCE: Bledsoe and Graham (2005).
Constructivist Paradigm: Values Branch
Theorists such as Guba and Lincoln (1989), Patton (2002), Stake (2006), and House and Howe (1999) were influential in bringing the constructivist paradigm, along with qualitative methods, into evaluation. Stake’s work in responsive evaluation led the way to the introduction of the constructivist paradigm to the evaluation community. Guba and Lincoln (1989) acknowledge the foundation provided by Stake’s (1983) work in responsive evaluation in their development of what they termed fourth-generation evaluation. They conceptualized the four generations this way:
First generation: Measurement (testing of students)
Second generation: Description (objectives and tests)
Third generation: Judgment (the decision-based models)
Fourth generation: Constructivist, heuristic evaluation
Guba and Lincoln (1989) depict the stages of evaluation as generations, with the more recent replacing those that came earlier. It is my perception that the evaluation community does not generally share this depiction, in that many of the methods and models that were developed in evaluation’s earlier days continue to have influence on theory and practice.
Stake (2006) combined some of the elements of a utilization focus and discrepancy evaluation in his model of responsive evaluation. He includes the idea that evaluation involves comparing an observed value with some standard. The standard is to be defined by the expectations and criteria of different people for the program, and the observed values are to be based on those values actually held by the program. The evaluator’s job is to make a comprehensive statement of what the observed program values are with useful references to the dissatisfaction and satisfaction of appropriately selected people. He extended his work in the direction of case study methodology, thus further strengthening the place of qualitative methods in evaluation inquiry.
Currently, the evaluation community seems to have reached a certain comfort level with a pluralistic approach to methodology. Qualitative methods for research with applicability to evaluation are discussed in Chapter 8.
Transformative Paradigm: Social Justice Branch
Although constructivist qualitative evaluators recognize the importance that values play in the inquiry process, this paradigm does not justify any particular set of values. Many years ago, House and Howe (1999) raised the question of what social justice and fairness mean in program evaluation. Within the evaluation community, some evaluators have shifted the focus to prioritize social justice and fairness within evaluation, with the consequent opening of the door to the transformative paradigm of social inquiry for evaluators (Greene, 2000; Mertens, 2009). Transformative approaches to evaluation parallel many of the transformative approaches in research discussed in Chapter 1 and elsewhere in this text; hence, I offer a brief listing of these approaches here as they have been developed in the evaluation community.
• Mertens (2003) developed an approach based on the transformative paradigm called inclusive evaluation, in which community members who would be affected by the results of the evaluation would be included in the methodological decisions. There is a deliberate attempt to include groups that have historically experienced oppression and discrimination on the basis of gender, culture, economic levels, ethnicities/race, sexual orientation, and disabilities, with a conscious effort to build a link between the results of the evaluation and social action. To this end, the inclusive evaluator attempts to redress power imbalances in society by involving all relevant stakeholders in a way that is authentic and that accurately represents the stakeholders’ viewpoints. Inclusive evaluators are cognizant of issues of social justice that impact the definition of social problems. For example, they are aware that deficit models can place the blame for social problems on individuals or their culture rather than on the societal response to the individual or cultural group. The inclusive aspects of the Bledsoe and Graham (2005) study (Sample Study 2.2) exemplify this concern for appropriate identification of the complexity of cultural groups and inclusion of these groups throughout the process of the evaluation.
Inclusive evaluation involves a systematic investigation of the merit or worth of a program or system for the purposes of reducing uncertainty in decision making and facilitating positive social change for the least advantaged. Thus, inclusive evaluation is data based, but the data are generated from an inclusive list of stakeholders, with special efforts to include those who have been traditionally underrepresented. It does not exclude those who have been traditionally included in evaluations. Mertens (2009; Mertens & Wilson, 2012) expanded the notion of inclusive evaluation to encompass transformative evaluation approaches, which provide a broader philosophical framework for this type of evaluation.
• Deliberative democratic evaluation emerged from the early work of Hilliard (1989) and House and Howe (1999), among others. House and Howe identified three foundational conditions: deliberation about the results with relevant parties, inclusion of all relevant interests, and dialogue so that the interests of various stakeholders can be accurately ascertained. These form the basis for a democratic deliberative evaluation with the capacity to equalize power relations in making evaluative judgments. A deliberative evaluation involves reflective reasoning about relevant issues, including preferences and values. An inclusive evaluation involves including all relevant interests, stakeholders, and other citizens in the process. A dialogical evaluation process entails stakeholders and evaluators engaging in dialogues that allow stakeholders’ interests, opinions, and ideas to be portrayed more completely (see also Greene, 2006; Howe & MacGillivary, 2009).
• D. Fetterman and Wandersman (2007) developed an approach called empowerment evaluation, defined as “the use of evaluation concepts, techniques, and findings to foster improvement and self-determination” (p. 186). D. Fetterman and Wandersman acknowledge that not all empowerment evaluations are meant to be transformative. Empowerment evaluation can be practical or transformative, much as Whitmore (1998) describes these two perspectives for participatory evaluation. Practical empowerment evaluation focuses on program decision making and problem solving. Transformative empowerment evaluation focuses on psychological transformation, as well as political transformation. An underlying principle of this approach is that program participants conduct their own evaluations with an outside evaluator who often serves as a coach or facilitator, depending on internal program capacities. “Empowerment evaluation is an approach that aims to increase the probability of achieving program success by (1) providing program stakeholders with tools for assessing the planning, implementation, and self-evaluation of their program and (2) mainstreaming evaluation as part of the planning and management of the program/organization” (Wandersman et al., 2005, p. 28). Empowerment evaluation has been criticized on a number of bases, and to Fetterman’s credit, he posts the criticisms and his responses to them on the empowerment evaluation website (http://www.davidfetterman.com/empowermentevaluation.htm). Criticisms have come from evaluators such as Scriven (2005), who asserts that empowerment evaluation is not really evaluation. N. L. Smith (2007) argues that empowerment evaluation is not an evaluation approach; rather, it is an ideology. R. L. Miller and Campbell (2007) reviewed studies conducted using empowerment evaluation since 2005 to determine its effects and concluded that there is “little warrant for claims that the specific empowerment evaluation process used provides, of itself, any additional positive benefits beyond the general benefits of having conducted an evaluation” (p. 580). Finally, empowerment evaluation, as conceptualized by D. Fetterman and Wandersman (2007), does not explicitly address the issues of power related to sexism, racism, or oppression of people with disabilities (Whitmore, 1998).
• Developmental evaluation evolved from the work of Stockdill, Duhon-Sells, Olsen, and Patton (1992) and Patton (2011), in which they explored ways to actively involve people of color in the evaluation process of a multicultural education project. Stockdill et al.’s (1992) reflections capture the impetus for the emergence of the developmental evaluation model:
My training taught me that carefully developed data-collection instruments could obtain the needed information validly and reliably. Now I am learning that my white female culture got in the way, and that I was isolated from critical pieces of information. For example, it was only through visits to Cambodian families’ homes that the Cambodian staff member could determine their concerns about their children’s education. It was only through telephone calls by African American parents to other African American parents that we learned of the racism experienced by their children. It was only because of the cultural sensitivity of Saint Paul American Indians in Unity that eighteen of twenty Indian parents attended a meeting and shared the depth of their frustration about the failure of schools to educate their children. (p. 28)
Developmental evaluation is defined as evaluation that “supports innovative development to guide adaptation to emergent and dynamic realities in complex environments” (Patton, 2011, p. 1). Patton recognizes that this conceptualization of evaluation might be problematic for others in the evaluation community who view evaluation as a process of rendering judgment about whether or not a program’s goals have been met. He argues that this is one option for evaluators who wish to be valuable partners in the design process of programs for which the goals are emergent and changing and for which the purpose is learning, innovation, and change.
• Feminist evaluation includes judgments of merit and worth, application of social science methods to determine effectiveness, and achievement of program goals, as well as tools related to social justice for the oppressed, especially, although not exclusively, women. Its central focus is on gender inequities that lead to social injustice. It uses a collaborative, inclusive process and captures multiple perspectives to bring about social change. As noted in Chapter 1, there is no single definition of feminism, nor a single definition of feminist evaluation. However, principles identified across studies that claim a feminist orientation are listed in Chapter 1 of this text. A summary of Boddy’s (2009) feminist evaluation is presented as Sample Study 1.4.
• Transformative participatory evaluation (akin to participatory action research, described in Chapter 8) requires that the investigator be explicitly concerned with gender, class, race, ethnicity, sexual orientation, and different abilities and the consequent meaning of these characteristics for access to power. Participation can be built into a project in ways that reflect cultural competency (see commentary in Box 2.3). Consoli, Casas, Cabrera, and Prado’s (2012) study of a program to prevent suicides of Latino youth is an example of a transformative participatory mixed methods evaluation. While participatory evaluation means that evaluation is a participatory process that involves the stakeholders in the various tasks of the evaluation so that the results are fully comprehensible to project participants (Cousins & Earl, 1995), not all participatory evaluations are transformative. The question of who is invited to participate in the process and the nature of their participation determines the extent to which participatory evaluation exemplifies the principles of the transformative paradigm. For example, Cousins and Earl explicitly acknowledge that participatory evaluation, as they conceptualize it, does not have as a goal the empowerment of individuals or groups or the rectification of societal inequities. Rather, they seek to enhance the use of evaluation data for practical problem solving within the contemporary organizational context.
BOX 2.3 Participatory Methods: What Approaches Reflect a Culturally Competent Evaluator With Different Groups?
Depth of understanding culture is not necessarily missing from participatory approaches, but it’s not necessarily there. Participatory [evaluation] is more amenable to the manifestation of cultural competence in an evaluation setting, but I think that in and of itself it doesn’t qualify. When people use the words “participatory” and “empowerment,” not a lot of people get below the surface of talking about, well, what does that mean in terms of having a true understanding of the group you are working with? What does it mean in terms of appropriate ways that are considered to be culturally comfortable to establishing the questions for collection of data? What are the variations within that group? I do think there is danger in thinking I can take a strategy and just walk into a group and use a cookbook approach on how to do participatory or how to do empowerment without thinking about what does it mean that I am the researcher or evaluator in this process? There needs to be that sensitivity to the way you interact and how you engage people in a respectful way.
—Donna Mertens, quoted in Endo, Joh, and Yu (2003)
Nevertheless, the basic processes involved in the conduct of participatory evaluation provide a first step toward a transformative perspective in evaluation in that the professional evaluator works as a facilitator of the evaluation process but shares control and involvement in all phases of the research act with practitioners. In participatory evaluation, the evaluator helps train key organizational personnel in the technical skills vital to the successful completion of the evaluation project. These key organizational personnel—often administrators, counselors, or teachers—are taught sufficient technical knowledge and research skills to enable them to take on the coordinating role of continuing and new projects, with consultation with a professional evaluator as necessary.
The project participants are co-planners of the evaluation who complete the following tasks:
1. Discuss the evaluation questions that need to be addressed to determine if the project is making progress
2. Help define and identify sources of data required to answer evaluation questions
3. Are involved in the data collection and analysis and in report preparation
The main goal of participatory evaluation is to provide information for project decision makers and participants who will monitor the progress of, or improve, their project.
• Hood, Hopson, Obeidat, and Frierson (in press) developed the culturally responsive evaluation approach based on their experiences with evaluation in African American communities. They place priority on understanding the cultural and historical context in which programs are situated, as well as critiquing perspectives of community members that are based on deficit thinking. The culturally responsive approach to evaluation is well illustrated in the approach used to evaluate the talent development model of school reform (Thomas, 2004). These evaluators recognize the importance of matching salient characteristics of the evaluation team with the participants, establishing trusting relationships, and contributing to the overall progress of the educational enterprise in an ongoing manner (see also Mertens, 2009).
• Postcolonial and Indigenous approaches were developed by members of postcolonial and Indigenous communities, reflecting the cultural roots of American Indians (LaFrance & Crazy Bull, 2009), the Maori in New Zealand (Aotearoa) (Cram, 2009; L. T. Smith, 2012), and Africans (Chilisa, 2012). Evaluation protocols focus both inward and outward; they specify how Indigenous and non-Indigenous persons need to proceed when working with Indigenous groups, as well as guide the way the evaluation work is done based on the values and traditions of the community. Human relations based on trust and respect are considered to be paramount, especially given the history of violence against Indigenous peoples.
Proponents of a transformative approach to evaluation argue that working within this paradigm can lead to more appropriate interventions and more judicious distribution of resources. The transformative paradigm in evaluation follows the same principles outlined in Chapter 1 for transformative research. Within the field of evaluation, these approaches explicitly address issues of power and representativeness of groups that have been traditionally pushed to the margins. Using the transformative paradigm as a base, the evaluator views each step in the evaluation process as an opportunity to raise questions about social justice, challenge the status quo, and bring in the voices of those who have been marginalized or inaccurately represented in previous evaluation studies.
Pragmatic Paradigm: Use Branch
When evaluators discovered that using “objective social science methods” was not sufficient to ensure that their work would have an effect on public policy and social program decisions, they shifted their focus to more decision-based models of evaluation. As evaluators gained experience with trying to improve social programs, other models of evaluation were developed that tried to address some of the shortcomings of traditional educational assessment or experimental designs. This branch of the evaluation tree roughly parallels Alkin’s (2013) use branch; the focus is on making the information from the evaluation useful to stakeholders. Stufflebeam (1983) was instrumental in extending the definition of evaluation beyond the achievement of objectives to include the idea that it was a process of providing information for decision making. From his efforts with the Ohio State Evaluation Center and under the auspices of Phi Delta Kappa, Stufflebeam worked with other pioneers in the field of evaluation (including Egon Guba) to develop the CIPP (context, input, process, product) model of evaluation. Thus, the CIPP model tries to incorporate many aspects of the program that were not considered under earlier models. The components of the model can be explained by the nature of the evaluation questions asked for each component (see Table 2.1).
The context and input phases represent a needs assessment function that evaluation sometimes plays to determine what is needed in terms of goals and resources for a program. For more extensive information about needs assessment, refer to Altschuld and Kumar’s (2010) work on this topic.
Table 2.1 CIPP Evaluation Questions
Other major approaches that fit on the use branch include evaluations that focus on organizational culture (Preskill & Torres, 1999), Patton’s utilization-focused evaluation, and the international development model of evaluation called Real World Evaluation (Bamberger, Rugh, & Mabry, 2006).
• Preskill and Torres (1999) provide this definition situated in organizational culture: “We envision evaluative inquiry as an ongoing process for investigating and understanding critical organization issues. It is an approach to learning that is fully integrated with an organization’s work practices, and as such, it engenders (a) organization members’ interest and ability in exploring critical issues using evaluation logic, (b) organization members’ involvement in evaluative processes, and (c) the personal and professional growth of individuals within the organization” (pp. 1–2). Preskill (2008) expands this approach to evaluation by discussing methods for engaging organizational management in the cultivation of a culture that seeks and uses information for improvement. She emphasizes the need to build capacity in organizations to participate in and learn from evaluations through such mechanisms as training, technical assistance, written materials, establishing communities of practice, providing internships and apprenticeships, coaching and mentoring, engaging in appreciative inquiry activities, and using technology.
• Patton (2008) developed an approach known as utilization-focused program evaluation (UFE), defined as “evaluation done for and with specific intended primary users for specific, intended uses” (p. 37). Patton first published the UFE book in 1978, and this approach has been widely used. He clearly states that utilization should be considered from the very beginning of an evaluation study and that the quality of the evaluation is dependent on the actual use made of its findings. This was considered to be a radical idea in 1978; however, it has been accepted as common sense by many in the evaluation community today. According to Patton, the evaluator has a responsibility to facilitate the planning and implementation of evaluations to enhance the use of the findings. This of course necessitates the identification of the intended users (stakeholders) with whom the evaluator negotiates the type of information that the client would find useful.
• Real World Evaluation is an approach that emerged from the world of international development evaluation in which, for better or worse, funding agencies often impose severe constraints in terms of time and money (e.g., an evaluator might be brought into a country for a short period of time to conduct an evaluation) (Bamberger et al., 2006). Bamberger et al. offer strategies to adapt evaluation designs to meet the constraints imposed by the funding agency while still trying to be responsive to the cultural complexities in the evaluation context. For example, they suggest that a local consultant can be engaged to collect background data and conduct exploratory studies prior to the arrival of the outside evaluator. This might include preparing reports on the social and economic characteristics of the targeted community, describing key features of the program to be evaluated, and compiling a list of potential key informants and potential focus group participants. Other design features are similar to those discussed in the methodological chapters of this text.
EXTENDING YOUR THINKING
Evaluation Approaches Review the historical and current models of evaluation presented in this chapter. Select four models (one from each paradigm, perhaps). For each model, determine the theorist’s viewpoint regarding the following:
a. The purpose(s) of evaluation
b. The role of the evaluator in making valuing judgments
c. The role of the evaluator in making causal claims
d. The role of the evaluator in accommodating to the political setting
e. The role of the evaluator in providing information for decision making
f. The perception of the theorist about who is the primary audience for the evaluation
g. The perception of the theorist about the appropriate role of stakeholders in the evaluation
h. The perception of the theorist about the role of other parties affected by or interested in the evaluation
i. The most appropriate way to train an evaluator based on that theoretical perspective
Resources and Processes for Conducting Evaluations
A general outline for steps in conducting evaluations is presented below. However, for the student who is seriously considering conducting an evaluation study, the following list of resources is provided.
Evaluation Resources
Books and Monographs
Donaldson, S. I. (2007). Program theory-driven evaluation science. Mahwah, NJ: Lawrence Erlbaum.
Fetterman, D. M., & Wandersman, A. (2004). Empowerment evaluation principles in practice. New York, NY: Guilford.
Fitzpatrick, J. L., Sanders, J. R., & Worthen, B. R. (2011). Program evaluation (4th ed.). Boston, MA: Pearson.
Hood, S., Hopson, R. K., & Frierson, H. T. (Eds.). (2005). The role of culture and cultural context: A mandate for inclusion, the discovery of truth and understanding in evaluative theory and practice. Charlotte, NC: Information Age.
Mathison, S. (Ed.). (2005). Encyclopedia of evaluation. Thousand Oaks, CA: Sage.
Mertens, D. M. (2009). Transformative research and evaluation. New York, NY: Guilford.
Mertens, D. M., & McLaughlin, J. (2004). Research and evaluation methods in special education. Thousand Oaks, CA: Corwin.
Mertens, D. M., & Wilson, A. T. (2012). Program evaluation theory and practice: A comprehensive guide. New York, NY: Guilford.
Patton, M. Q. (2008). Utilization-focused evaluation (4th ed.). Thousand Oaks, CA: Sage.
Seigart, D., & Brisolara, S. (Eds.). (2002). Feminist evaluation: Explorations and experiences (New Directions for Evaluation, No. 96). San Francisco, CA: Jossey-Bass.
Shaw, I. F., Greene, J. C., & Mark, M. M. (Eds.). (2006). The SAGE handbook of evaluation. Thousand Oaks, CA: Sage.
Stake, R. E. (2005). Multiple case study analysis. New York, NY: Guilford.
Stufflebeam, D. L., & Shinkfield, A. J. (2007). Evaluation theory, models, and applications. San Francisco, CA: Jossey-Bass.
Thompson-Robinson, M., Hopson, R., & SenGupta, S. (Eds.). (2004). In search of cultural competence in evaluation: Toward principles and practice (New Directions for Evaluation, No. 102). San Francisco, CA: Jossey-Bass.
Online Resources
• The website of the American Evaluation Association (AEA; www.eval.org) includes many links to evaluation resources that are available online.
• The W. K. Kellogg Foundation Evaluation Handbook (W. K. Kellogg Foundation, 1998) outlines a blueprint for designing and conducting evaluations, either independently or with the support of an external evaluator/consultant; the foundation’s W. K. Kellogg logic model development guide (W. K. Kellogg Foundation, 2004) is also available on its website.
• The National Science Foundation (www.nsf.gov) offers many resources that discuss designing and conducting evaluations that integrate quantitative and qualitative techniques and address issues of including members of underrepresented groups on the basis of race/ethnicity, gender, and disability.
• William M. Trochim’s (2006) The Research Methods Knowledge Base is available at http://www.socialresearchmethods.net/kb/. Trochim developed an online course in research and evaluation methods that is accessible at this website; it contains many links to other resources relevant for evaluators.
• The U.S. Government Accountability Office (GAO) (www.gao.gov) has many online resources that discuss evaluation synthesis, designing evaluations, case study evaluation, and prospective evaluation methods.
• The Evaluation Center at Western Michigan University (http://www.wmich.edu/evalctr/) maintains a website devoted to evaluation resources. One of the resources is a database of checklists for various approaches to evaluation, such as Utilization-Focused Evaluation (http://www.wmich.edu/evalctr/checklists/).
• Free Resources for Program Evaluation and Social Research Methods (http://gsociology.icaap.org/methods/) offers links to free resources. The focus is on “how to” do program evaluation and social research: surveys, focus groups, sampling, interviews, and other methods. There are also links to sites with information about how to do statistics and how to present data, and to sites covering subject areas that evaluators and social researchers need to know, such as organizations and social psychology. The site also links to free online training classes and videos about research methods and evaluation. Finally, it offers basic guides to evaluation.
• The Evaluation Portal (http://www.evaluation.lars-balzer.name/) is Lars Balzer’s website, and it also links to free online resources. It lists associations and societies, businesses, institutions, and communication sites, such as blogs and e-mail lists, and it has a calendar of evaluation-related events.
Evaluation Journals
• American Journal of Evaluation
• Educational Evaluation and Policy Analysis
• Evaluation and Program Planning
• Evaluation & the Health Professions
• Evaluation Review
• Journal of Mixed Methods Research
• Journal of MultiDisciplinary Evaluation
• New Directions for Evaluation
• Studies in Educational Evaluation
Professional Associations
The American Evaluation Association (AEA) is the primary professional organization for practicing evaluators in the United States and has a strong representation of evaluators from other countries. Other countries and nations also have professional organizations, such as the Canadian Evaluation Society, the African Evaluation Association (AfrEA), and the Australasian Evaluation Society. The AEA publishes two journals, the American Journal of Evaluation and New Directions for Evaluation.
The International Organisation for Cooperation in Evaluation (IOCE) was established in 2003 through a worldwide collaboration of evaluation organization representatives (Mertens, 2005). The IOCE maintains a website that provides links to many evaluation organizations around the world (www.ioce.net). It now lists over 100 different international, regional, and national evaluation organizations, up from five organizations in 1995.
Steps in Planning an Evaluation
In some respects, the steps for conducting an evaluation parallel those used to conduct any research project. However, the process of inquiry varies because this is an evaluation study and because of such factors as the status of the program being evaluated and the evaluation model chosen. The steps are listed in Box 2.4 and are further explained in the next section.
What follows is a general description of the steps for focusing and planning an evaluation. I have integrated ideas from all four paradigms. Before launching into a discussion of the focusing stage, I want to acknowledge that not all evaluators would move through the process in exactly the same way. Nevertheless, if these steps are viewed as a nonlinear, iterative framework for planning an evaluation, they should provide a helpful guide for all evaluators, no matter what their orientation. Furthermore, my remarks are probably biased toward the perspective of an external evaluator, because that has been my perspective for the last 40 years or so; however, I did work as an internal evaluator for about 8 years before assuming external status. I have tried to include information that is pertinent and helpful to both internal and external evaluators.
BOX 2.4 Steps for Planning an Evaluation Study
Focusing the Evaluation
• Description of what is to be evaluated
• The purpose of the evaluation
• The stakeholders in the evaluation
• Constraints and opportunities associated with the evaluation
• The evaluation questions
• Selection of an evaluation model
Planning the Evaluation
• Data collection specification, analysis, interpretation, and use strategies
• Management of the evaluation
• Meta-evaluation plans
Implementing the Evaluation
• Completing the scope of work specified in the plan
Focusing Stage
During the focusing stage, the evaluator needs to determine what is being evaluated, the purpose of the evaluation, the stakeholders in the evaluation, and the constraints within which the evaluation will take place. In one sense, the evaluator faces a “which came first, the chicken or the egg” dilemma even in this first stage, because just learning about what is to be evaluated implies contact with at least one group of stakeholders. Typically, the evaluator is contacted to perform an evaluation by some individual or agency representing one group of stakeholders in the program. Often, this first contact is initiated by a program director, policymaker, or funding group. When listening to the description of the evaluand, purpose, stakeholders, and constraints within this initial context, the evaluator can gain valuable information by asking the right kinds of questions. These questions can provide a sufficient knowledge base to direct further planning efforts and to alert the initial contact person to things that might not have been considered, such as theoretical perspectives or groups that need to be included because they will be affected by the program (Mertens, 2009; Mertens & Wilson, 2012).
Description of the Evaluand. The evaluand, you will recall, is what is being evaluated (e.g., a substance abuse prevention program, a multicultural education program, or a state agency policy). The evaluator needs to determine the status of the evaluand: Is it a developing, new, or firmly established program? If it is developing, it is possible that the evaluator will be asked to play a role in the evolution of the evaluand. Thus, an evaluator who is using the developmental evaluation model (Patton, 2011) might find that a considerable amount of time is spent collaboratively designing the evaluand. If it is new or firmly established, the program is probably described in printed documents. It is always a good idea to ask for whatever printed documents are available about the program in advance of your meeting with the initial stakeholders, if possible. Reading documents such as an annual report, accreditation agency report, previous evaluation reports, or a proposal can give you a good background about the evaluand and the context within which it functions. Questions to start with include the following:
• Is there a written description of what is to be evaluated?
• What is the status of the evaluand? Relatively stable and mature? New? Developing? How long has the program been around?
• In what context will (or does) the evaluand function?
• Who is the evaluand designed to serve?
• How does the evaluand work? Or how is it supposed to work?
• What is the evaluand supposed to do?
• What resources are being put into the evaluand (e.g., financial, time, staff, materials, etc.)?
• What are the processes that make up the evaluand?
• What outputs are expected? Or occur?
• Why is it being evaluated?
• Whose description of the evaluand is available to you at the start of the evaluation?
• Whose description of the evaluand is needed to get a full understanding of the program to be evaluated?
You should read available documents and hold initial discussions with the conscious realization that things are not always being played out exactly as they are portrayed on paper or in conversation. Therefore, it would behoove you to alert the client to the fact that you will want to observe the program in action (e.g., in its development, implementation, etc.) to get a more accurate picture of what is being evaluated. You should also let the client know that you are aware that programs are generally not static (i.e., you will expect changes in the program throughout the duration of the evaluation) and that multisite programs will probably not be implemented in exactly the same way from site to site.
You should be aware that different individuals with different relationships with the program may view the program quite differently. Explain to your client that this often happens in evaluations and that you want to build in a mechanism to discover diverse views about the program and to explore these diversities within a public context. In this way, you can increase the chances that the program you think you are evaluating is the one that is actually functioning or developing. Without this assurance, you may find at the end of the evaluation that your results are challenged because you did not adequately represent different perceptions of the program.
Part of the description of the evaluand should include ascertaining diverse views of the program’s purpose. For example, the evaluator can help program staff members, advisory committees, and partnership teams make “claims” statements regarding the program, that is, statements that represent the participants’ perceptions of what would change as a result of the program activities. The evaluator leads the meeting by asking the stakeholders to comment on what they are committed to changing and asks them to voice their real expectations for change. In this way, the evaluator hopes to increase the sense of ownership of the intended changes among those responsible for the program activities. The following is an example of a claims statement that was generated by the group in the Stockdill et al. (1992) study of a multicultural education program:
School staff interact with students, families, and communities with cultural awareness, warmth, and sensitivity—on the phone, in person, and in written communication. They use proper pronunciation of names, arrange children’s seating to provide for integrated classroom environments, avoid racial jokes and innuendo, and confront prejudice when expressed by children or adults. (p. 30)
A common tool that evaluators use to depict the evaluand is called a logic model (W. K. Kellogg Foundation, 2004). A logic model is a graphic technique that allows the explicit depiction of the theory of change that underlies a program. It should reflect a model of how the program will work under certain conditions to solve an identified problem. Logic models can be quite simple, including three major components: the program that is delivered, the people who receive the program, and the results that are expected. Arrows connect the various elements of the logic model to indicate the logical flow from one element to the next. A good resource for developing logic models is the W. K. Kellogg Foundation Logic Model Development Guide (W. K. Kellogg Foundation, 2004). Evaluators in international development also use graphics similar to logic models, but they call them logframes, a type of table-based logic model (Davies, 2004). Many evaluators find logic models or logframes useful; however, they also acknowledge the limitations of trying to depict a complex, dynamic program in terms of a linear, two-dimensional graphic. A sample logic model based on a program to improve early literacy skills for deaf youngsters can be found in Figure 2.2.
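As a rough illustration of the three-component structure just described, the sketch below represents a simple logic model as a small Python data structure in which each arrow is stored as a pair of adjacent stages. It is a hypothetical sketch, not a tool from the Kellogg guide or the contents of Figure 2.2: the class name `LogicModel`, the fixed stage order, and the example entries (loosely modeled on the Stockdill et al. claims statement above) are all assumptions made for illustration. The strictly linear chain also shows the limitation noted in the paragraph: a dynamic program with feedback loops would need a richer structure.

```python
from dataclasses import dataclass

# Assumed stage order for a simple three-component logic model (hypothetical).
STAGES = ("program", "participants", "results")


@dataclass
class LogicModel:
    """A minimal, strictly linear logic model used only for illustration."""
    program: list[str]       # what is delivered: activities, services, materials
    participants: list[str]  # who receives the program
    results: list[str]       # changes the program is expected to produce

    def arrows(self) -> list[tuple[str, str]]:
        """Return the (from, to) stage pairs that the diagram's arrows represent."""
        return list(zip(STAGES, STAGES[1:]))

    def describe(self) -> str:
        """Render the model as one left-to-right chain of stages."""
        parts = []
        for stage in STAGES:
            items = getattr(self, stage)
            parts.append(f"{stage.upper()}: " + "; ".join(items))
        return " -> ".join(parts)


# Hypothetical entries, loosely based on the multicultural education claims statement.
model = LogicModel(
    program=["Cultural-awareness workshops for school staff"],
    participants=["Teachers", "Office staff"],
    results=["Staff communicate with families with cultural sensitivity"],
)

if __name__ == "__main__":
    print(model.describe())
    print(model.arrows())
```

Running the file prints the model as a single chain (PROGRAM, then PARTICIPANTS, then RESULTS) followed by the two arrows linking the stages.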
Purpose of the Evaluation. Evaluations can be conducted for multiple purposes. As you discuss the evaluand, the purpose of the evaluation may begin to emerge. However, it is important to directly address this issue, within the context of what stimulated the need for the evaluation and the intended use of the evaluation results. The purpose of the evaluation needs to be distinguished from the purpose of the program. The sample claims statement presented above exemplifies a partial statement of purpose for the program.
Formative evaluations are conducted during the operation of a program to provide information useful in improving the program. Summative evaluations are conducted at the end of a program to provide judgments about the program’s worth or merit. Developmental evaluations are typically designed to provide systematic data in a fluid, dynamic, changing context of program development. Although summative evaluations tend to focus more on program impact, formative and developmental evaluations can include program impact data that is viewed as a barometer for program changes.
Patton (2008; Stockdill et al., 1992) also distinguished a purpose for evaluation in terms of work that might occur prior to either a formative or summative evaluation. Formative evaluation typically assumes that ultimate goals are known and that the issue is how best to reach them. By contrast, developmental evaluation is preformative in the sense that it is part of the process of developing goals and implementation strategies. Developmental evaluation brings the logic and tools of evaluation into the early stages of community, organization, and program development.
Questions can be asked to determine the purposes of the evaluation; the questions should be asked with an understanding of diversity of viewpoints and the necessity of representation of appropriate people in the discussions. Possible questions include these:
• What is the purpose of the evaluation?
• What events triggered the need for the evaluation?
• What is the intended use of the evaluation findings?
• Who will use the findings and for what purpose?
• Whose views should be represented in the statement of evaluation purpose?
Mertens and Wilson (2012) identify a number of possible purposes for evaluations:
• To gain insights or to determine necessary inputs before a program is designed, implemented, or revised (context evaluation, needs assessment, organizational assessment)
• To find areas in need of improvement or to change practices in terms of program implementation (process evaluation)
• To assess program effectiveness (impact evaluation)
• To address issues of human rights (transformative approaches)
Evaluators should be on the alert for evaluations conducted without a clear commitment to use the results: evaluations for which decisions have already been made and the decision maker is looking for justification, or cases in which the evaluation is viewed only as a public relations activity (Patton, 2008). Of course, decision makers are unlikely to characterize the purpose of the evaluation in such blunt terms, so the evaluator must take care to establish a clear purpose, with intentions for use, before commencing the evaluation proper.
The purpose of the evaluation is also something that may change as the project evolves. For example, Wallerstein and Martinez (1994) developed an empowerment evaluation model to evaluate an adolescent substance abuse prevention program. In the initial stages of the evaluation, they viewed the purpose as documenting the processes and conditions necessary for a Freirean program to promote community change. (A Freirean program is one based on the teachings of Paulo Freire, 1970, in which empowerment is viewed as establishing a connection with others to gain control over your own life in the context of improving the life of the community.) As the evaluation continued, the purpose shifted to focusing on analyzing how individual actors engaged in the larger change process. The areas of change included three that were predicted from the Freirean theory:
1. Changes in the youths’ abilities to engage in dialogue
2. Changes in their critical-thinking ability to perceive the social nature of the problems and their personal links to society
3. Changes in the youths’ level of actions to promote changes
The evaluators also found that two other areas of change emerged from the study that had not been predicted from the theory:
4. Changes in the youths’ emotions related to connectedness with others and self-disclosure
5. Changes in self-identity
Identification of Stakeholders. Of course, by now, you should know who the primary players in the program are. However, you should ask specific questions to ascertain whether all of the appropriate stakeholders have been identified. It is not uncommon in evaluations to address the concerns of the funding agency, the policymakers, or the program managers. However, this does not really cover all the people who are or will be affected by the program. Therefore, it is incumbent on you as the evaluator to raise questions about who should be involved in the evaluation process and to raise the consciousness of the powerful people to include those who have less power.
Possible questions at this stage include the following:
• Who is involved in the administration and implementation of the program?
• Who are the intended beneficiaries?
• Who has been excluded from being eligible for participation in the program?
• Who stands to gain or lose from the evaluation results?
• Which individuals and groups have power in this setting? Which do not?
• Did the program planning and evaluation planning involve those with marginalized lives?
• Who is representative of the people with the least power?
• What opportunities exist for mutual sharing between the evaluators and those without power?
• What opportunities exist for the people without power to criticize the evaluation and influence future directions?
• Is there appropriate representation on the basis of gender, ethnicity, disability, and income levels?
• Who controls what the evaluation is about and how it will be carried out?
• Who has access to the program resources?
• What are the dominant cultural values and assumptions in the program?
• Who are the supporters of the program? Who are its opponents?
Possible stakeholders in evaluations include the following:
• Sponsors and funders
• Governing and advisory boards
• State- and federal-level agency representatives
• Policymakers
• Administrators (at various levels)
• Staff members (e.g., regular education teachers, special education teachers, resource personnel, psychologists)
• Service recipients (clients, students, etc.)
• Nonrecipients (who may or may not be eligible, but are not receiving services)
• Parents of service recipients
• Grassroots organization representatives in the community
• Public community representatives
This list is not meant to be exhaustive but, rather, to give you an idea that the breadth of the impact of an evaluation can extend far beyond the individuals with whom you first discuss the evaluation project. Various strategies can be used to identify stakeholders. You might start with the people who invited you to do the evaluation and then ask them to nominate others. You might ask for people to become involved by some type of organizational or public announcement of the evaluation. You can often identify many stakeholders by reviewing the organizational chart of the program and asking questions of your early contacts about who is represented in the charts and who is not and why that is so.
Constraints of the Evaluation. Evaluations occur within specific constraints:
• Money: “We have budgeted this much money for the evaluation.”
• Time: “We need the results of the evaluation before the board meets six months from now.”
• Personnel: “The teachers will be given four hours per week release time to assist with data collection.”
• Existing resources: “Data program records are on disk, and they contain information about all recipients and their characteristics.”
• Politics: “If this program were reduced, it would free up additional funds to respond to other needs.”
Politics are integrally involved in the evaluation process. Therefore, the evaluator must be aware at the start, and sensitive throughout the process, of who supports or opposes the program; who would gain or lose if the program was continued, modified, reduced, or eliminated; who sanctions the evaluation and who refuses to cooperate; who controls access to information; and who needs to be kept informed as the evaluation progresses. This would be a good time in the evaluation planning to bring up the issue of communication lines and mechanisms so that you are sure there is a formalized understanding of who needs to be informed of what and by what means.
Also, it is important to note that people move in and out of programs, so it is not safe to assume that the understandings engendered by your initial efforts will be shared by those who enter the program later. You will need to orient later entrants to the purpose, process, and other aspects of the evaluation. If you do not do this yourself, you might establish a mechanism for other knowledgeable participants to orient newcomers.
It is usually a good idea at this point to prepare a synthesis of your current understanding of the program in terms of the evaluand, purpose of the evaluation, stakeholders, and constraints. You can then share this synthesis with the stakeholder groups, characterize it as your current understanding, and ask for their feedback.
Evaluation Questions. Evaluation questions can be derived from the statement of purpose for the evaluation, expanded on by holding brainstorming sessions with stakeholder groups, borrowed from previous evaluation studies, or generated by a theoretical framework that is relevant to the study. The GAO examined categories of evaluation questions that would provide useful information for Congress (Shipman, MacColl, Vaurio, & Chennareddy, 1995). I share these with you, as they seem to have generic relevance for evaluators in local settings as well.
Descriptive questions are those that tell what the program is and what it does:
• What activities does the program support?
• Toward what end or purpose?
• Who performs these activities?
• How extensive and costly are the activities, and whom do they reach?
• Are conditions, activities, purposes, and clients fairly similar throughout the program, or is there substantial variation across program components, providers, or subgroups of clients?
Implementation questions are those that tell about how and to what extent activities have been implemented as intended and whether they are targeted to appropriate populations or problems:
• Are mandated or authorized activities actually being carried out?
• Are the activities in accordance with the purpose of the law and implementing regulations?
• Do activities conform to the intended program model or to professional standards of practice, if applicable?
• Are program resources efficiently managed and expended?
Impact questions illuminate the program effects:
• Is the program achieving its intended purposes or outcomes? What is the aggregate impact of the program? How did impact or outcomes vary across participants and approaches? How did impact vary across providers? Specifically, did the program support providers whose performance was consistently weak?
• What other important effects relate to the program (side effects)? What unforeseen effects (either positive or negative) did the program have on the problems or clients it was designed to address? Did the program have an effect on other programs aimed at a similar problem or population?
• How does this program compare with an alternative strategy for achieving the same ends? Are the effects gained through the program worth its financial and other costs? Taking both costs and effects into account, is the current program superior to alternative strategies for achieving the same goals?
If evaluation questions are prepared within the transformative framework, they exemplify an understanding of the power relationships that need to be addressed (Mertens, 2009). Examples of evaluation questions from this perspective might include the following:
• How can we teach and counsel students and clients so that they do not continue to be oppressed?
• Are the resources equitably distributed?
• How has the institution or agency been unresponsive in meeting the needs of people with disabilities?
The focus of these questions is to place the problem in an unresponsive system with power inequities rather than in the individual without power.
No matter what the initial evaluation questions, the evaluator should always be sensitive to emerging issues that necessitate a revision of these questions. This might be especially important in responsive and transformative evaluations.
Selection of an Evaluation Model. Evaluators carry an inclination, based on their worldview, to use specific models in their evaluation work. However, the needs of the evaluation will determine the appropriateness and feasibility of using a specific model. At this point, evaluators must ask themselves questions such as these: Can I conduct this evaluation using the model that seems most appropriate to me, given my view of the world? Can I modify, adjust, or adapt my way of thinking to be responsive to the needs of this client in this setting? Can I use my way of thinking to help the client think about the problem in ways that are new and different and, ideally, more constructive and productive? I suspect that older, more experienced evaluators can more readily make choices based on compatibility with their worldviews than newer, less experienced evaluators can. Perhaps newer evaluators would find it easier to adjust their model selection to the needs of the client. For example, Patton (2008) declares himself to be a pragmatist in his choice of models and methods, asserting that sensible methods decisions should be based on the purpose of the inquiry, the questions being investigated, and the resources available.
The reader should review the major models of evaluation as they are presented earlier in this chapter. If one model seems to address the needs of the evaluation more than another, consider if that is a model that aligns with your worldview. If you think that you can “enlighten” a client to a different approach to evaluation, give it a try: One of the roles that evaluators fulfill is educating the client about evaluation. Just as the implicit or explicit model that you entered the situation with influenced your decisions up to this point, the model that you choose to operationalize influences your next steps in planning the evaluation.
Planning the Evaluation
Data Collection Decisions. Data collection decisions basically answer these questions:
• What data collection strategies will you use?
• Who will collect the data?
• From whom will you collect the information (sampling)?
• When and where will you collect the information?
• How will the information be returned to you?
Your basic choices for data collection strategies are the same as those outlined in Chapters 11 and 12 (on sampling and data collection, respectively) in this text, as well as those discussed in Chapter 6 on survey research and Chapter 8 on qualitative methods. Your major concern is choosing data collection strategies that provide answers to your evaluation questions within the constraints of the evaluation study, that are culturally and linguistically appropriate, and that satisfy the information needs of your stakeholders. The constraints of data collection in public institutions and agencies sometimes include specific policies about who can have access to information. When the data collection plan is being developed, the evaluator should inquire into such possible constraints. Stakeholders should participate in the development of the data collection plan and should reach agreement that the data collected using this plan will satisfy their information needs. Of course, you need to stay flexible and responsive to emerging data collection needs.
Analysis, Interpretation, and Use. Issues related to analysis, interpretation, and use planning are directly affected by your data collection choices. These topics are discussed in Chapter 12. Nevertheless, as a part of the evaluation plan, the steps that you will take for analysis, interpretation, and use of the evaluation data should be specified. Your model of evaluation will determine how interactive and iterative the analysis and interpretation phases are. For example, in transformative or constructivist evaluations, you would expect to have fairly constant interaction with the stakeholder groups. You would share preliminary results and consult on the use of the findings for the purpose of program modification, as well as for directions for further data collection and analysis. If you are functioning within a postpositivist model, you would be less likely to have such a high level of interaction with your stakeholder groups. Rather, you would attempt to maintain distance so that your presence did not unduly influence the effects of the program. In terms of interpretation, evaluators have emphasized the development of a standard for use in making judgments about a program’s merit or worth. For example, if a program is designed to reduce the dropout rate of high school students, is a 50% reduction considered successful? How about a 25% reduction? Such standards are appropriate for studies that focus on impact evaluation, although process-oriented standards could be established in terms of number of clients served or number of participants at a workshop.
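To make the dropout example concrete, the following minimal sketch checks an observed result against a standard agreed on with stakeholders before data collection. It assumes that “reduction” means a relative (proportional) reduction from the baseline rate rather than a drop in percentage points. An evaluation plan should state which is meant, because the two can lead to different verdicts: a fall from 20% to 12% is a 40% relative reduction but only an 8-percentage-point drop. The function names and the dropout rates are hypothetical illustrations, not data from any study.

```python
# Hypothetical helpers for judging an impact result against a pre-agreed standard.
# Assumption: "reduction" means relative reduction from baseline, not percentage points.


def relative_reduction(baseline_rate: float, follow_up_rate: float) -> float:
    """Return the proportional drop from baseline (0.5 means the rate was halved)."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - follow_up_rate) / baseline_rate


def meets_standard(baseline_rate: float, follow_up_rate: float,
                   required_reduction: float) -> bool:
    """True if the observed relative reduction reaches the agreed threshold."""
    return relative_reduction(baseline_rate, follow_up_rate) >= required_reduction


if __name__ == "__main__":
    # Hypothetical rates: 20% of students dropped out before the program, 12% after.
    baseline, follow_up = 0.20, 0.12
    observed = relative_reduction(baseline, follow_up)
    print(f"Relative reduction: {observed:.0%}")
    for standard in (0.25, 0.50):
        verdict = "met" if meets_standard(baseline, follow_up, standard) else "not met"
        print(f"{standard:.0%} standard: {verdict}")
```

With these hypothetical rates, the program would satisfy a 25% standard but not a 50% one. Settling that kind of judgment in advance is the purpose of the standard.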
Management of the Evaluation. The management plan for the evaluation should include a personnel plan as well as a budget (see Chapter 13 for more on this topic). The personnel plan specifies the tasks that will be done, as well as how, when, and by whom they will be done. The cost plan specifies the costs of the evaluation in such categories as personnel, travel, supplies, and consultants. Together, these two parts of the management plan can be used as the basis of a formal contract between the evaluator and the sponsoring agency.
Meta-Evaluation Plan. The meta-evaluation plan specifies how the evaluation itself will be evaluated. Typically, the meta-evaluation specifies when reviews of the evaluation will be conducted, by whom, and with reference to what standards. In evaluation work, three time points often seem appropriate for meta-evaluation to occur: (a) after the preliminary planning is finished, (b) during the implementation of the evaluation, and (c) after the evaluation is completed. The meta-evaluation can be accomplished by asking a person outside of the setting to review the evaluation planning documents, the progress reports, and the final report. It can also involve feedback from the stakeholder groups. One source of standards that the evaluation can be measured against is The Program Evaluation Standards: A Guide for Evaluators and Evaluation Users (3rd ed.; Yarbrough et al., 2011). These standards are discussed in the next section.
EXTENDING YOUR THINKING
Evaluation Planning
Find a published evaluation study. On the basis of information given in the article, create an evaluation plan (in retrospect) using the guidelines for planning provided in this chapter. Your plan should include the following:
a. Description of what is to be evaluated
b. The purpose of the evaluation
c. The stakeholders in the evaluation
d. Constraints affecting the evaluation
e. The evaluation questions
f. Description of the evaluation model
g. Data collection specifications
h. Analysis, interpretation, and use strategies
i. Management plan
j. Meta-evaluation plans
If you are unable to locate information in the published article that would provide a complete retrospective plan, make a note of the missing elements. Add a section to your plan indicating what you think needs to be added to improve the authors’ plan. This will require some creative thinking on your part.
Standards for Critically Evaluating Evaluations
The Program Evaluation Standards (referred to above and hereafter referred to as the Standards; Yarbrough et al., 2011) were originally developed by a joint committee that was initiated by the efforts of three organizations: the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education. Representatives of these three organizations were joined by members of 12 other professional organizations (e.g., American Association of School Administrators, Association for Assessment in Counseling, and the National Education Association) to develop a set of standards that would guide the evaluation of educational and training programs, projects, and materials in a variety of settings. The Standards provide one comprehensive (albeit not all-encompassing) framework for examining the quality of an evaluation.
The Standards are organized according to five main attributes of evaluations:
• Utility: how useful and appropriately used an evaluation is
• Feasibility: the extent to which the evaluation can be implemented successfully in a specific setting
• Propriety: how humane, ethical, moral, proper, legal, and professional an evaluation is
• Accuracy: how dependable, precise, truthful, and trustworthy an evaluation is
• Evaluation Accountability: the extent to which the quality of the evaluation itself is assured and controlled
Each of the main attributes is defined by standards relevant to that attribute. Box 2.5 contains a summary of the standards organized by attribute. Guidelines and illustrative cases are included in The Program Evaluation Standards text itself. The illustrative cases are drawn from a variety of educational settings, including schools, universities, the medical and health care field, the military, business and industry, the government, and law.
BOX 2.5 A Summary of the Program Evaluation Standards
Utility
U1 Evaluator Credibility Evaluations should be conducted by qualified people who establish and maintain credibility in the evaluation context.
U2 Attention to Stakeholders Evaluations should devote attention to the full range of individuals and groups invested in the program and affected by its evaluation.
U3 Negotiated Purposes Evaluation purposes should be identified and continually negotiated based on the needs of stakeholders.
U4 Explicit Values Evaluations should clarify and specify the individual and cultural values underpinning purposes, processes, and judgments.
U5 Relevant Information Evaluation information should serve the identified and emergent needs of stakeholders.
U6 Meaningful Processes and Products Evaluations should construct activities, descriptions, findings and judgments in ways that encourage participants to rediscover, reinterpret, or revise their understandings and behaviors.
U7 Timely and Appropriate Communication and Reporting Evaluations should attend in a continuing way to the information needs of their multiple audiences.
U8 Concern for Influence and Consequences Evaluations should promote responsible and adaptive use while guarding against unintended negative consequences and misuse. (Yarbrough et al., 2011, p. 3)
Feasibility
F1 Project Management Evaluations should use effective project management strategies.
F2 Practical Procedures Evaluation procedures should be practical and responsive to the way the program operates.
F3 Contextual Viability Evaluations should recognize, monitor, and balance the cultural and political interests and needs of individuals and groups.
F4 Resource Use Evaluations should use resources effectively and efficiently. (Yarbrough et al., 2011, p. 71)
Propriety
P1 Responsive and Inclusive Orientation Evaluations should include and be responsive to stakeholders and their communities.
P2 Formal Agreements Evaluations should be negotiated to make obligations explicit and take into account the needs, expectations, and cultural contexts of clients and other parties.
P3 Human Rights and Respect Evaluations should be designed and conducted to protect human and legal rights and maintain the dignity of participants and other stakeholders.
P4 Clarity and Fairness Evaluations should be understandable and fair in addressing stakeholder needs and purposes.
P5 Transparency and Disclosure Evaluations should provide complete descriptions of findings, limitations, and conclusions to all stakeholders, unless doing so would violate legal and propriety obligations.
P6 Conflicts of Interests Evaluations should openly and honestly identify and address real or perceived conflicts of interests that may compromise the evaluation.
P7 Fiscal Responsibility Evaluations should account for all expended resources and comply with sound fiscal procedures and processes. (Yarbrough et al., 2011, p. 105)
Accuracy
A1 Justified Conclusions and Decisions Evaluation conclusions and decisions should be explicitly justified in the cultures and contexts where they have consequences.
A2 Valid Information Evaluation information should serve the intended purposes and support valid interpretations.
A3 Reliable Information Evaluation procedures should yield sufficiently dependable and consistent information for the intended uses.
A4 Explicit Program and Context Descriptions Evaluations should document programs and their contexts with appropriate detail and scope for the evaluation purposes.
A5 Information Management Evaluations should employ systematic information collection, review, verification, and storage methods.
A6 Sound Designs and Analyses Evaluations should employ technically adequate designs and analyses that are appropriate for the evaluation purposes.
A7 Explicit Evaluation Reasoning Evaluation reasoning leading from information and analyses to findings, interpretations, conclusions, and judgments should be clearly and completely documented.
A8 Communication and Reporting Evaluation communications should have adequate scope and guard against misconceptions, biases, distortions, and errors. (Yarbrough et al., 2011, p. 157)
Evaluation Accountability
M1 Evaluation Documentation Evaluations should fully document their negotiated purposes and implemented designs, procedures, data, and outcomes.
M2 Internal Meta-Evaluation Evaluators should use these and other applicable standards to examine the accountability of the evaluation design, procedures employed, information collected, and outcomes.
M3 External Meta-Evaluation Program evaluation sponsors, clients, evaluators, and other stakeholders should encourage the conduct of external meta-evaluations using these and other applicable standards. (Yarbrough et al., 2011, p. 225)
SOURCE: Yarbrough et al. (2011).
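For the internal meta-evaluation called for by M2, one simple way to keep track of the 30 standards summarized in Box 2.5 is a checklist keyed by standard code. The sketch below is a hypothetical aid, not part of the Standards. The rating labels (“met,” “partially met,” “not met”) and the function name are assumptions made for illustration, and the Standards do not prescribe any scoring scheme. For each attribute, the sketch lists the standards that have no recorded evidence or that were judged not met.

```python
# Hypothetical meta-evaluation checklist keyed by the standard codes in Box 2.5.
# The rating vocabulary below is an assumption, not part of the Standards.

STANDARDS = {
    "Utility": [f"U{i}" for i in range(1, 9)],                    # U1-U8
    "Feasibility": [f"F{i}" for i in range(1, 5)],                # F1-F4
    "Propriety": [f"P{i}" for i in range(1, 8)],                  # P1-P7
    "Accuracy": [f"A{i}" for i in range(1, 9)],                   # A1-A8
    "Evaluation Accountability": [f"M{i}" for i in range(1, 4)],  # M1-M3
}

# Ratings that flag a standard for follow-up in the meta-evaluation.
FLAGGED = {"not met", "no evidence"}


def unaddressed_standards(ratings: dict[str, str]) -> dict[str, list[str]]:
    """Group, by attribute, the codes that are unrated or rated 'not met'.

    Codes missing from `ratings` are treated as 'no evidence'.
    """
    gaps: dict[str, list[str]] = {}
    for attribute, codes in STANDARDS.items():
        flagged = [code for code in codes
                   if ratings.get(code, "no evidence") in FLAGGED]
        if flagged:
            gaps[attribute] = flagged
    return gaps


if __name__ == "__main__":
    # Hypothetical ratings recorded after reviewing the planning documents.
    ratings = {"U1": "met", "U2": "partially met", "F1": "met",
               "M1": "met", "M2": "not met"}
    for attribute, codes in unaddressed_standards(ratings).items():
        print(f"{attribute}: {', '.join(codes)}")
```

Treating “partially met” as not flagged is a design choice in this sketch. A stricter review could add it to `FLAGGED`.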
Mertens (2009) and Kirkhart (2005) recognize that concerns about diversity and multiculturalism have pervasive implications for the quality of evaluation work. Recall that in Chapter 1, you read about Kirkhart’s term “multicultural validity,” which she defined as the type of validity indicating that we have established correct understandings across multiple cultural contexts. She identified five types of validity: theoretical, experiential, consequential, interpersonal, and methodological (see Chapter 1).
The following specific threats to multicultural validity have been identified:
1. It takes time to reflect multicultural perspectives soundly. Many evaluations are conducted in compressed time frames and on limited budgets, thus constraining the ability of the evaluator to be sensitive to the complexity of multicultural dimensions.
2. Cultural sophistication needs to be demonstrated on cognitive, affective, and skill dimensions. The evaluator needs to be able to have positive interpersonal connections, conceptualize and facilitate culturally congruent change, and make appropriate cultural assumptions in the design and implementation of the evaluation.
3. The evaluator must avoid cultural arrogance that is reflected in premature cognitive commitments to a particular cultural understanding as well as to any given model of evaluation.
Ethics and Evaluation: The Guiding Principles
Another important resource for designing high-quality evaluations is the AEA’s Guiding Principles for Evaluators (2004; www.eval.org). There are five guiding principles:
• Systematic Inquiry. Evaluators should conduct systematic data-based inquiries about the program being evaluated.
• Competence. Evaluators provide competent performance in the design, implementation, and reporting of the evaluation, including demonstration of cultural competence.
• Integrity/Honesty. Evaluators need to display honesty and integrity in their own behavior and attempt to ensure the honesty and integrity of the entire evaluation process.
• Respect for People. Evaluators must respect the security, dignity, and self-worth of the respondents, program participants, clients, and other stakeholders with whom they interact.
• Responsibilities for the General and Public Welfare. Evaluators should articulate and take into account the diversity of interests and values that may be related to the general and public interests and values.
The Program Evaluation Standards and the Guiding Principles for Evaluators should be used by the evaluator in developing and implementing the evaluation study. Conducting the meta-evaluation at the design stage not only helps ensure that the evaluation is worthwhile and likely to produce the information needed by users but also increases the confidence of those associated with the evaluation. Conducting the meta-evaluation throughout the life cycle of the evaluation will improve the evaluation.
Questions for Critically Analyzing Evaluation Studies
The following questions are designed to parallel the Standards and to address issues raised by the construct of multicultural validity as it was described by Kirkhart (2005).
Utility
1. Evaluator credibility: How did the evaluator establish his or her credibility, and how did the stakeholders perceive the evaluator’s credibility?
2. Attention to stakeholders: What was the range of stakeholders included in the evaluation, and were any stakeholders excluded?
3. Evaluation purpose: What were the purposes of the evaluation? How were they negotiated throughout the process of the evaluation?
4. Explicit values: What cultural and individual values were identified, and how did these relate to the purposes, processes, and judgments of the evaluation?
5. Relevant information: To what extent were the information needs of stakeholders served?
6. Meaningful processes and products: To what extent did the evaluation activities, descriptions, and judgments yield changes in understanding and actions for participants?
7. Timely and appropriate communication and reporting: What was the schedule of communication and reporting? How did it serve the needs of the stakeholders?
8. Concern for consequences and influence: What use was made of the evaluation findings? Was there evidence of misuse?
Feasibility
1. Project management: What evidence is there that the evaluation used effective project management strategies?
2. Practical procedures: What evidence is there that the evaluation used practical procedures that were responsive to the program operation?
3. Contextual viability: How did the evaluation recognize, monitor, and balance the cultural and political interests and needs of individuals and groups?
4. Resource use: How did the evaluation use resources equitably, efficiently, and effectively?
Propriety
1. Responsive and inclusive orientation: How did the evaluation include, and respond to, stakeholders and their communities?
2. Formal agreements: What evidence is there that the evaluation was based on negotiated and renegotiated formal agreements taking into account the contexts, needs, and expectations of clients and other parties?
3. Human rights and respect: How did the evaluation protect human and legal rights and respect the dignity and interactions of participants and other stakeholders?
4. Clarity and fairness: Was the evaluation understandable and fair in addressing stakeholder needs and purposes?
5. Transparency and disclosure: How did the evaluation make complete descriptions of findings, limitations, and conclusions available to all stakeholders, unless doing so would violate legal and propriety obligations?
6. Conflicts of interests: What evidence is there that the evaluators identified and addressed real or perceived conflicts of interest that might compromise processes and results?
7. Fiscal responsibility: How did the evaluation account for all expended resources and comply with sound fiscal procedures and processes?
Accuracy
1. Justified conclusions and decisions: What evidence is there that the conclusions and decisions were justified in the cultures and contexts where they have consequences?
2. Valid information: What evidence is there that the information served the intended purposes and supported valid interpretations?
3. Reliable information: What evidence is there that the evaluation information is dependable and consistent for the intended purposes?
4. Explicit program and context descriptions: How did the evaluation document programs and their contexts with appropriate detail and scope for the evaluation purposes?
5. Information management: What information collection, review, verification, and storage methods were used, and were they systematic?
6. Sound designs and analyses: What evidence is there that the evaluation employed technically adequate designs and analyses that are appropriate for the evaluation purposes?
7. Explicit evaluation reasoning: How was the evaluation reasoning leading from information and analyses to findings, interpretations, conclusions, and judgments clearly documented?
8. Communication and reporting: What evidence is there that the evaluation communications were adequate in scope and as free as possible from misconceptions, distortions, and errors?
Evaluation Accountability
1. Evaluation documentation: What documentation is available that shows the negotiation of purposes and implementation of designs, procedures, data, and outcomes?
2. Internal meta-evaluation: How did evaluators use standards to examine the accountability of the evaluation design, procedures, information collected, and outcomes?
3. External meta-evaluation: What evidence is there that a meta-evaluation was supported and conducted using appropriate standards?
Interpersonal Validity
1. Personal influences: What influences did personal characteristics or circumstances, such as social class, gender, race and ethnicity, language, disability, or sexual orientation, have in shaping interpersonal interactions, including interactions between and among evaluators, clients, program providers and consumers, and other stakeholders?
2. Beliefs and values: What are the influences of the beliefs and values of the evaluator and other key players in filtering the information received and shaping interpretations?
Consequential Validity
1. Catalyst for change: What evidence is there that the evaluation was conceptualized as a catalyst for change (e.g., to shift the power relationships among cultural groups or subgroups)?
2. Unintended effects: What evidence is there of sensitivity to unintended (positive or negative) effects on culturally different segments of the population?
Multicultural Validity
1. Time: Were the time and budget allocated to the evaluation sufficient to allow a culturally sensitive perspective to emerge?
2. Cultural sophistication: Did the evaluator demonstrate cultural sophistication on the cognitive, affective, and skill dimensions? Was the evaluator able to have positive interpersonal connections, conceptualize and facilitate culturally congruent change, and make appropriate cultural assumptions in the design and implementation of the evaluation?
3. Avoidance of arrogant complacency: What evidence is there that the evaluator has been willing to relinquish premature cognitive commitments and to be reflexive?
EXTENDING YOUR THINKING
Evaluation Standards
1. Find a published evaluation study. Critically analyze the study using the questions for critical evaluation of evaluation studies listed in this chapter. Summarize the strengths and weaknesses of the study. Suggest possible changes that might strengthen the study.
2. Find a second published evaluation study that exemplifies a different evaluation model or paradigm. Critically analyze it using the questions for critical evaluation of evaluation studies listed in this chapter. Suggest possible changes that might strengthen the study.
3. Contrast the types of strengths and weaknesses that emerge from the two studies, which were conducted from different evaluation models or paradigms. How did the differences in the studies affect your use of the evaluation standards that were presented in this chapter (The Program Evaluation Standards [Yarbrough et al., 2011] and Kirkhart’s [1995, 2005] multicultural validity standards)?
4. Because the Standards were developed within the context of educational evaluations, there has been some discussion about their applicability to evaluations in other settings. Review an evaluation study based in another type of program (e.g., drug abuse, clinical services). Critically comment on the usefulness of the Standards within this context. Use the points presented in the previous question to structure your response.
5. Brainstorm ideas for an evaluation study. Working in a small group, design an evaluation plan that follows the steps and includes all the components outlined in this chapter. Critique your plan using the questions for critical analysis listed in this chapter.
Summary of Chapter 2: Evaluation
Evaluation and research share many things; however, the differences between the two genres of systematic inquiry should be apparent to you after studying this chapter. Evaluation has its own jargon, including the evaluand (what is evaluated), stakeholders (people who have a stake in the outcomes of the evaluation), and other terms reviewed in this chapter. It also has a history of development from the early 1960s to the present; it is a dynamic field with new developments occurring in response to challenges and political factors. The four major paradigms can also be used to frame evaluation studies, just as they can be for research. The planning process of an evaluation is premised on ongoing involvement of stakeholder groups. Professional associations and standards for quality in evaluation are useful resources for anyone interested in pursuing an evaluation study.