Empowerment Evaluation

Introduction

Can Management and Evaluation Be Joined? An Overview of the Issues

Evaluators and Managers as Partners in Evaluation

Building an Evaluative Culture in Organizations: An Expanded Role for Evaluators

Creating Ongoing Streams of Evaluative Knowledge

Obstacles to Building and Sustaining an Evaluative Culture

Manager Involvement in Evaluations: Limits and Opportunities

Intended Evaluation Uses and Managerial Involvement

Evaluating for Accountability

Evaluating for Program Improvement

Manager Bias in Evaluations: Limits to Manager Involvement

Striving for Objectivity in Program Evaluations

Can Program Evaluators Be Objective?

Looking for a Defensible Definition of Objectivity

A Natural Science Definition of Objectivity

Implications for Evaluation Practice

Criteria for High-Quality Evaluations: The Varying Views of Evaluation Associations

Summary

Discussion Questions

References

https://jigsaw.vitalsource.com/api/v0/books/9781452289595/print?from...

1 of 26 2/4/2016 9:42 AM

Chapter 11 explores the relationship between program managers and evaluators, and how that relationship is influenced by evaluation purposes and organizational contexts. We begin by reviewing Wildavsky’s (1979) seminal work on this relationship. Because Wildavsky was skeptical of how organizations could be self-evaluating, we then look at organizational cultures that support evaluation. Given that many evaluators do their work as participants in the organizations in which they do evaluations, we describe the ways in which internal evaluations can occur in such organizations. An evaluative culture is a special case where evaluative thinking and practices have been suffused throughout the organization, and we discuss the prospects for realizing such cultures in contemporary public sector organizations. We then turn to the limitations and opportunities for how managers can be involved in evaluations and how the differences between formative and summative evaluations offer incentives that can bias manager involvement in evaluations of their own programs.

The last part of Chapter 11 looks at the question of whether program evaluations can be objective. We discuss what it would take for evaluations to be objective and whether it is possible to claim that evaluations are objective. Finally, based on the guidelines and principles offered by evaluation associations, we offer some general guidance for evaluators in positioning themselves as practitioners able to make claims for doing high-quality evaluations.

Program evaluation is intended to be a flexible and situation-specific means of answering program questions, testing hypotheses, and understanding program processes and outcomes. Evaluations can focus on a broad range of issues, from needs, to program resources, to program outcomes. They generally are intended to yield information that reduces the level of uncertainty about the issues that prompted the evaluation.

As we learned in Chapter 1, program evaluations can be formative; that is, they can aim at producing findings, conclusions, and recommendations that are intended to improve the program. Formative evaluations are typically done with a view to offering program and organizational managers information that they can use to improve the efficiency and/or the effectiveness of an existing program. Generally, questions about the continuation of support for the program itself are not part of formative evaluation agendas.

Program evaluations can also be summative—that is, intended to render judgments on the value of the program. Summative evaluations are more directly linked to accountability requirements that are often built into the program management cycle, which was introduced in Chapter 1. Summative evaluations can focus on issues that are similar to those included in formative evaluations (e.g., program effectiveness), but the intention is to produce information that can be used to make decisions about the program’s future, such as whether to reallocate resources elsewhere or whether to terminate the program. Typically, summative program evaluations entail some kind of external reporting that may include government central agencies as a key stakeholder. In Canada, for example, most program evaluations conducted by federal departments and agencies are made public, and Treasury Board, as the principal central agency responsible for expenditure management across the government, is a recipient of the evaluations.

The purposes of an evaluation affect the relationships between evaluators, managers, and other stakeholders. Generally, managers are more likely to view formative evaluations as “friendly” evaluations and, hence, are more likely to be willing to cooperate with the evaluators. They have an incentive to do so because the evaluation is intended to assist them without raising questions that could result in major changes, including reductions to or even the elimination of a program.

Summative evaluations are generally viewed quite differently. Program managers face different incentives in providing information or even participating in such an evaluation. Notwithstanding the efforts by some organizations to build evaluative cultures (Mayne, 2008; Mayne & Rist, 2006) wherein managers are encouraged to treat mistakes and perhaps even program-related failures as opportunities to learn, the future of their programs may be at stake.

From an evaluator’s standpoint, then, the experience of conducting a formative evaluation can be quite different from conducting a summative evaluation. The type of evaluation can also affect the evaluator’s relationship with the program manager(s). Typically, program evaluators depend on program managers to provide key information and to arrange access to people, data sources, and other sources of evaluation information (Chelimsky, 2008). Securing and sustaining cooperation is affected by the purposes of the evaluation—managerial reluctance or strategies to “put the best foot forward” might well be expected where the stakes include the future of the program itself. As Norris (2005) says, “Faced with high-stakes targets and the paraphernalia of the testing and performance measurement that goes with them, practitioner and organizations sometimes choose to dissemble” (p. 585).

How does program evaluation, as a part of the performance management cycle, relate to program management? Are program evaluation and program management compatible roles in public and nonprofit organizations?

Wildavsky (1979), in his seminal book Speaking Truth to Power, introduced his discussion of management and evaluation this way:

Why don’t organizations evaluate their own activities? Why don’t they seem to manifest rudimentary self-awareness? How long can people work in organizations without discovering their objectives or determining how well they are carried out? I started out thinking that it was bad for organizations not to evaluate, and I ended up wondering why they ever do it. Evaluation and organization, it turns out, are somewhat contradictory. (p. 212)

When he questioned joining together management and evaluation, Wildavsky chiefly had in mind summative evaluations where the future of programs, and possibly reallocation of funding, would be an issue. Historically, the federal government of Canada, for example, offered this definition of program evaluation in its first publication on the purposes and scope of the then new evaluation function in federal departments and agencies:

Program evaluation in federal departments and agencies should involve the systematic gathering of verifiable information on a program and demonstrable evidence on its results and cost-effectiveness. Its purpose should be to periodically produce credible, timely, useful and objective findings on programs appropriate for resource allocation, program improvement and accountability. (Office of the Comptroller General [OCG] of Canada, 1981, p. 3)

Central agencies still maintain this chiefly summative focus on evaluations. In its statement of the purposes of program evaluation, the Treasury Board of Canada Secretariat (2009), the central agency responsible for the government-wide evaluation function, offers a view of evaluation that is substantially the same as that offered nearly three decades earlier. In its “Policy on Evaluation,” the principal rationale for evaluation is that “evaluation provides Canadians, Parliamentarians, Ministers, central agencies and deputy heads an evidence-based, neutral assessment of the value for money, i.e. relevance and performance, of federal government programs” (p. 3). The main thrust of the policy is clearly a summative view of evaluation that focuses on “resource allocation and reallocation” and “providing objective information to help Ministers understand how new spending proposals fit with existing programs, identifying synergies and avoid wasteful duplication” (p. 3).

If evaluations are to be used to reallocate resources as well as to improve programs, organizations must have the capacity to participate in and respond to evaluations that have both formative and summative facets. This suggests an image of organizations that are amenable to rethinking existing commitments—managers would need to balance attachment to the stability of their programs with attachment to the evidence-based evaluation process. The rational/technical view of organizations (de Lancer Julnes & Holzer, 2001), which we discussed in Chapter 9, suggests that within such organizations, decision making would be based on evidence, managers and workers would behave in ways that do not undermine a results-focused culture, and summative evaluations would be welcomed as a part of regular management processes.

Wildavsky’s (1979) view of organizations as settings where “speaking truth to power” is a challenge is similar to the political/cultural image of organizations offered by de Lancer Julnes and Holzer (2001). Wildavsky views the respective roles of evaluators and managers as painted in contrasting colors. Evaluators are described as people who question assumptions, who are skeptical, who are detached, who view organizations/programs as means and not ends in themselves, whose currency is evidence, and who ultimately focus on the social needs that the program serves rather than on organizational needs.

By contrast, in Wildavsky’s view, organizational/program managers can be characterized as people who are committed to their programs, who are advocates for what they do and what their programs do, and who do not want to see their commitments curtailed or their resources diminished.

How, then, even for formative evaluation capacity, do organizations resolve the question of who has the power and authority to make decisions, who constructs evaluation information, and who controls its interpretation and distribution? In one scenario, evaluators could be a central part of program and policy design, implementation, and assessment of results. They may suggest that new programs or policies should be implemented as experiments or quasi-experiments (perhaps as pilot programs), with clear objectives, well-constructed comparisons, baseline measurements, and sufficient control over the implementation process, to ensure the internal and construct validities of the evaluation process. This view of trying out new programs was the essence of Donald Campbell’s image of the experimenting society (Watson, 1986).

Managers, however, may prefer to implement programs to more immediately meet organizational and client needs. Objectives may, in that case, be stated in ways that facilitate flexible interpretations of what was important to convey, depending on the audience. Managers would want program objectives to be able to withstand the scrutiny of stakeholders with different values and expectations. As we might anticipate, experimentation can create political problems: What does the organization tell prospective clients who want the program but cannot get access to it because they are members of a “control group”? What do executives tell the elected officials, when client groups question either the lack of flexibility in the service (to maintain construct validity of the evaluation) or its lack of availability (to increase internal validity of the evaluation)?

Where the evaluation function is internal, it may be much more challenging to experiment with a program before its launch. An example of the dilemmas and controversies involved in designing and implementing a randomized controlled trial in a setting where there is an acute social need is the New York City Department of Homeless Services’ 2-year experiment to evaluate the Homebase program. The Homebase program is intended to provide housing-related services to families that are at risk or are already homeless. The evaluation was started in the fall of 2010, and for the ensuing 2 years, those in the control group (200 families) are excluded from accessing the bundle of services that constitute the Homebase program (New York City Department of Homeless Services, 2010).

The social dilemmas inherent in this kind of situation raise the question: Where should the evaluation function be located in organizations, or even governments? One possible solution is to make program evaluation an external function. Thus, evaluators would be a part of an agency that is not under the administrative control of the organization’s managers. This solution, however, does face challenges as well. In British Columbia, for example, the Secretary of Treasury Board at one point outlined a plan for creating a centralized evaluation capacity in the government (Wolff, 1979). This approach would have been similar to the way external auditors function in governments. Treasury Board analysts housed in that central agency would have conducted evaluations of line department programs with a view to preparing reports for Treasury Board managers. The plan was never implemented, however, in part because the line departments strongly objected to the creation of a central evaluation unit that would not be accountable to line department executives. In fact, at that point, some departments were developing in-house evaluation units, which were intended to perform functions that executives argued would be duplicated by any centralized evaluation unit.

Centralized evaluation functions have certainly been developed for summative evaluation purposes. Under the Bush administration in the United States, the Office of Management and Budget (OMB), an executive agency responsible for budget preparation and expenditure management, was responsible for assessing all federal programs on a cyclical basis using the Program Assessment Rating Tool (PART) process (U.S. OMB, 2002, 2004). From 2002 through 2009, OMB assessed about 20% of all programs every year. These PART reviews were, in effect, summative evaluations that relied in part on existing program evaluation and performance measurement information, but offered an independent assessment conducted by OMB analysts.

Wildavsky’s (1979) view of self-evaluating organizations was quite pessimistic and reflected a view that saw evaluation as a form of research best done by those who had some distance from the programs being evaluated. He saw evaluation and management as being quite separate, with distinct roles for managers and evaluators. But in the past several decades, there has been a broad movement in the field of evaluation to find ways of knitting evaluation and management together. Instead of seeing evaluation as an activity that challenges management, this contrasting view assumes that evaluators can work with managers to define and execute evaluations that combine the best of what both parties bring to that relationship. Utilization-focused evaluation (Patton, 2008), for example, is premised on producing evaluations that managers and other stakeholders will use—and ensuring use means developing a working relationship between evaluators and managers. Managers are expected to be participants in the evaluation process. Patton (1997) characterizes the role of the evaluator this way:

The evaluator facilitates judgment and decision-making by intended users rather than acting as a distant, independent judge. Since no evaluation can be value-free, utilization-focused evaluation answers the question of whose values will frame the evaluation by working with clearly identified, primary intended users who have responsibility to apply evaluation findings and implement recommendations. In essence, I shall argue, evaluation use is too important to be left to evaluators. (p. 21)

Utilization-focused evaluation (Patton, 2008) and participatory evaluation (Cousins & Whitmore, 1998) are among a growing number of approaches that emphasize the importance of evaluators engaging with, and in some respects becoming a part of, the organizations in which they do their work. The traditional view of evaluators as experts who conduct arm’s-length “evaluation studies” of programs, and offer their written reports to stakeholders at the end of the process, is giving way to the view that evaluators should not stand aside from organizations but instead should get involved (Mayne & Rist, 2006).

Cousins and Whitmore (1998) suggest that the evaluation team and the practitioner team both need to be committed to improving the program. The evaluation process—identifying the key questions, design of the evaluation, collection of the data, and reporting of the results—can be shared between the evaluators and the practitioners (see also King, Cousins, & Whitmore, 2007).

Love (1991) elaborated an approach that is premised on the assumption that evaluators can be a part of organizations (i.e., paid employees who report to organizational executives) and can contribute to improving the efficiency and effectiveness of programs. For Love, “internal evaluation is the process of using staff members who have the responsibility for evaluating programs or problems of direct relevance to an organization’s managers” (p. 2).

Internal evaluation units are common and are the norm in some governments. In the federal government of Canada, for example, each department or agency typically has its own evaluation unit, which reports to the administrative head of that organization. These units are expected to work with departmental executives and managers to identify evaluation priorities and undertake program evaluations. Although external consultants are often hired to conduct parts of such projects, they are managed by internal evaluators.

Love (1991) outlines six stages in the development of internal evaluation capacity, beginning with ad hoc program evaluations and ending with strategically focused cost–benefit analyses:

1. Ad hoc evaluations focused on single programs
2. Regular evaluations that describe program processes and results
3. Program goal setting, measurement of program outcomes, program monitoring, adjustment
4. Evaluations of program effectiveness, improving organizational performance
5. Evaluations of technical efficiency and cost-effectiveness
6. Strategic evaluations including cost–benefit analyses

These six stages can be seen as a gradual transformation of the intentions of evaluations from formative to summative purposes. Love (1991) highlights the importance of an internal working environment where organizational members are encouraged to participate in evaluations, and where trust of evaluators and their commitment to the organization is part of the culture. What Love is suggesting in his approach is that it is possible to transform an organizational culture so that it embraces evaluation as a strategic asset. We will consider the prospects for building evaluative cultures in the next section of this chapter.

Building an Evaluative Culture in Organizations: An Expanded Role for Evaluators

Mayne (2008) and Patton (2011) are among the advocates for a broader role for evaluation and evaluators in organizations. Like Love (1991), their view is that it is possible to build organizational capacity to perform evaluation that ultimately transforms the organization. Mayne (2008) has outlined the key features of an evaluative culture. We have summarized his main points in Table 11.1.

For Mayne (2008) and Mayne and Rist (2006), the roles of evaluators are broader than doing evaluation studies/projects—they need to encompass knowledge management for the organization. Evaluators need to be prepared to engage with executives and program managers, offer them advice and assistance, take a lead role in training and other kinds of events that showcase and mainstream evaluation, and generally play a supportive role in building an organizational culture that values and relies on timely, reliable, valid, and relevant information on programs and policies. In Wildavsky’s (1979) words, an evaluative culture is one wherein both managers and evaluators feel supported in “speaking truth to power.”

Table 11.1 Characteristics of an Evaluative Culture in Organizations

An organization that has a strong evaluative culture:

Engages in self-reflection and self-examination by
    Seeking evidence on what it is achieving, using both monitoring and evaluation approaches
    Using evidence of results to challenge and support what it is doing
    Valuing candor, challenge, and genuine dialogue both horizontally and vertically within the organization

Engages in evidence-based learning by
    Allocating time and resources for learning events
    Acknowledging and learning from mistakes and poor performance
    Encouraging and modeling knowledge sharing and fostering the view that knowledge is a resource and not a political weapon

Encourages experimentation and change by
    Supporting program and policy implementation in ways that facilitate evaluation and learning
    Supporting deliberate risk taking
    Seeking out new ways of doing business

Source: Adapted from Mayne (2008, p. 1).

Organizations with evaluative cultures can also be seen as learning organizations. Morgan (2006), following on Senge (1990), suggests that learning organizations develop capacities to

Scan and anticipate change in the wider environment to detect significant variations …
Develop an ability to question, challenge, and change operating norms and assumptions …
Allow an appropriate strategic direction and pattern of organization to emerge. (Morgan, 2006, p. 87)

Key to establishing a learning organization is what Morgan (2006) calls double-loop learning—that is, learning that critically assesses existing organizational goals and priorities in light of evidence and includes options for adopting new goals and objectives. Organizations must get outside their established structures and procedures and instead focus on processes to create new information, which in turn can be used to challenge the status quo and make changes.

Garvin (1993) has suggested five “building blocks” for creating learning organizations, which are similar to key characteristics of organizations that have evaluative cultures: (1) systematic problem solving using evidence, (2) experimentation and evaluation of outcomes before broader implementation, (3) learning from past performance, (4) learning from others, and (5) treating knowledge as a resource that should be widely communicated.

Creating Ongoing Streams of Evaluative Knowledge

Streams of evaluative knowledge comprise both program evaluations and performance measurement results (Rist & Stame, 2006). In Chapter 9, we outlined 12 steps that are important in building and sustaining performance measurement systems in organizations. In that chapter, we also discussed the importance of real-time performance measurement and of results being available to managers. By itself, building a performance measurement system to meet periodic external accountability expectations will not ensure that performance information will be used internally by organizational managers. The same point can apply to program evaluation. Key to a working evaluative culture would be the usefulness of ongoing evaluative information to managers, and the responsiveness of evaluators to managerial priorities.

Patton (1994, 2011) has introduced developmental evaluation as an alternative to formative and summative program evaluations. Developmental evaluations view organizations as co-evolving in complex environments. Organizational objectives (and hence program objectives) and/or the organizational environment may be in flux. Conventional evaluation approaches that assume a relatively static program structure in which it is possible to build logic models, for example, may have limited application in co-evolving settings. Patton suggests that evaluators should take on the role of organizational development specialists, working with managers and other stakeholders as team members to offer evaluative information in real time so that programs and policies can take advantage of a range of periodic and dynamic evaluative information.

Obstacles to Building and Sustaining an Evaluative Culture

What are the prospects for building evaluative cultures? Recall that in Chapter 10, we suggested that adversarial political cultures can inhibit developing and sustaining performance measurement and reporting systems. When performance results are made high stakes, so that there are significant internal consequences to reporting performance failures, one effect is to discourage managers from using externally reported performance results for internal management purposes. In effect, managers, when confronted by situations where public performance results need to be sanitized or at least carefully presented to reduce political risks, tend to decouple those measures from internal performance management uses, preferring instead to develop and use other measures that remain internal to the organization.

Mayne (2008), Mayne and Rist (2006), Patton (2011), and other proponents of evaluative cultures are offering us a normative view of what “ought” to occur in organizations. But many public sector and nonprofit organizations have to navigate environments or governments that are adversarial, engendering negative consequences for managers (and their political masters) if programs or policies are not “successful,” or if candid information about the weaknesses in performance becomes public. What we must keep in mind, much as we did in Chapter 10 when we were assessing the prospects for performance measurement and public reporting systems to be used for both accountability and performance improvement, is that the environments in which public and nonprofit organizations are embedded play an important role in the ways organizational cultures evolve and co-adapt.

To build and sustain an evaluative culture, Mayne (2008) suggests, among other things, that

managers need adequate autonomy to manage for results—Managers seeking to achieve outcomes need to be able to adjust their operations as they learn what is working and what is not. Managing only for planned outputs does not foster a culture of inquiry about what are the impacts of delivering those outputs. (p. 2)

Refocusing organizational managers on outcomes instead of inputs and offering them incentives to perform to those (desired) outcomes has been linked to New Public Management ideals of loosening the process constraints on organizations so that managers would have more autonomy to improve efficiency and effectiveness (Hood, 1995). But as Moynihan (2008) and Gill (2011) point out, what has tended to happen in settings where political cultures are adversarial is that performance expectations (objectives, targets, and measures) have been layered on top of existing process controls instead of replacing them. In effect, from a managerial perspective, there are more controls in place now that performance measurement and reporting are part of the picture and less “freedom to manage.”

What effect does this have on building evaluative cultures? The main issue is the impact on the willingness to take risks. Where organizational environments are substantially risk-averse, that will condition and limit the prospects for developing an organizational culture that encourages risk taking. In short, building and sustaining evaluative cultures requires not only supportive organizational leadership but also a political and organizational environment that permits reporting evaluative results that acknowledge below-par performance when it occurs.

Increasingly, program managers are expected to play a role in evaluating their own programs. In many situations, particularly for managers in nonprofit organizations, resources to conduct evaluations are scarce. But expectations that programs will be evaluated (and that information will be provided that can be used by funders to make decisions about the program’s future) are growing. Designing and implementing performance measurement systems also presumes a key role for managers.

In Chapter 10, we discussed the ways in which setting up performance measures to make summative judgments about programs can produce unintended consequences—managers will respond to the incentives that are implied by the consequences of reporting performance results and will shape their behavior accordingly. The “naming and shaming” system of England’s health care providers from 2000 to 2005 resulted in substantial problems with the validity of the performance data (Bevan & Hamblin, 2009).

Involving managers, indeed giving them a central role in evaluations that are intended to meet external accountability requirements, is different from involving them or even giving them the lead in formative evaluations. Because the field of evaluation is so broad and diverse, we see a range of views on how much and in what ways managers should be involved in evaluations (including performance measurement systems).

Intended Evaluation Uses and Managerial Involvement

Most contemporary evaluation approaches emphasize the importance of the ultimate uses of evaluations. In fact, there is a growing literature that examines and categorizes different kinds of uses (Leviton, 2003; Mark & Henry, 2004). Patton (2008), in his book Utilization-Focused Evaluation, points out that the evaluation field has evolved toward making uses of evaluations a key criterion. The Program Evaluation Standards (Yarbrough, Shulha, Hopson, & Caruthers, 2011), developed by the Joint Committee on Standards for Educational Evaluation, make utility one of the five standards for evaluation quality. The other four are feasibility, propriety, accuracy, and accountability.

Many evaluation approaches support involving program managers in the process of evaluating programs. Participatory evaluation approaches, for example, emphasize the importance of having practitioners involved in evaluations, principally to increase the likelihood that the evaluations will be used (Cousins & Whitmore, 1998; Smits & Champagne, 2008).

Some evaluation approaches (empowerment evaluation is an example) emphasize evaluation use but go beyond practitioner involvement to making social justice–related outcomes an important goal of the evaluation process. Empowerment evaluation is intended in part to make evaluation part of the normal planning and management of programs and to ultimately put managers and staff in charge of their own destinies. “Too often,” argue Fetterman, Kaftarian, and Wandersman (1996),

external evaluation is an exercise in dependency rather than an empowering experience: in these instances the process ends when the evaluator departs, leaving participants without the knowledge or expertise to continue for themselves. In contrast, an evaluation conducted by program participants is designed to be ongoing and internalized in the system, creating the opportunity for capacity building. (p. 9)

Initially, Fetterman seemed to view evaluation as a formative process. He argued that the assessment of a program’s worth is not an end point in itself but part of an ongoing process of program improvement. Fetterman (2001) acknowledged, however, that

the value or strength of empowerment evaluation is directly linked to the purpose of the evaluation.… Empowerment evaluation makes a significant contribution to internal accountability, but has serious limitations in the area of external accountability … An external audit or assessment would be more appropriate if the purpose of the evaluation was external accountability. (p. 145)

In a more recent rebuttal of criticism of empowerment evaluation, Fetterman and Wandersman (2007) suggest that their approach is capable of producing unbiased evaluations and, by implication, evaluations that are defensible as summative products. In response to criticism by Cousins (2005), they suggest,

contrary to Cousins’ (2005) position that “collaborative evaluation approaches … [have] … an inherent tendency toward self-serving bias” (p. 206), we have found many empowerment evaluations to be highly critical of their own operations, in part because they are tired of seeing the same problems and because they want their programs to work. Similarly, empowerment evaluators may be highly critical of programs that they favor because they want them to be effective and accomplish their intended goals. It may appear counterintuitive, but in practice we have found appropriately designed empowerment evaluations to be more critical and penetrating than many external evaluations. (Fetterman & Wandersman, 2007, p. 184)

Below, we expand on managerial involvement in evaluation for accountability and evaluation for program improvement.

Evaluating for Accountability

Public accountability has become nearly a universal expectation in both the public and the nonprofit sectors internationally. There are many countries where some regime of public accountability exists at both the national and the subnational levels. Evaluating for accountability is typically summative, and often the key stakeholders are outside the organizations in which the programs being evaluated are located. Stakeholders can include central agencies, funders, elected officials, and others, including interest groups and citizens.

Summative evaluations can be aimed at meeting accountability requirements, but they do not have to be. It is possible to have an evaluation that looks at the merit or worth of a program (Lincoln & Guba, 1980) but is intended for stakeholders within an organization. A volunteer nonprofit board, for example, may be the principal client for a summative evaluation of a program, and although the decisions flowing from such an evaluation could affect the future of the program, the evaluation could be seen as internal to the organization.

A good example of an organization that conducts high-stakes accountability evaluations is the Government Accountability Office (GAO) in the United States. Although a part of the Congress, the GAO straddles the boundary between the executive and the legislative branches of the U.S. federal government. Eleanor Chelimsky (2008), from the GAO, in a candid discussion describes the “clash of cultures” between evaluation and politics, and makes a strong case for the importance of evaluator independence in the case of summative evaluations for accountability. She points to the American division-of-powers structure as both prompting a demand for evaluation and, at the same time, threatening evaluator independence:

Because our government’s need for evaluation arises from its checks-and-balances structure —which, as you know, features separation of powers, legislative oversight, and accountability to the people as protectors for individual liberty—evaluators working within that structure must deal, not exceptionally but routinely and regularly, with political infringements on their independence that result directly from that structure. (p. 400)

For Chelimsky (2008), evaluator independence is an essential asset for the GAO in its work with the Congress. At the same time, the GAO relies on government agencies to contribute to its work. It needs to secure the cooperation of the agencies in which the programs being evaluated are located. It needs the data that are housed in federal departments and agencies, to be able to construct key lines of evidence for evaluations. What Chelimsky has observed over time is a growing trend toward limiting access to agency data:

Between 1980 and 1994—that is, across the Carter, Reagan, Bush, and Clinton presidencies—we found that secrecy and classification of information were becoming prevalent in an increasing number of agencies. Yet it would be hard to find a more critical issue for evaluation than this one. (p. 407)

Chelimsky’s (2008) view is that this issue, if anything, became more critical under the Bush administrations (2001–2008). In effect, agency and managerial involvement in GAO evaluations has become a significant political issue in the American government.

The GAO model of independent evaluations is exceptional; most governments do not have a substantial institutional capacity to conduct independent evaluations. Instead, a more typical model would be the one in the Canadian federal government, wherein each department and agency has at least some evaluation capacity built into the organizational structure but evaluation unit heads report to the administrative head of the agency. This model is similar to the one advocated by Love (1991) in his description of internal evaluation. Unlike audit, where there are typically both internal and external auditors to examine administrative processes and even performance, evaluation continues to be an internal function.

In the Canadian example of the federal evaluation function, housing evaluation capacity in departments and agencies makes sense from a formative standpoint; evaluators report to the heads of the agencies, and their work would, in principle, be useful for making program-related changes. But the overall thrust of the 2009 Federal Evaluation Policy is summative; that is, the emphasis in the policy is on evaluations providing information to senior elected and appointed officials and being used to fulfill accountability expectations. Evaluators who work in the Canadian federal government are expected to wear two hats: They are members of the organizations in which they do their evaluation work, but at the same time, they are expected to meet the policy requirements set forth by Treasury Board. Like their counterparts in the GAO, they need to work with managers to be able to do their work, but unlike the GAO, they do not have an institutional base that is independent of the programs they are expected to evaluate.

Evaluating for Program Improvement

Most evaluation approaches emphasize the importance of evaluating to improve programs. In Chapter 10, we saw that when public sector performance measurement systems are intended to be used for both public accountability and performance improvement purposes, one use can crowd out the other. Specifically, requiring performance results to be publicly reported (to fulfill accountability expectations) can affect the ways that information is viewed and used within organizations. Evaluating to improve programs while also evaluating to meet accountability expectations can have effects similar to those seen in performance measurement systems. If organizational managers are invited to be a part of an evaluation where the results will become public and may have significant consequences for their programs or their organizations, any suggestion that the evaluation is also intended to improve the program will be viewed with some skepticism.

The political culture in which the organization is embedded will affect perceptions of risk, willingness to be candid, and perhaps even willingness to provide information for the evaluation. Chelimsky (2008) points out that organizationally based information is critical to constructing credible program evaluations. Making program evaluation high stakes, that is, making evaluation results central to deciding the future of programs or even organizations, will weaken the connections between evaluators and evaluands (the programs and managers being evaluated), and affect the likelihood of successful future evaluation engagements.

Manager Bias in Evaluations: Limits to Manager Involvement

We began with Wildavsky’s (1979) view that managers and evaluators have quite different and, in some respects, conflicting roles. The whole field of evaluation has moved toward a position that makes room for manager involvement in evaluations and raises the question of what limits, if any, there are in how managers can participate in evaluations.

At one end of a continuum of manager involvement, Fetterman and Wandersman (2007) suggest that empowerment evaluation as a participatory approach facilitates managers and other organizational members taking the lead in conducting both formative and summative evaluations of their own programs. This view has been challenged by those who advocate for a central role for program evaluators as judges of the merit and worth of programs (Scriven, 2005). Stufflebeam (1994) challenged advocates of empowerment evaluation around the issue of whether managers and other stakeholders (not the evaluator[s]) should make the decisions about the evaluation process and evaluation findings. His view is that ceding that amount of control invites “corrupt or incompetent evaluation activity” (p. 324):

Many administrators caught in political conflicts over programs or needing to improve their public relations image likely would pay handsomely for such friendly, non-threatening, empowering evaluation service. Unfortunately, there are many persons who call themselves evaluators who would be glad to sell such services. Unhealthy alliances of this type can only delude those who engage in such pseudo evaluation practices, deceive those whom they are supposed to serve, and discredit the evaluation field as a legitimate field of professional practice. (p. 325)

Although Stufflebeam’s view is a strong critique of empowerment evaluation and, by implication, other evaluative approaches that cede the central position that evaluation professionals have in conducting both formative and summative evaluations, the roles that evaluators and managers have often differ. The views put forward by advocates for empowerment evaluation (Fetterman & Wandersman, 2007) suggest assumptions about what motivates program managers that are similar to Le Grand’s (2010) suggestion that historically, public servants in Britain were assumed to be interested in “doing the right thing” in their work. In other words, managers would not be self-serving but instead would be motivated by a desire to serve the public. Le Grand (2010) called such public servants “knights.” His own view is that this assumption is naïve and needs to be tempered by considering the incentives that shape behaviors.

The nature of organizational politics and the interactions between organizations and their environments usually mean that managerial interests in preserving and enhancing programs are challenged by the role that evaluators play in judging the merit and worth of programs.

Expecting managers to evaluate their own programs can result in biased program evaluations. Indeed, a culture can be built up around the evaluation function such that evaluators are expected to be advocates for programs. Under such conditions, departments and agencies would use their evaluation capacity to defend their programs, structuring evaluations and presenting results so that programs are seen to be above criticism. In the language used in Chapter 10 to describe situations where performance measurement systems produced unintended results: Gaming the program evaluation function can occur.

Evaluations produced by organizations under such conditions will tend to be viewed outside the organization with skepticism. Funders, or analysts who are employed by the funders, will work hard to expose weaknesses in the methodologies used and cast doubt on the information in the evaluation reports. In effect, adversarial relationships can develop, which serve to “expose” weaknesses in evaluations, but are generally not conducive to building self-evaluating or learning organizations. As well, such controversies can undermine a sense that the organization is accountable.

The reality is that expecting program managers to evaluate their own programs, particularly where evaluation results are likely to be used in funding decisions, is likely to produce evaluations that reflect the natural incentives and risk aversion inherent in such situations. They are not necessarily credible even to the managers themselves. Program evaluation, as an organizational function, becomes distorted and contributes to a view that evaluations are biased.

Parenthetically, Nathan (2000), who has worked with several top American policy research centers, points out that internal evaluations are not the only ones that may reflect incentives that bias evaluation results:

Even when outside organizations conduct evaluations, the politics of policy research can be hard going. To stay in business, a research organization (public or private) has to generate a steady flow of income. This requires a delicate balance in order to have a critical mass of support for the work one wants to do and at the same time maintain a high level of scientific integrity. (p. 203)

Nevertheless, such incentives are likely to be more prevalent and stronger with internal evaluations.

Should managers participate in evaluations of their own programs? Generally, scholars and practitioners who have addressed this question have favored managerial involvement. Love (1991) envisions (internal) evaluators working closely with program managers to produce evaluations on issues that are of direct relevance to the managers. Patton (2008) stresses that among the fundamental premises of utilization-focused evaluation, the first is commitment to working with the intended users to ensure that the evaluation actually gets used.

Chelimsky (2008), in her description of the challenges to independence that are endemic in the work that the GAO does, makes a case for the importance of evaluations being objective:

The strongest defense for an evaluation that’s in political trouble is its technical credibility, which, for me, has three components. First, the evaluation must be technically competent, defensible, and transparent enough to be understood, at least for the most part. Second, it must be objective: That is, in Matthew Arnold’s terms (as cited in Evans, 2006), it needs to have “a reverence for the truth.” And third, it must not only be but also seem objective and competent: That is, the reverence for truth and the methodological quality need to be evident to the reader of the evaluation report. So, by technical credibility, I mean methodological competence and objectivity in the evaluation, and the perception by others that both of these characteristics are present. (p. 411)

Clearly, Chelimsky sees the value in claiming that high-stakes GAO evaluations are objective. “Objective” is also a desired attribute of the information produced in federal evaluations in Canada: “Evaluation … informs government decisions on resource allocation and reallocation by … providing objective information to help Ministers understand how new spending proposals fit” (Treasury Board of Canada Secretariat, 2009, sec. 3.2).

Evaluation is fundamentally about linking theory and practice. Notwithstanding the practitioner views cited above, that objectivity is desirable, academics in the field have not tended to emphasize “objectivity” as a criterion for good-quality evaluations (Conley-Tyler, 2005; Patton, 2008). Stufflebeam (1994), one exception, emphasizes the importance of what he calls “objectivist evaluation” (p. 326) in professional evaluation practice. His definition of objectivist evaluation picks up some of the themes articulated by Chelimsky (2008) above. For Stufflebeam (1994),

objectivist evaluations are based on the theory that moral good is objective and independent of personal or merely human feelings. They are firmly grounded in ethical principles, strictly control bias or prejudice in seeking determinations of merit and worth, … obtain and validate findings from multiple sources, set forth and justify conclusions about the evaluand’s merit and/or worth, report findings honestly and fairly to all right-to-know audiences, and subject the evaluation process and findings to independent assessments against the standards of the evaluation field. Fundamentally, objectivist evaluations are intended to lead to conclusions that are correct—not correct or incorrect relative to a person’s position, standing or point of view. (p. 326)

Scriven has also advocated for good evaluations to be objective. For Scriven (1997), objectivity is defined as “with basis and without bias” (p. 480), and an important part of being able to claim that an evaluation is objective is to maintain an appropriate distance between the evaluator and what is being evaluated (the evaluand). There is a crucial difference, for Scriven, between being an evaluator and being an evaluation consultant. The former relies on validity as one’s stock-in-trade, and objectivity is a central part of being able to claim that one’s work is valid. The latter work with their clients and stakeholders, but according to Scriven, in the end they cannot offer analysis, conclusions, or recommendations that are not tainted by interactions and the biases that they entail.

In addition to Scriven’s view that objectivity is a key part of evaluation practice, other related professions have asserted, and continue to assert, that professional practice is, or at least ought to be, objective. In the 2003 edition of the Government Auditing Standards (GAO, 2003), government auditors are enjoined to perform their work this way:

Professional judgment requires auditors to exercise professional skepticism, which is an attitude that includes a questioning mind and a critical assessment of evidence. Auditors use the knowledge, skills, and experience called for by their profession to diligently perform, in good faith and with integrity, the gathering of evidence and the objective evaluation of the sufficiency, competency, and the relevancy of evidence. (p. 51)

Should evaluators claim that their work is also objective? Objectivity has a certain cachet, and as a practitioner, it would be appealing to be able to assert to prospective clients that one’s work is objective. Indeed, in situations where evaluators are competing with auditors for clients, claiming objectivity could be an important factor in convincing clients to use the services of an evaluator.

Can Program Evaluators Be Objective?

If giving managers a (substantial) stake in evaluations compromises evaluator and evaluation objectivity, then it is important to unpack what is entailed by claims that evaluations or audits are objective. Is Scriven’s definition of objectivity defensible? Is objectivity a meaningful criterion for high-quality program evaluations? Could we defend a claim to a prospective client that our work would be objective?

Scriven (1997) suggests a metaphor to understand the work of an evaluator: When we do program evaluations, we can think of ourselves as expert witnesses. We are, in effect, called to “testify” about a program, we offer our expert opinions, and the “court” (our client) can decide what to do with our contributions.

Scriven (1997) takes the courtroom metaphor further when he asserts that in much the same way that witnesses are sworn to tell “the truth, the whole truth, and nothing but the truth” (p. 496), evaluators can rely on a common-sense notion of the truth as they do their work. If such an oath “works” in courts (Scriven believes it does), then despite the philosophical questions that can be raised by a claim that something is true, we can and should continue to rely on a common-sense notion of what is true and what is not.

Scriven’s main point is that program evaluators should be prepared to offer objective evaluations and that to do so, it is essential that we recognize the difference between conducting ourselves in ways that promote our objectivity and ways that do not. Even those who assert that there cannot be any truths in our work are, according to Scriven, uttering a self-contradictory assertion: They wish to claim the truth of a statement that there are no truths.

Although Scriven’s argument has a common-sense appeal, it is important to examine it more closely. There are essentially two main issues in the approach he takes.

First, Scriven’s metaphor of evaluators as expert witnesses does have some limitations. In courts of law, expert witnesses are routinely challenged by their counterparts and by opposing lawyers. Unlike Scriven’s evaluators, who do their work, offer their report, and then absent themselves to avoid possible compromises of their objectivity, expert witnesses in courts undergo a high level of scrutiny. Even where expert witnesses have offered their version of the truth, it is often not clear whether that is their view or the views of a party to a legal dispute. Expert witnesses can sometimes be “purchased.”

Second, witnesses speaking in court can be severely penalized if it is discovered that they have lied under oath. For program evaluators, it is far less likely that sanctions will be brought to bear even if it could be demonstrated that an evaluator did not speak “the truth.” Undoubtedly, an evaluator’s place in the profession can be affected when the word gets around that he or she has been “bought” by a client, but the reality is that in the practice of program evaluation, clients can and do shop for evaluators who are likely to “do the job right.” “Doing the job right” can mean that evaluators are paid to not speak “the truth, the whole truth, and nothing but the truth.”

Looking for a Defensible Definition of Objectivity

Are there other definitions of objectivity that are useful in terms of assisting our practice of program evaluation? The Federal Government of Canada’s OCG (Office of the Comptroller General) was among the government jurisdictions that historically advocated the importance of objectivity in evaluations. In one statement, objectivity was defined this way:

Objectivity is of paramount importance in evaluative work. Evaluations are often challenged by someone: a program manager, a client, senior management, a central agency or a minister. Objectivity means that the evidence and conclusions can be verified and confirmed by people other than the original authors. Simply stated, the conclusions must follow from the evidence. Evaluation information and data should be collected, analyzed and presented so that if others conducted the same evaluation and used the same basic assumptions, they would reach similar conclusions. (Treasury Board of Canada Secretariat, 1990, p. 28)

This definition of objectivity emphasizes the reliability of evaluation findings and conclusions, and is similar to the way auditors define high-quality work in their profession. This implies, at least in principle, that the work of one evaluator or one evaluation team could be repeated, with the same results, by a second evaluation of the same program.

A Natural Science Definition of Objectivity

The OCG criterion of repeatability is similar in part to the way scientists do their work. Findings and conclusions, to be accepted by the discipline, must be replicable.

There is, however, an important difference between program evaluation practice and the practice of scientific disciplines. In the sciences, the methodologies and procedures that are used to conduct research and report the results are intended to facilitate replication. Methods are scrutinized by one’s peers, and if the way the work has been conducted and reported passes this test, it is then “turned over” to the community of researchers, where it is subjected to independent efforts to replicate the results. In other words, meaningfully claiming objectivity requires both the use of replicable methodologies and actual replications of programs and policies. In practical terms, satisfying both of these criteria is rare.

If a particular set of findings cannot be replicated by independent researchers, the community of research peers eventually discards the results as an artifact of the setting or the scientist’s biases. Transparent methodologies are necessary but not sufficient to establish objectivity of scientific results. The initial reports of cold fusion reactions (Fleischmann & Pons, 1989), for example, prompted additional attempts to replicate the reported findings, to no avail. Fleischmann and Pons’s research methods proved to be faulty, and cold fusion did not pass the test of replicability.

A more contemporary controversy that also hinges on being able to replicate experimental results is the question of whether high-energy neutrinos can travel faster than the speed of light. If such a finding were corroborated (reproduced by independent teams of researchers), it would undermine a fundamental assumption of Einstein’s relativity theory: that no particle can travel faster than the speed of light. The back-and-forth “dialogue” in the high-energy physics community is illustrated by a publication that claims that one set of experimental results (apparently replicating the original experiment) was wrong and that Einstein’s theory is safe (Antonello et al., 2012). The dialogue between the experimentalists and the theoreticians in physics on whether neutrinos actually have been measured traveling faster than the speed of light has the potential to change physics as we know it. The stakes are high, and therefore, the canons of scientific research must be respected.

For scientists, then, objectivity has two important elements, both of which are necessary. Methods and procedures need to be constructed and applied so that the work done, as well as the findings, are open to scrutiny by one’s peers. Although the process of doing a given science-based research project does not by itself make the research objective, it is essential that this process be transparent. Scrutability of methods facilitates repeating the research. If findings can be replicated independently, the community of scholars engaged in similar work confers objectivity on the research. Even then, scientific findings are not treated as absolutes. Future tests might raise questions, offer refinements, and generally increase knowledge.

This working definition of objectivity does not imply that objectivity confers “truth” on scientific findings. Indeed, the idea that objectivity is about scrutability and replicability of methods and repeatability of findings is consistent with Kuhn’s (1962) notion of paradigms. Kuhn suggested that communities of scientists who share a “worldview” are able to conduct research and interpret the results. Within a paradigm, “normal science” is about solving puzzles that are implied by the theoretical structure that undergirds the paradigm. “Truth” is agreement, based on research evidence, among those who share a paradigm.

In program evaluation practice, much of what we call methodology is tailored to particular settings. Increasingly, we are taking advantage of mixed qualitative–quantitative methods (Creswell, 2009; Hearn, Lawler, & Dowswell, 2003) when we design and conduct evaluations, and our own judgment as professionals plays an important role in how evaluations are designed and data are gathered, interpreted, and reported. Owen and Rogers (1999) make this point when they state,

no evaluation is totally objective: it is subject to a series of linked decisions [made by the evaluator]. Evaluation can be thought of as a point of view rather than a statement of absolute truth about a program. Findings must be considered within the context of the decisions made by the evaluator in undertaking the translation of issues into data collection tools and the subsequent data analysis and interpretation. (p. 306)

Although the OCG criterion of repeatability (Treasury Board of Canada Secretariat, 1990) in principle might be desirable, it is rarely applicable to program evaluation practice. Even in the audit community, it is rare to repeat the fieldwork that underlies an audit report. Instead, the fieldwork is conducted so that all findings are documented and corroborated by more than one line of evidence (or one source of information). In effect, there is an audit trail for the evidence and the findings.

Implications for Evaluation Practice

Where does this leave us? Scriven’s (1997) criteria for objectivity (with basis and without bias) have some defensibility limitations inasmuch as they usually depend on the “objectivity” of individual evaluators in particular settings. Not even in the natural sciences, where the subject matter and methods are far more conducive to Scriven’s definition, do researchers rely on one scientist’s assertions about “facts” and “objectivity.” Instead, the scientific community demands that the methods and results be stated so that the research results can be corroborated or disconfirmed, and it is via that process that “objectivity” is conferred. Objectivity is not an attribute of one researcher but instead is predicated on the process in the scientific community in which that researcher practices.

In some professional settings where teams of evaluators work on projects, it may be possible to construct internal challenge functions and even share draft reports externally to increase the likelihood that the final product will be viewed as defensible and robust. But repeating an evaluation to confirm the replicability of the findings is almost never done.

The realities of the practice of program evaluation weaken claims that we evaluators can be objective in the work we do. Evaluation is not a science. Instead, it is a craft that mixes together methods with professional judgment to produce products that are methodologically defensible, tailored to contexts, and almost always have unique characteristics.

Many professional associations that represent the interests and views of program evaluators have developed codes of ethics or best practice guidelines. A review of several of these guideline documents indicates that, with one exception (American Educational Research Association [AERA], 2011), there is little specific attention to “objectivity” among the criteria suggested for good evaluations (AERA, 2011; American Evaluation Association, 2004; Australasian Evaluation Society, 2010; Yarbrough, Shulha, Hopson, & Caruthers, 2011).

Historically, Scriven (1997), Stufflebeam (1994), and, more recently, Chelimsky (2008) have emphasized objectivity as a key commodity of program evaluations, and there are government organizations that in their guidelines for assessing evaluation reports do discuss the issue of objectivity (see, e.g., Treasury Board of Canada Secretariat, 1990, 2009; U.S. OMB, 2004b). Markiewicz (2008) provides a provocative discussion about challenges of independence and objectivity in the political context of evaluation, noting,

the challenges presented by the political and stakeholder context of evaluation do raise the longstanding paradigm wars between scientific realists and social constructionists. The former group of evaluators tend to uphold concepts of objectivity and independence in evaluation, while the latter group of evaluators view themselves as negotiators of different realities. (p. 35)

There is one research and evaluation association that has explicitly included objectivity as a criterion for high-quality studies. The AERA (2008, p. 1) defines scientifically based research as “the use of rigorous, systematic, and objective methodologies to obtain valid and reliable knowledge.” The full set of criteria includes the following:

a. development of a logical, evidence-based chain of reasoning;

b. methods appropriate to the questions posed;

c. observational or experimental designs and instruments that provide reliable and generalizable findings;

d. data and analysis adequate to support the findings;

e. explication of procedures and results clearly and in detail, including specification of the population to which the findings can be generalized;

f. adherence to professional norms of peer review;

g. dissemination of the findings to contribute to scientific knowledge; and

h. access to data for reanalysis, replication, and the opportunity to build on findings.

Evaluating program effectiveness (assessing cause-and-effect relationships) requires “experimental designs using random assignment or quasi-experimental or other designs that substantially reduce plausible competing explanations for the obtained results” (AERA, 2008, p. 1).

The AERA has been part of the policy changes in the United States in the field of education evaluation that began with the No Child Left Behind Act of 2002 (Duffy, Giordano, Farrell, Paneque, & Crump, 2008). Duffy et al. (2008) point out that the phrase “scientifically-based research” appeared over 100 times in the legislation. The working definition of that phrase is very similar to the AERA definition above. Since the No Child Left Behind Act was passed, privileging quantitative, experimental, and quasi-experimental evaluation designs has had an impact on the whole evaluation community in the United States (Smith, 2007).

The key question for us is whether the AERA definition of “scientifically based research” offers a credible alternative to other standards or guidelines. The AERA definition highlights the objectivity of research methodologies and mentions replication as one possible outcome from a study. But when we look at the field of education evaluation (and evaluation more broadly), we see that the efficacy of randomized controlled trials is substantially limited by contextual variables.

Lykins (2009), in his assessment of the impacts of U.S. federal policy on education research, offers this example of the limits of “scientific research” in education:

Take for instance the much-studied Tennessee STAR experiment in class-size reduction. The results of the randomized trial suggested that class-size reduction caused modest gains in the test scores of children in early grades. Boruch, De Moya, and Synder (2002) cite this study as evidence that “a single RFT can help to clarify the effect of a particular intervention against a backdrop of many nonrandomized trials” (p. 74). In fact, the experiment taught, at most, only that class-size reductions were responsible for increased test-scores for these particular students. It did not lend warrant to the claim that class-size reductions are an effective way for raising achievement as such. This became clear when California implemented a state-wide policy of class-size reduction. The California program not only failed to increase student achievement, but may have been responsible for a substantial increase in the number of poorly qualified teachers in high-poverty schools, thus actually harming student performance. (pp. 94–95, italics in original)

The practical effect of privileging (experimental) methodologies that are aimed at examining cause-and-effect relationships is that program evaluations are limited in their generalizability. Cronbach (1982) pointed this out and effectively countered the then dominant view in evaluation that experimental designs, with their overriding emphasis on internal validity, were the gold standard.

For evaluation associations and for evaluators, there are other quality-related criteria that are more relevant. With the exception of the AERA, the evaluation profession as a whole has generally not been prepared to emphasize objectivity as a criterion for high-quality evaluations. Instead, professional evaluation organizations tend to mention the accuracy and credibility of evaluation information (American Evaluation Association, 2004; Canadian Evaluation Society, 2012; Organisation for Economic Cooperation and Development, 2010; Yarbrough et al., 2011), the honesty and integrity of evaluators and the evaluation process (American Evaluation Association, 2004; Australasian Evaluation Society, 2010; Canadian Evaluation Society, 2012; Yarbrough et al., 2011), the fairness of evaluation assessments (Australasian Evaluation Society, 2010; Canadian Evaluation Society, 2012; Yarbrough et al., 2011), and the validity and reliability of evaluation information (American Evaluation Association, 2004; Canadian Evaluation Society, 2012; Organisation for Economic Cooperation and Development, 2010; Yarbrough et al., 2011).

In addition, professional guidelines emphasize the importance of declaring and avoiding conflicts of interest (American Evaluation Association, 2004; Australasian Evaluation Society, 2010; Canadian Evaluation Society, 2012; Yarbrough et al., 2011) and the importance of impartiality in reporting findings and conclusions (Organisation for Economic Cooperation and Development, 2010). Evaluator independence is also mentioned as a criterion (Markiewicz, 2008). Also, guidelines tend to emphasize the importance of competence in conducting evaluations, and the importance of upgrading evaluation skills (American Evaluation Association, 2004; Australasian Evaluation Society, 2010; Canadian Evaluation Society, 2012). Collectively, these guidelines cover many of the characteristics of evaluators and evaluations that we might associate with objectivity: accuracy, credibility, validity, reliability, fairness, honesty, integrity, and competence. Transparency is also a criterion mentioned in some guidelines and standards (see, e.g., Organisation for Economic Cooperation and Development, 2010; Yarbrough et al., 2011). But—and this is a key point—objectivity is more than just having good evaluators or even good evaluations; it is a process that involves corroboration of one’s findings by one’s peers. Our profession is so diverse and includes so many different epistemological and methodological stances that asserting “objectivity” would not be supported by most evaluators.

But the evaluation profession does not exist alone in the current world of professionals who claim expertise in evaluating programs. The movement to connect evaluation to accountability expectations in public sector and nonprofit organizations has created situations where evaluation professionals, with their diverse backgrounds and standards, are compared with accounting professionals or with management consultants, who generally have a more uniform view of their respective professional standards. Because the public sector auditing community in particular has asserted the objectivity of its practice, it is arguable that auditors have a marketing advantage with prospective clients (see Everett, Green, & Neu, 2005; Radcliffe, 1998). Furthermore, with some key central agencies asserting that in assessing the quality of evaluations one of the key criteria should be objectivity of the findings (Treasury Board of Canada Secretariat, 2009), that criterion confers an advantage on practitioners who claim that their process and products are objective. Patton (2008) refers to the politics of objectivity, meaning that for some evaluators it is important to be able to declare that their work is objective.

What should evaluators tell prospective clients who, having heard that the auditing profession or management consultants (Institute of Management Consultants, 2008) make claims about their work being objective, expect the same from a program evaluation? If we tell clients that we cannot produce an objective evaluation, there may be a risk of their going elsewhere for assistance. On the other hand, claims that we can be objective are not supported, given the realities of evaluators’ work. Perhaps the best way to respond is to offer criteria that cover much of the same ground as is covered if one conducts evaluations with a view to their being “objective.” Criteria like accuracy, credibility, honesty, completeness, fairness, impartiality, avoiding conflicts of interest, competence in conducting evaluations, and a commitment to staying current in skills are all relevant. They would be among the desiderata that scientists and others who can make defensible claims about objectivity would include in their own practice. The criteria mentioned are also among the principal ones included by auditors and accountants in their own standards (GAO, 2003).

Patton (2008) takes a pragmatic stance in his own assessment of whether to claim that evaluations are objective:

Words such as fairness, neutrality, and impartiality carry less baggage than objectivity and subjectivity. To stay out of the argument about objectivity, I talk with intended users about balance, fairness, and being explicit about what perspectives, values, and priorities have shaped the evaluation, both the design and the findings. (p. 452)

To sum up, current guidelines and standards that have been developed by professional evaluation associations generally do not claim that program evaluations should be objective. Correspondingly, as practicing professionals, we should not be making such claims in our work. That does not mean that we are without standards, and indeed, we should be striving to be honest, accurate, fair, impartial, competent, highly skilled, and credible in the work we do. If we are these things, we can justifiably claim that our work meets the same professional standards as work done by scholars and practitioners who might claim to be objective.

Summary

The relationships between managers and evaluators are affected by the incentives that each party faces in particular contexts. If evaluators have been commissioned to conduct a summative evaluation, it is more likely that program managers will defend their programs, particularly where the stakes are perceived to be high. Expecting managers, under these conditions, to participate as neutral parties in an evaluation ignores the potential for conflicts of commitments, which can affect the accuracy and completeness of information that managers provide about their own programs. This problem parallels the problem that exists in performance measurement systems, where public, high-stakes, summative uses of performance results will tend to induce gaming of the system by those who are affected by the consequences of disseminating performance results.

Formative evaluations, where it is generally possible to project a “win-win” scenario for managers and evaluators, offer incentives for managers to be forthcoming so that they benefit from an assessment based on an accurate and complete understanding of their programs. Historically, a majority of evaluations have been formative. Although advocates for program evaluation and performance measurement imply that evaluations can be used for resource allocation/reallocation decisions, it is comparatively rare to have an evaluation that does that. There has been a gap between the promise and the performance of evaluation functions in governments in that regard (Muller-Clemm & Barnes, 1997).

Many evaluation approaches encourage or even mandate manager or organizational participation in evaluations. Where utilization of evaluation results is a central concern of evaluation processes, managerial involvement has been shown to increase uses of evaluation findings. Some evaluation approaches (empowerment evaluation is an example of an important and relatively new approach) suggest that control of the evaluation process should be devolved to those in the organizations and programs being evaluated. This view is contested in the evaluation field and continues to be deliberated by other evaluation scholars and practitioners.

Promoting quality standards for evaluations continues to be an important indicator of the professionalization of evaluation practice. Although objectivity has been a desired feature of “good” evaluations in the past, professional associations have generally opted not to emphasize objectivity among the criteria that define high-quality evaluations.

Evaluators, accountants, and management consultants will continue to be connected with efforts by government and nonprofit organizations to be more accountable. In some situations, evaluation professionals, accounting professionals, and management consultants will compete for work with clients. Because the accounting profession continues to assert that its work is objective, evaluators will have to address the issue of how to characterize their own practice, so that clients can be assured that the work of evaluators meets standards of rigor, defensibility, and ethical practice.

Discussion Questions

1. Why are summative evaluations more challenging to do than formative evaluations?

2. How should program managers be involved in evaluations of their own programs?

3. What is a learning organization, and how is the culture of a learning organization supportive of evaluation?

4. What are the advantages and disadvantages of relying on internal evaluators in public sector and nonprofit organizations?

5. What is an evaluative culture in an organization? What roles would evaluators play in building and sustaining such a culture?

6. What would it take for an evaluator to claim that her or his evaluation is objective? Given those requirements, is it possible for any evaluator to say that his or her evaluation is objective? Under what circumstances, if any?

7. Suppose that you are a practicing evaluator and you are discussing a possible contract to do an evaluation for an agency. The agency director is very interested in your proposal but, in the discussions, says that he wants an objective evaluation. If you are willing to tell him that your evaluation will be objective, you have the contract. How would you respond to this situation?

8. Other professions like medicine, law, accounting, and social work have guidelines for professional practice that can be enforced against individual practitioners, if need be. Evaluation has guidelines, but they are not enforceable. What would be the advantages and disadvantages of the evaluation profession having enforceable practice guidelines? Who would do the enforcing?

References

American Educational Research Association. (2008). Definition of scientifically based research. Retrieved from http://www.aera.net/Portals/38/docs/About_AERA/KeyPrograms/DefinitionofScientificallyBasedResearch.pdf

American Educational Research Association. (2011). Code of ethics: American Educational Research Association, approved by the AERA Council February 2011. Retrieved from http://www.aera.net/Portals/38/docs/About_AERA/CodeOfEthics(1).pdf

American Evaluation Association. (2004). Guiding principles for evaluators. Retrieved from http://www.eval.org/Publications/GuidingPrinciples.asp

Antonello, M., Aprili, P., Baibussinov, B., Baldo Ceolin, M., Benetti, P., Calligarich, E., … Zmuda, J. (2012). A search for the analogue to Cherenkov radiation by high energy neutrinos at superluminal speeds in ICARUS. Physics Letters B, 711(3–4), 270–275.

Australasian Evaluation Society. (2010). AES guidelines for the ethical conduct of evaluations. Retrieved from http://www.aes.asn.au/

Bevan, G., & Hamblin, R. (2009). Hitting and missing targets by ambulance services for emergency calls: Effects of different systems of performance measurement within the UK. Journal of the Royal Statistical Society: Series A (Statistics in Society), 172(1), 161–190.

Canadian Evaluation Society. (2012). Program evaluation standards. Retrieved from http://www.evaluationcanada.ca/site.cgi?s=6&ss=10&_lang=EN

Chelimsky, E. (2008). A clash of cultures: Improving the “fit” between evaluative independence and the political requirements of a democratic society. American Journal of Evaluation, 29(4), 400–415.

Conley-Tyler, M. (2005). A fundamental choice: Internal or external evaluation? Evaluation Journal of Australasia, 5(1&2), 3–11.

Cousins, J. B. (2005). Will the real empowerment evaluation please stand up? A critical friend perspective. In D. Fetterman & A. Wandersman (Eds.), Empowerment evaluation principles in practice (pp. 183–208). New York, NY: Guilford Press.

Cousins, J. B., & Whitmore, E. (1998). Framing participatory evaluation. New Directions for Evaluation, 80, 5–23.

Creswell, J. W. (2009). Research design: Qualitative, quantitative, and mixed methods approaches. Thousand Oaks, CA: Sage.

Cronbach, L. J. (1982). Designing evaluations of educational and social programs (1st ed.). San Francisco, CA: Jossey-Bass.

de Lancer Julnes, P., & Holzer, M. (2001). Promoting the utilization of performance measures in public organizations: An empirical study of factors affecting adoption and implementation. Public Administration Review, 61(6), 693–708.

Duffy, M., Giordano, V. A., Farrell, J. B., Paneque, O. M., & Crump, G. B. (2008). No Child Left Behind: Values and research issues in high-stakes assessments. Counseling and Values, 53(1), 53–66.

Everett, J., Green, D., & Neu, D. (2005). Independence, objectivity and the Canadian CA profession. Critical Perspectives on Accounting, 16(4), 415–440.

Fetterman, D. (2001). Foundations of empowerment evaluation. Thousand Oaks, CA: Sage.

Fetterman, D., Kaftarian, S. J., & Wandersman, A. (1996). Empowerment evaluation: Knowledge and tools for self-assessment and accountability. Thousand Oaks, CA: Sage.

Fetterman, D., & Wandersman, A. (2007). Empowerment evaluation: Yesterday, today, and tomorrow. American Journal of Evaluation, 28(2), 179–198.

Fleischmann, M., & Pons, S. (1989). Electrochemically induced nuclear fusion of deuterium. Journal of Electroanalytical Chemistry, 261(2A), 301–308.

Garvin, D. A. (1993). Building a learning organization. Harvard Business Review, 71(4), 78–90.

Gill, D. (Ed.). (2011). The iron cage recreated: The performance management of state organisations in New Zealand. Wellington, NZ: Institute of Policy Studies.

Government Accountability Office. (2003, August). Government auditing standards: 2003 revision (GAO-03–673G). Washington, DC: Author.

Hearn, J., Lawler, J., & Dowswell, G. (2003). Qualitative evaluations, combined methods and key challenges: General lessons from the qualitative evaluation of community intervention in stroke rehabilitation. Evaluation, 9(1), 30–54.

Hood, C. (1995). The “new public management” in the 1980s: Variations on a theme. Accounting, Organizations and Society, 20(2–3), 93–109.

Institute of Management Consultants. (2008). IMC code of ethics & member’s pledge. Retrieved from http://www.imc.org.au/Become-a-Member/Membership/IMC-CODE-OF-ETHICS-MEMBERS-PLEDGE.asp

King, J. A., Cousins, J. B., & Whitmore, E. (2007). Making sense of participatory evaluation: Framing participatory evaluation. New Directions for Evaluation, 114, 83–105.

Kuhn, T. S. (1962). The structure of scientific revolutions. Chicago, IL: University of Chicago Press.

Le Grand, J. (2010). Knights and knaves return: Public service motivation and the delivery of public services. International Public Management Journal, 13(1), 56–71.

Leviton, L. C. (2003). Evaluation use: Advances, challenges and applications. American Journal of Evaluation, 24(4), 525–535.

Lincoln, Y. S., & Guba, E. G. (1980). The distinction between merit and worth in evaluation. Educational Evaluation and Policy Analysis, 2(4), 61–71.

Love, A. J. (1991). Internal evaluation: Building organizations from within. Newbury Park, CA: Sage.

Lykins, C. (2009). Scientific research in education: An analysis of federal policy (Doctoral dissertation). Nashville, TN: Graduate School of Vanderbilt University. Retrieved from http://etd.library.vanderbilt.edu/available/etd-07242009-114615/unrestricted/lykins.pdf

Mark, M. M., & Henry, G. T. (2004). The mechanisms and outcomes of evaluation influence. Evaluation, 10(1), 35–57.

Markiewicz, A. (2008). The political context of evaluation: What does this mean for independence and objectivity? Evaluation Journal of Australasia, 8(2), 35–41.

Mayne, J. (2008). Building an evaluative culture for effective evaluation and results management. Retrieved from http://www.cgiar-ilac.org/files/publications/briefs/ILAC_Brief20_Evaluative_Culture.pdf

Mayne, J., & Rist, R. C. (2006). Studies are not enough: The necessary transformation of evaluation. Canadian Journal of Program Evaluation, 21(3), 93–120.

Morgan, G. (2006). Images of organization (Updated ed.). Thousand Oaks, CA: Sage.

Moynihan, D. P. (2008). The dynamics of performance management: Constructing information and reform. Washington, DC: Georgetown University Press.

Muller-Clemm, W. J., & Barnes, M. P. (1997). A historical perspective on federal program evaluation in Canada. Canadian Journal of Program Evaluation, 12(1), 47–70.

Nathan, R. P. (2000). Social science in government: The role of policy researchers (Updated ed.). Albany, NY: Rockefeller Institute Press.

New York City Department of Homeless Services. (2010). City council hearing general welfare committee “Oversight: DHS’s Homebase Study.” Retrieved from http://nycppf.org/html/dhs/downloads/pdf/abt_testimony_120910.pdf

Norris, N. (2005). The politics of evaluation and the methodological imagination. American Journal of Evaluation, 26(4), 584–586.

Office of the Comptroller General of Canada. (1981). Guide on the program evaluation function. Ottawa, Ontario, Canada: Treasury Board of Canada Secretariat.

Organisation for Economic Cooperation and Development. (2010). Evaluation in development agencies: Better aid. Paris, France: Author.

Owen, J. M., & Rogers, P. J. (1999). Program evaluation: Forms and approaches (International ed.). Thousand Oaks, CA: Sage.

Patton, M. Q. (1994). Developmental evaluation. Evaluation Practice, 15(3), 311–319.

Patton, M. Q. (1997). Utilization-focused evaluation: The new century text (3rd ed.). Thousand Oaks, CA: Sage.

Patton, M. Q. (2008). Utilization-focused evaluation (4th ed.). Thousand Oaks, CA: Sage.

Patton, M. Q. (2011). Developmental evaluation: Applying complexity to enhance innovation and use. New York, NY: Guilford Press.

Radcliffe, V. S. (1998). Efficiency audit: An assembly of rationalities and programmes. Accounting, Organizations and Society, 23(4), 377–410.

Rist, R. C., & Stame, N. (Eds.). (2006). From studies to streams: Managing evaluative systems (Vol. 12). New Brunswick, NJ: Transaction.

Scriven, M. (1997). Truth and objectivity in evaluation. In E. Chelimsky & W. R. Shadish (Eds.), Evaluation for the 21st century: A handbook (pp. 477–500). Thousand Oaks, CA: Sage.

Scriven, M. (2005). Review of the book: Empowerment evaluation principles in practice. American Journal of Evaluation, 26(3), 415–417.

Senge, P. M. (1990). The fifth discipline: The art and practice of the learning organization (1st ed.). New York, NY: Doubleday/Currency.

Smith, N. L. (2007). Empowerment evaluation as evaluation ideology. American Journal of Evaluation, 28(2), 169–178.

Smits, P., & Champagne, F. (2008). An assessment of the theoretical underpinnings of practical participatory evaluation. American Journal of Evaluation, 29(4), 427–442.

Stufflebeam, D. L. (1994). Empowerment evaluation, objectivist evaluation, and evaluation standards: Where the future of evaluation should not go and where it needs to go. Evaluation Practice, 15(3), 321–338.

Treasury Board of Canada Secretariat. (1990). Program evaluation methods: Measurement and attribution of program results (3rd ed.). Ottawa, Ontario, Canada: Deputy Comptroller General Branch, Government Review and Quality Services.

Treasury Board of Canada Secretariat. (2009). Policy on evaluation. Retrieved from http://www.tbs-sct.gc.ca/pol/doc-eng.aspx?id=15024

U.S. Office of Management and Budget. (2002). Program performance assessments for the FY 2004 budget: Memorandum for heads of executive departments and agencies from Mitchell E. Daniels Jr. Retrieved from http://www.whitehouse.gov/sites/default/files/omb/assets/omb/memoranda/m02-10.pdf

U.S. Office of Management and Budget. (2004). What constitutes strong evidence of a program’s effectiveness? Retrieved from http://www.whitehouse.gov/omb/part/2004_program_eval.pdf

Watson, K. F. (1986). Programs, experiments, and other evaluations: An interview with Donald Campbell. Canadian Journal of Program Evaluation, 1(1), 83–86.

Wildavsky, A. B. (1979). Speaking truth to power: The art and craft of policy analysis. Boston, MA: Little, Brown.

Wolff, E. (1979). Proposed approach to program evaluation in the Government of British Columbia. Victoria, British Columbia, Canada: Treasury Board.

Yarbrough, D., Shulha, L., Hopson, R., & Caruthers, F. (2011). Joint committee on standards for educational evaluation: A guide for evaluators and evaluation users (3rd ed.). Thousand Oaks, CA: Sage.
