The previous chapter presented an overview of the many considerations that go into tailoring an evaluation. Although all those matters are important to evaluation design, the essence of evaluation is generating credible answers to questions about the performance of a social program. Good evaluation questions must address issues that are meaningful in relation to the nature of the program and also of concern to key stakeholders. They must be answerable with the research techniques available to the evaluator and formulated so that the criteria by which the corresponding program performance will be judged are explicit or can be determined in a straightforward way.

A set of carefully crafted evaluation questions, therefore, is the hub around which evaluation revolves. It follows that a careful, explicit formulation of those questions greatly facilitates the design of the evaluation and the use of its findings. Evaluation questions may take various forms, some of which are more useful and meaningful than others for stakeholders and program decisionmakers. Furthermore, some forms of evaluation questions are more amenable to the evaluator’s task of providing credible answers, and some address critical program effectiveness issues more directly than others.

This chapter discusses practical ways in which evaluators can fashion effective evaluation questions. An essential step is identification of the decisionmakers who will use the evaluation results, what information they need, and how they expect to use it. The evaluator’s own analysis of the program is also important. One approach that is particularly useful for this purpose is articulation of the program theory, a detailed account of how and why the program is supposed to work. Consideration of program theory focuses attention on critical events and premises that may be appropriate topics of inquiry in the evaluation.

A critical phase in an evaluation is the identification and formulation of the questions the evaluation is to address. One might assume that this step would be very straightforward, indeed, that the questions would be stipulated routinely as part of the process of commissioning the evaluation. As described in Chapter 2, however, it is rare for final, workable evaluation questions to be specified clearly by the evaluation sponsor at the beginning of an evaluation. Nor can the evaluator usually step in and define those questions solely on the basis of his or her professional expertise. That maneuver would increase the risk that the evaluation would not be responsive to stakeholder concerns, would not be useful or used, and would be attacked as irrelevant or inappropriate.

To ensure that the evaluation will focus on the matters of greatest concern to the pertinent decisionmakers and stakeholders, the initial evaluation questions are best formulated through discourse and negotiation with them. Equally important, engaging key stakeholders increases the likelihood that they will understand, appreciate, and make effective use of the findings when they become available.

Although input from stakeholders is critical, the evaluator should not depend solely on their perspective to identify the issues the evaluation will address. Sometimes the evaluation sponsors are very knowledgeable about evaluation and will have formulated a complete and workable set of questions to which the evaluation should attend. More often, however, the evaluation sponsors and program stakeholders are not especially expert at evaluation or, if so, have not done all the groundwork needed to focus the evaluation. This means that the evaluator will rarely be presented at the outset with a finished list of issues the evaluation should address for the results to be useful, interpretable, and complete. Nor will the questions that are put forward generally be formulated in a manner that permits ready translation into research design.

The evaluator, therefore, also has a crucial role in the framing of evaluation questions. The stakeholders will be the experts on the practical and political issues facing the program, but the evaluator should know the most about how to analyze a program and focus an evaluation. The evaluator must be prepared to raise issues that otherwise might be overlooked, identify aspects of the program’s operations and outcomes that might warrant inquiry, and work with stakeholders to translate their concerns into questions that evaluation research can actually answer.

It is generally wise for the evaluator to develop a written summary of the specific questions that will guide the evaluation design. This provides a useful reference to consult while designing the evaluation and selecting research procedures. Perhaps more important, the evaluator can discuss this summary statement with the evaluation sponsor and key stakeholders to ensure that it encompasses their concerns. Such a procedure also can safeguard against later misunderstanding of what the evaluation was supposed to accomplish.

The remainder of this chapter examines the two most important topics related to specifying the questions that will guide an evaluation: (1) how to formulate evaluation questions in such a way that they can be addressed using the research procedures available to the evaluator, and (2) how to determine the specific questions on which the evaluation should focus.

3.1 What Makes a Good Evaluation Question?

The form that evaluation questions should take is shaped by the functions they must perform. Their principal role is to focus the evaluation on the areas of program performance at issue for key decisionmakers and stakeholders. They should also be able to facilitate the design of a data collection procedure that will provide meaningful information about that area of performance. In particular, a good evaluation question must identify a distinct dimension of relevant program performance and do so in such a way that the quality of the performance can be credibly assessed. Such assessment, in turn, requires an accurate description of the nature of the performance and some standard by which it can be evaluated (see Exhibit 3-A). Each of these aspects of good evaluation questions warrants further discussion.

EXHIBIT 3-A What It Means to Evaluate Something

There are different kinds of inquiry across practice areas, such as that which is found in law, medicine, and science. Common to each kind of inquiry is a general pattern of reasoning or basic logic that guides and informs the practice… . Evaluation is one kind of inquiry, and it, too, has a basic logic or general pattern of reasoning [that has been put forth by Michael Scriven]… . This general logic of evaluation is as follows:

1. Establishing criteria of merit. On what dimensions must the evaluand [thing being evaluated] do well?

2. Constructing standards. How well should the evaluand perform?

3. Measuring performance and comparing with standards. How well did the evaluand perform?

4. Synthesizing and integrating data into a judgment of merit or worth. What is the merit or worth of the evaluand?

… To evaluate anything means to assess the merit or worth of something against criteria and standards. The basic logic explicated by Scriven reflects what it means when we use the term to evaluate.

SOURCE: Quoted from Deborah M. Fournier, Establishing Evaluative Conclusions: A Distinction Between General and Working Logic, New Directions for Evaluation, no. 68 (San Francisco: Jossey-Bass, 1995), p. 16.

Dimensions of Program Performance

Good evaluation questions must first of all be reasonable and appropriate. That is, they must identify performance dimensions that are relevant to the expectations stakeholders hold for the program and that represent domains in which the program can realistically hope to have accomplishments. It would hardly be fair or sensible, for instance, to ask if a low-income housing weatherization program reduced the prevalence of drug dealing in a neighborhood. Nor would it generally be useful to ask a question as narrow as whether the program got a bargain in its purchase of file cabinets for its office. Furthermore, evaluation questions must be answerable; that is, they must involve performance dimensions that are sufficiently specific, concrete, practical, and measurable that meaningful information can be obtained about their status. An evaluator would have great difficulty determining whether an adult literacy program improved a community’s competitiveness in the global economy or whether the counselors in a drug prevention program were sufficiently caring in their relations with clients.

Evaluation Questions Must Be Reasonable and Appropriate

Program advocates often proclaim grandiose goals (e.g., improve the quality of life for children), expect unrealistically large effects, or believe the program to have accomplishments that are clearly beyond its actual capabilities. Good evaluation questions deal with performance dimensions that are appropriate and realistic for the program. This means that the evaluator must often work with relevant stakeholders to scale down and focus the evaluation questions. The manager of a community health program, for instance, might initially ask, “Are our education and outreach services successful in informing the public about the risk of AIDS?” In practice, however, those services may consist of little more than occasional presentations by program staff at civic club meetings and health fairs. With this rather modest level of activity, it may be unrealistic to expect the public at large to receive much AIDS information. If a question about this service is deemed important for the evaluation, a better version might be something such as “Do our education and outreach services raise awareness of AIDS issues among the audiences addressed?” and “Do those audiences represent community leaders who are likely to influence the level of awareness of AIDS issues among other people?”

There are two complementary ways for an evaluator, in collaboration with pertinent stakeholders, to assess how appropriate and realistic a candidate evaluation question is. The first is to examine the question in the context of the actual program activities related to it. In the example above, for instance, the low-key nature of the education and outreach services was clearly not up to the task of “informing the public about the risk of AIDS,” and there would be little point in having the evaluation attempt to determine if this was the actual outcome. The evaluator and relevant stakeholders should identify and scrutinize the program components, activities, and personnel assignments that relate to program performance and formulate the evaluation question in a way that is reasonable given those characteristics.

The second way to assess whether candidate evaluation questions are reasonable and appropriate is to analyze them in relationship to the findings reported in applicable social science and social service literature. For instance, the sponsor of an evaluation of a program for juvenile delinquents might initially ask if the program increases the self-esteem of the delinquents, in the belief that inadequate self-esteem is a problem for these juveniles and improvements will lead to better behavior. Examination of the applicable social science research, however, will reveal that juvenile delinquents do not generally have problems with self-esteem and, moreover, that increases in self-esteem are not generally associated with reductions in delinquency. In light of this information, the evaluator and the evaluation sponsor may well agree that the question of the program’s impact on self-esteem is not appropriate.

The foundation for formulating appropriate and realistic evaluation questions is detailed and complete program description. Early in the process, the evaluator should become thoroughly acquainted with the program—how it is structured, what activities take place, the roles and tasks of the various personnel, the nature of the participants, and the assumptions inherent in its principal functions. The stakeholder groups with whom the evaluator collaborates (especially program managers and staff) will also have knowledge about the program, of course. Evaluation questions that are inspired by close consideration of actual program activities and assumptions will almost automatically be appropriate and realistic.

Evaluation Questions Must Be Answerable

It is obvious that the evaluation questions around which an evaluation plan is developed should be answerable. Questions that cannot be answered may be intriguing to philosophers but do not serve the needs of evaluators and the decisionmakers who intend to use the evaluation results. What is not so obvious, perhaps, is how easy it is to formulate an unanswerable evaluation question without realizing it. This may occur because the terms used in the question, although seemingly commonsensical, are actually ambiguous or vague when the time comes for a concrete interpretation (“Does this program enhance family values?”). Or sensible-sounding questions may invoke issues for which there are so few observable indicators that little can be learned about them (“Are the case managers sensitive to the social circumstances of their clients?”). Also, some questions lack sufficient indication of the relevant criteria to permit a meaningful answer (“Is this program successful?”). Finally, some questions may be answerable only with more expertise, data, time, or resources than are available to the evaluation (“Do the prenatal services this program provides to high-risk women increase the chances that their children will complete college?”).

For an evaluation question to be answerable, it must be possible to identify some evidence or “observables” that can realistically be obtained and that will be credible as the basis for an answer. This generally means developing questions that involve measurable performance dimensions stated in terms that have unambiguous and noncontroversial definitions. In addition, the relevant standards or criteria must be specified with equal clarity. Suppose, for instance, that a proposed evaluation question for a compensatory education program like Head Start is, “Are we reaching the children most in need of this program?” To affirm that this is an answerable question, the evaluator should be able to do the following:

1. Define the group of children at issue (e.g., those in census tract such and such, four or five years old, living in households with annual income under 150% of the federal poverty level).

2. Identify the specific measurable characteristics and cutoff values that represent the greatest need (e.g., annual income below the federal poverty level, single parent in the household with educational attainment of less than high school).

3. Give an example of the evaluation finding that might result (e.g., 60% of the children currently served fall in the high-need category; 75% of the high-need children in the catchment area, that is, the geographic area being served by the program, are not enrolled in the program).

4. Stipulate the evaluative criteria (e.g., to be satisfactory, at least 90% of the children in the program should be high need and at least 50% of the high-need children in the catchment area should be in the program).

5. Have the evaluation sponsors and other pertinent stakeholders (who should be involved in the whole process) agree that a finding meeting these criteria would, indeed, answer the question.

If such conditions can be met and, in addition, the resources are available to collect, analyze, and report the applicable data, then the evaluation question can be considered answerable.
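The five steps above can be sketched as a small calculation. The following is a minimal illustration, not a prescribed procedure: the counts are hypothetical and were chosen only so that they reproduce the example finding in step 3 (60% of children served are high need; 75% of high-need children in the catchment area are not enrolled), and the class and function names are invented for this sketch. The thresholds are the example criteria from step 4.

```python
from dataclasses import dataclass


@dataclass
class TargetingFinding:
    """Hypothetical counts for illustration; not data from any real program."""
    children_served: int          # all children currently enrolled
    high_need_served: int         # enrolled children who meet the high-need definition
    high_need_in_catchment: int   # all high-need children living in the catchment area

    @property
    def share_of_served_high_need(self) -> float:
        # Proportion of the program's clientele that is high need.
        return self.high_need_served / self.children_served

    @property
    def coverage_of_high_need(self) -> float:
        # Proportion of the catchment area's high-need children who are enrolled.
        return self.high_need_served / self.high_need_in_catchment


def meets_criteria(finding: TargetingFinding,
                   min_share: float = 0.90,
                   min_coverage: float = 0.50) -> dict:
    """Compare the finding with the example criteria stipulated in step 4."""
    return {
        "share_of_served_high_need": finding.share_of_served_high_need >= min_share,
        "coverage_of_high_need": finding.coverage_of_high_need >= min_coverage,
    }


# Assumed counts: 120 of 200 served are high need; 480 high-need children in the area.
finding = TargetingFinding(children_served=200,
                           high_need_served=120,
                           high_need_in_catchment=480)
result = meets_criteria(finding)

print(f"High-need share of children served: {finding.share_of_served_high_need:.0%}")
print(f"Coverage of high-need children:     {finding.coverage_of_high_need:.0%}")
print(f"Criteria met: {result}")
```

With these assumed counts, neither criterion is met, so the answer to “Are we reaching the children most in need of this program?” would be negative under the stipulated standards. The point of the sketch is that the question becomes answerable only once both the measured quantities and the thresholds are stated explicitly.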

Criteria for Program Performance

Beginning a study with a reasonable, answerable question or set of questions, of course, is customary in the social sciences (where the questions often are framed as hypotheses). What distinguishes evaluation questions is that they have to do with performance and are associated, at least implicitly, with some criteria by which that performance can be judged. Identifying the relevant criteria was mentioned above as part of what makes an evaluation question answerable. However, this is such an important and distinctive aspect of evaluation questions that it warrants separate discussion.

When program managers or evaluation sponsors ask such things as “Are we targeting the right client population?” or “Do our services benefit the recipients?” they are not only asking for a description of the program’s performance with regard to serving appropriate clients and providing services that yield benefits. They are also asking if that performance is good enough according to some standard or judgment. There is likely little doubt that at least a few of the “right client population” receive services or that some recipients receive some benefit from services. But is it enough? Some criterion level must be set by which the numbers and amounts can be evaluated on those performance dimensions.

One implication of this distinctive feature of evaluation is that good evaluation questions will, when possible, convey the performance criterion or standard that is applicable as well as the performance dimension that is at issue. Thus, evaluation questions should be much like this: “Is at least 75% of the program clientele appropriate for services?” (by some explicit definition of appropriate) or “Do the majority of those who receive the employment services get jobs within 30 days of the conclusion of training that they keep at least three months?” In addition, the performance standards represented in these questions should have some defensible, though possibly indirect, relationship to the social needs the program addresses. There must be some reason why attaining that standard is meaningful, and the strongest rationale is that it represents a level of performance sufficient for that program function to contribute effectively to the overall program purpose of improving the target social conditions.

A considerable complication for the evaluator is that the applicable performance criteria may take different forms for various dimensions of program performance (see Exhibit 3-B). Moreover, it is not always possible to establish an explicit, consensual performance standard in advance of collecting data and reporting results. Nonetheless, to the extent that the formulation of the initial evaluation questions includes explicit criteria on which key stakeholders agree, evaluation planning is made easier and the potential for disagreement over the interpretation of the evaluation results is reduced.

We stress that the criterion issue cannot be avoided. An evaluation that only describes program performance, and does not attempt to assess it, is not truly an evaluation (by definition, as indicated in Exhibit 3-A). At most, such an evaluation only pushes the issue of setting criteria and judging performance onto the consumer of the information.

With these considerations in mind, we turn attention to the kinds of performance criteria that are relevant to the formulation of useful evaluation questions. Perhaps the most common criteria are those based on program goals and objectives. In this case, certain desirable accomplishments are identified as the program aims by program officials and sponsors. Often these statements of goals and objectives are not very specific with regard to the nature or level of program performance they represent. (As will be noted later in this chapter, evaluators should distinguish between general program goals and specific, measurable objectives.) One of the goals of a shelter for battered women, for instance, might be to “empower them to take control of their own lives.” Although reflecting commendable values, this statement gives no indication of the tangible manifestations of such empowerment or what level of empowerment constitutes attainment of this goal. Considerable discussion with stakeholders may be necessary to translate such statements into mutually acceptable terminology that describes the intended outcomes concretely, identifies the observable indicators of those outcomes, and specifies the level of accomplishment that would be considered a success in accomplishing the stated goal.

EXHIBIT 3-B Many Criteria May Be Relevant to Program Performance

The standards by which program performance may be judged in an evaluation include the following:

· The needs or wants of the target population

· Stated program goals and objectives

· Professional standards

· Customary practice; norms for other programs

· Legal requirements

· Ethical or moral values; social justice, equity

· Past performance; historical data

· Targets set by program managers

· Expert opinion

· Preintervention baseline levels for the target population

· Conditions expected in the absence of the program (counterfactual)

· Cost or relative cost

Some program objectives, on the other hand, may be very specific. These often come in the form of administrative objectives adopted as targets for routine program functions. The target levels may be set according to past experience, the experience of comparable programs, a judgment of what is reasonable and desirable, or maybe only a “best guess.” Examples of administrative objectives may be to complete intake actions for 90% of the referrals within 30 days, to have 75% of the clients complete the full term of service, to have 85% “good” or “outstanding” ratings on a client satisfaction questionnaire, to provide at least three appropriate services to each person under case management, and the like. There is typically a certain amount of arbitrariness in these criterion levels. But, if they are administratively stipulated or can be established through stakeholder consensus, and if they are reasonable, they are quite serviceable in the formulation of evaluation questions and the interpretation of the subsequent findings. However, it is not generally wise for the evaluator to press for specific statements of target performance levels if the program does not have them or cannot readily and confidently develop them. Setting such targets with little justification only creates a situation in which they are arbitrarily revised when the evaluation results are in.

In some instances, there are established professional standards that can be invoked as performance criteria. This is particularly likely in medical and health programs, where practice guidelines and managed care standards have developed that may be relevant for setting desirable performance levels. Much more common, however, is the situation where there are no established criteria or even arbitrary administrative objectives to invoke. A typical situation is one in which the performance dimension is clearly recognized but there is ambiguity about the criteria for good performance on that dimension. For instance, stakeholders may agree that the program should have a low drop-out rate, a high proportion of clients completing service, a high level of client satisfaction, and the like, but have only nebulous ideas as to what level constitutes “low” or “high.” Sometimes the evaluator can make use of prior experience or find information in the evaluation and program literature that provides a reasonable basis for setting a criterion level. Another approach is to collect judgment ratings from relevant stakeholders to establish the criterion levels or, perhaps, criterion ranges, that can be accepted to distinguish, say, high, medium, and low performance.

Establishing a criterion level can be particularly difficult when the performance dimension in an evaluation question involves outcome or impact issues. Program stakeholders and evaluators alike may have little idea about how much change on a given outcome variable (e.g., a scale of attitude toward drug use) is large and how much is small. By default, these judgments are often made on the basis of statistical criteria; for example, a program is judged to be effective solely because the measured effects are statistically significant. This is a poor practice for reasons that will be more fully examined when impact evaluation is discussed later in this volume. Statistical criteria have no intrinsic relationship to the practical significance of a change on an outcome dimension and can be misleading. A juvenile delinquency program that is found to have the statistically significant effect of lowering delinquency recidivism by 2% may not make a large enough difference to be worthwhile continuing. Thus, as much as possible, the evaluator should use the techniques suggested above to attempt to determine and specify in practical terms what “success” level is appropriate for judging the nature and magnitude of the program’s effects.
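The gap between statistical and practical significance can be made concrete with a short calculation. The sketch below is illustrative only and rests on several assumptions that are not in the text: the 2% reduction is read as two percentage points (40% versus 38% recidivism), each group has 20,000 juveniles, a standard two-proportion z-test is used, and stakeholders are imagined to have agreed that a reduction of at least five percentage points is needed for the program to be worth continuing.

```python
import math


def two_proportion_z_test(events_a: int, n_a: int, events_b: int, n_b: int):
    """Return (difference, z, two-sided p-value) for two independent proportions."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (events_a + events_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, z, p_value


# Hypothetical data: 8,000 of 20,000 comparison youths reoffend (40%),
# versus 7,600 of 20,000 program youths (38%).
reduction, z, p_value = two_proportion_z_test(8000, 20000, 7600, 20000)

ALPHA = 0.05                    # conventional statistical criterion
MIN_USEFUL_REDUCTION = 0.05     # assumed stakeholder criterion (five points)

statistically_significant = p_value < ALPHA
practically_meaningful = reduction >= MIN_USEFUL_REDUCTION

print(f"Reduction in recidivism: {reduction:.1%} (z = {z:.2f}, p = {p_value:.5f})")
print(f"Statistically significant: {statistically_significant}")
print(f"Meets practical criterion: {practically_meaningful}")
```

With samples this large, the two-point difference clears the statistical criterion easily yet falls short of the assumed practical one. Because the statistical verdict depends heavily on sample size, it cannot substitute for a substantive judgment about how large an effect is worthwhile.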

Typical Evaluation Questions

As is evident from the discussion so far, well-formulated evaluation questions are very concrete and specific to the program at issue and the circumstances of the prospective evaluation. It follows that the variety of questions that might be relevant to some social program or another is enormous. As noted in Chapter 2, however, evaluation questions typically deal with one of five general program issues. Some of the more common questions in each category, stated in summary form, are as follows.

Questions about the need for program services (needs assessment):

· What are the nature and magnitude of the problem to be addressed?

· What are the characteristics of the population in need?

· What are the needs of the population?

· What services are needed?

· How much service is needed, over what time period?

· What service delivery arrangements are needed to provide those services to the population?

Questions about the program’s conceptualization or design (assessment of program theory):

· What clientele should be served?

· What services should be provided?

· What are the best delivery systems for the services?

· How can the program identify, recruit, and sustain the intended clientele?

· How should the program be organized?

· What resources are necessary and appropriate for the program?

Questions about program operations and service delivery (assessment of program process):

· Are administrative and service objectives being met?

· Are the intended services being delivered to the intended persons?

· Are there needy but unserved persons the program is not reaching?

· Once in service, do sufficient numbers of clients complete service?

· Are the clients satisfied with the services?

· Are administrative, organizational, and personnel functions handled well?

Questions about program outcomes (impact assessment):

· Are the outcome goals and objectives being achieved?

· Do the services have beneficial effects on the recipients?

· Do the services have adverse side effects on the recipients?

· Are some recipients affected more by the services than others?

· Is the problem or situation the services are intended to address made better?

Questions about program cost and efficiency (efficiency assessment):

· Are resources used efficiently?

· Is the cost reasonable in relation to the magnitude of the benefits?

· Would alternative approaches yield equivalent benefits at less cost?

There is an important relationship among these categories of questions that the evaluator must recognize. Each successive type of question draws much of its meaning from the answers to the questions in the prior set. Questions about a program’s conceptualization and design, for instance, depend very much on the nature of the need the program is intended to address. Different needs are best met by different types of programs. If the need a program targets is lack of economic resources, the appropriate program concepts and the evaluation questions that might be asked about them will be different than if the program is intended to reduce drunken driving. Moreover, the most appropriate criteria for judging the program design relate to how well it fits with the need and the circumstances of those in need.

Similarly, a central set of questions about program operations and service delivery has to do with how well the program design is actually implemented in practice. Correspondingly, the principal basis for assessing program operations and service delivery is in terms of fidelity to the intended program design. Key evaluation questions, therefore, relate to whether the program as intended has actually been implemented. This means that the criteria for assessing implementation are drawn, at least in part, from assumptions about how the program is intended to function, that is, from its basic conceptualization and design. If we know the basic concept of a program is to feed the homeless through a soup kitchen, we readily recognize a problem with program operations if no food is actually distributed to homeless individuals.

Questions about program outcome, in turn, are meaningful only if the program design is well implemented. Program services that are not actually delivered, or are not the intended services, cannot generally be expected to produce the desired outcomes. Evaluators call it implementation failure when the outcomes are poor because the program activities assumed necessary to bring about the desired improvements did not actually occur. It would be an implementation failure if there were no improvements in the nutritional status of the homeless because the soup kitchen was so rarely open that few homeless individuals were actually able to eat there.

A program may be well implemented and yet fail to achieve the desired results because the program concepts embodied in the corresponding services are faulty. When the program conceptualization and design are not capable of generating the desired outcomes no matter how well implemented, evaluators refer to theory failure. In the sequence of evaluation questions shown above, therefore, those relating to outcomes are meaningful only when program operations and services are well implemented. Good implementation, in turn, is meaningful only if the program design that identifies the intended operations and services represents an appropriate response to the social problem and needs the program is attempting to alleviate. If the plan for the soup kitchen locates it a great distance from where homeless individuals congregate, it will provide little benefit to them no matter how well it is implemented. In this case, a key aspect of the basic design of the program (location) does not connect well with the needs of the target population.

The last set of evaluation questions in the list above, those relating to program cost and efficiency, also draws much of its significance from the issues that precede it on the list. A program must produce at least minimal outcomes before questions about the cost of attaining those outcomes or more efficient ways of attaining them are interesting. We might, for instance, assess the costs of a soup kitchen for feeding the homeless. However, if there are no benefits because no homeless individuals are actually fed, then there is little to say except that any cost is too much. The lack of positive outcomes, whether resulting from poor implementation or faulty program design, makes the cost issues relatively trivial.

The Evaluation Hierarchy

The relationships among the different types of evaluation questions just described define a hierarchy of evaluation issues that has implications beyond simply organizing categories of questions. As mentioned in the overview of kinds of evaluation studies in Chapter 2, the types of evaluation questions and the methods for answering them are sufficiently distinct that each constitutes a form of evaluation in its own right. These groupings of evaluation questions and methods constitute the building blocks of evaluation research and, individually or in combination, are recognizable in virtually all evaluation studies. As shown in Exhibit 3-C, we can think of these evaluation building blocks in the form of a hierarchy in which each rests on those below it.

EXHIBIT 3-C The Evaluation Hierarchy

The foundational level of the evaluation hierarchy relates to the need for the program. Assessment of the nature of the social problem and the need for intervention produces the diagnostic information that supports effective program design, that is, a program theory for how to address the social conditions the program is intended to improve. Given a credible program theory, the next level of evaluation is to assess how well it is implemented. This is the task of process or implementation evaluation.

If we know that the social need is properly understood, the program theory for addressing it is reasonable, and the corresponding program activities and services are well implemented, then it may be meaningful to assess program outcomes. Undertaking an impact evaluation to assess outcomes thus necessarily presupposes acceptable results from assessments of the issues below it on the evaluation hierarchy. If assessments have not actually been conducted on the logically prior issues when an impact evaluation is done, its results are interpretable only to the extent that justifiable assumptions can be made about those issues.

At the top of the hierarchy we have assessment of program cost and efficiency. Pursuing questions about these matters is a relatively high-order evaluation task that assumes knowledge about all the supporting issues below it in the hierarchy. This is because answers about cost and efficiency issues are generally interpretable only when there is also information available about the nature of the program outcomes, implementation, theory, and the social problem addressed.

It follows that a major consideration in identifying the evaluation questions to be answered in a program evaluation is what is already known or can be confidently assumed about the program. It would not be sensible for the evaluation to focus on the assessment of outcomes, for instance, if there was uncertainty about how well the program was conceptualized and implemented in relation to the nature of the social conditions it was intended to improve. If there are relevant questions about these more fundamental matters, they must be built into the evaluation plan along with the outcome questions.

When developing the questions around which the plan for an evaluation will revolve, therefore, it is best for the evaluator to start at the bottom of the evaluation hierarchy and consider first what is known and needs to be known about the most fundamental issues. When the assumptions that can be safely made are identified and the questions that must be answered are defined, then it is appropriate to move to the next level of the hierarchy. There the evaluator can determine if the questions at that level will be meaningful in light of what will be known about the more fundamental issues.

By keeping in mind the logical interdependencies between the levels in the evaluation hierarchy and the corresponding evaluation building blocks, the evaluator can focus the evaluation on the questions most appropriate to the program situation. At the same time, many mistakes of premature attention to higher-order evaluation questions can be avoided.
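The bottom-up procedure just described can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the level names paraphrase the hierarchy as described in the text, and the function name levels_to_evaluate and the idea of a "settled" set (levels already known or safely assumed) are hypothetical conveniences, not part of any established evaluation tool.

```python
# Levels of the evaluation hierarchy, listed from bottom to top.
# The wording paraphrases the text; it is not quoted from Exhibit 3-C.
HIERARCHY = [
    "need for the program",
    "program design and theory",
    "program process and implementation",
    "program outcomes",
    "program cost and efficiency",
]


def levels_to_evaluate(target, settled):
    """Return the levels an evaluation plan must cover to reach `target`.

    `target` is the highest level the stakeholders want answered.
    `settled` is the set of levels about which enough is already known
    or can be safely assumed (a hypothetical simplification).
    Every unsettled level below the target is added to the plan,
    working from the bottom of the hierarchy upward.
    """
    if target not in HIERARCHY:
        raise ValueError(f"unknown level: {target}")
    top = HIERARCHY.index(target)
    plan = [level for level in HIERARCHY[:top] if level not in settled]
    plan.append(target)
    return plan


# Hypothetical case: need and design are well understood, but nothing
# is known yet about how well the program is implemented.
plan = levels_to_evaluate(
    "program outcomes",
    settled={"need for the program", "program design and theory"},
)
print(plan)
```

In this hypothetical case the plan contains implementation questions as well as outcome questions, which mirrors the point that unsettled questions about more fundamental matters must be built into the evaluation plan along with the outcome questions.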

3.2 Determining the Specific Questions the Evaluation Should Answer

The families of evaluation questions relating to the need, design, implementation, outcomes, and cost of a program are not mutually exclusive. Questions in more than one category could be relevant to a program for which an evaluation was being planned. To develop an appropriate evaluation plan, the many possible questions that might be asked about the program must be narrowed down to those specific ones that are most relevant to the program’s circumstances. (See Exhibit 3-D for an example of specific evaluation questions for an actual program.)

As we emphasized in Chapter 2, the concerns of the evaluation sponsor and other major stakeholders should play a central role in shaping the questions that structure the evaluation plan. In the discussion that follows, therefore, we first examine the matter of obtaining appropriate input from the evaluation sponsor and relevant stakeholders prior to and during the design stage of the evaluation.

However, it is rarely appropriate for the evaluator to rely only on input from the evaluation sponsor and stakeholders to determine the questions on which the evaluation should focus. Because of their close familiarity with the program, stakeholders may overlook critical, but relatively routine, aspects of program performance. Also, the experience and knowledge of the evaluator may yield distinctive insights into program issues and their interrelations that are important for identifying relevant evaluation questions. Generally, therefore, it is desirable for the evaluator to make an independent analysis of the areas of program performance that may be pertinent for investigation. Accordingly, the second topic addressed in the discussion that follows is how the evaluator can analyze a program in a way that will uncover potentially important evaluation questions for consideration in designing the evaluation. The evaluation hierarchy, described above, is one tool that can be used for this purpose. Especially useful is the concept of program theory. By depicting the significant assumptions and expectations on which the program depends for its success, the program theory can highlight critical issues that the evaluation should look into.

Representing the Concerns of the Evaluation Sponsor and Major Stakeholders

In planning and conducting an evaluation, evaluators usually find themselves confronted with multiple stakeholders who hold different and sometimes conflicting views on the program or its evaluation and whose interests will be affected by the outcome (see Exhibit 3-E for an illustration). Recall from Chapter 2 that the stakeholders typically encountered include policymakers and decisionmakers, program and evaluation sponsors, target participants, program managers and staff, program competitors, contextual stakeholders, and the evaluation and research community. At the planning stage, the evaluator usually attempts to identify all the stakeholders with an important point of view on what questions should be addressed in the evaluation, set priorities among those viewpoints, and integrate as many of the relevant concerns as possible into the evaluation plan.

EXHIBIT 3-D Evaluation Questions for a Neighborhood Afterschool Program

An afterschool program located in an economically depressed area uses the facilities of a local elementary school to provide free afterschool care from 3:30 to 6:00 for the children of the neighborhood. The program’s goals are to provide a safe, supervised environment for latchkey children and to enhance their school performance through academic enrichment activities. The following are examples of the questions that an evaluation might be designed to answer for the stakeholders in this program, together with the standards that make the questions answerable:

Is there a need for the program?

Question:	How many latchkey children reside within a radius of 1.5 miles of the school? Latchkey children are defined as those of elementary school age who are without adult supervision during some period after school at least once a week during the school year.
Standard:	There should be at least 100 such children in the defined neighborhood. The planned enrollment for the program is 60, which should yield enough children in attendance on any given day for efficient staffing, and it is assumed that some eligible children will not enroll for various reasons.
Question:	What proportion of the children enrolled in the program are actually latchkey children?
Standard:	At least 75% of the enrolled children should meet the definition for latchkey children. This is an administrative target that reflects the program’s intent that a large majority of the enrollees be latchkey children while recognizing that other children will be attracted to, and appropriate for, the program even though not meeting that definition.
Is the program well designed?
Question:	Are the planned educational activities the best ones for this clientele and the purposes of enhancing their performance in school?
Standard:	There should be indications in the educational research literature to show that these activities have the potential to be effective. In addition, experienced teachers for the relevant grade levels should endorse these activities.
Question:	Is there a sufficient number of staff positions in the program?
Standard:	The staff-student ratio should exceed the state standards for licensed child care facilities.
Is the program implemented effectively?
Question:	What is the attendance rate for enrolled children?
Standard:	All enrolled children should either be in attendance every afternoon for which they are scheduled or excused with parental permission.
Question:	Is the program providing regular support for school homework and related tasks?
Standard:	There should be an average of 45 minutes of supervised study time for completion of homework and reading each afternoon, and all the attending children should participate.
Does the program have the intended outcomes?
Question:	Is there improvement in the attitudes of the enrolled children toward school?
Standard:	At least 80% of the children should show measurable improvement in their attitudes toward school between the beginning and end of the school year. Norms for similar students show that their attitudes tend to get worse each year of elementary school; the program objective is to reverse this trend, even if the improvement is only slight.
Question:	Is there an improvement in the academic performance of the enrolled children in their regular school work?
Standard:	The average term grades on academic subjects should be at least a half letter grade better than they would have been had the children not participated in the program.
Is the program cost-effective?
Question:	What is the cost per child for running this program beyond the fixed expenses associated with the regular operation of the school facility?
Standard:	Costs per child should be near or below the average for similar programs run in other school districts in the state.
Question:	Would the program be equally effective and less costly if staffed by community volunteers (except the director) rather than paid paraprofessionals?
Standard:	The annual cost of a volunteer-based program, including recruiting, training, and supporting the volunteers, would have to be at least 20% less than the cost of the current program with no loss of effectiveness to justify the effort associated with making such a change.
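Because each standard in this exhibit is stated as an explicit threshold, checking a finding against it is simple arithmetic. The Python sketch below illustrates this for three of the standards (at least 100 latchkey children in the neighborhood, at least 75% of enrollees meeting the latchkey definition, and a volunteer-based program costing at least 20% less). The function name check_standards and all input figures are hypothetical and invented for illustration; they are not data from the program described.

```python
def check_standards(latchkey_in_area, enrolled, enrolled_latchkey,
                    current_cost, volunteer_cost):
    """Compare hypothetical findings with three Exhibit 3-D standards.

    Integer arithmetic is used for the percentage tests so that a value
    exactly on the threshold is not lost to floating-point rounding.
    """
    return {
        # Need: at least 100 latchkey children within the defined area.
        "need": latchkey_in_area >= 100,
        # Targeting: at least 75% of enrolled children are latchkey children.
        "targeting": 100 * enrolled_latchkey >= 75 * enrolled,
        # Cost: volunteer-based program costs at least 20% less than the
        # current one. The exhibit also requires no loss of effectiveness,
        # which this arithmetic check does not address.
        "volunteer_cost": 100 * volunteer_cost <= 80 * current_cost,
    }


# Invented figures, for illustration only.
results = check_standards(
    latchkey_in_area=130,
    enrolled=60,
    enrolled_latchkey=48,    # 48 of 60 is 80%
    current_cost=50_000,
    volunteer_cost=42_000,   # 16% less than the current cost
)
for name, met in results.items():
    print(f"{name}: {'met' if met else 'not met'}")
```

With these invented figures the need and targeting standards are met, while the volunteer option falls short of the 20% savings the standard requires.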

EXHIBIT 3-E Diverse Stakeholder Perspectives on an Evaluation of a Multiagency Program for the Homeless

The Joint Program was initiated to improve the accessibility of health and social services for the homeless population of Montreal through coordinated activities involving provincial, regional, and municipal authorities and more than 20 nonprofit and public agencies. The services developed through the program included walk-in and referral services, mobile drop-in centers, an outreach team in a community health center, medical and nursing care in shelters, and case management. To ensure stakeholder participation in the evaluation, an evaluation steering committee was set up with representatives of the different types of agencies involved in the program and which, in turn, coordinated with two other stakeholder committees charged with program responsibilities.

Even though all the stakeholders shared a common cause to which they were firmly committed—the welfare of the homeless—they had quite varied perspectives on the evaluation. Some of these were described by the evaluators as follows:

The most glaring imbalance was in the various agencies’ different organizational cultures, which led them to experience their participation in the evaluation very differently. Some of the service agencies involved in the Joint Program and its evaluation were front-line public organizations that were accustomed to viewing their actions in terms of a mandate with a target clientele. They were familiar with the evaluation process, both as an administrative procedure and a measurement of accountability. Among the nonprofit agencies, however, some relative newcomers who had been innovators in the area of community-based intervention were hoping the evaluation would recognize the strengths of their approach and make useful suggestions for improvement. Other nonprofit groups were offshoots of religious or charitable organizations that had been involved with the homeless for a very long time. For those groups the evaluation (and the logical, planning-based program itself) was a procedure completely outside of anything in their experience. They perceived the evaluators as outsiders meddling in a reality that they had managed to deal with up until now, under very difficult conditions. Their primary concern was the client. More than the public agencies, they probably saw the evaluation as a waste of time, money, and energy. Most of the day centers involved in the program fell into this category. They were the ones who were asked to take part in a process with which they were unfamiliar, alongside their counterparts in the public sector who were much better versed in research procedures. (p. 471)

SOURCE: From Céline Mercier, “Participation in Stakeholder-Based Evaluation: A Case Study,” Evaluation and Program Planning, 1997, 20(4):467-475.

The starting point, of course, is the evaluation sponsors. Those who have commissioned and funded the evaluation rightfully have priority in defining the issues it should address. Sometimes evaluation sponsors have stipulated the evaluation questions and methods completely and want the evaluator only to manage the practical details. In such circumstances, the evaluator should assess which, if any, stakeholder perspectives are excluded by the sponsor’s questions and whether those perspectives are sufficiently distinct and important that omitting them would compromise the evaluation. If so, the evaluator must decide whether to conduct the evaluation under the specified constraints, reporting the limitations and biases along with the results, or attempt to negotiate an arrangement whereby the evaluation is broadened to include additional perspectives.

More often, however, the evaluation sponsors’ initial specifications are not so constrained or nonnegotiable that they rule out consideration of the concerns of other stakeholders. In this situation, the evaluator typically makes the best attempt possible within the constraints of the situation to consult fully with all stakeholders, set reasonable priorities, and develop an evaluation plan that will enhance the information available about the respective concerns of all parties.

Given the usual multiplicity of program stakeholders and their perspectives, and despite an evaluator’s efforts to be inclusive, there is considerable inherent potential for misunderstandings to develop between the evaluator and one or more of the stakeholders about what issues the evaluation should address. It is especially important, therefore, that there be full and frank communication between the evaluator and the stakeholder groups from the earliest possible point in the planning process. Along with obtaining critical input from the stakeholders about the program and the evaluation, this exchange should emphasize realistic, shared understanding of what the evaluation will and will not do, and why. Most essentially, the evaluator should strive to ensure that the key stakeholders understand, and find acceptable, the nature of the evaluation process, the type of information the evaluation will produce, what it might mean if the results come out one way or another, and what ambiguities or unanswered questions may remain.

Obtaining Input From Stakeholders

The major stakeholders, by definition, have a significant interest in the program and the evaluation. It is thus generally straightforward to identify them and obtain their views about the issues and questions to which the evaluation should attend. The evaluation sponsor, program administrators (who may also be the evaluation sponsor), and intended program beneficiaries are virtually always major stakeholders. Identification of other important stakeholders can usually be accomplished by analyzing the network of relationships surrounding a program. The most revealing relationships involve the flow of money to or from the program, political influence on and by the program, those whose actions affect or are affected by the program, and the set of direct interactions between the program and its various boards, patrons, collaborators, competitors, clients, and the like.

A snowball approach may be helpful in identifying the various stakeholder groups and persons involved in relationships with the program. As each such representative is identified and contacted, the evaluator asks for nominations of other persons or groups with a significant interest in the program. Those representatives, in turn, are asked the same question. When this process no longer produces important new nominations, the evaluator can be reasonably assured that all major stakeholders have been identified.
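The snowball approach amounts to a breadth-first search over nominations that stops when no new names appear. The Python sketch below illustrates it under simplifying assumptions: the function name snowball, the nominate callback, and the stakeholder names are hypothetical, and every nomination is treated as important, whereas in practice the evaluator judges which nominations matter.

```python
from collections import deque


def snowball(seeds, nominate):
    """Identify stakeholders by following nominations until none are new.

    `seeds` are the stakeholders known at the outset.
    `nominate(person)` returns the persons or groups that `person`
    names as having a significant interest in the program.
    The result lists stakeholders in the order they were discovered.
    """
    identified = []
    seen = set()
    queue = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            identified.append(seed)
            queue.append(seed)
    while queue:
        person = queue.popleft()
        for nominee in nominate(person):
            if nominee not in seen:      # only new names extend the search
                seen.add(nominee)
                identified.append(nominee)
                queue.append(nominee)
    return identified


# Hypothetical nominations, invented for illustration only.
NOMINATIONS = {
    "evaluation sponsor": ["program director", "state funding agency"],
    "program director": ["program staff", "client advisory group",
                         "evaluation sponsor"],
    "state funding agency": ["legislative committee"],
    "program staff": ["client advisory group"],
}

stakeholders = snowball(["evaluation sponsor"],
                        lambda person: NOMINATIONS.get(person, []))
print(stakeholders)
```

The loop ends once every identified stakeholder has been asked and no one names a person or group not already on the list, which corresponds to the stopping rule described above.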

If the evaluation is structured as a collaborative or participatory endeavor with certain stakeholders directly involved in designing and conducting the evaluation (as described in Chapter 2), the participating stakeholders will, of course, have a firsthand role in shaping the evaluation questions. Similarly, an internal evaluator who is part of the organization that administers the program will likely receive forthright counsel from program personnel. Even when such stakeholder involvement is built into the evaluation, however, this arrangement is usually not sufficient to represent the full range of pertinent stakeholder perspectives. There may be important stakeholder groups that are not involved in the participatory structure but have distinct and significant perspectives on the program and the evaluation. Moreover, there may be a range of viewpoints among the members of groups that are represented in the evaluation process so that a broader sampling of opinion is needed than that brought by the designated participant on the evaluation team.

Generally, therefore, formulating responsive evaluation questions requires discussion with members of stakeholder groups who are not directly represented on the evaluation team. Fewer such contacts may be needed by evaluation teams that represent many stakeholders and more by those on which few or no stakeholders are represented. In cases where the evaluation has not initially been organized as a collaborative endeavor with stakeholders, the evaluator may wish to consider configuring such an arrangement to ensure that key stakeholders are engaged and their views fully represented in the design and implementation of the evaluation. Participatory arrangements might be made through stakeholder advisory boards, steering committees, or simply regular consultations between the evaluator and key stakeholder representatives. More information about the procedures and benefits of such approaches can be found in Fetterman, Kaftarian, and Wandersman (1996), Greene (1988), Mark and Shotland (1985), and Patton (1997).

Outside of organized arrangements, evaluators generally obtain stakeholder views about the important evaluation issues through interviews. Because early contacts with stakeholders are primarily for orientation and reconnaissance, interviews at this stage typically are unstructured or, perhaps, semistructured around a small set of themes of interest to the evaluator. Input from individuals representing stakeholder groups might also be obtained through focus groups (Krueger, 1988). Focus groups have the advantages of efficiency in getting information from a number of people and the facilitative effect of group interaction in stimulating ideas and observations. They also may have some disadvantages for this purpose, notably the potential for conflict in politically volatile situations and the lack of confidentiality in group settings. In some cases, stakeholders may speak more frankly about the program and the evaluation in one-on-one conversations with the evaluator.

The evaluator will rarely be able to obtain input from every member of every stakeholder group, nor will that ordinarily be necessary to identify the major issues and questions with which the evaluation should be concerned. A modest number of carefully selected stakeholder informants who are representative of significant groups or distinctly positioned in relation to the program is usually sufficient to identify the principal issues. When the evaluator no longer hears new themes in discussions with diverse stakeholders, the most significant prevailing issues have probably all been discovered.

Topics for Discussion With Stakeholders

The issues identified by the evaluation sponsor when the evaluation is requested usually need further discussion with the sponsor and other stakeholders to clarify what these issues mean to the various parties and what sort of information would usefully bear on them. The topics that should be addressed in these discussions will depend in large part on the particulars of the evaluation situation. Here we will review some of the general topics that are often relevant.

Why is an evaluation needed? It is usually worthwhile for the evaluator to probe the reasons an evaluation is desired. The evaluation may be motivated by an external requirement, in which case it is important to know the nature of that requirement and what use is to be made of the results. The evaluation may be desired by program managers to determine whether the program is effective, to find ways to improve it, or to “prove” its value to potential funders, donors, critics, or the like. Sometimes the evaluation is politically motivated, for example, as a stalling tactic to avoid some unpleasant decision. Whatever the reasons, they provide an important starting point for determining what questions will be most important for the evaluation to answer and for whom.

What are the program goals and objectives? Inevitably, whether a program achieves certain of the goals and objectives ascribed to it will be pivotal questions for the evaluation to answer. The distinction between goals and objectives is critical. A program goal relates to the overall mission of the program and typically is stated in broad and rather abstract terms. For example, a program for the homeless may have as its goal “the reduction of homelessness” in its urban catchment area. Although easily understood, such a goal is too vague to determine whether it has been met. Is a “reduction of homelessness” 5%, 10%, or 100%? Does “homelessness” mean only those living on the streets or does it include those in shelters or temporary housing? For evaluation purposes, broad goals must be translated into concrete statements that specify the condition to be dealt with together with one or more measurable criteria of success. Evaluators generally refer to specific statements of measurable attainments as program objectives. Related sets of objectives identify the particular accomplishments presumed necessary to attain the program goals. Exhibit 3-F presents helpful suggestions for specifying objectives.

An important task for the evaluator, therefore, is to collaborate with relevant stakeholders to identify the program goals and transform overly broad, ambiguous, or idealized representations of them into clear, explicit, concrete statements of objectives. The more closely the objectives describe situations that can be directly and reliably observed, the more likely it is that a meaningful evaluation will result. Furthermore, it is essential that the evaluator and stakeholders achieve a workable agreement on which program objectives are most central to the evaluation and the criteria to be used in assessing whether those objectives have been met. For instance, if one stated objective of a job training program is to maintain a low drop-out rate, the key stakeholders should agree to its importance before it is accepted as one of the focal issues around which the evaluation will be designed.

If consensus about the important objectives is not attained, one solution is to include all those put forward by the various stakeholders and, perhaps, additional objectives drawn from current viewpoints and theories in the relevant substantive field (Chen, 1990). For example, the sponsors of a job training program may be interested solely in the frequency and duration of postprogram employment. But the evaluator may propose that stability of living arrangements, competence in handling finances, and efforts to obtain additional education be considered as program outcomes because these lifestyle features also may undergo positive change with increased employment and job-related skills.

What are the most important questions for the evaluation to answer? We echo Patton’s (1997) view that priority should be given to evaluation questions that will yield information most likely to be used. Evaluation results are rarely intended by evaluators or evaluation sponsors to be “knowledge for knowledge’s sake.” Rather, they are intended to be useful, and to be used, by those with responsibility for making decisions about the program, whether at the day-to-day management level or at broader funding or policy levels (see Exhibit 3-G for an evaluation manager’s view of this process).

EXHIBIT 3-F Specifying Objectives

Four techniques are particularly helpful for writing useful objectives: (1) using strong verbs, (2) stating only one purpose or aim, (3) specifying a single end-product or result, and (4) specifying the expected time for achievement (Kirschner Associates, 1975).

A “strong” verb is an action-oriented verb that describes an observable or measurable behavior that will occur. For example, “to increase the use of health education materials” is an action-oriented statement involving behavior which can be observed. In contrast, “to promote greater use of health education materials” is a weaker and less specific statement.

A second suggestion for writing a clear objective is to state only a single aim or purpose. Most programs will have multiple objectives, but within each objective only a single purpose should be delineated. An objective that states two or more purposes or desired outcomes may require different implementation and assessment strategies, making achievement of the objective difficult to determine. For example, the statement “to begin three prenatal classes for pregnant women and provide outreach transportation services to accommodate twenty-five women per class” creates difficulties. This objective contains two aims—to provide prenatal classes and to provide outreach services. If one aim is accomplished but not the other, to what extent has the objective been met?

Specifying a single end-product or result is a third technique contributing to a useful objective. For example, the statement “to begin three prenatal classes for pregnant women by subcontracting with City Memorial Hospital” contains two results, namely, the three classes and the subcontract. It is better to state these objectives separately, particularly since one is a higher-order objective (to begin three prenatal classes) which depends partly on fulfillment of a lower order objective (to establish a subcontract).

A clearly written objective must have both a single aim and a single end-product or result. For example, the statement “to establish communication with the Health Systems Agency” indicates the aim but not the desired end-product or result. What constitutes evidence of communication: telephone calls, meetings, reports? Failure to specify a clear end-product makes it difficult for assessment to take place.

Those involved in writing and evaluating objectives need to keep two questions in mind. First, would anyone reading the objective find the same purpose as the one intended? Second, what visible, measurable, or tangible results are present as evidence that the objective has been met? Purpose or aim describes what will be done; end-product or result describes evidence that will exist when it has been done. This is assurance that you “know one when you see one.”

Finally, it is useful to specify the time of expected achievement of the objective. The statement “to establish a walk-in clinic as soon as possible” is not a useful objective because of the vagueness of “as soon as possible.” It is far more useful to specify a target date, or in cases where some uncertainty exists about some specific date, a range of target dates—for example, “sometime between March 1 and March 30.”

SOURCE: Adapted, with permission, from Stephen M. Shortell and William C. Richardson, Health Program Evaluation (St. Louis, MO: C. V. Mosby, 1978), pp. 26-27.

Unfortunately, the experience of evaluators is replete with instances of evaluation findings that were virtually ignored by those to whom they were reported. There are numerous reasons why this may happen, many of which are not within the control of the evaluator. Program circumstances may change between the initiation of the evaluation and its completion in ways that make the evaluation results irrelevant when they are delivered. But lack of utilization may also occur because the evaluation does not actually provide information useful for the decisions that must be made. This can happen rather innocently as well as through ineptness. An evaluation plan may look like it will produce relevant information but, when that information is generated, it is not as useful as the recipients expected. It may also happen that those to whom the evaluation results are directed are not initially clear in their own minds about what information they need for what purposes.

With these considerations in mind, we advocate that in developing evaluation questions evaluators use backward mapping. This technique starts with a specification of the desired endpoint and then works backward to determine what must be done to get there (Elmore, 1980). Taking this approach, the essential discussion with the evaluation sponsor and other key stakeholders must establish who will use the evaluation results and for what purposes. Note that the question is not who is interested in the evaluation findings but who will use them. The evaluator wants to understand, in an explicit, detailed fashion, who specifically will use the evaluation and what specifically they will use it for. For instance, the administrator and board of directors of the program may intend to use the evaluation results to set administrative priorities for the next fiscal year. Or the legislative committee that oversees a program area may desire the evaluation as input to its deliberations about continued funding for the program. Or the program monitors in the government agency that has initiated the program may want to know if it represents a successful model that should be disseminated to other sites.

EXHIBIT 3-G Lessons Learned About the Utilization of Evaluation

An evaluation manager for a social services organization summarized his observations about the use of evaluation findings by program decision-makers as follows:

1. The utilization of evaluation or research does not take care of itself. Evaluation reports are inanimate objects, and it takes human interest and personal action to use and implement evaluation findings and recommendations. The implications of evaluation must be transferred from the written page to the agenda of program managers.

2. Utilization of evaluation, through which program lessons are identified, usually demands changed behaviors or policies. This requires the shifting of priorities and the development of new action plans for the operational manager.

3. Utilization of evaluation research involves political activity. It is based on a recognition and focus on who in the organization has what authority to make x, y, or z happen. To change programs or organizations as a result of some evaluation requires support from the highest levels of management.

4. Ongoing systems to engender evaluation use are necessary to legitimate and formalize the organizational learning process. Otherwise, utilization can become a personalized issue and evaluation advocates just another self-serving group vying for power and control.

SOURCE: Quoted from Anthony Dibella, “The Research Manager’s Role in Encouraging Evaluation Use,” Evaluation Practice, 1990, 11(2):119.

In each case, the evaluator should work with the respective evaluation users to describe the range of potential decisions or actions that they might consider taking and the form and nature of information that they would find pertinent in their deliberation. To press this exercise to the greatest level of specificity, the evaluator might even generate dummy information of the sort that the evaluation will produce (e.g., “20% of the clients who complete the program relapse within 30 days”) and discuss with the prospective users what this would mean to them and how they would use such information.
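To make the dummy-information exercise concrete, the following is a minimal sketch in Python of how an evaluator might mock up such a figure for discussion. Everything in it is hypothetical: the records are invented, and the names `ClientRecord` and `relapse_rate` are our own illustrations rather than part of any evaluation toolkit. The point is only to show the form of the statistic that prospective users would be asked to react to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientRecord:
    """Hypothetical record for one client who completed the program."""
    client_id: int
    days_to_relapse: Optional[int]  # None means no relapse was observed


def relapse_rate(records, window_days=30):
    """Return the proportion of completers who relapsed within window_days."""
    if not records:
        raise ValueError("no records supplied")
    relapsed = sum(
        1
        for r in records
        if r.days_to_relapse is not None and r.days_to_relapse <= window_days
    )
    return relapsed / len(records)


# Invented data for discussion only: 2 of 10 completers relapse within 30 days.
dummy_days = [12, None, None, 45, None, 28, None, None, 90, None]
dummy_records = [ClientRecord(i + 1, d) for i, d in enumerate(dummy_days)]

rate = relapse_rate(dummy_records)
print(f"{rate:.0%} of the clients who complete the program relapse within 30 days")
```

Changing `window_days` or the invented records lets the evaluator show users several plausible versions of the same finding and ask which, if any, would alter their decisions.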

A careful specification of the intended use of the evaluation results and the nature of the information expected to be useful leads directly to the formulation of questions the evaluation must attempt to answer (e.g., “What proportion of the clients who complete the program relapse during the first month?”) and helps to set priorities by identifying the questions that are most important. At this juncture, consideration must also be given to matters of timing. It may be that some questions must be answered before others can be asked, or users may need answers to some questions before others because of their own timetable for decision making. The important questions can then be organized into related groups, combined and integrated as appropriate, sequenced in appropriate time lines, and worked into final form in consultation with the designated users. With this in hand, developing the evaluation plan is largely a matter of working backward to determine what measures, observations, procedures, and the like must be undertaken to provide answers to the important questions in the form that the users require by the time they are needed.
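The sequencing step described above can be thought of as a dependency-ordering problem. The sketch below is one way to express it, assuming Python 3.9 or later and its standard `graphlib` module. The three questions and the prerequisite links among them are hypothetical examples of our own, and `sequence_questions` is an illustrative helper, not an established procedure.

```python
from graphlib import TopologicalSorter

# Hypothetical evaluation questions, keyed by a short label.
questions = {
    "enroll": "How many eligible clients enroll in the program?",
    "complete": "What proportion of enrollees complete the program?",
    "relapse": "What proportion of the clients who complete the program "
               "relapse during the first month?",
}

# Assumed dependencies: each label maps to the questions that must be
# answered before it can sensibly be asked.
prerequisites = {
    "enroll": set(),
    "complete": {"enroll"},
    "relapse": {"complete"},
}


def sequence_questions(prereqs):
    """Return question labels in an order that respects the prerequisites."""
    return list(TopologicalSorter(prereqs).static_order())


for label in sequence_questions(prerequisites):
    print(questions[label])
```

An ordering of this kind covers only logical precedence. The users' own decision timetables would still have to be laid over it in consultation with them.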

Analysis of Program Assumptions and Theory

As we remarked at the outset of this discussion, in addition to consulting with stakeholders, the evaluator should normally perform an independent analysis of the program in designing appropriate and relevant evaluation questions. Most evaluation questions are variations on the theme of “Is what’s supposed to be happening actually happening?” Familiar examples include “Are the intended target participants being reached?” “Are the services adequately delivered?” and “Are the goals being met?” Consequently, a very useful analysis for purposes of identifying important evaluation questions is to delineate in some detail just what it is that is supposed to be happening. The evaluator can construct a conceptual model of how the program is expected to work and the connections presumed between its various activities and functions and the social benefits it is intended to produce. This representation can then be used to identify those aspects of the program most essential to effective performance. These, in turn, raise evaluation-related questions about whether the key assumptions and expectations are reasonable and appropriate and, if so, whether the program is enacting them in an effective manner.

What we are describing here is an explication of the program theory, the set of assumptions about the relationships between the strategy and tactics the program has adopted and the social benefits it is expected to produce. Theory has a rather grandiose sound, and few program directors would claim that they were working from any distinct theory. Among the dictionary definitions of theory, however, we find “a particular conception or view of something to be done or of the method of doing it.” It is generally this sense of the word that evaluators mean when they refer to program theory. It might alternatively be called the program conceptualization or, perhaps, the program plan, blueprint, or design.

Evaluators have long recognized the importance of program theory as a basis for formulating and prioritizing evaluation questions, designing evaluation research, and interpreting evaluation findings (Bickman, 1987; Chen and Rossi, 1980; Weiss, 1972; Wholey, 1979). However, it is described and used under various names, for example, logic model, program model, outcome line, cause map, and action theory. Moreover, there is no general consensus about how best to depict or represent program theory, and many different versions can be found in the evaluation literature, although all show common elements. Because program theory is itself a potential object for evaluation, we will defer our full discussion of how it can be represented and evaluated to Chapter 5, which is solely devoted to this topic.

What we wish to emphasize here is the utility of program theory as a tool for analyzing a program for the purpose of formulating evaluation questions. One common depiction of program theory, for instance, is in the form of a logic model, which lays out the expected sequence of steps going from program services to client outcomes. Exhibit 3-H shows a logic model for a teen mother parenting program. For that program, the service agency and local high school are expected to recruit pregnant teens who will attend parenting classes. Those classes are expected to teach certain topics and the teen mothers, in response, are expected to become more knowledgeable about prenatal nutrition and how to care for their babies. That knowledge, in turn, is expected to lead to appropriate dietary and child care behavior and, ultimately, to healthy babies.

Simply laying out the logic of the program in this form makes it relatively easy to identify questions the evaluation might appropriately address. For example: How many eligible pregnant teens are in the catchment area of the program, and how deficient is their knowledge of nutrition and child care? What proportion actually participates in the program? Do the classes actually cover all the intended topics and is the nature of the instruction suited to the teen audience? What do the teens actually learn from the classes and, more important, what changes do they make in their behavior as a result? And finally, are their babies healthier at 12 months than they would have been without the program experience?

Each of these questions can be refined further into the more specific terms the evaluator needs in order to use them as a basis for evaluation design. The expectations for program performance in each of the corresponding domains can also be further explored to develop more specific criteria for assessing that performance. If the logic model or some other form of theory depiction has been developed in collaboration with key stakeholders and represents a consensus view, the resulting evaluation questions are quite likely to be viewed as relevant and potentially important. What makes this technique especially useful to the evaluator, however, is that it will often bring out significant program issues and questions that would not otherwise have been recognized by either the stakeholders or the evaluator. It has the additional advantage of permitting a systematic review of all aspects of the program to help the evaluator ensure that no critical issues have been overlooked.
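The use of a logic model as a checklist can be illustrated with a short Python sketch. The steps and questions below paraphrase the teen mother parenting program example discussed above. The representation itself (the `Step` class and the `steps_without_questions` helper) is our own assumption about one convenient way to record a logic model, not a standard format.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One link in the logic model and the evaluation questions it raises."""
    expectation: str
    questions: list = field(default_factory=list)


# Steps run in order from program services to client outcomes.
logic_model = [
    Step("Agency and high school recruit pregnant teens to parenting classes",
         ["How many eligible pregnant teens are in the catchment area?",
          "What proportion actually participates in the program?"]),
    Step("Classes teach the intended topics",
         ["Do the classes cover all the intended topics?",
          "Is the instruction suited to the teen audience?"]),
    Step("Teens become knowledgeable about prenatal nutrition and baby care",
         ["What do the teens actually learn from the classes?"]),
    Step("Teens adopt appropriate dietary and child care behavior",
         ["What changes do they make in their behavior as a result?"]),
    Step("Babies are healthy",
         ["Are the babies healthier at 12 months than they would have been "
          "without the program?"]),
]


def steps_without_questions(model):
    """List the steps that no candidate evaluation question yet addresses."""
    return [step.expectation for step in model if not step.questions]


print(steps_without_questions(logic_model))
```

For the model as written, the helper returns an empty list because every step has at least one question attached. A step that appeared in its output would flag a part of the program that the candidate questions had overlooked.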

EXHIBIT 3-H Logic Model for a Teen Mother Parenting Program

SOURCE: Adapted from United Way of America Task Force on Impact, Measuring Program Outcomes: A Practical Approach. Alexandria, VA: Author, 1996, p. 42. Used by permission, United Way of America.

Collating Evaluation Questions and Setting Priorities

The evaluator who thoroughly explores stakeholder concerns and conducts an analysis of program issues guided by carefully developed descriptions of program theory will turn up many questions that the evaluation might address. The task at this point becomes one of organizing those questions according to distinct themes and setting priorities among them.

Organization of evaluation questions is generally rather straightforward. Evaluation questions tend to cluster around different program functions (e.g., recruitment, services, outcomes) and, as already discussed, around different evaluation issues (need, design, implementation, impact, efficiency). In addition, evaluation questions tend to show a natural structure in which very specific questions (e.g., “Are elderly homebound persons in the public housing project aware of the program?”) are nested under broader questions (“Are we reaching our target population?”).
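The clustering and nesting just described can be sketched as a small data structure in Python. The two example questions come from the text. The third question, the function and issue labels assigned to each, and the names `Question`, `cluster`, and `children_of` are hypothetical choices made only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Question:
    text: str
    function: str  # e.g., recruitment, services, outcomes
    issue: str     # need, design, implementation, impact, or efficiency
    parent: Optional[str] = None  # text of the broader question, if any


BROAD = "Are we reaching our target population?"

candidates = [
    Question(BROAD, "recruitment", "implementation"),
    Question("Are elderly homebound persons in the public housing project "
             "aware of the program?",
             "recruitment", "implementation", parent=BROAD),
    # Hypothetical additional question, included to show a second cluster.
    Question("Are the services adequately delivered?",
             "services", "implementation"),
]


def cluster(questions, key):
    """Group question texts by the named attribute ('function' or 'issue')."""
    groups = defaultdict(list)
    for q in questions:
        groups[getattr(q, key)].append(q.text)
    return dict(groups)


def children_of(questions, parent_text):
    """Return the specific questions nested under a broader question."""
    return [q.text for q in questions if q.parent == parent_text]


print(cluster(candidates, "function"))
print(children_of(candidates, BROAD))
```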

Setting priorities to determine which questions the evaluation should be designed to answer can be much more challenging. Once articulated, most of the questions about the program that arise during the planning process are likely to seem interesting to some stakeholder or another, or to the evaluators themselves. Rarely will resources be available to address them all, however. At this juncture, it is especially important for the evaluator to focus on the main purposes of the evaluation and the expected uses to be made of its findings. There is little point in investing time and effort in developing information that is of little use to any stakeholder.

That said, we must caution against an overly narrow interpretation of what information is useful. Evaluation utilization studies have shown that practical, instrumental use for program decision making is only one of the contributions evaluation information makes (Leviton and Hughes, 1981; Rich, 1977; Weiss, 1988). Equally important in many cases are conceptual and persuasive uses—the contribution of evaluation findings to the way in which a program and the social problems to which it responds are understood and debated. Evaluations often identify issues, frame analysis, and sharpen the focus of discussion in ways that are influential to the decision-making process even when there is no direct connection evident between any evaluation finding and any specific program decision. A fuller discussion of this issue is presented in Chapter 12; our purpose here is only to point out the possibility that some evaluation questions may be important to answer even though no immediate use or user is evident.

With the priority evaluation questions for a program selected, the evaluator is ready to design the substantial part of the evaluation that will be devoted to trying to answer them. Most of the remainder of this book discusses the approaches, methods, and considerations related to that task. The discussion is organized to follow the natural logical progression of evaluation questions we identified earlier in the evaluation hierarchy. It thus addresses, in turn, how to assess the need for a program, the program theory or plan for addressing that need, the implementation of the program plan and the associated program process, the impact or outcome of the program implementation on the social need, and the efficiency with which the program attains its outcomes.

Summary

A critical phase in evaluation planning is the identification and formulation of the questions the evaluation will address. Those questions focus the evaluation on the areas of program performance most at issue for key stakeholders and guide the design so that it will provide meaningful information about program performance.

Good evaluation questions must be reasonable and appropriate, and they must be answerable. That is, they must identify clear, observable dimensions of program performance that are relevant to the program’s goals and represent domains in which the program can realistically be expected to have accomplishments.

What most distinguishes evaluation questions is that they involve performance criteria by which the identified dimensions of program performance can be judged. If the formulation of the evaluation questions can include performance standards on which key stakeholders agree, evaluation planning will be easier and the potential for disagreement over the interpretation of the results will be reduced.

Evaluation issues can be arranged in a useful hierarchy, with the most fundamental issues at the bottom. Generally, questions at higher levels of the hierarchy presuppose either knowledge or confident assumptions about issues at lower levels.

To ensure that the matters of greatest significance are covered in the evaluation design, the evaluation questions are best formulated through interaction and negotiation with the evaluation sponsors and other stakeholders representative of significant groups or distinctly positioned in relation to program decision making.

Although stakeholder input is critical, the evaluator must also be prepared to identify program issues that might warrant inquiry. This requires that the evaluator conduct an independent analysis of the assumptions and expectations on which the program is based.

One useful way to reveal aspects of program performance that may be important is to make the program theory explicit. Program theory describes the assumptions inherent in a program about the activities it undertakes and how those relate to the social benefits it is expected to produce. Critical analysis of program theory can surface important evaluation questions that might otherwise have been overlooked.

When these various procedures have generated a full set of candidate evaluation questions, the evaluator must organize them into related clusters and draw on stakeholder input and professional judgment to set priorities among them. With the priority evaluation questions for a program determined, the evaluator is then ready to design the part of the evaluation that will be devoted to answering them.

KEY CONCEPTS

Catchment area

The geographic area served by a program.

Implementation failure

The program does not adequately perform the activities specified in the program design that are assumed to be necessary for bringing about the intended social improvements. It includes situations in which no service, not enough service, or the wrong service is delivered, or the service varies excessively across the target population.

Performance criterion

The standard against which a dimension of program performance is compared so that it can be evaluated.

Program goal

A statement, usually general and abstract, of a desired state toward which a program is directed. Compare program objectives.

Program objectives

Specific statements detailing the desired accomplishments of a program together with one or more measurable criteria of success.

Theory failure

The program is implemented as planned but its services do not produce the immediate effects on the participants that are expected or the ultimate social benefits that are intended, or both.

Discussion 13