In Chapter 3, we advocated that evaluators analyze a program’s theory as an aid in identifying potentially important evaluation questions. In this chapter, we return to the topic of program theory, not as a framework for identifying evaluation questions, but as a constituent part of the program that is being evaluated.

The social problems that programs address are often so complex and difficult that bringing about even small improvements may pose formidable challenges. A program’s theory is the conception of what must be done to bring about the intended social benefits. As such, it is the foundation on which every program rests.

A program’s theory can be a good one, in which case it represents the “know-how” necessary for the program to attain the desired results, or it can be a poor one that would not produce the intended effects even if implemented well. One aspect of evaluating a program, therefore, is to assess how good the program theory is, in particular how well it is formulated and whether it presents a plausible and feasible plan for improving the target social conditions. For program theory to be assessed, however, it must first be expressed clearly and completely enough to stand for review. Accordingly, this chapter explains how evaluators can describe the program theory and then assess how good it is.

Mario Cuomo, former governor of New York, once described his mother’s rules for success as (1) figure out what you want to do and (2) do it. These are pretty much the same rules that social programs must follow if they are to be effective. Given an identified need, program decisionmakers must (1) conceptualize a program capable of alleviating that need and (2) implement it. In this chapter, we review the concepts and procedures an evaluator can apply to the task of assessing the quality of the program conceptualization, which we have called the program theory. In the next chapter, we describe how the evaluator can assess the quality of the program’s implementation.

Whether it is expressed in a detailed program plan and rationale or only implicit in the program’s structure and activities, the program theory explains why the program does what it does and provides the rationale for expecting that doing so will achieve the desired results. When examining a program’s theory, evaluators often find that it is not very convincing. There are many poorly designed social programs with faults that reflect deficiencies in their underlying conception of how the desired social benefits can be attained. This happens in large part because insufficient attention is given during the planning of new programs to careful, explicit conceptualization of a program’s objectives and how they are supposed to be achieved. Sometimes the political context within which programs originate does not permit extensive planning but, even when that is not the case, conventional practices for designing programs pay little attention to the underlying theory. The human service professions operate with repertoires of established services and types of intervention associated with their respective specialty areas. As a result, program design is often a matter of configuring a variation of familiar “off the shelf” services into a package that seems appropriate for a social problem without a close analysis of the match between those services and the specific nature of the problem.

For example, many social problems that involve deviant behavior, such as alcohol and drug abuse, criminal behavior, early sexual activity, or teen pregnancy, are addressed by programs that provide the target population with some mix of counseling and educational services. This approach is based on an assumption that is rarely made explicit during the planning of the program, namely, that people will change their problem behavior if given information and interpersonal support for doing so. While this assumption may seem reasonable, experience and research provide ample evidence that such behaviors are resistant to change even when participants are provided with knowledge about how to change and receive strong encouragement from loved ones to do so. Thus, the theory that education and supportive counseling will reduce deviant behavior may not be a sound basis for program design.

A program’s rationale and conceptualization, therefore, are just as subject to critical scrutiny within an evaluation as any other important aspect of the program. If the program’s goals and objectives do not relate in a reasonable way to the social conditions the program is intended to improve, or the assumptions and expectations embodied in a program’s functioning do not represent a credible approach to bringing about that improvement, there is little prospect that the program will be effective.

The first step in assessing program theory is to articulate it, that is, to produce an explicit description of the conceptions, assumptions, and expectations that constitute the rationale for the way the program is structured and operated. Only rarely can a program immediately provide the evaluator with a full statement of its underlying theory. Although the program theory is always implicit in the program’s structure and operations, a detailed account of it is seldom written down in program documents. Moreover, even when some write-up of program theory is available, it is often in material that has been prepared for funding proposals or public relations purposes and may not correspond well with actual program practice.

Assessment of program theory, therefore, almost always requires that the evaluator synthesize and articulate the theory in a form amenable to analysis. Accordingly, the discussion in this chapter is organized around two themes: (1) how the evaluator can explicate and express program theory in a form that will be representative of key stakeholders’ actual understanding of the program and workable for purposes of evaluation, and (2) how the evaluator can assess the quality of the program theory that has been thus articulated. We begin with a brief description of a perspective that has provided the most fully developed approaches to evaluating program theory.

5.1 The Evaluability Assessment Perspective

One of the earliest systematic attempts to describe and assess program theory arose from the experiences of an evaluation research group at the Urban Institute in the 1970s (Wholey, 1979). They found it often difficult, sometimes impossible, to undertake evaluations of public programs and began to analyze the obstacles. This led to the view that a qualitative assessment of whether minimal preconditions for evaluation were met should precede most evaluation efforts. Wholey and his colleagues termed the process evaluability assessment (see Exhibit 5-A).

Evaluability assessment involves three primary activities: (1) description of the program model with particular attention to defining the program goals and objectives, (2) assessment of how well defined and evaluable that model is, and (3) identification of stakeholder interest in evaluation and the likely use of the findings. Evaluators conducting evaluability assessments operate much like ethnographers in that they seek to describe and understand the program through interviews and observations that will reveal its “social reality” as viewed by program personnel and other significant stakeholders. The evaluators begin with the conception of the program presented in documents and official information, but then try to see the program through the eyes of those closest to it. The intent is to end up with a description of the program as it exists and an understanding of the program issues that really matter to the parties involved. Although this process involves considerable judgment and discretion on the part of the evaluator, various practitioners have attempted to codify its procedures so that evaluability assessments will be reproducible by other evaluators (see Rutman, 1980; Smith, 1989; Wholey, 1994).

A common outcome of evaluability assessments is that program managers and sponsors recognize the need to modify their programs. The evaluability assessment may reveal that there are faults in a program’s delivery system, that the program’s target population is not well defined, or that the intervention itself needs to be reconceptualized. Or there may be few program objectives that stakeholders agree on or no feasible performance indicators for the objectives. In such cases, the evaluability assessment has uncovered problems with the program’s design that program managers must correct before any meaningful performance evaluation can be undertaken.

The aim of evaluability assessment is to create a favorable climate and an agreed-on understanding of the nature and objectives of the program that will facilitate the design of an evaluation. As such, it can be integral to the approach the evaluator employs to tailor an evaluation and formulate evaluation questions (see Chapters 2 and 3). Exhibit 5-B presents an example of an evaluability assessment that illustrates the typical procedure.

EXHIBIT 5-A A Rationale for Evaluability Assessment

If evaluators and intended users fail to agree on program goals, objectives, information priorities, and intended uses of program performance information, those designing evaluations may focus on answering questions that are not relevant to policy and management decisions. If program goals and objectives are unrealistic because insufficient resources have been applied to critical program activities, the program has been poorly implemented, or administrators lack knowledge of how to achieve program goals and objectives, the more fruitful course may be for those in charge of the program to change program resources, activities, or objectives before formal evaluation efforts are undertaken. If relevant data are unavailable and cannot be obtained at reasonable cost, subsequent evaluation work is likely to be inconclusive. If policymakers or managers are unable or unwilling to use the evaluation information to change the program, even the most conclusive evaluations are likely to produce “information in search of a user.” Unless these problems can be overcome, the evaluation will probably not contribute to improved program performance.

These four problems, which characterize many public and private programs, can be reduced and often overcome by a qualitative evaluation process, evaluability assessment, that documents the breadth of the four problems and helps programs—and subsequent program evaluation work—to meet the following criteria:

· Program goals, objectives, important side effects, and priority information needs are well defined.

· Program goals and objectives are plausible.

· Relevant performance data can be obtained.

· The intended users of the evaluation results have agreed on how they will use the information.

Evaluability assessment is a process for clarifying program designs, exploring program reality, and—if necessary—helping redesign programs to ensure that they meet these four criteria. Evaluability assessment not only shows whether a program can be meaningfully evaluated (any program can be evaluated) but also whether evaluation is likely to contribute to improved program performance.

SOURCE: Quoted from Joseph S. Wholey, “Assessing the Feasibility and Likely Usefulness of Evaluation,” in Handbook of Practical Program Evaluation, eds. J. S. Wholey, H. P. Hatry, and K. E. Newcomer (San Francisco: Jossey-Bass, 1994), p. 16.

EXHIBIT 5-B Evaluability Assessment for the Appalachian Regional Commission

Evaluators from the Urban Institute worked with managers and policymakers in the Appalachian Regional Commission (ARC) on the design of their health and child development program. In this evaluability assessment, the evaluators:

· Reviewed existing data on each of the 13 state ARC-funded health and child development programs

· Made visits to five states and then selected two states to participate in evaluation design and implementation

· Reviewed documentation related to congressional, commission, state, and project objectives and activities (including the authorizing legislation, congressional hearings and committee reports, state planning documents, project grant applications, ARC contract reports, local planning documents, project materials, and research projects)

· Interviewed approximately 75 people on congressional staffs and in commission headquarters, state ARC and health and child development staffs, local planning units, and local projects

· Participated in workshops with approximately 60 additional health and child development practitioners, ARC state personnel, and outside analysts

Analysis and synthesis of the resulting data yielded a logic model that presented program activities, program objectives, and the assumed causal links between them. The measurability and plausibility of program objectives were then analyzed and new program designs more likely to lead to demonstrably effective performance were presented. These included both an overall ARC program model and a series of individual models, each concerned with an identified objective of the program.

In reviewing the report, ARC staff were asked to choose among alternative courses of action. The review process consisted of a series of intensive discussions in which ARC and Urban Institute staff focused on one objective and program model at a time. In each session, the evaluators and staff attempted to reach agreement on the validity of the models presented, the importance of the respective objectives, and the extent to which any of the information options ought to be pursued.

ARC ended up adopting revised project designs and deciding to systematically monitor the performance of all their health and child development projects and evaluate the effectiveness of the “innovative” ones. Twelve of the 13 ARC states have since adopted the performance monitoring system. Representatives of those states report that project designs are now much more clearly articulated and they believe the projects themselves have improved.

SOURCE: Adapted from Joseph S. Wholey, “Using Evaluation to Improve Program Performance,” in Evaluation Research and Practice: Comparative and International Perspectives, eds. R. A. Levine, M. A. Solomon, G.-M. Hellstern, and H. Wollmann (Beverly Hills, CA: Sage, 1981), pp. 92-106.

Evaluability assessment requires program stakeholders to articulate the program’s design and logic (the program model); however, it can also be carried out for the purposes of describing and assessing program theory (Wholey, 1987). Indeed, the evaluability assessment approach represents the most fully developed set of concepts and procedures available in the evaluation literature for describing and assessing a program’s conceptualization of what it is supposed to be doing and why. We turn now to a more detailed discussion of procedures for identifying and evaluating program theory, drawing heavily on the writings associated with the practice of evaluability assessment.


5.2 Describing Program Theory

Evaluators have long recognized the importance of program theory as a basis for formulating and prioritizing evaluation questions, designing evaluation research, and interpreting evaluation findings (Bickman, 1987; Chen and Rossi, 1980; Weiss, 1972; Wholey, 1979). However, program theory has been described and used under various names, for example, logic model, program model, outcome line, cause map, and action theory. There is no general consensus about how best to describe a program’s theory, so we will describe a scheme we have found useful in our own evaluation activities.

For this purpose, we depict a social program as centering on the transactions that take place between a program’s operations and the population it serves (Exhibit 5-C). These transactions might involve counseling sessions for women with eating disorders in therapists’ offices, recreational activities for high-risk youths at a community center, educational presentations to local citizens’ groups, nutrition posters in a clinic, informational pamphlets about empowerment zones and tax law mailed to potential investors, delivery of meals to the front doors of elderly persons, or any such point-of-service contact. On one side of this program-target transaction, we have the program as an organizational entity, with its various facilities, personnel, resources, activities, and so forth. On the other side, we have the target participants in their lifespaces with their various circumstances and experiences in relation to the service delivery system of the program.

This simple scheme highlights three interrelated components of a program theory: the program impact theory, the service utilization plan, and the program’s organizational plan. The program’s impact theory consists of assumptions about the change process actuated by the program and the improved conditions that are expected to result. It is operationalized by the program-target transactions, for they constitute the means by which the program expects to bring about its intended effects. The impact theory may be as simple as presuming that exposure to information about the negative effects of drug abuse will motivate high school students to abstain or as complex as the ways in which an eighth-grade science curriculum will lead to deeper understanding of natural phenomena. It may be as informal as the commonsense presumption that providing hot meals to elderly persons improves their nutrition or as formal as classical conditioning theory adapted to treating phobias. Whatever its nature, however, an impact theory of some sort constitutes the essence of a social program. If the assumptions embodied in that theory about how desired changes are brought about by program action are faulty, or if they are valid but not well operationalized by the program, the intended social benefits will not be achieved.

EXHIBIT 5-C Overview of Program Theory

To instigate the change process posited in the program’s impact theory, the intended services must first be provided to the target population. The program’s service utilization plan is constituted by the program’s assumptions and expectations about how to reach the target population, provide and sequence service contacts, and conclude the relationship when services are no longer needed or appropriate. For a program to increase awareness of AIDS risk, for instance, the service utilization plan may be simply that appropriate persons will read informative posters if they are put up in subway cars. A multifaceted AIDS prevention program, on the other hand, may be organized on the assumption that high-risk drug abusers who are referred by outreach workers will go to nearby street-front clinics, where they will receive appropriate testing and information.

The program, of course, must be organized in such a way that it can actually provide the intended services. The third component of program theory, therefore, relates to program resources, personnel, administration, and general organization. We call this component the program’s organizational plan. The organizational plan can generally be represented as a set of propositions: If the program has such and such resources, facilities, personnel, and so on, if it is organized and administered in such and such a manner, and if it engages in such and such activities and functions, then a viable organization will result that can operate the intended service delivery system. Elements of programs’ organizational theories include, for example, assumptions that case managers should have master’s degrees in social work and at least five years’ experience, that at least 20 case managers should be employed, that the agency should have an advisory board that represents local business owners, that there should be an administrative coordinator assigned to each site, and that working relations should be maintained with the Department of Public Health.

Adequate resources and effective organization, in this scheme, are the factors that make it possible to develop and maintain a service delivery system that enables utilization of the services by the target population. A program’s organization and the service delivery system that organization supports are the parts of the program most directly under the control of program administrators and staff. These two aspects together are often referred to as program process, and the assumptions and expectations on which that process is based may be called the program process theory.

With this overview, we turn now to a more detailed discussion of each of the components of program theory with particular attention to how the evaluator can describe them in a manner that permits analysis and assessment.

Program Impact Theory

Program impact theory is causal theory. It describes a cause-and-effect sequence in which certain program activities are the instigating causes and certain social benefits are the effects they eventually produce. Evaluators, therefore, typically represent program impact theory in the form of a causal diagram showing the cause-and-effect linkages presumed to connect a program’s activities with the expected outcomes (Chen, 1990; Lipsey, 1993; Martin and Kettner, 1996). Because programs rarely exercise direct control over the social conditions they are expected to improve, they must generally work indirectly by changing some critical but manageable aspect of the situation, which, in turn, is expected to lead to more far-reaching improvements.

The simplest program impact theory is the basic “two step” in which services affect some intermediate condition that, in turn, improves the social conditions of concern (Lipsey and Pollard, 1989). For instance, a program cannot make it impossible for people to abuse alcohol, but it can attempt to change their attitudes and motivation toward alcohol in ways that help them avoid abuse. More complex program theories may have more steps along the path between program and social benefit and, perhaps, involve more than one distinct path.

The distinctive features of any representation of program impact theory are that each element is either a cause or an effect and that the causal linkages between those elements show a chain of events that begins with program actions and ends with change in the social conditions the program intends to improve (see Exhibit 5-D). The events following directly from the instigating program activities are the most direct outcomes, often called proximal or immediate outcomes (e.g., dietary knowledge and awareness in the first example in 5-D). Events further down the chain constitute the more distal or ultimate outcomes (e.g., healthier diet in the first example in 5-D). Program impact theory highlights the dependence of the more distal, and generally more important, outcomes on successful attainment of the more proximal ones.
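For readers who find it helpful to see the structure of such a diagram spelled out, the causal chain can be sketched as a small directed graph. The Python sketch below is only an illustration under stated assumptions: the activity name "nutrition education sessions" and the function `causal_distance` are hypothetical, and the chain is a simplified reading of the dietary example mentioned above, not a reproduction of Exhibit 5-D.

```python
from collections import deque

# Hypothetical impact theory: each key is a cause, each value lists
# the effects it is presumed to produce. Names are illustrative only.
ACTIVITY = "nutrition education sessions"
impact_theory = {
    ACTIVITY: ["dietary knowledge and awareness"],
    "dietary knowledge and awareness": ["healthier diet"],
    "healthier diet": [],
}


def causal_distance(theory, program_activity):
    """Count the causal steps from the program activity to each element."""
    distance = {program_activity: 0}
    queue = deque([program_activity])
    while queue:
        cause = queue.popleft()
        for effect in theory.get(cause, []):
            if effect not in distance:
                distance[effect] = distance[cause] + 1
                queue.append(effect)
    return distance


steps = causal_distance(impact_theory, ACTIVITY)

# Proximal outcomes follow directly from the program activity.
proximal = [node for node, d in steps.items() if d == 1]

# Distal (ultimate) outcomes are treated here as the ends of the chain,
# that is, elements with no further presumed effects.
distal = [node for node in steps
          if node != ACTIVITY and not impact_theory.get(node)]

print("Proximal:", proximal)
print("Distal:", distal)
```

Treating the ends of the chain as the distal outcomes is a simplification chosen for this sketch; in a longer chain an evaluator might reasonably regard several later links as distal.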

The Service Utilization Plan

An explicit service utilization plan pulls into focus the critical assumptions about how and why the intended recipients of service will actually become engaged with the program and follow through to the point of receiving sufficient services to initiate the change process represented in the program impact theory. It describes the program-target transactions from the perspective of the targets and their lifespaces as they might encounter the program.

A program’s service utilization plan can be usefully depicted in a flowchart that tracks the various paths that program targets can follow from some appropriate point prior to first contact with the program through a point where there is no longer any contact. Exhibit 5-E shows an example of a simple service utilization flowchart for a hypothetical aftercare program for released psychiatric patients. One characteristic of such charts is that they identify the possible situations in which the program targets are not engaged with the program as intended. In 5-E, for example, we see that formerly hospitalized psychiatric patients may not receive the planned visit from a social worker or referrals to community agencies and, as a consequence, may receive no service at all.
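One way to see what such a flowchart makes visible is to list every path a target might follow through it. The Python sketch below does this for a hypothetical flow loosely modeled on the aftercare example; the state names and the function `all_paths` are assumptions made for illustration and do not reproduce Exhibit 5-E. It also assumes the flowchart has no loops.

```python
# Hypothetical service utilization flow: each state maps to the states
# that can follow it. State names are illustrative assumptions.
flow = {
    "released from hospital": ["social worker visit", "no visit"],
    "social worker visit": ["referral to community agency", "no referral"],
    "referral to community agency": ["receives aftercare services",
                                     "does not follow up"],
    "no visit": ["no service"],
    "no referral": ["no service"],
    "does not follow up": ["no service"],
    "receives aftercare services": [],
    "no service": [],
}


def all_paths(flow, start):
    """Yield every path from start to a state with no exits.

    Assumes the flow contains no cycles.
    """
    exits = flow.get(start, [])
    if not exits:
        yield [start]
        return
    for next_state in exits:
        for rest in all_paths(flow, next_state):
            yield [start] + rest


paths = list(all_paths(flow, "released from hospital"))

# Paths on which the target ends up receiving no service at all.
dropouts = [path for path in paths if path[-1] == "no service"]

for path in dropouts:
    print(" -> ".join(path))
```

Each path that ends in "no service" marks a point at which the plan's assumptions about target engagement could fail, which is the kind of situation the flowchart is meant to expose.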

The Program’s Organizational Plan

The program’s organizational plan is articulated from the perspective of program management. The plan encompasses both the functions and activities the program is expected to perform and the human, financial, and physical resources required for that performance. Central to this scheme are the program services, those specific activities that constitute the program’s role in the target-program transactions that are expected to lead to social benefits. However, the organizational plan also must include those functions that provide essential preconditions and ongoing support for the organization’s ability to provide its primary services, for instance, fund-raising, personnel management, facilities acquisition and maintenance, political liaison, and the like.

EXHIBIT 5-D Diagrams Illustrating Program Impact Theories

EXHIBIT 5-E Service Utilization Flowchart for an Aftercare Program for Psychiatric Patients

There are many ways to depict a program’s organizational plan. If we center it on the target-program transactions, the first element of the organizational plan will be a description of the program’s objectives for the services it will provide: what those services are, how much is to be provided, to whom, and on what schedule. The next element might then describe the resources and functions necessary to engage in those service activities. For instance, sufficient personnel with appropriate credentials and skills will be required as will logistical support, proper facilities and equipment, funding, supervision, clerical support, and so forth.

As with the other portions of program theory, it is often useful to describe a program’s organizational plan with a diagram. Exhibit 5-F presents an example that depicts the major organizational components of the aftercare program for psychiatric patients whose service utilization scheme is shown in 5-E. A common way of representing the organizational plan of a program is in terms of inputs (resources and constraints applicable to the program) and activities (the services the program is expected to provide). In a full logic model of the program, receipt of services (service utilization) is represented as program outputs, which, in turn, are related to the desired outcomes. Exhibit 5-G shows a typical logic model drawn from a widely used workbook prepared by the United Way of America.
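The four parts of a logic model named above (inputs, activities, outputs, and outcomes) can also be recorded in a simple data structure, which makes it easy to notice when one part has not yet been articulated. The following Python sketch is offered only as an illustration; the class `LogicModel`, its method, and the sample entries are hypothetical and are not drawn from Exhibit 5-F or Exhibit 5-G.

```python
from dataclasses import dataclass, field


@dataclass
class LogicModel:
    """Minimal container for the four components of a logic model."""
    inputs: list = field(default_factory=list)       # resources and constraints
    activities: list = field(default_factory=list)   # services to be provided
    outputs: list = field(default_factory=list)      # receipt of services
    outcomes: list = field(default_factory=list)     # desired changes

    def missing_components(self):
        """Return the names of components that have no entries yet."""
        return [name for name, items in vars(self).items() if not items]


# Hypothetical, partially articulated model (entries are placeholders).
model = LogicModel(
    inputs=["agency staff", "funding"],
    activities=["parenting classes"],
    outputs=["participants attend classes"],
)

print(model.missing_components())  # prints ['outcomes']
```

A check of this kind says nothing about whether the entries are plausible; it only flags components the stakeholders have not yet spelled out.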

5.3 Eliciting Program Theory

When a program’s theory is spelled out in program documents and well understood by staff and stakeholders, the program is said to be based on an articulated program theory (Weiss, 1997). This is most likely to occur when the original design of the program is drawn from social science theory. For instance, a school-based drug use prevention program that features role-playing of refusal behavior in peer groups may be derived from social learning theory and its implications for peer influences on adolescent behavior.

When the underlying assumptions about how program services and practices are presumed to accomplish their purposes have not been fully articulated and recorded, the program has an implicit program theory or, as Weiss (1997) put it, a tacit theory. This might be the case for a counseling program to assist couples with marital difficulties. Although it may be reasonable to assume that discussing marital problems with a trained professional would be helpful, the way in which that translates into improvements in the marital relationship is not described by an explicit theory nor would different counselors necessarily agree about the process.

When a program’s theory is implicit rather than articulated, the evaluator must extract and describe it before it can be analyzed and assessed. The evaluator’s objective is to depict the “program as intended,” that is, the actual expectations held by decisionmakers about what the program is supposed to do and what results are expected to follow. With this in mind, we now consider the concepts and procedures an evaluator can use to extract and articulate program theory as a prerequisite for assessing it.

Defining the Boundaries of the Program

A crucial early step in articulating program theory is to define the boundaries of the program at issue (Smith, 1989). A human service agency may have many programs and provide multiple services; a regional program may have many agencies and sites. There is usually no one correct definition of a program, and the boundaries the evaluator applies will depend, in large part, on the scope of the evaluation sponsor’s concerns and the program domains to which they apply.

EXHIBIT 5-G A Logic Model for a Teen Mother Parenting Education Program

SOURCE: Adapted from United Way of America Task Force on Impact, Measuring Program Outcomes: A Practical Approach. Alexandria, VA: Author, 1996, p. 42. Used by permission, United Way of America.

One way to define the boundaries of a program for the purpose of articulating the program theory is to work from the perspective of the decisionmakers who are expected to act on the findings of the evaluation. The evaluator’s definition of the program should at minimum represent the relevant jurisdiction of those decisionmakers and the organizational structures and activities about which decisions are likely to be made. If, for instance, the sponsor of the evaluation is the director of a local community mental health agency, then the evaluator may define the boundaries of the program around one of the distinct service packages administered by that director, such as outpatient counseling for eating disorders. If the evaluation sponsor is the state director of mental health, however, the relevant program boundaries may be defined around effectiveness questions that relate to the outpatient counseling component of all the local mental health agencies in the state.

Because program theory deals mainly with means-ends relations, the most critical aspect of defining program boundaries is to ensure that they encompass all the important activities, events, and resources linked to one or more outcomes recognized as central to the endeavor. This can be accomplished by starting with the benefits the program intends to produce and working backward to identify all the activities and resources under relevant organizational auspices that are presumed to contribute to attaining those objectives. From this perspective, the eating disorders program at either the local or state level would be defined as the set of activities organized by the respective mental health agency that has an identifiable role in attempting to alleviate eating disorders for the eligible population.

Although these approaches are straightforward in concept, they can be problematic in practice. Not only can programs be complex, with crosscutting resources, activities, and goals, but the characteristics described above as linchpins for program definition can themselves be difficult to establish. Thus, in this matter, as with so many other aspects of evaluation, the evaluator must be prepared to negotiate a program definition agreeable to the evaluation sponsor and key stakeholders and be flexible about modifying the definition as the evaluation progresses.

Explicating the Program Theory

For a program in the early planning stage, program theory might be built by the planners from prior practice and research. At this stage, an evaluator may be able to help develop a plausible and well-articulated theory. For an existing program, however, the appropriate task is to describe the theory that is actually embodied in the program’s structure and operation. To accomplish this, the evaluator must work with stakeholders to draw out the theory represented in their actions and assumptions. The general procedure for this involves successive approximation. Draft descriptions of the program theory are generated, usually by the evaluator, and discussed with knowledgeable stakeholder informants to get feedback. The draft is then refined on the basis of their input and shown again to appropriate stakeholders. The theory description developed in this fashion may involve impact theory, process theory, or any components or combination that are deemed relevant to the purposes of the evaluation. Exhibit 5-H presents one evaluator’s account of how a program process theory was elicited.

The primary sources of information for developing and differentiating descriptions of program theory are (1) review of program documents; (2) interviews with program stakeholders and other selected informants; (3) site visits and observation of program functions and circumstances; and (4) the social science literature. Three types of information the evaluator may be able to extract from those sources will be especially useful.

Program Goals and Objectives

Perhaps the most important matter to be determined from program sources relates to the goals and objectives of the program, which are necessarily an integral part of the program theory, especially its impact theory. The goals and objectives that must be represented in program theory, however, are not necessarily the same as those identified in a program’s mission statements or in responses to questions asked of stakeholders about the program’s goals. To be meaningful for an evaluation, program goals must identify a state of affairs that could realistically be attained as a result of program actions; that is, there must be some reasonable connection between what the program does and what it intends to accomplish. Smith (1989) suggests that, to keep the discussion concrete and specific, the evaluator should use a line of questioning that does not ask about goals directly but asks instead about consequences. For instance, in a review of major program activities, the evaluator might ask about each, “Why do it? What are the expected results? How could you tell if those results actually occurred?”

The resulting set of goal statements must then be integrated into the description of program theory. Goals and objectives that describe the changes the program aims to bring about in social conditions relate to program impact theory. A program goal of reducing unemployment, for instance, identifies a distal outcome in the impact theory. Program goals and objectives related to program activities and service delivery, in turn, help reveal the program process theory. If the program aims to offer afterschool care for latchkey children to working parents, a portion of the service utilization plan is revealed. Similarly, if an objective is to offer literacy classes four times a week, an important element of the organizational plan is identified.

Program Functions, Components, and Activities

To properly describe the program process theory, the evaluator must identify each distinct program component, its functions, and the particular activities and operations associated with those functions. Program functions include such operations as “assess client need,” “complete intake,” “assign case manager,” “recruit referral agencies,” “train field workers,” and the like. The evaluator can generally identify such functions by determining the activities and job descriptions of the various program personnel. When clustered into thematic groups, these functions represent the constituent elements of the program process theory.

EXHIBIT 5-H Formulating Program Process Theory for Adapted Work Services

Adapted Work Services (AWS) was initiated at the Rochelle Center in Nashville, Tennessee, to provide low-stress, paid work and social interaction to patients in the early stages of Alzheimer’s disease. It was based on the belief that the patients would benefit emotionally and cognitively from working in a sheltered environment and their family members would benefit from being occasionally relieved of the burden of caring for them. The evaluator described the procedures for formulating a program process theory for this program as follows:

The creation of the operational model of the AWS program involved using Post-it notes and butcher paper to provide a wall-size depiction of the program. The first session involved only the researcher and the program director. The first question asked was, “What happens when a prospective participant calls the center for information?” The response was recorded on a Post-it note and placed on the butcher paper. The next step was then identified, and this too was recorded and placed on the butcher paper. The process repeated itself until all (known) activities were identified and placed on the paper. Once the program director could not identify any more activities, the Post-it notes were combined into clusters. The clusters were discussed until potential component labels began to emerge. Since this exercise was the product of only two people, the work was left in an unused room for two weeks so that the executive director and all other members of the management team could react to the work. They were to identify missing, incorrect, or misplaced activities as well as comment on the proposed components. After several feedback sessions from the staff members and discussions with the executive director, the work was typed and prepared for presentation to the Advisory Board. The board members were able to reflect on the content, provide further discussion, and suggest additional changes. Several times during monthly board meetings, the executive director asked that the model be revisited for planning purposes. This helped further clarify the activities as well as sharpen the group’s thinking about the program.

SOURCE: Quoted, with permission, from Doris C. Quinn, “Formative Evaluation of Adapted Work Services for Alzheimer’s Disease Victims: A Framework for Practical Evaluation in Health Care” (doctoral diss., Vanderbilt University, 1996), pp. 46-47.

The Logic or Sequence Linking Program Functions, Activities, and Components

A critical aspect of program theory is how the various expected outcomes and functions relate to each other. Sometimes these relationships involve only the temporal sequencing of key program activities and their effects; for instance, in a postrelease program for felons, prison officials must notify the program that a convict has been released before the program can initiate contact to arrange services. In other cases, the relationships between outcomes and functions have to do with activities or events that must be coordinated, as when child care and transportation must be arranged in conjunction with job training sessions, or with supportive functions, such as training the instructors who will conduct in-service classes for nurses. Other relationships entail logical or conceptual linkages, especially those represented in the program impact theory. Thus, the connection between mothers’ knowledge about how to care for their infants and the actual behavior of providing that care assumes a psychological process through which information influences behavior.

Because the number and variety of such relationships are often appreciable, evaluators typically construct charts or graphical displays to describe them. These may be configured as lists, flowcharts, or hierarchies, or in any number of creative forms designed to identify the key elements and relationships in a program’s theory. Such displays not only portray program theory but also provide a way to make it sufficiently concrete and specific to engage program personnel and stakeholders.
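As a hedged illustration of one such display, sequencing and coordination relationships can be written down as a small directed graph and ordered automatically. The Python sketch below is a composite invented for illustration: the function labels and dependencies are assumptions loosely modeled on the postrelease and job training examples above, not a description of any actual program.

```python
from graphlib import TopologicalSorter

# Hypothetical program functions; each key depends on the functions in its set.
# Labels and dependencies are illustrative assumptions, not a real program.
depends_on = {
    "prison notifies program of release": set(),
    "program contacts released person": {"prison notifies program of release"},
    "arrange child care": {"program contacts released person"},
    "arrange transportation": {"program contacts released person"},
    "hold job training session": {"arrange child care", "arrange transportation"},
}

def sequence(functions):
    """Return one ordering in which every function follows its prerequisites."""
    return list(TopologicalSorter(functions).static_order())

def missing_links(functions):
    """Return prerequisites that are referenced but never listed as functions."""
    referenced = set().union(*functions.values())
    return sorted(referenced - set(functions))

order = sequence(depends_on)
for step, name in enumerate(order, start=1):
    print(step, name)
print("Undefined prerequisites:", missing_links(depends_on))
```

A listing of this kind is only a drafting aid. Its value lies in giving stakeholders something concrete to correct, for instance when a prerequisite is referenced that no one has described as a program function.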

Corroborating the Description of the Program Theory

The description of program theory that results from the procedures we have described will generally represent the program as it was intended more than as it actually is. Program managers and policymakers tend to think of the idealized program as the “real” one and to regard various shortfalls from that ideal as glitches that do not represent what the program is really about. Those further away from the day-to-day operations, on the other hand, may be unaware of such shortfalls and will naturally describe what they presume the program to be even if in actuality it does not quite live up to that image.

Some discrepancy between program theory and reality is therefore natural. Indeed, examination of the nature and magnitude of that discrepancy is the task of process or implementation evaluation, as discussed in the next chapter. However, if the theory is so overblown that it cannot realistically be held up as a depiction of what is supposed to happen, it needs to be revised. Suppose, for instance, that a job training program’s service utilization plan calls for monthly contacts between each client and a case manager. If the program resources are insufficient to support case managers, and none are employed, this part of the theory is fanciful and should be restated to more realistically depict what the program might actually be able to accomplish.

Given that the program theory depicts a realistic scenario, confirming it is a matter of demonstrating that pertinent program personnel and stakeholders endorse it as a meaningful account of how the program is intended to work. If it is not possible to generate a theory description that all relevant stakeholders accept as reasonable, this indicates that the program is poorly defined or that it embodies competing philosophies. In such cases, the most appropriate response for the evaluator may be to take on a consultant role and assist the program in clarifying its assumptions and intentions to yield a theory description that will be acceptable to all key stakeholders.

For the evaluator, the end result of the theory description exercise is a detailed and complete statement of the program as intended that can then be analyzed and assessed as a distinct aspect of the evaluation. Note that the agreement of stakeholders serves only to confirm that the theory description does, in fact, represent their understanding of how the program is supposed to work. It does not necessarily mean that the theory is a good one. To determine the soundness of a program theory, the evaluator must not only describe the theory but evaluate it. The procedures evaluators use for that purpose are described in the next section.

5.4 Assessing Program Theory

Assessment of some aspect of a program’s theory is relatively common in evaluation, often in conjunction with an evaluation of program process or impact. Nonetheless, outside of the modest evaluability assessment literature, remarkably little has been written of a specific nature about how this should be done. Our interpretation of this relative neglect is not that theory assessment is unimportant or unusual, but that it is typically done in an informal manner that relies on commonsense judgments that may not seem to require much explanation. Indeed, when program services are directly related to straightforward objectives, the validity of the program theory may be accepted on the basis of limited evidence or commonsense judgment. An illustration is a meals-on-wheels service that brings hot meals to homebound elderly persons to improve their nutritional intake. In this case, the theory linking the action of the program (providing hot meals) to its intended benefits (improved nutrition) needs little critical evaluation.

Many programs, however, are not based on expectations as simple as the notion that delivering food to elderly persons improves their nutrition. For example, a family preservation program that assigns case managers to coordinate community services for parents deemed at risk of having their children placed in foster care involves many assumptions about exactly what it is supposed to accomplish and how. In such cases, the program theory might easily be faulty, and correspondingly, a rather probing evaluation of it may be warranted.

It is seldom possible or useful to individually appraise each distinct assumption and expectation represented in a program theory. But there are certain critical tests that can be conducted to provide assurance that it is sound. This section summarizes the various approaches and procedures the evaluator might use for conducting that assessment.

Assessment in Relation to Social Needs

The most important framework for assessing program theory builds on the results of needs assessment, as discussed in Chapter 4, or, more generally, on a thorough understanding of the social problem the program is intended to address and the service needs of the target population. A program theory that does not relate in an appropriate manner to the actual nature and circumstances of the social conditions at issue will result in an ineffective program no matter how well the program is implemented and administered. It is fundamental, therefore, to assess program theory in relationship to the needs of the target population the program is intended to serve.

There is no push-button procedure an evaluator can use to assess whether program theory describes a suitable conceptualization of how social needs should be met. Inevitably, this assessment requires judgment calls. When the assessment is especially critical, its validity is strengthened if those judgments are made collaboratively with relevant experts and stakeholders to broaden the range of perspectives and expertise on which they are based. Such collaborators, for instance, might include social scientists knowledgeable about research and theory related to the intervention, administrators with long experience managing such programs, representatives of advocacy groups associated with the target population, and policymakers or policy advisers highly familiar with the program and problem area.

Whatever the nature of the group that contributes to the assessment, the crucial aspect of the process is specificity. When program theory and social needs are described in general terms, there often appears to be more correspondence than is evident when the details are examined. To illustrate, consider a curfew program prohibiting juveniles under age 18 from being outside their homes after midnight that is initiated in a metropolitan area to address the problem of skyrocketing juvenile crime. The program theory, in general terms, is that the curfew will keep the youths home at night and, if they are at home, they are unlikely to commit crimes. Because the general social problem the program addresses is juvenile crime, the program theory does seem responsive to the social need.

A more detailed problem diagnosis and service needs assessment, however, might show that the bulk of juvenile crimes are residential burglaries committed in the late afternoon when school lets out. Moreover, it might reveal that the offenders represent a relatively small proportion of the juvenile population who have a disproportionately large impact because of their high rates of offending. Furthermore, it might be found that these juveniles are predominantly latchkey youths who have no supervision during afterschool hours. When the program theory is then examined in some detail, it is apparent that it assumes that significant juvenile crime occurs late at night and that potential offenders will both know about and obey the curfew. Furthermore, it depends on enforcement by parents or the police if compliance does not occur voluntarily.

Although even more specificity than this would be desirable, this much detail illustrates how a program theory can be compared with need to discover shortcomings in the theory. In this example, examining the particulars of the program theory and the social problem it is intended to address reveals a large disconnect. The program blankets the whole city rather than targeting the small group of problem juveniles and focuses on activity late at night rather than during the late afternoon, when most of the crimes actually occur. In addition, it makes the questionable assumptions that youths already engaged in more serious lawbreaking will comply with a curfew, that parents who leave their delinquent children unsupervised during the earlier part of the day will be able to supervise their later behavior, and that the overburdened police force will invest sufficient effort in arresting juveniles who violate the curfew to enforce compliance. Careful review of these particulars alone would raise serious doubts about the validity of this program theory (Exhibit 5-I presents another example).
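The comparison in the curfew example can be made concrete with a small calculation. The following Python sketch uses invented hourly offense counts (an assumption for illustration only, not data from any needs assessment) and a hypothetical curfew window running from midnight to 5 a.m. It computes what share of offenses falls within the hours the curfew covers and which hours account for the most offenses.

```python
# Invented counts of juvenile offenses by hour of day (0-23); illustrative only.
offenses_by_hour = {15: 40, 16: 55, 17: 30, 21: 10, 23: 5, 0: 6, 1: 4}

# Assumed curfew window: midnight up to (not including) 5 a.m.
CURFEW_HOURS = set(range(0, 5))

def share_covered(counts, covered_hours):
    """Fraction of all offenses that occur during the covered hours."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for hour, n in counts.items() if hour in covered_hours)
    return covered / total

def peak_hours(counts, top=3):
    """Hours with the most offenses, highest count first."""
    return sorted(counts, key=counts.get, reverse=True)[:top]

share = share_covered(offenses_by_hour, CURFEW_HOURS)
print(f"Share of offenses during curfew hours: {share:.1%}")
print("Peak hours:", peak_hours(offenses_by_hour))
```

With numbers shaped like these, the peak hours fall in the afternoon and only a small fraction of offenses occurs while the curfew is in force. That is the kind of mismatch between theory and need that a detailed comparison is meant to expose.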

One useful approach to comparing program theory with what is known (or assumed) about the relevant social needs is to separately assess impact theory and program process theory. Each of these relates to the social problem in a different way and, as each is elaborated, specific questions can be asked about how compatible the assumptions of the theory are with the nature of the social circumstances to which it applies. We will briefly describe the main points of comparison for each of these theory components.

Program impact theory involves the sequence of causal links between program services and outcomes that improve the targeted social conditions. The key point of comparison between program impact theory and social needs, therefore, relates to whether the effects the program is expected to have on the social conditions correspond to what is required to improve those conditions, as revealed by the needs assessment. Consider, for instance, a school-based educational program aimed at getting elementary school children to learn and practice good eating habits. The problem this program attempts to ameliorate is poor nutritional choices among school-age children, especially those in economically disadvantaged areas. The program impact theory would show a sequence of links between the planned instructional exercises and the children’s awareness of the nutritional value of foods, culminating in healthier selections and therefore improved nutrition.

EXHIBIT 5-I The Needs of the Homeless as a Basis for Assessing Program Theory

Exhibit 4-J in Chapter 4 described the responses of a sample of homeless men and women to a needs assessment survey. The largest proportions identified a place to live and having a job or steady income as their greatest need. Fewer than half, but significant proportions, also said they needed help with medical, substance abuse, psychological, and legal problems. The evaluators reported that among the service delivery implications of the needs assessment were indications that this population needs interventions that provide ongoing support in a range of domains at varying degrees of intensity. Thus, to be responsive, programs must have the capacity to deliver or broker access to a comprehensive range of services.

These findings offer two lines of analysis for assessment of program theory. First, any program that intends to alleviate homelessness must provide services that address the major problems that homeless persons experience. That is, the expected outcomes of those services (impact theory) must represent improvements in the most problematic domains if the conditions of the homeless are to be appreciably improved. Second, the design of the service delivery system (program process theory) must be such that multiple services can be readily and flexibly provided to homeless individuals in ways that will be accessible to them despite their limited resources and difficult circumstances. Careful, detailed comparison of the program theory embodied in any program for this homeless population with the respective needs assessment data, therefore, will reveal how sound that theory is as a design for effective intervention.

SOURCE: Daniel B. Herman, Elmer L. Struening, and Susan M. Barrow, “Self-Reported Needs for Help Among Homeless Men and Women,” Evaluation and Program Planning, 1994, 17(3):249-256.

Now, suppose a thorough needs assessment shows that the children’s eating habits are, indeed, poor but that their nutritional knowledge is not especially deficient. The needs assessment further shows that the foods served at home and even those offered in the school cafeterias provide limited opportunity for healthy selections. Against this background, it is evident that the program impact theory is flawed. Even if the program successfully imparts additional information about healthy eating, the children will not be able to act on it because they have little control over the selection of foods available to them. Thus, the proximal outcomes the program impact theory describes may be achieved, but they are not what is needed to ameliorate the problem at issue.

Program process theory, on the other hand, represents assumptions about the capability of the program to provide services that are accessible to the target population and compatible with their needs. These assumptions, in turn, can be compared with information about the target population’s opportunities to obtain service and the barriers that inhibit them from using the service. The process theory for an adult literacy program that offers evening classes at the local high school, for instance, may incorporate instructional and advertising functions and an appropriate selection of courses for the target population. The details of this scheme can be compared with needs assessment data that show what logistical and psychological support the target population requires to make effective use of the program. Child care and transportation may be critical for some potential participants. Also, illiterate adults may be reluctant to enroll in courses without more personal encouragement than they would receive from advertising. Cultural and personal affinity with the instructors may be important factors in attracting and maintaining participation from the target population as well. The intended program process can thus be assessed in terms of how responsive it is to these dimensions of the needs of the target population.

Assessment of Logic and Plausibility

A thorough job of articulating program theory should reveal the critical assumptions and expectations inherent in the program’s design. One essential form of assessment is simply a critical review of the logic and plausibility of these aspects of the program theory. Commentators familiar with assessing program theory suggest that a panel of reviewers be organized for that purpose (Chen, 1990; Rutman, 1980; Smith, 1989; Wholey, 1994). Such an expert review panel should include representatives of the program staff and other major stakeholders as well as the evaluator. By definition, however, stakeholders have some direct stake in the program. To balance the assessment and expand the available expertise, it may be advisable to bring in informed persons with no direct relationship to the program. Such outside experts might include experienced administrators of similar programs, social researchers with relevant specialties, representatives of advocacy groups or client organizations, and the like.

A review of the logic and plausibility of program theory will necessarily be a relatively unstructured and open-ended process. Nonetheless, there are some general issues such reviews should address. These are described below in the form of questions reviewers can ask. Additional useful detail can be found in Rutman (1980), Smith (1989), and Wholey (1994). Also see Exhibit 5-J for an example.

· Are the program goals and objectives well defined? The outcomes for which the program is accountable should be stated in sufficiently clear and concrete terms to permit a determination of whether they have been attained. Goals such as “introducing students to computer technology” are not well defined in this sense, whereas “increasing student knowledge of the ways computers can be used” is well defined and measurable.

· Are the program goals and objectives feasible? That is, is it realistic to assume that they can actually be attained as a result of the services the program delivers? A program theory should specify expected outcomes that are of a nature and scope that might reasonably follow from a successful program and that do not represent unrealistically high expectations. Moreover, the stated goals and objectives should involve conditions the program might actually be able to affect in some meaningful fashion, not those largely beyond its influence. “Eliminating poverty” is grandiose for any program, whereas “decreasing the unemployment rate” is not. But even the latter goal might be unrealistic for a program located in a chronically depressed labor market.

· Is the change process presumed in the program theory plausible? The presumption that a program will create benefits for the intended target population depends on the occurrence of some cause-and-effect chain that begins with the targets’ interaction with the program and ends with the improved circumstances in the target population that the program expects to bring about. Every step of this causal chain should be plausible. Because the validity of this impact theory is the key to the program’s ability to produce the intended effects, it is best if the theory is supported by evidence that the assumed links and relationships actually occur. For example, suppose a program is based on the presumption that exposure to literature about the health hazards of drug abuse will motivate long-term heroin addicts to renounce drug use. In this case, the program theory does not present a plausible change process, nor is it supported by any research evidence.

· Are the procedures for identifying members of the target population, delivering service to them, and sustaining that service through completion well defined and sufficient? The program theory should specify procedures and functions that are both well defined and adequate for the purpose, viewed both from the perspective of the program’s ability to perform them and the target population’s likelihood of being engaged by them. Consider, for example, a program to test for high blood pressure among poor and elderly populations to identify those needing medical care. It is relevant to ask whether this service is provided in locations accessible to members of these groups and whether there is an effective means of locating those with uncertain addresses. Absent these characteristics, it is unlikely that many persons from the target groups will receive the intended service.

EXHIBIT 5-J Assessing the Clarity and Plausibility of the Program Theory for Maryland’s 4-H Program

An evaluability assessment of Maryland’s 4-H youth program based on program documents and interviews with 96 stakeholder representatives included a review of key facets of the program’s theory with the following results:

Question:	Are the mission and goals clear?
Conclusion:	There is a lack of clarity about the overall mission of 4-H and some lack of agreement among the stakeholders and between persons directly involved in implementing the program and those not. Among the statements of mission were “introduce youth to farm life,” develop “sense of responsibility in agriculture and home economics,” and “developing life skills.”
Question:	Is it clear who is to be affected, who the audience is?
Conclusion:	There is some lack of agreement between 4-H faculty and the other stakeholders about the audience of 4-H. Written documents identified the audience as youth and adults; any youth between age 8 and 18 was viewed as the traditional audience for the program; recently, 6- and 7-year-olds have been targeted; some informants viewed the adult volunteers who assist with the program as one audience.
Question:	Is there agreement about intended effects?
Conclusion:	Social, mental, and physical development were listed as the program objectives in the state program direction document. There was agreement among all groups and in written documents that the effects of 4-H are primarily social in nature, for example, self-confidence/self-esteem, leadership, citizenship. There was less agreement about its effects on mental development and no agreement about its impact on physical development.
Question:	Is it plausible that the program activities would achieve the intended effects?
Conclusion:	Even if all the activities identified in the program model were implemented according to plan, the plausibility of these leading to the intended program effects is questionable. A link appears to be missing from the program logic—something like “Determine the Curriculum.” Lack of such a link prevents plausible activities in the initial program events; that is, without a curriculum plan, how can county faculty know what types of leaders to recruit, what to train volunteers to do, and what they and the volunteers should implement?

SOURCE: Adapted from Midge F. Smith, Evaluability Assessment: A Practical Approach (Norwell, MA: Kluwer, 1989), p. 91.

· Are the constituent components, activities, and functions of the program well defined and sufficient? A program’s structure and process should be specific enough to permit orderly operations, effective management control, and monitoring by means of attainable, meaningful performance measures. Most critical, the program components and activities should be sufficient and appropriate to attain the intended goals and objectives. A function such as “client advocacy” has little practical significance if no personnel are assigned to it or there is no common understanding of what it means operationally.

· Are the resources allocated to the program and its various activities adequate? Program resources include not only funding but also personnel, material, equipment, facilities, relationships, reputation, and other such assets. There should be a reasonable correspondence between the program as described in the program theory and the resources available for operating it. A program theory that calls for activities and outcomes that are unrealistic relative to available resources cannot be said to be a good theory. For example, a management training program too short-staffed to initiate more than a few brief workshops cannot expect to have a significant impact on management skills in the organization.
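The review questions above lend themselves to a simple record-keeping structure. The Python sketch below shows one hypothetical way a review panel's judgments might be tallied. The rating labels, the reviewer names, and the rule for flagging a question are assumptions made for illustration, not a procedure prescribed by the sources cited.

```python
from collections import Counter

# The six review questions, abbreviated from the list above.
QUESTIONS = [
    "goals and objectives well defined",
    "goals and objectives feasible",
    "change process plausible",
    "target identification and service delivery procedures adequate",
    "components, activities, and functions well defined and sufficient",
    "resources adequate",
]

RATINGS = ("yes", "partly", "no")  # assumed rating scale

def tally(panel_ratings):
    """Count ratings per question across reviewers."""
    counts = {q: Counter() for q in QUESTIONS}
    for reviewer, answers in panel_ratings.items():
        for question, rating in answers.items():
            if question not in counts or rating not in RATINGS:
                raise ValueError(
                    f"unexpected entry from {reviewer}: {question!r}, {rating!r}"
                )
            counts[question][rating] += 1
    return counts

def flagged(counts):
    """Questions on which at least one reviewer answered 'no' (assumed rule)."""
    return [q for q in QUESTIONS if counts[q]["no"] > 0]

# Hypothetical panel: one staff member and one outside expert.
panel = {
    "staff member": {QUESTIONS[0]: "yes", QUESTIONS[2]: "partly", QUESTIONS[5]: "no"},
    "outside expert": {QUESTIONS[0]: "partly", QUESTIONS[2]: "no", QUESTIONS[5]: "no"},
}

for question in flagged(tally(panel)):
    print("Needs discussion:", question)
```

A tally like this does not replace the panel's deliberation. It only makes visible where reviewers disagree or share doubts, so that discussion can concentrate on those parts of the theory.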

Assessment Through Comparison With Research and Practice

Although every program is distinctive in some ways, few are based entirely on unique assumptions about how to engender change, deliver service, and perform major program functions. Some information applicable to assessing the various components of program theory is likely to exist in the social science and human services research literature. One useful approach to assessing program theory, therefore, is to find out whether it is congruent with research evidence and practical experience elsewhere (Exhibit 5-K summarizes one example of this approach).

There are several ways in which evaluators might compare a program theory with findings from research and practice. The most straightforward is to examine evaluations of programs based on similar concepts. The results will give some indication of the likelihood that a program will be successful and perhaps identify critical problem areas. Evaluations of very similar programs, of course, will be the most informative in this regard. However, evaluation results for programs that are similar only in terms of general theory, even if different in other regards, might also be instructive.

EXHIBIT 5-K GREAT Program Theory Is Consistent With Criminological Research

In 1991 the Phoenix, Arizona, Police Department initiated a program with local educators to provide youths in the elementary grades with the tools necessary to resist becoming gang members. Known as GREAT (Gang Resistance Education and Training), the program has attracted federal funding and is now distributed nationally. The program is taught to seventh graders in schools over nine consecutive weeks by uniformed police officers. It is structured around detailed lesson plans that emphasize teaching youths how to set goals for themselves, how to resist peer pressure, how to resolve conflicts, and how gangs can affect the quality of their lives.

The program has no officially stated theoretical grounding other than Glasser’s (1975) reality therapy, but GREAT training officers and others associated with the program make reference to sociological and psychological concepts as they train GREAT instructors. As part of an analysis of the program’s impact theory, a team of criminal justice researchers identified two well-researched criminological theories relevant to gang participation: Gottfredson and Hirschi’s self-control theory (SCT) and Akers’s social learning theory (SLT). They then reviewed the GREAT lesson plans to assess their consistency with the most pertinent aspects of these theories. To illustrate their findings, a summary of Lesson 4 is provided below with the researchers’ analysis in the paragraph that follows the lesson description:

Lesson 4. Conflict Resolution: Students learn how to create an atmosphere of understanding that would enable all parties to better address problems and work on solutions together.

This lesson includes concepts related to SCT’s anger and aggressive coping strategies. SLT ideas are also present: Instructors present peaceful, nonconfrontational means of resolving conflicts. Part of this lesson deals with giving the student a means of dealing with peer pressure to join gangs and a means of avoiding negative peers with a focus on the positive results (reinforcements) of resolving disagreements by means other than violence. Many of these ideas directly reflect constructs used in previous research on social learning and gangs.

Similar comparisons showed good consistency between the concepts of the criminological theories and the lesson plans for all but one of the eight lessons. The reviewers concluded that the GREAT curriculum contained implicit and explicit linkages both to self-control theory and social learning theory.

SOURCE: Adapted from L. Thomas Winfree, Jr., Finn-Aage Esbensen, and D. Wayne Osgood, “Evaluating a School-Based Gang-Prevention Program: A Theoretical Perspective,” Evaluation Review, 1996, 20(2):181-203.

Consider a mass media campaign in a metropolitan area to encourage women to have mammogram screening for early detection of breast cancer. The impact theory for this program presumes that exposure to TV, radio, and newspaper messages will stimulate a reaction that will eventuate in increased rates of mammogram screening. The credibility of the impact theory assumed to link exposure and increases in testing is enhanced by evidence that similar media campaigns in other cities have resulted in increased mammogram testing. Moreover, the program’s process theory also gains some support if the evaluations for other campaigns show that the program functions and scheme for delivering messages to the target population were similar to those intended for the program at issue. Suppose, however, that no evaluation results are available about media campaigns promoting mammogram screening in other cities. It might still be informative to examine information about analogous media campaigns. For instance, reports may be available about media campaigns to promote immunizations, dental checkups, or other such actions that are health related and require a visit to a provider. So long as these campaigns involve similar principles, their success might well be relevant to assessing the program theory on which the mammogram campaign is based.

In some instances, basic research on the social and psychological processes central to the program may be available as a framework for assessing the program theory, particularly impact theory. Unfortunately for the evaluation field, relatively little basic research has been done on the social dynamics that are common and important to intervention programs. Where such research exists, however, it can be very useful. For instance, a mass media campaign to encourage mammogram screening involves messages intended to change attitudes and behavior. The large body of basic research in social psychology on attitude change and its relationship to behavior provides some basis for assessing the impact theory for such a media campaign. One established finding is that messages designed to raise fears are generally less effective than those providing positive reasons for a behavior. Thus, an impact theory based on the presumption that increasing awareness of the dangers of breast cancer will prompt increased mammogram screening may not be a good one.

There is also a large applied research literature on media campaigns and related approaches in the field of advertising and marketing. Although this literature largely has to do with selling products and services, it too may provide some basis for assessing the program theory for the breast cancer media campaign. Market segmentation studies, for instance, may show what media and what times of the day are best for reaching women with various demographic profiles. The evaluator can then use this information to examine whether the program’s service utilization plan is optimal for communicating with women whose age and circumstances put them at risk for breast cancer.

Use of the research literature to help with assessment of program theory is not limited to situations of good overall correspondence between the programs or processes the evaluator is investigating and those represented in the research. An alternate approach is to break the theory down into its component parts and linkages and search for research evidence relevant to each component. Much of program theory can be stated as “if-then” propositions: If case managers are assigned, then more services will be provided; if school performance improves, then delinquent behavior will decrease; if teacher-to-student ratios are higher, then students will receive more individual attention. Research may be available that indicates the plausibility of individual propositions of this sort. The results, in turn, can provide a basis for a broader assessment of the theory with the added advantage of identifying any especially weak links. This approach was pioneered by the Program Evaluation and Methodology Division of the U.S. General Accounting Office as a way to provide rapid review of program proposals arising in the Congress (Cordray, 1993; U.S. General Accounting Office, 1990).
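The component-by-component approach can be sketched in a few lines of Python. The fragment below is only an illustration: the `Proposition` class, the three-level support scale, and the ratings attached to each proposition are hypothetical and are not taken from the GAO procedure or from any actual literature review. It records each if-then proposition with a reviewer's rating of the research support and then picks out the weakest links.

```python
from dataclasses import dataclass

# Hypothetical ordinal scale for how well research supports a proposition.
SUPPORT_LEVELS = {"weak": 0, "mixed": 1, "strong": 2}


@dataclass
class Proposition:
    condition: str  # the "if" part
    result: str     # the "then" part
    support: str    # reviewer's rating: "weak", "mixed", or "strong"


def weakest_links(theory):
    """Return the propositions that share the lowest support rating."""
    lowest = min(SUPPORT_LEVELS[p.support] for p in theory)
    return [p for p in theory if SUPPORT_LEVELS[p.support] == lowest]


# Illustrative ratings only; a real review would cite the evidence for each.
theory = [
    Proposition("case managers are assigned",
                "more services will be provided", "strong"),
    Proposition("school performance improves",
                "delinquent behavior will decrease", "mixed"),
    Proposition("teacher-to-student ratios are higher",
                "students will receive more individual attention", "weak"),
]

for p in weakest_links(theory):
    print(f"Weak link: if {p.condition}, then {p.result} ({p.support} support)")
```

A coarse ordinal rating is used here on the assumption that reviewers can usually agree on "weak," "mixed," or "strong" more readily than on a numerical score; any other defensible scale could be substituted.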

Assessment Via Preliminary Observation

Program theory, of course, is inherently conceptual and cannot be observed directly. Nonetheless, it involves many assumptions about how things are supposed to work that an evaluator can assess by observing the program in operation, talking to staff and service recipients, and making other such inquiries focused specifically on the program theory. Indeed, a thorough assessment of program theory should incorporate some firsthand observation and not rely entirely on logical analysis and armchair reviews. Direct observation provides a reality check on the concordance between program theory and the program it is supposed to describe.

Consider a program for which it is assumed that distributing brochures about good nutrition to senior citizens centers will influence the eating behavior of persons over age 65. Observations revealing that the brochures are rarely read by anyone attending the centers would certainly raise a question about the assumption that the target population will be exposed to the information in the brochures, a precondition for any attitude or behavior change.

To assess a program’s impact theory, the evaluator might conduct observations and interviews focusing on the target-program interactions that are expected to produce the intended outcomes. This inquiry would look into whether those outcomes are appropriate for the program circumstances and whether they are realistically attainable. For example, consider the presumption that a welfare-to-work program can enable a large proportion of welfare clients to find and maintain employment. To gauge how realistic the intended program outcomes are, the evaluator might examine the local job market, the work readiness of the welfare population (number physically and mentally fit, skill levels, work histories, motivation), and the economic benefits of working relative to staying on welfare. At the service end of the change process, the evaluator might observe job training activities and conduct interviews with participants to assess the likelihood that the intended changes would occur.

To test the service utilization component of a program’s process theory, the evaluator could examine the circumstances of the target population to better understand how and why they might become engaged with the program. This information would permit an assessment of the quality of the program’s service delivery plan for locating, recruiting, and serving the intended clientele. To assess the service utilization plan of a midnight basketball program to reduce delinquency among high-risk youths, for instance, the evaluator might observe the program activities and interview participants, program staff, and neighborhood youths about who participates and how regularly. The program’s service utilization assumptions would be supported by indications that the most delinquent-prone youths participate regularly in the program.

Finally, the evaluator might assess the plausibility of the organizational component of the program’s process theory through observations and interviews relating to program activities and the supporting resources. Critical here is evidence that the program can actually perform the intended functions. Consider, for instance, a program plan that calls for the sixth-grade science teachers throughout a school district to take their classes on two science-related field trips per year. The evaluator could probe the presumption that this would actually be done by interviewing a number of teachers and principals to find out the feasibility of scheduling, the availability of buses and funding, and the like.

Note that any assessment of program theory that involves collection of new data could easily turn into a full-scale investigation of whether what was presumed in the theory actually happened. Indeed, an empirical “theory testing” study is one obvious approach to assessing program theory (see, e.g., Bickman, 1990; Exhibit 5-L gives an example). Here, however, our focus is on the task of assessing the soundness of the program theory description as a plan, that is, as a statement of the program as intended rather than as a statement of what is actually happening (that assessment comes later). In recognizing the role of observation and interview in the process, we are not suggesting that theory assessment necessarily requires a full evaluation of the program. Instead, we are suggesting that some appropriately configured contact with the program activities, target population, and related situations and informants can provide the evaluator with valuable information about how plausible and realistic the program theory is.

EXHIBIT 5-L Testing a Model of Patient Education for Diabetes

The daily management of diabetes involves a complex interaction of metabolic variables, self-care behaviors, and psychological and social adjustments to having the disease. An important component of treatment for diabetes, therefore, is the instruction of patients so that they have the skills and knowledge required to do their part. A team of university medical researchers with a particular interest in the personal meaning to patients of having diabetes formulated an impact component theory for the effects of patient education, which they diagrammed as follows:

The researchers investigated this model by examining the correlations representing some of the key hypothesized relationships on survey data collected from a sample of 220 people with diabetes recruited from clinics in several states. The data were analyzed using a structural equation analysis which showed only an approximate fit to the model. The relationships between the “personal meaning of diabetes” variables and “psychosocial adaptation” were strong, as were those between knowledge and self-care behavior. However, other relationships in the model were equivocal. The researchers’ conclusion was, “While the results showed that the data did not fit the proposed model well enough to allow for definitive conclusions, the results are generally supportive of the original hypothesis that the personal meaning of diabetes is an important element in the daily management of diabetes and the psychosocial adjustment to the disease.”

SOURCE: Adapted from George A. Nowacek, Patrick M. O’Malley, Robert A. Anderson, and Fredrick E. Richards, “Testing a Model of Diabetes Self-Care Management: A Causal Model Analysis With LISREL,” Evaluation & the Health Professions, 1990, 13(3):298-314.

5.5 Possible Outcomes of Program Theory Assessment

A program whose conceptualization is weak or faulty has little prospect for success even if it adequately operationalizes that conceptualization. Thus, if the program theory is not sound, there is little reason to assess other evaluation issues, such as the program’s implementation, impact, or efficiency. Within the framework of evaluability assessment, finding that the program theory is poorly defined or seriously flawed indicates that the program simply is not evaluable.

When assessment of program theory reveals deficiencies in the program theory, one appropriate response is for the responsible parties to redesign the program. Such program reconceptualization may include (1) clarifying goals and objectives; (2) restructuring program components for which the intended activities are not happening, needed, or reasonable; and (3) working with stakeholders to obtain consensus about the logic that connects program activities with the desired outcomes. The evaluator may help in this process as a consultant.

If an evaluation of program process or impact goes forward without articulation of a credible program theory, then a certain amount of ambiguity will be inherent in the results. This ambiguity is potentially twofold. First, if program process theory is not well defined, there is ambiguity about what the program is expected to be doing operationally. This complicates the identification of criteria for judging how well the program is implemented. Such criteria must then be established individually for the various key program functions through some piecemeal process. For instance, administrative criteria may be stipulated regarding the number of clients to serve, the amount of service to provide, and the like, but they will not be integrated into an overall plan for the program.

Second, if there is no adequate specification of the program impact theory, an impact evaluation may be able to determine whether certain outcomes were produced (see Chapters 7-10), but it will be difficult to explain why or, often more important, why not. Poorly specified impact theory limits the ability to identify or measure the intervening variables on which the outcomes may depend and, correspondingly, the ability to explain what went right or wrong in producing the expected outcomes. If program process theory is also poorly specified, it will not even be possible to adequately describe the nature of the program that produced, or failed to produce, the outcomes of interest. Evaluation under these circumstances is often referred to as black box evaluation to indicate that assessment of outcomes is made without much insight into what is causing those outcomes.

Only a well-defined and well-justified program theory permits ready identification of critical program functions and what is supposed to happen as a result. This structure provides meaningful benchmarks against which both managers and evaluators can compare actual program performance. The framework of program theory, therefore, gives the program a blueprint for effective management and gives the evaluator guidance for designing the process, impact, and efficiency evaluations described in subsequent chapters.

6.1 What Is Program Process Evaluation and Monitoring?

As was suggested in Chapter 2, evaluators often distinguish between process (or implementation) evaluation and outcome (or impact) evaluation. Process evaluation, in Scheirer’s (1994) words, “verifies what the program is and whether or not it is delivered as intended to the targeted recipients.” It does not, however, attempt to assess the effects of the program on those recipients. Such assessment is the province of impact evaluation, which we consider in later chapters.

Where process evaluation is an ongoing function involving repeated measurements over time, it is referred to as program monitoring. Corresponding to the distinction between process and outcome evaluation, program process monitoring is the systematic and continual documentation of key aspects of program performance that assesses whether the program is operating as intended or according to some appropriate standard, whereas outcome monitoring is the continual measurement of intended outcomes of the program, usually of the social conditions it is intended to improve. We discuss outcome monitoring in conjunction with impact evaluations later in this book.

Program process evaluation generally involves assessments of program performance in the domains of service utilization and program organization. Assessing service utilization consists of examining the extent to which the intended target population receives the intended services. Assessing program organization requires comparing the plan for what the program should be doing with what is actually done, especially with regard to providing services. Usually, program process evaluation is directed at one or both of two key questions: (1) whether a program is reaching the appropriate target population and (2) whether its service delivery and support functions are consistent with program design specifications or other appropriate standards. Process evaluation may also examine what resources are being or have been expended in the conduct of the program.

More specifically, program process evaluation schemes are designed to answer such evaluation questions as these:

· How many persons are receiving services?

· Are those receiving services the intended targets?

· Are they receiving the proper amount, type, and quality of services?

· Are there targets who are not receiving services or subgroups within the target population who are underrepresented among those receiving services?

· Are members of the target population aware of the program?

· Are necessary program functions being performed adequately?

· Is staffing sufficient in numbers and competencies for the functions that must be performed?

· Is the program well organized? Do staff work well with each other?

· Does the program coordinate effectively with the other programs and agencies with which it must interact?

· Are resources, facilities, and funding adequate to support important program functions?

· Are resources used effectively and efficiently?

· Is the program in compliance with requirements imposed by its governing board, funding agencies, and higher-level administration?

· Is the program in compliance with applicable professional and legal standards?

· Is performance at some program sites or locales significantly better or poorer than at others?

· Are participants satisfied with their interactions with program personnel and procedures?

· Are participants satisfied with the services they receive?

· Do participants engage in appropriate follow-up behavior after service?
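Two of the questions above, how many persons receive services and whether any subgroups are underrepresented, reduce to simple coverage calculations once counts are available. The Python sketch below is one way to set them up. The subgroup labels, the counts, and the 10-percentage-point tolerance are all hypothetical; an actual evaluation would take its subgroups from the program's target definition and its threshold from agreed criteria.

```python
def coverage_by_subgroup(eligible, served):
    """Proportion of each eligible subgroup that received services."""
    return {g: served.get(g, 0) / n for g, n in eligible.items() if n > 0}


def underrepresented(eligible, served, tolerance=0.10):
    """Subgroups whose coverage falls more than `tolerance` below overall coverage."""
    overall = sum(served.get(g, 0) for g in eligible) / sum(eligible.values())
    coverage = coverage_by_subgroup(eligible, served)
    return [g for g, c in coverage.items() if c < overall - tolerance]


# Hypothetical counts of target-population members and of persons served.
eligible = {"under 30": 200, "30 to 59": 500, "60 and over": 300}
served = {"under 30": 90, "30 to 59": 260, "60 and over": 50}

for group, rate in coverage_by_subgroup(eligible, served).items():
    print(f"{group}: {rate:.0%} of eligible persons served")
print("Underrepresented:", underrepresented(eligible, served))
```

Flagging a subgroup this way describes performance only; whether the shortfall is acceptable remains an evaluative judgment.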

Setting Criteria for Judging Program Process

It is important to recognize the evaluative themes in process evaluation questions such as those listed above. Virtually all involve words such as appropriate, adequate, sufficient, satisfactory, reasonable, intended, and other phrasing indicating that an evaluative judgment is required. To answer these questions, therefore, the evaluator or other responsible parties must not only describe the program’s performance but also assess whether it is satisfactory. This, in turn, requires that there be some bases for making judgments, that is, some defensible criteria or standards to apply. Where such criteria are not already articulated and endorsed, the evaluator may find that establishing workable criteria is as difficult as determining program performance on the pertinent dimensions.

There are several approaches to the matter of setting criteria for program performance. Moreover, different approaches will likely apply to different dimensions of program performance because the considerations that go into defining, say, what constitutes an appropriate number of clients served are quite different from those pertinent to deciding what constitutes an adequate level of resources. This said, the approach to the criterion issue that has the broadest scope and most general utility in program process evaluation is the application of program theory as described in Chapter 5.

Recall that program theory, as we presented it, is divided into program process theory and program impact theory. Program process theory is formulated to describe the program as intended in a form that virtually constitutes a plan or blueprint for what the program is expected to do and how. As such, it is particularly relevant to program process evaluation. Recall also that program theory builds on needs assessment (whether systematic or informal) and thus connects the program design with the social conditions the program is intended to ameliorate. And, of course, the process through which theory is derived and adopted usually involves input from the major stakeholders and, ultimately, their endorsement. Program theory thus has a certain authority in delineating what a program “should” be doing and, correspondingly, what constitutes adequate performance.

Program process evaluation, therefore, can be built on the scaffolding of program process theory. Process theory identifies the aspects of program performance that are most important to describe and also provides some indication of what level of performance is intended, thereby providing the basis for assessing whether actual performance measures up. Exhibit 5-E in the previous chapter, for instance, illustrates the service utilization component of the program process theory for an aftercare program for released psychiatric patients. This flowchart depicts, step by step, the interactions and experiences patients released from the hospital are supposed to have as a result of program service. A thorough monitoring procedure would systematically document what actually happened at each step. In particular, it would report how many patients were released from the hospital each month, what proportion were visited by a social worker, how many were referred to services and which services, how many actually received those services, and so forth.
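The step-by-step documentation described for the aftercare program can be sketched as a simple tally. In the Python fragment below, the monthly counts are hypothetical and the step names merely paraphrase the flowchart steps mentioned above. It reports each step's count as a proportion of the step before it.

```python
def step_rates(counts):
    """For each step after the first, return its count as a proportion of the prior step.

    `counts` maps step names to counts, in the order the steps are supposed to occur.
    A step whose predecessor has a zero count gets None.
    """
    steps = list(counts.items())
    rates = {}
    for (_, prev_n), (name, n) in zip(steps, steps[1:]):
        rates[name] = n / prev_n if prev_n else None
    return rates


# Hypothetical counts for one month of the aftercare program.
monthly_counts = {
    "released from hospital": 40,
    "visited by social worker": 30,
    "referred to services": 24,
    "received services": 18,
}

for step, rate in step_rates(monthly_counts).items():
    if rate is not None:
        print(f"{step}: {rate:.0%} of those reaching the previous step")
```

Expressing each step relative to the previous one shows where patients drop out of the intended sequence; dividing every count by the number released would instead show cumulative attrition.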

If the program processes that are supposed to happen do not happen, then we would judge the program’s performance to be poor. In actuality, of course, the situation is rarely so simple. Most often, critical events will not occur in an all-or-none fashion but will be attained to some higher or lower degree. Thus, some, but not all, of the released patients will receive visits from social workers, some will be referred to services, and so forth. Moreover, there may be important quality dimensions. For instance, it would not represent good program performance if a released patient was referred to several community services, but these services were inappropriate to the patient’s needs. To determine how much must be done, or how well, we need additional criteria that parallel the information the monitoring procedure provides. If the monitoring procedure reports that 63% of the released patients are visited by a social worker within two weeks of release, we cannot evaluate that performance without some standard that tells us what percentage is “good.” Is 63% a poor performance, given that we might expect 100% to be desirable, or is it a very impressive performance with a clientele that is difficult to locate and serve?

The most common and widely applicable criteria for such situations are simply administrative standards or objectives, that is, stipulated achievement levels set by program administrators or other responsible parties. For example, the director and staff of a job training program may commit to attaining 80% completion rates for the training or to having 60% of the participants permanently employed six months after receiving training. For the aftercare program above, the administrative target might be to have 75% of the patients visited within two weeks of release from the hospital. By this standard, the 63% found with program monitoring is a subpar performance that, nonetheless, is not too far below the mark.
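The comparison in the aftercare example amounts to a small piece of arithmetic, sketched below in Python. The function name and its return format are assumptions made for illustration; the 63% and 75% figures come from the example above.

```python
def compare_to_standard(observed_pct, target_pct):
    """Report whether an administrative standard is met and, if not, by how much it is missed."""
    shortfall = max(target_pct - observed_pct, 0)
    return {"met": observed_pct >= target_pct, "shortfall": shortfall}


# Figures from the aftercare example: 63% visited within two weeks, target of 75%.
result = compare_to_standard(observed_pct=63, target_pct=75)
print(f"Standard met: {result['met']}; shortfall: {result['shortfall']} percentage points")
```

The calculation says only that performance falls 12 percentage points short of the target. Whether that is "not too far below the mark" still depends on how well justified the target itself is.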

Administrative standards and objectives for program process performance may be set on the basis of past experience, the performance of comparable programs, or simply the professional judgment of program managers or advisers. If they are reasonably justified, they can provide meaningful standards against which to assess observed program performance. In a related vein, some aspects of program performance may fall under applicable legal, ethical, or professional standards. The “standards of care” adopted in medical practice for treating common ailments, for instance, provide a set of criteria against which to assess program performance in health care settings. Similarly, state children’s protective services almost always have legal requirements to meet concerning handling cases of possible child abuse or neglect.

In practice, the assessment of particular dimensions of program process performance is often not based on specific, predetermined criteria but represents an after-the-fact judgment call. This is the “I’ll know it when I see it” school of thought on what constitutes good program performance. An evaluator who collects process data on, say, the proportion of high-risk adolescents who recall seeing program-sponsored antidrug media messages may find program staff and other key stakeholders resistant to stating what an acceptable proportion would be. If the results come in at 50%, however, a consensus may arise that this is rather good considering the nature of the population, even though some stakeholders might have reported much higher expectations prior to seeing the data. Other findings, such as 40% or 60%, might also be considered rather good. Only extreme findings, say 10%, might strike all stakeholders as distressingly low. In short, without specific before-measurement criteria a wide range of performance might be regarded as acceptable. Of course, assessment procedures that are too flexible and that lead to a “pass” for all tend to be useless.

Very similar considerations apply to the organizational component of the process theory. A depiction of the organizational plan for the aftercare program was presented in Exhibit 5-F in Chapter 5. Looking back at it will reveal that it, too, identifies dimensions of program performance that can be described and assessed against appropriate standards. Under that plan, for instance, case managers are expected to interview clients and families, assess service needs, make referrals to services, and so forth. A program process evaluation would document and assess what was done under each of those categories.

Common Forms of Program Process Evaluations

Description and assessment of program process are quite common in program evaluation, but the approaches used are varied, as is the terminology they employ. Such assessments may be conducted as a one-shot endeavor or may be continuous so that information is produced regularly over an extended period of time, as in program process monitoring. They may be conducted by “outside” or “inside” evaluators or be set up as management tools with little involvement by professional evaluators. Moreover, their purpose may be to provide feedback for managerial purposes, to demonstrate accountability to sponsors and decisionmakers, to provide a freestanding process evaluation, or to augment an impact evaluation. Amid this variety, we distinguish two principal forms of program process studies: process or implementation evaluation, and continuous program monitoring.

Process or Implementation Evaluation

Process or implementation evaluation is typically conducted by evaluation specialists as a separate project that may involve program personnel but is not integrated into their daily routine. When completed and, often, while under way, process evaluation generally provides information about program performance to program managers and other stakeholders, but is not a regular and continuing part of a program’s operation. Exhibit 6-A describes a process evaluation of an integrated services program for children.

As an evaluation approach, process evaluation plays two major roles. First, it can stand alone as an evaluation of a program in circumstances where the only questions at issue are about the integrity of program operations, service delivery, and other such matters. There are several kinds of situations that fit this description. A stand-alone process evaluation might be appropriate for a relatively new program, for instance, to answer questions about how well it has established its intended operations and services. Program process is often the focus of formative evaluation designed to provide useful feedback to managers and sponsors of new programs. In the case of a more established program, a process evaluation might be called for when questions arise about how well the program is organized, the quality of its services, or the success with which it is reaching the target population. A process evaluation may also constitute the major evaluation approach to a program charged with delivering a service known or presumed effective, so that the most significant performance issue is whether that service is being delivered properly. In a managed care environment, for instance, process evaluation may be employed to assess whether the prescribed medical treatment protocols are being followed for patients in different diagnostic categories.

EXHIBIT 6-A Process Evaluation to Assess Integrated Services for Children

Many analysts have observed that the traditional system of categorical funding for children’s services, with funds allocated to respond to specific problems under strict rules regarding eligibility and expenditures, has not served children’s needs well. The critics argue that this system fragments services and inhibits collaboration between programs that might otherwise lead to more effective services. In 1991, the Robert Wood Johnson Foundation launched the Child Health Initiative to test the feasibility of achieving systemic changes through the integration of children’s services and finances. Specifically, the initiative called for the development of the following components:

· A decategorization mechanism that would pool existing categorical program funds and create a single children’s health fund

· A care coordination procedure using case management that would use the pooled funds to provide comprehensive and continuous care for needy children

· A monitoring system that would identify the health and related needs of children in the community and the gaps in existing services

Nine sites across the country were selected to launch demonstration programs. The Institute for Health Policy Studies, University of California, San Francisco, conducted an evaluation of these programs with two major goals: (1) to gauge the degree to which the implementation of the projects was consistent with the original planning objectives (fidelity to the model) and (2) to assess the extent to which each of the major program components was implemented. In the first year, the evaluation focused on the political, organizational, and design phase of program development. During subsequent years, the focus turned to implementation and preliminary outcomes. A combination of methods was used, including site visits, written surveys completed by the program managers, in-depth interviews of key participants, focus groups of service providers and clients, and reviews of project-related documents.

The evaluation found that most of the nine sites experienced some degree of success in implementing the monitoring and care coordination components, but none was able to implement decategorization. The general findings for each component were as follows:

· Decategorization: Several sites successfully created small pools of flexible funds, but these were from sources other than categorical program funds. No site was able to fully implement decategorization under the definitions originally adopted.

· Care coordination: This was implemented successfully by most of the sites at the client level through case management, but there was generally less coordination at the system level.

· Monitoring: The sites encountered a number of barriers in successfully completing this task, but most instituted some appropriate process.

SOURCE: Adapted from Claire Brindis, Dana C. Hughes, Neal Halfon, and Paul W. Newacheck, “The Use of Formative Evaluation to Assess Integrated Services for Children,” Evaluation & the Health Professions, 1998, 21(1):66-90.

The second major role of process or implementation evaluation is as a complement to an impact evaluation. Indeed, it is generally not advisable to conduct an impact evaluation without including at least a minimal process evaluation. Because maintaining an operational program and delivering appropriate services on an ongoing basis are formidable challenges, it is not generally wise to take program implementation for granted. A full impact evaluation, therefore, includes a process component to determine what quality and quantity of services the program provides so that this information can be integrated with findings on what impact those services have.

Continuous Program Process Evaluation (Monitoring) and Management Information Systems

The second broad form of program process evaluation consists of continuous monitoring of indicators of selected aspects of program process. Such process monitoring can be a useful tool for facilitating effective management of social programs by providing regular feedback about how well the program is performing its critical functions. This type of feedback allows managers to take corrective action when problems arise and can also provide stakeholders with regular assessments of program performance. For these reasons, a form of process assessment is often integrated into the routine information systems of social programs so that appropriate data are obtained, compiled, and periodically summarized. In such cases, process evaluation becomes coextensive with the management information system (MIS) in a human service program. Exhibit 6-B describes an MIS that was developed for a marital and family counseling program.

MISs routinely provide information on a client-by-client basis about services provided, staff providing the services, diagnosis or reasons for program participation, sociodemographic data, treatments and their costs, outcome status, and so on. Some of the systems bill clients (or funders), issue payments for services, and store other information, such as a client’s treatment history and current participation in other programs. MISs have become the major data source in many instances because much of the information that otherwise would have to be gathered in data collection for process monitoring is available in the program’s MIS. Even when a program’s MIS is not configured to completely fulfill the requirements of a thoroughgoing process evaluation, it may nonetheless provide a large portion of the information an evaluator needs for such purposes. MISs can thus supply data that can be used by both managers and evaluators.

EXHIBIT 6-B An Integrated Information System for a Family and Marriage Counseling Agency in Israel

The Marital and Family Counselling Agency is run under the joint auspices of the Tel Aviv Welfare Department and the School of Social Work at Tel Aviv University. The agency provides marital and family counseling and community services for the Jewish, Muslim, and Christian residents of one of the poorest sections of Tel Aviv.

The integrated information system developed for the agency is designed to follow up clients from the moment they request help to the end of treatment. It is intended to serve the agency and the individual counselors by monitoring the process and outcomes of treatment and providing the data needed to make organizational and clinical decisions. To accomplish this, data are collected on three forms and then programmed into the computerized information system. The data elements include the following:

· Background data provided by the client, for example, sociodemographic characteristics, medical and psychological treatment history, the problems for which they are seeking help, the urgency of those problems, their expectations from treatment, and how they found out about the clinic.

· The McMaster Clinical Rating Scale, a standardized scale that monitors families on the basis of six dimensions of family functioning and overall family health; the counselors fill out this form once a month for each client.

· Retrospective evaluation forms filled out after treatment is completed, one by the counselors and another by the clients. This includes, for example, factual questions about the treatment such as its duration, the problems dealt with, the degree to which the client and counselor agreed on the problems, whether there were issues not addressed and why. It also includes retrospective assessments of the process, evaluations of improvement in the presented problems and the McMaster areas of functioning, and client and counselor satisfaction with the process and outcomes.

The counselors can enter and retrieve data from this system whenever they wish and are given a graph of each client’s status every three months to support clinical decisions. Also, reports are generated for the clinic’s management. For example, a report of the distribution of clients by ethnic group led to the development of a program located within Arab community centers to better reach that population. Other management reports describe the ways and times at which treatment is terminated, the problems that brought clients to the agency, and the percentage of people who applied for treatment but did not show up for the first session. The information system has also been used for research purposes. For example, studies were conducted on the predictors of treatment success, the comparative perceptions by clients and counselors of the treatment process and outcomes, and gender differences in presenting problems.

SOURCE: Adapted from Rivka Savaya, “The Potential and Utilization of an Integrated Information System at a Family and Marriage Counselling Agency in Israel,” Evaluation and Program Planning, 1998, 21(1):11-20.

6.2 Perspectives on Program Process Monitoring

There is and should be considerable overlap in the purposes of process monitoring whether they are driven by the information needs of evaluators, program managers and staff, or policymakers, sponsors, and stakeholders. Ideally, the monitoring activities undertaken as part of evaluation should meet the information needs of all these groups. In practice, however, limitations on time and resources may require giving priority to one set of information needs over another. Although there are many exceptions, the perspectives of the three key “consumer groups” on the purposes of program monitoring typically vary. These differences in perspective apply generally to outcome monitoring as well.

Process Monitoring From the Evaluator’s Perspective

A number of practical considerations underlie the need for evaluation researchers to monitor program process. All too often a program’s impact is sharply diminished and, indeed, sometimes reduced to zero because the appropriate intervention was not delivered, was not delivered to the right targets, or both. We believe that more program failures are due to such implementation problems than to lack of potentially effective services. Process monitoring studies, therefore, are essential to understanding and interpreting impact findings. Knowing what took place is a prerequisite for explaining or hypothesizing why a program did or did not work. Without process monitoring, the evaluator is engaged in “black box” research with no basis for deciding whether a larger dose of the program or a different means of delivering the intervention would have changed the impact results.

Process Monitoring From an Accountability Perspective

Process monitoring information is also critical for those who sponsor and fund programs. Program managers have a responsibility to inform their sponsors and funders of the activities undertaken, the degree of implementation of programs, the problems encountered, and what the future holds (see Exhibit 6-C for one perspective on this matter). However, evaluators frequently are mandated to provide the same or similar information. Indeed, in some cases the sponsors and funders of programs perceive program evaluators as “their eyes and ears,” as a second line of information on what is going on in a particular program.

EXHIBIT 6-C Program and Service Utilization Studies

Any service organization, especially in an era of shrinking resources, needs to evaluate its services and activities. Through these evaluative activities, an organization can develop and maintain the flexibility needed to respond to an ever-changing environment. It has been suggested that, even in an ideal world, an organization needs to be self-evaluating. Self-evaluation requires an organization to continually review its own activities and goals and to use the results to modify, if necessary, its programs, goals, and directions.

Within an agency, the essential function of evaluation is to provide data on goal achievement and program effectiveness to a primary audience consisting of administration, middle management, and the governing board. This primary audience, especially the administration and board, is frequently confronted with inquiries from important sources in the external environment, such as legislators and funding agencies. These inquiries often focus on issues of client utilization, accessibility, continuity, comprehension, outcome or effectiveness, and cost. The building block of this information is the patterns of use or client utilization study. The patterns of use study, whether it consists of simple inquiries or highly detailed, sophisticated investigations, is basically a description. It describes who uses services and how. It becomes evaluative when it is related to the requirements or purposes of the organization.

SOURCE: Adapted from G. Landsberg, “Program Utilization and Service Utilization Studies: A Key Tool for Evaluation,” New Directions for Program Evaluation, no. 20 (San Francisco: Jossey-Bass, December 1983), pp. 93-103.

Government sponsors and funding groups, including Congress, operate in the glare of the mass media. Their actions are also visible to the legislative groups who authorize programs and to government “watchdog” organizations. For example, at the federal level, the Office of Management and Budget, part of the executive branch, wields considerable authority over program development, funding, and expenditures. The U.S. General Accounting Office, an arm of Congress, advises members of the House and Senate on the utility of programs and in some cases conducts evaluations. Both state governments and those of large cities have analogous oversight groups. No social program that receives outside funding, whether public or private, can expect to avoid scrutiny and escape demand for accountability.

In addition to funders and sponsors, other stakeholders may press for program accountability. In the face of taxpayers’ reservations about spending for social programs, together with the increased competition for resources often resulting from cuts in available funding, all stakeholders are scrutinizing both the programs they support and those they do not. Concerned parties use process monitoring information to lobby for the expansion of programs they advocate or find congenial with their self-interests and the curtailment or abandonment of those programs they disdain. Stakeholders, it should be noted, include the targets themselves. A dramatic illustration of their perspective occurred when President Ronald Reagan telephoned an artificial heart recipient to wish him well and, with all of the country listening, the patient complained about not receiving his Social Security check.

Clearly, social programs operate in a political world. It could hardly be otherwise, given the stakes involved. The human and social service industry is not only huge in dollar volume and number of persons employed but is also laden with ideological and emotional baggage. Programs are often supported or opposed by armies of vocal community members; indeed, the social program sector is comparable only to the defense industry in its lobbying efforts, and the stands that politicians take with respect to particular programs often determine their fates in elections. Accountability information is a major weapon that stakeholders use in their battles as advocates and antagonists.

Process Monitoring From a Management Perspective

Management-oriented process monitoring (including use of MISs) often is concerned with the same questions as program process or accountability studies; the differences lie in the purposes to which the findings are to be put. Evaluators’ interest in monitoring data generally centers on determining how a program’s potential impact is related to its implementation. Accountability studies primarily provide information that decisionmakers, sponsors, and other stakeholders need to judge the appropriateness of program activities and to decide whether a program should be continued, expanded, or contracted. Such studies may use the same information base employed by program management staff, but they are usually conducted in a critical spirit. In contrast, management-oriented monitoring activities are concerned less with making decisive judgments and more with incorporating corrective measures as a regular part of program operations.

Process monitoring from a management perspective is particularly vital during the implementation and pilot testing of new programs, especially innovative ones. No matter how well planned such programs may be, unexpected results and unwanted side effects often surface early in the course of implementation. Program designers and managers need to know rapidly and fully about these problems so that changes can be made as soon as possible in the program design. Suppose, for example, that a medical clinic intended to help working mothers is open only during daylight hours. Monitoring may disclose that, however great the demand is for clinic services, the clinic’s hours of operation effectively screen out most of the target population. Or suppose that a program is predicated on the assumption that severe psychological problems are prevalent among children who act out in school. If it is found early on that most such children do not in fact have serious disorders, the program can be modified accordingly.

For programs that have moved beyond the development stage to actual operation, program process monitoring serves management needs by providing information on process and coverage (the extent to which a program reaches its intended targets), and hence feedback on whether the program is meeting specifications. Fine-tuning of the program may be necessary when monitoring information indicates that targets are not being reached, that the implementation of the program costs more than initially projected, or that staff workloads are either too heavy or too light. Managers who neglect to monitor a program fully and systematically risk the danger of administering a program that is markedly different from its mandate.

Where monitoring information is to be used for both managerial and evaluation purposes, some problems must be anticipated. How much information is sensible to collect and report, in what forms, at what frequency, with what reliability, and with what degree of confidentiality are among the major issues on which evaluators and managers may disagree. For example, the experienced manager of a nonprofit children’s recreational program may feel that the highest priority is weekly information on attendance. The evaluator, however, may be comfortable with aggregating the data monthly or even quarterly, but may believe that before being reported they should be adjusted to take into account variations in the weather, occurrence of holidays, and so on—even though the necessary adjustments require the use of sophisticated statistical procedures.

A second concern is the matter of proprietary claims on the data. For the manager, monitoring data on, say, the results of a program innovation should be kept confidential until discussed with the research committee of the board of directors and presented at the board meeting. The evaluator may wish immediately to write a paper for publication in the American Journal of Evaluation. Or a serious drop in clients from a particular ethnic group may result in the administrator of a program immediately replacing the director of professional services, whereas the evaluator’s reaction may be to do a study to determine why the drop occurred. As with all relations between program staff and evaluators in general, negotiation of these matters is essential.

A warning: There are many aspects of program management and administration (such as complying with tax regulations and employment laws or negotiating union contracts) that few evaluators have any special competence to assess. In fact, evaluators trained in social science disciplines and (especially) those primarily involved in academic careers may be unqualified to manage anything. It is wise to keep in mind that the evaluator’s role, even when sharing information from an MIS, is not to join the administrators in the running of the organization.

In the remainder of this chapter, we concentrate on the concepts and methods pertinent to monitoring program process in the domains of service utilization and program organization. It is in this area that the competencies of persons trained in social research are most relevant.

6.3 Monitoring Service Utilization

A critical issue in program process monitoring is ascertaining the extent to which the intended targets actually receive program services. Target participation concerns both program managers and sponsors. Managing a project effectively requires that target participation be kept at an acceptable level and corrective action be taken if it falls below that level.

Monitoring of service utilization is particularly critical for interventions in which program participation is voluntary or in which participants must learn new procedures, change their habits, or take instruction. For example, community mental health centers designed to provide a broad range of services often fail to attract a significant proportion of those who could benefit from their services. Even homeless patients who had been recently discharged from psychiatric hospitals and encouraged to make use of the services of community mental health centers often failed to contact the centers (Rossi, Fisher, and Willis, 1986). Similarly, a program designed to provide information to prospective home buyers might find that few persons seek the services offered. Hence, program developers need to be concerned with how best to motivate potential targets to seek out the program and participate in it. Depending on the particular case, they might, for example, need to build outreach efforts into the program or pay special attention to the geographic placement of program sites (Boruch, Dennis, and Carter-Greer, 1988).

Coverage and Bias

Service utilization issues typically break down into questions about coverage and bias. Whereas coverage refers to the extent to which participation by the target population achieves the levels specified in the program design, bias is the degree to which some subgroups participate in greater proportions than others. Clearly, coverage and bias are related. A program that reaches all projected participants and no others is obviously not biased in its coverage. But because few social programs ever achieve total coverage, bias is typically an issue.

Bias can arise out of self-selection; that is, some subgroups may voluntarily participate more frequently than others. It can also derive from program actions. For instance, a program’s personnel may react favorably to some clients while rejecting or discouraging others. One temptation commonly faced by programs is to select the most “success prone” targets. Such “creaming” frequently occurs because of the self-interests of one or more stakeholders (a dramatic example is described in Exhibit 6-D). Finally, bias may result from such unforeseen influences as the location of a program office, which may encourage greater participation by a subgroup that enjoys more convenient access to program activities.
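As a rough illustration of how bias in participation might be examined, the following sketch compares participation rates across subgroups of the target population. The function name, the subgroup labels, and all of the counts are hypothetical; they are assumptions made for the example and are not drawn from any program discussed in this chapter.

```python
def participation_rates(eligible, participants):
    """Return, for each subgroup, the share of its eligible targets who participate.

    Both arguments map a subgroup label to a head count (hypothetical data).
    """
    rates = {}
    for group, n_eligible in eligible.items():
        if n_eligible <= 0:
            raise ValueError(f"no eligible targets recorded for {group!r}")
        rates[group] = participants.get(group, 0) / n_eligible
    return rates


# Hypothetical counts for two subgroups that differ in access to the program office.
eligible = {"near_office": 400, "far_from_office": 600}
participants = {"near_office": 200, "far_from_office": 120}

rates = participation_rates(eligible, participants)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} of eligible targets participate")

# A large gap between subgroup rates is one possible signal of bias.
gap = max(rates.values()) - min(rates.values())
print(f"largest gap between subgroups: {gap:.0%}")
```

A gap of this kind does not by itself show whether the bias stems from self-selection, program actions, or location; it only indicates where further inquiry may be warranted.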

Although there are many social programs, such as food stamps, that aspire to serve all or a very large proportion of a defined target population, typically programs do not have the resources to provide services to more than a portion of potential targets. In the latter case, the target definition established during the planning and development of the program frequently is not specific enough. Program staff and sponsors can correct this problem by defining the characteristics of the target population more sharply and by using resources more effectively. For example, establishing a health center to provide medical services to persons in a defined community who do not have regular sources of care may result in such an overwhelming demand that many of those who want services cannot be accommodated. The solution might be to add eligibility criteria that weight such factors as severity of the health problem, family size, age, and income to reduce the size of the target population to manageable proportions while still serving the neediest persons. In some programs, such as WIC (Women, Infants, and Children Nutrition Program) or housing vouchers for the poor, undercoverage is a systemic problem; Congress has never provided sufficient funding to cover all who were eligible, perhaps hoping that budgets could be expanded in the future.

The opposite effect, overcoverage, also occurs. For instance, the TV program Sesame Street has consistently captured audiences far exceeding the intended targets (disadvantaged preschoolers), including children who are not at all disadvantaged and even adults. Because these additional audiences are reached at no additional cost, this overcoverage is not a financial drain. It does, however, thwart one of Sesame Street’s original goals, which was to lessen the gap in learning between advantaged and disadvantaged children.

In other instances, overcoverage can be costly and problematic. Bilingual programs in schools, for instance, have often been found to include many students whose primary language is English. Some school systems whose funding from the program depends on the number of children enrolled in bilingual classes have inflated attendance figures by registering inappropriate students. In other cases, schools have used assignment to bilingual instruction as a means of ridding classes of “problem children,” thus saturating bilingual classes with disciplinary cases.

EXHIBIT 6-D “Creaming” the Unemployed

When administrators who provide public services choose to provide a disproportionate share of program benefits to the most advantaged segment of the population they serve, they provide grist for the mill of service utilization research. The U.S. Employment Service (USES) offers a clear and significant example of creaming, a practice that has survived half a century of USES expansion, contraction, and reorganization. The USES has as its major aim to provide employers with workers, downplaying the purpose of providing workers with work. This leads the USES to send out the best prospects among the unemployed and to slight the less promising.

It is hardly surprising that USES administrators, a generation after the establishment of the program, stressed the necessity rather than the desirability of an employer-centered service. Its success, by design, depended on serving employers, not the “hard-core” unemployed. As President Lyndon Johnson’s task force on urban employment problems noted some two weeks before the 1965 Watts riots, “We have yet to make any significant progress in reaching and helping the truly ‘hard-core’ disadvantaged.”

SOURCE: Adapted from David B. Robertson, “Program Implementation Versus Program Design,” Policy Studies Review, 1984, 3:391-405.

The most common coverage problem in social interventions, however, is the failure to achieve high target participation, either because of bias in the way targets are recruited or retained or because potential clients are unaware of the program, are unable to use it, or reject it. For example, in most employment training programs only small minorities of those eligible by reason of unemployment ever attempt to participate. Similar situations occur in mental health, substance abuse, and numerous other programs (see Exhibit 6-E). We turn now to the question of how program coverage and bias might be measured as a part of program process monitoring.

Measuring and Monitoring Coverage

Program managers and sponsors alike need to be concerned with both undercoverage and overcoverage. Undercoverage is gauged by the proportion of the targets in need of a program who actually participate in it; the lower that proportion, the greater the undercoverage. Overcoverage is often expressed as the number of program participants who are not in need, compared with the total number of participants in the program. Efficient use of program resources requires both maximizing the number served who are in need and minimizing the number served who are not in need.
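The two proportions just described can be sketched in a few lines. This is a minimal illustration, assuming that the number in need is known and that each participant can be classified as in need or not in need; the function name and the counts are hypothetical.

```python
def coverage_measures(in_need_total, served_in_need, served_not_in_need):
    """Return (coverage, overcoverage) as proportions between 0 and 1.

    coverage     = targets in need who are served / all targets in need
    overcoverage = participants not in need / all participants
    """
    if in_need_total <= 0:
        raise ValueError("in_need_total must be positive")
    if served_in_need > in_need_total:
        raise ValueError("served_in_need cannot exceed in_need_total")
    total_served = served_in_need + served_not_in_need
    if total_served <= 0:
        raise ValueError("the program served no one")
    coverage = served_in_need / in_need_total
    overcoverage = served_not_in_need / total_served
    return coverage, overcoverage


# Hypothetical program: 1,000 persons in need; 300 of them are served,
# along with 100 participants who are not in need.
coverage, overcoverage = coverage_measures(
    in_need_total=1000, served_in_need=300, served_not_in_need=100
)
print(f"coverage of those in need: {coverage:.0%}")
print(f"share of participants not in need: {overcoverage:.0%}")
```

In this hypothetical case, the program reaches well under half of those in need while a quarter of its participants fall outside the target population, so it shows undercoverage and overcoverage at the same time.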

EXHIBIT 6-E The Coverage of the Food Stamp Program for the Homeless

Based upon a rigorously designed survey of homeless persons sampled from shelters and food kitchens in American cities with a population of 100,000 and over, Burt and Cohen gave some precise dimensions to what we know is true virtually by definition: The homeless live on food intakes that are inadequate both in quantity and in nutritional content. There is no way that a demographic group whose incomes hover slightly above zero can have adequate diets. That the homeless do not starve is largely a tribute to the food kitchens and shelters that provide them with meals at no cost.

Because most homeless persons are eligible by income for food stamps, their participation rates in that program should be high. But they are not: Burt and Cohen reported that only 18% of the persons sampled were receiving food stamps and almost half had never used them. This is largely because certification for food stamps requires passing a means test, a procedure that requires some documentation. This is not easy for many homeless, who may not have the required documents, an address to receive the stamps, or the capability to fill out the forms.

Moreover, the food stamp program is based on implicit assumptions that participants can readily acquire their foodstuffs in a local food store, prepare servings on a stove, and store food supplies in their dwellings. These assumptions do not apply to the homeless. Of course, food stores do sell some food items that can be consumed without preparation and, with some ingenuity, a full meal of such foods can be assembled. So some benefit can be obtained by the homeless from food stamps, but for most homeless persons food stamps are relatively useless.

Legislation passed in 1986 allowed homeless persons to exchange food stamps for meals offered by nonprofit organizations and made shelter residents in places where meals were served eligible for food stamps. By surveying food providers, shelters, and food kitchens, however, Burt and Cohen found that few meal providers had applied for certification as receivers of food stamps. Of the roughly 3,000 food providers in the sample, only 40 had become authorized.

Furthermore, among those authorized to receive food stamps, the majority had never started to collect food stamps or had started and then abandoned the practice. It made little sense to collect food stamps as payment for meals that otherwise were provided free so that, on the same food lines, food stamp participants were asked to pay for their food with stamps while nonparticipants paid nothing. The only food provider who was able to use the system was one that required either cash payment or labor for meals; for this program, food stamps became a substitute for these payments.

SOURCE: Based on Martha Burt and Barbara Cohen, Feeding the Homeless: Does the Prepared Meals Provision Help? Report to Congress on the Prepared Meal Provision, vols. I and II (Washington, DC: Urban Institute, 1988). Reprinted with permission.

The problem in measuring coverage is almost always the inability to specify the number in need, that is, the magnitude of the target population. The needs assessment procedures described in Chapter 4, if carried out as an integral part of program planning, usually minimize this problem. In addition, three sources of information can be used to assess the extent to which a program is serving the appropriate target population: program records, surveys of program participants, and community surveys.

Program Records

Almost all programs keep records on targets served. Data from well-maintained record systems, particularly from MISs, can often be used to estimate program bias or overcoverage. For instance, information on the various screening criteria for program intake may be tabulated to determine whether the units served are the ones specified in the program’s design. Suppose the targets of a family planning program are women less than 50 years of age who have been residents of the community for at least six months and who have two or more children under age ten. Records of program participants can be examined to see whether the women actually served are within the eligibility limits and the degree to which particular age or parity groups are under- or overrepresented. Such an analysis might also disclose bias in program participation in terms of the eligibility characteristics or combinations of them. Another example, involving public shelter utilization by the homeless, is described in Exhibit 6-F.
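The family planning example can be sketched as a simple check of intake records against the eligibility limits. The record layout, the field names, and the four records below are hypothetical, and the sketch assumes that every record belongs to a woman served by the program.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class IntakeRecord:
    # Hypothetical intake fields for one woman served by the program.
    age: int
    months_resident: int
    children_under_ten: int


def is_eligible(record):
    """Apply the three eligibility limits stated in the example."""
    return (
        record.age < 50
        and record.months_resident >= 6
        and record.children_under_ten >= 2
    )


records = [
    IntakeRecord(age=28, months_resident=24, children_under_ten=3),
    IntakeRecord(age=34, months_resident=3, children_under_ten=2),   # resident too briefly
    IntakeRecord(age=52, months_resident=60, children_under_ten=2),  # over the age limit
    IntakeRecord(age=41, months_resident=12, children_under_ten=2),
]

# Share of participants who fall outside the eligibility limits.
ineligible = [r for r in records if not is_eligible(r)]
share_ineligible = len(ineligible) / len(records)
print(f"participants outside eligibility limits: {share_ineligible:.0%}")

# Tabulate eligible participants by age band to look for under- or overrepresentation.
age_bands = Counter(
    "under 30" if r.age < 30 else "30 to 49" for r in records if is_eligible(r)
)
print(dict(age_bands))
```

Judging whether an age band is under- or overrepresented would also require the corresponding counts for the eligible population in the community, which program records alone do not supply.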

However, programs differ widely in the quality and extensiveness of their records and in the sophistication involved in storing and maintaining them. Moreover, the feasibility of maintaining complete, ongoing record systems for all program participants varies with the nature of the intervention and the available resources. In the case of medical and mental health systems, for example, sophisticated, computerized management and client information systems have been developed for managed care purposes that would be impractical for many other types of programs.

In measuring target participation, the main concerns are that the data are accurate and reliable. It should be noted that all record systems are subject to some degree of error. Some records will contain incorrect or outdated information, and others will be incomplete. The extent to which unreliable records can be used for decision making depends on the kind and degree of their unreliability and the nature of the decisions in question. Clearly, critical decisions involving significant outcomes require better records than do less weighty decisions. Whereas a decision on whether to continue a project should not be made on the basis of data derived from partly unreliable records, data from the same records may suffice for a decision to change an administrative procedure.

EXHIBIT 6-F Public Shelter Utilization Among Homeless Adults in New York and Philadelphia

The cities of Philadelphia and New York have standardized admission procedures for persons requesting services from city-funded or -operated shelters. All persons admitted to the public shelter system must provide intake information for a computerized registry that includes the client’s name, race, date of birth, and gender, and must also be assessed for substance abuse and mental health problems, medical conditions, and disabilities. A service utilization study conducted by researchers from the University of Pennsylvania analyzed data from this registry for New York City for 1987-1994 (110,604 men and 26,053 women) and Philadelphia for 1991-1994 (12,843 men and 3,592 women).

They found three predominant types of users: (1) the chronically homeless, characterized by very few shelter episodes, but episodes that might last as long as several years; (2) the episodically homeless, characterized by multiple, increasingly shorter stays over a long period; and (3) the transitionally homeless, who had one or two stays of short duration within a relatively brief period of time.

The most notable finding was the size and relative resource consumption of the chronically homeless. In New York, for instance, 18% of the shelter users stayed 180 days or more in their first year, consuming 53% of the total number of system days for first-time shelter users, triple the days for their proportionate representation in the shelter population. These long-stay users tended to be older people and to have mental health, substance abuse, and, in some cases, medical problems.

SOURCE: Adapted by permission from Dennis P. Culhane and Randall Kuhn, “Patterns and Determinants of Public Shelter Utilization Among Homeless Adults in New York City and Philadelphia,” Journal of Policy Analysis and Management, 1998, 17(1):23-43. Copyright © 1998, John Wiley & Sons, Inc.

If program records are to serve an important role in decision making on far-reaching issues, it is usually desirable to conduct regular audits of the records. Such audits are similar in intent to those that outside accountants conduct on fiscal records. For example, records might be sampled to determine whether each target has a record, whether records are complete, and whether rules for completing them have been followed.
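A records audit of this kind can be sketched in a few lines of code. The sketch below is illustrative only: the field names in `REQUIRED_FIELDS`, the roster of target identifiers, and the function `audit_records` are assumptions made for the example, not features of any particular record system. It covers the first two checks mentioned above (whether each sampled target has a record and whether that record is complete); checking whether the rules for completing records were followed would require program-specific logic.

```python
import random

# Hypothetical required fields; a real program would define its own list.
REQUIRED_FIELDS = ("client_id", "intake_date", "service_type")


def audit_records(roster_ids, records, sample_size, seed=0):
    """Sample targets from the roster and check their records.

    roster_ids: ids of all targets who should have a record
    records:    dict mapping target id -> record (a dict of fields)
    Returns counts of sampled, missing, and incomplete records.
    """
    rng = random.Random(seed)  # fixed seed so the audit sample is reproducible
    ids = sorted(roster_ids)
    sample = rng.sample(ids, min(sample_size, len(ids)))

    # A record is "missing" if the sampled target has no record at all.
    missing = [cid for cid in sample if cid not in records]
    # A record is "incomplete" if any required field is absent or empty.
    incomplete = [
        cid for cid in sample
        if cid in records
        and any(not records[cid].get(field) for field in REQUIRED_FIELDS)
    ]
    return {
        "sampled": len(sample),
        "missing": len(missing),
        "incomplete": len(incomplete),
    }
```

The proportions of missing and incomplete records in the sample give a rough estimate of how far the full record system can be trusted for the decision at hand.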

Surveys

An alternative to using program records to assess target participation is to conduct special surveys of program participants. Sample surveys may be desirable when the required data cannot be obtained as a routine part of program activities or when the size of the target group is large and it is more economical and efficient to undertake a sample survey than to obtain data on all the participants.

For example, a special tutoring project conducted primarily by parents may be set up in only a few schools in a community. Children in all schools may be referred, but the project staff may not have the time or the training to administer appropriate educational skills tests and other such instruments that would document the characteristics of the children referred and enrolled. Lacking such complete records, an evaluation group could administer tests on a sampling basis to estimate the appropriateness of the selection procedures and assess whether the project is serving the designated target population.

When projects are not limited to selected, narrowly defined groups of individuals but instead take in entire communities, the most efficient and sometimes the only way to examine whether the presumed population at need is being reached is to conduct a community survey. Various types of health, educational, recreational, and other human service programs are often community-wide, although their intended target populations may be selected groups, such as delinquent youths, the aged, or women of child-bearing age. In such cases, surveys are the major means of assessing whether targets have been reached.

The evaluation of the Feeling Good television program illustrates the use of surveys to provide data on a project with a national audience. The program, an experimental production of the Children’s Television Workshop (the producer of Sesame Street), was designed to motivate adults to engage in preventive health practices. Although it was accessible to homes of all income levels, its primary purpose was to motivate low-income families to improve their health practices. The Gallup organization conducted four national surveys, each of approximately 1,500 adults, at different times during the weeks Feeling Good was televised. The data provided estimates of the size of the viewing audiences as well as of the viewers’ demographic, socioeconomic, and attitudinal characteristics (Mielke and Swinehart, 1976). The major finding was that the program largely failed to reach the target group, and the program was discontinued.

To measure coverage of Department of Labor programs, such as training and public employment, the department started a periodic national sample survey. The Survey of Income and Program Participation is now carried out by the Bureau of the Census and measures participation in social programs conducted by many federal departments. This large survey, now a three-year panel covering 21,000 households, ascertains through personal interviews whether each adult member of the sampled households has ever participated or is currently participating in any of a number of federal programs. By contrasting program participants with nonparticipants, the survey provides information on the programs’ biases in coverage. In addition, it generates information on the uncovered but eligible target populations.

Assessing Bias: Program Users, Eligibles, and Dropouts

An assessment of bias in program participation can be undertaken by examining differences between individuals who participate in a program and either those who drop out or those who are eligible but do not participate at all. In part, the drop-out rate, or attrition, from a project may be an indicator of clients’ dissatisfaction with intervention activities. It also may indicate conditions in the community that militate against full participation. For example, in certain areas lack of adequate transportation may prevent those who are otherwise willing and eligible from participating in a program.

It is important to be able to identify the particular subgroups within the target population who either do not participate at all or do not follow through to full participation. Such information not only is valuable in judging the worth of the effort but also is needed to develop hypotheses about how a project can be modified to attract and retain a larger proportion of the target population. Thus, the qualitative aspects of participation may be important not only for monitoring purposes but also for subsequent program planning.
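As a minimal sketch of how such subgroups might be identified, the code below computes enrollment and dropout rates for each subgroup of an eligible population. The data layout (one entry per eligible person with a `subgroup` label and a `status`) and the three status labels are assumptions chosen for illustration; actual data would come from service records or a community survey.

```python
from collections import defaultdict


def participation_by_subgroup(people):
    """Compute enrollment and dropout rates for each subgroup.

    people: iterable of dicts with keys 'subgroup' and 'status', where
            status is 'completed', 'dropped_out', or 'never_enrolled'
            (hypothetical labels used only for this example).
    """
    counts = defaultdict(lambda: {"eligible": 0, "enrolled": 0, "completed": 0})
    for person in people:
        c = counts[person["subgroup"]]
        c["eligible"] += 1
        if person["status"] in ("completed", "dropped_out"):
            c["enrolled"] += 1
        if person["status"] == "completed":
            c["completed"] += 1

    rates = {}
    for group, c in counts.items():
        dropouts = c["enrolled"] - c["completed"]
        rates[group] = {
            # Share of eligible persons who started the program.
            "enrollment_rate": c["enrolled"] / c["eligible"],
            # Share of starters who did not finish; undefined if nobody enrolled.
            "dropout_rate": dropouts / c["enrolled"] if c["enrolled"] else None,
        }
    return rates
```

Subgroups with low enrollment rates or high dropout rates relative to the others are the natural starting point for hypotheses about how the project might be modified.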

Data about dropouts may come either from service records or from surveys designed to find nonparticipants. However, community surveys usually are the only feasible means of identifying eligible persons who have not participated in a program. The exception, of course, is when adequate information is available about the entire eligible population prior to the implementation of a project (as in the case of data from a census or screening interview). Comparisons with either data gathered for project planning purposes or community surveys undertaken during and subsequent to the intervention may employ a variety of analytical approaches, from purely descriptive methods to highly complex models.

In Chapter 11, we describe methods of analyzing the costs and benefits of programs to arrive at measures of economic efficiency. Clearly, for calculating costs it is important to have estimates of the size of populations at need or risk, the groups who start a program but drop out, and the ones who participate to completion. The same data may also be used in estimating benefits. In addition, they are highly useful in judging whether a project should be continued and whether it should be expanded in either the same community or other locations. Furthermore, project staff require this kind of information to meet their managerial and accountability responsibilities. Although data on project participation cannot substitute for knowledge of impact in judging either the efficiency or the effectiveness of projects, there is little point in moving ahead with an impact analysis without an adequate description of the extent of participation by the target population.

6.4 Monitoring Organizational Functions

Monitoring of the critical organizational functions and activities of a program focuses on whether the program is performing well in managing its efforts and using its resources to accomplish its essential tasks. Chief among those tasks, of course, is delivering the intended services to the target population. In addition, programs have various support functions that must be carried out to maintain the viability and effectiveness of the organization, for example, fund-raising, promotion, advocacy, and governance and management. Program process monitoring seeks to determine whether a program’s actual activities and arrangements sufficiently approximate the intended ones.

Once again, program process theory as described in Chapter 5 is a useful tool in designing monitoring procedures. In this instance, what we called the organizational plan is the relevant component. A fully articulated process theory will identify the major program functions, activities, and outputs and show how they are related to each other and to the organizational structures, staffing patterns, and resources of the program. This depiction provides a map to guide the evaluator in identifying the significant program functions and the preconditions for accomplishing them. Program process monitoring then becomes a matter of identifying and measuring those activities and conditions most essential to a program’s effective performance of its duties.

Service Delivery Is Fundamental

As mentioned earlier in this chapter, for many programs that fail to show impacts, the problem is a failure to deliver the interventions specified in the program design, a problem generally known as implementation failure. There are three kinds of implementation failures: First, no intervention, or not enough, is delivered; second, the wrong intervention is delivered; and third, the intervention is unstandardized or uncontrolled and varies excessively across the target population.

“Nonprograms” and Incomplete Intervention

Consider first the problem of the “nonprogram” (Rossi, 1978). McLaughlin (1975) reviewed the evidence on the implementation of Title I of the Elementary and Secondary Education Act, which allocated billions of dollars yearly to aid local schools in overcoming students’ poverty-associated educational deprivations. Even though schools had expended the funds, local school authorities were unable to describe their Title I activities in any detail, and few activities could even be identified as educational services delivered to schoolchildren. In short, little evidence could be found that school programs existed that were directed toward the goal of helping disadvantaged children.

The failure of numerous other programs to deliver services has been documented as well. Datta (1977), for example, reviewed the evaluations on career education programs and found that the designated targets rarely participated in the planned program activities. Similarly, an attempt to evaluate PUSH-EXCEL, a program designed to motivate disadvantaged high school students toward higher levels of academic achievement, disclosed that the program consisted mainly of the distribution of buttons and hortative literature and little else (Murray, 1980).

A delivery system may dilute the intervention so that an insufficient amount reaches the target population. Here the problem may be a lack of commitment on the part of a front-line delivery system, resulting in minimal delivery or “ritual compliance,” to the point that the program does not exist. Exhibit 6-G, for instance, expands on an exhibit presented in Chapter 2 to describe the implementation of welfare reform in which welfare workers communicated little to clients about the new policies.

Wrong Intervention

The second category of program failure—namely, delivery of the wrong intervention—can occur in several ways. One is that the mode of delivery negates the intervention. An example is the Performance Contracting experiment, in which private firms that contracted to teach mathematics and reading were paid in proportion to pupils’ gains in achievement. The companies faced extensive difficulties in delivering the program at school sites. In some sites the school system sabotaged the experiments, and in others the companies were confronted with equipment failures and teacher hostility (Gramlich and Koshel, 1975).

Another way in which wrong intervention can result is when it requires a delivery system that is too sophisticated. There can be a considerable difference between pilot projects and full-scale implementation of sophisticated programs. Interventions that work well in the hands of highly motivated and trained deliverers may end up as failures when administered by staff of a mass delivery system whose training and motivation are less. The field of education again provides an illustration: Teaching methods such as computer-assisted learning or individualized instruction that have worked well within experimental development centers have not fared as well in ordinary school systems because teachers did not have sufficient computer skills.

The distinction made here between an intervention and its mode of delivery is not always clear-cut. The difference is quite clear in income maintenance programs, in which the “intervention” is the money given to beneficiaries and the delivery modes vary from automatic deposits in savings or checking accounts to hand delivery of cash to recipients. Here the intent of the program is to place money in the hands of recipients; the delivery, whether by electronic transfer or by hand, has little effect on the intervention. In contrast, a counseling program may be handled by retraining existing personnel, hiring counselors, or employing certified psychotherapists. In this case, the distinction between treatment and mode of delivery is fuzzy, because it is generally acknowledged that counseling treatments vary by counselor.

EXHIBIT 6-G On the Front Lines: Are Welfare Workers Implementing Policy Reforms?

In the early 1990s, the state of California initiated the Work Pays demonstration project, which expanded the state job preparation program (JOBS) and modified Aid to Families with Dependent Children (AFDC) welfare policies to increase the incentives and support for finding employment. The Work Pays demonstration was designed to “substantially change the focus of the AFDC program to promote work over welfare and self-sufficiency over welfare dependence.”

The workers in the local welfare offices were a vital link in the implementation of Work Pays. The intake and redetermination interviews they conducted represented virtually the only in-person contact that most clients had with the welfare system. This fact prompted a team of evaluators to study how welfare workers were communicating the Work Pays policies during their interactions with clients.

The evaluators reasoned that worker-client transactions appropriate to the policy would involve certain “information content” and “use of positive discretion.” Information content refers to the explicit messages delivered to clients; it was expected that workers would notify clients about the new program rules for work and earnings, explain opportunities to combine work and welfare to achieve greater self-sufficiency, and inform them about available training and supportive services. Positive discretion relates to the discretion workers have in teaching, socializing, and signaling clients about the expectations and opportunities associated with welfare receipt. Workers were expected to emphasize the new employment rules and benefits during client interviews and communicate the expectation that welfare should serve only as temporary assistance while recipients prepared for work.

To assess the welfare workers’ implementation of the new policies, the evaluators observed and analyzed the content of 66 intake or redetermination interviews between workers and clients in four counties included in the Work Pays demonstration. A structured observation form was used to record the frequency with which various topics were discussed and to collect information about the characteristics of the case. These observations were coded on the two dimensions of interest: (1) information content and (2) positive discretion.

The results, in the words of the evaluators:

In over 80% of intake and redetermination interviews workers did not provide and interpret information about welfare reforms. Most workers continued a pattern of instrumental transactions that emphasized workers’ needs to collect and verify eligibility information. Some workers coped with the new demand by providing information about work-related policies, but routinizing the information and adding it to their standardized, scripted recitations of welfare rules. Others were coping by particularizing their interactions, giving some of their clients some information some of the time, on an ad hoc basis.

These findings suggest that welfare reforms were not fully implemented at the street level in these California counties. Worker-client transactions were consistent with the processing of welfare claims, the enforcement of eligibility rules, and the rationing of scarce resources such as JOBS services; they were poorly aligned with new program objectives emphasizing transitional assistance, work, and self-sufficiency outside the welfare system. (pp. 18-19)

SOURCE: Adapted by permission from Marcia K. Meyers, Bonnie Glaser, and Karin MacDonald, “On the Front Lines of Welfare Delivery: Are Workers Implementing Policy Reforms?” Journal of Policy Analysis and Management, 1998, 17(1):1-22. Copyright © 1998, John Wiley & Sons, Inc.

Unstandardized Intervention

The final category of implementation failures includes those that result from unstandardized or uncontrolled interventions. This problem can arise when the design of the program leaves too much discretion in implementation to the delivery system, so that the intervention can vary significantly across sites. Early programs of the Office of Economic Opportunity provide examples. The Community Action Program (CAP) gave local communities considerable discretion in choosing among a variety of actions, requiring only “maximum feasible participation” on the part of the poor. Because of the resulting disparities in the programs of different cities, it is almost impossible to document what CAP programs accomplished (Vanecko and Jacobs, 1970).

Similarly, Project Head Start gave local communities funds to set up preschool teaching projects for underprivileged children. In its first decade, Head Start centers varied by sponsoring agencies, coverage, content, staff qualifications, objectives, and a host of other characteristics (Cicirelli, Cooper, and Granger, 1969). Because there was no specified Head Start design, it was not possible to conclude from an evaluation of a sample of projects whether the Head Start concept worked. The only generalization that could be made was that some projects were effective, some were ineffective, and, among the effective ones, some were more successful than others. Only in the past decade has a degree of standardization been achieved to the point that in 2001 it was possible to design and start an evaluation that promises to provide estimates of the program’s effectiveness (Advisory Committee on Head Start Research and Evaluation, 1999).

The Delivery System

A program’s delivery system can be thought of as a combination of pathways and actions undertaken to provide an intervention. It usually consists of a number of separate functions and relationships. As a general rule, it is wise to assess all the elements unless previous experience with certain aspects of the delivery system makes their assessment unnecessary. Two concepts are especially useful for monitoring the performance of a program’s delivery system: specification of services and accessibility.

Specification of Services

A specification of services is desirable for both planning and monitoring purposes. This consists of specifying the actual services provided by the program in operational (measurable) terms. The first task is to define each kind of service in terms of the activities that take place and the providers who participate. When possible, it is best to separate the various aspects of a program into separate, distinct services. For example, if a project providing technical education for school dropouts includes literacy training, carpentry skills, and a period of on-the-job apprenticeship work, it is advisable to separate these into three services for monitoring purposes. Moreover, for estimating program costs in cost-benefit analyses and for fiscal accountability, it is often important to attach monetary values to different services. This step is important when the costs of several programs will be compared or when the programs receive reimbursement on the basis of the number of units of different services that are provided.

For program process monitoring, simple, specific services are easier to identify, count, and record. However, complex elements often are required to design an implementation that is consistent with a program’s objectives. For example, a clinic for children may require a physical exam on admission, but the scope of the exam and the tests ordered may depend on the characteristics of each child. Thus, the item “exam” is a service but its components cannot be broken out further without creating a different definition of the service for each child examined. The strategic question is how to strike a balance, defining services so that distinct activities can be identified and counted reliably while, at the same time, the distinctions are meaningful in terms of the program’s objectives.

In situations where the nature of the intervention allows a wide range of actions that might be performed, it may be possible to describe services primarily in terms of the general characteristics of the service providers and the time they spend in service activities. For example, if a project places master craftspersons in a low-income community to instruct community members in ways to improve their dwelling units, the craftspersons’ specific activities will probably vary greatly from one household to another. They may advise one family on how to frame windows and another on how to shore up the foundation of a house. Any monitoring scheme attempting to document such services could describe the service activities only in general terms and by means of examples. It is possible, however, to specify the characteristics of the providers—for example, that they should have five years of experience in home construction and repair and knowledge of carpentry, electrical wiring, foundations, and exterior construction—and the amount of time they spend with each service recipient.

Indeed, services are often defined in terms of units of time, costs, procedures, or products. In a vocational training project, service units may refer to hours of counseling time provided; in a program to foster housing improvement, they may be defined in terms of amounts of building materials provided; in a cottage industry project, service units may refer to activities, such as training sessions on how to operate sewing machines; and in an educational program, the units may be instances of the use of specific curricular materials in classrooms. All these examples require an explicit definition of what constitutes a service and, for that service, what units are appropriate for describing the amount of service.
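One way to make such definitions explicit is to record every delivered service together with its unit and to total the amounts per service. The sketch below assumes a simple service log of `(service, unit, amount)` entries; the service names, units, and the function `total_service_units` are invented for illustration and do not come from any actual program.

```python
from collections import Counter


def total_service_units(log):
    """Total the amount of each service delivered, keeping units separate.

    log: iterable of (service, unit, amount) tuples, for example
         ("counseling", "hours", 1.5). Names and units are hypothetical.
    """
    totals = Counter()
    for service, unit, amount in log:
        # Key on both service and unit so hours are never added to sessions.
        totals[(service, unit)] += amount
    return dict(totals)
```

Keying the totals on both the service and its unit forces the explicit definition the text calls for: an entry cannot be logged without stating what the service is and how its amount is measured.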

Accessibility

Accessibility is the extent to which structural and organizational arrangements facilitate participation in the program. All programs have a strategy of some sort for providing services to the appropriate target population. In some instances, being accessible may simply mean opening an office and operating under the assumption that the designated participants will “naturally” come and make use of the services provided at the site. In other instances, however, ensuring accessibility requires outreach campaigns to recruit participants, transportation to bring persons to the intervention site, and efforts during the intervention to minimize dropouts. For example, in many large cities, special teams are sent out into the streets on very cold nights to persuade homeless persons sleeping in exposed places to spend the night in shelters.

A number of evaluation questions arise in connection with accessibility, some of which relate only to the delivery of services and some of which have parallels to the previously discussed topic of service utilization. The primary issue is whether program actions are consistent with the design and intent of the program with regard to facilitating access. For example, is there a Spanish-speaking staff member always available in a mental health center located in an area with a large Hispanic population?

Also, are potential targets matched with the appropriate services? It has been observed, for example, that community members who originally make use of emergency medical care for appropriate purposes may subsequently use it for general medical care. Such misuse of emergency services may be costly and reduce their availability to other community members. A related issue is whether the access strategy encourages differential use by targets from certain social, cultural, and ethnic groups, or whether there is equal access for all potential targets.

Program Support Functions

Although providing the intended services is presumed to be a program’s main function, and one essential to monitor, most programs also perform important support functions that are critical to their ability to maintain themselves and continue to provide service. These functions are of interest to program administrators, of course, but often they are also relevant to monitoring by evaluators or outside decisionmakers. Vital support functions may include such activities as fund-raising; public relations to enhance the program’s image with potential sponsors, decisionmakers, or the general public; staff training, including the training of the direct service staff; recruiting and retention of key personnel; developing and maintaining relationships with affiliated programs, referral sources, and the like; obtaining materials required for services; and general advocacy on behalf of the target population served.

Program process monitoring schemes can, and often should, incorporate indicators of vital program support functions along with indicators relating to service activities. In form, such indicators and the process for identifying them are no different than for program services. The critical activities first must be identified and described in specific, concrete terms resembling service units, for example, units of fund-raising activity and dollars raised, training sessions, advocacy events, and the like. Measures are then developed that are capable of differentiating good from poor performance and that can be regularly collected. These measures are then included in the program monitoring procedures along with those dealing with other aspects of program performance.

6.5 Analysis of Program Process Monitoring Data

Data, of course, are useful only when they have been appropriately analyzed. In general, the analysis of program process monitoring data addresses the following three issues: description of the program operation, comparison between sites, and conformity of the program to its design.

Description of the Program Operation

Assessing the extent to which a program as implemented resembles the program as designed depends on having a full and accurate description of how the program actually operates. A description derived from program process monitoring data would cover the following topics: estimates of coverage and bias in participation, the types of services delivered, the intensity of services given to participants of significant kinds, and the reactions of participants to the services delivered. Descriptive statements might take the form of narrative accounts, especially when monitoring data are derived from qualitative sources, or quantitative summaries in the form of tables, graphs, and the like.

Comparison Between Sites

When a program includes more than one site, a second question concerns differences in program implementation between the sites. Comparison of sites permits an understanding of the sources of diversity in program implementation and, ultimately, outcomes, such as differences in staff, administration, targets, or surrounding environments, and it also can facilitate efforts to achieve standardization.
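A simple site comparison might start from a single monitoring indicator, such as hours of service received per participant, and ask how far each site departs from the average across sites. The following sketch assumes that indicator and a dictionary of per-participant values for each site; both are illustrative choices, and a real comparison would usually examine several indicators along with site characteristics.

```python
from statistics import mean


def compare_sites(hours_by_site):
    """Return each site's deviation from the average of the site means.

    hours_by_site: dict mapping site name -> list of service hours
                   received by each participant at that site
                   (a hypothetical indicator chosen for this example).
    """
    site_means = {site: mean(hours) for site, hours in hours_by_site.items()}
    overall = mean(list(site_means.values()))
    # Positive values: the site delivers more than the average site.
    return {site: site_mean - overall for site, site_mean in site_means.items()}
```

Large deviations in either direction flag sites whose staffing, administration, targets, or surrounding environment merit a closer look.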

Conformity of the Program to Its Design

The third issue is the one with which we began: the degree of conformity between a program’s design and its implementation. Shortfalls may occur because the program is not performing functions it is expected to or because it is not performing them as well as expected. Such discrepancies may lead to efforts to move the implementation of a project closer to the original design or to a respecification of the design itself. Such analysis also provides an opportunity to judge the appropriateness of performing an impact evaluation and, if necessary, to opt for more formative evaluation to develop the desired convergence of design and implementation.
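The two kinds of shortfall described above can be distinguished mechanically once the design has been expressed as planned values for a set of indicators. The sketch below is one possible way to do so; the indicator names, the 10% tolerance, and the function `conformity_report` are assumptions for illustration, and the appropriate tolerance is a judgment that depends on the program.

```python
def conformity_report(planned, actual, tolerance=0.10):
    """Compare observed indicator values with the values in the design.

    planned:   dict mapping indicator -> planned value
    actual:    dict mapping indicator -> observed value (may omit indicators)
    tolerance: allowed proportional shortfall before flagging (assumed 10%)
    """
    report = {}
    for indicator, target in planned.items():
        observed = actual.get(indicator)
        if observed is None:
            # The function specified in the design is not being performed.
            report[indicator] = "not performed"
        elif observed < target * (1 - tolerance):
            # Performed, but not as well as expected.
            report[indicator] = "below plan"
        else:
            report[indicator] = "conforms"
    return report
```

A report dominated by "not performed" or "below plan" entries suggests that further formative work is needed before an impact evaluation would be appropriate.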
