Colleague Response

Chapter 13

Program Evaluation Studies

TK Logan and David Royse

A variety of programs have been developed to address social problems such as drug addiction, homelessness, child abuse, domestic violence, illiteracy, and poverty. The goals of these programs may include directly addressing the problem origin or moderating the effects of these problems on individuals, families, and communities. Sometimes programs are developed to prevent something from happening such as drug use, sexual assault, or crime. These kinds of problems and programs to help people are often what attracts many social workers to the profession; we want to be part of the mechanism through which society provides assistance to those most in need. Despite low wages, bureaucratic red tape, and routinely uncooperative clients, we tirelessly provide services that are invaluable but also at various times may be or become insufficient or inappropriate. But without conducting evaluation, we do not know whether our programs are helping or hurting, that is, whether they only postpone the hunt for real solutions or truly construct new futures for our clients. This chapter provides an overview of program evaluation in general and outlines the primary considerations in designing program evaluations. Evaluation can be done informally or formally. We are constantly, as consumers, informally evaluating products, services, and information. For example, we may choose not to return to a store or an agency again if we did not evaluate the experience as pleasant.

Similarly, we may mentally take note of unsolicited comments or anecdotes from clients and draw conclusions about a program. Anecdotal and informal approaches such as these generally are not regarded as carrying scientific credibility. One reason is that decision biases play a role in our "informal" evaluation. Specifically, vivid memories or strongly negative or positive anecdotes will be overrepresented in our summaries of how things are evaluated. This is why objective data are necessary to truly understand what is or is not working. By contrast, formal evaluations systematically examine data from and about programs and their outcomes so that better decisions can be made about the interventions designed to address the related social problem. Thus, program evaluation involves the use of social research methodologies to appraise and improve the ways in which human services, policies, and programs are conducted. Formal evaluation, by its very nature, is applied research. Formal program evaluations attempt to answer the following general question: Does the program work? Program evaluation may also address questions such as the following: Do our clients get better? How does our success rate compare to those of other programs or agencies? Can the same level of success be obtained through less expensive means?

What is the experience of the typical client? Should this program be terminated and its funds applied elsewhere? Ideally, a thorough program evaluation would address more complex questions in three main areas: (1) Does the program produce the intended outcomes and avoid unintended negative outcomes? (2) For whom does the program work best and under what conditions? and (3) How well was a program model developed in one setting adapted to another setting? Evaluation has taken an especially prominent role in practice today because of the focus on evidence-based practice in social programs. Social work, as a profession, has been asked to use evidence-based practice as an ethical obligation (Kessler, Gira, & Poertner, 2005). Evidence-based practice is defined differently, but most definitions include using program evaluation data to help determine best practices in whatever area of social programming is being considered. In other words, evidence-based practice includes using objective indicators of success in addition to practice or more subjective indicators of success. Formal program evaluations can be found on just about every topic. For instance, Fraser, Nelson, and Rivard (1997) have examined the effectiveness of family preservation services; Kirby, Korpi, Adivi, and Weissman (1997) have evaluated an AIDS and pregnancy prevention middle school program. Morrow-Howell, Becker-Kemppainen, and Judy (1998) evaluated an intervention designed to reduce the risk of suicide in elderly adult clients of a crisis hotline. Richter, Snider, and Gorey (1997) used a quasi-experimental design to study the effects of a group work intervention on female survivors of childhood sexual abuse. Leukefeld and colleagues (1998) examined the effects of an HIV prevention intervention with injecting drug and crack users.
Logan and colleagues (2004) examined the effects of a drug court intervention as well as the costs of drug court compared with the economic benefits of the drug court program.

Basic Evaluation Considerations

Before beginning a program evaluation, several issues must be initially considered. These issues are decisions that are critical in determining the evaluation methodology and goals. Although you may not have complete answers to these questions when beginning to plan an evaluation, these questions help in developing the plan and must be answered before an evaluation can be carried out. We can sum up these considerations with the following questions: who, what, where, when, and why. First, who will do the evaluation? This seems like a simple question at first glance. However, this particular consideration has major implications for the evaluation results. Program evaluators can be categorized as being either internal or external. An internal evaluator is someone who is a program staff member or regular agency employee, whereas an external evaluator is a professional, on contract, hired for the specific purpose of evaluation. There are advantages and disadvantages to using either type of evaluator. For example, the internal evaluator probably will be very familiar with the staff and the program. This may save a lot of planning time. The disadvantage is that evaluations completed by an internal evaluator may be considered less valid by outside agencies, including the funding source. The external evaluator generally is thought to be less biased in terms of evaluation outcomes because he or she has no personal investment in the program. One disadvantage is that an external evaluator frequently is viewed as an "outsider" by the staff within an agency. This may affect the amount of time necessary to conduct the evaluation or cause problems in the overall evaluation if agency staff are reluctant to cooperate.

Second, what resources are available to conduct the evaluation? Hiring an outside evaluator can be expensive, while having a staff person conduct the evaluation may be less expensive. So, in a sense, you may be trading credibility for less cost. In fact, each methodological decision will have a trade-off in credibility, level of information, and resources (including time and money). Also, the amount and level of information as well as the research design will be determined, to some extent, by what resources are available. A comprehensive and rigorous evaluation does take significant resources. Third, where will the information come from? If an evaluation can be done using existing data, the cost will be lower than if data must be collected from numerous people such as clients and/or staff across multiple sites. So having some sense of where the data will come from is important. Fourth, when is the evaluation information needed? In other words, what is the timeframe for the evaluation? The timeframe will affect costs and design of research methods. Fifth, why is the evaluation being conducted? Is the evaluation being conducted at the request of the funding source? Is it being conducted to improve services? Is it being conducted to document the cost-benefit trade-off of the program? If future program funding decisions will depend on the results of the evaluation, then a lot more importance will be attached to it than if a new manager simply wants to know whether clients were satisfied with services. The more that is riding on an evaluation, the more attention will be given to the methodology and the more threatened staff can be, especially if they think that the purpose of the evaluation is to downsize and trim excess employees. In other words, there are many reasons an evaluation is being considered, and these reasons may have implications for the evaluation methodology and implementation.
Once the issues described above have been considered, more complex questions and trade-offs will be needed in planning the evaluation. Specifically, six main issues guide and shape the design of any program evaluation effort and must be given thoughtful and deliberate consideration.

1. Defining the goal of the program evaluation

2. Understanding the level of information needed for the program evaluation

3. Determining the methods and analysis that need to be used for the program evaluation

4. Considering issues that might arise and strategies to keep the evaluation on course

5. Developing results into a useful format for the program stakeholders

6. Providing practical and useful feedback about the program strengths and weaknesses as well as providing information about next steps

Defining the Goal of the Program Evaluation

It is essential that the evaluator has a firm understanding of the short- and long-term objectives of the evaluation. Imagine being hired for a position but not being given a job description or informed about how the job fits into the overall organization. Without knowing why an evaluation is called for or needed, the evaluator might attempt to answer a different set of questions from those of interest to the agency director or advisory board. The management might want to know why the majority of clients do not return after one or two visits, whereas the evaluator might think that his or her task is to determine whether clients who received group therapy sessions were better off than clients who received individual counseling. In defining the goals of the program evaluation, several steps should be taken. First, the program goals should be examined. These can be learned through examining official program documents as well as through talking to key program stakeholders. In clarifying the overall purpose of the evaluation, it is critical to talk with different program "stakeholders." Scriven (1991) defines a program stakeholder as "one who has a substantial ego, credibility, power, futures, or other capital invested in the program . . . . This includes program staff and many who are not actively involved in the day-to-day operations" (p. 334). Stakeholders include both supporters and opponents of the program as well as program clients or consumers or even potential consumers or clients. It is essential that the evaluator obtain a variety of different views about the program. By listening and considering stakeholder perspectives, the evaluator can ascertain the most important aspects of the program to target for the evaluation by looking for overlapping concerns, questions, and comments from the various stakeholders. However, it is important that the stakeholders have some agreement on what program success means. Otherwise, it may be difficult to conduct a satisfactory evaluation. It is also important to consult the extant literature to understand what similar programs have used to evaluate their outcomes as well as to understand the theoretical basis of the program in defining the program evaluation goals. Furthermore, it is critical that the evaluator works closely with whoever initiated the evaluation to set priorities for the evaluation. This process should identify the intended outcomes of the program and which of those outcomes, if not all of them, will be evaluated.
Taking the evaluation a step further, it may be important to include the examination of unintended negative outcomes that may result from the program. Stakeholders and the literature will also help to determine those kinds of outcomes. Once the overall purpose and priorities of the evaluation are established, it is a good idea to develop a written agreement, especially if the evaluator is an external one. Misunderstandings can and will occur months later if things are not written in black and white.

Understanding the Level of Information Needed for the Program Evaluation

The success of the program evaluation revolves around the evaluator's ability to develop practical, researchable questions. A good rule to follow is to focus the evaluation on one or two key questions. Too many questions can lengthen the process and overwhelm the evaluator with too much data that, instead of facilitating a decision, might produce inconsistent findings. Sometimes, funding sources require only that some vague undefined type of evaluation is conducted. The funding sources might neither expect nor desire dissertation-quality research; they simply might expect "good faith" efforts when beginning evaluation processes. Other agencies may be quite demanding in the types and forms of data to be provided. Obviously, the choice of methodology, data collection procedures, and reporting formats will be strongly affected by the purpose, objectives, and questions examined in the study. It is important to note the difference between general research and evaluation. In research, the investigator often focuses on questions based on theoretical considerations or hypotheses generated to build on research in a specific area of study. Although program evaluations may focus on an intervention derived from a theory, the evaluation questions should, first and foremost, be driven by the program's objectives. The evaluator is less concerned with building on prior literature or contributing to the development of practice theory than with determining whether a program worked in a specific community or location. There are actually two main types of evaluation questions. There are questions that focus on client outcomes, such as, "What impact did the program have?" These kinds of questions are addressed by using outcome evaluation methods. Then there are questions that ask, "Did the program achieve its goals?" "Did the program adhere to the specified procedures or standards?" or "What was learned in operating this program?" These kinds of questions are addressed by using process evaluation methods. We will examine both types of evaluation approaches in the following sections.

Process Evaluation

Process evaluations offer a "snapshot" of the program at any given time. Process evaluations typically describe the day-to-day program efforts; program modifications and changes; outside events that influenced the program; people and institutions involved; culture, customs, and traditions that evolved; and sociodemographic makeup of the clientele (Scarpitti, Inciardi, & Pottieger, 1993). Process evaluation is concerned with identifying program strengths and weaknesses. This level of program evaluation can be used in several ways, including providing a context within which to interpret program outcomes and so that other agencies or localities wishing to start similar programs can benefit without having to make the same mistakes. As an example, Bentelspacher, DeSilva, Goh, and LaRowe (1996) conducted a process evaluation of the cultural compatibility of psychoeducational family group treatment with ethnic Asian clients. As another example, Logan, Williams, Leukefeld, and Minton (2000) conducted a detailed process evaluation of the drug court programs before undertaking an outcome evaluation of the same programs.

The Logan et al. study used multiple methods to conduct the process evaluation, including in-depth interviews with the program administrative personnel, interviews with each of five judges involved in the program, surveys and face-to-face interviews with 22 randomly selected current clients, and surveys of all program staff, 19 community treatment provider representatives, 6 randomly selected defense attorney representatives, 4 prosecuting attorney representatives, 1 representative from the probation and parole office, 1 representative from the local county jail, and 2 police department representatives. In all, 69 different individuals representing 10 different agency perspectives provided information about the drug court program. Also, all agency documents were examined and analyzed, observations of various aspects of the program process were conducted, and client intake data were analyzed as part of the process evaluation. The results were all integrated and compiled into one comprehensive report. What makes a process evaluation so important is that researchers often have relied only on selected program outcome indicators such as termination and graduation rates or number of rearrests to determine effectiveness. However, to better understand how and why a program such as drug court is effective, an analysis of how the program was conceptualized, implemented, and revised is needed. Consider this example: say one outcome evaluation of a drug court program showed a graduation rate of 80% of those who began the program, while another outcome evaluation found that only 40% of those who began the program graduated. Suppose also that the graduates of the second program were more likely to be free from substance use and criminal behaviors at the 12-month follow-up than the graduates from the first program. A process evaluation could help to explain the specific differences in factors such as selection (how clients get into the programs), treatment plans, monitoring, program length, and other program features that may influence how many people graduate and stay free from drugs and criminal behavior at follow-up. In other words, a process evaluation, in contrast to an examination of program outcome only, can provide a clearer and more comprehensive picture of how drug court affects those involved in the program. More specifically, a process evaluation can provide information about program aspects that need to be improved and those that work well (Scarpitti, Inciardi, & Pottieger, 1993). Finally, a process evaluation may help to facilitate replication of the drug court program in other areas. This often is referred to as technology transfer. A different but related process evaluation goal might be a description of the failures and departures from the way in which the intervention originally was designed. How were the staff trained and hired? Did the intervention depart from the treatment manual recommendations? Influences that shape and affect the intervention that clients receive need to be identified because they affect the fidelity of the treatment program (e.g., delayed funding or staff hires, changes in policies or procedures). When program implementation deviates significantly from what was intended, this might be the logical explanation as to why a program is not working.

Outcome or Impact Evaluation

Outcome or impact evaluation focuses on the targeted objectives of the program, often looking at variables such as behavior change. For example, many drug treatment programs may measure outcomes or "success" by the number of clients who abstain from drug use. Questions always arise, though. For instance, an evaluation might reveal that 90% of those who graduate from the program abstain from drug use 30 days after the program was completed. However, only 50% report abstaining from drug use 12 months after the program was completed. Would key stakeholders involved all consider that a success or failure of the program? This example brings up three critical issues in outcome evaluations. One of the critical issues in outcome evaluations is related to understanding for whom does the program work best and under what conditions. In other words, a more interesting and important question, rather than just asking whether a program works, would be to ask, "Who are those 50% of people who remained abstinent from drug use 12 months after completing the program, and how do they differ from the 50% who relapsed?" It is not unusual for some evaluation questions to need a combination of both process and impact evaluation methodologies. For example, if it turned out that results of a particular evaluation showed that the program was not effective (impact), then it might be useful to know why it was not effective (process).

In such cases, it would be important to know how the program was implemented, what changes were made in the program during the implementation, what problems were experienced during the implementation, and what was done to overcome those problems. Another important issue in outcome evaluation has to do with the timing of measuring the outcomes. Outcome effects are usually measured after treatment or postintervention. These effects may be either short term or long term. Immediate outcomes, or those generally measured at the end of the treatment or intervention, might or might not provide the same results as one would get later in a 6- or 12-month follow-up, as highlighted in the example above. The third important issue in outcome evaluation has to do with what specific measures were used. Is abstinence, for example, the only measure of interest, or is reduction in use something that might be of interest? Refraining from criminal activity or holding a steady job may also be an important goal of a substance abuse program. If we only measure abstinence, we would never know about other kinds of outcomes the program may affect. These last two issues in outcome evaluations have to do with the evaluation methodology and analysis and are addressed in more detail below.
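
The 90% and 50% follow-up example can be worked through in a few lines of code. The sketch below is only an illustration: the ten client records, the column order, and the intake-employment flag are all invented here and do not come from any study. It computes the abstinence rate at each follow-up and then asks how clients who stayed abstinent differ from those who relapsed.

```python
# Hypothetical follow-up records, invented purely for illustration.
# Columns: (client_id, abstinent_30d, abstinent_12m, employed_at_intake)
rows = [
    (1, True, True, True),
    (2, True, True, True),
    (3, True, True, True),
    (4, True, True, True),
    (5, True, True, False),
    (6, True, False, True),
    (7, True, False, False),
    (8, True, False, False),
    (9, True, False, False),
    (10, False, False, False),
]


def rate(flags):
    """Return the share of True values in an iterable of booleans."""
    flags = list(flags)
    return sum(flags) / len(flags)


# Same outcome measure, two different measurement points.
rate_30d = rate(r[1] for r in rows)
rate_12m = rate(r[2] for r in rows)

# "For whom does the program work best?" Split by 12-month status
# and compare one (assumed) client characteristic across the groups.
sustained = [r for r in rows if r[2]]
relapsed = [r for r in rows if not r[2]]
employed_sustained = rate(r[3] for r in sustained)
employed_relapsed = rate(r[3] for r in relapsed)

print(f"Abstinent at 30 days:  {rate_30d:.0%}")
print(f"Abstinent at 12 months: {rate_12m:.0%}")
print(f"Employed at intake, still abstinent: {employed_sustained:.0%}")
print(f"Employed at intake, relapsed:        {employed_relapsed:.0%}")
```

With real data, a difference like the one between the two employment shares would be a lead to examine further, not proof that employment causes sustained abstinence.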

Determining the Methods and Analysis That Need to Be Used for the Program Evaluation

The next step in the evaluation process is to determine the evaluation design. There are several interrelated steps in this process, including determining the (a) sources of data, (b) research design, (c) measures, (d) analysis of change, and (e) cost-benefit assessment of the program.

Sources of Data

Several main sources of data can be used for evaluations, including qualitative information and quantitative information.

Qualitative Data Sources

Qualitative data sources are often used in process evaluations and might include observations, analysis of existing program documents such as policy and procedure manuals, in-depth interview data, or focus group data. There are, however, trade-offs when using qualitative data sources. On the positive side, qualitative evaluation data provide an "in-depth" snapshot of various topics such as how the program functions, what staff think are the positive or negative aspects of the programs, or what clients really think of the overall program experiences. Reporting clients' experiences in their own words is a characteristic of qualitative evaluations. Interviews are good for collecting qualitative or sensitive data such as values and attitudes. This method requires an interview protocol or questionnaire. These usually are structured so that respondents are asked questions in a specific order, but they can be semistructured so that there are fewer topics, and the interviewer has the ability to change the order based on a "reading" of the client's responses. Surveys can request information of clients by mail, by telephone, or in person.

They may or may not be self-administered. So, besides considering what data are desired, evaluators must be concerned with pragmatic considerations regarding the best way in which to collect the desired data. Focus groups also offer insight into certain aspects of the program or program functioning; participants add their input, and input is interpreted and discussed by other group members. This discussion component may provide an opportunity to uncover information that might otherwise remain undiscovered such as the meaning of certain things to different people. Focus groups typically are small informal groups of persons asked a series of questions that start out very general and then become more specific. Focus groups are increasingly being used to provide evaluative information about human services. They work particularly well in identifying the questions that might be important to ask in a survey, in testing planned procedures or the phrasing of items for the specific target population, and in exploring possible reactions to an intervention or a service.

On the other hand, qualitative studies tend to use small samples, and care must be used in analyzing and interpreting the information. Furthermore, although both qualitative and quantitative data are subject to method bias and threats to validity, qualitative data may be more sensitive to bias depending on how participants are selected to be interviewed, the number of observations or focus groups, and even subtleties in the questions asked. With qualitative approaches, the evaluator often has less ability to account for alternative explanations because the data are more limited. Making strong conclusions about representativeness, validity, and reliability is more difficult with qualitative data compared to something like an average rating of satisfaction across respondents (a quantitative measure). Yet, an average rating does not tell us much about why participants are satisfied with the program or why they may be dissatisfied with other aspects of the program. Thus, it is often imperative to use a mixture of qualitative and quantitative information to evaluate a program.

Quantitative Data Sources

Two main types of quantitative data sources can be used for program evaluations: secondary data and original data.

Secondary Data. One option for obtaining needed data is to use existing data. Collecting new data often is more expensive than using existing data. Examining the data on hand and already available always is a good first step. However, the evaluator might want to rearrange or reassemble the data, for example, dividing it by quarters or combining it into 12-month periods that help to reveal patterns and trends over time. Existing data can come from a variety of places, including the following:

Client records maintained by the program: These may include a host of demographic and service-related data items about the population served.

Program expense and financial data: These can help the evaluator to determine whether one intervention is much more expensive than another.

Agency annual reports: These can be used to identify trends in service delivery and program costs. The evaluator can compare annual reports from year to year and can develop graphs to easily identify trends with clientele and programs.

Databases maintained by the state health department and other state agencies. Public data such as births, deaths, and divorces are available from each state. Furthermore, most state agencies produce annual reports that may reveal the number of clients served by program, geographic region, and on occasion, selected sociodemographic variables (e.g., race or age).

Local and regional agencies. Planning boards for mental health services, child protection, school boards, and so forth may be able to furnish statistics on outpatient and inpatient services, special school populations, or child abuse cases.

The federal government. The federal government collects and maintains a large amount of data on many different issues and topics. State and national data provide benchmarks for comparing local demographic or social indicators to national-level demographic or social indicators. For instance, if you were working as a cancer educator whose objective is to reduce the incidence of breast cancer, you might want to consult cancercontrolplanet.cancer.gov. That Web site will furnish national-, state-, and county-level data on the number of new cancer cases and deaths. By comparison, it will be possible to determine if the rate in one county is higher than the state or national average. Demographic information about communities can be found at www.census.gov.

Foundations. Certain well-established foundations provide a wealth of information about problems. For example, the Annie E. Casey Foundation provides an incredible Kids Count Data Book that provides an abundance of child welfare-related data at the state, national, and county level. By using their data, you could determine if infant mortality rates were rising, teen births were increasing, or high school dropouts were decreasing. You can find the Web site at www.aecf.org.
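
Rearranging existing data into quarters or 12-month periods, as suggested at the start of this section, is mostly bookkeeping. The sketch below assumes the agency already has monthly counts of clients served; the counts, the variable names, and the `by_quarter` helper are hypothetical and stand in for whatever the agency's records actually contain.

```python
from collections import defaultdict

# Hypothetical monthly counts of clients served, keyed by (year, month).
monthly_clients = {
    (2023, 1): 40, (2023, 2): 35, (2023, 3): 45,
    (2023, 4): 50, (2023, 5): 55, (2023, 6): 60,
    (2023, 7): 30, (2023, 8): 25, (2023, 9): 35,
    (2023, 10): 70, (2023, 11): 65, (2023, 12): 75,
}


def by_quarter(monthly):
    """Sum monthly counts into (year, quarter) totals."""
    totals = defaultdict(int)
    for (year, month), count in monthly.items():
        quarter = (month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        totals[(year, quarter)] += count
    return dict(totals)


quarterly_clients = by_quarter(monthly_clients)
annual_total = sum(monthly_clients.values())

for (year, quarter), total in sorted(quarterly_clients.items()):
    print(f"{year} Q{quarter}: {total} clients")
print(f"12-month total: {annual_total} clients")
```

In this made-up series the third quarter dips well below the others, a pattern that is harder to spot when scanning twelve separate monthly figures.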

If existing data cannot be used or cannot answer all of the evaluation questions, then original data must be collected.

Original Data Sources. There are many types of evaluation designs from which to choose, and no single one will be ideal for every project. The specific approach chosen for the evaluation will depend on the purpose of the evaluation, the research questions to be explored, the hoped-for or intended results, the quality and volume of data available or needed, and staff, time, and financial resources. The evaluation design is a critical decision for a number of reasons. Without the appropriate evaluation design, confidence in the results of the evaluation might be lacking. A strong evaluation design minimizes alternative explanations and assists the evaluator in gauging the true effects attributable to the intervention. In other words, the evaluation design directly affects the interpretation that can be made regarding whether an intervention should be viewed as the reason for change in clients' behavior. However, there are trade-offs with each design in the credibility of information, causality of any observed changes, and resources. These trade-offs must be carefully considered and discussed with the program staff. Quantitative designs include surveys, pretest-posttest studies, quasi-experiments with nonequivalent control groups, longitudinal designs, and randomized experimental designs. Quantitative approaches transform answers to specific questions into numerical data. Outcome and impact evaluations nearly always are based on quantitative evaluation designs. Also, sampling strategies must be considered as an integral part of the research design. Below is a brief overview of the major types of quantitative evaluation designs. For an expanded discussion of these topics, refer to Royse, Thyer, Padgett, and Logan (2005).

Research Design

Cross-Sectional Surveys

A survey is limited to a description of a sample at one point in time and provides us with a "snapshot" of a group of respondents and what they were like or what knowledge or attitudes they held at a particular point in time. If the survey is to generate good generalizable data, then the sampling procedures must be carefully planned and implemented. A cross-sectional survey requires rigorous random sampling procedures to ensure that the sample closely represents the population of interest. A repeated survey is similar to a cross-sectional study but collects information at two or more points in time from the same respondents. A repeated (longitudinal) survey is effective at measuring changes in facts, attitudes, or opinions over a course of time.
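
One way to picture the random sampling step is a simple random draw from a complete client list (the sampling frame). This is a minimal sketch under assumptions made here: the 500 client identifiers, the sample size of 50, and the `draw_simple_random_sample` helper are hypothetical, and a real evaluation might call for a stratified or cluster sample instead.

```python
import random


def draw_simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    Every member of the frame has the same chance of selection.
    Passing a seed makes the draw reproducible for auditing.
    """
    rng = random.Random(seed)
    return rng.sample(sampling_frame, sample_size)


# Hypothetical sampling frame: identifiers for 500 current clients.
frame = [f"client-{i:03d}" for i in range(1, 501)]

survey_sample = draw_simple_random_sample(frame, 50, seed=13)
print(f"Selected {len(survey_sample)} of {len(frame)} clients")
```

The quality of the sample depends on the frame: if the client list omits people who dropped out early, no amount of random drawing will make the survey represent them.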

Pretest-Posttest Designs (Nonexperimental)

Perhaps the most common quantitative evaluation design used in social and human service agencies is the pretest-posttest. In this design, a group of clients with some specific problem or diagnosis (e.g., depression) is administered a pretest prior to the start of intervention. At some point toward the end or after the intervention, the same instrument is administered to the group a second time (the posttest). The one-group pretest-posttest design can measure change, but the evaluator has no basis for attributing change solely to the program. Confidence about change increases and the design strengthens when control groups are added and when participants are randomly assigned to either a control or experimental condition.
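
The arithmetic behind a one-group pretest-posttest comparison is short. In the sketch below the six pairs of scores are invented and stand for a hypothetical depression measure on which lower scores are better; nothing here comes from an actual program.

```python
from statistics import mean

# Hypothetical scores for the same six clients, in the same order.
# Lower scores mean fewer symptoms on this made-up measure.
pretest = [22, 30, 27, 18, 25, 29]
posttest = [15, 24, 27, 12, 20, 23]

# Change for each client: posttest minus pretest.
changes = [post - pre for pre, post in zip(pretest, posttest)]
mean_change = mean(changes)
improved = sum(1 for c in changes if c < 0)

print(f"Individual changes: {changes}")
print(f"Mean change: {mean_change}")
print(f"Clients who improved: {improved} of {len(changes)}")
```

The calculation shows that scores dropped, but, as noted above, it gives no basis for saying the program caused the drop; clients might have improved over the same period without any intervention.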

Quasi-Experimental Designs

Also known as nonequivalent control group designs, quasi-experiments generally use comparison groups whereby two similar groups are selected and followed for a period of time. One group typically receives some program or benefit, whereas the other group (the control) does not. Both groups are measured and compared for any differences at the end of some time period. Participants used as controls may be clients who are on a waiting list, those who are enrolled in another treatment program, or those who live in a different city or county. The problem with this design is that the control or comparison group might not, in fact, be equivalent to the group receiving the intervention. Comparing Ocean View School to Inner City School might not be a fair comparison. Even two different schools within the same rural county might be more different than similar in terms of the learning milieu, the proportion of students receiving free lunches, the number of computers and books in the school library, the principal's hiring practices, and the like. With this design, there always is the possibility that whatever the results, they might have been obtained because the intervention group really was different from the control group. However, many of these issues can be addressed, either by collecting the relevant information and controlling for it in the statistical analysis or, at the least, by taking it into account when interpreting the results. Even so, this type of study does not provide proof of cause and effect, and the evaluator always must consider other factors (both known and measured and unknown or unmeasured) that could have affected the study's outcomes.

Longitudinal Designs

Longitudinal designs are a type of quasi-experimental design that involves tracking a particular group of individuals over a substantial period of time to discover potential changes due to the influence of a program. It is not uncommon for evaluators to want to know about the effects of a program after an extended period of time has passed. The question of interest is whether treatment effects last. These studies typically are complicated and expensive in time and resources. In addition, the longer a study runs, the higher the expected rate of attrition from clients who drop out or move away. High rates of attrition can bias the sample.

Randomized Experimental Designs

In a true experimental design, participants are randomly assigned to either the control or treatment group. This design provides a persuasive argument about causal effects of a program on participants. The random assignment of respondents to treatment and control groups helps to ensure both groups are equivalent across key variables such as age, race, area of residency, and treatment history. This design provides the best evidence that any observed differences between the two groups after the intervention can be attributed to the intervention, assuming the two groups were equal before the intervention. Even with random assignment, group differences preintervention could exist, and the evaluator should carefully look for them and use statistical controls when necessary. One word of warning about random assignment is that key program stakeholders often view random assignment as unethical, especially if they view the treatment program as beneficial. One outcome of this difficulty of accepting random assignment is that staff might have problems not giving the intervention they believe is effective to specific needy clients or to all of their clients instead of just to those who were randomly assigned. If they do succumb to this temptation, then the evaluation effort can be unintentionally sabotaged. The evaluator must train and prepare all of those individuals involved in the evaluation to help them understand the purpose and importance of the random assignment. Random assignment, more than any other procedure, provides the evidence that the treatment really does benefit the clients.
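
One way random assignment and a preintervention balance check might be carried out is sketched below in Python. The client roster, the ages, and both function names are hypothetical, introduced only for illustration. A real evaluation would check several baseline variables and follow whatever assignment protocol the program and its review board approve.

```python
import random
import statistics

def randomly_assign(client_ids, seed=None):
    """Shuffle clients and split them into treatment and control groups."""
    rng = random.Random(seed)  # a recorded seed lets the assignment be audited
    shuffled = list(client_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

def baseline_gap(groups, baseline):
    """Difference in group means on one preintervention variable."""
    treat = statistics.mean(baseline[c] for c in groups["treatment"])
    ctrl = statistics.mean(baseline[c] for c in groups["control"])
    return treat - ctrl

# Hypothetical roster: 20 clients with made-up ages (21 through 40).
ages = {f"client_{i:02d}": 20 + i for i in range(1, 21)}
groups = randomly_assign(ages.keys(), seed=13)
age_gap = baseline_gap(groups, ages)
print(f"treatment minus control mean age: {age_gap:.1f} years")
```

Because assignment is left to the random number generator rather than to staff judgment, no client's perceived need can influence which group he or she lands in. A large baseline gap on a key variable would be a signal to apply the statistical controls mentioned above.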

Sampling Strategies and Considerations

When the client population of interest is too large to obtain information from each individual member, a sample is drawn. Sampling allows the evaluator to make predictions about a population based on study findings from a set of cases. Sampling strategies can be very complex. If the evaluator needs the type of precision afforded by a probability sample in which there is a known level of confidence and margin of error (e.g., 95% confidence, plus or minus 3 percentage points), then he or she might need to hire a sampling consultant. A consultant is particularly recommended when the decisions about the program or intervention are critical such as in drug research or when treatments could have potentially harmful side effects. However, there is a need to recognize the trade-offs that are made when determining sampling strategy and sample size. Large samples can be more accurate than smaller ones, yet they usually are much more expensive. Small samples can be acceptable if a big change or effect is expected. As a rule, the more critical the decision, the larger (and more precise) the sample should be.

There are two main categories of sampling strategies from which the evaluator can choose: probability sampling and nonprobability sampling. Probability sampling imposes statistical rules to ensure that unbiased samples are drawn. These samples normally are used for impact studies. Nonprobability or convenience sampling is less complicated to implement and is less expensive. This type of sampling often is used in process evaluations. With probability sampling, the primary idea is that every individual, object, or institution in the population under study has a chance of being selected into the sample, and the likelihood of the selection of any individual is known. Probability sampling provides a firm basis for generalizing from the sample to the population. Nonprobability samples severely reduce the evaluator's ability to generalize the results of the study to the larger population. The evaluator must balance the need for scientific rigor against convenience and often limited resources when determining sample size. If a major decision is being based on data collected, then precision and certainty are critical. Statistical precision increases as the sample size increases. When differences in the results are expected to be small, a larger sample guards against confounding variables that might distort the results of a treatment.
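
To give a sense of what a stated level of confidence and margin of error imply for sample size, the Python sketch below applies the common textbook formula for estimating a proportion from a simple random sample, n = z^2 * p(1 - p) / e^2, with an optional finite population correction. The formula and the function name are not taken from the chapter. The sketch also assumes the most conservative case (p = 0.5), and more complex sampling designs would call for different calculations, which is one reason a consultant may be needed.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p=0.5, population=None):
    """Approximate simple-random-sample size needed to estimate a proportion."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction for small client populations.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

# 95% confidence, plus or minus 3 percentage points, very large population.
n_large = sample_size_for_proportion(margin=0.03)
# Same precision when the whole client population is a hypothetical 5,000.
n_small_pop = sample_size_for_proportion(margin=0.03, population=5000)
print(n_large, n_small_pop)
```

Under these assumptions, halving the margin of error roughly quadruples the required sample, which illustrates the trade-off between precision and cost described above.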

Measures

The next important method decision is to determine how best to measure the variables of interest needed to answer the evaluation questions. These will vary from evaluation to evaluation, depending on the questions being asked. In one project, the focus might be on the outcome variable of arrests (or rearrests) so as to determine whether the program reduced criminal justice involvement. In another project, the outcome variable might be number of hospitalizations or days of hospitalization. Once there is agreement on the outcome variables, objective measures for those variables must be determined. Using the example of the drug court program above, the decisions might include the following: How will abstinence be measured? How will reduction in substance use be measured? How will criminal behavior be measured? How will employment be measured?

This may seem simple at first glance, but there are two complicating factors. First, there are a variety of ways to measure something as simple as abstinence. One could measure it by self-report or by actually giving the client a drug test. When looking at reduction of use, the issue of measurement becomes a bit more complicated. This will likely need to be self-report and some kind of comparison (either the same measures must be used with the same clients before and after the program [this being the best way] or the same measure must be used with a control group of some kind [like program dropouts]). The second complicating factor in measurement is determining what other constructs need to be included to better understand "who benefits from the program the most and under what circumstances" and how those constructs are measured. Again, using the drug court program as an example, perhaps those clients who are most depressed, have the most health problems, or have the most anxiety do worse in drug court programs because the program may not address co-occurring disorders. If this is the case, then it will be important to include measures of depression, anxiety, and health. However, there are many different measures for each of these constructs, and different measures use different timeframes as points of reference.

In other words, some depression measures ask about 12-month periods, some ask about 2-week periods, and some ask about 30-day periods. Is one instrument or scale better than another for measuring depression? What are the trade-offs relative to shorter or longer instruments? (For example, the most valid instrument might be so long that clients will get fatigued and refuse to complete it.) Is it better to measure a reduction in symptoms associated with a standardized test or to employ a behavioral measure (e.g., counting the number of days that patients with chronic mental illness are compliant with taking their medications)? Is measuring attitudes about drug abuse better than measuring knowledge about the symptoms of drug addiction? Evaluators frequently have to struggle with decisions such as these and decide whether it is better to use instruments that are not "perfect" or to go to the trouble of developing and validating new ones. When no suitable instrument or available data exist for the evaluation, the evaluator might have to create a new scale or at least modify an existing one.

If an evaluator revises a previously developed measure, then he or she has the burden of demonstrating that the newly adapted instrument is reliable and valid. Then, there are issues such as the reliability of data obtained from clients. Will clients be honest in reporting actual drug and alcohol use? How accurate are their memories?

A note must be made here about a special case of program evaluation: evaluating prevention programs. Evaluation of prevention programs is especially challenging because the typical goal of a prevention program is to prevent a particular problem or behavior from developing. The question then becomes, "How do you measure something that never occurs?" In other words, if the prevention program is successful, the problem will not develop, but it is difficult to determine with any certainty that the problem would have developed in the first place absent the prevention program. Thus, measures become very important as well as the design (such as including a control group).

Evaluators use a multitude of methods and instruments to collect data for their studies. A good strategy is to include multiple measures and methods if possible, especially when random assignment is not possible. That way, one can possibly look for convergence of conclusions across methods and measures.

Analysis of Change

After the data are collected, the evaluator is faced with a sometimes difficult question of how to determine whether change has occurred. And, of course, there are several considerations within this overall decision as well. One of the first issues to be decided is what the unit of analysis will be. The unit of analysis refers to the person or things being studied or measured in the evaluation of a program. Typically, the basic unit of analysis consists of individual clients but also may be groups, agencies, communities, schools, or even states. For example, an evaluator might examine the effectiveness of a drug prevention program by looking for a decrease in drug-related suspensions or disciplinary actions in high schools in which the program was implemented. In that instance, schools are the primary unit of analysis. Another evaluator might be concerned only with the attitudes toward drugs and alcohol of students in one middle school; in that situation, individuals would be the unit of analysis. The smallest unit of analysis from which data are gathered often is referred to as a case. The unit of analysis is critical for determining both the sampling strategy and the data analysis. The analysis will also be determined by the research design such as the number of groups to be analyzed, the type of dependent variable (categorical vs. continuous), the control variables that need to be included, and whether the design is longitudinal. The literature on similar program evaluations is also useful to examine so that analysis plans can consider what has been done in the past. The analysis phase of the evaluation is basically the end product of the evaluation activities. Therefore, a careful analysis is critical to the evaluation, the interpretation of the results, and the credibility of the results. Analysis should be conducted by somebody with adequate experience in statistical methods and statistical assumptions, and limitations of the study should be carefully examined and explained to program stakeholders.

Cost-Benefit Analysis

While assessing program outcomes is obviously necessary to gauge the effectiveness of a program, a more comprehensive understanding of program "success" can be attained by examining program costs and economic benefits. In general, economic costs and benefits associated with specific programs have received relatively limited attention. One of the major challenges in estimating costs of some community-based social programs is that standard cost estimation procedures do not always reflect the true costs of the program. For example, a drug court program often combines both criminal justice supervision and substance abuse treatment in a community-based environment. And in order for drug court programs to work effectively, they often use many community and outside agency resources that are not necessarily directly paid for by the program. For example, although the drug court program may not directly pay for the jail time incurred as part of client sanctions, jail time is a central component in many drug court programs. Thus, jail costs must be considered a drug court program cost. A comprehensive economic cost analysis would include estimates of the value of all resources used in providing the program. When resources are donated or subsidized, the out-of-pocket cost will differ from the opportunity cost of the resources for a given program. Opportunity costs take into account the forgone value of an alternative use for program resources. Other examples of opportunity costs for the drug court program may include the time and efforts of judges, police officers, probation officers, and prosecutors.


Including costs for which the program may not explicitly pay presents an interesting dilemma. The dilemma primarily stems from the trade-off in presenting only out-of-pocket expenditures for a program (thus the program will have a lower total cost) or accurately reflecting all of the costs associated with the program regardless of whether those costs are paid out of pocket (implying a higher total program cost). Furthermore, when agencies share resources (e.g., shared overhead costs), the correct proportion of these resources that are devoted specifically to a program must be properly specified. To date, there has been little discussion in the literature about estimating the opportunity cost of programs beyond the out-of-pocket costs. Knowing which costs to include and what value to place on certain services or items that are not directly charged to the program can be complicated.

A comprehensive analysis of economic benefits also presents challenges. The goal of an economic benefit analysis is to determine the monetary value of changes in a range of program outcomes, mainly derived from changes in client behavior as a result of participating in the program. When estimating the benefits of a program such as drug court, one of the most obvious and important outcomes is the reduction in criminal justice costs (e.g., reduced incarceration and supervision), and these are traditionally the only sources of benefits examined in many drug court evaluations. However, drug court programs often have a diverse set of goals in addition to reducing criminal justice costs. For example, drug court programs often focus on helping the participants become more productive in society. This includes helping participants take responsibility for their financial obligations such as child support. In addition, employment is often an important program goal for drug court clients. If the client is working, he or she is paying taxes and is less likely to use social welfare programs. Thus, the drug court program potentially reduces several different categories of costs that might have accrued had program participants not received treatment. These "avoided" costs or benefits are important components to a full economic evaluation of drug court programs. So, although the direct cost of the program usually is easily computed, the full costs and the benefits are more difficult to convert into dollars. For example, Logan et al. (2004) found that the average direct cost per drug court treatment episode for a graduate was $3,319, whereas the opportunity cost per episode was $5,132.

These differences in costs due to agency collaboration highlight the importance of clearly defining the perspective of the cost analysis. As discussed earlier, the trade-off in presenting only out-of-pocket expenditures for a program or accurately reflecting all of the costs associated with the program regardless of whether those costs are paid out of pocket is an important distinction that should be considered at the outset of every economic evaluation. On the benefit side of the program, results suggest that the net economic benefit was $14,526 for each graduate of the program. In other words, this translates to a return of $3.83 in economic benefit for every dollar invested in the drug court programs for graduates. Obviously, those who dropped out of the program before completing did not generate as large a return. However, results suggest that when both graduates and terminators were examined together, the net economic benefit of any drug court experience amounted to $5,446 per participant. This translates to a return of $2.71 in economic benefit for every dollar invested in the drug court programs. When looking at the cost-benefit analysis of programs or comparing these costs and benefits across programs, it is important to keep in mind that cost-benefit analysis may be done very differently, and a careful assessment of the methods must be undertaken to ensure comparability across programs.
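
The arithmetic behind a return-per-dollar figure can be sketched in Python using the per-graduate numbers quoted above (a net economic benefit of $14,526 and an opportunity cost of $5,132 per episode). The chapter does not state how the $3.83 figure was computed. The sketch assumes it is gross benefit (net benefit plus opportunity cost) divided by opportunity cost, a formula that happens to reproduce the reported value; the function name is invented for this example.

```python
def benefit_cost_ratio(net_benefit, cost):
    """Gross economic benefit returned per dollar of program cost.

    Assumes gross benefit = net benefit + cost. This is an assumption,
    not a formula stated in the chapter.
    """
    if cost <= 0:
        raise ValueError("cost must be positive")
    gross_benefit = net_benefit + cost
    return gross_benefit / cost

# Per-graduate figures quoted in the text.
graduate_ratio = benefit_cost_ratio(net_benefit=14_526, cost=5_132)
print(f"return per dollar invested (graduates): ${graduate_ratio:.2f}")

# The same net benefit set against the direct (out-of-pocket) cost only.
direct_cost_ratio = benefit_cost_ratio(net_benefit=14_526, cost=3_319)
```

Dividing by the direct cost of $3,319 instead of the opportunity cost gives a noticeably higher ratio, which illustrates why the cost perspective must be defined before results are compared across programs.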