
824 / Professional Practice

ment scholars in dialogues on important and timely issues involving the role, character, and contributions of social science research and scholarship in policymaking and public management.

TOWARD A MORE PUBLIC DISCUSSION OF THE ETHICS OF FEDERAL SOCIAL PROGRAM EVALUATION

Jan Blustein

Abstract

Federal social program evaluation has blossomed over the past quarter century. Despite this growth, there has been little accompanying public debate on research ethics. This essay explores the origins and the implications of this relative silence on ethical matters. It reviews the federal regulations that generally govern research ethics, and recounts the history whereby the evaluation of federal programs was specifically exempted from the purview of those regulations. Through a discussion of a recent evaluation that raised ethical concerns, the essay poses—but does not answer—three questions: (1) Are there good reasons to hold federal social program evaluations to different standards than those that apply to other research?; (2) If so, what ethical standards should be used to assess such evaluations?; and (3) Should a formal mechanism be developed to ensure that federal social program evaluations are conducted ethically? © 2005 by the Association for Public Policy Analysis and Management

Why is there so little public discussion of ethical matters among social program evaluators? This question first occurred to me five years ago, while I was teaching a course in program evaluation here at the Wagner School. I wanted to include a session on the ethics of evaluation, and was having a hard time finding good material and good cases.

I knew that ethics were not entirely ignored in the field. A few social program evaluators such as Robert Boruch have devoted substantial portions of their careers to ethical issues in experimentation (for example, Boruch, 1997; Boruch & Cecil, 1983). Brief discussions of ethical issues have appeared in textbooks and edited volumes (for instance, Gueron, 2002; Orr, 1999). Some of the relevant professional associations have issued non-binding guidelines regarding good professional conduct, and these pertain in part to ethical matters (for example, American Evaluation Association, 1994). But there has been surprisingly little debate. For instance, an electronic search of the contents of back issues of JPAM yields very little on ethics or the protection of research subjects.

I was intrigued by this paucity of public discussion because I am a physician, and my medical training included a substantial component on research ethics (more on this later). From the perspective of that training, applied social scientists tread on ethically shaky ground. For example, most codes of medical ethics include special protections for research on vulnerable and disempowered populations such as children and the poor. And the subjects of applied social research are often vulnerable and disempowered. Ethical issues are often fiercely debated in the medical community (for example, the debates on the testing of anti-HIV drugs in developing countries; see Lurie & Wolfe, 1997; Varmus & Satcher, 1997).

DOI: 10.1002/pam.20141

In the years since the question occurred to me, I have tried to understand this relative silence. The more I have learned, the more interesting and complex the story has become. This essay distills what I have come to understand, beginning with some history of the debate on scientific research ethics in this country. I describe the so-called Belmont Report (National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, 1979), which articulates the principles that have informed researchers’ thinking about ethics for the past quarter-century. Shortly after the Report was published, its principles were incorporated into regulations governing federally funded biomedical and behavioral research. But, as I will describe, the evaluation of public social programs was explicitly excluded from the purview of Belmont, and subsequently was largely exempted from federal ethical regulation. For better or worse, the evaluation of federal social programs has therefore proceeded under a kind of ethical “honor system.” This raises the question of whether self-regulation has been effective, and whether evaluation rests on firm ethical ground.

In this essay, I explore these issues by using as an example a recent evaluation, the National Job Corps Study (U.S. Department of Labor Employment and Training Administration, 2001). My analysis suggests that the study might not pass muster under the criteria that generally govern research involving human subjects. The essay raises, but does not answer, three questions: (1) Are there good reasons to hold federal social program evaluations to different standards than those that apply to other research?; (2) If so, what ethical standards should be used to assess such evaluations?; and (3) Should a formal mechanism be developed to ensure that federal social program evaluations are conducted ethically?

A CAVEAT

Before I start, I would like to say a bit more about the perspective, training, and experience that I bring to this enquiry. While I have some formal training in ethics, I am not a professional ethicist, nor do I have legal training. Perhaps most importantly, I am not an evaluator. My everyday research is in health policy, and it largely involves the analysis of survey data. I am based in an academic setting. In other words, my work experience is fairly distant from the world of the evaluation firms and federal agencies that undertake much of the work that will be discussed here. As an interested observer, I bring the handicaps and advantages of the outsider.

THE BELMONT REPORT

I began my enquiry by reviewing what I knew about medical research ethics. That field blossomed in the middle of the last century, with the abuses of medical science that occurred during World War II. At the war’s end, the world responded with the Nuremberg Code. Later, in the U.S., the 1960s and 1970s brought public awareness of the abuse of vulnerable subjects at the hands of science in such places as Tuskegee and Willowbrook. Public outcry led to congressional hearings and the passage of the 1974 National Research Act (PL 93–348), which created the landmark National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. That Commission was charged with identifying the boundary between practice and research, and with establishing basic ethical principles that should underlie the conduct of biomedical and behavioral research. The Commission’s major written product was the Belmont Report.

The Report begins by making the distinction between the everyday private activity of providing treatment and the public activity of scientific research. It argues that a fundamental distinguishing characteristic of research is generalizability. Scientific research is conducted to acquire knowledge that is accessible to all. That knowledge is a public good. Because research subjects assume risk in order to generate this good for all, different (and higher) ethical standards apply in science than do in everyday life.

Belmont proposes three principles to guide the conduct of research: respect for persons, beneficence, and justice. Because these principles are fundamental to federal regulation of research involving human subjects, they are described briefly here.1 Also mentioned are applications of the principles to research, as well as areas in which interpretation of the principles has proven problematic. Interpretation is key; as the Belmont commissioners noted, the principles are general guidelines rather than algorithmic solutions to ethical problems in research.

Respect for Persons

This principle holds that subjects should be treated as autonomous agents, with their own perspectives, goals, values, and considered opinions. The notion of informed consent is derived from this principle: The experimenter cannot know whether a subject should choose to participate in a study; only the subject knows his/her own set of preferences. The principle reflects the Kantian imperative to avoid treating people as means to some end.

There has been much discussion of the application of this principle. For example, what constitutes “informed consent”? Scholars have questioned the concept of “informed” (What kind of information must be provided to the subject? How detailed must the disclosure be? How should the facts be presented? How can the investigator ensure that information has been transmitted?), and “consent” (What are the elements of consent, and what constitutes undue influence or coercion? What ethical constraints pertain to experiments with subjects who have limited autonomy?). Resolution of these issues is of course beyond the scope of this paper. However, it is notable that voluntary consent dictates that subjects are free to refuse, and to revert to a state that might be called the “counterfactual”—that is, a world with the same expected risk and benefit structure that prevailed before the offer to participate was tendered. Thus, for example, potential subjects in medical experiments are given assurances that refusal to participate will not jeopardize their relationships with their physicians, their treating hospitals, and so on.

Beneficence

This is a two-part principle, which the Belmont committee articulated as “(1) do not harm and (2) maximize possible benefits and minimize possible harms.” The committee noted that this is not an absolute injunction against exposing subjects to risk (indeed, the scientific enterprise inevitably exposes subjects to risk), but that risk is only justified in proportion to the expected benefits.

1 These rich and complex principles can only be summarized briefly here; interested readers may wish to read the original report (see “References” for the URL), or consult a standard text in the field (the most widely read is by Beauchamp and Childress [2001], now in its 5th edition).


The committee also noted that risks and benefits accrue both to individual subjects and to society at-large. They enjoined researchers to view science as a progressive influence: “In the case of scientific research in general, members of the larger society are obliged to recognize the longer term benefits and risks that may result from the improvement of knowledge and from the development of novel medical, psychotherapeutic, and social procedures.”

The principle of beneficence also provides guidance as to when it is ethical to assign different groups of subjects to different treatments. For the investigator to avoid harming either group of subjects, there must be uncertainty as to whether the treatment or the control group will fare better. Medical ethicists use the term “equipoise” for this uncertainty. The notion of equipoise is often invoked in determining when it is permissible to withhold treatment in the course of research. Many hold that placebo-controlled experiments are ethical only when there is no known effective treatment for the condition in question. In such cases, it is not clear whether the treatment or placebo group will do better. But if there is a known effective treatment, an “equivalence” trial must be conducted. To give a medical example: To evaluate a new drug for the treatment of early-stage breast cancer, a placebo-controlled study would not be fair to subjects, because there is a known effective treatment for early-stage breast cancer. However, it would be ethically justifiable to evaluate the new drug in an equivalence study, in which one group of patients was offered the known effective care plus the new drug, and the other group was offered just the known effective care for cancer.

Some believe that there is an important exception to the rule that placebo-controlled studies should be undertaken only when there is no known effective treatment. If the harm (or the “disease”) is minor and short-lived, then placebo-controlled studies may be undertaken even when known effective treatments exist. For instance, trials of headache medicines or antihistamines may be placebo-controlled, even though there are known effective treatments for headaches and allergies. A very few others would go further and allow placebo-controlled trials even when potential harms are substantial, if there are large societal benefits at stake (for a sense of this debate, see Rothman, Michaels, & Baum, 2000).

As noted above, the principle of beneficence also requires the investigator to maximize possible benefits and minimize possible harms. Balancing harms and benefits is one of the most problematic of ethical requirements. There are many kinds of harms and benefits, and these can be expected to accrue to different parties (individuals, affected communities, society at-large), with various levels of certainty. Trade-offs between individual and societal benefits are particularly problematic; there is no formula for deciding where to strike the balance.

Justice

This refers to distributive justice, or the fair and equitable distribution of the benefits and burdens of research. At the time that the Report was written, there was a long history of poor patients serving as subjects while wealthy patients benefited from medical research. Citing such abuses, the Commissioners emphasized the need to avoid research in which “welfare patients, particular racial and ethnic minorities, or persons confined to institutions are being systematically selected simply because of their easy availability, their compromised position, or their manipulability, rather than for reasons directly related to the problem being studied.”

It bears emphasis that these principles were meant to apply to both biomedical and behavioral research. On the biomedical side, applying the principle of justice has literally changed the face of clinical research. Disadvantaged subgroups no longer bear a disproportionate burden of the risk of experimentation. While experiments can be performed exclusively in vulnerable subpopulations, there is a significant burden on the experimenter to demonstrate that this is justifiable (for example, when the disease in question occurs exclusively in that subpopulation, when members of the population and their advocates agree that the protocol is just and reasonable, and so on).

INCORPORATION OF THE REPORT INTO FEDERAL POLICY; EXCLUSION OF RESEARCH ON PUBLIC SOCIAL PROGRAMS

Consistent with the Commission’s recommendations, the Belmont Report was published as policy by the Secretary of HEW in 1979. Around the same time, the Commission issued a report recommending an institutional mechanism for ensuring the protection of human subjects in scientific research. Institutional Review Boards (IRBs), composed of scientists working in the institution and “at least one member not otherwise affiliated with the institution,” would review research to ensure that the work was conducted in a fashion consistent with respect for persons, beneficence, and justice (Protection of Human Subjects: Institutional Review Board, 1978). Regulations governing research with human subjects, and describing the nature and scope of IRB activities for HEW-sponsored research, were finalized in early 1981, as the aforementioned 45 Code of Federal Regulations 46 (Federal Register, 1981; for a detailed history of the era, see Gray, 1982). Nearly ten years later, the regulations were tailored and adopted by 16 federal agencies as a single general set of provisions known as the “Common Rule” (Federal Register, 1991).

The Common Rule requires that researchers submit detailed work plans to a committee composed of peers and at least one outside member of the community. The IRB reviews those plans to ensure that subjects are being treated in a fashion consistent with the principles articulated in the Belmont Report. The requirement for IRB review is time consuming and sometimes irksome, and many investigators view it as no more than a bureaucratic hurdle. Some IRBs function poorly, sometimes in ways that seriously compromise their effectiveness (Department of Health and Human Services [DHHS], Office of the Inspector General, 1998; Institute of Medicine, 2002). Nonetheless, the presence of formal regulations with explicit criteria, and the accompanying institutional structure, serves as a constant reminder of ethical matters. I believe that this has been a critical factor in keeping awareness of research ethics in the forefront, particularly in the medical research community.

Debate on Applicability to Public Social Program Evaluations

To understand how research on public social programs fits into this picture, we must return to the congressional hearings of 1974. Among other things, those hearings revealed abuses of subjects in Tuskegee, under the sponsorship of the Public Health Service, administratively under HEW. As the hearings progressed, it became clear that unless the Department took initiative, Congress would pass legislation to govern HEW-sponsored research. In a colorful history of the debate, Richard A. Tropp, a high-level staffer in HEW at the time, describes how the agency responded by taking standing NIH human subjects regulations, and applying them to all research conducted under its purview:

Under the gun of imminent congressional passage of the National Research Act, and in order to preempt a possible Senate move to include in it mandatory ethical standards governing federally sponsored research, the secretary of HEW on May 22, 1974, signed a regulation on “Protection of Human Subjects” which essentially transformed the predecessor NIH guidelines into department policy. The regulation was largely a product of the “H” part of HEW, drafted without participation by those HEW agencies which customarily commission social science research products and maintain daily conduct with the social science research world (Tropp, 1982, pp. 391–392).

But adoption of criteria that worked in the NIH context did not work for many in the Department who were involved in social programs. They were accustomed to working without the kind of external oversight that is implied by IRB review. Many HEW staffers ignored the regulation (Tropp, 1982). In response, the first of a series of lawsuits was brought on behalf of subjects in HEW-sponsored social experiments. Crane v. Mathews (417 F. Supp. 532), a 1976 class action by Georgia Medicaid beneficiaries, sought to halt a demonstration project assessing the impact of beneficiary cost sharing on the “overutilization” of “marginally needed” medical care. The plaintiffs argued that cost sharing would put them at risk of forgoing needed care, and thus of poorer health. In its ruling, the court halted the demonstration on the grounds that the Department had not sought IRB review, as was required in the new regulations. By failing to appeal this decision, the federal government essentially converted the requirement for IRB review into law. Confusion and conflict over the IRB issue escalated within HEW in the ensuing years (Tropp, 1982; for a sense of the substance and tone of the discussion, see Rivlin & Timpane, 1975).

By the time that the Belmont Commission issued its final report in 1979, the topic was sufficiently controversial that the Commissioners were unable to reach a consensus. A final footnote to the Belmont Report left the problem to posterity:

Because the problems related to social experimentation may differ substantially from those of biomedical and behavioral research, the Commission specifically declines to make any policy determination regarding such research at this time. Rather, the Commission believes that the problem ought to be addressed by one of its successor bodies (emphasis added).

The issue has never been addressed by any of the successor bodies, from the President’s Commission for the Study of Biomedical and Behavioral Research in the early 1980s, to the current President’s Council on Bioethics.2

Public Social Program Evaluations Exempted

A few years after the Belmont Commission finished its business, the agency (now the Department of Health and Human Services) attempted to settle the ethical question through regulatory means. In a 1982 Notice of Proposed Rulemaking, it identified categories of research that would be exempt from what would later become the Common Rule. One exempt category was social program evaluations, namely:

. . . [r]esearch and demonstration projects which are conducted by or subject to the approval of the Department of Health and Human Services, and which are designed to study, evaluate, or otherwise examine: (i) public benefit or service programs; (ii) procedures for obtaining benefits under those programs; (iii) possible changes in or alternatives to those programs or procedures; or (iv) possible changes in methods or levels of payment for benefits or services under those programs (47 FR 12276; March 22, 1982).

2 This is not to say that national commissions and other august bodies have ignored research in the social and behavioral sciences. In fact, debate on the applicability of the “biomedical” ethical model to social and behavioral research more generally is still very much alive at the national level (for a summary, see Singer & Levine, 2003). But neither national commissions nor other august bodies have addressed the issue posed here, namely, the ethical constraints on evaluations of public social programs and the appropriateness of the exemption under the Common Rule.

In its filing in the Federal Register, the Department did not argue that the exemption was in the best interest of research subjects. Rather, it emphasized pragmatic and legal considerations: External review of social experiments was “protracted, cumbersome and duplicative” (particularly in cases where programs were implemented in multiple states). External review undermined DHHS’s statutory authority to modify program characteristics, and to learn from those modifications. The medical model was not appropriate in the evaluation context. Finally, in reference to ethical matters, the Department suggested that it had special expertise that allowed it to make more informed judgments:

We do not agree with the . . . belief that the “ethical” aspects of research in benefits programs will go unreviewed unless nongovernmental individuals with expertise in the ethics of research participate in consideration of proposed studies. The questions raised by research involving government benefits are significantly different from those raised by biomedical and behavioral research. IRBs are typically constituted to deal with the special ethical and other problems involved in biomedical and behavioral research. In contrast, ethical and other problems raised by research in benefit programs will be addressed by the officials who are familiar with the programs and responsible for their successful operation under state and federal laws (48 FR 9266; March 4, 1983).

In the end, decisions about the ethics of DHHS-sponsored social program evaluations were removed from the public realm. In the coming years, similar positions were adopted by other federal agencies. By 1991, the exemption of social program evaluations became part of the “Common Rule,” adopted by 16 federal agencies. For evaluations conducted by those agencies, there would be no mandatory “reality checks” of ethical issues by outsiders. The protection of human subjects would be an internal concern, or one shared with the evaluation firms that carried out much of the government’s social research agenda.

A THOUGHT EXPERIMENT: APPLYING THE BELMONT PRINCIPLES TO A SOCIAL PROGRAM EVALUATION

As we have seen, the Belmont Commission left open the question of whether assessments of social programs are sufficiently different from other biomedical and behavioral research to warrant different protections for human subjects. The regulations described above suggested that they are. But those regulations were primarily justified on pragmatic and procedural grounds, not ethical ones. One way to shed light on the ethical merits of treating public program evaluations differently is to treat them the same, and see what happens. For example, we can do a thought experiment: What happens when the Belmont principles are applied to the evaluation of a public program? Is that evaluation conducted in a fashion that is consistent with justice, respect for persons, and beneficence, as articulated in Belmont? Were the investigators uncertain as to whether the treatment or control group would fare better? Were subjects harmed? Did the investigators take steps to minimize harm and maximize benefit? Was informed consent obtained? If not, how did the test case fall short? If it fell short, are we content with this state of affairs?

Choosing a case for this thought experiment is somewhat arbitrary, since there are many different contexts and strategies for social program evaluation. As JPAM readers well know, some evaluations are experiments and others are not. Some evaluate programs by offering more; others evaluate programs that offer less. In some cases the programs that are being evaluated were formerly legal entitlements; in other cases they were not.

For this thought experiment, my case is the National Job Corps Study. I selected it because it was the first study (and one of the few) that I found when I initially Google™-searched “ethics of social program evaluation.” While the study is in many ways atypical (it was an experimental evaluation of an existing program that involved denial of services), it has features that make it a good case. It is similar to a medical clinical trial in that it involved random assignment. The evaluation also had a clear mandate, and it had ethical repercussions.

The National Job Corps Study

Let me begin by providing some of the facts of this case, as I have come to understand them through publicly available documents. I will review the program that was evaluated, the context and planning for the study, and some of the key issues that arose in implementation.

The Job Corps Program. In 1964, Job Corps began to offer educational, vocational, and support services to young people to help them to “become more responsible, employable and productive citizens.”3 The program was not a categorical entitlement; a fixed number of slots were funded each year. Recruitment was carried out by independent agencies under contract with Job Corps. Participants were between the ages of 16 and 24, with approximately 70% from communities of color. Because it was a primarily residential program, Job Corps was expensive ($14,000 per participant at the time of the evaluation; Burghardt et al., 2001).

Context for the Study. When the study was being planned in 1993, there had been only one prior evaluation of the program. That non-randomized study had shown increases in earnings and decreases in criminal activity among Job Corps participants. However, the study had been conducted in the 1970s. By the mid-eighties, experimental studies of some job training programs had shown no significant impact (and even negative impacts) on some subgroups of participants. Doubts about the internal validity of non-randomized studies of job training programs had led several expert panels to specifically endorse the random assignment approach based upon the enhanced internal validity of the impact estimates (Betsey, Hollister, & Papageorgiou, 1985; also see Stromsdorfer et al., 1985). The Job Training Partnership Act (JTPA) amendments of 1992 mandated a new evaluation of Job Corps, with a random assignment methodology “if feasible” (29 USC §1732(d)(2)(A)).

Planning the Study Design. In 1993, a panel met to consider the design of the evaluation. At the meetings were the investigators, Department of Labor and OMB staff, Job Corps representatives, and several highly regarded researchers who were not affiliated with the evaluation (Frazer & McConnell, 1993). In weighing the design options, the group recognized ethical obstacles to the random assignment strategy, and acknowledged the need to “minimize the potential for damaging [recruiting] agencies referral relationships and eroding community support for the Job Corps program,” “deal with politically sensitive and particularly needy subgroups of eligible Job Corps applicants,” and “minimize the impacts of the study for those placed in the control group.” However, the group felt that the “problems posed by the random assignment design were outweighed by its methodological rigor and the greater credibility of its results” (Frazer & McConnell, 1993). They unanimously recommended the random assignment approach, with one abstention.

3 Job Corps still exists despite modifications over the years. It is no longer a free-standing entity, but has been subsumed and to some extent reorganized under the Workforce Investment Act.

Getting Cooperation from Staff at the National and Local Levels. The evaluation drew a national sample from over 100 different Job Corps recruiting agencies, and involved more than 1,300 recruiters. Given that scale, the researchers faced a daunting challenge in getting cooperation from senior staff, as well as support from staff at the local level. In their communications, the evaluators stressed that continued program funding depended upon the evaluation being fielded, and that random assignment was the only politically credible evaluation method:

The senior staff understood that the future of the program hinged on the results of the study. Congress showed a keen interest in the study and it had directed DOL, through the Job Training Partnership Act, to evaluate its programs including Job Corps using random assignment, where feasible. Moreover, the previous study, which showed that Job Corps had positive impacts and was cost-effective, was instrumental in increasing funding for the program. . . .

By developing a clear and appealing message, national office staff effectively communicated their commitment to random assignment to Job Corps staff nationwide. They argued that a demonstration of the effectiveness of Job Corps was important for persuading Congress that Job Corps deserves the large investment of public funds it receives. Staff was reminded that, while people who work in Job Corps know the program works, others do not have the same opportunity to observe its success. They acknowledged that random assignment was painful: turning youth away hurt the program’s image in its communities and may harm some individuals who could benefit from Job Corps. However, random assignment was necessary because it was the only way to provide Congress and the public with credible evidence about the success of the program (Burghardt, McConnell, Meckstroth, & Schochet, 1997, pp. 20–21).

Recruitment and Assignment of Subjects. Anticipating that line staff would be reluctant to withhold services from needy applicants, a brochure addressed the question, “Is it fair to deny Job Corps services to some eligible applicants for purposes of finding out whether Job Corps is effective?” In responding in the affirmative, the researchers cited the obligation to taxpayers and program participants. They also suggested that random assignment was justified by a scarcity of program slots:

A large, unfilled need for Job Corps appears to exist: Job Corps serves about 62,000 students annually, yet there are 3 to 4 million economically disadvantaged youths nationwide. During the period of the study, if screeners can increase the number of youths who apply for Job Corps as planned, the number of students actually served by the program will not change, even though some applicants are placed in the control group. The random selection process is fair because all eligible applicants will have an equal chance of being chosen for the program group (Burghardt et al., 1999, Appendix C).

The reference to scarcity (or an “unfilled need”) for Job Corps is noteworthy, since at the time that the study was being planned, Job Corps slots were not scarce. While the ratio of eligible applicants to program slots varied throughout the nation, that ratio was less than one (Burghardt et al., 1993, p. 60). But if an experiment was to be performed, some applicants needed to be turned away. Under the circumstances, this would mean reducing the total number of people receiving program services. The investigators felt that such a reduction would be unethical. So in the study design, they took active measures: they recruited more subjects. By earmarking funds to do so, they ensured that they would have adequate numbers of applicants to turn down, without reducing the number of people receiving services (Burghardt et al., 1993, p. 59).

Random assignment occurred in a series of steps. When applicants met with recruiters to discuss Job Corps, they were also told about the study. It was explained that if they wished to have the chance to participate in the program, they would need to participate in the study; those declining to sign a consent form would not be eligible to enter Job Corps. Those who consented were asked to provide further information that would determine whether they were eligible for the program. Lists of eligibles were communicated to a national office, where random assignment was performed. Sites were then informed of assignments, and recruitment counselors informed control group members of their status. Those placed in the control group were told that they could not enter Job Corps for three years. However, they were free to pursue other employment-related services in their communities, and staff were permitted to refer them to such services. Members of the control group, their parents and advocates were invited to call a toll-free number if they wished to discuss this outcome with evaluation staff. The line was “heavily used”: on average the evaluators fielded 17 calls per day during the sample intake period (Burghardt et al., 1997, p. 14). Sample intake occurred over 13 months, during which time 5,977 subjects (7.3% of the 80,883 eligible applicants) were assigned to the control group.

Study Findings: Impacts of Job Corps. Subjects were followed by interview over a four-year period following random assignment. At the end of that time, the experiences of the treatment and control groups were compared. These comparisons showed that Job Corps participants experienced improved employment and earnings, beginning with the third year of follow-up. By the final year of follow-up, the gain in average earnings per participant was $1,150, or 12%. Job Corps also reduced criminal activity. In the treatment group, the arrest rate was reduced by 16% (about 5 percentage points), with much of the reduction during the first year, when participants were in the program (Burghardt et al., 2001). A cost-benefit analysis showed a favorable profile for the program, with benefits to society of $17,000 per participant in 1995 dollars (McConnell & Glazerman, 2001).

Evidence of Harms Caused by the Study. A report by the evaluators suggests that in most cases the experiment caused little harm to recruiters and their relations with potential subjects. But there were exceptions. One-third of subjects were recruited by a counselor who said that random assignment “caused significant problems that made recruiting more difficult” (Burghardt et al., 1999). One-quarter were recruited by a counselor who said that “at least one referral source stopped making referrals because of the study.” Twenty-seven percent of counselors reported that an applicant delayed application because of random assignment. Whether some would-be applicants failed to apply to Job Corps due to the experiment is unknown.

The study may have caused other harms. Three years into the evaluation, a class action suit was filed on behalf of those randomized to the control group (Gillespie v. Reich, U.S. District Court for the District of Montana No. CV 96–180–M–DWM). In their complaint, the plaintiffs alleged damages including diminished lifetime earning capacity, an increased risk of incarceration, and an increased risk of having to rely on family for financial support as a result of being denied access to the program. The legal grounds were that (1) barring subjects from participating in Job Corps was a violation of due process; (2) the study was not conducted in a fashion consistent with the Common Rule; and (3) DOL had violated administrative procedure by failing to give public notice of the study and the concomitant change in Job Corps rules. Montana District Judge Donald H. Molloy reportedly had strong ethical misgivings about the study procedures (personal communication, V. Constantino, Attorney for the Department of Labor, December 17, 2001); however, the judge declined to discuss the case with me as a matter of policy. In any event, as often happens when cases raise wider ethical or social concerns, the judge’s ruling for the plaintiffs was on rather narrow legal grounds: By undertaking the study without giving public notice of a change in the program rules, the Department had violated the Administrative Procedure Act. In the settlement, the government agreed to locate every subject who had been randomized to the control group and to offer them Job Corps, if they were eligible. Additionally, DOL paid $1,000 to each of the 15 control group members named in the suit, for the “time, energy, and resources” that they contributed to the legal proceeding. One discouraged control group member summarized her feelings by saying, “We deserve a settlement and an apology. We were all guinea pigs” (quoted in Price, 1999).

There was little public attention to the ethical issues raised by the lawsuit, or to its outcome. Indeed, the sole mention in the print media was in Mother Jones (Price, 1999). To my knowledge, there was no discussion in the scholarly or practitioner literature.

Evidence of Benefits Derived from the Study. Findings from the National Job Corps study were released to the public in 2001. The resultant impact on program funding and program design is difficult to ascertain. Appropriations to Job Corps had increased modestly and uninterruptedly in nominal dollars throughout the 1990s (U.S. Department of Labor Employment and Training Administration, 2001), and they continued to do so through the Bush administration (The Workforce Alliance, 2004). This continued funding allowed hundreds of thousands of young people to enjoy the benefits of the program. However, I am unable to find evidence that speaks to the extent to which these sustained funding levels and program offerings were driven by the Job Corps study findings, as opposed to prevailing political forces or ideologies. Indeed, as many others have noted, in the absence of the counterfactual it is difficult to make such inferences (see, for example, Greenberg & Mandell, 1991).4

Applying the Three Principles to the National Job Corps Study

With this background, it is possible to start thinking systematically about the ethical aspects of the National Job Corps Study. In the pages that follow, I take each of the Belmont principles in turn, and discuss areas in which there may be tensions between the Principles and the practices of the National Job Corps Study. While many issues could be raised, I have chosen to highlight those that are potentially applicable to a wide range of evaluation settings, in order to stimulate further thought and discussion.

4 History may be helpful by showing how findings have been used in a similar context. For instance, the National JTPA study failed to show positive impacts for youth participants, but did show gains for adults. Subsequently, funding for the youth component was decreased dramatically, and a search was begun for alternatives, consistent with a need to rationalize government expenditures. However, while funding of the adult component continued, program policy surprisingly “moved away from the approaches that were found to be the most effective in the National JTPA Study. Stand-alone OJT and job search assistance [were] abolished and emphasis shifted to classroom training, which [had been shown to be] not cost-effective for adult women and was the least cost-effective strategy for adult men. These changes were made as part of the 1992 amendments, before the results of the National JTPA study became available, but there [was] no effort to reverse them on the basis of the study findings. In fairness to DOL, however, it must be noted that the results for adults were much less clear-cut than the results for youth” (Orr et al., 1996, p. 229). In other words, in the case of the National JTPA study, the connection between evaluation findings and program direction was somewhat loose.

Respect for Persons. The National Job Corps study raises three issues in regard to this principle. These are: (1) the feasibility of informed consent when an evaluation involves the reduction of services, (2) the ethical weight of program scarcity in justifying random assignment, and (3) the importance of the perception of respect on the part of subjects. Each of these pertains to the potential pitfalls of treating subjects as members of a class, rather than as individuals.

As noted above, non-coercion is an important element in informed consent. If the subject chooses to decline participation, s/he should be able to revert to the situation that s/he would have faced, absent the research study. But informed consent in this sense is incompatible with evaluations like the National Job Corps study. As Larry Orr has noted, when experiments evaluate ongoing programs:

. . . the informed consent of the applicant cannot be taken to mean that he or she expects the experiment to convey net positive benefits relative to his or her situation in the absence of the experiment. In th[ese] case[s], the applicant may have received program services in the absence of the experiment. Therefore, refusal to consent, resulting in exclusion from the program, leaves the applicant worse off than he or she would have been in the absence of the experiment. Thus, consent implies only that the applicant prefers some chance of receiving program services to no chance at all (Orr, 1999, p. 22).

In justifying the denial of services, many social experimenters point to countervailing benefits to future program participants (and to taxpayers; more on this later). These benefits may accrue, but this is a utilitarian argument, and one that will not necessarily resonate with current subjects. Some subjects may feel kinship for future program beneficiaries to the extent that they are willing to bear the burdens of research. Others may not. Applicants for advertised programs who learn that they will be denied services in order to benefit others like them would therefore seem justified in feeling that they are being treated as members of a class, rather than as individuals.

A second issue pertains to scarcity as a justification for randomization. Several experimentalists have noted that when program resources are scarce—for example, when funding for a program is limited, or the number of program participants is capped—some of those needing help from the program will need to be denied services. In such cases, a random assignment merely reallocates services or benefits to a different set of participants than those who would have received services without the study. There is no reduction in the number of people served.

This could be an argument for randomization when resources are truly scarce (for example, when there is a scarcity of organs for transplantation). However, as we have seen, such scarcity did not prevail in the National Job Corps Study. On the contrary, applicants to Job Corps were scarce—sufficiently so that the investigators recruited additional subjects in order to ensure that they would have adequate numbers to turn down, without reducing the number of people receiving services. Even with this extra effort, the period of recruitment was extended in order to enroll adequate numbers of subjects in both groups.

How then can we credit the claim that Job Corps slots were scarce? In similar contexts, some evaluators have argued that there is a latent scarcity of program resources in the population. According to this view, there are often program eligibles who “need” program services, but aren’t conscious of their need, or aren’t aware of the opportunity, or just don’t take action. Those requesting program services are therefore but a sample from the larger population of those who might request services. From this perspective scarcity is real, and randomization is fair and justified.

It is notable that program staff and other local sponsors often have a different view of these matters. For example, in response to randomization in an evaluation of another federal program:

Many [program staff and local sponsors] questioned the ethics of assigning volunteers to a control group. Often, they described their program as a virtual entitlement for all eligibles who applied. They felt that by the very act of applying, applicants were demonstrating a special need or worthiness. . . .

Often, individuals at the local level viewed their program as serving everyone who wanted or needed services, even though funding limitations generally result in . . . services being available to no more than 5 to 10 percent of the eligible population. They held this view because recruitment was a persistent problem (Doolittle & Traeger, 1990, pp. 37–38).

Skepticism that applicants are different from non-applicants “by the very act of applying” sounds strange coming from experimentalists, since occult differences between volunteers and non-volunteers are often cited as the experimental raison d’être. But even disregarding the methodologic issue, the question arises as to whether applicants are morally different from non-applicants. In other words, is help-seeking morally relevant? People generally seek help from a program because they recognize that they have a problem and believe that the program can help. In the National Job Corps study, many subjects were put into a help-seeking frame of mind by the experimental apparatus (for instance, through advertisements or other additional outreach efforts). Some were put in this state of mind only to be disappointed when they were randomized to the control group. Of course the magnitude of this disappointment is unknown, and could well be minor. But it is not hard to imagine that if subjects knew that they were actively recruited to participate in the program only to ensure an adequate number of control group subjects, they might feel ill-used. It could be argued that those feelings of ill use would reflect a lack of information: At the time that the study was mounted, there was no solid evidence that Job Corps had a positive impact. But even if evaluators were uncertain about program benefits, applicants probably were not. “Job Corps” sounds like a program that helps people get jobs.

This raises the question of whether subjects’ perceptions—and in particular their sense of being treated fairly—have ethical weight. For example, let us suppose that all of those who were denied services in the National Job Corps Study believed that they were worse off as a result of the denial. Further, suppose that they learned that there was a questionable scarcity of Job Corps slots at the time that they were denied services. If they felt ill-used, would their feelings of ill use have ethical weight? I believe that they would. History has shown that injustice in the name of science is particularly destructive and long-lived; as Judith Gueron has noted, “suspicions of the ethics of researchers run deep” (2002, p. 22). This may be particularly true in communities of color. For example, in the medical arena, distrust of researchers among African Americans has a firm historical basis (Gamble, 1997), is highly prevalent (Corbie-Smith, Thomas, & St. George, 2002), and has been identified as an important barrier to care seeking and participation in clinical trials. To the extent that research subjects are historically marginalized—and in social program evaluations they often are—it is arguable that researchers have an ethical obligation to cultivate and help sustain a view of science as a positive and progressive enterprise. This may mean that researchers need to do more than satisfy themselves that the research is acceptable and fair. They may have an obligation to ensure that the experiment is understood and accepted as fair by the affected community. This is not the same as the logistical imperative to get local “buy in.” It is more complicated, and is likely to be difficult to codify and/or enforce. Nonetheless, an ethical imperative for community consultation has been increasingly recognized among researchers working in developing countries (National Bioethics Advisory Commission, 2001) and those conducting research among vulnerable and marginalized communities domestically.

Beneficence

As was noted above, this is a two-part principle, with the first holding that research should balance risks and benefits, and the second that the research should not harm subjects. Reviewing the Job Corps study protocol, it is clear that investigators made efforts to minimize harms within the constraints of random assignment. First, by undertaking a large study, they were able to assign a relatively small proportion (7.3%) of applicants to the no-treatment group. Second, staff were permitted to refer control subjects to other, non–Job Corps related programs. Third, the investigators established a hot line to explain the study to those who were assigned to the control group.

The investigators might have further increased benefits and minimized losses to the control group had they compensated them for the financial loss of not being assigned to receive Job Corps services. Notably, this sort of “stop loss” protection was afforded to participants in the RAND Health Insurance Study. While increasing beneficence, this kind of compensation can be logistically complex and may compromise the internal validity of a study (for a discussion, see Boruch & Cecil, 1983; also, Boruch, 1997). Perhaps this is why the decision was taken not to compensate the Job Corps study control group at the evaluation’s end.

The second part of beneficence involves avoiding harm to subjects. As noted above, equipoise is a reflection of beneficence in this sense. We have seen that there were grounds for uncertainty about the outcome of the National Job Corps Study. While the evaluation would therefore seem to meet the criteria for equipoise, it is worth raising the more general question of whether equipoise ever fails to prevail in federal social program evaluations. To the extent that programs vary—eligibility criteria are modified, macroeconomic conditions change, or a different set of services is bundled together—one might make the case that equipoise always pertains. Or it might be argued that while there are often good reasons to forecast a particular direction of program effect in evaluating a new iteration of a program, the magnitude of the effect is always uncertain.

Questions about equipoise in this sense are especially vexing in the context of the waiver demonstrations of the pre-welfare reform era, as well as ongoing studies of the impact of TANF. Each study is “new,” investigating a different timeframe, or different eligibility criteria, or a different state, or a different bundle of services. While relatively few of these studies use random assignment, many involve the denial of services or the imposition of sanctions on one group or another. These include practices like sanctioning families with truant children by reducing their already meager monthly family incomes. Many of these families include small children, who presumably bear the brunt of deviant family “behaviors.” It seems difficult to be in plausible equipoise with respect to the impact of some of the more draconian program elements, and even more difficult to imagine subjecting such studies to the stringent protection-of-children aspects of the Common Rule. Yet such work has been the bread and butter of the recent social program evaluation enterprise.

While further discussion of the relationship between beneficence and the evaluation of public programs is beyond the scope of this paper, it seems clear that evaluation as it has been practiced might be difficult to reconcile with beneficence, as it has been articulated in the Common Rule and interpreted in other research contexts.

Justice

As we have seen, this principle has to do with the distribution of the burdens and benefits of research. Research is prima facie unjust if some groups disproportionately bear the burdens and others reap the benefits. Yet over the past 30 years, evaluations have been conducted almost exclusively on public programs that benefit low-income and vulnerable populations. Middle-class benefits like Medicare, the home mortgage deduction, and the college Work-Study programs have been largely untouched. To the extent that participants in social program evaluations assume risk or miss out on desired services, this disparity would seem to raise questions of justice.

As noted previously, utilitarian arguments have been advanced to support the assumption of risk in social program evaluations. Substantial societal benefits can accrue through knowing whether the program is effective or not:

In considering whether experiments are ethical, it is important not to focus too exclusively on the issue of denial of service to controls. Other members of society have a stake in whether the experiment is performed. In particular, failure to obtain reliable estimates of the efficacy of an ongoing program can entail substantial costs to the taxpayers who support it. An ineffective program can waste millions or billions of the taxpayers’ dollars year after year.

Moreover, failure to detect ineffective programs imposes costs on the intended beneficiaries of those programs. Not only do such programs waste participants’ time and create false expectations but also they consume resources that might otherwise be devoted to more effective solutions to the problems the programs were intended to address. Thus “protecting” program beneficiaries from experiments is not necessarily in their best interest (Orr, 1999, p. 21).

Indeed, as has been suggested, the National Job Corps study may have helped rationalize governmental expenditures. But if this benefit carries ethical weight, it is not clear why the criterion should be applied disproportionately to programs benefiting vulnerable Americans. Though we may all favor prudent expenditure, few of us would want to be “human subjects” in studies of the effectiveness of the programs that we depend upon. This is probably true even though we understand that the programs that we use are not optimally configured. We are invested in the status quo, and we make our plans around it—in short, we feel entitled to it.

Looking more critically at this entitlement may be helpful. Some have pointed to program entitlement (that is, the presence of statutes identifying classes of individuals who are “entitled” to receive program services by virtue of their membership in that class) as determining whether and how social programs can be evaluated. For instance, Boruch (1997), Orr (1999), and Gueron (2002) suggest that denial-of-services experiments cannot be conducted for programs that are entitlements in this sense. All note that to do so would mean denying program services to subjects who would receive them otherwise. Some of these authors allow ambiguity as to whether this is an ethical matter or a legal obstacle under the Fourteenth Amendment. But in either case, this focus on program entitlements might explain the disproportionate interest in evaluating programs for the poor, to the extent that programs benefiting the middle class are often entitlements.

But legal entitlement has proved a remarkably elastic concept, particularly as applied to programs targeted to disadvantaged Americans. Lawyer Lucy A. Williams has chronicled this elasticity in the period leading up to the Personal Responsibility and Work Opportunity Reconciliation Act of 1996 (PRWORA), when many states decreased or even eliminated AFDC benefits under federally waivered “experiments” and “demonstration projects” (Williams, 1994). To the extent that waivers disproportionately targeted what would otherwise have been entitlements to vulnerable sub-populations, the problem of distributive justice would seem to remain.

A counter-argument could be that researchers didn’t modify these entitlements; legislators and agencies did. Even if some of the resultant program elements were abhorrent to some, they were nonetheless legal. Why not at least learn from the modifications by evaluating their impacts? Perhaps the most important reason is that linking science with such policies undercuts science as a benevolent and progressive public enterprise—and thus damages the very core of science. And after today’s science is tainted, tomorrow’s science is impossible. Another reason has been articulated by political scientists Evelyn Brodkin and Alexander Kaufman: When science is used as a vehicle to legitimize policy choice, doing science becomes a form of policymaking. In their view, during the period between the 1960s and the 1990s, social program evaluation was a “shadow institution,”

. . . more successful as a vehicle for smuggling in policies piecemeal that could not win approval as national legislation. …[T]his tactic was employed so extensively, extending new programs and requirements so broadly, as to constitute de facto policymaking. In the case of welfare, the exceptions made to national policy in order to conduct the income maintenance experiments and WIN demonstrations were just the prelude to the administrative practice of waiving statutory requirements for virtually any state claiming to test new policies. By the time a new welfare law was enacted in 1996, formally devolving more policymaking authority to the states, some 40 states had already sought to received approval for federal waivers to implement new policies, among them “learnfare,” “family caps,” benefit reductions, and even benefit elimination for entire categories of welfare recipients (Brodkin & Kaufman, 2000, p. 534).

There is a notable lack of a parallel history of waivers and “shadow institution” policymaking for middle-class entitlements.

Reviewing the history of the social experimental field—an important subfield of social program evaluation—one researcher has noted that most studies have been conducted:

. . . in populations that are politically weak. Unwed mothers, public assistance recipients, disadvantaged job-seekers, and unemployed insurance claimants do not have much political clout. Perhaps the ethical objections to experimentation do not occur to members of these populations, although that seems doubtful. It is more likely that people in these groups are too inarticulate and weakly organized to resist research governmental and foundation efforts to test a policy through random assignment (Burtless, 2002, p. 194).

Most likely, then, the choice of programs for evaluation reflects a simple political reality: expenditures on programs for the poor require a different kind of justification in order to maintain funding. But this is precisely because the poor lack political power and influence. To the extent that Belmont enjoins us to hold science to a higher standard, can science that starts with a premise of social inequality meet that standard?

The points in this section converge on big questions: What is the nature of the relationship between social program evaluators and the agencies that fund their projects, and what moral obligations (if any) do evaluators have to ensure justice for their subjects? When evaluating programs under federal support and with federal direction, is it acceptable to perform evaluations just as long as they are consistent with statutes, regulations, and congressional mandates? Are ethical questions put to rest by examining the legal issues? No doubt these matters have been discussed many times inside the evaluation community. But there is scant evidence of that debate from the outside.

CONCLUDING REMARKS

As I have prepared this essay, I have discussed my findings with colleagues from a variety of fields, and have gotten a range of responses. Some have been dismayed by the story of the Job Corps evaluation, finding it at odds with fundamental ethical principles. Others have been nonplussed to learn that I would expend so much effort pursuing what to them seems a matter of trivial ethical import. But most have taken a middle ground. Many have suggested that evaluations of public social programs take place in a different ethical context than biomedical research. And while Belmont was intended to cover both biomedical and behavioral research, it is in fact best suited to the biomedical context. In closing this essay, I would like to summarize these suggestions and present brief comments on each.

SUGGESTION #1: Public social programs are not about life-and-death matters; medical interventions are. The outcomes that are measured in medical experiments are serious. They may include breathing capacity, blood pressure, pain and discomfort, days lost from work, heart attack, and even survival. These outcomes and risks have a complex biological basis that is poorly understood by the typical layperson. In contrast, the outcomes measured in social program evaluations may include earnings, employment, criminal activity, teen pregnancy, and housing status. These are not life-and-death matters. They are social phenomena that are relatively well-understood by the typical layperson, and they vary with life’s vicissitudes. Subjects who miss out on one program have other avenues to improve their outcomes: They can find jobs, refrain from criminal activity, avoid pregnancy, and so on.

In short, the potential harms and risks in social experiments are of small consequence, and pertain to matters over which the subject otherwise has control in his/her daily life. Researchers therefore need be less concerned about beneficence and equipoise.

Comment on Suggestion #1: Research on sick patients facing last-resort options is probably a small fraction of medical research. While I have no data to support this contention, my sense is that the more typical medical study is conducted among healthy volunteers, in order to assess the impact of a drug on a relatively minor symptom. But in any event, is it true that social outcomes are not life-and-death matters? Surely poverty, unemployment, criminal activity, teen motherhood, and substandard housing can lead to suffering on par with that of many serious illnesses. Moreover, there is growing evidence that such social phenomena have an impact on rates of disease and death, with stress as a possible mediator (Berkman & Kawachi, 2000; Marmot & Wilkinson, 1999).

SUGGESTION #2: Social programs have small impacts, medical treatments have large impacts. Even if the outcomes at stake are significant, the impacts of social programs are at best modest. This being so, the relative harms done to different groups are likely to be small in social program evaluations. In this sense, testing social programs is like testing headache medicines or allergy remedies, as described previously. At worst, subjects in one group or another suffer transient and minor harms.

Comment on Suggestion #2: This is an empirical claim that merits further study. However, data on comparative effect sizes are likely scarce, and magnitudes of impacts probably vary according to the types of programs and therapies being evaluated and the outcomes measured. And in any event, the claim that impacts are invariably small and transient would seem to undercut the social program evaluation enterprise.

SUGGESTION #3: Medical researchers can have an advocacy relationship to their subjects, but social program evaluators should not. Medical and social science grew out of different traditions, and operate in different “markets for knowledge” with different standards of professional credibility. Medical experimentation, for example, has a long history growing out of medical practice. Many experimenters are active clinicians, and sometimes their patients are also their subjects. Imperatives like equipoise, informed consent, and the balance of harms and benefits flow naturally from the clinician role.

In contrast, social program evaluation is a new field. The evaluator’s role has grown in opposition to the model of the clinician/researcher, namely, the program director or advocate who makes extravagant claims about program successes. Many evaluators have technical expertise, but few have backgrounds in the delivery of social services. By bringing quantitative skills to bear on issues of program cost and effectiveness, evaluation can “more confidently separate fact from advocacy” (Gueron, 2002, p. 15). The primary market for the facts generated by social program evaluations is administrators and the Congress. If evaluators were to advocate for their subjects, they would lose credibility in this market.

Comment on Suggestion #3: I believe that there is a distinction between advocacy that “contaminates” research (for example, the use of analytic strategies that push results in a particular direction) and advocacy that urges attention to the risks incurred by research subjects. Advocacy in the first sense entails bad research; I am not sure that advocacy in the second sense precludes valid findings.

There is an important wrinkle here. Many evaluations of public social programs are done by firms that compete for government contracts. To the extent that government agencies value the societal benefits of research over individual subjects’ costs, firms’ economic interests may conflict with experimenters’ desire to minimize harm to subjects. In other words, firms that advocate (in the second sense) may lose business. If this is so, then ensuring ethical conduct probably cannot be left to the marketplace. (For an account of the power asymmetry between federal agencies and evaluation firms in another context, see Metcalf, 1998.)

SUGGESTION #4: Changes are made in social programs routinely. These changes afford the opportunity to learn. Why not take advantage of them? Governmental bodies have considerable discretion in the way that programs are set up and executed. For example, school boards have the authority to institute sweeping reforms even when there is little or no evidence that those reforms will have a positive impact. States have the authority to change welfare rules under waiver provisions. Why not learn something in the process?

Comment on Suggestion #4: It is true that government agencies routinely make program changes that may harm some citizens. That is business as usual. The question raised here is whether evaluations of those programs are part of that business as usual, from the ethical perspective. Are researchers performing a purely technical consultative function, or are they scientists with responsibility for the welfare of their subjects?

I have suggested that this may be particularly problematic in two evaluation contexts. The first is evaluations of “waivered” or “demonstration” programs, to the extent that evaluation and program change go hand-in-hand, and evaluation justifies or expedites program change (as in “shadow policymaking”). The second is random assignment evaluations like the National Job Corps Study. In this evaluation context, experimenters take an active role in assignment.

SUGGESTION #5: The alternative to social program evaluation is ignorance. The knowledge that can be gained from evaluation can improve programs, and often determines program funding. Without knowledge of what works, we cannot help those in need of assistance. Programs that don’t work, or don’t work optimally, waste our money and waste the time and opportunity of those who might be better served. Moreover, participation in some social programs has a negative impact, as evaluation of some of the employment programs has shown. Therefore, we may even have an ethical duty to evaluate.

Comment on Suggestion #5: Knowing whether programs work is important, and evaluations that pass ethical muster should be undertaken. I don’t favor ignorance, but I do favor a more clearly articulated sense of what is fair to subjects.

Moreover, as readers of JPAM are well aware, the relationship between empirical findings and program support is at best loose. This is not the place to recap the theoretical and empirical arguments that cast doubt on the rational model of the policy process (Stone, 2001). Of course, most evaluators would acknowledge that their work has had modest and intermittent success in influencing policy. Critics have gone further, noting that policymakers use research findings selectively to support their ideological positions.

SUGGESTION #6: Biomedical research is primarily about maximizing benefit. Federal social program research is primarily about minimizing cost. Medical experiments generally compare the effectiveness of interventions. Although cost comparisons are increasingly performed, the trade-offs between costs and outcomes have rarely been the focus of research. In contrast, evaluations of public benefit programs are often driven by budgetary considerations. When the cost to the federal budget is the primary issue at stake, beneficence becomes less important. This is particularly so when anticipated program benefits are relatively modest.

Comment on Suggestion #6: Historically, this has been a fundamental difference. It may account for much of the differential treatment of human subjects issues in the two arenas.

However, as cost considerations and economic interests begin to dominate medicine, this distinction may blur. For example, many quality improvement (QI) efforts in organizations (hospitals and physician practices, for example) involve cost-quality trade-offs. As providers investigate these trade-offs, they are grappling with how and whether to apply the usual human subjects protections (see, for example, Cassett, Karlawish, & Sugarman, 2000). These developments may lead to change in the way we think about human subjects on the medical side. The medical community may come to see social program evaluators’ methods as cutting-edge.

ARE SHARED NORMS ENOUGH?

Members of the social program evaluation community have told me that many of the issues raised in this essay are not new. Many were hashed out in the 1970s (for a fascinating look back at those discussions, see Rivlin & Timpane, 1975). Three decades of work have solidified the shared ethical norms that now guide the federal social program evaluation enterprise. Those working in evaluation firms have told me that ethical matters are always considered when studies are planned. This consideration is often done on an informal and ad hoc basis. While the large firms have IRBs, those IRBs do not review each case. Nor are IRBs in those firms necessarily constituted under the federal regulations. Thus they are not required to assess proposed work with respect to any generally accepted ethical criteria. Moreover, non-federal IRBs are not obliged to include outside community voices.

Several “old-timers” in the field have acknowledged that it may be time to revisit the ethics of social program evaluation. In the coming years, leadership will be passed to junior investigators who did not live the history of the late 1970s. At the same time, national attention is now focused on the adequacy of human subjects protection in both the biomedical and social/behavioral research spheres (Institute of Medicine, 2002; Panel on Institutional Review Boards, Surveys, and Social Science Research, National Research Council, 2003; for an interesting review, see Singer & Levine, 2003).

What might follow from a debate on the issues raised here? At best, a discussion might yield a greater clarity of purpose, and a more explicit sense of the ethical bounds of research in the very important (and perhaps unique) federal social program evaluation context. Those bounds might be expressed in written principles. However, as the Belmont Report noted, principles “cannot always be applied so as to resolve beyond dispute particular ethical problems.” Principles provide an analytic framework to guide the resolution of ethical problems. Whether an institutional mechanism should ensure adherence to the principles could be debated.

At worst, raising ethical questions will get members of the social program evaluation community to dig in defensively. But just as war is too important to be left to generals, the evaluation of social programs may be too important to be left to evaluators. This is not because these evaluators have wrong ideas or bad intentions. It is because (as the authors of the Belmont Report noted) science is not a private activity. It is an activity undertaken by society at large. As such, it is an activity for which we are all accountable.

ACKNOWLEDGMENTS

I am grateful to Tod Mijanovich, Ellen Schall, Katherine O’Regan, and the members of the Wagner faculty Work In Progress (WIP) group for their encouraging and astute comments on this work. All opinions expressed here are my own.

JAN BLUSTEIN, M.D., Ph.D., is Associate Professor of Health Policy at the Robert F. Wagner School of Public Service, New York University.

REFERENCES

American Evaluation Association Task Force on Guiding Principles for Evaluators. (1994). Guiding principles for evaluators. Retrieved May 30, 2003, from www.eval.org/EvaluationDocuments/aeaprin6.html.

Beauchamp, T. L., & Childress, J. F. (2001). Principles of biomedical ethics, 5th edition. New York: Oxford University Press.

Berkman, L. F., & Kawachi, I. (Eds.). (2000). Social epidemiology. New York: Oxford University Press.

Betsey, C., Hollister, R. G., & Papageorgiou, M. (1985). Youth employment and training programs: The YEDPA years. Washington, DC: National Academy Press.

Boruch, R. F. (1997). Randomized experiments for planning and evaluation: A practical guide. New York: Sage Publications.

Boruch, R. F., & Cecil, J. S. (Eds.). (1983). Solutions to ethical and legal problems in social research. New York: Academic Press.

Brodkin, E. Z., & Kaufman, A. (2000). Poverty experiments and poverty politics. Social Services Review, 74, 507–532.

Burghardt, J., Gritz, M., Jackson, R., Johnson, T., McConnell, S., Metcalf, C., & Schochet, P. (1993). Evaluation of the impact of the Job Corps Program on participants’ postprogram labor market and related behavior: Options for study design. November 12, 1993. Princeton: Mathematica Policy Research, Inc. (MPR Ref. 8140).

Burghardt, J., McConnell, S., Meckstroth, A., & Schochet, P. (1997). Implementing random assignment: Lessons from the National Job Corps Study. October 27, 1997. Princeton: Mathematica Policy Research, Inc.

Burghardt, J., McConnell, S., Meckstroth, A., Schochet, P., Johnson, T., & Homrighausen, J. (1999). National Job Corps Study: Report on study implementation. Princeton: Mathematica Policy Research, Inc. (MPR Ref. 8140–510).

Burghardt, J., Schochet, P., McConnell, S., Johnson, T., Gritz, R. M., Glazerman, S., Homrighausen, J., & Jackson, R. (2001). Does Job Corps work: Summary of the National Job Corps Study. June 2001. Princeton: Mathematica Policy Research, Inc. (MPR Ref. 8140–530).

Burtless, G. (1995). The case for randomized field trials in economic and clinical research. Journal of Economic Perspectives, 9(2), 64–84.

Burtless, G. (2002). Why not randomized trials in education? In F. Mosteller & R. Boruch (Eds.), Evidence matters: Randomized trials in education research (pp. 179–197). Washington, DC: The Brookings Institution.

Burtless, G., & Orr, L. L. (1986). Are classical experiments needed for manpower policy? The Journal of Human Resources, 21(4), 606–639.

Cassett, D., Karlawish, J. H., & Sugarman, J. (2000). Determining when quality improvement initiatives should be considered research. Journal of the American Medical Association, 283, 2275–2280.

Chard, J. A., & Lilford, R. L. (1998). The use of equipoise in clinical trials. Social Science and Medicine, 47(7), 891–898.

Corbie-Smith, G., Thomas, S. B., & St. George, D. M. (2002). Distrust, race, and research. Archives of Internal Medicine, 162(21), 2458–2463.

Department of Health and Human Services, Office of the Inspector General. (1998). Institutional Review Boards: A time for reform. June. OEI-01–97–0093. Washington, DC: Author.

Doolittle, F., & Traeger, L. (1990). Implementing the National JTPA Study. New York: MDRC.

Federal Register. (1981). Final regulations amending basic HHS policy for the protection of human research subjects. Monday, January 26, 46(16), 8366–8383.

Federal Register. (1991). Federal policy for the protection of human subjects. Tuesday, June 18, 56(117), 28002–28032.

Frazer, H., & McConnell, S. (1993). Summary of the first advisory panel meeting of the Job Corps Evaluation. Typescript memorandum dated 12/6/93, obtained from J. Burghardt of Mathematica Policy Research, Inc.

Gamble, V. N. (1997). Under the shadow of Tuskegee: African Americans and health care. American Journal of Public Health, 87(11), 1773–1778.

Gray, B. H. (1982). The regulatory context of social and behavioral research. In T. L. Beauchamp, et al. (Eds.), Ethical issues in social science research. Baltimore, MD: Johns Hopkins Press.

Greenberg, D., & Mandell, M. (1991). Research utilization in policy making: A tale of two series of social experiments. Journal of Policy Analysis and Management, 10, 633–656.

Gueron, J. M. (2002). The politics of random assignment: Implementing studies and affecting policy. In F. Mosteller & R. Boruch (Eds.), Evidence matters: Randomized trials in education research (pp. 15–49). Washington, DC: The Brookings Institution.

Institute of Medicine, Committee on Assessing the System for Protecting Human Research Participants. (2002). Responsible research: A systems approach to protecting research participants. Washington, DC: National Academies Press.

Lurie, P., & Wolfe, S. M. (1997). Unethical trials to reduce perinatal transmission of the human immunodeficiency virus in developing countries. New England Journal of Medicine, 337(12), 853–856.

Marmot, M. G., & Wilkinson, R. G. (Eds.). (1999). Social determinants of health. New York: Oxford University Press.

McConnell, S., & Glazerman, S. (2001). The National Job Corps Study: Benefits and costs of Job Corps. Retrieved August 9, 2004, from http://wdr.doleta.gov/opr/fulltext/01-es_jcbenefit.pdf.

Metcalf, C. E. (1998). Presidential address: Research ownership, communication of results, and threats to objectivity in client-driven research. Journal of Policy Analysis and Management, 17(2), 153–163.

Mosteller, F., & Boruch, R. (2002). Evidence matters: Randomized trials in education research. Washington, DC: The Brookings Institution.

National Bioethics Advisory Commission. (2001). Ethical and policy issues in international research: Clinical trials in developing countries. Retrieved June 12, 2003, from www.georgetown.edu/research/nrcbl/nbac/pubs.ntml.

National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. (1979). The Belmont Report: Ethical principles for the protection of human subjects of biomedical and behavioral research. Retrieved March 1, 2002, from http://ohrp.osophs.dhhs.gov/humansubjects/guidance/belmont.htm.

Orr, L. L. (1999). Social experiments: Evaluating public programs with experimental methods. Thousand Oaks, CA: Sage Publications.

Orr, L. L., Bloom, H. S., Bell, S. H., Doolittle, F., Lin, W., & Cave, G. (1996). Does job training for the disadvantaged work? Washington, DC: Urban Institute Press.

Panel on Institutional Review Boards, Surveys, and Social Science Research, National Research Council. (2003). Protecting participants and facilitating social and behavioral sciences research. Washington, DC: National Academies Press.

Price, J. (1999). Job Corps lottery. Mother Jones. January-February. Retrieved June 13, 2003, from http://www.motherjones.com/mother_jones/JF99/price.html.

Protection of Human Subjects: Institutional Review Board. (1978). Report and recommendations of the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. Federal Register, 43(231), 56173–56197.

Psaty, B. M., & Rennie, D. R. (2003). Stopping medical research to save money. JAMA, 289, 2128–2131.

Rivlin, A. M., & Timpane, P. M. (Eds.). (1975). Ethical and legal issues in social experimentation. Washington, DC: The Brookings Institution.

Rothman, K. J., Michaels, K. B., & Baum, M. (2000). For and against: Declaration of Helsinki should be strengthened. British Medical Journal (BMJ), 321(7258), 442–445.

Shadish, W. R., Newman, D., Scheirer, M. A., & Wye, C. (Eds.). (1995). Guiding principles for evaluators. San Francisco: Jossey-Bass Publishers.

Singer, E., & Levine, F. J. (2003). Protection of human subjects of research: Recent developments and future prospects for the social sciences. Public Opinion Quarterly, 67, 148–164.

Stone, D. A. (2001). Policy paradox: The art of political decision making. New York: W.W. Norton.

Stromsdorfer, E., Bloom, H., Boruch, R., Borus, M., Gueron, J., Gustman, A., et al. (1985). Recommendations of the Job Training Longitudinal Survey. Washington, DC: Employment and Training Administration, U.S. Department of Labor.

Tropp, R. A. (1982). A regulatory perspective on social science research. In T. L. Beauchamp, et al. (Eds.), Ethical issues in social science research. Baltimore, MD: Johns Hopkins Press.

The Workforce Alliance. (2004). Funding for key job training, vocational education and employment programs during the Bush administration (2001–2004). Retrieved August 9, 2004, from http://www.workforcealliance.org/policy/analysis.shtm.

U.S. Department of Labor Employment and Training Administration. (2001). Job Corps annual report. Retrieved August 3, 2004, from http://www.doleta.gov/sga/rfp/JobCorpsAnnualReportPY01.cfm#16d.

Varmus, H., & Satcher, D. (1997). Ethical complexities of conducting research in developing countries. New England Journal of Medicine, 337(18), 1331–1332 (October 29).

Williams, L. A. (1994). The abuse of Section 1115 waivers: Welfare reform in search of a standard. Yale Law and Policy Review, 12(1), 8–37.

THE ETHICS OF FEDERAL SOCIAL PROGRAM EVALUATION: A RESPONSE TO JAN BLUSTEIN

Burt S. Barnow

Jan Blustein has performed a valuable service by challenging evaluators to rethink the ethical issues in random assignment evaluations. Blustein notes that the Belmont Report, which was adopted by the Department of Health and Human Services (DHHS) in 1979 for research other than evaluations of public social programs, describes three guiding principles that are to be used in assessing the ethical appropriateness of research: respect for persons, beneficence, and justice.

RESPECT FOR PERSONS

Blustein states that the respect criterion raises three issues: informed consent, the existence of limited slots to justify random assignment, and the importance of subjects’ perceptions. Blustein notes that if demand for program slots exceeds supply, then slots must be rationed in some manner and the use of random assignment might be justified. In the case of the Job Corps, however, there was no excess supply of volunteers for the program, so the researchers had to enhance recruitment to generate sufficient applicants for a control group. Blustein finds the artificial

DOI: 10.1002/pam.20142