
Head Start’s Long-Run Impact: Evidence from the Program’s Introduction

Owen Thompson

Journal of Human Resources, Volume 53, Number 4, Fall 2018, pp. 1100-1139 (Article)

Published by University of Wisconsin Press


https://muse.jhu.edu/article/706377


ABSTRACT

This paper estimates the effect of Head Start on health, education, and labor market outcomes observed through age 48. I combine outcome data from the NLSY79 with archival records on early Head Start funding levels and, for identification, exploit differences across counties in the introduction timing and size of local Head Start programs. This allows me to compare the long-term outcomes of children who were too old for Head Start when the program was introduced in their county with the outcomes of children who were sufficiently young to be eligible. I find that individuals from counties that had an average-sized program when they were in Head Start’s target age range experienced a $2,199 increase in annual adult earnings, completed 0.125 additional years of education, were 4.6 percentage points less likely to have a health limitation at age 40, and overall experienced a 0.081 standard deviation improvement in a summary index of these and other outcome measures. Funding levels at ages outside of Head Start’s target range are not significantly correlated with long-term outcomes. Estimated treatment effects are largest among blacks, the children of lower-education parents, and children exposed to better-funded Head Start programs, a pattern of heterogeneity that is consistent with a causal program impact.

I. Introduction

On average, children from economically disadvantaged backgrounds experience worse life outcomes than their more affluent peers in the United States, and how best to improve the life chances of poor children has long been a question of intense policy and research interest (see Coleman et al. 1966; Almond and Currie 2011). Because many early markers of success already appear worse for poor children by the time they enter kindergarten, researchers often view preschool-based interventions as the policies with the most potential to promote early human capital development and social mobility (Bronfenbrenner 1979; Currie 2001). This view has been reinforced by the success of small-scale model preschool programs like the Abecedarian and Perry Preschool projects, and by interdisciplinary evidence that early childhood constitutes a sensitive period with disproportionate influence on long-term outcomes (Shonkoff and Phillips 2000; Knudsen et al. 2006).

Owen Thompson is Assistant Professor, Department of Economics, Williams College. He thanks Martha Bailey and Andrew Goodman-Bacon for assembling and making available Head Start funding data, and John Heywood for helpful comments. The data and computer code used in this article are available on the author’s personal web page (https://sites.google.com/site/othompsonecon/). The author is willing to assist ([email protected]). [Submitted February 2016; accepted May 2017]; doi:10.3368/jhr.53.4.0216-7735R1. JEL Classification: I260, J24, and H430. ISSN 0022-166X; E-ISSN 1548-8004. © 2018 by the Board of Regents of the University of Wisconsin System.

Head Start is by far the largest-scale preschool-based intervention in the United States, with the Department of Health and Human Services reporting that Head Start currently serves nearly 1 million low-income children nationwide at a cost of approximately $8 billion annually (DHHS 2014). Since its inception in 1965 as a central component of the War on Poverty, Head Start’s effectiveness has been a topic of considerable controversy. Even after a large-scale randomized evaluation and a number of quasi-experimental studies, which are reviewed in greater detail below, skepticism and controversy regarding Head Start’s causal impact on the outcomes of participants remain, especially with respect to longer-term outcomes (see Haskins 2004; Barnett 2011; Klein 2011).

The present paper assesses the impact of Head Start on a variety of health, education, and labor market outcomes observed through age 48. My empirical approach uses archival data on county-level Head Start spending in the early years of the program to compare the adult outcomes of children with different levels of childhood exposure to Head Start. Variation in program exposure arises primarily because some children in my sample were too old for Head Start when the program was introduced in their county, while other children from the same county were sufficiently young for Head Start when it was introduced. Individual-level outcome data are drawn from the 1979 National Longitudinal Survey of Youth (NLSY79), whose respondents were born between 1957 and 1964. Approximately 50 percent of individuals from these cohorts were beyond Head Start’s target age range when the program was launched, while the other 50 percent were sufficiently young to participate at the time of its introduction, providing a rich source of plausibly exogenous variation in program exposure within the NLSY79 sample. Respondents have been closely followed well into middle age, allowing me to assess Head Start’s impact further into the life course and for a wider range of outcomes than most previous research.

The main finding is that exposure to early implementations of Head Start had statistically and substantively significant effects on a variety of long-term outcomes. My preferred models use a composite measure of adult socioeconomic well-being as the dependent variable and restrict the sample to individuals who were between the ages of two and seven when Head Start was introduced in their county. Estimates from these models indicate that being exposed to an average-sized Head Start program led to a 0.081 standard deviation improvement in adult socioeconomic well-being.1 With respect to more specific outcomes, I find that exposure to an average-sized Head Start program increased annual adult earnings by $2,199 (in 2012 dollars), improved final educational attainment by 0.125 years, and reduced the probability of a health limitation at age 40 by 4.6 percentage points, among other impacts.
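This section does not spell out how the composite measure of adult socioeconomic well-being is built. One common construction, assumed here purely for illustration, standardizes each outcome, flips signs so that higher always means better, averages the resulting z-scores with equal weights, and re-standardizes the average. The function names and the example data below are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw values to z-scores (population standard deviation)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def summary_index(outcomes, signs):
    """Equally weighted average of signed z-scores, re-standardized.

    outcomes: dict mapping outcome name -> list of values, one per person
    signs: dict mapping outcome name -> +1 if higher is better, -1 if worse
    """
    names = list(outcomes)
    signed_z = {k: [signs[k] * z for z in standardize(outcomes[k])] for k in names}
    n_people = len(outcomes[names[0]])
    raw = [mean(signed_z[k][i] for k in names) for i in range(n_people)]
    return standardize(raw)


# Hypothetical outcomes for four respondents.
outcomes = {
    "earnings": [20000, 35000, 50000, 42000],
    "years_of_education": [10, 12, 14, 16],
    "health_limitation": [1, 0, 0, 1],  # 1 = has a limitation (worse)
}
signs = {"earnings": 1, "years_of_education": 1, "health_limitation": -1}

index = summary_index(outcomes, signs)
print([round(x, 3) for x in index])
```

Re-standardizing at the end is what lets an effect on such an index be read in standard deviation units; the paper's actual outcome list, weights, and reference group may differ from this sketch.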

1. As described in detail below, an average Head Start program is defined here as one with expenditures of $170 per child aged three to six living in the county (in 2012 dollars).
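Footnote 1 fixes the reference point for the headline estimates. Assuming, for illustration only, that each outcome responds linearly to funding per child (this section does not state the functional form), an effect reported at the $170 average can be rescaled to other funding levels. The helper names are hypothetical, and the rescaled numbers are arithmetic illustrations, not estimates reported in the paper.

```python
AVERAGE_FUNDING = 170.0  # 2012 dollars per child aged three to six (footnote 1)

# Effects of exposure to an average-sized program, as reported in the text.
REPORTED_EFFECTS = {
    "summary_index_sd": 0.081,
    "annual_earnings_2012_usd": 2199.0,
    "years_of_education": 0.125,
    "health_limitation_pp": -4.6,
}


def effect_per_dollar(effect_at_average, average=AVERAGE_FUNDING):
    """Slope implied by a linear model: effect per dollar per child."""
    return effect_at_average / average


def implied_effect(effect_at_average, funding, average=AVERAGE_FUNDING):
    """Effect at another funding level, assuming linearity (an assumption)."""
    return effect_per_dollar(effect_at_average, average) * funding


for name, effect in REPORTED_EFFECTS.items():
    print(f"{name}: {implied_effect(effect, 100.0):.4f} per $100 per child")
```

Under this linearity assumption, for example, the 0.081 standard deviation index effect corresponds to roughly 0.048 standard deviations per $100 of funding per child.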


To validate my research design, I present a set of balancing tests that show that the baseline characteristics of children in my sample with and without positive Head Start exposure are very similar. I also demonstrate that Head Start funding at ages outside of the program’s target range is not significantly associated with improved long-term outcomes, and that my main findings are largely robust to the inclusion of controls for exposure to other War on Poverty programs or county-specific time trends, and to a variety of alternative sample restrictions and specifications. Analyses of treatment effect heterogeneity indicate that program exposure has the largest effects among blacks, the children of lower-education parents, and children exposed to better-funded Head Start programs. Given Head Start eligibility criteria and participation rates, these heterogeneous effects are generally consistent with causal program impacts. Finally, I present analyses that compare the outcomes of siblings with different levels of exposure to Head Start during childhood; these sibling-based estimates similarly suggest substantial long-term program effects, but they are imprecise and not statistically significant at conventional levels.

Relative to the existing Head Start literature, the present study examines a broader range of outcomes further into the life cycle than most previous studies and also implements an identification strategy that relies on cross-county variation in the timing and intensity of Head Start’s initial introduction, which complements the sibling-based and policy-discontinuity-based approaches of previous studies. I additionally note that both the nature of Head Start programming and the counterfactual environments of Head Start participants have changed substantially since its original implementation; as a result, the current findings are not directly comparable to evaluations of more recent implementations of Head Start, which have been the focus of most previous studies.
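The balancing tests mentioned above compare the baseline characteristics of exposed and unexposed children. This section does not describe their exact form; a common, simple diagnostic, shown here as an assumption rather than the paper's procedure, is the standardized difference in means. The variable and data below are hypothetical.

```python
from statistics import mean, variance


def standardized_difference(treated, control):
    """Difference in means divided by the pooled (sample) standard deviation.

    Values close to zero indicate that the two groups look similar on
    this baseline characteristic.
    """
    pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical years of maternal schooling by exposure status.
exposed = [10, 12, 11, 9, 12]
unexposed = [11, 10, 12, 10, 12]

d = standardized_difference(exposed, unexposed)
print(f"standardized difference: {d:.3f}")
```

Because exposure in this design is a continuous funding measure, regressing each baseline characteristic on the exposure measure with the main model's controls is another common way to run the same kind of check.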

II. Background

A. The Head Start Program

Head Start was introduced as an eight-week summer program in 1965, when 560,000 children were enrolled and the program received $96 million in federal funding. Federal Head Start funding increased to approximately $200 million in 1966 and to over $300 million in subsequent years as new centers were added, enrollment at existing centers increased, and many programs transitioned from summer-only to full-year programming (DHHS 2014).

Initial Head Start programs were funded through the War on Poverty’s Community Action Programs (CAP) and administered by the Office of Economic Opportunity (OEO). In keeping with the general approach of all CAP programs, Head Start grants were issued directly to thousands of local organizations rather than via the states. Bypassing state governments was designed to encourage the “maximum feasible participation” of beneficiaries while also limiting the ability of southern states to direct funds away from low-income African American communities. As a result of this approach, the Head Start rollout across the country was characterized by the sporadically timed introduction of local programs, which were also of widely varying sizes and quality (Levitan 1969; Vinovskis 2008). These geographic differences in the timing and intensity of local program introduction form the basis of my identification strategy below.

One important difference between the early implementations of Head Start studied here and the modern program is the age profile of participating children. Detailed data on the ages of Head Start participants in both summer and full-year programs from 1965–1968 are reported in Table 1, using data from representative surveys of Head Start centers conducted by the Census Bureau.2 The table shows that in full-year Head Start programs in 1966, 1967, and 1968, three-year-old children comprised between 10 percent and 18 percent of participants, while approximately 45 percent of participants were age four, just over 30 percent were age five, and 10 percent or less were age six or older. Participants in summer programs from this period tended to be older, with 30 to 40 percent of participants age six or older and very few three-year-olds.3

In contrast, DHHS (2014) reports that in contemporary Head Start programs there are significant numbers of three-year-old participants and that fewer than 5 percent of participants are older than five. The unique participant age profiles of early versions of Head Start are reflected in the construction of the program exposure measures described in Section III below.

Other than being of an age served by the program, the main eligibility requirement for both historical and current Head Start programs is a family income below the federal poverty line, although up to 10 percent of a local program’s enrollees can have incomes above this level.

With respect to Head Start program content in the period under study, most early Head Start programs were designed as holistic child development interventions and placed particular emphasis on health, self-esteem, noncognitive skills, and parental engagement rather than purely academic objectives such as learning the alphabet or counting (Vinovskis 2008). Common health-relevant program activities included the provision of nutritious meals and snacks, immunizations, and screenings for common health conditions, such as tuberculosis and dental problems. Most of the initial Head Start programs included home visits and frequent formal parent–teacher meetings, and parental volunteering and paid parental classroom employment were also widespread (Zigler and Valentine 1979; Bureau of Census 1968). The recruitment of adequate numbers of qualified professional staff was problematic for early Head Start programs, leading to the widespread use of paraprofessionals with minimal training (Levitan 1969).

While program content in modern implementations of Head Start overlaps substantially with that of early programs, current programs have a more professionalized staff and a more uniform curriculum that places greater emphasis on cognitive development and academic preparation, among other important differences. The implications of these programming changes for interpreting the present study’s main findings, as well as the effect of differences in the likely counterfactual environments of early versus contemporary Head Start participants, are discussed in Section VII.

2. See Bureau of Census (1968, 1970, 1972).
3. Similar age profiles are reported in Levitan (1969), Table 4-4.

Table 1. Age Distributions of Head Start Participants, 1965–1968

Age                            Summer  Full Year  Summer  Full Year  Summer  Full Year  Summer
                               1965    1966       1966    1967       1967    1968       1968
Younger than 3                 0%      1%         1%      1%         0%      3%         1%
3 years–3 years and 11 months  1%      10%        2%      12%        1%      18%        3%
4 years–4 years and 11 months  13%     45%        18%     44%        20%     43%        20%
5 years–5 years and 11 months  42%     31%        44%     34%        45%     31%        40%
6 years or older               39%     10%        35%     5%         31%     3%         34%
Not reported                   6%      3%         2%      3%         3%      2%         2%

Notes: Table reports the percentage of Head Start participants in each age category for the indicated time period. Data are drawn from Bureau of Census (1968, 1970, 1972).
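The age shares discussed in this subsection can be read directly off Table 1. The short sketch below simply re-enters the published percentages and sums the age categories of interest for each column; the dictionary layout and the function name are illustrative choices.

```python
# Percentages from Table 1 (Bureau of Census 1968, 1970, 1972).
AGE_GROUPS = ["<3", "3", "4", "5", "6+", "not reported"]
TABLE_1 = {
    "Summer 1965":    [0, 1, 13, 42, 39, 6],
    "Full Year 1966": [1, 10, 45, 31, 10, 3],
    "Summer 1966":    [1, 2, 18, 44, 35, 2],
    "Full Year 1967": [1, 12, 44, 34, 5, 3],
    "Summer 1967":    [0, 1, 20, 45, 31, 3],
    "Full Year 1968": [3, 18, 43, 31, 3, 2],
    "Summer 1968":    [1, 3, 20, 40, 34, 2],
}


def share(column, groups):
    """Sum the percentages for the given age groups in one table column."""
    row = dict(zip(AGE_GROUPS, TABLE_1[column]))
    return sum(row[g] for g in groups)


for column in TABLE_1:
    print(f"{column:15s} ages 4-5: {share(column, ['4', '5'])}%  age 6+: {share(column, ['6+'])}%")
```

Summed this way, ages four and five account for 74 to 78 percent of full-year participants and 55 to 65 percent of summer participants, while summer programs enroll 31 to 39 percent of children age six or older.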

B. Existing Research

Head Start’s impacts are the topic of a large interdisciplinary literature, with excellent reviews provided by Gibbs, Ludwig, and Miller (2013) and Duncan and Magnuson (2013). Early evaluations of Head Start, most prominently Westinghouse Learning Corporation (1969), typically found modest short-term effects on cognitive test scores that faded by second or third grade. These early evaluations did not address selection into program participation in a rigorous manner and could not evaluate outcomes other than short-term test scores.

In part to address the ambiguity of early observational studies, in 2002 the federal government sponsored a randomized experiment known as the National Head Start Impact Study (NHSIS), with the official findings reported in Puma et al. (2010). The NHSIS included 4,667 children who had applied to wait-listed Head Start programs across the country, with approximately 50 percent of participants then randomly assigned admission to the program for which they had applied. Around 86 percent of children assigned to the treatment group enrolled, while children who were not randomly selected for admission were free to enroll in other Head Start programs in their area, and 18 percent did so (a much larger percentage of control observations enrolled in non–Head Start preschool programs). Various cognitive and social–emotional development measures were recorded through the spring of each participant’s first-grade year, with a limited follow-up in third grade.

Ludwig and Phillips (2007) calculated treatment effects on the treated in the NHSIS, which account for these imperfect compliance patterns, and found statistically significant program impacts of approximately 0.2 to 0.4 standard deviations for most language- and literacy-related test scores, with smaller effects on math scores and social–emotional outcomes. However, these effects fade almost entirely by the end of first grade (Duncan and Magnuson 2013), leading some observers to conclude that the program is ineffective (for example, Barnett 2011). No outcome measures beyond third grade were recorded.

Various quasi-experimental evaluations of Head Start have looked beyond impacts on short-term test scores. One strand of this literature uses cross-sibling variation in Head Start participation to account for unobserved family-level characteristics, typically utilizing longitudinal survey data to measure outcomes. Prominent examples include Currie and Thomas (1995), who use data from the NLSY79 Mother-Child Supplement, also known as the CNLSY; Garces, Thomas, and Currie (2002), who use the Panel Study of Income Dynamics (PSID) sample with additional data collection by the authors; and Deming (2009), who also uses outcome data from the CNLSY. These sibling-based studies have typically found short-term test score gains that fade as children move through school, but they have also found large impacts on a variety of medium-term socioeconomic outcomes, including grade repetition, high school graduation and college enrollment rates, arrest rates, and self-rated health in early adulthood. A potential issue with the sibling fixed-effects approach is that unobserved child-varying characteristics may influence the decision to enroll one sibling in Head Start but not the other, and spillovers from participating to nonparticipating siblings are also possible. These threats to identification are discussed at length in the cited studies, which include auxiliary tests for within-family selection and spillovers that on balance support the validity of the approach.

Additional quasi-experimental studies have used discontinuities in Head Start funding levels or eligibility rules to identify program effects. For instance, Ludwig and Miller (2007) exploited Head Start grant-writing assistance that the OEO provided to the 300 highest-poverty counties as of 1960, which resulted in a large and lasting discontinuity in Head Start funding levels. Using vital statistics data, the authors found that counties just below the 300-poorest threshold experienced large reductions in age- and cause-specific mortality rates compared to counties just above the threshold. Improvements in educational attainment were also observed using data from the Census and the National Educational Longitudinal Survey, though various data limitations made these estimates less than conclusive, and no lasting effects on standardized test scores were found. Additionally, Carneiro and Ginja (2014) compared the outcomes of children in the CNLSY sample from families with income-to-needs ratios on either side of state-specific Head Start eligibility thresholds.4 They found that Head Start participation had significant positive effects on the health and behavioral outcomes of males observed at ages 12–21 and substantive but statistically insignificant effects on intermediate educational outcomes, such as grade repetition and special education usage, with no significant effects on cognitive test scores.

Finally, a number of recent papers use Head Start to study specific issues in early childhood education policy. For instance, Gelber and Isen (2013) find that Head Start participation increases the amount of time parents spend engaging in educational activities with children, even after Head Start participation is completed. Walters (2015) uses programming differences across Head Start centers to identify which preschool characteristics most influence test scores. Kline and Walters (2016) evaluate how substitution between Head Start, other preschool programs, and home-based care affects estimates of Head Start’s test score effects and fiscal impacts.5
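Returning to the NHSIS compliance figures cited above: a standard way to turn an intent-to-treat (ITT) contrast into an effect on those induced to enroll, when both experimental arms have imperfect compliance, is the Wald (instrumental-variables) rescaling, which divides the ITT by the difference in take-up rates. This is shown as an assumption; the section does not state Ludwig and Phillips's exact procedure, and the ITT value below is hypothetical.

```python
def wald_tot(itt, takeup_treatment, takeup_control):
    """Rescale an intent-to-treat effect by the gap in program take-up."""
    compliance_gap = takeup_treatment - takeup_control
    if compliance_gap <= 0:
        raise ValueError("take-up must be higher in the treatment group")
    return itt / compliance_gap


# Head Start enrollment rates cited for the NHSIS: 86% of the treatment
# group and 18% of the control group.
TAKEUP_TREATMENT = 0.86
TAKEUP_CONTROL = 0.18

hypothetical_itt = 0.15  # hypothetical ITT effect, in standard deviations
tot = wald_tot(hypothetical_itt, TAKEUP_TREATMENT, TAKEUP_CONTROL)
print(f"implied effect on the treated: {tot:.3f} SD")
```

With a 68-point gap in take-up, the rescaled effect is about 1.5 times the ITT. Because many control children attended other preschools, the substitution issue studied by Kline and Walters (2016) means this contrast is not Head Start versus no preschool.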

Overall, the current state of the literature can be summarized as finding short-term test score effects that quickly fade, coupled with substantial effects on a broader set of socioeconomic outcomes through the early 20s. The present paper extends this literature in two important ways. First, it examines a broader range of outcomes substantially further into the life cycle than most existing work. Second, it implements a new approach to identifying causal effects in observational data, which complements the existing set of quasi-experimental methods.

4. While Head Start eligibility determination is not typically state-specific, children determined to be AFDC- or TANF-eligible are usually automatically Head Start eligible, and AFDC/TANF requirements vary by state.
5. All of these studies use data from the NHSIS.


III. Data

A. 1979 National Longitudinal Survey of Youth (NLSY79)

My primary individual-level data source is the 1979 National Longitudinal Survey of Youth (NLSY79), which follows a sample of 12,686 individuals who were ages 14–21 when the survey began in 1979. Participants were eligible to be interviewed annually until 1994 and biennially thereafter, with the most recent wave available at the time of writing occurring in 2012, when respondents were ages 48–55.6 The extensive NLSY survey instrument includes detailed information on labor market outcomes, educational attainment, and a variety of health measures. The outcome measures used here are described in greater detail below.

Central to my empirical approach is the fact that NLSY79 respondents are members of the 1957–1964 birth cohorts. Since Head Start was rolled out beginning in the summer of 1965, approximately half of the NLSY79 sample was over the program’s target age by the time of its launch, while the other half was sufficiently young to be potentially eligible for Head Start. The NLSY79 also contains data on state and county of birth, which allow me to link respondents to local Head Start funding levels when they were in the program’s target age range.7

Because NLSY79 surveying did not begin until respondents were ages 14–21, contemporaneous reports of actual Head Start participation are not available. A retrospective question asking whether respondents had attended Head Start as children was included in the 1994 wave of the survey, when respondents were ages 30–37. While these retrospective self-reports of Head Start attendance are positively correlated with the Head Start funding measures I use in the main analysis below, these correlations are generally quite weak, which prevents me from directly analyzing the effects of actual Head Start participation rather than exposure to Head Start funding. Estimates of the relationship between Head Start funding levels and self-reported enrollment, as well as results using an alternative enrollment data source, are reported and discussed in Section V.

As discussed above, children are typically eligible for Head Start only if they come from families with incomes below the federal poverty level, and since most respondents in the full NLSY79 sample did not grow up in poor households, it would be difficult to detect any impacts of Head Start in the full sample. For this reason, most of the analysis below focuses on the approximately 70 percent of NLSY79 respondents whose own parent(s) had 12 years of education or less, since we would expect Head Start participation rates to be very low among the children of higher-education parents. Indeed, records indicate that only 5 to 10 percent of Head Start enrollees in this period had a parent with any postsecondary education (see Bureau of Census 1968, 1970, 1972; Westinghouse Learning Corporation 1969, Appendix A). As a falsification test, I also report results that use the subsample of NLSY79 respondents who have one or more college-educated parents and find effects close to zero, as would be expected given the low Head Start participation rates in this subpopulation.
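The sample restrictions described here and in the Introduction (parental education of 12 years or less, and ages two to seven when Head Start arrived in the county) can be sketched as simple filters. The field names are hypothetical, not NLSY79 variable names. Using the more-educated parent, approximating age at introduction by the difference in calendar years, treating 16 or more years of schooling as college-educated, applying the same age window to the falsification sample, and dropping respondents whose county has no recorded introduction are all assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Respondent:
    # Hypothetical fields; real NLSY79 variable names differ.
    birth_year: int
    max_parent_educ: int              # highest grade completed by either parent
    county_intro_year: Optional[int]  # first Head Start year in birth county


def age_at_introduction(r: Respondent) -> int:
    """Approximate age when Head Start reached the birth county."""
    return r.county_intro_year - r.birth_year


def in_age_window(r: Respondent, min_age: int = 2, max_age: int = 7) -> bool:
    """True if the county introduced Head Start when r was ages 2 to 7."""
    return r.county_intro_year is not None and min_age <= age_at_introduction(r) <= max_age


def main_sample(rows: List[Respondent]) -> List[Respondent]:
    """Parents with 12 or fewer years of schooling, inside the age window."""
    return [r for r in rows if r.max_parent_educ <= 12 and in_age_window(r)]


def falsification_sample(rows: List[Respondent]) -> List[Respondent]:
    """At least one college-educated parent, inside the age window."""
    return [r for r in rows if r.max_parent_educ >= 16 and in_age_window(r)]


rows = [
    Respondent(1961, 12, 1966),  # age 5 at introduction: main sample
    Respondent(1958, 10, 1967),  # age 9: outside the window
    Respondent(1963, 16, 1968),  # college-educated parent: falsification
    Respondent(1962, 11, None),  # no recorded introduction: dropped
]
print(len(main_sample(rows)), len(falsification_sample(rows)))
```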

6. The NLSY79 survey design included oversamples of minorities, economically disadvantaged whites, and military members. The economically disadvantaged white and military oversamples were dropped between 1984 and 1990 for budgetary reasons and are excluded from the current analysis.
7. State and county of birth are available in a restricted-access NLSY geocode supplement. See http://www.bls.gov/nls/nlsgeo.htm (accessed January 9, 2018) for application procedures.


B. Head Start Funding Data

Head Start funding data are drawn from the National Archives and Records Administration Community Action Program (NACAP) electronic files (Community Services Administration 1981).8 The NACAP files consist of two record types. First are records for all 4,769 organizations receiving any Community Action Program grant between 1965 and 1981; among other items, these grantee-level records contain the recipient organization’s county. Second is a record for each specific grant action, such as a disbursement, extension, renewal, or termination. These grant action-level data contain information on total federal grant dollars, the service delivery county (which in a limited number of cases differs from the grant recipient’s county), and the year of disbursement. The grant action-level records also contain a brief project description that indicates whether the grant was for a Head Start program.

The information in these two sets of NACAP records is used to calculate aggregate federal Head Start grant dollars at the county–year level. Most of the county–year Head Start funding data used here were assembled and generously shared by Bailey and Goodman-Bacon (2015), with some supplemental data collection by the author from the primary NACAP records. I then divide the annual federal Head Start grant totals for each county by the number of children in the county who were ages three to six in each year (which, as noted above, was the age range of Head Start participants in this period) and express these grant amounts per child aged three to six in 2012 dollars.9
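The per-child normalization above, together with the linear interpolation of census counts for noncensus years described in footnote 9, can be sketched as follows. The census counts, grant amount, and price-index values are hypothetical placeholders; in practice the deflator would come from an official price index.

```python
def interpolate_count(year, census_counts):
    """Linearly interpolate a county count between decennial census years.

    census_counts: dict mapping census year (e.g. 1960, 1970) -> count
    """
    if year in census_counts:
        return float(census_counts[year])
    years = sorted(census_counts)
    for lo, hi in zip(years, years[1:]):
        if lo < year < hi:
            weight = (year - lo) / (hi - lo)
            return census_counts[lo] + weight * (census_counts[hi] - census_counts[lo])
    raise ValueError(f"{year} lies outside the census years provided")


def funding_per_child(grant_dollars, year, children_3_to_6, price_index, base_index):
    """Nominal Head Start grant dollars -> base-year dollars per child aged 3-6."""
    real_dollars = grant_dollars * base_index / price_index[year]
    return real_dollars / interpolate_count(year, children_3_to_6)


# Hypothetical county: children aged 3-6 at the 1960 and 1970 censuses.
children = {1960: 4000, 1970: 3000}
# Hypothetical price-index values for 1967 and for the 2012 base year.
price_index = {1967: 33.4}
BASE_INDEX_2012 = 229.6

per_child = funding_per_child(100_000, 1967, children, price_index, BASE_INDEX_2012)
print(f"{per_child:.2f} (2012 dollars per child aged 3-6)")
```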

To construct a measure of Head Start exposure for individual NLSY79 respondents, I calculate the average level of Head Start funding per child aged three to six that occurred in each NLSY79 respondent’s county of birth during the three calendar years in which they were ages three to four, four to five, and five to six. For instance, respondents born in calendar year 1961 are assigned the mean of the Head Start spending that occurred in their county of birth during calendar years 1965 (when they were ages three to four), 1966 (when they were ages four to five), and 1967 (when they were ages five to six).

One important feature of measuring exposure as the mean of local funding levels during the three calendar years when each respondent was ages three to six is that greater weight is given to funding levels occurring at ages four and five than at ages three and six. This occurs because both calendar years in which an individual was age four, and both in which they were age five, are included in this measure, but only one of the two calendar years in which they were age three and only one of the two in which they were age six. I believe this is appropriate given the data in Table 1 indicating that 60–80 percent of participants in early Head Start implementations were ages four or five, with smaller numbers of three- and six-year-olds participating. In Section VI below, I also present results that estimate the effects of funding levels at each age separately, and the strongest effects are found for ages four and five.

Another important consequence of constructing the Head Start exposure variable in this way is that it results in a continuous treatment measure, as opposed to a binary indicator of whether a program existed in a given county–year. This is especially important because I do not reliably observe actual Head Start enrollment in the NLSY79: higher per-capita funding levels are likely indicative of higher enrollment rates, making it more likely that children from counties with higher per-capita funding actually participated. Lower per-capita funding levels are also likely indicative of summer-only programs rather than more expensive full-year programs, and the current measure allows this variable treatment intensity to be taken into account. Results using binary treatment measures are presented below and yield much less precise estimates.10

8. The electronic NARA archives can be accessed at http://aad.archives.gov/aad/series-description.jsp?s=536&cat=TS16&bc=,sl (accessed June 21, 2017).
9. County population totals are drawn from the decennial censuses, with linear interpolation for noncensus years.
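The exposure definition above can be sketched in a few lines. The function names are hypothetical, and treating years absent from the funding data as zero funding is an assumption of this sketch, consistent with how the paper handles 1965. The second function checks the claim that ages four and five receive double the weight of ages three and six.

```python
def exposure(birth_year, funding_by_year):
    """Mean funding per child aged 3-6 over the calendar years in which the
    respondent was ages 3-4, 4-5, and 5-6 (birth_year + 4, + 5, + 6).
    Years missing from funding_by_year count as zero funding."""
    window = [birth_year + k for k in (4, 5, 6)]
    return sum(funding_by_year.get(year, 0.0) for year in window) / 3


def implied_age_weights():
    """Share of the three-year window spent at each age, assuming each
    calendar year is split evenly between two consecutive ages."""
    weights = {3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0}
    for k in (4, 5, 6):            # calendar year = birth_year + k
        for age in (k - 1, k):     # the child is age k-1, then age k
            weights[age] += 0.5 / 3
    return weights


# Hypothetical county funding per child (2012 dollars); 1965 is absent.
funding = {1966: 150.0, 1967: 180.0, 1968: 210.0}

print(exposure(1961, funding))  # calendar years 1965-1967 -> 110.0
print(exposure(1957, funding))  # calendar years 1961-1963 -> 0.0
print(implied_age_weights())
```

The implied weights are one-sixth for ages three and six and one-third for ages four and five, which is the weighting the text describes.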

It is essential for the validity of the analysis that the NACAP Head Start funding data be accurate. I have therefore cross-validated the NACAP data using two additional, independent sources of information on early Head Start funding levels. First are county-level data for 1968 and 1972 from the Federal Outlays System Files, as assembled by Ludwig and Miller (2007), which report federal expenditures on various programs, including Head Start.11 There is a high level of agreement between the Federal Outlays data and the NACAP grant records used here, with a simple correlation between the two measures of 0.893 for 1968 and 0.875 for 1972. A second cross-validation of the NACAP grant data was performed using transcriptions of state-level Head Start expenditure totals reported in the OEO’s first, second, and fourth annual reports to Congress (OEO 1965, 1966, 1968).12 The state aggregates in these reports for 1966 and 1968 correspond quite closely to those generated from the NARA grant records, with simple correlation coefficients of 0.896 for 1966 and 0.962 for 1968.13
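The agreement checks above are simple (Pearson) correlations between two independently collected funding series. A minimal sketch, with hypothetical county values standing in for the NACAP and Federal Outlays figures:

```python
from math import sqrt


def pearson(x, y):
    """Simple (Pearson) correlation coefficient between two sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical Head Start dollars per child for six counties in one year,
# as recorded by two independent sources.
nacap = [0, 120, 300, 80, 0, 210]
federal_outlays = [0, 110, 320, 95, 10, 190]

r = pearson(nacap, federal_outlays)
print(f"simple correlation: {r:.3f}")
```

A high correlation shows that two sources rank counties or states similarly, but it is unchanged by a common rescaling of one series, so it cannot by itself detect a systematic difference in funding levels such as the 1965 discrepancy discussed next.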

While these strong correlations between independently collected funding measures for 1966 and 1968 are reassuring, there are large discrepancies in 1965 funding levels between the NACAP grant records and the state aggregates from the 1965 OEO annual report, with much lower levels in the NACAP records. Bailey and Duquette (2014) also note this discrepancy and suggest it is potentially related to the fact that in 1965 Head Start existed only as a summer program, and the NACAP grant record dating (which was initially performed in fiscal years) may have charged these summer expenditures to 1966. This explanation is especially plausible given that in this period the federal fiscal year began on July 1, making it ambiguous which fiscal year summer Head Start expenditures should be assigned to. Regardless of the root cause of these discrepancies, since no reliable 1965 Head Start funding data are available, 1965 funding levels are set to zero in the working data set. While not ideal, the practical consequences of this are likely to be minimal given how the Head Start exposure measure is constructed. In particular, most children who attended a 1965 summer program, but are not coded as such due to missing summer 1965 data, will still be assigned positive exposure due to positive funding levels in 1966 and later years.

10. The use of a three-year average to define Head Start exposure also helps account for ambiguity that arises from a lack of information on the exact birthday cutoffs used to determine age-based eligibility for local Head Start programs. For instance, a child born in June of 1963 would have been age two (and presumably not eligible for Head Start) for approximately six months of 1966, but would then be age three (and potentially eligible) for the other six months of 1966. The evolving mix of summer and full-year programs during the study period introduces additional imprecision in the assignment of funding data to individual NLSY79 respondents, since some summer participants may have attended a full-year program as well, while for others no full-year program was available; this also makes the more flexible three-year average exposure definition attractive.
11. As discussed in greater detail in Ludwig and Miller (2007), the authors determined that the Federal Outlays data for Head Start were unreliable for years other than 1968 and 1972.
12. The OEO’s third annual report to Congress, OEO (1967), did not disaggregate Head Start expenditures from expenditures on other CAP activities. Also note that the online data appendix for Bailey and Duquette (2014) reports a similar validation of the NACAP grant figures using OEO annual reports, but does so for total CAP spending rather than for Head Start specifically, and reports high levels of agreement.
13. A related data quality concern is that some counties with active Head Start programs may have been recorded as not receiving any Head Start funding in the NACAP grant data because of incompleteness in how the NACAP data treated recipients providing services in multiple counties. However, there is a very high level of agreement between the NACAP data and the Federal Outlays data with respect to the number of counties receiving Head Start funding in each state–year, with simple correlations of 0.99 in 1968 and 0.98 in 1972. It should be cautioned, however, that Ludwig and Miller (2007) indicate that the Federal Outlays data may also be flawed in accounting for agencies providing Head Start services in multiple counties, making this exercise less than conclusive.

Figure 1 uses the NACAP grant data to map the timing of Head Start program introductions from 1966 to 1970, which is the range of years with valid data that potentially affected NLSY79 respondents.14 Of the approximately 3,000 counties in the United States, 1,439 introduced a Head Start program during this period, and Figure 1 indicates a relatively uniform distribution of introduction timing across the relevant years, which produces ample variation in program exposure within the NLSY79 sample. While counties in Appalachia and the coastal states appear somewhat more likely to have introduced Head Start programs in this period, which is in line with historical accounts, no dramatic geographic patterns are apparent.

Since the estimation sample in my baseline models is approximately 2,700 individuals, while there are approximately 3,000 counties in the United States, small within-county sample sizes are a potential concern. However, the working sample contains 61 counties with 10–19 individual observations, 19 counties with 20–29 individual observations, and 15 counties with 30 or more individual observations. The relatively large number of counties with substantial numbers of respondents despite the modest overall sample size is due to clustering in the NLSY79 sampling design. Additionally, observations in the working sample are split relatively evenly between those who were assigned positive Head Start exposure using the method described above (59 percent) and those who were assigned no exposure because they were too old for Head Start when a local program was launched (41 percent), which generates reasonably large samples of both treated and untreated observations within counties.

Figure 2 displays a kernel density plot of county level Head Start spending per child

ages three to six for the counties represented in the NSLY79 sample over the same period (in 2012 dollars). Mean spending, indicated with the dashed line, was $170 in this period. More importantly, Figure 2 indicates substantial heterogeneity in early Head Start funding intensity across counties, with several counties spending in excess of $400 per child ages three to six. As noted above, this variation likely reflects the total en- rollment of local Head Start programs, as well as the prevalence of summer versus full- year programs, and allows me to analyze whether children exposed to more highly funded Head Start programs have better outcomes than children exposed to programs with lower spending levels.
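As a rough illustration of how a county-by-cohort exposure value of this kind could be assembled, the sketch below averages per-child spending over three calendar years, codes 1965 as zero (as described above), and scales by the $170 mean so that 1.0 corresponds to an average-sized program. The helper names, the dictionary layout, and the choice to pass the three calendar years explicitly are assumptions for illustration; the paper's exact birth-year-to-calendar-year mapping and dollar deflation are not reproduced, and spending is assumed to be already expressed in 2012 dollars.

```python
from typing import Dict, List

# Sample mean of county Head Start spending per child ages 3-6, in 2012 dollars.
SAMPLE_MEAN_PER_CHILD = 170.0


def per_child_spending(grant_dollars: float, children_3_to_6: int) -> float:
    """County Head Start spending per child ages three to six (hypothetical helper)."""
    if children_3_to_6 <= 0:
        raise ValueError("county must have a positive population ages 3-6")
    return grant_dollars / children_3_to_6


def exposure(spending_by_year: Dict[int, float], years: List[int]) -> float:
    """Three-year average of per-child spending, scaled by the $170 sample mean.

    Years absent from `spending_by_year` (for example, before a local program
    existed) count as zero, and 1965 is forced to zero because no reliable
    1965 funding data are available.
    """
    if len(years) != 3:
        raise ValueError("exposure is averaged over exactly three calendar years")
    values = [0.0 if year == 1965 else spending_by_year.get(year, 0.0) for year in years]
    return (sum(values) / len(values)) / SAMPLE_MEAN_PER_CHILD


# A hypothetical county that introduced a $255-per-child program in 1967.
county_spending = {1967: 255.0, 1968: 255.0, 1969: 255.0}
young_cohort = exposure(county_spending, [1966, 1967, 1968])  # exposed in 2 of 3 years
old_cohort = exposure(county_spending, [1963, 1964, 1965])    # too old: no exposure
```

Because the window spans three years, a cohort that overlaps a program's start only partly receives a fraction of the county's per-child spending, which is one way to read the smoothing role that footnote 10 attributes to the three-year average.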

C. Outcome Measures

A major advantage of the utilized data and approach is that I am able to estimate the effects of Head Start on a wider variety of outcomes and over a longer portion of the life cycle than previous work in this area. I study outcomes from three broad areas where Head Start was designed to have positive impacts: educational attainment, labor market outcomes, and health status.

14. The youngest NLSY79 respondents were born in 1964 and therefore turned six in calendar years 1969 and 1970.


Figure 1 Head Start Introduction by County, 1966–1970

Notes: Data from National Archives and Records Administration.


I measure educational attainment using each respondent’s total years of completed education, as well as indicators of whether they were awarded a high school diploma or a four-year college degree. To ensure that I observe final educational attainment, I define these measures using the most recently available survey wave completed after age 30, which in most cases occurred when respondents were in their mid-40s.

With respect to labor market outcomes, I begin with two income measures. First is the mean of all individual wage and salary observations occurring between ages 30 and 48 (measured in annual 2012 dollars), which I refer to as “own income.” This individual-level measure conflates labor supply decisions with earning power, which may be especially problematic for females, and it also omits some relatively common forms of nonwage income. As such, I also construct an income measure that includes wage and salary income for both the individual and their resident spouse (if present), as well as unemployment insurance, child support, and investment income for both the individual and their resident spouse. I again convert annual observations to 2012 dollars, then take the mean of all observations from ages 30–48, and I refer to this measure as “family income.”15 In addition to income, an important aspect of labor market well-being is employment status. Each NLSY79 wave collected information on weeks unemployed in the past year, and I use this information to construct a variable measuring the proportion of observations from ages 30–48 in which each individual was unemployed for two or more weeks.

Figure 2 County Head Start Funding Density Notes: Figure shows kernel density plot of county level Head Start funding per child ages three to six in 2012 dollars. Estimated with Epanechnikov kernel and bandwidth of 50. Dashed line indicates sample mean of $170. Counties with over $1,000 in spending per child ages three to six, representing approximately 2 percent of observations, are omitted from the figure but used in the calculation of the sample mean.

15. For both income measures only respondents with five or more valid annual observations are used.

As noted above, health and nutrition have always been a major component of Head Start programming. While the NLSY79 did not collect comprehensive health data in each wave, as respondents turned 40 they completed a detailed “40 and over health module.” Using information from this module, I first assess two global health measures: each individual’s self-rated health (on a 1–5 scale) and an indicator of whether health limits their ability to perform moderately strenuous activities, their ability to work, or their ability to engage in social activities. Respondents also report whether they suffer from various chronic conditions, and I construct a variable measuring how many of the following conditions the individual reports: a heart condition, severe tooth or gum problems, asthma, and high cholesterol. These conditions were selected because they are relatively common and are potentially sensitive to the health-related services that were typically included in the studied Head Start programs.

While the availability of many outcome measures is in general a strength, using such a large set of dependent variables also presents some estimation-related issues. The most important of these is multiple inference: With nine separate outcomes, as well as various subsamples and specifications, I test dozens of hypotheses, which increases the risk of false rejection (type 1 error). Additionally, many of the utilized outcome measures are closely related. For instance, educational attainment is strongly correlated with both own income and family income, and both income and education are strongly correlated with health outcomes. These correlations across measures make it difficult to ascertain how much new information is contained in results for each specific outcome. A final issue is measurement error. All of the utilized outcomes can reasonably be viewed as components of a single underlying index of socioeconomic well-being, but each specific outcome is likely measured with substantial error, which can destabilize the corresponding estimates.

To address these issues I follow O’Brien (1984), Carneiro and Ginja (2014), Deming (2009), and others and construct a summary index of the nine outcomes described above. Specifically, I first standardize each measure to have a mean of zero and a standard deviation of one and equalize signs across outcomes, so that positive values correspond to more desirable outcomes. I then take the weighted average of these standardized measures using weights equal to the inverse of the sample covariance matrix, which accounts for dependence across outcomes.16

Finally, I restandardize this weighted mean so that corresponding regression coefficients can be interpreted in standard deviation units. This index has the desirable property that adding additional dimensions does not increase the risk of type 1 error, and it also accounts for correlations across the outcomes and reduces measurement error. For completeness, in most instances I also present results for each of the nine outcomes separately.
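The index construction described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's code: the text describes the weights as the inverse of the sample covariance matrix, and here that is operationalized (an assumption) by weighting each standardized outcome by the corresponding row sum of the inverse covariance matrix, a common convention for turning a matrix inverse into one weight per outcome. Sampling weights and missing values, which real estimation would need to handle, are ignored, and all function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence

Matrix = List[List[float]]


def standardize(values: Sequence[float]) -> List[float]:
    """Rescale to mean zero and (population) standard deviation one."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def covariance(cols: Matrix) -> Matrix:
    """Covariance matrix (dividing by n) of a list of equal-length columns."""
    n = len(cols[0])
    means = [mean(c) for c in cols]
    return [
        [sum((a[i] - ma) * (b[i] - mb) for i in range(n)) / n for b, mb in zip(cols, means)]
        for a, ma in zip(cols, means)
    ]


def invert(m: Matrix) -> Matrix:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def summary_index(outcomes: Matrix, higher_is_better: Sequence[bool]) -> List[float]:
    # 1. Standardize each outcome and flip signs so that larger is always better.
    z = [standardize(c) if good else [-v for v in standardize(c)]
         for c, good in zip(outcomes, higher_is_better)]
    # 2. Weight each outcome by its row sum of the inverse covariance matrix, so
    #    highly correlated outcomes receive less weight (assumed weighting rule).
    weights = [sum(row) for row in invert(covariance(z))]
    total = sum(weights)
    n = len(z[0])
    raw = [sum(w * col[i] for w, col in zip(weights, z)) / total for i in range(n)]
    # 3. Restandardize so regression coefficients are in standard deviation units.
    return standardize(raw)


# Hypothetical data for six respondents: income, years of schooling, unemployment share.
income = [30.0, 45.0, 50.0, 62.0, 41.0, 38.0]
schooling = [12.0, 14.0, 13.0, 16.0, 12.0, 15.0]
unemployment = [0.30, 0.10, 0.20, 0.00, 0.25, 0.10]
index = summary_index([income, schooling, unemployment], [True, True, False])
```

Because the standardized outcomes have unit variance, their covariance matrix is the correlation matrix, so outcomes that largely duplicate each other are down-weighted relative to a simple average.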

D. Descriptive Statistics

Table 2 reports means and standard deviations for each of the nine utilized outcome measures as well as basic demographic characteristics within the estimation sample from the baseline regression models estimated below. On average, respondents in the working sample completed just over 13 years of schooling and had own incomes of $41,144 and household incomes of $68,024. The average respondent suffered from 0.35 chronic conditions, and 25 percent of respondents had some type of health-related limitation at age 40. The demographic characteristics indicate substantial ethnic and socioeconomic diversity.

16. The covariance matrix used to derive these weights is presented in Appendix Table A1.

IV. Empirical Strategy

I use the described data to estimate several variations of the following regression model:

$$y_{icy} = \alpha + \beta \, \mathit{HeadStartExposure}_{cy} + \rho_c + \gamma_y + X_{icy}\lambda + \mu_{icy} \qquad (1)$$

where $y_{icy}$ denotes an outcome for person $i$ born in county $c$ and in birth cohort $y$, $\mathit{HeadStartExposure}_{cy}$ denotes the average level of Head Start spending in county $c$ across the three calendar years when cohort $y$ was ages three to six, $\rho_c$ and $\gamma_y$ are county and cohort fixed effects, $X_{icy}$ is a vector of individual-level demographic controls with coefficient vector $\lambda$, and $\mu_{icy}$ is an error term. The primary coefficient of interest is $\beta$, which estimates the conditional change in the outcome variable from a unit increase in local Head Start funding at ages three to six.
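To make the mechanics of Equation 1 concrete, the sketch below estimates $\beta$ on a small synthetic panel by partialling out the county and cohort fixed effects (Frisch–Waugh–Lovell logic, implemented with alternating demeaning) and then regressing the residualized outcome on residualized exposure. It is a simplified illustration, not the paper's estimation code: it omits the demographic controls $X_{icy}$, sampling weights, and county-clustered standard errors, and the synthetic data, function names, and tolerance are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def demean_two_way(values: Sequence[float], counties: Sequence[Hashable],
                   cohorts: Sequence[Hashable], tol: float = 1e-10,
                   max_iter: int = 10_000) -> List[float]:
    """Sweep out county and cohort means by alternating projections."""
    resid = [float(v) for v in values]
    for _ in range(max_iter):
        largest_mean = 0.0
        for groups in (counties, cohorts):
            sums = defaultdict(float)
            counts = defaultdict(int)
            for v, g in zip(resid, groups):
                sums[g] += v
                counts[g] += 1
            means = {g: sums[g] / counts[g] for g in sums}
            resid = [v - means[g] for v, g in zip(resid, groups)]
            largest_mean = max(largest_mean, max(abs(m) for m in means.values()))
        if largest_mean < tol:  # both sets of group means are (numerically) zero
            return resid
    raise RuntimeError("alternating projections did not converge")


def twfe_beta(y, exposure, counties, cohorts) -> float:
    """OLS slope on exposure after partialling out county and cohort fixed effects."""
    y_t = demean_two_way(y, counties, cohorts)
    e_t = demean_two_way(exposure, counties, cohorts)
    return sum(e * v for e, v in zip(e_t, y_t)) / sum(e * e for e in e_t)


# Hypothetical balanced panel: three counties adopt programs of different sizes
# in different years; cohorts born in or after `first_cohort` are exposed.
ADOPTION = {"A": (1961, 1.2), "B": (1960, 0.8), "C": (1962, 1.5)}
COUNTY_EFFECT = {"A": 0.3, "B": -0.1, "C": 0.7}
counties, cohorts, exposure, outcome = [], [], [], []
for county, (first_cohort, size) in ADOPTION.items():
    for cohort in range(1958, 1964):
        e = size if cohort >= first_cohort else 0.0
        counties.append(county)
        cohorts.append(cohort)
        exposure.append(e)
        # True beta is 0.5; county and cohort terms are absorbed by the fixed effects.
        outcome.append(2.0 + 0.5 * e + COUNTY_EFFECT[county] + 0.05 * (cohort - 1958))

beta_hat = twfe_beta(outcome, exposure, counties, cohorts)
```

The staggered timing and differing program sizes are what keep exposure from being absorbed by the two sets of fixed effects; if every county adopted in the same year with the same size, the residualized exposure would be zero and $\beta$ would not be identified.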

Table 2 NLSY79 Sample Characteristics

| | Mean | Standard Deviation | Observations |
|---|---|---|---|
| Own income | 41,144 | 33,438 | 2,272 |
| Household income | 68,024 | 53,163 | 2,273 |
| Unemployment proportion | 0.13 | 0.19 | 2,554 |
| Highest grade completed | 13.12 | 2.25 | 2,559 |
| High school dropout | 0.08 | 0.27 | 2,559 |
| College graduate | 0.18 | 0.39 | 2,559 |
| Self-rated health (0–5) | 3.67 | 0.98 | 2,270 |
| Number of health conditions (0–4) | 0.35 | 0.63 | 2,271 |
| Age 40 health limitation | 0.25 | 0.43 | 2,271 |
| Birth order | 3.04 | 2.05 | 2,685 |
| Number of siblings | 3.38 | 2.32 | 2,685 |
| Black | 0.15 | 0.36 | 2,685 |
| White | 0.78 | 0.41 | 2,685 |
| Hispanic | 0.07 | 0.25 | 2,685 |
| Female | 0.50 | 0.50 | 2,685 |
| Maternal education | 10.79 | 1.96 | 2,685 |

Notes: Sample consists of NLSY79 respondents who were ages two to seven at the time of local Head Start implementation and did not have a parent that attended college. Income measures are the mean of all available observations from ages 30–48 expressed in 2012 dollars. Education measures are defined using the latest available observation occurring after age 30. Health measures are observed in the first interview completed after respondents turned 40. Sampling weights applied.


Equation 1 is essentially a difference-in-difference specification with a continuous treatment variable. If estimated using the full sample, this specification will compare the long-term outcomes of individuals born in the same county who had different levels of childhood exposure to Head Start (while also accounting for a general cohort effect and for individual-level demographic characteristics). Because the cohorts in the sample span the introduction of Head Start, much of the variation in program exposure will be driven by differences between individuals who were sufficiently young for Head Start at the time of its local introduction and individuals from the same county who were too old for Head Start at that time. Additional identifying variation comes from differences in per-capita funding levels across counties, since the utilized Head Start exposure measure will increase by more after the introduction of a relatively large program than after the introduction of a program with lower spending per child ages three to six.

While results using the full sample are presented below, my preferred estimation sample is restricted to respondents who were between the ages of two and seven at the time that a local Head Start program was implemented. I argue that restricting comparisons to individuals who were relatively close to the target age range of Head Start programs in the study period is preferable because observations in this subsample with varying levels of program exposure are more likely to share unobserved characteristics than are individuals born further apart.17 While in principle it would be desirable to restrict the sample to individuals born very close to the age-based Head Start eligibility threshold, as in a regression-discontinuity design, the relatively wide age range of early Head Start participants and modest available sample sizes make it difficult to identify an age-based eligibility cutoff that is sufficiently precise to yield credible results.

The main assumption needed for the model given by Equation 1 to yield valid causal estimates is that children from the same county who experienced different levels of Head Start exposure due to the timing and size of a local program’s introduction would have had similar long-term outcomes in the absence of program exposure, after accounting for cohort-specific effects and individual-level demographic characteristics. I use several methods to probe the validity of this key identifying assumption.

First, I conduct a series of tests assessing whether children with and without Head Start exposure have similar baseline characteristics, and I find little evidence of substantively or statistically significant differences in the demographic traits or family backgrounds of children with higher and lower levels of exposure.18 Another threat to the identifying assumption is that the counties adopting Head Start in earlier versus later years may have had different underlying trends in child outcomes, which would violate the “common-trends” assumption required by difference-in-difference specifications like Equation 1. I note that such violations are less likely when restricting the sample to respondents who were relatively close to Head Start’s target age range when the program was introduced locally, as is done here, since any differences in trends between counties would need to occur across just six cohorts. However, I still present results from a series of models that allow for limited departures from the common-trends assumption. Specifically, I estimate models that are similar to Equation 1 but include a linear birth cohort variable interacted with either county indicators or with state indicators and measures of 1960 county characteristics (that is, the county’s characteristics prior to the advent of Head Start). Results of these models are generally similar to the baseline findings, suggesting that the main results are not due to differential underlying trends within early-adopting counties, though the models with county-specific cohort trends are substantially less precise than the baseline estimates.

17. Note that this sample restriction limits the analysis to individuals from counties that adopted a Head Start program at some point during the study period, since age at the time of Head Start implementation is not defined for individuals from counties that did not adopt a program. Results with such individuals included are reported in Table 9 below.

18. Bias could also arise if parents behaved strategically to manipulate their children’s program eligibility, but this seems unlikely to be problematic in the current context given that I define treatment using county of birth, so that any manipulation of child age at the time of Head Start’s introduction would have required parents to foresee the county-specific program introduction date and change their fertility accordingly.

A related threat to identification is that Head Start programs may have been introduced simultaneously with other county-level policies that affected children’s long-term outcomes, for instance other War on Poverty programs or school desegregation. Restricting the sample to individuals who were relatively close to Head Start’s target age range at the time of its introduction again reduces the likelihood of such confounding, since few contemporaneous policy changes were specifically targeted at children ages three to six; such policies would therefore have had approximately equal impacts on children who were modestly too old for Head Start and on those who were young enough for Head Start by a modest margin. However, I additionally show in Section VI that the results are robust to including detailed county-level measures of other War on Poverty programs and are present among groups that are unlikely to be strongly affected by school desegregation efforts, specifically northern blacks.

Finally, I provide additional validations of the research design through a series of placebo and robustness tests and by exploring heterogeneous treatment effects.

V. Main Results

A. Baseline Estimates

My preferred estimates of Equation 1 are reported in Table 3. To facilitate a clear interpretation of the coefficient of interest, the Head Start exposure variable is divided by its mean of $170, which allows me to interpret the estimated coefficients as approximating the effect of going from no Head Start exposure to being exposed to an average-sized Head Start program. The vector of demographic controls includes sets of indicators for race, gender, birth order, number of siblings, and maternal education, and below I demonstrate that the main findings are robust to excluding these demographic controls. Custom NLSY79 sampling weights are applied, though below I demonstrate that the results are nearly identical when sampling weights are not applied. Standard errors are clustered at the county level.

Column 1 of Table 3 reports results for the composite measure of long-run socioeconomic well-being described above. The estimated coefficient indicates that, relative to having no Head Start exposure, being exposed to an average-sized Head Start program leads to a 0.081 standard deviation improvement in long-run socioeconomic well-being. This effect is statistically significant at the 1 percent level.

The remaining columns of Table 3 report results for the individual components of the index measure. Head Start exposure is associated with improvements in all nine outcomes, and five of these effects are statistically significant at conventional levels. The magnitudes of the estimates are also economically significant. For instance, it is estimated that exposure to an average level of Head Start spending at ages three to six increases own income from ages 30–48 by $2,199, increases final educational attainment by 0.125 years, increases the probability of graduating from college by 2.2 percentage points, and reduces health-related activity limitations at age 40 by 4.6 percentage points.19 Means of the outcome variables for individuals with no Head Start exposure are also reported, and relative to these means many of the estimated improvements translate to gains in the range of 5 to 20 percent.20

Table 3 The Effect of Head Start Exposure on Long-Term Outcomes, Intent to Treat Estimates

| | (1) Composite | (2) Own Income | (3) Household Income | (4) Unemployment | (5) High Grade | (6) High School Dropout | (7) College Graduate | (8) Self-Rated Health | (9) Conditions | (10) Health Limitation |
|---|---|---|---|---|---|---|---|---|---|---|
| Head Start exposure | 0.081*** | 2,199** | 2,918** | -0.005 | 0.125** | -0.012 | 0.022* | 0.041 | -0.003 | -0.046*** |
| | (0.023) | (877.145) | (1,437.375) | (0.007) | (0.051) | (0.010) | (0.013) | (0.032) | (0.018) | (0.015) |
| Observations | 2,685 | 2,272 | 2,273 | 2,554 | 2,559 | 2,559 | 2,559 | 2,270 | 2,271 | 2,271 |
| Control group mean | | 40,503 | 68,518 | 0.128 | 13.05 | 0.0784 | 0.174 | 3.666 | 0.297 | 0.245 |
| Percent change | | 5.4% | 4.3% | -4.2% | 1.0% | -14.7% | 12.8% | 1.1% | -1.2% | -18.7% |

Notes: Column titles indicate the dependent variable; see Section III of the text for detailed variable descriptions. Head Start exposure is measured using average program expenditures per child ages three to six in each respondent’s county of birth during the calendar years that they were ages three to six, scaled by the sample mean of $170. The sample is restricted to respondents who were ages two to seven at the time of local Head Start implementation and who did not have a parent who attended college. The reported control group means are for respondents with no Head Start exposure. All models contain county and cohort fixed effects and sets of indicators for race, gender, birth order, number of siblings, and maternal education. Standard errors clustered at the county level are in parentheses. Custom NLSY79 sampling weights are applied. Significance levels: *p < 0.10, **p < 0.05, ***p < 0.01.
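The “Percent change” row in Table 3 is the coefficient divided by the control-group mean. A minimal sketch of that arithmetic for three columns follows; the function and dictionary names are illustrative. Recomputing from the rounded coefficients printed in the table reproduces these three entries, but for small coefficients it can differ slightly (for example, -0.005/0.128 gives about -3.9 percent rather than the reported -4.2 percent), presumably because the table was built from unrounded estimates.

```python
from typing import Dict, Tuple


def percent_change(coefficient: float, control_mean: float) -> float:
    """Estimated effect of an average-sized program as a percent of the control mean."""
    if control_mean == 0:
        raise ValueError("control mean must be nonzero")
    return 100.0 * coefficient / control_mean


# (coefficient, control-group mean) pairs transcribed from Table 3.
TABLE_3: Dict[str, Tuple[float, float]] = {
    "own_income": (2199.0, 40503.0),
    "household_income": (2918.0, 68518.0),
    "highest_grade": (0.125, 13.05),
}

pct = {name: round(percent_change(b, m), 1) for name, (b, m) in TABLE_3.items()}
```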

B. First-Stage Estimates

The results from Table 3 are intent-to-treat (ITT) effects that estimate the impacts of Head Start exposure rather than actual program participation. These ITT effects are very likely to be smaller than the corresponding impacts among actual participants, which are typically referred to as treatment effects on the treated (ToT), since not all NLSY79 respondents in the working sample were eligible for Head Start and not all eligible respondents actually enrolled. Estimates of Head Start’s ToT effects would be of clear value, but they require estimating a first-stage regression that models the effect of Head Start funding on actual enrollment, then scaling the estimates from Table 3 (which can be viewed as reduced-form results) by this amount.

The NLSY79 does contain one direct measure of Head Start participation. Specifically, in the 1994 survey wave, all respondents born after 1959 were asked: “Now think back to when you were a child. To your knowledge, did you ever attend a Head Start program when you were a pre-schooler?”21 Overall, 12.5 percent of the NLSY79 respondents who were asked this question answered affirmatively, which is broadly consistent with national enrollment data from this era. Self-reported enrollment is also higher among blacks than whites (44 percent versus 6 percent) and among the children of parents who did not attend college than among children with such a parent (16 percent versus 6 percent), patterns that are broadly consistent with historical data on the characteristics of early Head Start participants (Bureau of Census 1968, 1970, 1972).

Columns 1 and 2 of Table 4 use this self-reported Head Start attendance indicator to estimate the effect of local Head Start funding exposure (as measured above) on program enrollment. Column 1 reports results from a bivariate model of this kind that uses the full NLSY79 sample. The results indicate that in this population being exposed to an average-sized Head Start program increases the probability of self-reporting participation by a statistically significant 5.1 percentage points. Column 2 of Table 4 estimates a more fully specified first stage that follows the reduced-form specifications in Table 3 by including county and cohort fixed effects and additional demographic controls and by restricting the sample to respondents without a parent who attended college and who were between the ages of two and seven at the time that a local Head Start program was implemented. This result indicates that being exposed to an average-sized Head Start program increases the probability of self-reporting participation by only 1.6 percentage points and that this effect is not statistically significant.

19. With respect to the health-related findings, note that previous research (for example, Ludwig and Miller 2007) has found large reductions in childhood mortality from Head Start exposure, and such differential survivorship would bias my estimates towards a negative health finding.

20. In unreported results I have also estimated the impact of Head Start exposure on AFQT scores. In line with previous research I do not find that Head Start has a significant long-term effect on standardized cognitive test scores, as models that use AFQT percentile scores as the dependent variable estimate an impact of 0.50 percentile points with a standard error of 0.63.

21. The 1957–1959 cohorts were not asked this question, presumably because they were age six or older when Head Start was launched in 1966, though the significant number of six-year-olds participating in early summer programs discussed above suggests this assumption may be inappropriate.

In addition to being statistically insignificant when fully specified, these first-stage estimates are much too small to be consistent with ToT effects of plausible magnitudes. Even the bivariate estimate in Column 1 of Table 4 would imply a ToT effect for the composite outcome of 0.081/0.051 = 1.59 standard deviations, and the analogous estimate for the fully specified first stage is more than five standard deviations.

One potential explanation for these weak first-stage relationships is that the utilized Head Start funding data are inaccurate. While possible, this explanation is not consistent with the successful cross-validations of the NACAP funding data described in Section III. Another potential explanation for the weak first stage is that the retrospective self-reports of Head Start enrollment in the NLSY79 are inaccurate, which seems more plausible given that NLSY79 respondents were ages 30–37 when they were asked to recall participation in a program that would have occurred when they were ages three to six. Even though Head Start participation is the dependent variable in the first-stage models from Table 4, so that classical measurement error would not attenuate the first-stage coefficients, it is well established that the nonclassical measurement error arising from misclassification of a binary dependent variable will cause attenuation, so that misreporting of participation is a plausible explanation for the small effects found in Columns 1 and 2 of Table 4 (Hausman, Abrevaya, and Scott-Morton 1998; Hausman 2001).
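The attenuation argument can be illustrated with a small simulation. If participants fail to report attendance with probability a1 and non-participants falsely report it with probability a0, the expected reported participation rate is a0 + (1 - a0 - a1) times the true rate, so a linear-probability first-stage slope shrinks by the factor (1 - a0 - a1). The misreporting rates, exposure levels, and sample size below are hypothetical values chosen only for illustration; they are not estimates from the NLSY79, and the function names are invented for this sketch.

```python
import random
from typing import List


def ols_slope(x: List[float], y: List[float]) -> float:
    """Slope from a bivariate OLS regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def attenuation_factor(false_pos: float, false_neg: float) -> float:
    """Factor by which misclassification shrinks a linear-probability slope."""
    return 1.0 - false_pos - false_neg


def simulated_first_stage(n: int, true_slope: float, base_rate: float,
                          false_pos: float, false_neg: float, seed: int = 0) -> float:
    """Regress *reported* participation on exposure when reports are misclassified."""
    rng = random.Random(seed)
    exposure, reported = [], []
    for _ in range(n):
        e = rng.choice([0.0, 1.0, 2.0])  # hypothetical exposure levels
        attended = rng.random() < base_rate + true_slope * e
        if attended:
            says_yes = rng.random() >= false_neg  # some participants forget
        else:
            says_yes = rng.random() < false_pos   # some non-participants misreport
        exposure.append(e)
        reported.append(1.0 if says_yes else 0.0)
    return ols_slope(exposure, reported)


TRUE_SLOPE = 0.20
estimated = simulated_first_stage(200_000, TRUE_SLOPE, base_rate=0.10,
                                  false_pos=0.05, false_neg=0.50)
expected = attenuation_factor(0.05, 0.50) * TRUE_SLOPE  # 0.45 * 0.20
```

Unlike classical noise added to a continuous dependent variable, misclassification of a binary outcome changes its conditional mean, which is why the slope itself is biased toward zero rather than merely estimated less precisely.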

Table 4 The Effect of Head Start Exposure on Measures of Head Start Participation

| | (1) Bivariate with NLSY79 Self-Reports | (2) Fully Specified with NLSY79 Self-Reports | (3) State Level Enrollment Totals |
|---|---|---|---|
| Head Start exposure | 0.051*** | 0.016 | 0.244*** |
| | (0.013) | (0.015) | (0.076) |
| Observations | 7,897 | 2,448 | 45 |

Notes: The dependent variable in the models from Columns 1 and 2 is an indicator of self-reported Head Start participation in the NLSY79 sample, and Head Start exposure is measured using average program expenditures per child ages three to six in each respondent’s county of birth during the calendar years that they were ages three to six, scaled by the sample mean of $170. The model in Column 1 includes no covariates and uses all NLSY79 respondents with valid exposure and enrollment data. The model in Column 2 follows the specifications in Table 3 by including county and cohort fixed effects and additional demographic controls and by restricting the sample to respondents without a parent who attended college and who were between the ages of two and seven at the time that a local Head Start program was implemented. The models in Columns 1 and 2 apply NLSY79 sampling weights and cluster the standard errors at the county level. The dependent variable in the model from Column 3 is the fraction of children ages three to six enrolled in Head Start in each state, calculated using state level enrollment totals from OEO (1966). Head Start exposure in the model from Column 3 is measured using total state level expenditures per child ages three to six, also scaled by $170. The model in Column 3 is weighted by the total number of NLSY79 respondents in the working sample from each state. Significance levels: *p < 0.10, **p < 0.05, ***p < 0.01.


If the first-stage results using NLSY79 Head Start participation are indeed not valid due to large nonclassical measurement error, an alternative is to estimate the relationship between the utilized Head Start funding measures and enrollment data from another source. Geographically disaggregated enrollment data from the early years of Head Start are scarce, but state level enrollment totals for 1966 were reported in an appendix to the OEO’s second annual report to Congress (OEO 1966). Using these enrollment totals, I have calculated the fraction of children ages three to six enrolled in Head Start in each state and regressed this fraction onto state funding totals from the NACAP grant records, with the results reported in Column 3 of Table 4. To make the units comparable to those used in the baseline reduced-form results, state level grant dollar totals are scaled by the mean funding level of $170 per child ages three to six, and the regression is weighted by the total number of NLSY79 respondents in the working sample from each state.22
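The state-level first stage described above amounts to a weighted least squares regression of an enrollment fraction on scaled funding per child. The sketch below shows that computation on hypothetical states that lie exactly on a line, so the recovered slope can be checked; the function names, the per-state inputs, and the data are assumptions for illustration, and `children_3_to_6` is meant to follow the denominator described in footnote 22 (children in counties with positive 1966 funding).

```python
from typing import List, Tuple

MEAN_FUNDING_PER_CHILD = 170.0  # scaling constant used for Head Start exposure


def state_row(grant_dollars: float, enrollees: int, children_3_to_6: int) -> Tuple[float, float]:
    """Return (scaled funding per child, enrollment fraction) for one state.

    `children_3_to_6` is assumed to count children in the state's counties with
    positive 1966 Head Start funding.
    """
    funding = grant_dollars / children_3_to_6 / MEAN_FUNDING_PER_CHILD
    return funding, enrollees / children_3_to_6


def wls_fit(x: List[float], y: List[float], w: List[float]) -> Tuple[float, float]:
    """Weighted least squares of y on x with an intercept; returns (intercept, slope)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical states lying exactly on a line with slope 0.244.
x = [0.5, 1.0, 1.5, 2.0]
y = [0.05 + 0.244 * xi for xi in x]
nlsy_respondents = [10.0, 3.0, 7.0, 1.0]  # weights: working-sample respondents per state
intercept, slope = wls_fit(x, y, nlsy_respondents)
```

Weighting by NLSY79 respondents makes states that contribute more observations to the reduced-form sample count more heavily in the first stage; with real data the points would scatter around the line and the weights would then affect the estimate.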

The estimates in Column 3 of Table 4 indicate that a $170 increase in Head Start funding per child ages three to six is associated with a 24.4 percentage point increase in state-level Head Start enrollment rates and that this relationship is highly statistically significant. It is important to acknowledge that this estimate of the first-stage relationship faces major limitations and is not directly comparable to the reduced-form results from Table 3. Specifically, the enrollment regression from Column 3 of Table 4 is identified with cross-sectional variation in funding and enrollment measured at the state level, whereas the reduced-form results in Table 3 are estimated with individual- and county-level data and use within-county variation in funding levels. Given these differences, the results in Column 3 of Table 4 are best viewed as suggestive evidence and should be interpreted cautiously.

Bearing these limitations in mind, the estimates in Column 3 of Table 4 do imply ToT effects that are of plausible magnitudes and in line with existing estimates. For instance, the first-stage coefficient of 0.244 implies a ToT effect for the composite outcome of 0.081/0.244 = 0.33 standard deviations, which is very similar to previous ToT estimates of Head Start’s impacts from this historical period. Specifically, Duncan and Magnuson (2013) perform a meta-analysis and report that in the mid-1960s, participation in an early childhood education program (both Head Start and non–Head Start) had an average impact of approximately 0.30 standard deviations. While the lack of a significant first stage in the NLSY79 data itself raises substantive concerns, it is somewhat reassuring that estimates of the relationship between Head Start funding and enrollment that use administrative enrollment totals translate into ToT estimates that are plausible and in line with previous research.
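The ToT figures quoted in this section are ratios of the reduced-form (ITT) coefficient to a first-stage coefficient, the same rescaling an instrumental-variables (Wald) estimator performs. The sketch below reproduces the three implied composite-index effects from the Table 3 and Table 4 coefficients. Reading these ratios as effects per participant assumes that funding exposure affects outcomes only through participation; the variable and function names are illustrative.

```python
from typing import Dict

ITT_COMPOSITE = 0.081  # reduced-form effect on the composite index (Table 3, Column 1)

# First-stage coefficients reported in Table 4.
FIRST_STAGE: Dict[str, float] = {
    "bivariate_self_reports": 0.051,
    "fully_specified_self_reports": 0.016,
    "state_enrollment_totals": 0.244,
}


def implied_tot(itt: float, first_stage: float) -> float:
    """Wald-style rescaling: effect per participant = ITT / first-stage effect."""
    if first_stage == 0:
        raise ZeroDivisionError("first-stage effect is zero; ToT is undefined")
    return itt / first_stage


tot = {name: implied_tot(ITT_COMPOSITE, fs) for name, fs in FIRST_STAGE.items()}
```

The ratio makes the sensitivity explicit: an attenuated first stage in the denominator mechanically inflates the implied ToT, which is why the self-reported first stages yield implausibly large effects while the administrative one yields an effect near 0.33 standard deviations.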

VI. Additional Results

A. Treatment Effect Heterogeneity

While actual Head Start participation is not reliably observed in the NLSY79 data, it is well known that participation rates were higher among some groups than others, and we would expect larger reduced-form effects in these subpopulations. Known differences in participation rates therefore present an opportunity to probe the validity of the research design by testing whether estimated ITT effects differ in subpopulations known to have different Head Start participation rates, and I do so in Table 5. While I report results for all of the outcome variables, the discussion focuses primarily on the index measure, since many of the individual estimates become somewhat erratic within smaller subsamples.

Panels A and B of Table 5 report results separately for blacks and whites. Historical accounts indicate that early Head Start programs intentionally focused on recruiting black participants, and census records show that 40 to 50 percent of all Head Start enrollees were black in the early years of the program, despite the fact that blacks constituted only around 11 percent of the national population in 1970 (Bureau of Census 1968, 1970, 1972). Given this, we would expect larger treatment effects among blacks than whites, and the results in Table 5 are consistent with this prediction. Specifically, the results indicate that being exposed to an average-sized Head Start program improved the long-term outcomes of whites by 0.064 standard deviations, while the analogous effect for blacks is roughly 44 percent larger at 0.092 standard deviations. While a plausible alternative explanation for these racial differences is that actual program effects were larger among blacks, previous research has not found systematically larger ToT effects from Head Start among blacks. For instance, Garces, Thomas, and Currie (2002) find larger Head Start effect sizes among whites than blacks, while Ludwig and Miller (2007) and Deming (2009) find approximately equal effect sizes by race for long-term outcomes. This suggests that the current race-specific findings are likely due to differential participation rates.23

22. The fraction of children ages three to six enrolled in Head Start is calculated using population totals from the counties with positive Head Start funding levels in the 1966 NACAP data as the denominator, so that this variable technically measures the fraction of children enrolled in Head Start within the counties of each state that had active Head Start programs, rather than the fraction of all children in the state.

Another characteristic by which participation rates would be expected to vary is parental education, since typically only children from families with incomes below the poverty line were Head Start eligible. This is the reason that the baseline models were estimated using respondents from families where neither parent had any post-secondary education. Individuals who continued their education beyond high school in the study period were in most cases relatively affluent, and census records indicate that only 5 to 10 percent of Head Start enrollees in this period had a parent with any post-secondary education (Bureau of Census 1968, 1970, 1972). Panel C of Table 5 reports results for the subsample of respondents from families where one or both parents had completed some post-secondary education. As expected given Head Start eligibility criteria, there are no statistically significant effects within this subpopulation. Indeed, for the index measure and several of the individual outcomes, the point estimates are nontrivially negative, though these negative effects are very plausibly due to sampling error given the small sample sizes and correspondingly large standard errors.

A final potentially important dimension of heterogeneity is treatment intensity. The distribution of per-capita Head Start expenditures displayed in Figure 2 indicated wide variation in funding levels even among counties with operational programs over the sample period. Since greater per-capita expenditures are likely associated with higher enrollments, the presence of full-year programs, and higher program quality, larger treatment effects would be expected from better funded programs.

23. There are substantial racial differences in estimated effect sizes for several of the individual outcome measures as well, and the estimates for individual income, college graduation, and health limitations actually show larger treatment effects for whites than blacks. These results may reflect racial differences in the margins at which Head Start impacts long-term outcomes, but given the modest within-group sample sizes and racial differences in baseline means, it is difficult to infer reliably whether race-specific treatment effects differ across outcomes.

Table 5 Treatment Effect Heterogeneity, Intent to Treat Estimates

Columns: (1) Composite; (2) Own Income; (3) Household Income; (4) Unemployment; (5) Highest Grade; (6) High School Dropout; (7) College Graduate; (8) Self-Rated Health; (9) Health Conditions; (10) Health Limitation

[Observation counts are listed in the order extracted from the source; their column alignment is not recoverable.]

Panel A: White
Head Start exposure 0.064** 2,081 2,614 -0.007 0.067 -0.003 0.019 0.012 0.001 -0.058**
(0.032) (1,349) (1,701) (0.009) (0.074) (0.008) (0.020) (0.034) (0.022) [one standard error is illegible in the source; alignment of this row is uncertain]
Observations 1,322 1,135 1,136 1,252 1,254 1,104 1,105

Panel B: Black
Head Start exposure 0.092** 166 4,716 0.007 0.146 -0.011 0.015 0.135* -0.062 -0.047
(0.040) (1,720) (3,524) (0.020) (0.117) (0.013) (0.016) (0.073) (0.050) (0.029)
Observations 826 677 [illegible] 789 792 706

Panel C: High Parental Education
Head Start exposure -0.042 3,629 3,541 0.014 -0.064 -0.016 -0.007 -0.010 0.066 0.016
(0.082) (4,723) (5,296) (0.013) (0.217) (0.014) (0.042) (0.094) (0.071) (0.041)
Observations 1,725 1,436 1,437 1,624 1,631 1,431 1,432 1,433

Panel D: By Treatment Intensity
Any Head Start exposure -0.012 2,720 -3,281 -0.000 -0.121 0.019 -0.022 0.033 -0.064 0.016
(0.104) (4,965) (7,892) (0.024) (0.279) (0.035) (0.050) (0.126) (0.102) (0.068)
Per-capita Head Start funding 0.081*** 2,226** 2,885** -0.005 0.124** -0.011 0.022 0.041 -0.004 -0.046***
(0.023) (876) (1,417) (0.007) (0.051) (0.009) (0.014) (0.032) (0.018) (0.015)
Observations 2,685 2,272 2,273 2,554 2,559 2,270 2,271

Notes: Column titles indicate the dependent variable. In Panels A–C, Head Start exposure is measured using average program expenditures per child ages three to six in each respondent’s county of birth during the calendar years that they were ages three to six, scaled by the sample mean of $170. In Panel D, Head Start exposure is simultaneously measured with this continuous variable and a binary variable indicating any Head Start exposure. The sample in Panel A consists of whites without a parent who attended college, while the sample in Panel B consists of blacks without a parent who attended college. The sample in Panel C includes all racial groups and is restricted to respondents with a parent who attended college, while the sample in Panel D includes all racial groups and is restricted to respondents without a parent who attended college. All models restrict the sample to respondents who were ages two to seven at the time of local Head Start implementation, and all models contain county and cohort fixed effects and sets of indicators for race, gender, birth order, number of siblings, and maternal education. Standard errors clustered at the county level are in parentheses. Custom NLSY79 sampling weights are applied. Significance levels: *p < 0.10, **p < 0.05, ***p < 0.01.

The final panel of Table 5 estimates models that simultaneously include a binary indicator of Head Start presence at ages three to six and the continuous funding measure that has been used throughout the analysis. The coefficient on the binary indicator is very close to zero, at -0.012. More importantly, the coefficient on the continuous measure is of the same magnitude as previous estimates and is highly statistically significant. This indicates that even among children with a Head Start program in their county, those exposed to programs with higher funding levels per child ages three to six experienced systematically better outcomes, which is consistent with my empirical approach capturing causal program impacts.

In addition to treatment effect heterogeneity along the dimensions explored in Table 5, it is noteworthy that recent analyses of data from the Head Start Impact Study have found that, in its modern form, Head Start’s effectiveness varies considerably across different centers (Walters 2015; Bloom and Weiland 2015; Kline and Walters 2016). The evidence in these studies suggests that much of the variability in treatment effects is attributable to program characteristics, such as hours of care and the use of home visits, as well as to the types of alternative care arrangements available. While data limitations prevent me from analyzing how the characteristics of Head Start centers from the period studied here affected their effectiveness, substantial heterogeneity seems likely given the low standards for receiving initial Head Start grants and the lack of uniform performance standards and monitoring in this period. It is therefore likely that the average effects reported in Table 3 mask substantial heterogeneity across higher and lower quality programs.
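The Panel D specification enters a binary indicator for any Head Start exposure alongside the continuous funding measure. The following is a deliberately simplified sketch on simulated data. It assumes a plain OLS with only those two regressors and an intercept, and it omits the county and cohort fixed effects, covariates, sampling weights, and clustered standard errors used in the paper. The function and variable names are hypothetical.

```python
import numpy as np


def fit_presence_and_intensity(y, any_exposure, scaled_funding):
    """OLS of an outcome on an any-exposure dummy and scaled funding."""
    X = np.column_stack([
        np.ones(len(y)),
        np.asarray(any_exposure, dtype=float),
        np.asarray(scaled_funding, dtype=float),
    ])
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return {"intercept": beta[0], "any": beta[1], "funding": beta[2]}


rng = np.random.default_rng(1)
n = 5000
any_exp = rng.random(n) < 0.6
# Funding, in units of the $170 mean, is zero without a program and varies
# widely among children whose county has one.
funding = np.where(any_exp, rng.gamma(2.0, 0.5, n), 0.0)
# Simulated truth: program presence alone does nothing; each $170 adds 0.08 SD.
outcome = 0.08 * funding + rng.normal(0.0, 1.0, n)

est = fit_presence_and_intensity(outcome, any_exp, funding)
print({name: round(float(value), 3) for name, value in est.items()})
```

With the dummy in the model, the funding coefficient is identified only from variation in funding among exposed children. This is the comparison the text uses to argue that better funded programs produced better outcomes.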

B. Balancing Tests

The validity of the baseline estimates from Table 3 rests on the assumption that individuals who had low Head Start exposure because they were beyond the program’s target age when it was locally introduced were similar to individuals who were sufficiently young for the program at the time of its introduction and therefore had higher exposure. In other words, the expected long-term outcomes of these two groups would have been the same in the absence of program exposure. While this assumption cannot be tested directly, a partial assessment of its validity can be made by comparing the pretreatment characteristics of the individuals within the estimation sample who did and did not have positive levels of Head Start exposure. This exercise is conceptually similar to the “balancing tests” that are standard practice for regression discontinuity research designs (see Lee and Lemieux 2010). It is useful in the current context as well, even though the main specifications are difference-in-difference models.

Panel A of Table 6 compares the simple means of predetermined characteristics for individuals who had positive Head Start exposure versus those who had no Head Start exposure. Specifically, I report the race and gender composition of the two groups, parental education levels, birth order, and number of siblings. Reassuringly, the levels of these predetermined characteristics are very similar across the two groups, and t-tests (not shown) indicate that none of the differences are statistically significant.

Table 6 Balancing Tests

Columns: (1) Maternal Education; (2) Paternal Education; (3) Black; (4) White; (5) Female; (6) Birth Order; (7) Number of Siblings; (8) Summary of Predetermined Characteristics

Panel A: Means by Exposure Status
Positive Head Start exposure 10.848 10.226 0.144 0.856 0.473 2.951 3.375 -0.043
No Head Start exposure 10.750 10.316 0.154 0.846 0.524 3.100 3.378 -0.046

Panel B: Regression Results
Head Start exposure -0.0349 -0.0026 0.0047 -0.0047 0.0137 -0.0836 -0.0864 -0.0007
(0.0485) (0.0699) (0.0055) (0.0055) (0.0157) (0.0582) (0.0697) (0.0059)
Observations 2,685

Notes: Panel A reports simple means of the variables indicated in the column titles within the estimation sample, split by individuals who had positive Head Start exposure and those who had no Head Start exposure. Panel B reports estimates from models identical to those in Table 3 but that use the characteristics indicated in the column titles as the dependent variable. Standard errors clustered at the county level are in parentheses. The summary of predetermined characteristics is a variable taking on the predicted values from a regression with the outcome index as the dependent variable and the listed predetermined characteristics as independent variables. Custom NLSY79 sampling weights are applied. Significance levels: *p < 0.10, **p < 0.05, ***p < 0.01.

Although none of the individual characteristics from Table 6 vary significantly with Head Start exposure, there may be some concern that they are jointly related to Head Start exposure. To address this possibility, the final column of Table 6 Panel A reports means for a summary measure of the listed characteristics. This index consists of the predicted values from a regression with the outcome index used in the main specifications as the dependent variable and the listed predetermined characteristics as independent variables, an approach similar to the one used by Almond et al. (2010). The means of this summary measure also indicate no statistically or substantively significant differences in the characteristics of individuals with and without positive Head Start exposure.

Panel B of Table 6 implements more formal balancing tests by reestimating the preferred specification from Table 3 but replacing long-term outcomes with predetermined covariates as the dependent variable. None of the predetermined characteristics display a practically or statistically significant association with Head Start eligibility, and the summary measure of predetermined characteristics is also uncorrelated with treatment. These results suggest that the observed increases in long-term outcomes are not an artifact of structural differences in the characteristics of the treatment and control groups.
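The summary measure in column 8 of Table 6 collapses the predetermined characteristics into a single index: regress the outcome index on the characteristics, then keep the fitted values. The sketch below rests on simplifying assumptions: unweighted OLS, simulated data, and hypothetical variable names. It then compares the index across the positive- and zero-exposure groups with a Welch t statistic, in the spirit of the unreported t-tests for Panel A.

```python
import numpy as np


def summary_index(outcome_index, covariates):
    """Fitted values from an OLS regression of the outcome index on covariates."""
    X = np.column_stack([np.ones(len(outcome_index)), covariates])
    beta, *_ = np.linalg.lstsq(X, outcome_index, rcond=None)
    return X @ beta


def welch_t(a, b):
    """Welch two-sample t statistic for a difference in means."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se


rng = np.random.default_rng(2)
n = 2685
# Hypothetical predetermined characteristics: maternal education, female, birth order.
covs = np.column_stack([
    rng.normal(10.8, 2.0, n),
    rng.random(n) < 0.5,
    rng.integers(1, 6, n),
]).astype(float)
exposed = rng.random(n) < 0.5  # drawn independently of the characteristics
outcome = 0.05 * covs[:, 0] - 0.03 * covs[:, 2] + rng.normal(0.0, 1.0, n)

index = summary_index(outcome, covs)
print(f"mean index, exposed:   {index[exposed].mean():.3f}")
print(f"mean index, unexposed: {index[~exposed].mean():.3f}")
print(f"Welch t statistic:     {welch_t(index[exposed], index[~exposed]):.2f}")
```

Weighting each characteristic by its predictive power for the outcome index means the test focuses on imbalances that would actually matter for the estimated effects.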

C. Endogenous Program Adoption

Despite the similar baseline characteristics of individuals with different levels of Head Start exposure, counties adopting Head Start in earlier versus later years may still have had different underlying trends in child outcomes. Such nonrandom selection into initiating a Head Start program would violate the “common trends” assumption required by difference-in-difference specifications like those estimated in Table 3. Given this, Table 7 presents results from several models that help to account for potential departures from the common trends assumption.

The most flexible method of accounting for potential differences across counties in the trends of unobserved determinants of long-term outcomes is to add county-specific linear birth cohort trends to Equation 1. These trends allow outcomes to follow a separate linear path across cohorts in each individual county. The main issue with this approach is that the county-specific trends absorb much of the identifying variation in Head Start exposure, substantially reducing precision. Models with county-specific linear cohort trends are reported in Panel A of Table 7. As expected, the estimates are much less precise than in the baseline specification, and the standard error for the composite outcome nearly quadruples from 0.023 to 0.089. However, the point estimate for the composite outcome remains positive, increases in magnitude to 0.154, and remains statistically significant at the 10 percent level despite the reduced precision. These results indicate that the baseline findings are not an artifact of differential trends in early-adopting counties, but the instability of the point estimate and lack of precision make this finding suggestive rather than conclusive.

An alternative to county-specific trends is allowing cohort trends in outcomes to vary at coarser levels of aggregation. Such specifications are less flexible in accounting for any violations of the common trends assumption. However, they will typically produce greater precision, since the trends do not vary at the same level at which treatment is defined and therefore have lower levels of collinearity with the exposure measure. Panel B of Table 7 presents the results of models that include interactions between a linear cohort variable and both state indicators and the following 1960 county characteristics: total population, percent black, percent urban, percent of individuals in families with incomes under $3,000, and percent of land in farming.24 These models help account for potential differences in early and late adopting counties by allowing for differential trends across states and across counties with different baseline characteristics. The results in Panel B of Table 7 are very similar to the baseline results from Table 3. The point estimate for the composite outcome is 0.088 and is statistically significant at the 1 percent level, suggesting that differences in the trends of early-adopting counties are unlikely to explain the main treatment effect estimates.

Table 7 Tests of Endogenous Head Start Adoption

Columns: (1) Composite; (2) Own Income; (3) Household Income; (4) Unemployment; (5) Highest Grade; (6) High School Dropout; (7) College Graduate; (8) Self-Rated Health; (9) Health Conditions; (10) Health Limitation

[Observation counts are listed in the order extracted from the source; their column alignment is not recoverable.]

Panel A: County-Specific Trends
Head Start exposure 0.154* 5,999** 8,154 0.011 0.137 -0.015 -0.008 0.246*** -0.054 -0.050
(0.089) (2,845) (6,280) (0.022) (0.183) (0.023) (0.031) (0.087) (0.064) (0.055)
Observations 2,685 2,272 2,273 2,554 2,559 2,270 2,271

Panel B: State Trends and County–Cohort Interactions
Head Start exposure 0.088*** 2,554** 3,745 0.004 0.044 0.010 0.028* 0.087* -0.055** -0.038**
(0.033) (1,182) (2,621) (0.009) (0.074) (0.013) (0.015) (0.048) (0.023) (0.018)
Observations 2,685 2,272 2,273 2,554 2,559 2,270 2,271

Panel C: Other Policy Controls
Head Start exposure 0.062** 1,508 1,661 -0.008 0.070 -0.001 0.025* 0.045 -0.028 -0.043***
(0.025) (1,327) (1,484) (0.009) (0.052) (0.012) (0.015) (0.041) (0.028) (0.016)
Observations 2,393 2,050 2,051 2,290 2,295 2,043 2,044

Panel D: Northern Black
Head Start exposure 0.072 -1,225 -2,167 0.005 0.168 0.017 0.023 0.263* -0.068 -0.037
(0.093) (3,477) (4,503) (0.035) (0.246) (0.024) (0.042) (0.139) (0.058) (0.094)
Observations 357 282 337 339 293

Notes: Column titles indicate the dependent variable. The sample composition and included covariates are identical to those in Table 3 with the following exceptions. Models in Panel A include a linear birth cohort variable interacted with county indicators. Models in Panel B include a linear birth cohort variable interacted with state indicators and with the following 1960 county characteristics: total population, percent black, percent urban, percent of individuals in families with incomes under $3,000, and percent of land in farming. Models in Panel C contain controls for childhood exposure to Medicaid, food stamps, cash assistance, CAP administrative grants, CAP health programs, and community health centers. Models in Panel D are estimated using the sample of blacks born outside of the South. Standard errors clustered at the county level are in parentheses. Custom NLSY79 sampling weights are applied. Significance levels: *p < 0.10, **p < 0.05, ***p < 0.01.

A related issue is that because Head Start was introduced as one component of the broader War on Poverty, the reported estimates may confound Head Start’s effects with those of other simultaneously introduced programs. As noted above, restricting the sample to individuals who were relatively close to Head Start’s target age range at the time a local program was introduced reduces the risk of such confounding. Few contemporaneous policy changes specifically targeted four-year-olds, so other policies would have affected children on either side of the Head Start eligibility threshold about equally. Still, this was a complex period of rapidly changing policy, and a more direct assessment of possible confounding is warranted.

To this end, Panel C of Table 7 estimates models that are identical to the preferred specification from Table 3 but include controls for exposure to six other aspects of the War on Poverty: Medicaid, food stamps, cash assistance, CAP administrative grants, CAP health programs, and community health centers. County level data on these programs are drawn from the NACAP files described above and from the Regional Economic Information System (REIS). For each NLSY79 respondent, I calculate each program’s mean level of funding for the same set of three calendar years over which Head Start exposure was measured.25

The estimated effects of Head Start after controlling for these contemporaneous policies are very similar to the baseline findings, suggesting minimal confounding.26
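Panel A of Table 7 adds county-specific linear birth cohort trends to the difference-in-difference model. The sketch below is a simplified stand-in for Equation 1, not the paper's code. It builds the design matrix (exposure, county dummies, cohort dummies, and county-by-cohort linear trend terms) on hypothetical, noise-free data in which early-adopting counties also have steeper outcome trends, then fits it by least squares. Covariates, sampling weights, and clustered standard errors are omitted, and every name is illustrative.

```python
import numpy as np


def trend_design(county, cohort, exposure, n_counties, n_cohorts):
    """Columns: exposure, county dummies, cohort dummies, county-specific trends.

    The first cohort dummy and county 0's trend are dropped, because the full
    sets would be collinear with the remaining county and cohort dummies.
    """
    n = len(county)
    rows = np.arange(n)
    county_d = np.zeros((n, n_counties))
    county_d[rows, county] = 1.0
    cohort_d = np.zeros((n, n_cohorts))
    cohort_d[rows, cohort] = 1.0
    trends = county_d[:, 1:] * cohort[:, None]  # county dummy x linear cohort index
    return np.column_stack([exposure, county_d, cohort_d[:, 1:], trends])


rng = np.random.default_rng(3)
n_counties, n_cohorts = 40, 8
county = np.repeat(np.arange(n_counties), n_cohorts)
cohort = np.tile(np.arange(n_cohorts), n_counties)

# Hypothetical adoption: cohorts at or after adopt[c] are young enough to be exposed.
adopt = rng.integers(2, 7, n_counties)
intensity = rng.uniform(0.5, 2.0, n_counties)  # funding relative to the $170 mean
exposure = intensity[county] * (cohort >= adopt[county])

# Early adopters get steeper outcome trends: the departure from common trends
# that county-specific trends are meant to absorb. No noise is added.
slope = 0.05 * (7 - adopt)
beta_true = 0.08
y = (rng.normal(0.0, 1.0, n_counties)[county] + 0.02 * cohort
     + slope[county] * cohort + beta_true * exposure)

X = trend_design(county, cohort, exposure, n_counties, n_cohorts)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"estimated effect: {coef[0]:.4f}")
```

Once each county has its own intercept and slope, only the part of a county's exposure path that departs from a straight line across cohorts identifies the effect. This is why the trend specification in Panel A is so much less precise than the baseline.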

Another important change occurring over the study period was southern school desegregation. Although the Brown v. Board ruling declared segregated schools unconstitutional in 1954, only token desegregation occurred in the Deep South over the next ten years, and in the fall of 1964 just 7 percent of black children in the South attended a desegregated school (Cascio et al. 2010). After passage of the 1964 Civil Rights Act there was a rapid dismantling of segregated schools in the South, and by 1970 virtually all southern black children attended desegregated schools. The timing of these changes coincides closely with the rollout of Head Start. This raises the possibility that what has been identified as a Head Start effect may partially reflect increased exposure to desegregated schools, especially since the estimated Head Start impacts were largest among black NLSY79 respondents.

A transparent method of testing this possibility is to estimate effects among northern blacks. While many high-profile desegregation court cases occurred in the North, the absolute magnitudes of segregation changes were much smaller than in the South, primarily because approximately 90 percent of northern black children were attending a desegregated school by the mid 1960s. Panel D of Table 7 reports results from estimating the baseline model for the northern black sample and finds an effect of 0.072 standard deviations. This estimate is not statistically significant, likely because the sample size is reduced to 357. However, the practically large positive point estimate observed in this subsample makes it unlikely that the main results are driven by differential exposure to desegregated schools.

24. Values of these characteristics from 1960 are used because this is the last decennial census prior to the introduction of Head Start, so the interactions will account for trends in the adoption of Head Start that vary by these observable pre-introduction county characteristics. Hoynes, Whitmore-Schanzenbach, and Almond (2016) use a very similar specification with state-specific cohort trends and cohort–county characteristic interactions to account for potential endogeneity in county-level adoption of the Food Stamp program.

25. For community health centers I calculate the proportion of years in which a center was operational in each respondent’s county rather than using funding levels. Results are very similar if exposure to these additional programs over the full course of childhood is used rather than exposure at ages three to six. Most of the data on these additional programs were again compiled and generously shared by Bailey and Goodman-Bacon (2015).

26. Because missing values for the measures of other War on Poverty programs lead to substantial sample size reductions in the models from Panel C, these results cannot be directly compared to the baseline results from Table 3, which exclude these controls. A more direct comparison can be made with the estimated treatment effect from models that use the sample from Panel C of Table 7 while excluding the policy controls. For the composite outcome, such a model produces a coefficient of 0.076 with a standard error of 0.029 (not shown).

D. The Effects of Funding at Alternate Ages

Table 1 showed that during the period under study, Head Start was most commonly attended by children ages four and five, with lower but substantial attendance by children ages three and six and very low levels of participation outside of the three to six age range. Given this participant age profile, we would generally not expect Head Start funding levels at ages other than three to six to be associated with long-term outcomes. If any such associations were present, they would raise serious concerns about the validity of the research design, so estimating the effect of funding at a variety of ages is a useful placebo exercise. Such estimates are also useful for verifying that it is appropriate to define treatment as the average Head Start funding level across the three calendar years that individuals were ages three to six.

Table 8 reports estimates from models that are identical to the baseline specification but that use measures of funding at each age between three and eight in place of average funding from ages three to six.27 Head Start funding at each age in Table 8 is measured using the mean funding level across the two calendar years in which respondents were that particular age. This implies that Head Start exposure levels across the studied ages are effectively a two calendar year moving average. For instance, an individual born in 1963 was three years old for portions of both 1966 and 1967, so their age three exposure is defined as the mean of the Head Start funding in their county of birth during these two years. The same individual’s age four exposure is similarly defined as the mean of the funding levels in the two calendar years that they were four years old, 1967 and 1968. As a result, 1967 funding is included in their exposure measures for both age three and age four. For succinctness I focus on results for the composite outcome measure.

27. Because the youngest NLSY79 cohorts were born in 1964, and Head Start funding data begin in 1966, three is the youngest age at which Head Start exposure takes on positive values for more than a small fraction of NLSY79 respondents.

Table 8 finds that Head Start funding at age four has the largest impact on long-term outcomes, with an estimated treatment effect of 0.061 standard deviations, while funding at ages three and five has modestly smaller impacts of 0.057 and 0.052 standard deviations, respectively. Such differences are expected given the age profiles of the participants in initial Head Start programs shown in Table 1. Tests of statistical significance (not shown) indicate that the differences between the estimated effect of age four exposure and exposure at ages three and five are not statistically significant at conventional levels, with p-values of 0.820 and 0.314, respectively. Column 4 of Table 8 indicates that funding at age six improves long-term outcomes by a statistically insignificant 0.029 standard deviations. The final two columns of Table 8 indicate that funding at ages seven and eight is uncorrelated with long-term outcomes, which is reassuring given that very few Head Start programs in this period served children older than six.28 The differences between the estimated effect of age four exposure and exposure at ages six, seven, and eight are borderline statistically significant, with p-values of 0.116, 0.124, and 0.086, respectively.
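The age-specific exposure measure in Table 8 averages county funding over the two calendar years in which a respondent was a given age. The sketch below makes three assumptions: only birth year is known, funding is stored in a hypothetical mapping from calendar year to per-child dollars in the county of birth, and years absent from that mapping count as zero funding. It reproduces the 1963 example from the text.

```python
MEAN_FUNDING = 170.0  # sample mean of per-child funding, in dollars


def age_exposure(funding_by_year, birth_year, age, scale=MEAN_FUNDING):
    """Scaled mean funding over the two calendar years spent at a given age.

    A child born in birth_year is `age` years old during parts of calendar
    years birth_year + age and birth_year + age + 1. Years absent from
    funding_by_year are treated as zero funding.
    """
    years = (birth_year + age, birth_year + age + 1)
    total = sum(funding_by_year.get(year, 0.0) for year in years)
    return total / len(years) / scale


# Hypothetical per-child funding in one county of birth, by calendar year.
county_funding = {1966: 100.0, 1967: 200.0, 1968: 260.0, 1969: 240.0,
                  1970: 250.0, 1971: 250.0, 1972: 250.0}

for age in range(3, 9):
    print(age, round(age_exposure(county_funding, 1963, age), 3))
```

For the 1963 cohort, 1967 funding enters both the age-three and age-four values. This overlap is the moving-average structure the text describes.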

E. Additional Sensitivity Tests

Table 9 presents several additional results and robustness checks. The baseline estimates restricted the sample to individuals who were ages two through seven at the time of local Head Start implementation. While using this estimation sample arguably creates more comparable treatment and control groups, it also excludes approximately 2,000 otherwise usable observations, and estimating models with a more inclusive sample is an important robustness check. Panel A of Table 9 reports results from models that impose

Table 8 The Effect of Head Start Exposure at Ages Three to Eight

Columns: (1) Age 3 Exposure; (2) Age 4 Exposure; (3) Age 5 Exposure; (4) Age 6 Exposure; (5) Age 7 Exposure; (6) Age 8 Exposure

Head Start exposure 0.057** 0.061*** 0.052*** 0.029 -0.007 -0.034
(0.029) (0.017) (0.016) (0.027) (0.023) (0.025)

Observations 2,685 2,685 2,685 2,685 2,685 2,685

Notes: In all models the dependent variable is the composite outcome measure described in Section II of the text. The independent variable is per-capita Head Start expenditures in each respondent’s county of birth in the calendar years that they were the indicated age, scaled by the overall sample mean of $170. Sample is restricted to respondents who were ages two to seven at the time of local Head Start implementation and who did not have a parent who attended college. All models contain county and cohort fixed effects and sets of indicators for race, gender, birth order, number of siblings, and maternal education. Standard errors clustered at the county level are in parentheses. Custom NLSY79 sampling weights are applied. Significance levels: *p < 0.10, **p < 0.05, ***p < 0.01.

28. The small and insignificant effect of age six funding, despite the fact that substantial numbers of six-year-old children were served by Head Start in this period, may be due to the fact that the age six funding measure includes funding from the calendar year that children turned seven, when they were unlikely to participate, and may also reflect reduced efficacy of the summer programs that older children attended.
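Footnote 28 follows from how age-specific exposure is dated: funding is recorded by calendar year, and a child is a given age during parts of two calendar years, so the age six measure also picks up the year in which the child turned seven. A minimal sketch, assuming only birth year is known and (an assumption not stated in the text) that the two calendar years are averaged before scaling by the $170 sample mean; `calendar_years_at_age`, `age_exposure`, and the funding figures in the example are hypothetical.

```python
from typing import Dict, Tuple

def calendar_years_at_age(birth_year: int, age: int) -> Tuple[int, int]:
    """A child born in `birth_year` is `age` years old during part of two
    calendar years: the year of that birthday and the following year."""
    return (birth_year + age, birth_year + age + 1)

def age_exposure(funding_by_year: Dict[int, float], birth_year: int, age: int,
                 sample_mean: float = 170.0) -> float:
    """Hypothetical exposure measure: average per-capita county funding over
    the two calendar years at `age`, scaled by the sample mean ($170).
    Averaging the two years is an illustrative assumption."""
    years = calendar_years_at_age(birth_year, age)
    # Years with no local program contribute zero funding.
    spending = [funding_by_year.get(y, 0.0) for y in years]
    return sum(spending) / len(spending) / sample_mean

# Hypothetical county whose program began in 1966.
funding = {1966: 170.0, 1967: 255.0}
print(calendar_years_at_age(1960, 6))   # (1966, 1967): 1967 is the year the child turns seven
print(age_exposure(funding, 1960, 6))   # (170 + 255) / 2 / 170 = 1.25
print(age_exposure(funding, 1960, 4))   # no program in 1964-1965: 0.0
```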


Table 9
Additional Robustness

Columns: (1) Composite; (2) Own Income; (3) Household Income; (4) Unemployment; (5) Highest Grade; (6) High School Dropout; (7) College Graduate; (8) Self-Rated Health; (9) Health Conditions; (10) Health Limitation.

                      (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)       (9)       (10)

Panel A: Full Sample
Head Start exposure   0.040**   922       1,282     -0.002    -0.003    -0.013    0.008     0.061*    0.005     -0.028*
                      (0.019)   (992)     (1,169)   (0.006)   (0.050)   (0.008)   (0.012)   (0.035)   (0.014)   [missing]
Observations (printed order): 4,743; 4,047; 4,048; 4,514; 4,523; 4,050; 4,047; 4,051

Panel B: Sibling Fixed Effects
Head Start exposure   0.074     2,030     1,424     -0.009    0.122     -0.001    0.030     0.060     0.027     -0.043
                      (0.074)   (2,411)   (3,245)   (0.015)   (0.151)   (0.010)   (0.037)   (0.073)   (0.038)   (0.047)
Observations (printed order): 4,227; 3,594; 3,595; 4,026; 4,033; 3,601; 3,598; 3,602

Panel C: Nonmoving Subsample
Head Start exposure   0.100***  2,292**   4,024**   0.003     0.186***  -0.014    0.021*    0.127**   -0.023    -0.036**
                      (0.028)   (1,133)   (1,947)   (0.007)   (0.068)   (0.011)   (0.012)   (0.061)   (0.021)   (0.016)
Observations (printed order): 1,818; 1,533; 1,534; 1,729; 1,731; 1,539; 1,540

Panel D: No Controls
Head Start exposure   0.077***  1,696**   3,123**   -0.003    0.134**   -0.010    0.025**   0.030     -0.004    -0.041***
                      (0.022)   (860)     (1,364)   (0.007)   (0.057)   (0.009)   (0.013)   (0.030)   (0.019)   (0.014)
Observations (printed order): 2,768; 2,337; 2,338; 2,630; 2,635; 2,335; 2,336; 2,337

Panel E: Unweighted
Head Start exposure   0.090***  1,860*    3,454**   -0.001    0.157***  -0.017*   0.021**   0.081**   -0.009    -0.033**
                      (0.023)   (947)     (1,366)   (0.008)   (0.055)   (0.010)   [missing] (0.035)   (0.023)   (0.015)
Observations (printed order): 2,685; 2,272; 2,273; 2,554; 2,559; 2,270; 2,271

Notes: Column titles indicate the dependent variable. The sample composition and included covariates are identical to those in Table 3 with the following exceptions. Models in Panel A do not exclude observations outside of ages two to seven at the time of Head Start’s introduction. Models in Panel B include sibling group indicators in place of county indicators and do not exclude observations outside of ages two to seven at the time of Head Start’s introduction. Models in Panel C are estimated using the sample of respondents who were still living in their county of birth as of age 14. Models in Panel D exclude all controls except county and cohort fixed effects. Models in Panel E do not apply sampling weights. Standard errors clustered at the county level are in parentheses. Significance levels: *p < 0.10, **p < 0.05, ***p < 0.01.
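The notes to Table 9 describe the Panel B models as replacing county indicators with sibling-group indicators. A minimal sketch of why this discards much of the identifying variation: including a dummy for every family gives the same slope as regressing family-demeaned outcomes on family-demeaned exposure (the Frisch–Waugh–Lovell result), so siblings with identical exposure contribute nothing. The function name and data below are hypothetical, and the sketch omits the cohort effects, other controls, weights, and clustered standard errors of the actual models.

```python
from collections import defaultdict
from typing import List, Tuple

def within_family_slope(rows: List[Tuple[str, float, float]]) -> float:
    """OLS slope of outcome on exposure after removing family means.

    rows: (family_id, exposure, outcome). Demeaning within family is
    numerically equivalent to adding an indicator for every family."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # family -> [sum x, sum y, n]
    for fam, x, y in rows:
        t = totals[fam]
        t[0] += x
        t[1] += y
        t[2] += 1
    sxx = sxy = 0.0
    for fam, x, y in rows:
        sum_x, sum_y, n = totals[fam]
        dx = x - sum_x / n  # exposure relative to the siblings' average
        dy = y - sum_y / n  # outcome relative to the siblings' average
        sxx += dx * dx
        sxy += dx * dy
    if sxx == 0.0:
        raise ValueError("no within-family variation in exposure")
    return sxy / sxx

# Hypothetical data. Families A and B follow outcome = family effect + 0.08 * exposure;
# family C's siblings share the same exposure, so their outcome gap cannot inform the slope.
data = [
    ("A", 0.0, 0.50), ("A", 1.0, 0.58),
    ("B", 0.0, -0.30), ("B", 2.0, -0.14),
    ("C", 1.0, 0.20), ("C", 1.0, 0.90),
]
print(within_family_slope(data))  # close to 0.08
```

Families like C, whose members were all inside (or all outside) the program's reach, add observations without adding within-family exposure variation, which is consistent with the larger standard errors in Panel B.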


no age-based restrictions on the sample, and thus include individuals who were outside this age range when a local program was introduced or who were born in counties that never introduced a program. The estimated treatment effect for the composite outcome measure in this sample is 0.040 standard deviations and is statistically significant at the 5 percent level, with nontrivial positive impacts observed for most of the individual outcomes as well, although most fail to achieve statistical significance. While there is no obvious explanation for the smaller treatment effect in the full sample, it is reassuring that the main estimate remains statistically and economically significant without sample restrictions.

As noted above, several previous evaluations of Head Start have relied on comparisons of siblings who differed in their Head Start participation (Currie and Thomas 1995; Garces, Thomas, and Currie 2002; Deming 2009). One criticism of this identification strategy has been that unobserved child-varying characteristics may influence the decision to enroll one sibling in Head Start but not the other. This issue may be less acute during the Head Start rollout period studied here, since the nonattendance of one sibling will often be due to the program not existing when that sibling was in the target age range rather than to direct parental decisions.29 The NLSY79 also contains large numbers of siblings, so sibling fixed-effects models can be implemented with the current data set.

29. Aizer and Cunha (2012) also rely on sibling comparisons during Head Start’s rollout to identify short-term program effects and their interactions with other human capital investments.

Panel B of Table 9 reports results from sibling fixed-effects models. The sample used in these models continues to exclude families with high levels of parental education, but is no longer restricted to individuals who were ages two through seven at the time of local Head Start implementation, since these models attempt to account for unobserved heterogeneity by restricting comparisons to individuals from the same family. I continue to use the funding-based Head Start exposure measure described above. The estimated treatment effect of 0.074 for the composite outcome is very similar to the baseline model in Table 3, but it is much less precise and fails to achieve statistical significance (p = 0.313). Results for the other outcomes also typically have the expected sign and are of similar magnitudes to the baseline results, but have large standard errors and fail to achieve statistical significance. These reductions in precision are likely due to there being less variation in program exposure within sibling groups than among non-siblings from the same county who were born in a relatively narrow range of cohorts.

In addition to resembling the baseline results from Table 3 of the present paper, the magnitudes of the estimates in Panel B of Table 9 are generally consistent with the findings of previous studies using sibling fixed-effects strategies. For instance, Deming (2009) uses data from the children of women in the NLSY79, for whom actual Head Start participation is observed contemporaneously, and estimates a ToT effect of 0.23 standard deviations for an index of young adult outcomes. Given the differences in historical era and the lack of first-stage estimates in the present study, this estimate cannot be directly compared to the sibling fixed-effects estimates in Table 9, but a first stage of approximately 0.25 would imply that the ITT effect of 0.074 from Table 9 translates into a ToT effect of approximately 0.30, which is broadly in line with Deming (2009) and other sibling fixed-effects based estimates.

An additional robustness-related issue is that Head Start exposure in the models reported above was assigned using county of birth. However, it is likely that some respondents moved out of their county of birth before ages three to six, which would lead to mismeasurement of program exposure. The NLSY79 also recorded county of residence at age 14, and one way of reducing the potential for bias related to cross-county residential mobility is to restrict the sample to individuals who were still living in their county of birth as of age 14.30 Results using this subsample are reported in Panel C of Table 9 and are very similar to the baseline results, with an estimated treatment effect for the composite measure of 0.10 standard deviations.

Panel D of Table 9 reports the results of specifications that exclude all demographic controls and include only county and cohort fixed effects. Given that demographic characteristics were demonstrated to be well balanced across individuals with and without program exposure, the exclusion of these controls is expected to have minimal effects on the results, and Panel D shows that this is indeed the case.

Finally, all of the models up to this point have been weighted using custom NLSY79 sampling weights to account for survey design features such as oversampling and clustering. However, given that a nonrandom subset of the full NLSY79 sample is being utilized, it is not wholly clear that applying sampling weights is appropriate. As an additional robustness check, Panel E of Table 9 shows estimates from models that exclude sampling weights, and the results indicate that weighting is not consequential in practice, as the estimated treatment effects in the unweighted models are very similar to the weighted estimates.
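The comparison with Deming (2009) earlier in this subsection rescales the sibling fixed-effects ITT estimate by a hypothetical first stage. A minimal sketch of that arithmetic, assuming the standard Wald (Bloom) rescaling in which the ToT effect equals the ITT effect divided by the first stage; this is valid only if exposure affects outcomes solely through actual participation. The 0.25 first stage comes from the text; the 0.20 and 0.33 values are illustrative additions that show how sensitive the implied ToT is to a quantity the study does not measure.

```python
def tot_from_itt(itt: float, first_stage: float) -> float:
    """Rescale an intent-to-treat (ITT) estimate into a treatment-on-the-treated
    (ToT) estimate by dividing by the first stage, i.e. the increase in
    participation caused by exposure. Assumes exposure affects outcomes only
    through actual Head Start participation."""
    if not 0.0 < first_stage <= 1.0:
        raise ValueError("first stage must lie in (0, 1]")
    return itt / first_stage

itt_sibling_fe = 0.074  # composite ITT estimate, Panel B of Table 9
for fs in (0.20, 0.25, 0.33):  # 0.25 is the text's illustrative value
    print(f"first stage {fs:.2f} -> ToT {tot_from_itt(itt_sibling_fe, fs):.3f}")
```

With the text's first stage of 0.25, the ITT of 0.074 implies a ToT of about 0.30, and the implied effect moves substantially across nearby first-stage values.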

VII. Implications for Current Head Start Policy

As noted above, the early versions of Head Start studied here differed substantially from more modern implementations of the program, and the counterfactual environments of participating children have likely changed substantially over time as well. These considerations suggest that the treatment effect estimates presented above are not directly applicable to the modern Head Start program, and it is useful to consider how the program’s treatment effects may differ across the two periods.

With respect to the characteristics of Head Start programs themselves, overall quality has very likely improved substantially since the 1960s. There is a consensus among historical accounts that in the rush to spend allocated CAP funds, the quality standards for initial Head Start grant applications were very low, with nearly 90 percent of applications receiving approval, and a lack of qualified staff was particularly problematic (Levitan 1969; Vinovskis 2008).

Subsequent reforms increased the average qualifications of paid staff as well as the average level of per-participant funding. More specifically, Hulsey et al. (2011) report that as of 2009, 81 percent of Head Start teachers held at least an associate’s degree, 46 percent held a bachelor’s degree, and teachers averaged nine years of classroom experience. In contrast, data from the Bureau of Census (1970) indicate that from 1965–1968 approximately 25 percent of Head Start teachers had completed three or four years of college, with an additional 10–15 percent having completed one or two years of college, while Levitan (1969) reports that in 1966 and 1967, only 2,700 out of 18,000 Head Start teachers had completed a recommended eight-week training program. The increased use of professional staff, as well as the introduction of more full-year programs, led per-participant funding to increase as well, from $1,865 in 1966 (Head Start’s first full year of operation) to $8,992 in 2014 (DHHS 2014).

30. While it is plausible that some individuals may have moved out of their county of birth at ages three to six but then returned by age 14, it seems likely that such cases are relatively rare.

One potentially negative effect of professionalizing Head Start’s staffing was a corresponding decrease in paid parental employment, which was quite common in the early years of the program. For instance, Zigler and Valentine (1979) report that in 1965 Head Start centers directly employed 47,000 parents, and Head Start Advisory Committee member Eveline Omwake remarked that by 1967 “the employment function of the project was taking precedence over the educative function” (Vinovskis 2008). Given this, the estimates presented above may be driven in part by income effects due to parental employment rather than actual program content, whereas benefits of contemporary programs may be driven more by the impacts of adequately qualified and trained staff.

In addition to increased staff qualifications, there have been notable shifts in the Head Start curriculum toward traditional academic skills and away from health services, noncognitive skills, and parental engagement. For instance, a 48-page booklet prepared for administrators of the inaugural 1965 summer Head Start programs contained an extensive list of “broad goals” for the program, but this list almost entirely omitted traditional academic skills like learning the alphabet, instead focusing on health and emotional development goals (Vinovskis 2008). Contemporary programs place much more emphasis on Head Start’s academic mission, to the point that there have been proposals to move the program’s administrative home from the Department of Health and Human Services to the Department of Education (Gibbs, Ludwig, and Miller 2013). These shifts in curricular emphasis suggest that the mechanisms driving the results presented here may be primarily noncognitive, while more recent program effects may be more attributable to cognitive and academic mechanisms.

If on balance Head Start quality has improved since the initial implementations studied here, then the program’s treatment effect would be expected to increase as well, all else constant. On the other hand, the counterfactual environments that participating children would have experienced in the absence of Head Start have very likely improved over time as well, which would be expected to decrease the program’s treatment effect.

One important change in counterfactual environments is that at the time of Head Start’s introduction there were very few alternative preschools available, and the availability of non–Head Start childcare options has increased dramatically since the 1960s. For instance, Gibbs, Ludwig, and Miller (2013) use CPS data to calculate that while only around 10 percent of children ages three and four were enrolled in preschool in 1965, by 2010 this number had grown to more than 50 percent. Additionally, the growth of public kindergarten throughout the 1960s and 1970s dramatically changed the counterfactual environments of five- and six-year-old children, who commonly participated in early Head Start programs (Cascio 2009). There is also evidence that contemporary Head Start participants would be better able to have their basic health and nutrition needs met in the absence of the program. For instance, the Bureau of Census (1972) reports that 30–40 percent of Head Start participants in 1969 had not received even basic vaccinations, whereas recent government reports indicate that over 90 percent of incoming Head Start children are vaccinated (DHHS 2014).
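The staffing and funding figures quoted in this section reduce to simple ratios. A small sketch of that arithmetic, computing the overall growth in per-participant funding, the implied compound annual growth rate, and the share of teachers who had completed the recommended training. The text does not say whether the $1,865 and $8,992 figures are nominal or inflation-adjusted, so the annual rate should not be read as real growth without checking DHHS (2014); the function name is hypothetical.

```python
def growth_summary(start: float, end: float, years: int):
    """Return (overall growth ratio, implied compound annual growth rate)."""
    ratio = end / start
    annual_rate = ratio ** (1.0 / years) - 1.0
    return ratio, annual_rate

# Per-participant Head Start funding quoted in the text (1966 vs. 2014).
ratio, cagr = growth_summary(1865, 8992, 2014 - 1966)
# Teachers who had completed the recommended eight-week training (Levitan 1969).
trained_share = 2700 / 18000

print(f"funding grew {ratio:.2f}x, about {cagr:.2%} per year")
print(f"share of teachers trained in 1966-67: {trained_share:.0%}")
```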


As a result of these and related changes, the conditions that Head Start participants would have experienced in the absence of the program have on average improved substantially over time. These changes would likely lead to substantial reductions in Head Start’s causal effect on long-term outcomes, a view supported by recent work showing that differences in counterfactual environments can have large effects on the estimated impacts of Head Start. For instance, Kline and Walters (2016) and Feller et al. (2016) use Head Start Impact Study data to show that test score gains from Head Start are significantly larger among children who would otherwise not have attended formal preschool, while Duncan and Magnuson (2013) document large declines in the estimated impacts of preschool interventions since the 1950s.

Given these changes in program content and counterfactual environments, it is important to be cognizant of the study period when comparing the current paper’s findings to the previous literature. In particular, the current results are not directly comparable to evaluations of more recent Head Start programs, such as those by Deming (2009) and Carneiro and Ginja (2014), or to findings from the Head Start Impact Study (Puma et al. 2010), but can be viewed as corroborating the evaluations of relatively early Head Start implementations performed by Garces, Thomas, and Currie (2002) and by Ludwig and Miller (2007).

VIII. Conclusion

Both policymakers and researchers have long viewed preschool-based interventions as among the most promising strategies for improving the life chances of poor children, and Head Start is by far the largest such program in the United States, but Head Start’s impacts on long-term socioeconomic outcomes are not fully understood. In this paper, I developed an identification strategy based on geographic variation in the timing of Head Start’s original introduction, which was used to estimate the program’s causal effect on a wide variety of outcomes observed further into the life cycle than most previous research on this topic.

The main finding has been that exposure to early implementations of Head Start had an intent-to-treat effect of 0.081 standard deviations on an index of socioeconomic well-being constructed with measures observed through age 48. An extensive series of balancing tests, placebo exercises, heterogeneous treatment effect estimates, and robustness checks were employed to help validate the research design and results, and subject to some important caveats related to data quality and precision, these tests on balance supported a causal interpretation of the main estimates. A notable limitation of the analysis is that the Head Start funding measure used here was not strongly correlated with retrospective self-reports of Head Start enrollment, and it was argued that this was likely due to poor data quality.

This study’s findings contribute to an emerging consensus that despite relatively rapid fade-out of test score gains, Head Start has large positive effects on adult outcomes. While the precise noncognitive mechanisms responsible for these effects remain largely an open question, the present research demonstrates that long-term positive impacts extend to a wide range of socioeconomic and health-related outcomes and are detectable more than four decades after Head Start participation occurred.


Appendix

Table A1
Covariances of Outcome Variables

Columns: (1) Own Income; (2) Household Income; (3) Unemployment Proportion; (4) Highest Grade Completed; (5) High School Dropout; (6) College Graduate; (7) Self-Rated Health; (8) Number of Health Conditions; (9) Health Limitation.

                                 (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)
(1) Own income                   1.00
(2) Household income             0.61   1.01
(3) Unemployment proportion      0.28   0.27   0.85
(4) Highest grade completed      0.40   0.37   0.20   1.00
(5) High school dropout          [one of five entries missing in source; printed values: 0.16, 0.10, 0.46, 0.93]
(6) College graduate             0.38   0.35   0.18   0.83   0.16   1.01
(7) Self-rated health            0.24   0.23   0.13   0.24   0.14   0.19   0.98
(8) Number of health conditions  0.07   0.08   0.03   0.05   0.06   0.03   0.27   0.99
(9) Health limitation            [one of nine entries missing in source; printed values: 0.18, 0.15, 0.08, 0.10, 0.07, 0.38, 0.19, 0.99]

Notes: All variables measured in standard units. See Section II of the text for detailed variable descriptions.
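Table A1 reports covariances of outcomes measured in standard units, and its diagonal entries range from 0.85 to 1.01 rather than equaling exactly one. Converting to correlations divides each covariance by the square root of the product of the two variances. A minimal sketch, using only the fully legible upper-left block (own income, household income, and unemployment proportion); the function name `cov_to_corr` is hypothetical.

```python
import math
from typing import List

def cov_to_corr(cov: List[List[float]]) -> List[List[float]]:
    """Rescale a covariance matrix to correlations: r_ij = c_ij / sqrt(c_ii * c_jj)."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]

# Lower-triangular entries from Table A1, rows (1)-(3).
lower = [[1.00], [0.61, 1.01], [0.28, 0.27, 0.85]]
n = len(lower)
# Mirror the lower triangle into a full symmetric matrix.
cov = [[lower[max(i, j)][min(i, j)] for j in range(n)] for i in range(n)]
corr = cov_to_corr(cov)
for row in corr:
    print(" ".join(f"{r:5.2f}" for r in row))
```

Because the diagonal entries are close to one, the implied correlations differ only slightly from the printed covariances (for example, about 0.30 rather than 0.28 between own income and unemployment proportion).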


References

Aizer, A., and F. Cunha. 2012. “The Production of Human Capital: Endowments, Investments and Fertility.” NBER Working Paper 18429. Cambridge, MA: NBER.

Almond, D., and J. Currie. 2011. “Human Capital Development before Age Five.” Handbook of Labor Economics 4:1315–486.

Almond, D., J. Doyle, A. Kowalski, and H. Williams. 2010. “Estimating Marginal Returns to Medical Care: Evidence from At-Risk Newborns.” Quarterly Journal of Economics 125(2):591–634.

Bailey, M., and N. Duquette. 2014. “How Johnson Fought the War on Poverty: The Economics and Politics of Funding at the Office of Economic Opportunity.” Journal of Economic History 74(2):351–88.

Bailey, M., and A. Goodman-Bacon. 2015. “The War on Poverty’s Experiment in Public Medicine: The Impact of Community Health Centers on the Mortality of Older Americans.” American Economic Review 105(3):1067–1104.

Barnett, W. 2011. “Effectiveness of Early Educational Intervention.” Science 333(6045):975–78.

Bloom, H., and C. Weiland. 2015. “Quantifying Variation in Head Start Effects on Young Children’s Cognitive and Socio-Emotional Skills Using Data from the National Head Start Impact Study.” Unpublished.

Bronfenbrenner, U. 1979. The Ecology of Human Development: Experiments by Nature and Design. Cambridge, MA: Harvard University Press.

Bureau of Census. 1968. “Project Head Start 1965–1966: A Descriptive Report of Programs and Participants.” Washington, DC: Child Development Services Bureau (DHEW/OCD).

Bureau of Census. 1970. “Project Head Start 1968: A Descriptive Report of Programs and Participants.” Washington, DC: Child Development Services Bureau (DHEW/OCD).

Bureau of Census. 1972. “Project Head Start 1969–1970: A Descriptive Report of Programs and Participants.” Washington, DC: Child Development Services Bureau (DHEW/OCD).

Carneiro, P., and R. Ginja. 2014. “Long-Term Impacts of Compensatory Preschool on Health and Behavior: Evidence from Head Start.” American Economic Journal: Economic Policy 6(4):135–73.

Cascio, E. 2009. “Do Investments in Universal Early Education Pay Off? Long-Term Effects of Introducing Kindergartens into Public Schools.” NBER Working Paper 14951. Cambridge, MA: NBER.

Cascio, E., N. Gordon, E. Lewis, and S. Reber. 2010. “Paying for Progress: Conditional Grants and the Desegregation of Southern Schools.” Quarterly Journal of Economics 125(1):445–82.

Coleman, J., E. Campbell, C. Hobson, J. McPartland, A. Mood, F. Weinfeld, and R. York. 1966. Equality of Educational Opportunity. Washington, DC: U.S. Government Printing Office.

Community Services Administration. 1981. Office of Management, Finance and Grants Management Division. Records of the Community Services Administration, Record Group 381. College Park, MD: National Archives.

Currie, J. 2001. “Early Childhood Education Programs.” Journal of Economic Perspectives 15(2):213–38.

Currie, J., and D. Thomas. 1995. “Does Head Start Make a Difference?” American Economic Review 85(3):341–64.

Deming, D. 2009. “Early Childhood Intervention and Life-Cycle Skill Development: Evidence from Head Start.” American Economic Journal: Applied Economics 1(3):111–34.

Department of Health and Human Services (DHHS). 2014. Head Start Program Fact Sheet.


Duncan, G., and K. Magnuson. 2013. “Investing in Preschool Programs.” Journal of Economic Perspectives 27(2):109.

Feller, A., T. Grindal, L. Miratrix, and L. Page. 2016. “Compared to What? Variation in the Impacts of Early Childhood Education by Alternative Care-Type Settings.” Annals of Applied Statistics 10(3):1245–85.

Garces, E., D. Thomas, and J. Currie. 2002. “Longer-Term Effects of Head Start.” American Economic Review 92(4):999–1012.

Gelber, A., and A. Isen. 2013. “Children’s Schooling and Parents’ Behavior: Evidence from the Head Start Impact Study.” Journal of Public Economics 101:25–38.

Gibbs, C., J. Ludwig, and D. Miller. 2013. “Head Start Origins and Impacts.” In Legacies of the War on Poverty, ed. M. Bailey and S. Danziger, 39–65. New York: Russell Sage Foundation.

Haskins, R. 2004. “Competing Visions.” Education Next 4(1):26–33.

Hausman, J. 2001. “Mismeasured Variables in Econometric Analysis: Problems from the Right and Problems from the Left.” Journal of Economic Perspectives 15(4):57–67.

Hausman, J., J. Abrevaya, and F. Scott-Morton. 1998. “Misclassification of the Dependent Variable in a Discrete-Response Setting.” Journal of Econometrics 87(2):239–69.

Hoynes, H., D. Whitmore-Schanzenbach, and D. Almond. 2016. “Long-Run Impacts of Childhood Access to the Safety Net.” American Economic Review 106(4):903–34.

Hulsey, L., K. Nikki-Aikens, A. Kopack, J. West, E. Moiduddin, and L. Tarullo. 2011. Head Start Children, Families, and Programs: Present and Past Data from FACES. Princeton, NJ: Mathematica Policy Research.

Klein, J. 2011. “Time to Ax Public Programs that Don’t Yield Results.” Time Magazine http://content.time.com/time/nation/article/0,8599,2081778,00.html (accessed December 13, 2017).

Kline, P., and C. Walters. 2016. “Evaluating Public Programs with Close Substitutes: The Case of Head Start.” Quarterly Journal of Economics 131(4):1795–848.

Knudsen, E., J. Heckman, C. Cameron, and J. Shonkoff. 2006. “Economic, Neurobiological, and Behavioral Perspectives on Building America’s Future Workforce.” Proceedings of the National Academy of Sciences 103(27):10155–62.

Lee, D., and T. Lemieux. 2010. “Regression Discontinuity Designs in Economics.” Journal of Economic Literature 48:281–355.

Levitan, S. 1969. Great Society’s Poor Law: A New Approach to Poverty. Washington, DC: National Agricultural Library.

Ludwig, J., and D. Miller. 2007. “Does Head Start Improve Children’s Life Chances? Evidence from a Regression Discontinuity Design.” Quarterly Journal of Economics 122(1):159–208.

Ludwig, J., and D. Phillips. 2007. “The Benefits and Costs of Head Start.” NBER Working Paper 12973. Cambridge, MA: NBER.

O’Brien, P. 1984. “Procedures for Comparing Samples with Multiple Endpoints.” Biometrics 40(4):1079–87.

Office of Economic Opportunity. 1965. “1st Annual Report.” Washington, DC: U.S. Government Printing Office.

Office of Economic Opportunity. 1966. “Quiet Revolution, 2nd Annual Report.” Washington, DC: U.S. Government Printing Office.

Office of Economic Opportunity. 1967. “The Tide of Progress, 3rd Annual Report.” Washington, DC: U.S. Government Printing Office.

Office of Economic Opportunity. 1968. “As the Seed is Sown, 4th Annual Report.” Washington, DC: U.S. Government Printing Office.

Puma, M., S. Bell, R. Cook, C. Heid, G. Shapiro, P. Broene, et al. 2010. “Head Start Impact Study: Final Report.” Washington, DC: Administration for Children & Families.


Shonkoff, J., and D. Phillips. 2000. From Neurons to Neighborhoods: The Science of Early Childhood Development. Washington, DC: National Academies Press.

Vinovskis, M. 2008. The Birth of Head Start: Preschool Education Policies in the Kennedy and Johnson Administrations. Chicago, IL: University of Chicago Press.

Walters, C. 2015. “Inputs in the Production of Early Childhood Human Capital: Evidence from Head Start.” American Economic Journal: Applied Economics 7(4):76–102.

Westinghouse Learning Corporation. 1969. “The Impact of Head Start: An Evaluation of the Effects of Head Start on Children’s Cognitive and Affective Development.” Washington, DC: Office of Economic Opportunity.

Zigler, E., and J. Valentine. 1979. Project Head Start: A Legacy of the War on Poverty. New York: The Free Press.
