elem econ homework
Difference-in-differences
Panel data
When we are facing OVB and don’t have a good instrument on hand or a discontinuity in treatment status, we may be able to uncover average causal effects using panel data methods.
Remember that panel data involves observations for multiple cross-sectional units over multiple periods of time.
It’s essentially a combination of cross-sectional and time series data.
Example of panel data:
Weekly grades for each student in this class.
Monthly GDP for multiple countries
…
We use two subscripts for variables that have observations across both time and cross-sectional units, for example $Y_{it}$ for unit i in period t.
DD Motivating example: monetary policy
We motivate the first panel data method that we will examine, differences-in-differences, by examining the effect of monetary policy on the banking system. Specifically, does injecting money into the banking system during a recession benefit the health of the banking system?
To tackle this question we look at data on numbers of operating banks in Mississippi during the Great Depression.
During the Depression era, Federal Reserve districts had a lot of autonomy to set monetary policy.
The border of the 6th District (run by the Atlanta Fed) and the 8th (run by the St. Louis Fed) is in Mississippi.
The two districts had distinct policy views: the 6th favored increasing bank lending in a recession, while the 8th favored restricting it.
DD example: natural experiment setup
The idea is to compare outcomes in the two districts around a regional banking crisis in 1930-1931.
We look at the number of banks open in the two regions just before the crisis began, and the number open a year later.
We use the 8th District as a sort of control group to establish a counterfactual: what would have happened in the easy-money 6th District had it not increased bank lending.
It won’t do to simply compare the two districts at one point in time after the crisis started.
As of July 1, 1931, D6 had 11 fewer banks open than D8, but D8 also had more banks open before the policy change in D6.
Thus we need another method to compare these two districts to uncover a causal effect.
DD estimation: one way
The difference-in-differences estimator is carried out just exactly as the name states (sometimes econometrics gets to the point).
Let $Y_{dt}$ represent the number of banks open in district d in year t
In this case d is 6 or 8
t is 1930 (before crisis) or 1931 (after crisis)
The DD estimate of the effect of “easy money” in D6 is then given by:
$$\delta_{DD} = (Y_{6,1931} - Y_{6,1930}) - (Y_{8,1931} - Y_{8,1930})$$
Which, per the book, is 19
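To see the arithmetic, here is a minimal sketch of the calculation: the change in the treated district minus the change in the control district. The four bank counts are the ones I recall from the textbook example and should be treated as an assumption (check them against the book). They are at least consistent with the slides: D6 has 11 fewer banks than D8 in 1931, and the DD comes out to 19. The function name `diff_in_diff` is hypothetical.

```python
# Banks open by (district, year). Counts assumed from the textbook example.
banks = {
    (6, 1930): 135,
    (6, 1931): 121,
    (8, 1930): 165,
    (8, 1931): 132,
}


def diff_in_diff(y, treated, control, before, after):
    """Change in the treated group minus change in the control group."""
    change_treated = y[(treated, after)] - y[(treated, before)]
    change_control = y[(control, after)] - y[(control, before)]
    return change_treated - change_control


dd = diff_in_diff(banks, treated=6, control=8, before=1930, after=1931)
print(dd)  # 19
```

Both districts lost banks, but D6 lost 14 while D8 lost 33, so the DD estimate is positive.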
But how does this serve to get us a causal effect…
Treatment effect in DD
Following the trend
The preceding graphical analysis brings up the key condition necessary for DD estimation to yield an average causal effect.
The condition we need is called the parallel trends (or common trends) assumption.
Basically, this assumption says that absent the treatment, the outcome variable would have moved in parallel in the treatment and control groups.
That is, the lines would be parallel.
If this assumption is satisfied, then the dotted line we drew is a true counterfactual, and DD gives us the treatment effect we want.
With just two data points, we have no way of testing this condition, but if we have more years’ worth of data…
…We can see if it holds
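To make the counterfactual logic concrete, here is a sketch in notation the slides do not use: $Y^{0}_{6,1931}$ is an assumed symbol for the number of banks D6 would have had open in 1931 without easy money. Parallel trends says D6 would have changed by the same amount as D8 (first line), and rearranging gives the DD formula (second line):

```latex
\[
\begin{aligned}
Y^{0}_{6,1931} - Y_{6,1930} &= Y_{8,1931} - Y_{8,1930} \\
\Rightarrow\quad Y_{6,1931} - Y^{0}_{6,1931} &= (Y_{6,1931} - Y_{6,1930}) - (Y_{8,1931} - Y_{8,1930})
\end{aligned}
\]
```

So the gap between what D6 actually had and its counterfactual is exactly the DD estimate, but only if the first line holds.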
DD estimation: another way
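We can also use regression for DD estimation by estimating the following equation:
$$Y_{dt} = \alpha + \beta\,TREAT_d + \gamma\,POST_t + \delta_{rDD}\,(TREAT_d \times POST_t) + e_{dt}$$
$TREAT_d$ is a dummy variable for the treatment group, in this case district 6.
$POST_t$ is a dummy variable for time periods after the treatment.
The interaction term $TREAT_d \times POST_t$ indicates observations from the treatment group after treatment. Its coefficient, $\delta_{rDD}$, gives us our difference-in-differences estimate.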
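As a rough illustration, the regression version can be run on the four district-year cells with plain least squares. This sketch assumes a model with an intercept, a treatment-group dummy, a post-period dummy, and their interaction. The variable names are hypothetical and the bank counts are the ones I recall from the textbook example, so treat them as an assumption.

```python
import numpy as np

# One row per district-year cell: (banks open, TREAT, POST).
# Counts assumed from the textbook example.
cells = [
    (165, 0, 0),  # District 8, 1930
    (132, 0, 1),  # District 8, 1931
    (135, 1, 0),  # District 6, 1930
    (121, 1, 1),  # District 6, 1931
]
y = np.array([c[0] for c in cells], dtype=float)
treat = np.array([c[1] for c in cells], dtype=float)
post = np.array([c[2] for c in cells], dtype=float)

# Design matrix: intercept, TREAT, POST, TREAT x POST.
X = np.column_stack([np.ones_like(y), treat, post, treat * post])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha, beta, gamma, delta = coef

print(round(delta, 6))  # 19.0, the coefficient on the interaction term
```

With only four observations and four coefficients the fit is exact, so there are no residuals and no standard errors here. Those need more periods.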
Regression DD: advantages
Using regression for DD has a couple of key advantages.
It allows us to use more than two time periods, such as in Figure 5.3, to estimate the causal effect. This may give a more precise estimate (in the book the rDD estimate is 20.5).
It makes inference easier by directly giving us the relevant standard error for the DD estimate.
To regress, or not to regress
That is a moot question if we have only two time periods.
We can show, and I will on the board, that for two time periods, the regression DD estimate is the same as the estimate from the standard DD estimation.
With more than two time periods, you must use regression, since the standard DD estimation only uses 4 data points, 2 for each treatment status group.
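The two-period equivalence can be sketched as follows, assuming the regression has an intercept $\alpha$, a treatment-group dummy with coefficient $\beta$, a post-period dummy with coefficient $\gamma$, and their interaction with coefficient $\delta_{rDD}$ (these symbol names are assumptions). With four data points and four coefficients, the regression fits each point exactly:

```latex
\[
\begin{aligned}
Y_{8,1930} &= \alpha \\
Y_{6,1930} &= \alpha + \beta \\
Y_{8,1931} &= \alpha + \gamma \\
Y_{6,1931} &= \alpha + \beta + \gamma + \delta_{rDD} \\[4pt]
(Y_{6,1931} - Y_{6,1930}) - (Y_{8,1931} - Y_{8,1930}) &= (\gamma + \delta_{rDD}) - \gamma = \delta_{rDD}
\end{aligned}
\]
```

The change in the treatment group is $\gamma + \delta_{rDD}$, the change in the control group is $\gamma$, and the difference between them leaves only the interaction coefficient.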
Example 2: minimum wages & employment
In an influential 1994 paper titled “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania”, Card and Krueger examine the effect of minimum wage increases on employment (as I’m sure the title suggested).
Conventional economic theory unequivocally predicts that a binding minimum wage increase will result in a decrease in employment.
But does the data agree…
Example 2: minimum wages & employment
To answer this question Card and Krueger collected data from New Jersey and Pennsylvania fast food stores in two waves: February-March 1992 and November-December 1992.
NJ raised its minimum wage in April 1992, between the two interview waves, while PA did not.
So NJ is the treatment group, and the second wave is the post-treatment period.
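They estimated an equation of the form
$$\Delta E_i = a + b\,X_i + c\,NJ_i + \varepsilon_i$$
where $\Delta E_i$ is the change in employment at store i between the two waves, $X_i$ is a set of store characteristics, and $NJ_i$ is a dummy for stores in New Jersey.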
But wait…
The model Card and Krueger estimated looks a bit different from the regression DD framework that we just presented.
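But in fact in the case of 2 time periods it would have been equivalent for Card and Krueger to have estimated the familiar framework: $Y_{st} = \alpha + \beta\,TREAT_s + \gamma\,POST_t + \delta_{rDD}\,(TREAT_s \times POST_t) + e_{st}$ (with $TREAT_s$ a dummy for New Jersey and $POST_t$ a dummy for the second wave), as I will demonstrate on the board.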
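A quick numerical check of this equivalence, as a sketch on made-up data: every number below is synthetic and none of it comes from Card and Krueger's survey. It assumes a balanced panel (every store observed in both waves) and no extra control variables, and compares (a) a regression of the change in employment on a New Jersey dummy with (b) the stacked DD regression with an interaction term.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stores = 200
# Hypothetical sample: the first 120 stores are in NJ, the rest in PA.
nj = (np.arange(n_stores) < 120).astype(float)

# Synthetic employment: a store-specific level, a common change of -1.0
# between waves, and a made-up treatment effect of +2.5 in NJ.
store_level = rng.normal(20, 5, n_stores)
emp_wave1 = store_level + rng.normal(0, 2, n_stores)
emp_wave2 = store_level - 1.0 + 2.5 * nj + rng.normal(0, 2, n_stores)


def ols(X, y):
    """Least-squares coefficients for y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# (a) First-difference form: change in employment on an NJ dummy.
d_emp = emp_wave2 - emp_wave1
fd = ols(np.column_stack([np.ones(n_stores), nj]), d_emp)

# (b) Stacked levels form: TREAT, POST, and TREAT x POST.
y = np.concatenate([emp_wave1, emp_wave2])
treat = np.concatenate([nj, nj])
post = np.concatenate([np.zeros(n_stores), np.ones(n_stores)])
X = np.column_stack([np.ones(2 * n_stores), treat, post, treat * post])
dd = ols(X, y)

# The NJ coefficient in (a) and the interaction coefficient in (b) agree.
print(np.isclose(fd[1], dd[3]))  # True
```

Both coefficients are the same difference of four group means, which is why they match to floating-point precision. Adding store-level controls to one form and not the other would break the exact match.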
With that in mind, I’m sure you’re dying to know the results. Does a minimum wage increase really decrease employment? Card and Krueger’s estimates revealed an increase of about 2-3 full-time-equivalent workers per store.
Before we stop the presses with the headline that minimum wages actually increase employment, we should think about the validity of this DD design. Under what condition would this design be invalid?
Parallel trends
The answer is of course the parallel trends assumption.
Other researchers have since examined employment data from both states in time periods surrounding the increase, and have argued that the parallel trends condition is not met (so Card and Krueger’s estimates are biased).
A good DD design ideally includes multiple other time periods to provide evidence of the parallel trends assumption (like in Figure 5.3).
Most researchers now agree that there is no positive employment effect from minimum wage increases (though they are far from in agreement as to whether the effect is zero or negative)
Example 3: Returning to MLDA
For our third example, we return to the question of whether lowering the MLDA would result in higher mortality rates for 18-20 year olds.
Currently all states have an MLDA of 21, but between the years 1971 and 1988 many states decreased the MLDA in response to the 26th Amendment in 1971, which lowered the voting age to 18.
For example, Alabama lowered its MLDA to 19 in 1975, whereas Arkansas held its MLDA at 21. So we could use these two states and time periods before and after the Alabama decrease in MLDA to estimate:
$$Y_{st} = \alpha + \beta\,TREAT_s + \gamma\,POST_t + \delta_{rDD}\,(TREAT_s \times POST_t) + e_{st}$$
where $Y_{st}$ is the death rate in state s and year t, $TREAT_s$ is a dummy for Alabama, and $POST_t$ is a dummy for years from 1975 onward.
Multiple treatments in DD
Or we could take advantage of another aspect of regression DD. And that is that we can include multiple treatments.
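We can use multiple states and time periods with the following regression equation:
$$Y_{st} = \alpha + \delta_{rDD}\,LEGAL_{st} + \sum_{k} \beta_k\,STATE_{ks} + \sum_{j} \gamma_j\,YEAR_{jt} + e_{st}$$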
LEGAL is a variable measuring the proportion of 18-20 year olds who could legally drink in state s and year t. This variable is used with multiple states to account for the fact that some states had differing MLDAs (like AL’s choice of 19) and there was differing timing.
The last two terms are state and time dummies (with one state and one time period excluded as the “reference” case to avoid the dummy variable trap).
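Here is a minimal sketch of a regression of the death rate on LEGAL plus state and year dummies, run on made-up data. Nothing below comes from the book: the six states, the sample window, the timing of the MLDA changes, and the "true" effect of 10 are all assumptions chosen for illustration. Synthetic data is convenient because we know what the answer should be.

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1970, 1984)            # hypothetical sample window
n_states, n_years = 6, len(years)
true_effect = 10.0                       # made-up effect used to build the data

# Hypothetical LEGAL_st: share of 18-20 year olds allowed to drink.
legal = np.zeros((n_states, n_years))
legal[0, years >= 1975] = 2 / 3          # MLDA of 19: only 19 and 20 year olds
legal[1, years >= 1972] = 1.0            # MLDA of 18
legal[2, years >= 1973] = 1.0            # MLDA of 18, different timing
# States 3-5 keep an MLDA of 21 throughout, so LEGAL stays at 0.

state_fe = rng.normal(60, 5, n_states)   # state-specific levels
year_fe = rng.normal(0, 2, n_years)      # shocks common to all states
noise = rng.normal(0, 0.3, (n_states, n_years))
death = state_fe[:, None] + year_fe[None, :] + true_effect * legal + noise

# Stack to one row per state-year.
s_idx, t_idx = np.meshgrid(np.arange(n_states), np.arange(n_years), indexing="ij")
s_idx, t_idx, y = s_idx.ravel(), t_idx.ravel(), death.ravel()

# Dummies, dropping state 0 and the first year as the reference case.
state_dummies = (s_idx[:, None] == np.arange(1, n_states)).astype(float)
year_dummies = (t_idx[:, None] == np.arange(1, n_years)).astype(float)

X = np.column_stack([np.ones(len(y)), legal.ravel(), state_dummies, year_dummies])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
delta_hat = coef[1]                      # coefficient on LEGAL
print(round(delta_hat, 2))               # should land close to true_effect
```

Dropping one state and one year is what keeps the dummies from being perfectly collinear with the intercept.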
Not following the trend
The preceding regression equation assumed that all states have parallel trends absent treatment. It’s tough to believe two states have parallel trends, let alone all of them.
Thus this assumption is most likely violated. What can go wrong if this assumption is violated?
We may end up with a spurious MLDA effect where there is none (graph on board, or see Figure 5.5 in MM)
Or perhaps there is a true MLDA effect, but assuming parallel trends will result in a bias (again graph on board or see Fig. 5.6 in MM)
Correcting the trend
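For that last case, as long as all of the trends (absent treatment) are strictly linear, there is a relatively simple fix to the regression equation that saves DD:
$$Y_{st} = \alpha + \delta_{rDD}\,LEGAL_{st} + \sum_{k} \beta_k\,STATE_{ks} + \sum_{j} \gamma_j\,YEAR_{jt} + \sum_{k} \theta_k\,(STATE_{ks} \times t) + e_{st}$$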
This allows for each state to have a different time trend (a linear one to be exact) relative to the reference state.
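To see what state-specific linear trends buy us, here is a sketch on made-up data in which the states that lower their MLDA were already on a steeper upward trend. Everything here is an assumption for illustration (six states, the timing, the slopes, a "true" effect of 10). The regression is run twice: once with only state and year dummies, and once adding a state dummy times a linear time index for every non-reference state.

```python
import numpy as np

rng = np.random.default_rng(2)
years = np.arange(1970, 1984)            # hypothetical sample window
n_states, n_years = 6, len(years)
t = np.arange(n_years, dtype=float)      # linear time index 0, 1, 2, ...
true_effect = 10.0                       # made-up effect used to build the data

# Hypothetical LEGAL_st, as a share of 18-20 year olds allowed to drink.
legal = np.zeros((n_states, n_years))
legal[0, years >= 1975] = 2 / 3
legal[1, years >= 1972] = 1.0
legal[2, years >= 1973] = 1.0

# Parallel trends fails by construction: states 0-2 trend upward regardless.
slopes = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
noise = rng.normal(0, 0.2, (n_states, n_years))
death = 60 + slopes[:, None] * t[None, :] + true_effect * legal + noise

s_idx, t_idx = np.meshgrid(np.arange(n_states), np.arange(n_years), indexing="ij")
s_idx, t_idx, y = s_idx.ravel(), t_idx.ravel(), death.ravel()
state_dummies = (s_idx[:, None] == np.arange(1, n_states)).astype(float)
year_dummies = (t_idx[:, None] == np.arange(1, n_years)).astype(float)


def estimate(with_trends):
    """Coefficient on LEGAL, optionally adding state-specific linear trends."""
    cols = [np.ones(len(y)), legal.ravel(), state_dummies, year_dummies]
    if with_trends:
        # State dummy x time index, relative to the reference state.
        cols.append(state_dummies * t_idx[:, None])
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


delta_no_trend = estimate(with_trends=False)
delta_trend = estimate(with_trends=True)
print(round(delta_no_trend, 2), round(delta_trend, 2))
```

Without the trend terms, the pre-existing upward drift in the treated states gets attributed to LEGAL and the estimate overshoots. With them, the estimate comes back toward the effect built into the data. The fix leans on the trends really being linear.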
The book provides estimates of an increase of about 9-12 deaths per 100,000 in the death rate (up from about 6-9 in the sharp RD example). The takeaway from this is that you should be responsible, especially this weekend!
Have a happy break
And on that note, beat Xicihigan, and have a happy and safe Thanksgiving break!