elem econ homework
IV and 2 Stage Least Squares
IV and regression
Recall that we established that the instrumental variables estimate is given by the ratio of the reduced form to the first stage. We can think about these two components in a regression framework.
Again, we define the following variables:
$Y_i$: outcome variable
$D_i$: treatment variable
$Z_i$: instrumental variable
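Then we wish to estimate the model $Y_i = \alpha + \lambda D_i + \eta_i$.
However, directly applying OLS to this equation does not recover the causal effect $\lambda$, because $D_i$ is correlated with some unobservable component in the error term $\eta_i$. In other words, $\text{Cov}(D_i, \eta_i) \neq 0$.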
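To make the problem concrete, here is a minimal Python/NumPy sketch on simulated data (Python is used only for illustration). The data-generating process is a made-up assumption: an unobserved variable `u` affects both the treatment and the outcome, and the true effect of $D$ on $Y$ is 2. In this setup the OLS slope converges to $2 + 3/2.25 \approx 3.33$ rather than 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (all coefficients are made up):
u = rng.normal(size=n)                  # unobserved determinant of Y, part of eta
z = rng.normal(size=n)                  # an instrument, independent of u (unused by OLS)
d = 0.5 * z + u + rng.normal(size=n)    # treatment is correlated with u
y = 1.0 + 2.0 * d + 3.0 * u + rng.normal(size=n)  # true lambda = 2

# OLS of Y on D (with an intercept)
X = np.column_stack([np.ones(n), d])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
ols_slope = coef[1]
print(f"OLS slope: {ols_slope:.3f} (true effect is 2)")
```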
IV and regression: first stage
The first condition we need for IV to work is to have a valid first stage.
We can model the first stage as follows: $D_i = \alpha_1 + \phi Z_i + e_{1i}$.
So the first condition states that $\phi$ should be non-zero.
This is easily testable…
We can simply estimate the model and test the instrument for significance. That is, we test the null hypothesis $H_0: \phi = 0$.
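A minimal sketch of this test in Python/NumPy, assuming simulated data with a made-up first-stage coefficient of 0.5 and using the conventional (homoskedastic) OLS standard error: we regress $D$ on $Z$ and compare the t statistic on the instrument with 1.96, the two-sided 5% critical value.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
u = rng.normal(size=n)
z = rng.normal(size=n)
d = 0.5 * z + u + rng.normal(size=n)    # hypothetical first stage, true phi = 0.5

# First stage: regress D on a constant and Z
X = np.column_stack([np.ones(n), z])
coef, *_ = np.linalg.lstsq(X, d, rcond=None)
phi_hat = coef[1]

# Conventional (homoskedastic) standard error of phi_hat
resid = d - X @ coef
sigma2 = resid @ resid / (n - X.shape[1])
se_phi = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

t_stat = phi_hat / se_phi                # t statistic for H0: phi = 0
reject = abs(t_stat) > 1.96              # two-sided test at the 5% level
print(f"phi_hat = {phi_hat:.3f}, t = {t_stat:.1f}, reject H0: {reject}")
```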
IV and regression: reduced form
The reduced form is the effect of the instrument on the outcome variable.
This can be modeled by $Y_i = \alpha_0 + \rho Z_i + e_{0i}$.
In order for instrumental variables to be a valid alternative to OLS, we should have $\text{Cov}(Z_i, \eta_i) = 0$. That is, the instrument is uncorrelated with the unobserved components in the regression of $Y$ on $D$.
This condition summarizes the other two conditions for IV: the exclusion restriction and random assignment of the instrument.
This condition is not really testable. After all, $\eta_i$ is unobservable. We usually must appeal to economic intuition or good research design (such as the lottery for KIPP admission).
IV and regression
We can see where the IV estimate comes from by having a closer look at the three regression equations we have introduced so far.
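Start from the original regression equation linking the outcome and treatment variables:
$$Y_i = \alpha + \lambda D_i + \eta_i$$
Plug the first stage equation into the above equation:
$$Y_i = \alpha + \lambda(\alpha_1 + \phi Z_i + e_{1i}) + \eta_i = (\alpha + \lambda \alpha_1) + \lambda \phi Z_i + (\lambda e_{1i} + \eta_i)$$
Comparing the reduced form equation to the above equation, we have:
$$\alpha_0 = \alpha + \lambda \alpha_1, \qquad \rho = \lambda \phi, \qquad e_{0i} = \lambda e_{1i} + \eta_i$$
So we have:
$$\lambda = \frac{\rho}{\phi}$$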
IV and regression
So we see that in order to derive the IV estimate for $\lambda$, we can estimate the reduced form and first stage regressions and simply take the ratio of the coefficient estimates:
$$\hat{\lambda}_{IV} = \frac{\hat{\rho}}{\hat{\phi}}$$
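Of course, knowing what we know about regressions with a binary independent variable, for a binary instrument this is exactly the same as the simple average difference estimate we looked at before: the slope on a binary $Z_i$ is a difference in group means, so
$$\hat{\lambda}_{IV} = \frac{\bar{Y}_{Z=1} - \bar{Y}_{Z=0}}{\bar{D}_{Z=1} - \bar{D}_{Z=0}}$$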
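A quick numerical check of this equivalence, as a sketch on made-up simulated data with a binary instrument `z` and a binary treatment `d`: the ratio of the reduced-form and first-stage slopes matches the ratio of differences in group means up to floating-point rounding.

```python
import numpy as np

def ols_slope(y, x):
    """Slope coefficient from regressing y on a constant and a single regressor x."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(2)
n = 2_000
u = rng.normal(size=n)
z = rng.integers(0, 2, size=n).astype(float)                # binary instrument (e.g. a lottery offer)
d = (0.8 * z + u + rng.normal(size=n) > 0.5).astype(float)  # binary treatment
y = 1.0 + 2.0 * d + 3.0 * u + rng.normal(size=n)

# Ratio of regression coefficients: reduced form over first stage
iv_regression = ols_slope(y, z) / ols_slope(d, z)

# Ratio of differences in group means (the average difference estimate)
iv_wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())

print(iv_regression, iv_wald)
```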
IV and regression
The treatment variable and instrument need not be binary in order to use instrumental variables estimation.
If the instrument is not binary, then we cannot use the average difference (because there are not two groups defined by instrument status).
However, we can still estimate the reduced form and first stage using regression.
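A sketch with a continuous instrument, on hypothetical simulated data where the true effect is 2: we estimate the reduced form and first stage by OLS and take the ratio of the slopes. Since each slope equals a covariance with $Z$ divided by the variance of $Z$, the ratio also equals $\widehat{\text{Cov}}(Y_i, Z_i)/\widehat{\text{Cov}}(D_i, Z_i)$.

```python
import numpy as np

def ols_slope(y, x):
    """Slope coefficient from regressing y on a constant and x."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(3)
n = 100_000
u = rng.normal(size=n)
z = rng.normal(size=n)                            # continuous (non-binary) instrument
d = 0.5 * z + u + rng.normal(size=n)
y = 1.0 + 2.0 * d + 3.0 * u + rng.normal(size=n)  # true lambda = 2

rho_hat = ols_slope(y, z)    # reduced form
phi_hat = ols_slope(d, z)    # first stage
lambda_iv = rho_hat / phi_hat
print(f"IV estimate: {lambda_iv:.3f}")
```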
2SLS
The purpose of instrumental variables is to use a “portion” of the treatment variable that is not correlated with the unobserved determinants of the dependent variable.
With this motivation in mind, we can explore another means of estimating the causal effect of the treatment variable on the outcome variable.
Consider again the first stage equation for the instrumental variables estimator, $D_i = \alpha_1 + \phi Z_i + e_{1i}$. Once we estimate this model, we can split $D_i$ into two parts, $D_i = \hat{D}_i + \hat{e}_{1i}$, where $\hat{D}_i = \hat{\alpha}_1 + \hat{\phi} Z_i$:
$\hat{D}_i$ is the portion of $D$ that is explained by the instrumental variable, which is uncorrelated with the error term in the regression of $Y$ on $D$.
$\hat{e}_{1i}$ is the portion of $D$ that is explained by everything else, including the parts of $D$ that are correlated with the error term in the regression of $Y$ on $D$.
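The split can be checked in a simulation where, unlike in real data, we observe the confounder. In this hypothetical setup `u` is the unobserved part of the error term that also drives $D$: the first-stage fitted values are (nearly) uncorrelated with it, while the first-stage residuals are strongly correlated with it.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
u = rng.normal(size=n)                   # unobserved part of the error term eta
z = rng.normal(size=n)
d = 0.5 * z + u + rng.normal(size=n)

# Estimate the first stage and split D into fitted values and residuals
X = np.column_stack([np.ones(n), z])
coef, *_ = np.linalg.lstsq(X, d, rcond=None)
d_hat = X @ coef          # portion of D explained by Z
e_hat = d - d_hat         # portion of D explained by everything else

corr_fitted = np.corrcoef(d_hat, u)[0, 1]
corr_resid = np.corrcoef(e_hat, u)[0, 1]
print(f"corr(D_hat, u) = {corr_fitted:.3f}, corr(e_hat, u) = {corr_resid:.3f}")
```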
2SLS
So under the assumption that the instrument and the error term are uncorrelated, we should be able to use the fitted values from the first stage to estimate the causal effect. This is exactly the procedure for 2 Stage Least Squares (2SLS) estimation:
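Estimate the first stage equation $D_i = \alpha_1 + \phi Z_i + e_{1i}$, and get the fitted values $\hat{D}_i = \hat{\alpha}_1 + \hat{\phi} Z_i$.
Using the fitted values calculated from the first stage, estimate the regression equation $Y_i = \alpha_2 + \lambda_{2SLS} \hat{D}_i + e_{2i}$.
The estimate of $\lambda_{2SLS}$ from the second stage is the 2SLS estimate of the effect of $D$ on $Y$.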
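The two steps translate directly into code. Below is a minimal sketch on made-up simulated data (true effect 2), with a small hypothetical helper `ols` that returns the intercept followed by the slopes.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients [intercept, slopes...] of y on a constant and the regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(5)
n = 100_000
u = rng.normal(size=n)
z = rng.normal(size=n)
d = 0.5 * z + u + rng.normal(size=n)
y = 1.0 + 2.0 * d + 3.0 * u + rng.normal(size=n)   # true lambda = 2

# Stage 1: regress D on Z and keep the fitted values
a1, phi = ols(d, z)
d_hat = a1 + phi * z

# Stage 2: regress Y on the fitted values
a2, lambda_2sls = ols(y, d_hat)
print(f"2SLS estimate: {lambda_2sls:.3f}")
```

Only the point estimate from this manual procedure should be used; the standard error a regression routine would report for the second stage is not valid.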
2SLS and IV
So what is the difference between the 2SLS estimates and the IV estimates?
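In the case of a single instrument, there is in fact no difference. Since $\hat{D}_i = \hat{\alpha}_1 + \hat{\phi} Z_i$ and $\hat{\rho} = \widehat{\text{Cov}}(Y_i, Z_i)/\widehat{\text{Var}}(Z_i)$:
$$\hat{\lambda}_{2SLS} = \frac{\widehat{\text{Cov}}(Y_i, \hat{D}_i)}{\widehat{\text{Var}}(\hat{D}_i)} = \frac{\hat{\phi}\,\widehat{\text{Cov}}(Y_i, Z_i)}{\hat{\phi}^2\,\widehat{\text{Var}}(Z_i)} = \frac{\hat{\rho}}{\hat{\phi}} = \hat{\lambda}_{IV}$$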
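This identity can be confirmed numerically. In this sketch on hypothetical simulated data, the ratio of the reduced-form slope to the first-stage slope and the second-stage slope on the fitted values agree up to floating-point rounding.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients [intercept, slopes...] of y on a constant and the regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(6)
n = 1_000
u = rng.normal(size=n)
z = rng.normal(size=n)
d = 0.5 * z + u + rng.normal(size=n)
y = 1.0 + 2.0 * d + 3.0 * u + rng.normal(size=n)

a1, phi_hat = ols(d, z)          # first stage
rho_hat = ols(y, z)[1]           # reduced form
lambda_iv = rho_hat / phi_hat    # IV: ratio of reduced form to first stage

d_hat = a1 + phi_hat * z         # first-stage fitted values
lambda_2sls = ols(y, d_hat)[1]   # 2SLS: second-stage slope

print(lambda_iv, lambda_2sls)
```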
It is not always the case that 2SLS and IV estimates are identical. This is because 2SLS can generalize IV to include multiple instruments.
2SLS and IV: multiple instruments
It may be possible that we have multiple instruments for our endogenous treatment variable that satisfy the exclusion restriction and random assignment conditions.
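Suppose we have 2 such instruments, $Z_{1i}$ and $Z_{2i}$, that are uncorrelated with the unobserved determinants of the dependent variable.
We can modify 2SLS estimation by including both instruments in the first stage:
Estimate $D_i = \alpha_1 + \phi_1 Z_{1i} + \phi_2 Z_{2i} + e_{1i}$ and calculate the fitted values $\hat{D}_i = \hat{\alpha}_1 + \hat{\phi}_1 Z_{1i} + \hat{\phi}_2 Z_{2i}$.
Using the fitted values calculated from the first stage, estimate the regression equation $Y_i = \alpha_2 + \lambda_{2SLS} \hat{D}_i + e_{2i}$, exactly as in the single instrument case.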
The IV estimator (the ratio of the reduced form to the first stage) uses one instrument at a time, so there is no single IV estimate that combines multiple instruments: you must use one or the other. 2SLS lets us combine both instruments in one estimate, so it uses more of the information in our data.
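A sketch with two instruments on made-up simulated data (true effect 2, hypothetical first-stage coefficients 0.5 and 0.3): each instrument alone gives its own IV estimate, while 2SLS puts both into the first stage and produces a single estimate.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients [intercept, slopes...] of y on a constant and the regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(7)
n = 100_000
u = rng.normal(size=n)
z1 = rng.normal(size=n)                           # first instrument
z2 = rng.normal(size=n)                           # second instrument
d = 0.5 * z1 + 0.3 * z2 + u + rng.normal(size=n)
y = 1.0 + 2.0 * d + 3.0 * u + rng.normal(size=n)  # true lambda = 2

# Two separate IV estimates, one per instrument
iv_z1 = ols(y, z1)[1] / ols(d, z1)[1]
iv_z2 = ols(y, z2)[1] / ols(d, z2)[1]

# 2SLS using both instruments in the first stage
a1, phi1, phi2 = ols(d, z1, z2)
d_hat = a1 + phi1 * z1 + phi2 * z2
lambda_2sls = ols(y, d_hat)[1]

print(f"IV (Z1): {iv_z1:.3f}  IV (Z2): {iv_z2:.3f}  2SLS (both): {lambda_2sls:.3f}")
```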
Endogenous instruments
We can summarize the three conditions for IV/2SLS to yield valid estimates in two statements:
$\text{Cov}(D_i, Z_i) \neq 0$ (equivalently, $\phi \neq 0$): This ensures a valid first stage.
$\text{Cov}(Z_i, \eta_i) = 0$: This ensures the exclusion restriction and random assignment conditions are satisfied.
Even if the second condition is violated, we may still be able to use IV/2SLS.
For the second condition to be violated, the instrument must be correlated with one of the determinants of the dependent variable that we are not controlling for.
If this factor is unobservable, then our instrument is not valid, and we should look elsewhere.
However, if the factor is observable, then we can adjust our estimation to control for the missing variable.
Endogenous instruments
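To illustrate: suppose there is an omitted variable $W_i$. Then we can write the error term as $\eta_i = \gamma W_i + u_i$. Further, suppose that $\text{Cov}(Z_i, W_i) \neq 0$ but $\text{Cov}(Z_i, u_i) = 0$, so the instrument fails only because of $W_i$.
Then we simply modify IV by including $W_i$ in both the first stage and reduced form equations:
$$D_i = \alpha_1 + \phi Z_i + \delta_1 W_i + e_{1i}, \qquad Y_i = \alpha_0 + \rho Z_i + \delta_0 W_i + e_{0i}, \qquad \hat{\lambda}_{IV} = \frac{\hat{\rho}}{\hat{\phi}}$$
And for 2SLS we also modify both stages by including $W_i$, with $\hat{D}_i = \hat{\alpha}_1 + \hat{\phi} Z_i + \hat{\delta}_1 W_i$:
$$D_i = \alpha_1 + \phi Z_i + \delta_1 W_i + e_{1i}, \qquad Y_i = \alpha_2 + \lambda_{2SLS} \hat{D}_i + \delta_2 W_i + e_{2i}$$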
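A simulation sketch of this fix, with a hypothetical observable variable `w` that enters the outcome and is correlated with the instrument (all coefficients are made up; the true effect is 2). Ignoring `w` biases the IV estimate; adding it to both equations removes the bias, and the IV ratio and 2SLS with `w` in both stages give the same number.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients [intercept, slopes...] of y on a constant and the regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(8)
n = 100_000
w = rng.normal(size=n)                         # observable omitted variable W
u = rng.normal(size=n)                         # remaining unobservable
z = 0.8 * w + rng.normal(size=n)               # instrument correlated with W
d = 0.5 * z + u + rng.normal(size=n)
y = 1.0 + 2.0 * d + 1.5 * w + 3.0 * u + rng.normal(size=n)  # true lambda = 2

# IV ignoring W: biased, because W sits in the error term and Cov(Z, eta) != 0
iv_naive = ols(y, z)[1] / ols(d, z)[1]

# IV controlling for W: ratio of the Z coefficients in reduced form and first stage
iv_controls = ols(y, z, w)[1] / ols(d, z, w)[1]

# 2SLS with W in both stages
a1, phi, delta1 = ols(d, z, w)
d_hat = a1 + phi * z + delta1 * w
lambda_2sls = ols(y, d_hat, w)[1]

print(f"naive IV: {iv_naive:.3f}, IV with W: {iv_controls:.3f}, 2SLS with W: {lambda_2sls:.3f}")
```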
2SLS, bias, and consistency
In general, IV/2SLS estimators are biased in finite samples.
So why do we use IV estimators in the first place?
The reason is that IV and 2SLS are consistent estimators of the causal effect we are looking for.
Recall that consistency means that as the sample size grows larger (towards infinity), the estimate should get “closer” to the true value we are trying to estimate.
This means that, in general, IV estimates are much more reliable in large samples than in small samples.
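A rough way to see consistency is to recompute the IV estimate on hypothetical simulated samples of increasing size (true effect 2). A single run is noisy, so the error need not shrink at every step, but it should be small for the largest samples.

```python
import numpy as np

def iv_estimate(n, rng):
    """Draw a hypothetical sample of size n and return the IV estimate rho_hat / phi_hat."""
    u = rng.normal(size=n)
    z = rng.normal(size=n)
    d = 0.5 * z + u + rng.normal(size=n)
    y = 1.0 + 2.0 * d + 3.0 * u + rng.normal(size=n)  # true lambda = 2
    # Each slope is Cov(., Z) / Var(Z), so the ratio of slopes is Cov(Y, Z) / Cov(D, Z)
    return np.cov(y, z)[0, 1] / np.cov(d, z)[0, 1]

rng = np.random.default_rng(9)
errors = {}
for n in [100, 1_000, 10_000, 100_000, 1_000_000]:
    errors[n] = abs(iv_estimate(n, rng) - 2.0)
    print(f"n = {n:>9,}: |IV estimate - 2| = {errors[n]:.4f}")
```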
Inference with 2SLS
We would like to be able to perform inference using 2SLS as opposed to IV, since 2SLS estimates come straight from one regression estimation (the second stage).
This is doable; however, the standard errors that come out of manually estimating 2SLS are not valid.
We won’t get technical, but the problem is that in the second stage, we are using a regressor that is generated from another regression.
These standard errors can be adjusted, but we will not cover how in this class (and I will not ask you to conduct correct inference in IV, since once we have the correct standard errors, the procedure is no different from OLS).
Statistical packages will automatically adjust the standard errors for you if you directly estimate 2SLS instead of manually doing so.
There are multiple packages in R that allow you to do this. I won’t make you do this in HW, but if you are curious I would be happy to point you to one of them.
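To see why the manual standard errors are off, here is a minimal NumPy sketch assuming homoskedastic errors and made-up simulated data. The coefficients are the same either way; the adjustment is to compute the residuals with the actual $D_i$ rather than the fitted $\hat{D}_i$ before plugging them into the usual OLS variance formula. In this particular setup the naive standard error is too large by a factor of about $\sqrt{3}$; in other settings it can be too small.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 10_000
u = rng.normal(size=n)
z = rng.normal(size=n)
d = 0.5 * z + u + rng.normal(size=n)
y = 1.0 + 2.0 * d + 3.0 * u + rng.normal(size=n)

# Manual 2SLS
Z = np.column_stack([np.ones(n), z])
d_hat = Z @ np.linalg.lstsq(Z, d, rcond=None)[0]        # first-stage fitted values
X_hat = np.column_stack([np.ones(n), d_hat])
coef = np.linalg.lstsq(X_hat, y, rcond=None)[0]         # second-stage coefficients
XtX_inv = np.linalg.inv(X_hat.T @ X_hat)

# Naive second-stage SE: residuals built from the generated regressor d_hat
resid_naive = y - X_hat @ coef
se_naive = np.sqrt(resid_naive @ resid_naive / (n - 2) * XtX_inv[1, 1])

# Adjusted SE (homoskedastic): same coefficients, but residuals use the actual D
X = np.column_stack([np.ones(n), d])
resid_iv = y - X @ coef
se_iv = np.sqrt(resid_iv @ resid_iv / (n - 2) * XtX_inv[1, 1])

print(f"lambda_2sls = {coef[1]:.3f}, naive SE = {se_naive:.4f}, adjusted SE = {se_iv:.4f}")
```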