Urgent Econometrics test
Instrumental Variables Regression
Linear Regression with Random x’s
A10.1 yi = β1 + β2xi + ei correctly describes the relationship between yi and xi in the population, where β1 and β2 are unknown (fixed) parameters and ei is an unobservable random error term.
A10.2 The data pairs (xi, yi), i = 1, …, N, are obtained by random sampling. That is, the data pairs are collected from the same population, by a process in which each pair is independent of every other pair. Such data are said to be independent and identically distributed.
A10.3 The expected value of the error term e, conditional on the value of x, is zero.
- If E(e|x) = 0, then x and e are also uncorrelated, that is, cov(x, e) = 0. Explanatory variables that are not correlated with the error term are called exogenous variables.
- Conversely, if x and e are correlated, then cov(x, e) ≠ 0, and it can be shown that E(e|x) ≠ 0. Explanatory variables that are correlated with the error term are called endogenous variables.
A10.4 In the sample, x must take at least two different values.
A10.5 var(e|x) = σ². The variance of the error term, conditional on any x, is a constant σ².
A10.6 The distribution of the error term is normal.
Assumption A10.2 states that both y and x are obtained by a sampling process, and thus are random.
- This is the only new assumption on our list.
Under the classical assumptions with fixed x’s, the least squares estimator is the best linear unbiased estimator in a finite (small) sample.
- This means that the result does not depend on the size of the sample.
Under assumptions A10.1–A10.6:
- The least squares estimator is unbiased
- The least squares estimator is the best linear unbiased estimator of the regression parameters, and the usual estimator of σ² is unbiased
- The distributions of the least squares estimators, conditional upon the x’s, are normal, and their variances are estimated in the usual way
- The usual interval estimation and hypothesis testing procedures are valid
Even if x is random, no changes in our regression methods are required, as long as the data are obtained by random sampling and the other usual assumptions hold.
For the purposes of a “large sample” analysis of the least squares estimator, it is convenient to replace assumption A10.3 by:
A10.3* E(e) = 0 and cov(x, e) = 0
Now we can say:
Under assumptions A10.1, A10.2, A10.3*, A10.4, and A10.5, the least squares estimators:
- Are consistent.
- They converge in probability to the true parameter values as N→∞.
- Have approximate normal distributions in large samples, whether the errors are normally distributed or not.
- Our usual interval estimators and test statistics are valid, if the sample is large.
- If assumption A10.3* is not true, and in particular if cov(x,e) ≠ 0 so that x and e are correlated, then the least squares estimators are inconsistent.
- They do not converge to the true parameter values even in very large samples.
In that case, none of our usual hypothesis testing or interval estimation procedures are valid.
FIGURE 10.1 (a) Correlated x and e
FIGURE 10.1 (b) Plot of data, true and fitted regression functions
The statistical consequence of correlation between x and e is that the least squares estimator is biased, and this bias does not disappear however large the sample is
- Consequently the least squares estimator is inconsistent when there is correlation between x and e
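A small simulation makes the inconsistency concrete. The sketch below assumes a hypothetical data-generating process in which x = w + 0.5e, with w and e independent standard normals, so that cov(x, e) = 0.5 and var(x) = 1.25. Under that process the least squares slope settles near β2 + cov(x, e)/var(x) = 2 + 0.4 = 2.4 rather than the true β2 = 2, and adding observations does not help.

```python
import random


def ols_slope(x, y):
    """Least squares slope b2 = sum((x - xbar)(y - ybar)) / sum((x - xbar)**2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def correlated_sample(n, beta1=1.0, beta2=2.0, rho=0.5, seed=0):
    """Hypothetical DGP: y = beta1 + beta2*x + e, where x = w + rho*e, so cov(x, e) = rho."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        w = rng.gauss(0.0, 1.0)
        xi = w + rho * e  # x contains part of the error term
        x.append(xi)
        y.append(beta1 + beta2 * xi + e)
    return x, y


for n in (100, 10_000, 200_000):
    x, y = correlated_sample(n)
    print(f"N = {n:>7}: b2 = {ols_slope(x, y):.3f}")  # approaches 2.4, not beta2 = 2
```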
Endogeneity: When an explanatory variable and the error term are correlated, the explanatory variable is said to be endogenous
- This term comes from simultaneous equations models
- It means ‘‘determined within the system’’
- Using this terminology, when an explanatory variable is correlated with the regression error, one is said to have an “endogeneity problem”
Cases in Which x and e are Correlated
1. Measurement Error: The errors-in-variables problem occurs when an explanatory variable is measured with error
- If we measure an explanatory variable with error, then it is correlated with the error term, and the least squares estimator is inconsistent
Let y = annual savings and x* = the permanent annual income of a person
A simple regression model is:
- Current income is a measure of permanent income, but it does not measure permanent income exactly.
- It is sometimes called a proxy variable
- To capture this feature, specify that:
Substituting, we obtain Eq. (10.3):
In order to estimate Eq. 10.3 by least squares, we must determine whether or not x is uncorrelated with the random disturbance e
- The covariance between these two random variables, using the fact that E(e) = 0, is given by Eq. (10.4):
The least squares estimator b2 is an inconsistent estimator of β2 because of the correlation between the explanatory variable and the error term
- Consequently, b2 does not converge to β2 in large samples
- In large or small samples, b2 is not approximately normal with mean β2 and variance var(b2) = σ²/Σ(xi − x̄)²
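The measurement-error result can be checked numerically. This sketch rests on hypothetical assumptions: x*, u and v are independent, zero-mean normals, with β2 = 0.5, var(x*) = 4 and σu² = 1. Under those assumptions, Eq. (10.4) gives cov(x, e) = −β2σu² = −0.5. The standard errors-in-variables result, plim b2 = β2·var(x*)/(var(x*) + σu²) = 0.4, shows that the slope is pulled toward zero.

```python
import random


def mean(a):
    return sum(a) / len(a)


def cov(a, b):
    """Sample covariance with divisor N (adequate for large-sample comparisons)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def errors_in_variables_sample(n, beta1=0.0, beta2=0.5, sd_xstar=2.0, sd_u=1.0, sd_v=1.0, seed=1):
    """Hypothetical savings data where only a noisy proxy x of permanent income x* is observed."""
    rng = random.Random(seed)
    x, y, e = [], [], []
    for _ in range(n):
        xstar = rng.gauss(0.0, sd_xstar)  # permanent income (unobserved, centered)
        u = rng.gauss(0.0, sd_u)          # measurement error in the proxy
        v = rng.gauss(0.0, sd_v)          # error in the savings equation
        x.append(xstar + u)               # observed current income x = x* + u
        y.append(beta1 + beta2 * xstar + v)
        e.append(v - beta2 * u)           # composite error in Eq. (10.3)
    return x, y, e


x, y, e = errors_in_variables_sample(200_000)
b2 = cov(x, y) / cov(x, x)  # same as the least squares slope
print(f"b2 = {b2:.3f}  (theory: 0.5 * 4 / (4 + 1) = 0.4)")
print(f"cov(x, e) = {cov(x, e):.3f}  (theory: -beta2 * sigma_u^2 = -0.5)")
```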
2. Simultaneous Equation Bias: Another situation in which an explanatory variable is correlated with the regression error term arises in simultaneous equations models
- Suppose we write Eq. (10.5):
There is a feedback relationship between P and Q
- This feedback arises because price and quantity are jointly, or simultaneously, determined, and because of it we can show that cov(P, e) ≠ 0
- The resulting bias (and inconsistency) is called the simultaneous equations bias
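A simulated market shows where cov(P, e) ≠ 0 comes from. In this sketch, Eq. (10.5) is treated as a supply equation with β2 = 1 and paired with a hypothetical demand equation Q = α1 − P + ed. The demand equation and all parameter values are assumptions for illustration. Solving the two equations for the market-clearing price gives P = (α1 − β1 + ed − e)/2. With unit-variance shocks, cov(P, e) = −var(e)/2 = −0.5, and the least squares slope of Q on P ends up near 0 rather than β2 = 1.

```python
import random


def cov(a, b):
    """Sample covariance with divisor N."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n


def market_sample(n, alpha1=10.0, beta1=2.0, beta2=1.0, alpha2=-1.0, seed=2):
    """Hypothetical market: supply Q = beta1 + beta2*P + e, demand Q = alpha1 + alpha2*P + ed."""
    rng = random.Random(seed)
    p, q, e = [], [], []
    for _ in range(n):
        es = rng.gauss(0.0, 1.0)  # supply shock (the e in Eq. 10.5)
        ed = rng.gauss(0.0, 1.0)  # demand shock
        price = (alpha1 - beta1 + ed - es) / (beta2 - alpha2)  # market-clearing price
        p.append(price)
        q.append(beta1 + beta2 * price + es)
        e.append(es)
    return p, q, e


p, q, e = market_sample(200_000)
print(f"cov(P, e) = {cov(p, e):.3f}")                      # theory: -1 / (beta2 - alpha2) = -0.5
print(f"LS slope of Q on P = {cov(p, q) / cov(p, p):.3f}")  # far from beta2 = 1
```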
3. Omitted Variable Bias: When an omitted variable is correlated with an included explanatory variable, then the regression error will be correlated with the explanatory variable, making it endogenous
- Consider a log-linear regression model explaining observed hourly wage:
- What else affects wages? What have we omitted?
Ability is one omitted factor; if more able people tend to acquire more education, we might expect cov(EDUC, e) ≠ 0
- If this is true, then we can expect that the least squares estimator of the returns to another year of education will be positively biased, E(b2) > β2, and inconsistent
- The bias will not disappear even in very large samples
Estimating our wage equation, we have:
- We estimate that an additional year of education increases wages by approximately 10.75%, holding everything else constant
- If ability has a positive effect on wages, then this estimate is overstated, as the contribution of ability is attributed to the education variable
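The direction of the bias can be illustrated with simulated data. In this sketch, ABIL is a hypothetical ability variable, and every coefficient is made up; none comes from the wage data above. EXPER is left out for brevity. Because ABIL raises both schooling and log wages, the regression that omits it converges to β2 + 0.1·cov(EDUC, ABIL)/var(EDUC) = 0.08 + 0.1 × 2/8 = 0.105, which overstates the true return of 0.08.

```python
import random


def ols_slope(x, y):
    """Least squares slope of y on x (with an intercept)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(4)
educ, lnwage = [], []
for _ in range(100_000):
    abil = rng.gauss(0.0, 1.0)                                # unobserved ability (hypothetical)
    ed = 12.0 + 2.0 * abil + rng.gauss(0.0, 2.0)              # more able people get more schooling
    lw = 0.5 + 0.08 * ed + 0.1 * abil + rng.gauss(0.0, 0.4)   # true return to EDUC is 0.08
    educ.append(ed)
    lnwage.append(lw)

b2 = ols_slope(educ, lnwage)  # regression that omits ABIL
print(f"b2 = {b2:.4f}")        # close to 0.105, above the true 0.08
```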
Estimators Based on the Method of Moments
When all the usual assumptions of the linear model hold, the method of moments leads to the least squares estimator
- If x is random and correlated with the error term, the method of moments leads to an alternative, called instrumental variables estimation, or two-stage least squares estimation, that will work in large samples
The kth moment of a random variable Y is the expected value of the random variable raised to the kth power:
- The kth population moment in Eq. 10.7 can be estimated consistently using the sample (of size N) analog:
The method of moments estimation procedure equates m population moments to m sample moments to estimate m unknown parameters
- Example:
The first two population and sample moments of Y are:
Solve for the unknown mean and variance parameters:
And
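The two method of moments solutions are easy to verify on a small made-up sample. Note that σ̃² divides by N. That matches Python's statistics.pvariance, not statistics.variance, which divides by N − 1.

```python
import statistics


def method_of_moments(y):
    """Estimate mu and sigma^2 by equating the first two sample moments to population moments."""
    n = len(y)
    mu1_hat = sum(y) / n                 # first sample moment
    mu2_hat = sum(v * v for v in y) / n  # second sample moment
    mu_hat = mu1_hat
    sigma2_tilde = mu2_hat - mu_hat ** 2  # E(Y^2) - mu^2
    return mu_hat, sigma2_tilde


data = [2, 4, 4, 4, 5, 5, 7, 9]
mu_hat, sigma2_tilde = method_of_moments(data)
print(mu_hat, sigma2_tilde)           # 5.0 4.0
print(statistics.pvariance(data))     # divisor N, agrees with sigma2_tilde
print(statistics.variance(data))      # divisor N - 1, slightly larger
```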
In the linear regression model y = β1 + β2x + e, we usually assume (10.13):
- If x is fixed, or random but not correlated with e, then:
We have two equations in two unknowns:
These are equivalent to the least squares normal equations and their solution is:
- Under "nice" assumptions, the method of moments principle of estimation leads us to the same estimators for the simple linear regression model as the least squares principle
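As a check that the two sample moment conditions reproduce least squares, the sketch below solves them directly as a 2 × 2 linear system using Cramer's rule. It then compares the answer with the deviation-form formulas for b1 and b2. The data are made up for illustration.

```python
def moment_estimates(x, y):
    """Solve (1/N) sum(y - b1 - b2 x) = 0 and (1/N) sum x (y - b1 - b2 x) = 0 for b1, b2."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    # linear system: n*b1 + sx*b2 = sy ;  sx*b1 + sxx*b2 = sxy
    det = n * sxx - sx * sx
    b1 = (sy * sxx - sx * sxy) / det
    b2 = (n * sxy - sx * sy) / det
    return b1, b2


def least_squares(x, y):
    """Deviation-form least squares formulas."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b2 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    return ybar - b2 * xbar, b2


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(moment_estimates(x, y))  # approximately (0.05, 1.99)
print(least_squares(x, y))     # the same values
```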
Suppose that there is another variable, z, such that:
- z does not have a direct effect on y, and thus it does not belong on the right-hand side of the model as an explanatory variable
- z is not correlated with the regression error term e
- Variables with this property are said to be exogenous
- z is strongly [or at least not weakly] correlated with x, the endogenous explanatory variable
- A variable z with these properties is called an instrumental variable
If such a variable z exists, then it can be used to form the moment condition (10.16):
Using Eqs. 10.13 and 10.16, the sample moment conditions are:
Solving these equations leads us to method of moments estimators, which are usually called the instrumental variable (IV) estimators:
These new estimators have the following properties:
- They are consistent, if z is exogenous, with E(ze) = 0
- In large samples the instrumental variable estimators have approximate normal distributions
- In the simple regression model:
- The error variance is estimated using the estimator:
Note that we can write the variance of the instrumental variables estimator of β2 as:
- Because r²zx < 1, the variance of the instrumental variables estimator is always larger than the variance of the least squares estimator, and thus the IV estimator is said to be less efficient
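The simple IV estimator, its error variance and its efficiency loss can be sketched as follows. The first part uses a tiny made-up dataset so that the arithmetic is easy to follow. It gives β̂2 = 2, β̂1 = 0.6, σ̂²IV = 0.4 and rzx = 0.8, so for the same σ² the IV variance is 1/0.64 = 1.5625 times the least squares variance. The standard error replaces σ² by σ̂²IV.

The second part is a hypothetical simulation in which z satisfies the instrument conditions and x = z + 0.5e + w. There, least squares drifts to about 2 + 0.5/2.25 ≈ 2.22, while the IV estimator stays near β2 = 2.

```python
import math
import random


def iv_simple(x, y, z):
    """IV estimates for y = b1 + b2*x + e with a single instrument z."""
    n = len(x)
    xbar, ybar, zbar = sum(x) / n, sum(y) / n, sum(z) / n
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    szx = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    b2 = szy / szx
    b1 = ybar - b2 * xbar
    # error variance from residuals of the ORIGINAL model, divisor N - 2
    sigma2 = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    szz = sum((zi - zbar) ** 2 for zi in z)
    r_zx = szx / math.sqrt(szz * sxx)  # sample correlation between z and x
    se_b2 = math.sqrt(sigma2 / (r_zx ** 2 * sxx))
    return b1, b2, sigma2, r_zx, se_b2


# 1) worked example with made-up numbers
x = [1, 2, 3, 4, 5]
y = [3, 5, 6, 9, 10]
z = [1, 3, 2, 5, 4]
b1, b2, sigma2, r_zx, se_b2 = iv_simple(x, y, z)
print(round(b1, 4), round(b2, 4), round(sigma2, 4), round(r_zx, 4), round(se_b2, 4))
print("IV variance / LS variance:", round(1 / r_zx ** 2, 4))

# 2) hypothetical simulation: IV is consistent, least squares is not
rng = random.Random(5)
xs, ys, zs = [], [], []
for _ in range(200_000):
    zi, ei, wi = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    xi = zi + 0.5 * ei + wi  # x is correlated with e; z is not
    xs.append(xi)
    ys.append(1.0 + 2.0 * xi + ei)
    zs.append(zi)
_, b2_iv, *_ = iv_simple(xs, ys, zs)
xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
b2_ls = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys)) / sum((a - xbar) ** 2 for a in xs)
print(f"IV b2 = {b2_iv:.3f}, LS b2 = {b2_ls:.3f}")  # roughly 2.0 versus 2.22
```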
To extend our analysis to a more general setting, consider the multiple regression model:
- Let xK be an endogenous variable correlated with the error term
- The first K − 1 variables are exogenous variables that are uncorrelated with the error term e; they are “included” instruments
We can estimate this equation in two steps with a least squares estimation in each step
The first stage regression has the endogenous variable xK on the left-hand side, and all exogenous and instrumental variables on the right-hand side
- The first stage regression is:
- The least squares fitted value is:
The second stage regression is based on the original specification:
- The least squares estimators from this equation are the instrumental variables (IV) estimators
- Because they can be obtained by two least squares regressions, they are also popularly known as the two-stage least squares (2SLS) estimators
- We will refer to them as IV or 2SLS or IV/2SLS estimators
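The two stages can be sketched in plain Python, with least squares computed from the normal equations. This is an illustrative sketch, not a production routine. The data-generating process is hypothetical: one included exogenous variable x2, one instrument z1, and an endogenous xK that shares the shock e. All coefficient values are also made up.

The error variance is computed from residuals that use the original xK, not the first-stage fitted values. A naive second-stage regression would use the wrong residuals, so its reported standard errors are not the correct IV standard errors.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """Least squares coefficients from X'X b = X'y (rows of X are observations)."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def two_stage_least_squares(y, exog, xk, instruments):
    """exog rows: [1, x2, ..., x_{K-1}]; xk: endogenous regressor; instrument rows: [z1, ..., zL]."""
    Z = [e_row + z_row for e_row, z_row in zip(exog, instruments)]
    gamma = ols(Z, xk)                                              # first stage
    xk_hat = [sum(g * v for g, v in zip(gamma, row)) for row in Z]
    beta = ols([row + [xh] for row, xh in zip(exog, xk_hat)], y)  # second stage
    # error variance uses the ORIGINAL x_K, with divisor N - K
    X = [row + [xv] for row, xv in zip(exog, xk)]
    resid = [yi - sum(b * v for b, v in zip(beta, row)) for yi, row in zip(y, X)]
    sigma2 = sum(u * u for u in resid) / (len(y) - len(beta))
    return beta, sigma2


# Hypothetical data: y = 1 + 0.5*x2 + 2*xK + e, with xK correlated with e
rng = random.Random(3)
y, exog, xk, instr = [], [], [], []
for _ in range(50_000):
    x2, z1, e = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    xk_i = 0.3 + 0.4 * x2 + 0.8 * z1 + 0.6 * e + rng.gauss(0, 1)
    exog.append([1.0, x2])
    instr.append([z1])
    xk.append(xk_i)
    y.append(1.0 + 0.5 * x2 + 2.0 * xk_i + e)

beta_iv, sigma2_iv = two_stage_least_squares(y, exog, xk, instr)
beta_ls = ols([row + [xv] for row, xv in zip(exog, xk)], y)
print("IV/2SLS:", [round(b, 3) for b in beta_iv], "sigma2:", round(sigma2_iv, 3))
print("LS:     ", [round(b, 3) for b in beta_ls])  # coefficient on xK biased upward (about 2.3)
```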
The IV/2SLS estimator of the error variance is based on the residuals from the original model:
In the simple regression, if x is endogenous and we have L instruments:
The two sample moment conditions are:
Solving, and using the fact that the fitted values x̂ have the same sample mean as x, we get:
Sometimes we have more instrumental variables at our disposal than are necessary
- Suppose we have L = 2 instruments, z1 and z2
- Then we have:
We have three sample moment conditions:
The first stage regression is a key tool in assessing whether an instrument is “strong” or “weak” in the multiple regression setting
Suppose the first stage regression equation is:
The key to assessing the strength of the instrumental variable z1 is the strength of its relationship to xK after controlling for the effects of all the other exogenous variables
With L instruments, suppose the first stage regression equation is:
- We require that at least one of the instruments be strong
Consider the model with an instrumental variable MOTHEREDUC:
To implement instrumental variables estimation using the two-stage least squares approach, we obtain the predicted values of education from the first stage equation and insert them into the log-linear wage equation to replace EDUC
- Then estimate the resulting equation by least squares
- The instrumental variables estimates of the log-linear wage equation are:
Using FATHEREDUC as an additional instrument, the first stage equation is:
Table 10.1 First-Stage Equation
The IV/2SLS estimates are:
In a multiple regression model, each coefficient measures the effect of a one-unit change in an explanatory (independent) variable on the expected outcome, holding all other things constant
- In calculus terminology, the coefficients are partial derivatives
The multiple regression model, including all K variables, is:
Think of G = Good explanatory variables, B = Bad explanatory variables and L = Lucky instrumental variables
- It is a necessary condition for IV estimation that L ≥ B
- If L = B then there are just enough instrumental variables to carry out IV estimation
- The model parameters are said to be just identified or exactly identified in this case
- The term identified is used to indicate that the model parameters can be consistently estimated
- If L > B then we have more instruments than are necessary for IV estimation, and the model is said to be overidentified
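The counting rule can be written as a tiny helper. This sketch checks only the necessary condition L ≥ B. It cannot tell whether the instruments are valid or strong.

```python
def identification_status(num_instruments, num_endogenous):
    """Counting rule: IV estimation requires L >= B (necessary, not sufficient)."""
    L, B = num_instruments, num_endogenous
    if L < B:
        return "not identified"   # too few instruments to carry out IV estimation
    if L == B:
        return "just identified"  # exactly enough instruments
    return "overidentified"       # L - B surplus instruments (surplus moment conditions)


for L, B in [(1, 1), (2, 1), (1, 2)]:
    print(f"L = {L}, B = {B}: {identification_status(L, B)}, surplus = {max(L - B, 0)}")
```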
Consider the B first-stage equations:
The predicted values are:
In the second stage of estimation we apply least squares to:
Consider the model with B = 2:
The first-stage equations are:
When testing the null hypothesis H0: βk = c, the test statistic t = (β̂k − c)/se(β̂k) is valid in large samples
- It is common, but not universal, practice to use critical values, and p-values, based on the t(N−K) distribution rather than the more strictly appropriate N(0,1) distribution
- The reason is that tests based on the t-distribution tend to work better in samples of data that are not large
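As a worked example, the sketch below tests H0: β2 = 0 for the return to education in the IV/2SLS wage equation that uses MOTHEREDUC as the instrument (estimate 0.0493, standard error 0.0374). The sample size is not reported in this section, so the sketch uses the N(0,1) critical value 1.96 and a normal p-value. Because t critical values are never smaller than 1.96, a t(N−K) test reaches the same conclusion here.

```python
import math


def normal_cdf(t):
    """Standard normal CDF written with the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def large_sample_t_test(estimate, se, c=0.0):
    """Test H0: beta_k = c with t = (estimate - c) / se and a two-sided N(0,1) p-value."""
    t = (estimate - c) / se
    p_value = 2.0 * (1.0 - normal_cdf(abs(t)))
    return t, p_value


# IV/2SLS return to education with MOTHEREDUC as the instrument: 0.0493 (se 0.0374)
t_stat, p_value = large_sample_t_test(0.0493, 0.0374)
print(f"t = {t_stat:.3f}, p-value = {p_value:.3f}")
print("reject H0: beta2 = 0 at the 5% level?", abs(t_stat) > 1.96)
```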
When testing a joint hypothesis, such as H0: β2 = c2, β3 = c3, the test may be based on the chi-square distribution with the number of degrees of freedom equal to the number of hypotheses (J) being tested
- The test itself may be called a “Wald” test, or a likelihood ratio (LR) test, or a Lagrange multiplier (LM) test
- These testing procedures are all asymptotically equivalent
Unfortunately, R² can be negative when based on IV estimates
- Therefore the use of measures like R² outside the context of least squares estimation should be avoided
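A four-observation example shows how the IV R² can turn negative. The data and instrument values are made up. R² is computed as 1 − SSE/SST from each estimator's own fitted values. Least squares minimizes SSE, so its R² stays between 0 and 1. The IV residuals, by contrast, can be larger than the deviations of y from ȳ.

```python
def r_squared(y, fitted):
    """R^2 = 1 - SSE/SST, computed from any set of fitted values."""
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst


def slope_with(w, x, y):
    """Slope sum((w - wbar)(y - ybar)) / sum((w - wbar)(x - xbar)); w = x gives LS, w = z gives IV."""
    n = len(x)
    wbar, xbar, ybar = sum(w) / n, sum(x) / n, sum(y) / n
    num = sum((wi - wbar) * (yi - ybar) for wi, yi in zip(w, y))
    den = sum((wi - wbar) * (xi - xbar) for wi, xi in zip(w, x))
    return num / den


x = [1, 2, 3, 4]
y = [2, 1, 4, 3]
z = [4, 1, 2, 3]  # made-up instrument values
for name, w in (("LS", x), ("IV", z)):
    b2 = slope_with(w, x, y)
    b1 = sum(y) / len(y) - b2 * sum(x) / len(x)
    fitted = [b1 + b2 * xi for xi in x]
    print(name, round(b2, 3), round(r_squared(y, fitted), 3))
# LS 0.6 0.36
# IV -1.0 -2.2
```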
Specification Tests
Can we test for whether x is correlated with the error term?
- This might give us a guide to when to use least squares and when to use IV estimators
Can we test if our instrument is valid, and uncorrelated with the regression error, as required?
The null hypothesis is H0: cov(x, e) = 0 against the alternative H1: cov(x, e) ≠ 0
If the null hypothesis is true, both the least squares estimator and the instrumental variables estimator are consistent
- Naturally if the null hypothesis is true, use the more efficient estimator, which is the least squares estimator
If the null hypothesis is false, the least squares estimator is not consistent, and the instrumental variables estimator is consistent
- If the null hypothesis is not true, use the instrumental variables estimator, which is consistent
There are several forms of the test, usually called the Hausman test
Consider the model:
- Let z1 and z2 be instrumental variables for x.
1. Estimate the first-stage model x = γ1 + θ1z1 + θ2z2 + v by least squares, and obtain the residuals v̂.
- If more than one explanatory variable is being tested for endogeneity, repeat this estimation for each one, using all available instrumental variables in each regression
Consider the model :
2. Include the residuals computed in step 1 as an explanatory variable in the original regression
- Estimate this “artificial regression” by least squares, and use the usual t-test of the significance of the coefficient δ on the residuals: H0: δ = 0 (no correlation between x and e) against H1: δ ≠ 0
Consider the model:
If more than one variable is being tested for endogeneity, the test will be an F-test of joint significance of the coefficients on the included residuals
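The regression-based version of the test can be sketched as follows. The data are hypothetical: x depends on two instruments and on a shock v that also enters e, so x is endogenous by construction. Step 1 regresses x on the instruments and keeps the residuals v̂. Step 2 adds v̂ to the original regression and uses the conventional t-statistic on δ. The small Gauss-Jordan inverse is there only to keep the sketch free of dependencies.

```python
import random


def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols(X, y):
    """Least squares coefficients, conventional standard errors and residuals."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = inverse(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(X, y)]
    s2 = sum(u * u for u in resid) / (n - k)
    se = [(s2 * inv[i][i]) ** 0.5 for i in range(k)]
    return b, se, resid


# Hypothetical data: x is endogenous because e shares the shock v with x
rng = random.Random(7)
z1s, z2s, xs, ys = [], [], [], []
for _ in range(20_000):
    z1, z2, v = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    e = 0.5 * v + rng.gauss(0, 1)
    x = 0.5 + 0.7 * z1 + 0.7 * z2 + v
    z1s.append(z1)
    z2s.append(z2)
    xs.append(x)
    ys.append(1.0 + 2.0 * x + e)

# Step 1: first-stage regression of x on the instruments; keep the residuals v_hat
_, _, v_hat = ols([[1.0, a, b] for a, b in zip(z1s, z2s)], xs)

# Step 2: artificial regression y = b1 + b2*x + delta*v_hat + e; t-test of delta = 0
coef, se, _ = ols([[1.0, xi, vi] for xi, vi in zip(xs, v_hat)], ys)
t_delta = coef[2] / se[2]
print(f"delta_hat = {coef[2]:.3f}, t = {t_delta:.1f}")  # |t| > 1.96: reject H0, x is endogenous
```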
A test of the validity of the surplus moment conditions is:
1. Compute the IV estimates using all available instruments, including the G variables x1 = 1, x2, …, xG that are presumed to be exogenous, and the L instruments
2. Obtain the residuals ê
3. Regress ê on all the available instruments described in step 1
- Compute NR² from this regression, where N is the sample size and R² is the usual goodness-of-fit measure
- If all of the surplus moment conditions are valid, then NR² has an approximate χ²(L−B) distribution in large samples
- If the value of the test statistic exceeds the 100(1−α)-percentile of the χ²(L−B) distribution, then we conclude that at least one of the surplus moment conditions is not valid
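The steps above can be sketched for one endogenous regressor (B = 1) and two instruments (L = 2), which leaves one surplus moment condition. The data are hypothetical, and z2 is deliberately made invalid by letting it enter the error e. The statistic should therefore land far above the 5% critical value of χ²(1), which is 3.841.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_fit(X, y):
    """Least squares coefficients and fitted values (rows of X are observations)."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    return b, [sum(bj * xj for bj, xj in zip(b, r)) for r in X]


def surplus_moment_test(y, x, z_rows):
    """NR^2 statistic for one endogenous x; each row of z_rows holds [1, z1, ..., zL]."""
    # Step 1: IV/2SLS estimates using all available instruments
    _, x_hat = ols_fit(z_rows, x)
    (b1, b2), _ = ols_fit([[1.0, xh] for xh in x_hat], y)
    # Step 2: residuals from the original model (original x, not x_hat)
    e_hat = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    # Step 3: regress the residuals on all instruments
    _, fitted = ols_fit(z_rows, e_hat)
    ebar = sum(e_hat) / len(e_hat)
    sse = sum((ei - fi) ** 2 for ei, fi in zip(e_hat, fitted))
    sst = sum((ei - ebar) ** 2 for ei in e_hat)
    # Step 4: N times the usual R^2
    return len(y) * (1.0 - sse / sst)


rng = random.Random(11)
ys, xs, zs = [], [], []
for _ in range(20_000):
    z1, z2, v, u = [rng.gauss(0.0, 1.0) for _ in range(4)]
    e = 0.5 * v + 0.3 * z2 + u  # z2 enters the error: an INVALID instrument
    x = 0.5 + 0.7 * z1 + 0.7 * z2 + v
    ys.append(1.0 + 2.0 * x + e)
    xs.append(x)
    zs.append([1.0, z1, z2])

nr2 = surplus_moment_test(ys, xs, zs)
critical = 3.841  # 95th percentile of chi-square with L - B = 2 - 1 = 1 degree of freedom
print(f"NR^2 = {nr2:.1f}, reject validity: {nr2 > critical}")
```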
Table 10.2 Hausman Test Auxiliary Regression
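Equations (in the order they appear in the slides)
Savings model in terms of permanent income x*:
$$y_i = \beta_1 + \beta_2 x_i^* + v_i$$
Current income x as a proxy for permanent income:
$$x_i = x_i^* + u_i$$
Substituting, Eq. (10.3):
$$\begin{aligned}
y_i &= \beta_1 + \beta_2 x_i^* + v_i \\
&= \beta_1 + \beta_2 (x_i - u_i) + v_i \\
&= \beta_1 + \beta_2 x_i + (v_i - \beta_2 u_i) \\
&= \beta_1 + \beta_2 x_i + e_i
\end{aligned}$$
Covariance between x and e, Eq. (10.4):
$$\operatorname{cov}(x_i, e_i) = E(x_i e_i) = E\left[(x_i^* + u_i)(v_i - \beta_2 u_i)\right] = E\left(-\beta_2 u_i^2\right) = -\beta_2 \sigma_u^2 \neq 0$$
Usual variance of the least squares estimator:
$$\operatorname{var}(b_2) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}$$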
Price and quantity, Eq. (10.5):
$$Q_i = \beta_1 + \beta_2 P_i + e_i$$
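Log-linear wage equation:
$$\ln(\text{WAGE}) = \beta_1 + \beta_2 \text{EDUC} + \beta_3 \text{EXPER} + \beta_4 \text{EXPER}^2 + e$$
Least squares estimates (standard errors in parentheses):
$$\widehat{\ln(\text{WAGE})} = \underset{(0.1986)}{-0.5220} + \underset{(0.0141)}{0.1075}\,\text{EDUC} + \underset{(0.0132)}{0.0416}\,\text{EXPER} - \underset{(0.0004)}{0.0008}\,\text{EXPER}^2$$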
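kth population moment, Eq. (10.7):
$$E(Y^k) = \mu_k = k\text{th moment of } Y$$
kth sample moment:
$$\hat{E}(Y^k) = \hat{\mu}_k = \frac{\sum y_i^k}{N} = k\text{th sample moment of } Y$$
Variance in terms of moments:
$$\operatorname{var}(Y) = \sigma^2 = E(Y - \mu)^2 = E(Y^2) - \mu^2$$
First two population and sample moments:
$$E(Y) = \mu_1 = \mu, \qquad \hat{\mu} = \frac{\sum y_i}{N}$$
$$E(Y^2) = \mu_2, \qquad \hat{\mu}_2 = \frac{\sum y_i^2}{N}$$
Method of moments estimators of the mean and variance:
$$\hat{\mu} = \frac{\sum y_i}{N} = \bar{y}$$
$$\tilde{\sigma}^2 = \hat{\mu}_2 - \hat{\mu}^2 = \frac{\sum y_i^2}{N} - \bar{y}^2 = \frac{\sum y_i^2 - N\bar{y}^2}{N} = \frac{\sum (y_i - \bar{y})^2}{N}$$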
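Eq. (10.13):
$$E(e_i) = 0 \;\Rightarrow\; E(y_i - \beta_1 - \beta_2 x_i) = 0$$
If x is fixed, or random but not correlated with e:
$$E(x_i e_i) = 0 \;\Rightarrow\; E\left[x_i (y_i - \beta_1 - \beta_2 x_i)\right] = 0$$
Sample moment conditions:
$$\frac{1}{N} \sum (y_i - b_1 - b_2 x_i) = 0$$
$$\frac{1}{N} \sum x_i (y_i - b_1 - b_2 x_i) = 0$$
Solution (the least squares estimators):
$$b_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad b_1 = \bar{y} - b_2 \bar{x}$$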
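Instrumental variable moment condition, Eq. (10.16):
$$E(z_i e_i) = 0 \;\Rightarrow\; E\left[z_i (y_i - \beta_1 - \beta_2 x_i)\right] = 0$$
Sample moment conditions:
$$\frac{1}{N} \sum (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0$$
$$\frac{1}{N} \sum z_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0$$
IV estimators:
$$\hat{\beta}_2 = \frac{N \sum z_i y_i - \sum z_i \sum y_i}{N \sum z_i x_i - \sum z_i \sum x_i} = \frac{\sum (z_i - \bar{z})(y_i - \bar{y})}{\sum (z_i - \bar{z})(x_i - \bar{x})}, \qquad \hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}$$
Large-sample distribution in the simple regression model, where r_zx is the sample correlation between z and x:
$$\hat{\beta}_2 \overset{a}{\sim} N\!\left(\beta_2, \frac{\sigma^2}{r_{zx}^2 \sum (x_i - \bar{x})^2}\right)$$
Error variance estimator:
$$\hat{\sigma}_{IV}^2 = \frac{\sum (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)^2}{N - 2}$$
Variance of the IV estimator relative to least squares:
$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{r_{zx}^2 \sum (x_i - \bar{x})^2} = \frac{\operatorname{var}(b_2)}{r_{zx}^2}, \qquad r_{zx}^2 < 1$$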
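Multiple regression model:
$$y = \beta_1 + \beta_2 x_2 + \cdots + \beta_K x_K + e$$
First stage regression:
$$x_K = \gamma_1 + \gamma_2 x_2 + \cdots + \gamma_{K-1} x_{K-1} + \theta_1 z_1 + \cdots + \theta_L z_L + v_K$$
Least squares fitted value:
$$\hat{x}_K = \hat{\gamma}_1 + \hat{\gamma}_2 x_2 + \cdots + \hat{\gamma}_{K-1} x_{K-1} + \hat{\theta}_1 z_1 + \cdots + \hat{\theta}_L z_L$$
Second stage regression:
$$y = \beta_1 + \beta_2 x_2 + \cdots + \beta_K \hat{x}_K + e^*$$
IV/2SLS error variance estimator:
$$\hat{\sigma}_{IV}^2 = \frac{\sum (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_{i2} - \cdots - \hat{\beta}_K x_{iK})^2}{N - K}$$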
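Simple regression with L instruments, first-stage fitted values:
$$\hat{x} = \hat{\gamma}_1 + \hat{\theta}_1 z_1 + \cdots + \hat{\theta}_L z_L$$
Sample moment conditions:
$$\frac{1}{N} \sum (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0$$
$$\frac{1}{N} \sum \hat{x}_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0$$
Solution, using the fact that the fitted values have the same mean as x, $\bar{\hat{x}} = \bar{x}$:
$$\hat{\beta}_2 = \frac{\sum (\hat{x}_i - \bar{x})(y_i - \bar{y})}{\sum (\hat{x}_i - \bar{x})(x_i - \bar{x})} = \frac{\sum (\hat{x}_i - \bar{x})(y_i - \bar{y})}{\sum (\hat{x}_i - \bar{x})^2}, \qquad \hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}$$
Additional moment condition from a second instrument z2:
$$E(z_2 e) = E\left[z_2 (y - \beta_1 - \beta_2 x)\right] = 0$$
Three sample moment conditions:
$$\frac{1}{N} \sum (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = \hat{m}_1 = 0$$
$$\frac{1}{N} \sum z_{i1} (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = \hat{m}_2 = 0$$
$$\frac{1}{N} \sum z_{i2} (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = \hat{m}_3 = 0$$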
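First stage regression with a single instrument z1:
$$x_K = \gamma_1 + \gamma_2 x_2 + \cdots + \gamma_{K-1} x_{K-1} + \theta_1 z_1 + v_K$$
First stage with MOTHEREDUC as the instrument (standard errors in parentheses):
$$\widehat{\text{EDUC}} = \underset{(0.4249)}{9.7751} + \underset{(0.0417)}{0.0489}\,\text{EXPER} - \underset{(0.0012)}{0.0013}\,\text{EXPER}^2 + \underset{(0.0311)}{0.2677}\,\text{MOTHEREDUC}$$
IV/2SLS estimates with MOTHEREDUC as the instrument:
$$\widehat{\ln(\text{WAGE})} = \underset{(0.4729)}{0.1982} + \underset{(0.0374)}{0.0493}\,\text{EDUC} + \underset{(0.0136)}{0.0449}\,\text{EXPER} - \underset{(0.0004)}{0.0009}\,\text{EXPER}^2$$
First stage with MOTHEREDUC and FATHEREDUC:
$$\text{EDUC} = \gamma_1 + \gamma_2 \text{EXPER} + \gamma_3 \text{EXPER}^2 + \theta_1 \text{MOTHEREDUC} + \theta_2 \text{FATHEREDUC} + v$$
IV/2SLS estimates with both instruments:
$$\widehat{\ln(\text{WAGE})} = \underset{(0.4003)}{0.0481} + \underset{(0.0314)}{0.0614}\,\text{EDUC} + \underset{(0.0134)}{0.0442}\,\text{EXPER} - \underset{(0.0004)}{0.0009}\,\text{EXPER}^2$$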
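General model with G exogenous and B endogenous explanatory variables:
$$y = \underbrace{\beta_1 + \beta_2 x_2 + \cdots + \beta_G x_G}_{G\ \text{exogenous variables}} + \underbrace{\beta_{G+1} x_{G+1} + \cdots + \beta_K x_K}_{B\ \text{endogenous variables}} + e$$
First-stage equations:
$$x_{G+j} = \gamma_{1j} + \gamma_{2j} x_2 + \cdots + \gamma_{Gj} x_G + \theta_{1j} z_1 + \cdots + \theta_{Lj} z_L + v_j, \qquad j = 1, \ldots, B$$
Predicted values:
$$\hat{x}_{G+j} = \hat{\gamma}_{1j} + \hat{\gamma}_{2j} x_2 + \cdots + \hat{\gamma}_{Gj} x_G + \hat{\theta}_{1j} z_1 + \cdots + \hat{\theta}_{Lj} z_L, \qquad j = 1, \ldots, B$$
Second stage regression:
$$y = \beta_1 + \beta_2 x_2 + \cdots + \beta_G x_G + \beta_{G+1} \hat{x}_{G+1} + \cdots + \beta_K \hat{x}_K + e^*$$
Model with B = 2:
$$y = \beta_1 + \beta_2 x_2 + \cdots + \beta_G x_G + \beta_{G+1} x_{G+1} + \beta_{G+2} x_{G+2} + e$$
First-stage equations with instruments z1 and z2:
$$x_{G+1} = \gamma_{11} + \gamma_{21} x_2 + \cdots + \gamma_{G1} x_G + \theta_{11} z_1 + \theta_{21} z_2 + v_1$$
$$x_{G+2} = \gamma_{12} + \gamma_{22} x_2 + \cdots + \gamma_{G2} x_G + \theta_{12} z_1 + \theta_{22} z_2 + v_2$$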
Test statistic for H0: βk = c:
$$t = \frac{\hat{\beta}_k - c}{\operatorname{se}(\hat{\beta}_k)}$$
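Hausman test, first-stage regression:
$$x = \gamma_1 + \theta_1 z_1 + \theta_2 z_2 + v$$
First-stage residuals:
$$\hat{v} = x - \hat{\gamma}_1 - \hat{\theta}_1 z_1 - \hat{\theta}_2 z_2$$
Original model:
$$y = \beta_1 + \beta_2 x + e$$
Artificial regression:
$$y = \beta_1 + \beta_2 x + \delta \hat{v} + e$$
Hypotheses:
$$H_0: \delta = 0 \quad (\text{no correlation between } x \text{ and } e)$$
$$H_1: \delta \neq 0 \quad (\text{correlation between } x \text{ and } e)$$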
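Test of surplus moment conditions, IV residuals:
$$\hat{e} = y - \hat{\beta}_1 - \hat{\beta}_2 x_2 - \cdots - \hat{\beta}_K x_K$$
Test statistic when all surplus moment conditions are valid:
$$NR^2 \overset{a}{\sim} \chi^2_{(L-B)}$$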