Microeconometrics 440.618. Week 2 Problem Set Content Owned by N. Goldstein
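1. In Problem 8 of Problem Set 1 we considered the estimator $\hat{\gamma} = \hat{\theta}_1 / \hat{\theta}_2$ of the parameter $\gamma = \theta_1 / \theta_2$. Assuming that $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2)'$ is asymptotically normally distributed, use the Delta Method to find the estimate of the asymptotic standard error of $\hat{\gamma}$ if
$$\hat{\theta} = \begin{pmatrix} -1.5 \\ 0.5 \end{pmatrix}, \qquad \widehat{\mathrm{Avar}}(\hat{\theta}) = \begin{pmatrix} 1 & -0.4 \\ -0.4 & 2 \end{pmatrix}.$$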
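A generic Delta Method calculation can be sketched numerically as follows. This is a minimal illustration rather than the intended solution: the helper names `numerical_jacobian` and `delta_method_cov`, the example function, and all input values are hypothetical, and the covariance matrix passed in is assumed to be the estimated variance of the estimator itself (already scaled by 1/n).

```python
import numpy as np

def numerical_jacobian(g, theta, eps=1e-6):
    """Central-difference Jacobian of g at theta (rows: outputs, columns: inputs)."""
    theta = np.asarray(theta, dtype=float)
    g0 = np.atleast_1d(g(theta))
    jac = np.zeros((g0.size, theta.size))
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        upper = np.atleast_1d(g(theta + step))
        lower = np.atleast_1d(g(theta - step))
        jac[:, j] = (upper - lower) / (2.0 * eps)
    return jac

def delta_method_cov(g, theta_hat, cov_theta):
    """Delta Method covariance of g(theta_hat): G * cov_theta * G'."""
    G = numerical_jacobian(g, theta_hat)
    return G @ np.asarray(cov_theta, dtype=float) @ G.T

# Hypothetical inputs, deliberately different from the values in the problem.
theta_hat = np.array([2.0, 1.0])
cov_theta = np.array([[0.5, 0.1],
                      [0.1, 0.3]])

def g_example(t):
    # Hypothetical smooth function of the parameters: theta_1^2 + theta_2.
    return t[0] ** 2 + t[1]

cov_g = delta_method_cov(g_example, theta_hat, cov_theta)
se_g = float(np.sqrt(cov_g[0, 0]))
print("estimated standard error of g(theta_hat):", se_g)
```

Swapping in a different `g_example` and the relevant estimates gives a quick numerical check on an analytic derivation.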
2. We know from Problem 7(b) in Problem Set 1 that, when the dependent variable y is expressed as a log, the exact percentage change in $y_i$ from a one-unit change in a discrete $x_j$ is $\exp\{\beta_j\} - 1$. Consider the vector function
$$g(\beta) = \begin{pmatrix} \exp\{\beta_1\} - 1 \\ \exp\{\beta_2\} - 1 \end{pmatrix}.$$
Using the Delta Method, find the estimator of the asymptotic variance-covariance matrix of g(β̂) as a function of β̂1, β̂2, Âvar[β̂1], Âvar[β̂2], and Âcov[β̂1, β̂2].
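For reference, the multivariate Delta Method on which this problem relies can be stated as below. The symbols $V$ and $G$ are notation introduced here for illustration and may differ from the lecture notes.

```latex
\sqrt{n}\,(\hat{\beta} - \beta) \xrightarrow{d} N(0, V)
\quad\Longrightarrow\quad
\sqrt{n}\,\bigl(g(\hat{\beta}) - g(\beta)\bigr) \xrightarrow{d} N\bigl(0,\; G V G'\bigr),
\qquad
G \equiv \frac{\partial g(\beta)}{\partial \beta'} .
```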
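3. Let $\hat{\theta}$ and $\tilde{\theta}$ be two consistent, asymptotically normal estimators of the $P \times 1$ parameter vector $\theta$ with $\mathrm{Avar}[\sqrt{n}(\hat{\theta} - \theta)] = V_1$ and $\mathrm{Avar}[\sqrt{n}(\tilde{\theta} - \theta)] = V_2$. Define the $Q \times 1$ parameter vector $\gamma \equiv g(\theta)$, where $g(\cdot)$ is a continuously differentiable function. Show that, if $\hat{\theta}$ is asymptotically efficient relative to $\tilde{\theta}$, then $\hat{\gamma} \equiv g(\hat{\theta})$ is asymptotically efficient relative to $\tilde{\gamma} \equiv g(\tilde{\theta})$. (Hint: first express $\mathrm{Avar}[\sqrt{n}(\hat{\gamma} - \gamma)]$ and $\mathrm{Avar}[\sqrt{n}(\tilde{\gamma} - \gamma)]$ using the Delta Method.)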
4. This problem illustrates the ambiguity that can be caused by the lack of invariance of the Wald statistic. Let θ̂ be an asymptotically normal estimator for the scalar θ > 0. Let γ̂ = log(θ̂) be an estimator of γ = log(θ).
(a) Show that γ̂ is a consistent estimator of γ.
(b) Using the Delta Method, find the asymptotic variance of γ̂ in terms of the asymptotic variance of θ̂.
(c) Suppose that, for a sample of data, θ̂ = 4 and var[θ̂] = 4. What are the estimates γ̂ and var[γ̂]?
(d) What is the t-statistic for testing H0 : θ = 1 given the results in (c)?
(e) State the null hypothesis in (d) equivalently in terms of γ, and use γ̂ and var[γ̂] in (c) to test that null. What do you conclude?
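The comparison in parts (d) and (e) can be sketched numerically as follows. This is only an illustration under stated assumptions: the estimate and variance are hypothetical values chosen to differ from those in part (c), the helper `numerical_derivative` is not part of any course material, and the derivative needed for the Delta Method is approximated numerically rather than derived.

```python
import math

def numerical_derivative(f, x, eps=1e-6):
    """Central-difference approximation to f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

# Hypothetical estimate and variance (not the values given in part (c)).
theta_hat = 3.0
var_theta = 1.0

# t-statistic for H0: theta = 1, stated in terms of theta.
t_theta = (theta_hat - 1.0) / math.sqrt(var_theta)

# The same null stated in terms of gamma = log(theta) is H0: gamma = 0.
gamma_hat = math.log(theta_hat)
slope = numerical_derivative(math.log, theta_hat)
var_gamma = slope ** 2 * var_theta  # Delta Method variance of gamma_hat
t_gamma = (gamma_hat - 0.0) / math.sqrt(var_gamma)

print(f"t-statistic in terms of theta: {t_theta:.3f}")
print(f"t-statistic in terms of gamma: {t_gamma:.3f}")
```

The two statistics test the same hypothesis but need not agree numerically, which is the issue the problem asks you to explore.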
5. This problem uses data from attend.txt which contains 680 observations about student class attendance and academic achievement. The variables include:
Variable Name    Variable Label
attend           classes attended out of 32
termgpa          GPA for that term
priGPA           cumulative GPA prior to that term
ACT              ACT score
final            final exam score
hwrte            percent of homework turned in
frosh            =1 if freshman
soph             =1 if sophomore
stndfnl          standardized final exam score
(a) These data include students from multiple classes. The standardized final score $\mathit{stndfnl}_i$ is calculated as $(\mathit{final}_i - \bar{x}) / s_x$, where $\bar{x}$ and $s_x$ are the mean and standard deviation of final for the course attended by student i. Why is it sensible to use stndfnl rather than final as a measure of academic achievement?
(b) Let $\mathit{atndrte}_i$ be the attendance rate (i.e., $\mathit{attend}_i / 32$). To determine the effect of attending lecture on final exam performance, estimate
$$\mathit{stndfnl}_i = \beta_0 + \beta_1\, \mathit{atndrte}_i + \beta_2\, \mathit{frosh}_i + \beta_3\, \mathit{soph}_i + u_i$$
by OLS. Provide the partial effect of a change in the attendance rate on the standardized final score and its robust standard error.
(c) How confident are you that the OLS estimate in (b) is estimating the causal effect of attendance? Explain.
(d) As an alternative to robust standard errors, estimate the equation in (b) by weighted least squares (WLS). To do this, generate $\hat{u}_i$ from an OLS regression and calculate $\hat{u}_i^2$ as an estimate of $\mathrm{var}[u_i | x_i]$; then weight the data by $1 / \sqrt{\hat{u}_i^2}$ and re-estimate by OLS. (Remember to weight all dependent and independent variables, including the intercept.) Provide the partial effect of a change in the attendance rate on the standardized final score and its standard error, and explain the assumptions that underlie the validity of the estimation.
(e) Add priGPA and ACT to the equation in (b) as proxy variables for student ability and explain how this changes the estimated partial effect and its robust standard error. (We will cover the theory involving proxy variables in detail during Lecture 3.)
(f) Using the estimation in (e), perform two different tests (a robust Wald test and a non-robust CR test) of the significance of atndrte. Do you reach the same conclusion in each test? (Note that, in an OLS setting, the non-robust CR test of parameter restrictions is commonly referred to as an “F-test.” Also, to say one is “testing the significance” of a variable is shorthand for testing whether the coefficient on that variable is statistically different from zero.)
(g) Using the estimation in (e), perform a robust LM test of the significance of atndrte two different ways: using the traditional matrix form of the statistic, and using the regression-based version. Do you obtain the same result? (Note: Using the traditional matrix form of the statistic does not require you to form and multiply the matrices yourself. Statistical software like SAS and Stata use the matrix form of the statistic in any procedure that includes the LM test statistic as an option.)
(h) Using the estimation in (e), perform a robust Wald test of the null hypothesis H0 : β2 = β3 using two different methods and show that both produce the same test statistic. First, use the variance-of-a-sum formula to calculate the standard error of β2 − β3, which you can calculate by hand. Second, rewrite the equation so that the linear combination β2 − β3 enters as a single parameter, denoted θ, and reestimate the equation.
(i) Add quadratics (i.e., squares) of priGPA and ACT and the interaction term priGPA · ACT as regressors in the equation in (e), and de-mean the appropriate variables such that the average partial effect of a change in priGPA on the standardized final score is identified by a single parameter. Estimate this modified equation and provide an estimate of that average partial effect and its robust standard error.
(j) Would you feel confident using the estimation in (i) to perform hypothesis tests involving β0? Why or why not?
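Several parts of this problem rely on OLS with heteroskedasticity-robust standard errors and a robust Wald test. A minimal sketch of those mechanics is given below. It is an illustration only: the data are synthetic (generated in the code, not read from attend.txt), the helper `ols_robust` and the variable names are hypothetical, and the covariance estimator shown is the basic White (HC0) form, which may differ from your software's default small-sample adjustment.

```python
import math
import numpy as np

def ols_robust(X, y):
    """OLS coefficients, White (HC0) robust covariance matrix, and residuals."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = (X * resid[:, None] ** 2).T @ X  # sum over i of u_i^2 * x_i x_i'
    cov = XtX_inv @ meat @ XtX_inv
    return beta, cov, resid

# Synthetic data with heteroskedastic errors (hypothetical, NOT attend.txt).
rng = np.random.default_rng(0)
n = 680
rate = rng.uniform(0.3, 1.0, n)
frosh = rng.integers(0, 2, n).astype(float)
u = rng.normal(0.0, 0.5 + rate, n)  # error spread grows with the regressor
y = -1.0 + 1.5 * rate + 0.2 * frosh + u
X = np.column_stack([np.ones(n), rate, frosh])

beta, cov, _ = ols_robust(X, y)
se = np.sqrt(np.diag(cov))

# Robust Wald test of H0: coefficient on `rate` equals zero (one restriction).
R = np.array([[0.0, 1.0, 0.0]])
r = R @ beta
wald = float(r @ np.linalg.inv(R @ cov @ R.T) @ r)
p_value = math.erfc(math.sqrt(wald / 2.0))  # upper tail of chi-square(1)

print("coefficients:", beta)
print("robust standard errors:", se)
print("Wald statistic:", wald, "p-value:", p_value)
```

With a single restriction the Wald statistic equals the squared robust t-statistic, which is a useful sanity check when comparing against software output.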
6. This problem uses data from nls80.txt, which contains observations about the wages and characteristics of 935 working men in 1980. These data were used by Blackburn and Neumark (1992) (“Unobserved Ability, Efficiency Wages, and Interindustry Wage Differentials,” Quarterly Journal of Economics 107, 1421-1436) to estimate the determinants of wages. The variables include:
Variable Name    Variable Label
wage             monthly earnings
hours            average weekly hours
IQ               IQ score
KWW              knowledge of world work score
educ             years of education
exper            years of work experience
tenure           years with current employer
age              age in years
married          =1 if married
black            =1 if black
south            =1 if lives in south
urban            =1 if lives in SMSA
sibs             number of siblings
brthord          birth order
meduc            mother’s years of education
feduc            father’s years of education
Consider the population model
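$$\log(\mathit{wage}_i) = \beta_0 + \beta_1\, \mathit{exper}_i + \beta_2\, \mathit{tenure}_i + \beta_3\, \mathit{married}_i + \beta_4\, \mathit{south}_i + \beta_5\, \mathit{urban}_i + \beta_6\, \mathit{black}_i + \beta_7\, \mathit{educ}_i + \mathit{ability}_i + u_i$$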
Note that ability is unobserved.
(a) Why is it sensible to use log(wage) rather than wage as the dependent variable?
(b) Ignoring ability, estimate the model by OLS using robust standard errors. Provide 95% confidence intervals for the estimated return to education (i.e., the estimated partial effect of an extra year of education on the log wage) and the estimated log wage differential between black and non-black workers.
(c) Perform a robust Wald test and a non-robust CR test of the joint significance of educ and black. Do you reach the same conclusion in each test?
(d) Explain how to convert your non-robust CR test statistic in (c), which is asymptotically distributed as an F, into one that is asymptotically distributed as a χ². Do you reach the same conclusion using either statistic?
(e) Using the estimates from (b), calculate the exact percentage changes in wage in response to a change in educ and black, respectively. Write in matrix form the robust Wald statistic for a test of their joint significance, which is a function of β̂6, β̂7, Âvar[β̂6], Âvar[β̂7], and Âcov[β̂6, β̂7]. (Hint: You have already found the formula for the relevant asymptotic variance-covariance matrix in Problem 2.)
(f) Consider the model in (b) but add as regressors iq, kww, and three interaction terms: iq · educ, iq · kww, and iq · educ · kww. Try to perform a RESET test for neglected nonlinearities by adding {ŷ², ŷ³, ŷ⁴}. Why does it fail? Adjust the procedure to successfully perform the test.
(g) Using $h(x_i) = \{\hat{y}_i, \hat{y}_i^2\}$, perform a robust Wald test of homoskedasticity in u for the model described in (f). What does the result tell you about the validity of your F-test in (c)?
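One way to carry out a test of the kind described in part (g) is to regress the squared OLS residuals on a constant, the fitted values, and their squares, and then test the two slopes jointly with a robust Wald statistic. The sketch below illustrates that idea under stated assumptions: the data are synthetic (not nls80.txt), the helper `ols_robust` and all variable names are hypothetical, the robust covariance is the basic White (HC0) form, and this is not necessarily the exact procedure taught in lecture.

```python
import math
import numpy as np

def ols_robust(X, y):
    """OLS coefficients, White (HC0) robust covariance matrix, and residuals."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = (X * resid[:, None] ** 2).T @ X
    return beta, XtX_inv @ meat @ XtX_inv, resid

# Synthetic data whose error variance depends on x1 (hypothetical, NOT nls80.txt).
rng = np.random.default_rng(42)
n = 935
x1 = rng.normal(0.0, 1.0, n)
x2 = rng.normal(0.0, 1.0, n)
u = rng.normal(0.0, 1.0, n) * np.exp(0.3 * x1)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + u
X = np.column_stack([np.ones(n), x1, x2])

# Step 1: original regression, keeping fitted values and residuals.
beta, _, resid = ols_robust(X, y)
y_hat = X @ beta

# Step 2: auxiliary regression of squared residuals on h(x) = {y_hat, y_hat^2}.
Z = np.column_stack([np.ones(n), y_hat, y_hat ** 2])
delta, cov_delta, _ = ols_robust(Z, resid ** 2)

# Step 3: robust Wald test that both slopes in the auxiliary regression are zero.
R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
r = R @ delta
wald = float(r @ np.linalg.inv(R @ cov_delta @ R.T) @ r)
p_value = math.exp(-wald / 2.0)  # exact upper tail of chi-square with 2 df

print("Wald statistic:", wald, "p-value:", p_value)
```

The closed-form p-value works only because there are exactly two restrictions; with a different number of terms in h(x), use a chi-square routine from your statistical software instead.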
7. [Extra Credit] It is common in economic models to express the dependent and independent variables as natural logs and interpret the coefficient on the independent variable as an elasticity. In Problem 6 of Problem Set 1 we considered the strong assumption about u under which this is valid. This problem examines a second issue, namely that the log-log regression strategy is feasible if and only if y and x are positive; one cannot take the natural log of a nonpositive number. In Problem 5, stndfnl can assume a negative value, and therefore any elasticity must be calculated from the formula
$$\frac{\partial E[y_i | x_i]}{\partial x_{ji}} \cdot \frac{x_{ji}}{E[y_i | x_i]}.$$
(a) Using the estimation in Problem 5(e), provide an estimate of the average elasticity of stndfnl in response to a change in ACT. Calculate its robust standard error, noting that the average elasticity is a function of both parameters and variables and therefore you will need to separately calculate the components $\frac{1}{n}\widehat{\mathrm{Avar}}[g(x_i, y_i, \beta)]$, $\frac{1}{n}\sum_{i=1}^{n} G(x_i, y_i, \hat{\beta})$, and $\widehat{\mathrm{Avar}}[\hat{\beta}]$.
(b) Comparing the results in (a) with the results of Problem 5(e), the average elasticity found in (a) appears to have the wrong sign. Why did this happen? (Hint: examine the values of $\hat{y}_i$.)
(c) Comparing the results in (a) with the results of Problem 5(e), the average elasticity is very imprecisely estimated whereas the estimated partial effect of ACT is very precisely estimated. Provide some intuition for why this occurs.
(d) Repeat the calculation in (a) but ignore the most extreme values of $\hat{y}$. (You can, for instance, eliminate the smallest and largest values.) What do you find?
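The point estimate of an average elasticity in a linear model can be sketched as below, applying the elasticity formula from the problem statement observation by observation and then averaging. This is an illustration under stated assumptions only: the data are synthetic with a strictly positive outcome (not attend.txt), the variable names are hypothetical, the conditional mean is replaced by the OLS fitted value, and the standard error calculation requested in part (a) is not attempted.

```python
import numpy as np

# Synthetic data with a positive outcome (hypothetical, NOT attend.txt).
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1.0, 5.0, n)
y = 2.0 + 0.8 * x + rng.normal(0.0, 0.5, n)
X = np.column_stack([np.ones(n), x])

# OLS fit; in a linear model the partial effect of x is the slope coefficient.
beta = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta

# Observation-level elasticity: (partial effect) * x_i / fitted value.
elasticity_i = beta[1] * x / y_hat
avg_elasticity = float(elasticity_i.mean())

print("average elasticity:", avg_elasticity)
print("smallest |fitted value|:", float(np.abs(y_hat).min()))
```

Printing the smallest absolute fitted value is a quick diagnostic, since each observation's elasticity divides by its fitted value.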