
Microeconometrics 440.618. Handout #1 Content Owned by N. Goldstein

Estimating the Asymptotic Standard Errors of Vector Functions

I. Overview

This handout derives the asymptotic variance-covariance matrix estimators of a vector function g(·). Although the most general derivation requires Score vectors and Hessian matrices, which are not formally introduced until Lecture 9, that derivation is presented here in order to provide a unified framework for all of the cases that arise in this course.

• The Delta Method. The first variance-covariance matrix estimator considers Avar[g(θ̂)], the simplest setting, in which g(·) depends on the parameter estimates θ̂ only. In this setting, g(·) does not vary by cross section unit i. The resulting formula is referred to as the Delta Method:

Âvar[g(θ̂)] = G(θ̂) · Âvar[θ̂] ·G(θ̂)′

for which G(·) is the Jacobian (i.e., the matrix of partial derivatives) of g(·).

• The General Formula. The second estimator considers Avar[g(wi, θ̂)], the far more complicated setting in which g(·) depends on both the parameter estimates θ̂ and the random variables wi = {yi, xi} (with y endogenous and x exogenous). This formula can accommodate situations in which one wants to impose assumptions about the distribution of y|x as well as situations in which no such assumptions are made. Because g(·) varies by cross section unit i, the measurement of interest is usually the average $\bar{g} = \frac{1}{n}\sum_{i=1}^{n} g(w_i,\hat{\theta})$, and therefore the variance-covariance matrix estimator of interest¹ is

$$
\widehat{\mathrm{Avar}}[\bar{g}] = \frac{1}{n}\,\widehat{\mathrm{Avar}}[g(w_i,\hat{\theta})] + \bar{G}\cdot\widehat{\mathrm{Avar}}[\hat{\theta}]\cdot\bar{G}' - \frac{1}{n}\,\overline{gs}\cdot\bar{H}^{-1}\cdot\bar{G}' - \frac{1}{n}\,\bar{G}\cdot\bar{H}^{-1}\cdot\overline{gs}'
$$

for which, denoting G(wi, θ̂) as the Jacobian of g(wi, θ̂), s(wi, θ̂) as the Score vector, H(wi, θ̂) as the Hessian matrix, and θ0 as the true value of θ,

$$
\widehat{\mathrm{Avar}}[g(w_i,\hat{\theta})] =
\begin{cases}
\dfrac{1}{n}\sum_{i=1}^{n} E_y[g(w_i,\theta_0)\,g(w_i,\theta_0)' \mid x_i] - \left(\dfrac{1}{n}\sum_{i=1}^{n} E_y[g(w_i,\theta_0) \mid x_i]\right)\left(\dfrac{1}{n}\sum_{i=1}^{n} E_y[g(w_i,\theta_0) \mid x_i]\right)' & \text{if imposing assumptions about } y|x \text{ and the expectation has a closed-form solution} \\[1ex]
\dfrac{1}{n}\sum_{i=1}^{n} g(w_i,\hat{\theta})\,g(w_i,\hat{\theta})' - \left(\dfrac{1}{n}\sum_{i=1}^{n} g(w_i,\hat{\theta})\right)\left(\dfrac{1}{n}\sum_{i=1}^{n} g(w_i,\hat{\theta})\right)' & \text{otherwise}
\end{cases}
$$

¹ This estimator is most commonly used for conditional moment tests with quasi-maximum likelihood estimation, which are introduced in Lecture 11. There are, of course, measurements other than ḡ that can be calculated, but their variance-covariance matrix estimators are not addressed in this course.


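$$
\bar{G} =
\begin{cases}
\dfrac{1}{n}\sum_{i=1}^{n} E_y[G(w_i,\theta_0) \mid x_i] & \text{if imposing assumptions about } y|x \text{ and the expectation has a closed-form solution} \\[1ex]
\dfrac{1}{n}\sum_{i=1}^{n} G(w_i,\hat{\theta}) & \text{otherwise}
\end{cases}
$$

$$
\bar{H} =
\begin{cases}
\dfrac{1}{n}\sum_{i=1}^{n} E_y[H(w_i,\theta_0) \mid x_i] & \text{if imposing assumptions about } y|x \text{ and the expectation has a closed-form solution} \\[1ex]
\dfrac{1}{n}\sum_{i=1}^{n} H(w_i,\hat{\theta}) & \text{otherwise}
\end{cases}
$$

$$
\overline{gs} =
\begin{cases}
\dfrac{1}{n}\sum_{i=1}^{n} E_y[g(w_i,\theta_0)\,s(w_i,\theta_0)' \mid x_i] & \text{if imposing assumptions about } y|x \text{ and the expectation has a closed-form solution} \\[1ex]
\dfrac{1}{n}\sum_{i=1}^{n} g(w_i,\hat{\theta})\,s(w_i,\hat{\theta})' & \text{otherwise}
\end{cases}
$$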

There are three special cases that greatly simplify the second estimator:

Special Case #1: g(·) is a function of θ̂ and xi but not yi.² In this setting, g(xi,θ) is uncorrelated with s(wi,θ) such that $\overline{gs} = 0$, resulting in

$$
\widehat{\mathrm{Avar}}[\bar{g}] = \frac{1}{n}\,\widehat{\mathrm{Avar}}[g(x_i,\hat{\theta})] + \left(\frac{1}{n}\sum_{i=1}^{n} G(x_i,\hat{\theta})\right)\cdot\widehat{\mathrm{Avar}}[\hat{\theta}]\cdot\left(\frac{1}{n}\sum_{i=1}^{n} G(x_i,\hat{\theta})\right)'
$$

Special Case #2: g(·) is a function of θ̂ only. In this setting, g(θ) is uncorrelated with s(wi,θ), and g(θ) and G(θ) do not vary with i. Therefore, Âvar[g(θ̂)] = 0, Ḡ = G(θ̂), and $\overline{gs} = 0$, resulting in the Delta Method formula given above.

Special Case #3: g(·) is a function of θ̂ and wi = {xi, yi}, has an expectation of zero, and the entire conditional distribution of yi|xi is both correctly specified and imposed during the estimation.³ In this setting, E[s(wi,θ0) s(wi,θ0)′] = −E[H(wi,θ0)], E[g(wi,θ0) s(wi,θ0)′] = −E[G(wi,θ0)], and Avar[g(wi, θ̂)] = E[g(wi,θ0) g(wi,θ0)′]. Therefore, $\overline{ss} = -\bar{H}$ (so that Âvar[θ̂] = −(1/n)·H̄⁻¹) and $\overline{gs} = -\bar{G}$. The two cross terms of the general formula then sum to −2·Ḡ·Âvar[θ̂]·Ḡ′, resulting in

$$
\widehat{\mathrm{Avar}}[\bar{g}] = \frac{1}{n}\,\widehat{\mathrm{Avar}}[g(w_i,\hat{\theta})] - \bar{G}\cdot\widehat{\mathrm{Avar}}[\hat{\theta}]\cdot\bar{G}'
$$

for which

$$
\widehat{\mathrm{Avar}}[g(w_i,\hat{\theta})] =
\begin{cases}
\dfrac{1}{n}\sum_{i=1}^{n} E_y[g(w_i,\theta_0)\,g(w_i,\theta_0)' \mid x_i] & \text{if the expectation has a closed-form solution} \\[1ex]
\dfrac{1}{n}\sum_{i=1}^{n} g(w_i,\hat{\theta})\,g(w_i,\hat{\theta})' & \text{otherwise}
\end{cases}
$$

$$
\bar{G} =
\begin{cases}
\dfrac{1}{n}\sum_{i=1}^{n} E_y[G(w_i,\theta_0) \mid x_i] & \text{if the expectation has a closed-form solution} \\[1ex]
\dfrac{1}{n}\sum_{i=1}^{n} G(w_i,\hat{\theta}) & \text{otherwise}
\end{cases}
$$

² This case most commonly arises when g(xi, θ̂) represents the estimated partial effect for cross section unit i and therefore ḡ represents the average partial effect.

³ This case is most commonly used for conditional moment tests with conditional maximum likelihood estimation, which are introduced in Lecture 11.

II. The Delta Method: Estimating Avar[g(θ̂)]

A. The Derivation

A mean-value expansion of g(θ̂) around the true value θ0 produces

g(θ̂) = g(θ0) + G(θ̈) · (θ̂ −θ0)

for which G(θ̈) ≡ ∇θ g(θ̈) is the Jacobian of g(·) evaluated at θ̈, and the qth row of G(θ̈) is evaluated at an unknown value θ̈q that is “trapped” between θ̂q and θ0q. Because plim[θ̂] = θ0 and θ̈ is trapped between θ̂ and θ0, it must be that plim[θ̈] = θ0. By the Slutsky Theorem,

plim[G(θ̈)] = G(plim[θ̈]) = G(θ0)

Substituting,

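$$
g(\hat{\theta}) \overset{a}{\approx} g(\theta_0) + G(\theta_0)\cdot(\hat{\theta}-\theta_0)
$$

which implies

$$
\sqrt{n}\,\big[g(\hat{\theta})-g(\theta_0)\big] \overset{a}{\approx} G(\theta_0)\cdot\sqrt{n}\,(\hat{\theta}-\theta_0)
$$

When the Central Limit Theorem applies to the estimation of θ (which is always the case for the procedures covered in this course),

$$
\sqrt{n}\,(\hat{\theta}-\theta_0) \overset{a}{\sim} N(0, V)
$$

for some positive-definite matrix V. By the Continuous Mapping Theorem,

$$
G(\theta_0)\cdot\sqrt{n}\,(\hat{\theta}-\theta_0) \overset{a}{\sim} N\big(0,\; G(\theta_0)\cdot V\cdot G(\theta_0)'\big)
$$

Substituting,

$$
\sqrt{n}\,\big[g(\hat{\theta})-g(\theta_0)\big] \overset{a}{\sim} N\big(0,\; G(\theta_0)\cdot V\cdot G(\theta_0)'\big)
$$

Substituting consistent parameter estimates for parameters and sample averages for population moments,

$$
\widehat{\mathrm{Avar}}[g(\hat{\theta})] \approx \frac{1}{n}\,G(\hat{\theta})\cdot\hat{V}\cdot G(\hat{\theta})' = G(\hat{\theta})\cdot\frac{1}{n}\hat{V}\cdot G(\hat{\theta})' = G(\hat{\theta})\cdot\widehat{\mathrm{Avar}}[\hat{\theta}]\cdot G(\hat{\theta})'
$$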

B. Examples

For the examples below, suppose that we have the consistent parameter estimates

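$$
\hat{\boldsymbol{\theta}} = \begin{pmatrix} \hat{\theta}_1 \\ \hat{\theta}_2 \end{pmatrix}
$$

and the variance-covariance matrix of the parameter estimates

$$
\widehat{\mathrm{Avar}}[\hat{\boldsymbol{\theta}}] = \begin{pmatrix} \widehat{\mathrm{Avar}}[\hat{\theta}_1] & \widehat{\mathrm{Acov}}[\hat{\theta}_1,\hat{\theta}_2] \\ \widehat{\mathrm{Acov}}[\hat{\theta}_2,\hat{\theta}_1] & \widehat{\mathrm{Avar}}[\hat{\theta}_2] \end{pmatrix}
$$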

Note that, when applying the Delta Method, the Jacobian will always be of dimension Q×P , for which Q is the number of restrictions (i.e., the number of rows in g(·)) and P is the number of parameters that comprise θ.

Example 1. Consider the asymptotic variance of the function g(θ̂1) = log(θ̂1). Because g(θ1) is a scalar function of one parameter, its Jacobian will be a scalar function of one parameter given by

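$$
G(\theta_1) = \frac{\partial g(\theta_1)}{\partial \theta_1} = \frac{1}{\theta_1}
$$

(This is why, in this example, both g(·) and G(·) are unbolded.) Therefore,

$$
\widehat{\mathrm{Avar}}[g(\hat{\theta}_1)] = G(\hat{\theta}_1)\cdot\widehat{\mathrm{Avar}}[\hat{\theta}_1]\cdot G(\hat{\theta}_1) = \frac{1}{\hat{\theta}_1}\cdot\widehat{\mathrm{Avar}}[\hat{\theta}_1]\cdot\frac{1}{\hat{\theta}_1} = \frac{1}{\hat{\theta}_1^2}\cdot\widehat{\mathrm{Avar}}[\hat{\theta}_1]
$$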

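Example 2. Consider the asymptotic variance-covariance matrix of the vector function

$$
\mathbf{g}(\hat{\theta}_1) = \begin{pmatrix} g_1(\hat{\theta}_1) \\ g_2(\hat{\theta}_1) \end{pmatrix} = \begin{pmatrix} \log(\hat{\theta}_1) \\ \exp\{\hat{\theta}_1\} \end{pmatrix}
$$

Because g(θ1) is a vector function of dimension two involving one parameter, its Jacobian will be a 2×1 vector function of that parameter given by

$$
\mathbf{G}(\theta_1) = \begin{pmatrix} \dfrac{\partial g_1(\theta_1)}{\partial \theta_1} \\[1.5ex] \dfrac{\partial g_2(\theta_1)}{\partial \theta_1} \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\theta_1} \\[1.5ex] \exp\{\theta_1\} \end{pmatrix}
$$

(This is why, in this example, g(·) and G(·) are bolded.) Therefore,

$$
\widehat{\mathrm{Avar}}[\mathbf{g}(\hat{\theta}_1)] = \mathbf{G}(\hat{\theta}_1)\cdot\widehat{\mathrm{Avar}}[\hat{\theta}_1]\cdot\mathbf{G}(\hat{\theta}_1)' = \begin{pmatrix} \dfrac{1}{\hat{\theta}_1} \\[1.5ex] \exp\{\hat{\theta}_1\} \end{pmatrix}\cdot\widehat{\mathrm{Avar}}[\hat{\theta}_1]\cdot\begin{pmatrix} \dfrac{1}{\hat{\theta}_1} & \exp\{\hat{\theta}_1\} \end{pmatrix}
$$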

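Example 3. Consider the asymptotic variance of the function g(θ̂) = log(θ̂1 · θ̂2). Because g(θ) is a scalar function of two parameters, its Jacobian will be a 1×2 vector function of the two parameters given by

$$
\mathbf{G}(\boldsymbol{\theta}) = \begin{pmatrix} \dfrac{\partial g(\boldsymbol{\theta})}{\partial \theta_1} & \dfrac{\partial g(\boldsymbol{\theta})}{\partial \theta_2} \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\theta_1} & \dfrac{1}{\theta_2} \end{pmatrix}
$$

(This is why, for this example, g(·) is unbolded but G(·) is bolded.) Therefore,

$$
\begin{aligned}
\widehat{\mathrm{Avar}}[g(\hat{\boldsymbol{\theta}})] &= \mathbf{G}(\hat{\boldsymbol{\theta}})\cdot\widehat{\mathrm{Avar}}[\hat{\boldsymbol{\theta}}]\cdot\mathbf{G}(\hat{\boldsymbol{\theta}})' \\
&= \begin{pmatrix} \dfrac{1}{\hat{\theta}_1} & \dfrac{1}{\hat{\theta}_2} \end{pmatrix} \begin{pmatrix} \widehat{\mathrm{Avar}}[\hat{\theta}_1] & \widehat{\mathrm{Acov}}[\hat{\theta}_1,\hat{\theta}_2] \\ \widehat{\mathrm{Acov}}[\hat{\theta}_2,\hat{\theta}_1] & \widehat{\mathrm{Avar}}[\hat{\theta}_2] \end{pmatrix} \begin{pmatrix} \dfrac{1}{\hat{\theta}_1} \\[1.5ex] \dfrac{1}{\hat{\theta}_2} \end{pmatrix} \\
&= \begin{pmatrix} \dfrac{1}{\hat{\theta}_1}\widehat{\mathrm{Avar}}[\hat{\theta}_1] + \dfrac{1}{\hat{\theta}_2}\widehat{\mathrm{Acov}}[\hat{\theta}_2,\hat{\theta}_1] & \dfrac{1}{\hat{\theta}_1}\widehat{\mathrm{Acov}}[\hat{\theta}_1,\hat{\theta}_2] + \dfrac{1}{\hat{\theta}_2}\widehat{\mathrm{Avar}}[\hat{\theta}_2] \end{pmatrix} \begin{pmatrix} \dfrac{1}{\hat{\theta}_1} \\[1.5ex] \dfrac{1}{\hat{\theta}_2} \end{pmatrix} \\
&= \frac{1}{\hat{\theta}_1}\widehat{\mathrm{Avar}}[\hat{\theta}_1]\frac{1}{\hat{\theta}_1} + \frac{1}{\hat{\theta}_2}\widehat{\mathrm{Acov}}[\hat{\theta}_2,\hat{\theta}_1]\frac{1}{\hat{\theta}_1} + \frac{1}{\hat{\theta}_1}\widehat{\mathrm{Acov}}[\hat{\theta}_1,\hat{\theta}_2]\frac{1}{\hat{\theta}_2} + \frac{1}{\hat{\theta}_2}\widehat{\mathrm{Avar}}[\hat{\theta}_2]\frac{1}{\hat{\theta}_2} \\
&= \frac{1}{\hat{\theta}_1^2}\widehat{\mathrm{Avar}}[\hat{\theta}_1] + \frac{1}{\hat{\theta}_2^2}\widehat{\mathrm{Avar}}[\hat{\theta}_2] + \frac{2}{\hat{\theta}_1\cdot\hat{\theta}_2}\widehat{\mathrm{Acov}}[\hat{\theta}_1,\hat{\theta}_2]
\end{aligned}
$$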
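The scalar case of Example 3 can be checked numerically. The following is a minimal sketch using only the Python standard library; the values of `theta_hat` and `avar_theta` are made-up illustrative numbers, and the helper names (`mat_mul`, `transpose`, `delta_method`) are hypothetical, not part of the handout.

```python
import math

def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def delta_method(jacobian, avar_theta):
    """Return G * Avar[theta_hat] * G' for a Q x P Jacobian G."""
    return mat_mul(mat_mul(jacobian, avar_theta), transpose(jacobian))

# Hypothetical estimates and covariance matrix (illustrative numbers only).
theta_hat = [2.0, 4.0]
avar_theta = [[0.04, 0.01],
              [0.01, 0.09]]

# Example 3: g(theta) = log(theta_1 * theta_2), so G = (1/theta_1, 1/theta_2).
G = [[1.0 / theta_hat[0], 1.0 / theta_hat[1]]]
avar_g = delta_method(G, avar_theta)[0][0]

# Closed form from the last line of Example 3.
closed_form = (avar_theta[0][0] / theta_hat[0] ** 2
               + avar_theta[1][1] / theta_hat[1] ** 2
               + 2.0 * avar_theta[0][1] / (theta_hat[0] * theta_hat[1]))

std_error = math.sqrt(avar_g)  # asymptotic standard error of g(theta_hat)
```

The matrix product and the closed-form expression agree, and the square root of the result is the asymptotic standard error that would be reported for log(θ̂1 · θ̂2).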


Example 4. Consider the asymptotic variance-covariance matrix of the vector function

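$$
\mathbf{g}(\hat{\boldsymbol{\theta}}) = \begin{pmatrix} g_1(\hat{\theta}_1,\hat{\theta}_2) \\ g_2(\hat{\theta}_1,\hat{\theta}_2) \end{pmatrix} = \begin{pmatrix} \log(\hat{\theta}_1\cdot\hat{\theta}_2) \\ \exp\{\hat{\theta}_1\cdot\hat{\theta}_2\} \end{pmatrix}
$$

Because g(θ) is a vector function of dimension two involving two parameters, its Jacobian will be a 2×2 matrix function of the two parameters given by

$$
\mathbf{G}(\boldsymbol{\theta}) = \begin{pmatrix} \dfrac{\partial g_1(\boldsymbol{\theta})}{\partial \theta_1} & \dfrac{\partial g_1(\boldsymbol{\theta})}{\partial \theta_2} \\[1.5ex] \dfrac{\partial g_2(\boldsymbol{\theta})}{\partial \theta_1} & \dfrac{\partial g_2(\boldsymbol{\theta})}{\partial \theta_2} \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\theta_1} & \dfrac{1}{\theta_2} \\[1.5ex] \theta_2\exp\{\theta_1\cdot\theta_2\} & \theta_1\exp\{\theta_1\cdot\theta_2\} \end{pmatrix}
$$

(This is why, for this example, g(·) and G(·) are bolded.) Therefore,

$$
\begin{aligned}
\widehat{\mathrm{Avar}}[\mathbf{g}(\hat{\boldsymbol{\theta}})] &= \mathbf{G}(\hat{\boldsymbol{\theta}})\cdot\widehat{\mathrm{Avar}}[\hat{\boldsymbol{\theta}}]\cdot\mathbf{G}(\hat{\boldsymbol{\theta}})' \\
&= \begin{pmatrix} \dfrac{1}{\hat{\theta}_1} & \dfrac{1}{\hat{\theta}_2} \\[1.5ex] \hat{\theta}_2\exp\{\hat{\theta}_1\cdot\hat{\theta}_2\} & \hat{\theta}_1\exp\{\hat{\theta}_1\cdot\hat{\theta}_2\} \end{pmatrix} \begin{pmatrix} \widehat{\mathrm{Avar}}[\hat{\theta}_1] & \widehat{\mathrm{Acov}}[\hat{\theta}_1,\hat{\theta}_2] \\ \widehat{\mathrm{Acov}}[\hat{\theta}_2,\hat{\theta}_1] & \widehat{\mathrm{Avar}}[\hat{\theta}_2] \end{pmatrix} \begin{pmatrix} \dfrac{1}{\hat{\theta}_1} & \hat{\theta}_2\exp\{\hat{\theta}_1\cdot\hat{\theta}_2\} \\[1.5ex] \dfrac{1}{\hat{\theta}_2} & \hat{\theta}_1\exp\{\hat{\theta}_1\cdot\hat{\theta}_2\} \end{pmatrix} \\
&= \left( \frac{\hat{\theta}_2}{\hat{\theta}_1}\cdot\widehat{\mathrm{Avar}}[\hat{\theta}_1] + 2\,\widehat{\mathrm{Acov}}[\hat{\theta}_1,\hat{\theta}_2] + \frac{\hat{\theta}_1}{\hat{\theta}_2}\cdot\widehat{\mathrm{Avar}}[\hat{\theta}_2] \right) \begin{pmatrix} \dfrac{1}{\hat{\theta}_1\cdot\hat{\theta}_2} & \exp\{\hat{\theta}_1\cdot\hat{\theta}_2\} \\[1.5ex] \exp\{\hat{\theta}_1\cdot\hat{\theta}_2\} & \hat{\theta}_1\cdot\hat{\theta}_2\cdot\exp\{2\,\hat{\theta}_1\cdot\hat{\theta}_2\} \end{pmatrix}
\end{aligned}
$$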
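An analytic Jacobian such as the one in Example 4 is easy to get wrong by hand, so it can help to compare it against finite differences. The sketch below does that and also checks the factored form of G·Âvar[θ̂]·G′. It uses only the standard library; `theta_hat`, `avar_theta`, and every function name are hypothetical choices made for illustration.

```python
import math

def g(theta):
    """Example 4: g(theta) = (log(theta_1 * theta_2), exp(theta_1 * theta_2))'."""
    t1, t2 = theta
    return [math.log(t1 * t2), math.exp(t1 * t2)]

def analytic_jacobian(theta):
    t1, t2 = theta
    e = math.exp(t1 * t2)
    return [[1.0 / t1, 1.0 / t2],
            [t2 * e, t1 * e]]

def numerical_jacobian(func, theta, step=1e-6):
    """Central differences: row q is element q of func, column p is parameter p."""
    rows = len(func(theta))
    jac = [[0.0] * len(theta) for _ in range(rows)]
    for p in range(len(theta)):
        up, down = list(theta), list(theta)
        up[p] += step
        down[p] -= step
        f_up, f_down = func(up), func(down)
        for q in range(rows):
            jac[q][p] = (f_up[q] - f_down[q]) / (2.0 * step)
    return jac

def delta_method(jac, avar_theta):
    """G * Avar[theta_hat] * G', written out with explicit sums."""
    q_dim, p_dim = len(jac), len(avar_theta)
    return [[sum(jac[a][i] * avar_theta[i][j] * jac[b][j]
                 for i in range(p_dim) for j in range(p_dim))
             for b in range(q_dim)] for a in range(q_dim)]

# Hypothetical estimates and covariance matrix (illustrative numbers only).
theta_hat = [0.5, 1.2]
avar_theta = [[0.04, 0.01], [0.01, 0.09]]

G = analytic_jacobian(theta_hat)
G_num = numerical_jacobian(g, theta_hat)
avar_g = delta_method(G, avar_theta)

# Factored form: a scalar times a 2 x 2 matrix, as in the last line of Example 4.
t1, t2 = theta_hat
e = math.exp(t1 * t2)
scalar = ((t2 / t1) * avar_theta[0][0] + 2.0 * avar_theta[0][1]
          + (t1 / t2) * avar_theta[1][1])
factored = [[scalar / (t1 * t2), scalar * e],
            [scalar * e, scalar * t1 * t2 * e * e]]
```

The factored form works because the second row of G is θ1·θ2·exp{θ1·θ2} times the first row, so the resulting 2×2 matrix has rank one.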

 

III. The General Method: Estimating Avar[g(wi, θ̂)]

The results in this section rely on the assumption Ey[s(wi,θ0)|xi] = 0.

A. The Derivation

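The derivation requires two mean-value expansions. First, a mean-value expansion of $\sum_{i=1}^{n} s(w_i,\hat{\theta})$ around the true value θ0 generates

$$
\sum_{i=1}^{n} s(w_i,\hat{\theta}) = \sum_{i=1}^{n} s(w_i,\theta_0) + \left(\sum_{i=1}^{n} H(w_i,\ddot{\theta}_1)\right)\cdot(\hat{\theta}-\theta_0)
$$

for which the pth row of H(wi, θ̈1) is evaluated at an unknown value θ̈p that is “trapped” between θ̂p and θ0p. By definition, the sum of the Score vectors evaluated at θ̂ equals the zero vector because that is the first-order condition of the minimization or maximization problem that generates θ̂. Setting this expansion equal to the zero vector and dividing by √n,

$$
\frac{1}{\sqrt{n}}\sum_{i=1}^{n} s(w_i,\theta_0) + \left(\frac{1}{n}\sum_{i=1}^{n} H(w_i,\ddot{\theta}_1)\right)\cdot\sqrt{n}\,(\hat{\theta}-\theta_0) = 0
$$

Therefore,

$$
\sqrt{n}\,(\hat{\theta}-\theta_0) = \left(\frac{1}{n}\sum_{i=1}^{n} H(w_i,\ddot{\theta}_1)\right)^{-1}\cdot\left(-\frac{1}{\sqrt{n}}\sum_{i=1}^{n} s(w_i,\theta_0)\right)
$$

By the Slutsky Theorem and the Weak Law of Large Numbers,

$$
\operatorname{plim}\left[\left(\frac{1}{n}\sum_{i=1}^{n} H(w_i,\ddot{\theta}_1)\right)^{-1}\right] = \left(\operatorname{plim}\left[\frac{1}{n}\sum_{i=1}^{n} H(w_i,\ddot{\theta}_1)\right]\right)^{-1} = E[H(w_i,\theta_0)]^{-1}
$$

such that

$$
\sqrt{n}\,(\hat{\theta}-\theta_0) \overset{a}{\approx} E[H(w_i,\theta_0)]^{-1}\cdot\left(-\frac{1}{\sqrt{n}}\sum_{i=1}^{n} s(w_i,\theta_0)\right)
$$

for which the right-hand side consists of i.i.d. terms and has zero expectation:

$$
E\left[E[H(w_i,\theta_0)]^{-1}\cdot\left(-\frac{1}{\sqrt{n}}\sum_{i=1}^{n} s(w_i,\theta_0)\right)\right] = -\frac{1}{\sqrt{n}}\,E[H(w_i,\theta_0)]^{-1}\cdot\sum_{i=1}^{n}\underbrace{E[s(w_i,\theta_0)]}_{\text{equals } 0}
$$

Therefore, by the Central Limit Theorem,

$$
\sqrt{n}\,(\hat{\theta}-\theta_0) \overset{a}{\sim} N(0, V_1)
$$

for which

$$
\begin{aligned}
V_1 &= \mathrm{Avar}\big[E[H(w_i,\theta_0)]^{-1}\cdot s(w_i,\theta_0)\big] \\
&= E\big[E[H(w_i,\theta_0)]^{-1}\cdot s(w_i,\theta_0)\,s(w_i,\theta_0)'\cdot E[H(w_i,\theta_0)]^{-1}\big] \\
&= E[H(w_i,\theta_0)]^{-1}\cdot E[s(w_i,\theta_0)\,s(w_i,\theta_0)']\cdot E[H(w_i,\theta_0)]^{-1}
\end{aligned}
$$

Substituting consistent parameter estimates for population parameters and sample averages for population moments,

$$
\begin{aligned}
\widehat{\mathrm{Avar}}[\hat{\theta}] &= \frac{1}{n}\hat{V}_1 \\
&= \frac{1}{n}\left(\frac{1}{n}\sum_{i=1}^{n} H(w_i,\hat{\theta})\right)^{-1}\cdot\left(\frac{1}{n}\sum_{i=1}^{n} s(w_i,\hat{\theta})\,s(w_i,\hat{\theta})'\right)\cdot\left(\frac{1}{n}\sum_{i=1}^{n} H(w_i,\hat{\theta})\right)^{-1} \\
&= \left(\sum_{i=1}^{n} H(w_i,\hat{\theta})\right)^{-1}\cdot\left(\sum_{i=1}^{n} s(w_i,\hat{\theta})\,s(w_i,\hat{\theta})'\right)\cdot\left(\sum_{i=1}^{n} H(w_i,\hat{\theta})\right)^{-1}
\end{aligned}
$$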
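The last expression is a "sandwich" of summed Hessians around summed Score outer products. As a hedged illustration, the sketch below evaluates it for a simple regression of y on a constant and one regressor with objective q = u²/2, in which case the result coincides with the familiar heteroskedasticity-robust (White) covariance matrix. The data and all function names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def ols_fit(x, y):
    """Intercept and slope that minimise sum_i (1/2) * u_i^2."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return [y_bar - slope * x_bar, slope]

def sandwich_avar(x, y, theta):
    """(sum H_i)^-1 * (sum s_i s_i') * (sum H_i)^-1, plus the summed Score."""
    sum_h = [[0.0, 0.0], [0.0, 0.0]]
    sum_ss = [[0.0, 0.0], [0.0, 0.0]]
    sum_s = [0.0, 0.0]
    for xi, yi in zip(x, y):
        u = yi - theta[0] - theta[1] * xi
        s = [-u, -xi * u]                  # Score of q_i = u_i^2 / 2
        h = [[1.0, xi], [xi, xi * xi]]     # Hessian of q_i
        for r in range(2):
            sum_s[r] += s[r]
            for c in range(2):
                sum_h[r][c] += h[r][c]
                sum_ss[r][c] += s[r] * s[c]
    h_inv = inverse_2x2(sum_h)
    return mat_mul(mat_mul(h_inv, sum_ss), h_inv), sum_s

# Hypothetical data (illustrative numbers only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 2.3, 2.8, 4.5, 4.9, 6.4]

theta_hat = ols_fit(x, y)
avar_theta, score_sum = sandwich_avar(x, y, theta_hat)
```

`score_sum` is returned only to confirm the first-order condition used in the derivation: at θ̂ the summed Score vector is zero up to rounding error.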


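A second mean-value expansion of $\sum_{i=1}^{n} g(w_i,\hat{\theta})$ around θ0 generates

$$
\sum_{i=1}^{n} g(w_i,\hat{\theta}) = \sum_{i=1}^{n} g(w_i,\theta_0) + \left(\sum_{i=1}^{n} G(w_i,\ddot{\theta}_2)\right)\cdot(\hat{\theta}-\theta_0)
$$

for which the qth row of G(wi, θ̈2) is evaluated at an unknown value θ̈q that is “trapped” between θ̂q and θ0q. Dividing by √n and substituting for √n(θ̂ − θ0),

$$
\begin{aligned}
\frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(w_i,\hat{\theta}) &= \frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(w_i,\theta_0) + \left(\frac{1}{n}\sum_{i=1}^{n} G(w_i,\ddot{\theta}_2)\right)\cdot\sqrt{n}\,(\hat{\theta}-\theta_0) \\
&= \frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(w_i,\theta_0) + \left(\frac{1}{n}\sum_{i=1}^{n} G(w_i,\ddot{\theta}_2)\right)\cdot\left(\frac{1}{n}\sum_{i=1}^{n} H(w_i,\ddot{\theta}_1)\right)^{-1}\cdot\left(-\frac{1}{\sqrt{n}}\sum_{i=1}^{n} s(w_i,\theta_0)\right)
\end{aligned}
$$

Because plim[θ̂] = θ0 and because θ̈1 and θ̈2 are each trapped between θ̂ and θ0, it must be that plim[θ̈1] = plim[θ̈2] = θ0. By the Weak Law of Large Numbers, $\operatorname{plim}\left[\frac{1}{n}\sum_{i=1}^{n} G(w_i,\ddot{\theta}_2)\right] = E[G(w_i,\theta_0)]$ and $\operatorname{plim}\left[\frac{1}{n}\sum_{i=1}^{n} H(w_i,\ddot{\theta}_1)\right] = E[H(w_i,\theta_0)]$. Therefore,

$$
\begin{aligned}
\frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(w_i,\hat{\theta}) &\overset{a}{\approx} \frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(w_i,\theta_0) + E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot\left(-\frac{1}{\sqrt{n}}\sum_{i=1}^{n} s(w_i,\theta_0)\right) \\
&= \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Big(g(w_i,\theta_0) - E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot s(w_i,\theta_0)\Big)
\end{aligned}
$$

To obtain an expression of i.i.d. terms with zero expectation to which the Central Limit Theorem can be applied, subtract √n·E[g(wi,θ0)] from both sides:

$$
\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(w_i,\hat{\theta})\right) - \sqrt{n}\,E[g(w_i,\theta_0)] \overset{a}{\approx} \left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Big(g(w_i,\theta_0) - E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot s(w_i,\theta_0)\Big)\right) - \sqrt{n}\,E[g(w_i,\theta_0)]
$$

which, denoting $\bar{g} = \frac{1}{n}\sum_{i=1}^{n} g(w_i,\hat{\theta})$, can be written as

$$
\sqrt{n}\,\big(\bar{g} - E[g(w_i,\theta_0)]\big) \overset{a}{\approx} \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Big(g(w_i,\theta_0) - E[g(w_i,\theta_0)] - E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot s(w_i,\theta_0)\Big)
$$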


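Note that the right-hand side is a sum of i.i.d. terms with zero expectation:

$$
E\left[\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Big(g(w_i,\theta_0) - E[g(w_i,\theta_0)] - E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot s(w_i,\theta_0)\Big)\right] = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Big(\underbrace{E[g(w_i,\theta_0)] - E[g(w_i,\theta_0)]}_{\text{equals zero}} - E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot\underbrace{E[s(w_i,\theta_0)]}_{\text{equals zero}}\Big)
$$

Therefore, by the Central Limit Theorem,

$$
\sqrt{n}\,\big(\bar{g} - E[g(w_i,\theta_0)]\big) \overset{a}{\sim} N(0, V_2)
$$

for which

$$
\begin{aligned}
V_2 &= \mathrm{Avar}\Big[g(w_i,\theta_0) - E[g(w_i,\theta_0)] - E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot s(w_i,\theta_0)\Big] \\
&= E\Big[\Big(g(w_i,\theta_0) - E[g(w_i,\theta_0)] - E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot s(w_i,\theta_0)\Big)\cdot\Big(g(w_i,\theta_0) - E[g(w_i,\theta_0)] - E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot s(w_i,\theta_0)\Big)'\Big]
\end{aligned}
$$

Substituting consistent parameter estimates for population parameters and sample averages for population moments, one form of the variance-covariance matrix estimator is

$$
\begin{aligned}
\widehat{\mathrm{Avar}}[\bar{g}] ={}& \frac{1}{n}\hat{V}_2 \\
={}& \frac{1}{n}\cdot\frac{1}{n}\sum_{i=1}^{n}\left(g(w_i,\hat{\theta}) - \bar{g} - \left(\frac{1}{n}\sum_{j=1}^{n} G(w_j,\hat{\theta})\right)\cdot\left(\frac{1}{n}\sum_{j=1}^{n} H(w_j,\hat{\theta})\right)^{-1}\cdot s(w_i,\hat{\theta})\right) \\
&\qquad\cdot\left(g(w_i,\hat{\theta}) - \bar{g} - \left(\frac{1}{n}\sum_{j=1}^{n} G(w_j,\hat{\theta})\right)\cdot\left(\frac{1}{n}\sum_{j=1}^{n} H(w_j,\hat{\theta})\right)^{-1}\cdot s(w_i,\hat{\theta})\right)'
\end{aligned}
$$

The leading 1/n converts V̂2 into Âvar[ḡ]; the second 1/n forms the sample average that estimates V2.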

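To find an alternative form of the estimator, expand V2:

$$
\begin{aligned}
V_2 ={}& E\Big[\Big(g(w_i,\theta_0) - E[g(w_i,\theta_0)] - E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot s(w_i,\theta_0)\Big)\cdot\Big(g(w_i,\theta_0) - E[g(w_i,\theta_0)] - E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot s(w_i,\theta_0)\Big)'\Big] \\
={}& E\Big[\big(g(w_i,\theta_0) - E[g(w_i,\theta_0)]\big)\big(g(w_i,\theta_0) - E[g(w_i,\theta_0)]\big)'\Big] \\
& - E\Big[g(w_i,\theta_0)\cdot\big(E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot s(w_i,\theta_0)\big)'\Big] + E\Big[E[g(w_i,\theta_0)]\cdot\big(E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot s(w_i,\theta_0)\big)'\Big] \\
& - E\Big[E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot s(w_i,\theta_0)\,g(w_i,\theta_0)'\Big] + E\Big[E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot s(w_i,\theta_0)\cdot E[g(w_i,\theta_0)]'\Big] \\
& + E\Big[E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot s(w_i,\theta_0)\cdot\big(E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot s(w_i,\theta_0)\big)'\Big] \\
={}& \underbrace{E\Big[\big(g(w_i,\theta_0) - E[g(w_i,\theta_0)]\big)\big(g(w_i,\theta_0) - E[g(w_i,\theta_0)]\big)'\Big]}_{\text{equals } \mathrm{Avar}[g(w_i,\theta_0)]} \\
& - E[g(w_i,\theta_0)\,s(w_i,\theta_0)']\cdot E[H(w_i,\theta_0)]^{-1}\cdot E[G(w_i,\theta_0)]' + E[g(w_i,\theta_0)]\cdot\underbrace{E[s(w_i,\theta_0)']}_{\text{equals zero}}\cdot E[H(w_i,\theta_0)]^{-1}\cdot E[G(w_i,\theta_0)]' \\
& - E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot E[s(w_i,\theta_0)\,g(w_i,\theta_0)'] + E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot\underbrace{E[s(w_i,\theta_0)]}_{\text{equals zero}}\cdot E[g(w_i,\theta_0)]' \\
& + E[G(w_i,\theta_0)]\cdot\underbrace{E[H(w_i,\theta_0)]^{-1}\cdot E[s(w_i,\theta_0)\,s(w_i,\theta_0)']\cdot E[H(w_i,\theta_0)]^{-1}}_{\text{equals } V_1}\cdot E[G(w_i,\theta_0)]' \\
={}& \mathrm{Avar}[g(w_i,\theta_0)] + E[G(w_i,\theta_0)]\cdot V_1\cdot E[G(w_i,\theta_0)]' \\
& - E[g(w_i,\theta_0)\,s(w_i,\theta_0)']\cdot E[H(w_i,\theta_0)]^{-1}\cdot E[G(w_i,\theta_0)]' - E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot E[s(w_i,\theta_0)\,g(w_i,\theta_0)']
\end{aligned}
$$

The expansion uses the symmetry of the Hessian, so that the transpose of E[H(wi,θ0)]⁻¹ is E[H(wi,θ0)]⁻¹ itself.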

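Substituting consistent parameter estimates for population parameters, sample averages for population moments, and n·Âvar[θ̂] for V̂1,

$$
\begin{aligned}
\widehat{\mathrm{Avar}}[\bar{g}] ={}& \frac{1}{n}\hat{V}_2 \\
={}& \frac{1}{n}\,\widehat{\mathrm{Avar}}[g(w_i,\hat{\theta})] + \frac{1}{n}\left(\frac{1}{n}\sum_{i=1}^{n} G(w_i,\hat{\theta})\right)\cdot n\cdot\widehat{\mathrm{Avar}}[\hat{\theta}]\cdot\left(\frac{1}{n}\sum_{i=1}^{n} G(w_i,\hat{\theta})\right)' \\
& - \frac{1}{n}\left(\frac{1}{n}\sum_{i=1}^{n} g(w_i,\hat{\theta})\,s(w_i,\hat{\theta})'\right)\cdot\left(\frac{1}{n}\sum_{i=1}^{n} H(w_i,\hat{\theta})\right)^{-1}\cdot\left(\frac{1}{n}\sum_{i=1}^{n} G(w_i,\hat{\theta})\right)' \\
& - \frac{1}{n}\left(\frac{1}{n}\sum_{i=1}^{n} G(w_i,\hat{\theta})\right)\cdot\left(\frac{1}{n}\sum_{i=1}^{n} H(w_i,\hat{\theta})\right)^{-1}\cdot\left(\frac{1}{n}\sum_{i=1}^{n} s(w_i,\hat{\theta})\,g(w_i,\hat{\theta})'\right) \\
={}& \frac{1}{n}\,\widehat{\mathrm{Avar}}[g(w_i,\hat{\theta})] + \bar{G}\cdot\widehat{\mathrm{Avar}}[\hat{\theta}]\cdot\bar{G}' - \frac{1}{n}\,\overline{gs}\cdot\bar{H}^{-1}\cdot\bar{G}' - \frac{1}{n}\,\bar{G}\cdot\bar{H}^{-1}\cdot\overline{gs}'
\end{aligned}
$$

using Âvar[g(wi, θ̂)], Ḡ, H̄, and $\overline{gs}$ as defined in the first section.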

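Example: Testing whether the error term in an OLS regression is "white noise"

Given the population model E[yi|xi] = θ0 + θ1xi, consider a hypothesis test of whether the error term ui = yi − θ0 − θ1xi is symmetrically distributed around zero, i.e., whether the first conditional population moment E[ui|xi] and the third conditional population moment E[ui³|xi] jointly equal zero. Further suppose that the model is estimated by OLS and therefore no assumptions are made about the distribution of y|x; the only assumption placed on y or u is that u is uncorrelated with x. A natural way to perform this test is to use a conditional moment test⁴ of the hypothesis H0 : E[g(wi,θ)] = 0 for which

$$
g(w_i,\theta) = \begin{pmatrix} g_1(w_i,\theta) \\ g_2(w_i,\theta) \end{pmatrix} = \begin{pmatrix} y_i - \theta_0 - \theta_1 x_i \\ (y_i - \theta_0 - \theta_1 x_i)^3 \end{pmatrix} = \begin{pmatrix} u_i \\ u_i^3 \end{pmatrix}
$$

The test statistic $\bar{g}'\cdot\widehat{\mathrm{Avar}}[\bar{g}]^{-1}\cdot\bar{g} \overset{a}{\sim} \chi^2_2$ requires an estimate of Avar[ḡ], which requires:

$$
\begin{aligned}
\widehat{\mathrm{Avar}}[g(w_i,\hat{\theta})] &= \frac{1}{n}\sum_{i=1}^{n} g(w_i,\hat{\theta})\,g(w_i,\hat{\theta})' - \left(\frac{1}{n}\sum_{i=1}^{n} g(w_i,\hat{\theta})\right)\left(\frac{1}{n}\sum_{i=1}^{n} g(w_i,\hat{\theta})\right)' \\
&= \frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix} \hat{u}_i \\ \hat{u}_i^3 \end{pmatrix}\begin{pmatrix} \hat{u}_i & \hat{u}_i^3 \end{pmatrix} - \left(\frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix} \hat{u}_i \\ \hat{u}_i^3 \end{pmatrix}\right)\left(\frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix} \hat{u}_i & \hat{u}_i^3 \end{pmatrix}\right) \\
&= \begin{pmatrix} \overline{\hat{u}^2} - \overline{\hat{u}}^{\,2} & \overline{\hat{u}^4} - \overline{\hat{u}}\cdot\overline{\hat{u}^3} \\[1ex] \overline{\hat{u}^4} - \overline{\hat{u}^3}\cdot\overline{\hat{u}} & \overline{\hat{u}^6} - \overline{\hat{u}^3}^{\,2} \end{pmatrix}
\end{aligned}
$$

for which an overbar denotes a sample average, e.g., $\overline{\hat{u}^2} = \frac{1}{n}\sum_{i=1}^{n}\hat{u}_i^2$,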

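and

$$
\begin{aligned}
\bar{G} &= \frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix} \dfrac{\partial g_1(w_i,\hat{\theta})}{\partial \theta_0} & \dfrac{\partial g_1(w_i,\hat{\theta})}{\partial \theta_1} \\[1.5ex] \dfrac{\partial g_2(w_i,\hat{\theta})}{\partial \theta_0} & \dfrac{\partial g_2(w_i,\hat{\theta})}{\partial \theta_1} \end{pmatrix} \\
&= \frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix} -1 & -x_i \\ -3(y_i - \hat{\theta}_0 - \hat{\theta}_1 x_i)^2 & -3x_i\cdot(y_i - \hat{\theta}_0 - \hat{\theta}_1 x_i)^2 \end{pmatrix} \\
&= -\begin{pmatrix} 1 & \overline{x} \\ 3\,\overline{\hat{u}^2} & 3\,\overline{x\cdot\hat{u}^2} \end{pmatrix}
\end{aligned}
$$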

Because the OLS objective function for a single-equation linear model is the sum of squares, i.e.,

$$
\sum_{i=1}^{n} q(w_i,\theta) = \sum_{i=1}^{n}\frac{1}{2}u_i^2 = \sum_{i=1}^{n}\frac{1}{2}(y_i - \theta_0 - \theta_1 x_i)^2
$$

the corresponding Score vector is

$$
s(w_i,\theta) = \begin{pmatrix} \dfrac{\partial q(w_i,\theta)}{\partial \theta_0} \\[1.5ex] \dfrac{\partial q(w_i,\theta)}{\partial \theta_1} \end{pmatrix} = \begin{pmatrix} -(y_i - \theta_0 - \theta_1 x_i) \\ -x_i(y_i - \theta_0 - \theta_1 x_i) \end{pmatrix} = -\begin{pmatrix} u_i \\ x_i\cdot u_i \end{pmatrix}
$$

⁴ Conditional moment tests are covered in Lecture 11.

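such that

$$
\begin{aligned}
\overline{gs} &= \frac{1}{n}\sum_{i=1}^{n} g(w_i,\hat{\theta})\,s(w_i,\hat{\theta})' \\
&= -\frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix} \hat{u}_i \\ \hat{u}_i^3 \end{pmatrix}\begin{pmatrix} \hat{u}_i & x_i\cdot\hat{u}_i \end{pmatrix} \\
&= -\begin{pmatrix} \overline{\hat{u}^2} & \overline{x\cdot\hat{u}^2} \\[0.5ex] \overline{\hat{u}^4} & \overline{x\cdot\hat{u}^4} \end{pmatrix}
\end{aligned}
$$

The corresponding Hessian matrix is

$$
H(w_i,\theta) = \begin{pmatrix} \dfrac{\partial^2 q(w_i,\theta)}{\partial \theta_0^2} & \dfrac{\partial^2 q(w_i,\theta)}{\partial \theta_0\,\partial \theta_1} \\[1.5ex] \dfrac{\partial^2 q(w_i,\theta)}{\partial \theta_1\,\partial \theta_0} & \dfrac{\partial^2 q(w_i,\theta)}{\partial \theta_1^2} \end{pmatrix} = \begin{pmatrix} 1 & x_i \\ x_i & x_i^2 \end{pmatrix}
$$

such that

$$
\bar{H} = \frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix} 1 & x_i \\ x_i & x_i^2 \end{pmatrix} = \begin{pmatrix} 1 & \overline{x} \\ \overline{x} & \overline{x^2} \end{pmatrix}
$$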

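Substituting,

$$
\begin{aligned}
\widehat{\mathrm{Avar}}[\bar{g}] \approx{}& \frac{1}{n}\,\widehat{\mathrm{Avar}}[g(w_i,\hat{\theta})] + \bar{G}\cdot\widehat{\mathrm{Avar}}[\hat{\theta}]\cdot\bar{G}' - \frac{1}{n}\,\overline{gs}\cdot\bar{H}^{-1}\cdot\bar{G}' - \frac{1}{n}\,\bar{G}\cdot\bar{H}^{-1}\cdot\overline{gs}' \\
={}& \frac{1}{n}\begin{pmatrix} \overline{\hat{u}^2} - \overline{\hat{u}}^{\,2} & \overline{\hat{u}^4} - \overline{\hat{u}}\cdot\overline{\hat{u}^3} \\[1ex] \overline{\hat{u}^4} - \overline{\hat{u}^3}\cdot\overline{\hat{u}} & \overline{\hat{u}^6} - \overline{\hat{u}^3}^{\,2} \end{pmatrix} \\
& + \begin{pmatrix} 1 & \overline{x} \\ 3\,\overline{\hat{u}^2} & 3\,\overline{x\cdot\hat{u}^2} \end{pmatrix}\cdot\begin{pmatrix} \widehat{\mathrm{Avar}}[\hat{\theta}_0] & \widehat{\mathrm{Acov}}[\hat{\theta}_0,\hat{\theta}_1] \\ \widehat{\mathrm{Acov}}[\hat{\theta}_1,\hat{\theta}_0] & \widehat{\mathrm{Avar}}[\hat{\theta}_1] \end{pmatrix}\cdot\begin{pmatrix} 1 & \overline{x} \\ 3\,\overline{\hat{u}^2} & 3\,\overline{x\cdot\hat{u}^2} \end{pmatrix}' \\
& - \frac{1}{n}\begin{pmatrix} \overline{\hat{u}^2} & \overline{x\cdot\hat{u}^2} \\[0.5ex] \overline{\hat{u}^4} & \overline{x\cdot\hat{u}^4} \end{pmatrix}\cdot\begin{pmatrix} 1 & \overline{x} \\ \overline{x} & \overline{x^2} \end{pmatrix}^{-1}\cdot\begin{pmatrix} 1 & \overline{x} \\ 3\,\overline{\hat{u}^2} & 3\,\overline{x\cdot\hat{u}^2} \end{pmatrix}' \\
& - \frac{1}{n}\begin{pmatrix} 1 & \overline{x} \\ 3\,\overline{\hat{u}^2} & 3\,\overline{x\cdot\hat{u}^2} \end{pmatrix}\cdot\begin{pmatrix} 1 & \overline{x} \\ \overline{x} & \overline{x^2} \end{pmatrix}^{-1}\cdot\begin{pmatrix} \overline{\hat{u}^2} & \overline{x\cdot\hat{u}^2} \\[0.5ex] \overline{\hat{u}^4} & \overline{x\cdot\hat{u}^4} \end{pmatrix}'
\end{aligned}
$$

The leading minus signs of Ḡ and $\overline{gs}$ cancel in each product, which is why none appear in the matrices above.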

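The two forms of the general estimator (the outer-product form and the expanded four-term form) are algebraically identical when Âvar[θ̂] is the sandwich estimator and the summed Score is zero at θ̂. The sketch below checks this numerically for the OLS example with g = (û, û³)′. The data and helper names are hypothetical, and only the standard library is used.

```python
def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def mean_of(mats):
    """Element-wise average of a list of equally sized matrices."""
    return [[sum(m[i][j] for m in mats) / len(mats)
             for j in range(len(mats[0][0]))] for i in range(len(mats[0]))]

def combine(terms):
    """Weighted sum of (weight, matrix) pairs."""
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(w * m[i][j] for w, m in terms) for j in range(cols)]
            for i in range(rows)]

# Hypothetical data (illustrative numbers only), fitted by OLS.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.3, 1.9, 3.4, 3.6, 5.5, 5.7, 7.4, 7.6]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
         / sum((a - x_bar) ** 2 for a in x))
theta = [y_bar - slope * x_bar, slope]
u = [b - theta[0] - theta[1] * a for a, b in zip(x, y)]

g = [[ui, ui ** 3] for ui in u]                  # g(w_i, theta_hat)
s = [[-ui, -a * ui] for a, ui in zip(x, u)]      # Score of q_i = u_i^2 / 2
G_bar = mean_of([[[-1.0, -a], [-3 * ui ** 2, -3 * a * ui ** 2]]
                 for a, ui in zip(x, u)])
H_bar = mean_of([[[1.0, a], [a, a * a]] for a in x])
gs_bar = mean_of([outer(gi, si) for gi, si in zip(g, s)])
ss_bar = mean_of([outer(si, si) for si in s])
g_bar = [sum(gi[k] for gi in g) / n for k in range(2)]
var_g = combine([(1.0, mean_of([outer(gi, gi) for gi in g])),
                 (-1.0, outer(g_bar, g_bar))])

H_inv = inverse_2x2(H_bar)
avar_theta = combine([(1.0 / n, mat_mul(mat_mul(H_inv, ss_bar), H_inv))])
GH_inv = mat_mul(G_bar, H_inv)

# Expanded four-term form.
expanded = combine([
    (1.0 / n, var_g),
    (1.0, mat_mul(mat_mul(G_bar, avar_theta), transpose(G_bar))),
    (-1.0 / n, mat_mul(mat_mul(gs_bar, H_inv), transpose(G_bar))),
    (-1.0 / n, mat_mul(GH_inv, transpose(gs_bar))),
])

# Outer-product form: (1/n) times the average of r_i r_i'.
residual = []
for gi, si in zip(g, s):
    adj = mat_mul(GH_inv, [[si[0]], [si[1]]])
    residual.append([gi[k] - g_bar[k] - adj[k][0] for k in range(2)])
outer_form = combine([(1.0 / n, mean_of([outer(r, r) for r in residual]))])
```

One caveat that the numbers make visible: because the model includes an intercept, the OLS residuals average to zero by construction, so the element of the result that corresponds to the first moment is zero up to rounding. The equality of the two forms does not depend on this.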

B. Special Case #1: Estimating Avar[g] using g(xi, θ̂)

If g(·) is a function of θ̂ and x only, then, using the Law of Iterated Expectations,

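$$
\begin{aligned}
E[g(x_i,\theta_0)\,s(w_i,\theta_0)'] &= E_x\big[E_y[g(x_i,\theta_0)\,s(w_i,\theta_0)' \mid x_i]\big] \\
&= E_x\big[g(x_i,\theta_0)\cdot E_y[s(w_i,\theta_0)' \mid x_i]\big] = 0
\end{aligned}
$$

This implies

$$
\begin{aligned}
V_2 ={}& \mathrm{Avar}[g(x_i,\theta_0)] + E[G(x_i,\theta_0)]\cdot V_1\cdot E[G(x_i,\theta_0)]' \\
& - E[g(x_i,\theta_0)\,s(w_i,\theta_0)']\cdot E[H(w_i,\theta_0)]^{-1}\cdot E[G(x_i,\theta_0)]' - E[G(x_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot E[s(w_i,\theta_0)\,g(x_i,\theta_0)'] \\
={}& \mathrm{Avar}[g(x_i,\theta_0)] + E[G(x_i,\theta_0)]\cdot V_1\cdot E[G(x_i,\theta_0)]'
\end{aligned}
$$

Substituting consistent parameter estimates for parameters, sample averages for population moments, and n·Âvar[θ̂] for V̂1,

$$
\widehat{\mathrm{Avar}}[\bar{g}] = \frac{1}{n}\hat{V}_2 = \frac{1}{n}\,\widehat{\mathrm{Avar}}[g(x_i,\hat{\theta})] + \left(\frac{1}{n}\sum_{i=1}^{n} G(x_i,\hat{\theta})\right)\cdot\widehat{\mathrm{Avar}}[\hat{\theta}]\cdot\left(\frac{1}{n}\sum_{i=1}^{n} G(x_i,\hat{\theta})\right)'
$$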

Example: Calculating the standard error of the average partial elasticity

Consider the population model E[yi|xi] = θ0 +θ1x1i +θ2x2i for which the measures of interest are the average partial elasticities of yi with respect to a change in (continuous) x1i and x2i, respectively:

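$$
E[g(x_i,\theta)] = \begin{pmatrix} E_x\left[\dfrac{\partial E_y[y_i \mid x_i]}{\partial x_{1i}}\cdot\dfrac{x_{1i}}{E_y[y_i \mid x_i]}\right] \\[2.5ex] E_x\left[\dfrac{\partial E_y[y_i \mid x_i]}{\partial x_{2i}}\cdot\dfrac{x_{2i}}{E_y[y_i \mid x_i]}\right] \end{pmatrix} = \begin{pmatrix} E\left[\theta_1\cdot\dfrac{x_{1i}}{\theta_0 + \theta_1 x_{1i} + \theta_2 x_{2i}}\right] \\[2.5ex] E\left[\theta_2\cdot\dfrac{x_{2i}}{\theta_0 + \theta_1 x_{1i} + \theta_2 x_{2i}}\right] \end{pmatrix}
$$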

Further suppose that the model is estimated by OLS and therefore no assumptions are made about the distribution of y|x; the only assumption placed on y or u is that u is uncorrelated with xi. Substituting consistent parameter estimates for population parameters and sample averages for population moments, define

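$$
g(x_i,\hat{\theta}) = \begin{pmatrix} g_1(x_i,\hat{\theta}) \\ g_2(x_i,\hat{\theta}) \end{pmatrix} = \begin{pmatrix} \dfrac{\hat{\theta}_1 x_{1i}}{\hat{\theta}_0 + \hat{\theta}_1 x_{1i} + \hat{\theta}_2 x_{2i}} \\[2.5ex] \dfrac{\hat{\theta}_2 x_{2i}}{\hat{\theta}_0 + \hat{\theta}_1 x_{1i} + \hat{\theta}_2 x_{2i}} \end{pmatrix} = \begin{pmatrix} \dfrac{\hat{\theta}_1 x_{1i}}{\hat{y}_i} \\[2.5ex] \dfrac{\hat{\theta}_2 x_{2i}}{\hat{y}_i} \end{pmatrix}
$$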


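The estimator

$$
\widehat{\mathrm{Avar}}[\bar{g}] = \frac{1}{n}\,\widehat{\mathrm{Avar}}[g(x_i,\hat{\theta})] + \left(\frac{1}{n}\sum_{i=1}^{n} G(x_i,\hat{\theta})\right)\cdot\widehat{\mathrm{Avar}}[\hat{\theta}]\cdot\left(\frac{1}{n}\sum_{i=1}^{n} G(x_i,\hat{\theta})\right)'
$$

requires

$$
\begin{aligned}
\widehat{\mathrm{Avar}}[g(x_i,\hat{\theta})] &= \frac{1}{n}\sum_{i=1}^{n} g(x_i,\hat{\theta})\,g(x_i,\hat{\theta})' - \left(\frac{1}{n}\sum_{i=1}^{n} g(x_i,\hat{\theta})\right)\cdot\left(\frac{1}{n}\sum_{i=1}^{n} g(x_i,\hat{\theta})\right)' \\
&= \frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix} \dfrac{\hat{\theta}_1 x_{1i}}{\hat{y}_i} \\[2ex] \dfrac{\hat{\theta}_2 x_{2i}}{\hat{y}_i} \end{pmatrix}\begin{pmatrix} \dfrac{\hat{\theta}_1 x_{1i}}{\hat{y}_i} & \dfrac{\hat{\theta}_2 x_{2i}}{\hat{y}_i} \end{pmatrix} - \left(\frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix} \dfrac{\hat{\theta}_1 x_{1i}}{\hat{y}_i} \\[2ex] \dfrac{\hat{\theta}_2 x_{2i}}{\hat{y}_i} \end{pmatrix}\right)\left(\frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix} \dfrac{\hat{\theta}_1 x_{1i}}{\hat{y}_i} & \dfrac{\hat{\theta}_2 x_{2i}}{\hat{y}_i} \end{pmatrix}\right) \\
&= \begin{pmatrix} \dfrac{1}{n}\sum\limits_{i=1}^{n}\dfrac{\hat{\theta}_1^2 x_{1i}^2}{\hat{y}_i^2} - \left(\dfrac{1}{n}\sum\limits_{i=1}^{n}\dfrac{\hat{\theta}_1 x_{1i}}{\hat{y}_i}\right)^2 & \dfrac{1}{n}\sum\limits_{i=1}^{n}\dfrac{\hat{\theta}_1\hat{\theta}_2 x_{1i}x_{2i}}{\hat{y}_i^2} - \left(\dfrac{1}{n}\sum\limits_{i=1}^{n}\dfrac{\hat{\theta}_1 x_{1i}}{\hat{y}_i}\right)\left(\dfrac{1}{n}\sum\limits_{i=1}^{n}\dfrac{\hat{\theta}_2 x_{2i}}{\hat{y}_i}\right) \\[3ex] \dfrac{1}{n}\sum\limits_{i=1}^{n}\dfrac{\hat{\theta}_2\hat{\theta}_1 x_{2i}x_{1i}}{\hat{y}_i^2} - \left(\dfrac{1}{n}\sum\limits_{i=1}^{n}\dfrac{\hat{\theta}_2 x_{2i}}{\hat{y}_i}\right)\left(\dfrac{1}{n}\sum\limits_{i=1}^{n}\dfrac{\hat{\theta}_1 x_{1i}}{\hat{y}_i}\right) & \dfrac{1}{n}\sum\limits_{i=1}^{n}\dfrac{\hat{\theta}_2^2 x_{2i}^2}{\hat{y}_i^2} - \left(\dfrac{1}{n}\sum\limits_{i=1}^{n}\dfrac{\hat{\theta}_2 x_{2i}}{\hat{y}_i}\right)^2 \end{pmatrix}
\end{aligned}
$$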

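and

$$
\begin{aligned}
\bar{G} &= \frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix} \dfrac{\partial g_1(x_i,\hat{\theta})}{\partial \theta_0} & \dfrac{\partial g_1(x_i,\hat{\theta})}{\partial \theta_1} & \dfrac{\partial g_1(x_i,\hat{\theta})}{\partial \theta_2} \\[2ex] \dfrac{\partial g_2(x_i,\hat{\theta})}{\partial \theta_0} & \dfrac{\partial g_2(x_i,\hat{\theta})}{\partial \theta_1} & \dfrac{\partial g_2(x_i,\hat{\theta})}{\partial \theta_2} \end{pmatrix} \\
&= \frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix} -\dfrac{\hat{\theta}_1 x_{1i}}{(\hat{\theta}_0 + \hat{\theta}_1 x_{1i} + \hat{\theta}_2 x_{2i})^2} & -\dfrac{\hat{\theta}_1 x_{1i}^2}{(\hat{\theta}_0 + \hat{\theta}_1 x_{1i} + \hat{\theta}_2 x_{2i})^2} + \dfrac{x_{1i}}{\hat{\theta}_0 + \hat{\theta}_1 x_{1i} + \hat{\theta}_2 x_{2i}} & -\dfrac{\hat{\theta}_1 x_{1i}x_{2i}}{(\hat{\theta}_0 + \hat{\theta}_1 x_{1i} + \hat{\theta}_2 x_{2i})^2} \\[3ex] -\dfrac{\hat{\theta}_2 x_{2i}}{(\hat{\theta}_0 + \hat{\theta}_1 x_{1i} + \hat{\theta}_2 x_{2i})^2} & -\dfrac{\hat{\theta}_2 x_{2i}x_{1i}}{(\hat{\theta}_0 + \hat{\theta}_1 x_{1i} + \hat{\theta}_2 x_{2i})^2} & -\dfrac{\hat{\theta}_2 x_{2i}^2}{(\hat{\theta}_0 + \hat{\theta}_1 x_{1i} + \hat{\theta}_2 x_{2i})^2} + \dfrac{x_{2i}}{\hat{\theta}_0 + \hat{\theta}_1 x_{1i} + \hat{\theta}_2 x_{2i}} \end{pmatrix} \\
&= -\frac{1}{n}\begin{pmatrix} \sum\limits_{i=1}^{n}\dfrac{\hat{\theta}_1 x_{1i}}{\hat{y}_i^2} & \sum\limits_{i=1}^{n}\dfrac{x_{1i}}{\hat{y}_i}\left(\dfrac{\hat{\theta}_1 x_{1i}}{\hat{y}_i} - 1\right) & \sum\limits_{i=1}^{n}\dfrac{\hat{\theta}_1 x_{1i}x_{2i}}{\hat{y}_i^2} \\[3ex] \sum\limits_{i=1}^{n}\dfrac{\hat{\theta}_2 x_{2i}}{\hat{y}_i^2} & \sum\limits_{i=1}^{n}\dfrac{\hat{\theta}_2 x_{2i}x_{1i}}{\hat{y}_i^2} & \sum\limits_{i=1}^{n}\dfrac{x_{2i}}{\hat{y}_i}\left(\dfrac{\hat{\theta}_2 x_{2i}}{\hat{y}_i} - 1\right) \end{pmatrix}
\end{aligned}
$$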
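The Special Case #1 estimator for the average partial elasticities can be sketched as follows. To keep the example short, Âvar[θ̂] is supplied as a made-up positive-definite matrix rather than estimated, and the analytic Ḡ is compared against finite differences of ḡ(θ). All data, parameter values, and function names are hypothetical.

```python
def g_i(x1, x2, theta):
    """Unit-level partial elasticities (theta_1 x_1 / y_hat, theta_2 x_2 / y_hat)."""
    y_hat = theta[0] + theta[1] * x1 + theta[2] * x2
    return [theta[1] * x1 / y_hat, theta[2] * x2 / y_hat]

def G_i(x1, x2, theta):
    """Analytic 2 x 3 Jacobian of g_i with respect to (theta_0, theta_1, theta_2)."""
    y_hat = theta[0] + theta[1] * x1 + theta[2] * x2
    e1, e2 = theta[1] * x1 / y_hat, theta[2] * x2 / y_hat
    return [[-e1 / y_hat, -(x1 / y_hat) * (e1 - 1.0), -e1 * x2 / y_hat],
            [-e2 / y_hat, -e2 * x1 / y_hat, -(x2 / y_hat) * (e2 - 1.0)]]

def g_bar(theta, data):
    n = len(data)
    return [sum(g_i(x1, x2, theta)[k] for x1, x2 in data) / n for k in range(2)]

def G_bar(theta, data):
    n = len(data)
    return [[sum(G_i(x1, x2, theta)[q][p] for x1, x2 in data) / n
             for p in range(3)] for q in range(2)]

def G_bar_numeric(theta, data, step=1e-6):
    """Central-difference derivative of g_bar, used only as a cross-check."""
    jac = [[0.0] * 3 for _ in range(2)]
    for p in range(3):
        up, down = list(theta), list(theta)
        up[p] += step
        down[p] -= step
        f_up, f_down = g_bar(up, data), g_bar(down, data)
        for q in range(2):
            jac[q][p] = (f_up[q] - f_down[q]) / (2.0 * step)
    return jac

# Hypothetical data, estimates, and Avar[theta_hat] (illustrative numbers only).
data = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
theta_hat = [1.0, 0.5, 0.8]
avar_theta = [[0.040, -0.005, -0.004],
              [-0.005, 0.002, 0.000],
              [-0.004, 0.000, 0.003]]

n = len(data)
gb = g_bar(theta_hat, data)
Gb = G_bar(theta_hat, data)
Gb_num = G_bar_numeric(theta_hat, data)

# Sample variance-covariance matrix of g(x_i, theta_hat) across units.
gs_i = [g_i(x1, x2, theta_hat) for x1, x2 in data]
var_g = [[sum(gi[a] * gi[b] for gi in gs_i) / n - gb[a] * gb[b]
          for b in range(2)] for a in range(2)]

# Special Case #1: (1/n) * Avar[g(x_i, theta_hat)] + G_bar * Avar[theta_hat] * G_bar'.
avar_g_bar = [[var_g[a][b] / n
               + sum(Gb[a][i] * avar_theta[i][j] * Gb[b][j]
                     for i in range(3) for j in range(3))
               for b in range(2)] for a in range(2)]
std_errors = [avar_g_bar[k][k] ** 0.5 for k in range(2)]
```

In applied work `avar_theta` would come from the estimation step, for example the sandwich formula derived in Section III.A.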

C. Special Case #2: Estimating Avar[g] using g(θ̂)

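If g(·) is a function of θ̂ only, then g(θ̂) assumes the same value for every cross section unit i, such that Âvar[g(θ̂)] = 0 and Ḡ = G(θ̂). As in Special Case #1, g(θ) is uncorrelated with s(wi,θ) such that $\overline{gs} = 0$. Imposing these restrictions results in the standard Delta Method formula:

$$
\begin{aligned}
\widehat{\mathrm{Avar}}[\bar{g}] &= \frac{1}{n}\,\widehat{\mathrm{Avar}}[g(w_i,\hat{\theta})] + \bar{G}\cdot\widehat{\mathrm{Avar}}[\hat{\theta}]\cdot\bar{G}' - \frac{1}{n}\,\overline{gs}\cdot\bar{H}^{-1}\cdot\bar{G}' - \frac{1}{n}\,\bar{G}\cdot\bar{H}^{-1}\cdot\overline{gs}' \\
&= G(\hat{\theta})\cdot\widehat{\mathrm{Avar}}[\hat{\theta}]\cdot G(\hat{\theta})'
\end{aligned}
$$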


D. Special Case #3: Estimating Avar[g] when f(yi|xi; θ) is known and Ey[g(wi,θ0)] = 0

Under these assumptions, two conditions are satisfied that greatly simplify the estimator of Avar[g].

First, if the entire conditional distribution yi|xi is both correctly specified and imposed during the estimation and Ey[s(wi,θ0)|xi] ≡ 0 , then −E[H(wi,θ0)] = E[s(wi,θ0) s(wi,θ0)′] (a result known as the Unconditional Information Matrix Equality). Using

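$$
E_y[s(w_i,\theta_0) \mid x_i] \equiv \int_y s(w_i,\theta_0)\cdot f(y_i \mid x_i;\theta_0)\,dy_i
$$

and differentiating both sides,

$$
0 = \nabla_\theta \int_y s(w_i,\theta_0)\cdot f(y_i \mid x_i;\theta_0)\,dy_i
$$

Assuming sufficient “smoothness” to reverse the order of gradient and integral,

$$
\begin{aligned}
0 &= \int_y \nabla_\theta\big(s(w_i,\theta_0)\cdot f(y_i \mid x_i;\theta_0)\big)\,dy_i \\
&= \int_y \big(s(w_i,\theta_0)\cdot\nabla_\theta f(y_i \mid x_i;\theta_0) + \nabla_\theta s(w_i,\theta_0)\cdot f(y_i \mid x_i;\theta_0)\big)\,dy_i \\
&= \int_y s(w_i,\theta_0)\,s(w_i,\theta_0)'\,f(y_i \mid x_i;\theta_0)\,dy_i + \int_y H(w_i,\theta_0)\cdot f(y_i \mid x_i;\theta_0)\,dy_i \\
&= E_y[s(w_i,\theta_0)\,s(w_i,\theta_0)' \mid x_i] + E_y[H(w_i,\theta_0) \mid x_i]
\end{aligned}
$$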

Note that the third line uses the result that

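$$
s(w_i,\theta) \equiv \big(\nabla_\theta \log f(y_i \mid x_i;\theta)\big)' \equiv \left(\frac{\nabla_\theta f(y_i \mid x_i;\theta)}{f(y_i \mid x_i;\theta)}\right)'
$$

implies

$$
\nabla_\theta f(y_i \mid x_i;\theta) = s(w_i,\theta)'\,f(y_i \mid x_i;\theta)
$$

Rearranging terms generates the Conditional Information Matrix Equality (CIME)

$$
E_y[s(w_i,\theta_0)\,s(w_i,\theta_0)' \mid x_i] = -E_y[H(w_i,\theta_0) \mid x_i]
$$

which, using the Law of Iterated Expectations, implies the Unconditional Information Matrix Equality (UIME):

$$
\begin{aligned}
E_x\big[E_y[s(w_i,\theta_0)\,s(w_i,\theta_0)' \mid x_i]\big] &= -E_x\big[E_y[H(w_i,\theta_0) \mid x_i]\big] \\
E[s(w_i,\theta_0)\,s(w_i,\theta_0)'] &= -E[H(w_i,\theta_0)]
\end{aligned}
$$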
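The CIME can be verified exactly for any model in which the conditional expectation over y is a finite sum. The sketch below uses a binary logit model purely as an assumed illustration (the handout does not single out this model here); `x_i`, `theta_0`, and the function names are hypothetical.

```python
import math

def prob_one(x, theta):
    """Logit probability P(y = 1 | x); the logit form is an assumption of this sketch."""
    return 1.0 / (1.0 + math.exp(-sum(t * xi for t, xi in zip(theta, x))))

def score(y, x, theta):
    """Gradient of log f(y | x; theta) for the logit model, as a list."""
    p = prob_one(x, theta)
    return [(y - p) * xi for xi in x]

def hessian(x, theta):
    """Second derivatives of log f; for the logit model they do not depend on y."""
    p = prob_one(x, theta)
    return [[-p * (1.0 - p) * xi * xj for xj in x] for xi in x]

def expected_score_outer(x, theta):
    """E_y[s s' | x], computed exactly by summing over y in {0, 1}."""
    p = prob_one(x, theta)
    k = len(x)
    total = [[0.0] * k for _ in range(k)]
    for y, weight in ((1, p), (0, 1.0 - p)):
        s = score(y, x, theta)
        for r in range(k):
            for c in range(k):
                total[r][c] += weight * s[r] * s[c]
    return total

x_i = [1.0, 0.7]        # hypothetical regressors (a constant and one covariate)
theta_0 = [-0.3, 1.5]   # hypothetical true parameter vector

lhs = expected_score_outer(x_i, theta_0)
rhs = hessian(x_i, theta_0)
# CIME: E_y[s s' | x] + E_y[H | x] should be the zero matrix.
cime_gap = max(abs(lhs[r][c] + rhs[r][c]) for r in range(2) for c in range(2))
```

Because the logit Hessian does not depend on y, its conditional expectation is the Hessian itself, which is why `hessian` takes no `y` argument.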


Second, if the entire conditional distribution yi|xi is both correctly specified and imposed during the estimation and Ey[g(wi,θ0)|xi] = 0, then E[g(wi,θ0) s(wi,θ0)′] = −E[G(wi,θ0)]. Using

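$$
E_y[g(w_i,\theta_0) \mid x_i] \equiv \int_y g(w_i,\theta_0)\cdot f(y_i \mid x_i;\theta_0)\,dy_i
$$

and differentiating both sides,

$$
0 = \nabla_\theta \int_y g(w_i,\theta_0)\cdot f(y_i \mid x_i;\theta_0)\,dy_i
$$

Assuming sufficient “smoothness” to reverse the order of gradient and integral,

$$
\begin{aligned}
0 &= \int_y \nabla_\theta\big(g(w_i,\theta_0)\cdot f(y_i \mid x_i;\theta_0)\big)\,dy_i \\
&= \int_y \big(g(w_i,\theta_0)\cdot\nabla_\theta f(y_i \mid x_i;\theta_0) + \nabla_\theta g(w_i,\theta_0)\cdot f(y_i \mid x_i;\theta_0)\big)\,dy_i \\
&= \int_y g(w_i,\theta_0)\,s(w_i,\theta_0)'\,f(y_i \mid x_i;\theta_0)\,dy_i + \int_y G(w_i,\theta_0)\cdot f(y_i \mid x_i;\theta_0)\,dy_i \\
&= E_y[g(w_i,\theta_0)\,s(w_i,\theta_0)' \mid x_i] + E_y[G(w_i,\theta_0) \mid x_i]
\end{aligned}
$$

for which the third line uses the result ∇θf(yi|xi;θ) = s(wi,θ)′ f(yi|xi;θ). Rearranging terms generates

$$
E_y[g(w_i,\theta_0)\,s(w_i,\theta_0)' \mid x_i] = -E_y[G(w_i,\theta_0) \mid x_i]
$$

which, using the Law of Iterated Expectations, generates

$$
\begin{aligned}
E_x\big[E_y[g(w_i,\theta_0)\,s(w_i,\theta_0)' \mid x_i]\big] &= -E_x\big[E_y[G(w_i,\theta_0) \mid x_i]\big] \\
E[g(w_i,\theta_0)\,s(w_i,\theta_0)'] &= -E[G(w_i,\theta_0)]
\end{aligned}
$$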
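The same kind of exact check works for the second condition. As an assumed illustration, take the binary logit model again and let g be the scalar residual y − P(y = 1|x), which has conditional expectation zero. The model choice, `x_i`, `theta_0`, and all function names are hypothetical.

```python
import math

def prob_one(x, theta):
    """Logit probability P(y = 1 | x); the logit form is an assumption of this sketch."""
    return 1.0 / (1.0 + math.exp(-sum(t * xi for t, xi in zip(theta, x))))

def g_moment(y, x, theta):
    """Scalar moment function g = y - P(y = 1 | x)."""
    return y - prob_one(x, theta)

def g_jacobian(x, theta):
    """Row vector dg/dtheta' = -p (1 - p) x'; it does not depend on y."""
    p = prob_one(x, theta)
    return [-p * (1.0 - p) * xi for xi in x]

def score(y, x, theta):
    p = prob_one(x, theta)
    return [(y - p) * xi for xi in x]

def expected_g_and_gs(x, theta):
    """Return E_y[g | x] and E_y[g s' | x] by summing over y in {0, 1}."""
    p = prob_one(x, theta)
    e_g = 0.0
    e_gs = [0.0] * len(x)
    for y, weight in ((1, p), (0, 1.0 - p)):
        g_val = g_moment(y, x, theta)
        s = score(y, x, theta)
        e_g += weight * g_val
        for c in range(len(x)):
            e_gs[c] += weight * g_val * s[c]
    return e_g, e_gs

x_i = [1.0, 0.7]        # hypothetical regressors (a constant and one covariate)
theta_0 = [-0.3, 1.5]   # hypothetical true parameter vector

expected_g, expected_gs = expected_g_and_gs(x_i, theta_0)
G_row = g_jacobian(x_i, theta_0)
# The equality E_y[g s' | x] = -E_y[G | x] should hold element by element.
gap = max(abs(expected_gs[c] + G_row[c]) for c in range(2))
```

Both requirements of the derivation are visible here: `expected_g` is zero, and `expected_gs` equals the negative of `G_row`.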

Substituting,

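$$
\begin{aligned}
V_1 &= E[H(w_i,\theta_0)]^{-1}\cdot E[s(w_i,\theta_0)\,s(w_i,\theta_0)']\cdot E[H(w_i,\theta_0)]^{-1} \\
&= E[H(w_i,\theta_0)]^{-1}\cdot\big(-E[H(w_i,\theta_0)]\big)\cdot E[H(w_i,\theta_0)]^{-1} \\
&= -E[H(w_i,\theta_0)]^{-1}
\end{aligned}
$$

and

$$
\begin{aligned}
V_2 ={}& \mathrm{Avar}[g(w_i,\theta_0)] + E[G(w_i,\theta_0)]\cdot V_1\cdot E[G(w_i,\theta_0)]' \\
& - E[g(w_i,\theta_0)\,s(w_i,\theta_0)']\cdot E[H(w_i,\theta_0)]^{-1}\cdot E[G(w_i,\theta_0)]' - E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot E[s(w_i,\theta_0)\,g(w_i,\theta_0)'] \\
={}& \mathrm{Avar}[g(w_i,\theta_0)] + E[G(w_i,\theta_0)]\cdot\big(-E[H(w_i,\theta_0)]^{-1}\big)\cdot E[G(w_i,\theta_0)]' \\
& - \big(-E[G(w_i,\theta_0)]\big)\cdot E[H(w_i,\theta_0)]^{-1}\cdot E[G(w_i,\theta_0)]' - E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot\big(-E[G(w_i,\theta_0)]\big)' \\
={}& \mathrm{Avar}[g(w_i,\theta_0)] + E[G(w_i,\theta_0)]\cdot E[H(w_i,\theta_0)]^{-1}\cdot E[G(w_i,\theta_0)]' \\
={}& \mathrm{Avar}[g(w_i,\theta_0)] - E[G(w_i,\theta_0)]\cdot V_1\cdot E[G(w_i,\theta_0)]'
\end{aligned}
$$

In the second equality the three terms involving E[H(wi,θ0)]⁻¹ carry the signs −, +, and +, so they sum to a single positive term; the last line then uses V1 = −E[H(wi,θ0)]⁻¹.

Substituting consistent parameter estimates for population parameters, sample averages for population moments, and n·Âvar[θ̂] for V̂1,

$$
\widehat{\mathrm{Avar}}[\bar{g}] = \frac{1}{n}\hat{V}_2 = \frac{1}{n}\,\widehat{\mathrm{Avar}}[g(w_i,\hat{\theta})] - \bar{G}\cdot\widehat{\mathrm{Avar}}[\hat{\theta}]\cdot\bar{G}'
$$

for which

$$
\widehat{\mathrm{Avar}}[g(w_i,\hat{\theta})] =
\begin{cases}
\dfrac{1}{n}\sum_{i=1}^{n} E_y[g(w_i,\theta_0)\,g(w_i,\theta_0)' \mid x_i] & \text{if the expectation has a closed-form solution} \\[1ex]
\dfrac{1}{n}\sum_{i=1}^{n} g(w_i,\hat{\theta})\,g(w_i,\hat{\theta})' & \text{otherwise}
\end{cases}
$$

$$
\bar{G} =
\begin{cases}
\dfrac{1}{n}\sum_{i=1}^{n} E_y[G(w_i,\theta_0) \mid x_i] & \text{if the expectation has a closed-form solution} \\[1ex]
\dfrac{1}{n}\sum_{i=1}^{n} G(w_i,\hat{\theta}) & \text{otherwise}
\end{cases}
$$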

Examples are provided in Handout 12: Conditional Moment Tests for Common LDV Models.
