Further Inference in the Multiple Regression Model

A null hypothesis with multiple conjectures, expressed with more than one equal sign, is called a joint hypothesis

Example: Should a group of explanatory variables be included in a particular model?

Example: Does the quantity demanded of a product depend on the prices of substitute goods, or only on its own price?

Joint Hypothesis Testing

(6.1)

The joint null hypothesis in Eq. 6.1 contains three conjectures (three equal signs): β4 = 0, β5 = 0, and β6 = 0

A test of H0 is a joint test for whether all three conjectures hold simultaneously

(6.2)

Test whether or not advertising has an effect on sales – but advertising is in the model as two variables

Example

Advertising will have no effect on sales if β3 = 0 and β4 = 0

Advertising will have an effect if β3 ≠ 0 or β4 ≠ 0 or if both β3 and β4 are nonzero

The null and alternative hypotheses are:

Relative to the null hypothesis H0 : β3 = 0, β4 = 0 the model in Eq. 6.2 is called the unrestricted model

- The restrictions in the null hypothesis have not been imposed on the model

It contrasts with the restricted model, which is obtained by assuming the parameter restrictions in H0 are true

When H0 is true, β3 = 0 and β4 = 0, and ADVERT and ADVERT2 drop out of the model and we have

(6.3)

The F-test for the hypothesis H0 : β3 = 0, β4 = 0 is based on a comparison of the sums of squared errors (sums of squared least squares residuals) from the unrestricted model in Eq. 6.2 and the restricted model in Eq. 6.3

Shorthand notation for these two quantities is SSEU and SSER, respectively

The F-statistic determines what constitutes a large reduction or a small reduction in the sum of squared errors

(6.4)

where J is the number of restrictions, N is the number of observations and K is the number of coefficients in the unrestricted model

If the null hypothesis is true, then the statistic F has what is called an F-distribution with J numerator degrees of freedom and N - K denominator degrees of freedom

If the null hypothesis is not true, then the difference between SSER and SSEU becomes large

- The restrictions placed on the model by the null hypothesis significantly reduce the ability of the model to fit the data

The F-test for our sales problem is:

Specify the null and alternative hypotheses:

- The joint null hypothesis is H0 : β3 = 0, β4 = 0. The alternative hypothesis is H1 : β3 ≠ 0 or β4 ≠ 0 or both are nonzero

Specify the test statistic and its distribution if the null hypothesis is true:

- Having two restrictions in H0 means J = 2

- Also, recall that N = 75:

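- Set the significance level and determine the rejection region: using α = 0.05, the critical value from the F(2, 71) distribution is Fc = 3.126, so the rejection region is F ≥ 3.126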

- Calculate the sample value of the test statistic and, if desired, the p-value

The corresponding p-value is p = P(F(2, 71) > 8.44) = 0.0005

- State your conclusion

- Since F = 8.44 > Fc = 3.126, we reject the null hypothesis that both β3 = 0 and β4 = 0, and conclude that at least one of them is not zero

- Advertising does have a significant effect upon sales revenue
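The steps of this test can be reproduced with a few lines of arithmetic. The following is a minimal sketch: the function name `f_statistic` is an illustrative choice, and the sums of squared errors (SSE_R = 1896.391, SSE_U = 1532.084) and the critical value are the ones quoted for the sales example.

```python
def f_statistic(sse_r, sse_u, j, n, k):
    """F = [(SSE_R - SSE_U) / J] / [SSE_U / (N - K)], as in Eq. 6.4."""
    return ((sse_r - sse_u) / j) / (sse_u / (n - k))


# Sales example: J = 2 restrictions, N = 75 observations, K = 4 coefficients.
f_value = f_statistic(sse_r=1896.391, sse_u=1532.084, j=2, n=75, k=4)
f_critical = 3.126  # critical value quoted in the text for F(2, 71)

print(f"F = {f_value:.2f}")
print("reject H0" if f_value > f_critical else "do not reject H0")
```

The same function applies to any set of J linear equality restrictions, provided the restricted and unrestricted models are estimated on the same sample.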

Consider again the general multiple regression model with (K - 1) explanatory variables and K unknown coefficients

(6.5)

To examine whether we have a viable explanatory model, we set up the following null and alternative hypotheses:

(6.6)

Since we are testing whether or not we have a viable explanatory model, the test for Eq. 6.6 is sometimes referred to as a test of the overall significance of the regression model.

- Given that the t-distribution can only be used to test a single null hypothesis, we use the F-test for testing the joint null hypothesis in Eq. 6.6

The unrestricted model is that given in Eq. 6.5

- The restricted model, assuming the null hypothesis is true, becomes:

(6.7)

The least squares estimator of β1 in this restricted model is:

The restricted sum of squared errors from the hypothesis Eq. 6.6 is:

Thus, to test the overall significance of a model, but not in general, the F-test statistic can be modified and written as:

(6.8)

For our problem, note:

We are testing:

If H0 is true:

- Using a 5% significance level, we find the critical value for the F-statistic with (3,71) degrees of freedom is Fc = 2.734.

- Thus, we reject H0 if F ≥ 2.734.

- The required sums of squares are SST = 3115.482 and SSE = 1532.084 which give an F-value of:

p-value = P(F ≥ 24.459) = 0.0000

Since 24.459 > 2.734, we reject H0 and conclude that the estimated relationship is a significant one

- Note that this conclusion is consistent with conclusions that would be reached using separate t-tests for the significance of each of the coefficients
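As a check on the overall-significance calculation, the sketch below computes the F-value from the quoted sums of squares and again from R² = 1 - SSE/SST. The R² form is a standard algebraic rearrangement of Eq. 6.8 and is not shown on the slides; the variable names are illustrative.

```python
sst, sse = 3115.482, 1532.084  # sums of squares quoted in the text
n, k = 75, 4

# Eq. 6.8: F = [(SST - SSE) / (K - 1)] / [SSE / (N - K)]
f_from_ss = ((sst - sse) / (k - 1)) / (sse / (n - k))

# Equivalent form in terms of R^2 = 1 - SSE/SST.
r2 = 1 - sse / sst
f_from_r2 = (r2 / (k - 1)) / ((1 - r2) / (n - k))

print(f"R^2 = {r2:.4f}")
print(f"F from sums of squares = {f_from_ss:.3f}")
print(f"F from R^2             = {f_from_r2:.3f}")
```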

We used the F-test to test whether β3 = 0 and β4 = 0 in:

Suppose we want to test if PRICE affects SALES

The F-value for testing this restriction is:

- The 5% critical value is Fc = F(0.95, 1, 71) = 3.976

We reject H0 : β2 = 0

Using the t-test:

The t-value for testing H0: β2 = 0 against H1: β2 ≠ 0 is t = 7.640/1.045939 = 7.30444

Its square is t² = (7.30444)² = 53.355, identical to the F-value
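The equivalence between the t- and F-tests for a single restriction can be verified numerically. This sketch assumes the restricted sum of squared errors for the model without PRICE is 2683.411 and the unrestricted value is 1532.084, as used in the F calculation for this example; small differences arise only from rounding in the reported inputs.

```python
b2, se_b2 = -7.640, 1.045939  # PRICE coefficient and its standard error
t_value = b2 / se_b2          # the text reports the magnitude, 7.30444

sse_r, sse_u = 2683.411, 1532.084
f_value = ((sse_r - sse_u) / 1) / (sse_u / (75 - 4))

print(f"t^2 = {t_value ** 2:.3f}")
print(f"F   = {f_value:.3f}")
```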

The elements of an F-test

- The null hypothesis H0 consists of one or more equality restrictions on the model parameters βk

- The alternative hypothesis states that one or more of the equalities in the null hypothesis is not true

- The test statistic is the F-statistic in (6.4)

- If the null hypothesis is true, F has the F-distribution with J numerator degrees of freedom and N - K denominator degrees of freedom

- When testing a single equality null hypothesis, it is perfectly correct to use either the t- or F-test procedure: they are equivalent

The conjectures made in the null hypothesis were that particular coefficients are equal to zero

The F-test can also be used for much more general hypotheses

Any number of conjectures (≤ K) involving linear hypotheses with equal signs can be tested

Suppose we have:

In this case, we can no longer use the F-test

- The F-test cannot distinguish between the left and right tails as is needed for a one-tail test

We restrict ourselves to the t-distribution when considering alternative hypotheses that have inequality signs such as < or >
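A one-tail test of this kind can be sketched as follows, taking the hypotheses to be H0: β3 + 3.8β4 ≤ 1 against H1: β3 + 3.8β4 > 1. The coefficient estimates and the variance and covariance values are those reported for the sales model; the critical value 1.667 is an approximate table value for the 95th percentile of t(71) and is an assumption added here, not a number from the slides.

```python
import math

b3, b4 = 12.1512, -2.768
var_b3, var_b4, cov_b3_b4 = 12.6463, 0.884774, -3.288746

# Point estimate and standard error of the linear combination b3 + 3.8*b4.
estimate = b3 + 3.8 * b4
se = math.sqrt(var_b3 + 3.8 ** 2 * var_b4 + 2 * 3.8 * cov_b3_b4)

t_value = (estimate - 1) / se
t_critical = 1.667  # approx. 95th percentile of t(71); assumed table value

print(f"estimate = {estimate:.4f}, se = {se:.4f}, t = {t_value:.3f}")
print("reject H0" if t_value > t_critical else "do not reject H0")
```

Only the right tail matters here, which is why the t-statistic, with its sign, is needed rather than its square.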

Most software packages have commands that will automatically compute t- and F-values and their corresponding p-values when provided with a null hypothesis

- These tests belong to a class of tests called Wald tests

Suppose we conjecture that:

- We formulate the joint null hypothesis:

Because there are J = 2 restrictions to test jointly, we use an F-test

- A t-test is not suitable
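One way to see why J = 2 is to stack the two conjectures, β3 + 3.8β4 = 1 and β1 + 6β2 + 1.9β3 + 3.61β4 = 80, as linear restrictions on the coefficient vector. The matrix notation R and r below is a common convention and is an addition here, not taken from the slides; J is the number of rows of R.

```latex
\[
\underbrace{\begin{pmatrix} 0 & 0 & 1 & 3.8 \\ 1 & 6 & 1.9 & 3.61 \end{pmatrix}}_{R}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix}
=
\underbrace{\begin{pmatrix} 1 \\ 80 \end{pmatrix}}_{r}
\]
```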

In many estimation problems we have information over and above the information contained in the sample observations

- This nonsample information may come from many places, such as economic principles or experience

- When it is available, it seems intuitive that we should find a way to use it

Consider the log-log functional form for a demand model for beer:

- This model is a convenient one because it precludes infeasible negative prices, quantities, and income, and because the coefficients β2, β3, β4, and β5 are elasticities

The Use of Nonsample Information

A relevant piece of nonsample information can be derived by noting that if all prices and income go up by the same proportion, we would expect there to be no change in quantity demanded

- For example, a doubling of all prices and income should not change the quantity of beer consumed

- This assumption is that economic agents do not suffer from "money illusion"

Having all prices and income change by the same proportion is equivalent to multiplying each price and income by a constant, say λ:

(6.15)

To have no change in ln(Q) when all prices and income go up by the same proportion, it must be true that:

We call such a restriction nonsample information
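The effect of the restriction can be illustrated numerically: when the four elasticities sum to zero, scaling every price and income by the same λ leaves ln(Q) unchanged. The sketch below borrows the restricted estimates reported for this example as coefficient values; the prices, income, and λ are hypothetical numbers chosen only for illustration.

```python
import math

# Coefficients satisfying beta2 + beta3 + beta4 + beta5 = 0.
b1, b2, b3, b5 = -4.798, -1.2994, 0.1868, 0.9458
b4 = -b2 - b3 - b5


def ln_q(pb, pl, pr, income):
    """Log-log demand: ln(Q) as a function of prices and income."""
    return (b1 + b2 * math.log(pb) + b3 * math.log(pl)
            + b4 * math.log(pr) + b5 * math.log(income))


# Hypothetical prices and income.
base = dict(pb=3.0, pl=10.0, pr=2.0, income=30000.0)
lam = 2.0  # double all prices and income
scaled = {name: lam * value for name, value in base.items()}

print(ln_q(**base), ln_q(**scaled))
```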

To estimate a model, we can start with:

- Solve the restriction for one of the parameters, say β4:

- Substitution gives us:

To get least squares estimates that satisfy the parameter restriction, called restricted least squares estimates, we apply the least squares estimation procedure directly to the restricted model:

Let the restricted least squares estimates in Eq. 6.19 be denoted by b*1, b*2, b*3, and b*5

To obtain an estimate for β4, we use the restriction:

By using the restriction within the model, we have ensured that the estimates obey the constraint:

Properties of this restricted least squares estimation procedure:

- The restricted least squares estimator is biased, unless the constraints we impose are exactly true

- The variance of the restricted least squares estimator is smaller than the variance of the least squares estimator, whether the constraints imposed are true or not
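The recovery of the estimate for β4 from the restriction, and the equivalence of the restricted model to the original form, can be checked as follows. This is a sketch using the restricted estimates reported for the beer demand example; the single observation (prices and income) is hypothetical.

```python
import math

b1_star, b2_star, b3_star, b5_star = -4.798, -1.2994, 0.1868, 0.9458
b4_star = -b2_star - b3_star - b5_star  # implied by the restriction

# Hypothetical observation, for illustration only.
pb, pl, pr, income = 3.0, 10.0, 2.0, 30000.0

# Restricted model: regressors are logs of prices and income relative to PR.
restricted_form = (b1_star + b2_star * math.log(pb / pr)
                   + b3_star * math.log(pl / pr)
                   + b5_star * math.log(income / pr))

# Original model evaluated with the implied b4*.
original_form = (b1_star + b2_star * math.log(pb) + b3_star * math.log(pl)
                 + b4_star * math.log(pr) + b5_star * math.log(income))

print(f"b4* = {b4_star:.4f}")
print(restricted_form, original_form)
```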

In any econometric investigation, choice of the model is one of the first steps

- What are the important considerations when choosing a model?

- What are the consequences of choosing the wrong model?

- Are there ways of assessing whether a model is adequate?

It is possible that a chosen model may have important variables omitted

- Our economic principles may have overlooked a variable, or lack of data may lead us to drop a variable even when it is prescribed by economic theory

Model Specification

Let us consider the model:

(6.20)

If for some reason we omit wife’s education:

(6.21)

Relative to Eq. 6.20, omitting WEDU leads us to overstate the effect of an extra year of education for the husband by about $2,000

- Omission of a relevant variable (defined as one whose coefficient is nonzero) leads to an estimator that is biased

- This bias is known as omitted-variable bias

Write a general model as:

Omitting x3 is equivalent to imposing the restriction β3 = 0

It can be viewed as an example of imposing an incorrect constraint on the parameters

The bias is:

Table 6.1 Correlation Matrix for Variables Used in Family Income Example

Note that:

β3 > 0 because wife’s education has a positive effect on family income.

cov(x2, x3) > 0 because husband’s and wife’s levels of education are positively correlated.

Thus, the bias is positive
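The bias formula, bias = β3 · cov(x2, x3) / var(x2), can be verified on a small constructed data set. The data below are hypothetical and noise-free, so the coefficients of the full model are known exactly; the helper names `mean` and `cov` are illustrative.

```python
def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance; cov(a, a) is the sample variance."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


# Hypothetical data: y = 1 + 2*x2 + 3*x3, with x2 and x3 positively correlated.
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
beta2, beta3 = 2.0, 3.0
y = [1.0 + beta2 * a + beta3 * b for a, b in zip(x2, x3)]

# Slope from the short regression of y on x2 only (x3 omitted).
b2_short = cov(x2, y) / cov(x2, x2)

# Bias predicted by the formula.
predicted_bias = beta3 * cov(x2, x3) / cov(x2, x2)

print(f"short-regression slope = {b2_short:.4f}")
print(f"beta2 + predicted bias = {beta2 + predicted_bias:.4f}")
```

Because β3 and the covariance are both positive in this construction, the omitted-variable bias is positive, matching the direction found in the family income example.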

Now consider the model:

Notice that the coefficient estimates for HEDU and WEDU have not changed a great deal

- This outcome occurs because KL6 is not highly correlated with the education variables

It may seem like a good strategy to include as many variables as possible in your model.

However, doing so will not only complicate your model unnecessarily, but may also inflate the variances of your estimates because of the presence of irrelevant variables.

Example:

The inclusion of irrelevant variables has reduced the precision of the estimated coefficients for other variables in the equation

- Choose variables and a functional form on the basis of your theoretical and general understanding of the relationship

- If an estimated equation has coefficients with unexpected signs, or unrealistic magnitudes, they could be caused by a misspecification such as the omission of an important variable

- One method for assessing whether a variable or a group of variables should be included in an equation is to perform significance tests

- The adequacy of a model can be tested using a general specification test known as RESET

There are three main model selection criteria:

- R²

- AIC

- SC (BIC)

Choosing a model

A common feature of the criteria we describe is that they are suitable only for comparing models with the same dependent variable, not models with different dependent variables like y and ln(y)

The problem is that R² can be made large by adding more and more variables, even if the variables added have no justification

- Algebraically, it is a fact that as variables are added the sum of squared errors SSE goes down, and thus R² goes up

- If the model contains N - 1 variables, then R² = 1

An alternative measure of goodness of fit is the adjusted R², denoted R̄², which is computed as:

The Akaike information criterion (AIC) is given by:

The Schwarz criterion (SC), also known as the Bayesian information criterion (BIC), is given by:

Table 6.2 Goodness-of-Fit and Information Criteria for Family Income Example
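The three measures are simple functions of SSE, SST, N, and K. The sketch below implements them as R̄² = 1 - [SSE/(N-K)]/[SST/(N-1)], AIC = ln(SSE/N) + 2K/N, and SC = ln(SSE/N) + K ln(N)/N. Since the entries of Table 6.2 are not reproduced here, the inputs are the sums of squares from the sales example (N = 75, K = 4), used purely as an illustration; the function names are hypothetical.

```python
import math


def adjusted_r2(sse, sst, n, k):
    return 1 - (sse / (n - k)) / (sst / (n - 1))


def aic(sse, n, k):
    return math.log(sse / n) + 2 * k / n


def sc(sse, n, k):
    return math.log(sse / n) + k * math.log(n) / n


# Illustrative inputs from the sales model.
sse, sst, n, k = 1532.084, 3115.482, 75, 4

print(f"R^2          = {1 - sse / sst:.4f}")
print(f"adjusted R^2 = {adjusted_r2(sse, sst, n, k):.4f}")
print(f"AIC          = {aic(sse, n, k):.4f}")
print(f"SC           = {sc(sse, n, k):.4f}")
```

For AIC and SC the model with the smaller value is preferred. Whenever ln(N) > 2, that is for N of 8 or more, SC penalizes each additional coefficient more heavily than AIC.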

A model could be misspecified if:

- we have omitted important variables

- included irrelevant ones

- chosen a wrong functional form

- have a model that violates the assumptions of the multiple regression model

RESET (REgression Specification Error Test) is designed to detect omitted variables and incorrect functional form

Suppose we have the model:

Let the predicted values of y be:

RESET Test

Now consider the following two artificial models:

In Eq. 6.29 a test for misspecification is a test of H0: γ1 = 0 against the alternative H1: γ1 ≠ 0

In Eq. 6.30, testing H0: γ1 = γ2 = 0 against H1: γ1 ≠ 0 and/or γ2 ≠ 0 is a test for misspecification

Applying RESET to our problem (Eq. 6.24), we get:

- In both cases the null hypothesis of no misspecification is rejected at a 5% significance level
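The mechanics of RESET can be sketched without any statistics library. The example below covers the Eq. 6.29 variant, which adds only the squared predictions. It fits a linear model to hypothetical data whose true relationship is quadratic, adds the squared fitted values as an extra regressor, and forms the usual F-statistic. The helpers `solve` and `ols` are illustrative, use the normal equations for brevity, and are not a numerically robust implementation.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(columns, y):
    """Least squares via the normal equations; returns (fitted values, SSE)."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(coef[j] * columns[j][i] for j in range(k)) for i in range(len(y))]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return fitted, sse


# Hypothetical data: quadratic relationship plus a deterministic disturbance.
x = [0.5 * i for i in range(1, 21)]
y = [2.0 + 0.8 * xi ** 2 + math.sin(3 * i) for i, xi in enumerate(x)]
const = [1.0] * len(x)

y_hat, sse_r = ols([const, x], y)                      # original model
_, sse_u = ols([const, x, [v ** 2 for v in y_hat]], y)  # artificial model

n, k_u, j = len(y), 3, 1
f_reset = ((sse_r - sse_u) / j) / (sse_u / (n - k_u))
print(f"RESET F = {f_reset:.1f}")
```

A large F-value indicates that the squared predictions add explanatory power, which is evidence that the original functional form is inadequate.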

When data are the result of an uncontrolled experiment, many of the economic variables may move together in systematic ways

Such variables are said to be collinear, and the problem is labeled collinearity

Consider the model:

- The variance of the least squares estimator for β2 is:

Poor Data Quality, Collinearity, and Insignificance

Exact or extreme collinearity exists when x2 and x3 are perfectly correlated, in which case r23 = 1 and var(b2) goes to infinity

- Similarly, if x2 exhibits no variation, then Σ(xi2 - x̄2)² equals zero and var(b2) again goes to infinity

- In this case x2 is collinear with the constant term
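The way var(b2) grows as the correlation between x2 and x3 approaches one can be seen by evaluating the variance formula for a few values of r23. The error variance and the sum of squared deviations of x2 below are hypothetical, chosen only for illustration.

```python
def var_b2(sigma2, r23, sum_sq_dev_x2):
    """var(b2) = sigma^2 / [(1 - r23^2) * sum of (x_i2 - xbar_2)^2]."""
    return sigma2 / ((1 - r23 ** 2) * sum_sq_dev_x2)


# Hypothetical error variance and variation in x2.
sigma2, sum_sq = 1.0, 10.0

for r23 in (0.0, 0.5, 0.9, 0.99):
    print(f"r23 = {r23:4.2f}  var(b2) = {var_b2(sigma2, r23, sum_sq):.3f}")
```

The factor 1/(1 - r23²) is the amount by which collinearity inflates the variance relative to the case of uncorrelated regressors.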

In general, whenever there are one or more exact linear relationships among the explanatory variables, then the condition of exact collinearity exists

- In this case the least squares estimator is not defined

- We cannot obtain estimates of βk’s using the least squares principle

Issues with Collinearity

The effects of this imprecise information are:

- When estimator standard errors are large, it is likely that the usual t-tests will lead to the conclusion that parameter estimates are not significantly different from zero

- Estimators may be very sensitive to the addition or deletion of a few observations, or to the deletion of an apparently insignificant variable

- Accurate forecasts may still be possible if the nature of the collinear relationship remains the same within the out-of-sample observations

A regression of MPG on CYL yields:

Now add ENG and WGT:

One simple way to detect collinear relationships is to use sample correlation coefficients between pairs of explanatory variables

- These sample correlations are descriptive measures of linear association

However, in some cases in which collinear relationships involve more than two of the explanatory variables, the collinearity may not be detected by examining pairwise correlations

Try an auxiliary model:

- If R² from this artificial model is high, above 0.80, say, the implication is that a large portion of the variation in x2 is explained by variation in the other explanatory variables

The collinearity problem is that the data do not contain enough "information" about the individual effects of explanatory variables to permit us to estimate all the parameters of the statistical model precisely

- Consequently, one solution is to obtain more information and include it in the analysis.

A second way of adding new information is to introduce nonsample information in the form of restrictions on the parameters

Consider the model:

- The prediction problem is to predict the value of the dependent variable y0, which is given by:

- The best linear unbiased predictor is:

Prediction

The variance of the forecast error is:

For our example, suppose PRICE0 = 6, ADVERT0 = 1.9, and ADVERT0² = 3.61:

- We forecast sales will be $76,974

Table 6.3 Covariance Matrix for Andy’s Burger Barn Model

The estimated variance of the forecast error is:

The standard error of the forecast error is:

The 95% prediction interval is:

- We predict, with 95% confidence, that the settings for price and advertising expenditure will yield SALES between $67,533 and $86,415
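The forecast, its standard error, and the prediction interval can be reproduced from the coefficient estimates and their covariance matrix. In this sketch the variances and covariances are the values that appear in the forecast-variance calculation for this example, and 1.9939 is the critical value used in the interval; the variable names are illustrative.

```python
import math

b = [109.719, -7.640, 12.1512, -2.768]  # b1, b2, b3, b4
x0 = [1.0, 6.0, 1.9, 1.9 ** 2]          # 1, PRICE0, ADVERT0, ADVERT0^2
sigma2_hat = 21.57865                   # estimated error variance

# Symmetric covariance matrix of (b1, b2, b3, b4).
cov_b = [
    [46.22702, -6.426113, -11.60096, 2.939026],
    [-6.426113, 1.093988, 0.300407, -0.085619],
    [-11.60096, 0.300407, 12.6463, -3.288746],
    [2.939026, -0.085619, -3.288746, 0.884774],
]
t_c = 1.9939  # critical value used in the text

forecast = sum(bi * xi for bi, xi in zip(b, x0))
# var(f) = sigma^2 + x0' Cov(b) x0
var_f = sigma2_hat + sum(x0[i] * cov_b[i][j] * x0[j]
                         for i in range(4) for j in range(4))
se_f = math.sqrt(var_f)
lower, upper = forecast - t_c * se_f, forecast + t_c * se_f

print(f"forecast = {forecast:.3f}, var(f) = {var_f:.4f}, se(f) = {se_f:.4f}")
print(f"95% prediction interval: ({lower:.3f}, {upper:.3f})")
```

SALES is measured in thousands of dollars, so the interval corresponds to the dollar range quoted above.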

The point forecast and the point estimate are both the same:

But:

A 95% confidence interval for E(SALES0) is:
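The interval for the mean is narrower than the prediction interval because it excludes the variance of the new error term. A minimal sketch, using the reported forecast-error variance 22.4208, error variance 21.5786, and critical value 1.9939:

```python
import math

point_estimate = 76.974  # estimate of E(SALES0), same as the point forecast
var_f = 22.4208          # estimated forecast-error variance
sigma2_hat = 21.5786     # estimated error variance
t_c = 1.9939

# Standard error of the estimated mean: remove the error variance from var(f).
se_mean = math.sqrt(var_f - sigma2_hat)
lower = point_estimate - t_c * se_mean
upper = point_estimate + t_c * se_mean

print(f"se = {se_mean:.4f}")
print(f"95% confidence interval: ({lower:.3f}, {upper:.3f})")
```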

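Equations referenced in the slides above, in order of appearance:

Joint null hypothesis:

$$H_0: \beta_4 = 0,\ \beta_5 = 0,\ \beta_6 = 0 \tag{6.1}$$

Sales model (unrestricted):

$$SALES = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT + \beta_4 ADVERT^2 + e \tag{6.2}$$

Null and alternative hypotheses for the advertising test:

$$H_0: \beta_3 = 0,\ \beta_4 = 0 \qquad H_1: \beta_3 \ne 0 \text{ or } \beta_4 \ne 0 \text{ or both are nonzero}$$

Restricted sales model:

$$SALES = \beta_1 + \beta_2 PRICE + e \tag{6.3}$$

F-statistic:

$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)} \tag{6.4}$$

F-statistic for the sales problem, with J = 2 and N = 75:

$$F = \frac{(SSE_R - SSE_U)/2}{SSE_U/(75-4)}$$

Sample value of the test statistic:

$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)} = \frac{(1896.391 - 1532.084)/2}{1532.084/(75-4)} = 8.44$$

General multiple regression model:

$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_K x_K + e \tag{6.5}$$

Hypotheses for the test of overall significance:

$$H_0: \beta_2 = 0,\ \beta_3 = 0,\ \ldots,\ \beta_K = 0 \qquad H_1: \text{At least one of the } \beta_k \text{ is nonzero for } k = 2, 3, \ldots, K \tag{6.6}$$

Restricted model under the null hypothesis:

$$y_i = \beta_1 + e_i \tag{6.7}$$

Least squares estimator of β1 in the restricted model:

$$b_1^* = \frac{\sum_{i=1}^{N} y_i}{N} = \bar{y}$$

Restricted sum of squared errors:

$$SSE_R = \sum_{i=1}^{N} (y_i - b_1^*)^2 = \sum_{i=1}^{N} (y_i - \bar{y})^2 = SST$$

F-statistic for overall significance:

$$F = \frac{(SST - SSE)/(K-1)}{SSE/(N-K)} \tag{6.8}$$

Overall significance test for the sales model:

$$H_0: \beta_2 = 0,\ \beta_3 = 0,\ \beta_4 = 0 \qquad H_1: \text{At least one of } \beta_2 \text{ or } \beta_3 \text{ or } \beta_4 \text{ is nonzero}$$

$$F = \frac{(SST - SSE)/(4-1)}{SSE/(75-4)} \sim F_{(3,71)}$$

$$F = \frac{(SST - SSE)/(K-1)}{SSE/(N-K)} = \frac{(3115.482 - 1532.084)/3}{1532.084/(75-4)} = 24.459$$

Testing whether PRICE affects SALES:

$$H_0: \beta_2 = 0 \qquad H_1: \beta_2 \ne 0$$

$$SALES = \beta_1 + \beta_3 ADVERT + \beta_4 ADVERT^2 + e$$

$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)} = \frac{(2683.411 - 1532.084)/1}{1532.084/(75-4)} = 53.355$$

Estimated sales equation:

$$\widehat{SALES} = 109.72 - 7.640\,PRICE + 12.151\,ADVERT - 2.768\,ADVERT^2$$

Standard errors, in coefficient order: (6.80), (1.046), (3.556), (0.941)

One-tail hypotheses with inequality signs:

$$H_0: \beta_3 + 3.8\beta_4 \le 1 \qquad H_1: \beta_3 + 3.8\beta_4 > 1$$

Conjecture about expected sales:

$$E(SALES) = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT + \beta_4 ADVERT^2 = \beta_1 + 6\beta_2 + 1.9\beta_3 + 1.9^2\beta_4 = 80$$

Joint null hypothesis with J = 2 restrictions:

$$H_0: \beta_3 + 3.8\beta_4 = 1 \ \text{ and } \ \beta_1 + 6\beta_2 + 1.9\beta_3 + 3.61\beta_4 = 80$$

Log-log demand model for beer:

$$\ln(Q) = \beta_1 + \beta_2 \ln(PB) + \beta_3 \ln(PL) + \beta_4 \ln(PR) + \beta_5 \ln(I)$$

Multiplying each price and income by λ:

$$\begin{aligned}
\ln(Q) &= \beta_1 + \beta_2 \ln(\lambda PB) + \beta_3 \ln(\lambda PL) + \beta_4 \ln(\lambda PR) + \beta_5 \ln(\lambda I) \\
&= \beta_1 + \beta_2 \ln(PB) + \beta_3 \ln(PL) + \beta_4 \ln(PR) + \beta_5 \ln(I) + (\beta_2 + \beta_3 + \beta_4 + \beta_5)\ln(\lambda)
\end{aligned} \tag{6.15}$$

Restriction implied by no money illusion:

$$\beta_2 + \beta_3 + \beta_4 + \beta_5 = 0$$

Restriction solved for β4:

$$\beta_4 = -\beta_2 - \beta_3 - \beta_5$$

Starting model for estimation:

$$\ln(Q) = \beta_1 + \beta_2 \ln(PB) + \beta_3 \ln(PL) + \beta_4 \ln(PR) + \beta_5 \ln(I) + e$$

Substituting the restriction:

$$\begin{aligned}
\ln(Q) &= \beta_1 + \beta_2 \ln(PB) + \beta_3 \ln(PL) + (-\beta_2 - \beta_3 - \beta_5)\ln(PR) + \beta_5 \ln(I) + e \\
&= \beta_1 + \beta_2\left[\ln(PB) - \ln(PR)\right] + \beta_3\left[\ln(PL) - \ln(PR)\right] + \beta_5\left[\ln(I) - \ln(PR)\right] + e \\
&= \beta_1 + \beta_2 \ln\!\left(\frac{PB}{PR}\right) + \beta_3 \ln\!\left(\frac{PL}{PR}\right) + \beta_5 \ln\!\left(\frac{I}{PR}\right) + e
\end{aligned}$$

Estimated restricted model:

$$\widehat{\ln(Q)} = -4.798 - 1.2994 \ln\!\left(\frac{PB}{PR}\right) + 0.1868 \ln\!\left(\frac{PL}{PR}\right) + 0.9458 \ln\!\left(\frac{I}{PR}\right) \tag{6.19}$$

Standard errors of the three slope coefficients: (0.166), (0.284), (0.427)

Estimate for β4 from the restriction:

$$b_4^* = -b_2^* - b_3^* - b_5^* = -(-1.2994) - 0.1868 - 0.9458 = 0.1668$$

Constraint satisfied by the restricted estimates:

$$b_2^* + b_3^* + b_4^* + b_5^* = 0$$

Family income model:

$$\widehat{FAMINC} = -5534 + 3132\,HEDU + 4523\,WEDU \tag{6.20}$$

Standard errors: (11230), (803), (1066); p-values: (0.622), (0.000), (0.000)

Family income model omitting wife's education:

$$\widehat{FAMINC} = -26191 + 5155\,HEDU \tag{6.21}$$

Standard errors: (8541), (658); p-values: (0.002), (0.000)

General model:

$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + e$$

Omitted-variable bias:

$$\operatorname{bias}(b_2^*) = E(b_2^*) - \beta_2 = \beta_3 \frac{\widehat{\operatorname{cov}}(x_2, x_3)}{\widehat{\operatorname{var}}(x_2)}$$

Family income model including KL6:

$$\widehat{FAMINC} = -7755 + 3212\,HEDU + 4777\,WEDU - 14311\,KL6 \tag{6.24}$$

Standard errors: (11163), (797), (1061), (5004); p-values: (0.488), (0.000), (0.000), (0.004)

Family income model with irrelevant variables X5 and X6:

$$\widehat{FAMINC} = -7759 + 3340\,HEDU + 5869\,WEDU - 14200\,KL6 + 889\,X_5 - 1067\,X_6$$

Standard errors: (11195), (1250), (2278), (5044), (2242), (1982); p-values: (0.500), (0.008), (0.010), (0.005), (0.692), (0.591)

Adjusted R²:

$$\bar{R}^2 = 1 - \frac{SSE/(N-K)}{SST/(N-1)}$$

Akaike information criterion:

$$AIC = \ln\!\left(\frac{SSE}{N}\right) + \frac{2K}{N}$$

Schwarz criterion:

$$SC = \ln\!\left(\frac{SSE}{N}\right) + \frac{K \ln(N)}{N}$$

Predicted values for RESET:

$$\hat{y} = b_1 + b_2 x_2 + b_3 x_3$$

Artificial models for RESET:

$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \gamma_1 \hat{y}^2 + e \tag{6.29}$$

$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \gamma_1 \hat{y}^2 + \gamma_2 \hat{y}^3 + e \tag{6.30}$$

RESET results for the family income model:

$$H_0: \gamma_1 = 0 \qquad F = 5.984 \qquad p\text{-value} = 0.015$$

$$H_0: \gamma_1 = \gamma_2 = 0 \qquad F = 3.123 \qquad p\text{-value} = 0.045$$

Variance of the least squares estimator for β2:

$$\operatorname{var}(b_2) = \frac{\sigma^2}{(1 - r_{23}^2) \sum_{i=1}^{N} (x_{i2} - \bar{x}_2)^2}$$

Regression of MPG on CYL:

$$\widehat{MPG} = 42.9 - 3.558\,CYL$$

Standard errors: (0.83), (0.146); p-values: (0.000), (0.000)

Regression of MPG on CYL, ENG, and WGT:

$$\widehat{MPG} = 44.4 - 0.268\,CYL - 0.0127\,ENG - 0.00571\,WGT$$

Standard errors: (1.5), (0.413), (0.0083), (0.00071); p-values: (0.000), (0.517), (0.125), (0.000)

Auxiliary model for detecting collinearity:

$$x_2 = a_1 x_1 + a_3 x_3 + \cdots + a_K x_K + error$$

Value to be predicted:

$$y_0 = \beta_1 + \beta_2 x_{02} + \beta_3 x_{03} + e_0$$

Best linear unbiased predictor:

$$\hat{y}_0 = b_1 + b_2 x_{02} + b_3 x_{03}$$

Forecast error:

$$f = y_0 - \hat{y}_0$$

Variance of the forecast error:

$$\begin{aligned}
\operatorname{var}(f) &= \operatorname{var}\left[(\beta_1 + \beta_2 x_{02} + \beta_3 x_{03} + e_0) - (b_1 + b_2 x_{02} + b_3 x_{03})\right] \\
&= \operatorname{var}(e_0 - b_1 - b_2 x_{02} - b_3 x_{03}) \\
&= \operatorname{var}(e_0) + \operatorname{var}(b_1) + x_{02}^2 \operatorname{var}(b_2) + x_{03}^2 \operatorname{var}(b_3) \\
&\quad + 2 x_{02} \operatorname{cov}(b_1, b_2) + 2 x_{03} \operatorname{cov}(b_1, b_3) + 2 x_{02} x_{03} \operatorname{cov}(b_2, b_3)
\end{aligned}$$

Sales forecast:

$$\begin{aligned}
\widehat{SALES}_0 &= 109.719 - 7.640\,PRICE_0 + 12.1512\,ADVERT_0 - 2.768\,ADVERT_0^2 \\
&= 109.719 - 7.640 \times 6 + 12.1512 \times 1.9 - 2.768 \times 3.61 \\
&= 76.974
\end{aligned}$$

Estimated variance of the forecast error:

$$\begin{aligned}
\widehat{\operatorname{var}}(f) &= \hat{\sigma}^2 + \widehat{\operatorname{var}}(b_1) + x_{02}^2 \widehat{\operatorname{var}}(b_2) + x_{03}^2 \widehat{\operatorname{var}}(b_3) + x_{04}^2 \widehat{\operatorname{var}}(b_4) \\
&\quad + 2 x_{02} \widehat{\operatorname{cov}}(b_1, b_2) + 2 x_{03} \widehat{\operatorname{cov}}(b_1, b_3) + 2 x_{04} \widehat{\operatorname{cov}}(b_1, b_4) \\
&\quad + 2 x_{02} x_{03} \widehat{\operatorname{cov}}(b_2, b_3) + 2 x_{02} x_{04} \widehat{\operatorname{cov}}(b_2, b_4) + 2 x_{03} x_{04} \widehat{\operatorname{cov}}(b_3, b_4) \\
&= 21.57865 + 46.22702 + 6^2 \times 1.093988 + 1.9^2 \times 12.6463 + 3.61^2 \times 0.884774 \\
&\quad + 2 \times 6 \times (-6.426113) + 2 \times 1.9 \times (-11.60096) + 2 \times 3.61 \times 2.939026 \\
&\quad + 2 \times 6 \times 1.9 \times 0.300407 + 2 \times 6 \times 3.61 \times (-0.085619) \\
&\quad + 2 \times 1.9 \times 3.61 \times (-3.288746) \\
&= 22.4208
\end{aligned}$$

Standard error of the forecast error:

$$\operatorname{se}(f) = \sqrt{22.4208} = 4.7351$$

95% prediction interval:

$$(76.974 - 1.9939 \times 4.7351,\ 76.974 + 1.9939 \times 4.7351) = (67.533,\ 86.415)$$

Point forecast and point estimate:

$$\widehat{SALES}_0 = \widehat{E(SALES_0)} = 76.974$$

Standard error of the point estimate:

$$\operatorname{se}\!\left(\widehat{E(SALES_0)}\right) = \sqrt{\widehat{\operatorname{var}}(f) - \hat{\sigma}^2} = \sqrt{22.4208 - 21.5786} = 0.9177$$

95% confidence interval for E(SALES0):

$$(76.974 - 1.9939 \times 0.9177,\ 76.974 + 1.9939 \times 0.9177) = (75.144,\ 78.804)$$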