Urgent Econometrics test

Lecture05.pptx

The Multiple Regression Model

Most economic models have more than one explanatory variable

- When we turn an economic model with more than one explanatory variable into its corresponding econometric model, we refer to it as a multiple regression model

- Most of the results developed for the simple regression model can be extended to the multiple regression model

For the economic model

β2 is the change in monthly sales SALES ($1000) when the price index PRICE is increased by one unit ($1), and advertising expenditure ADVERT is held constant

Similarly, β3 is the change in monthly sales SALES ($1000) when the advertising expenditure is increased by one unit ($1000), and the price index PRICE is held constant

Taking expectations we have

In order to allow for a difference between observable sales revenue and the expected value of sales revenue, we add a random error term, e = SALES - E(SALES)

FIGURE 5.1 The multiple regression plane

Table 5.1 Observations on Monthly Sales, Price, and Advertising in Big Andy’s Burger Barn

In a general multiple regression model, a dependent variable y is related to a number of explanatory variables x2, x3, …, xK through a linear equation that can be written as:

(5.3)

A single parameter, call it βk, measures the effect of a change in the variable xk upon the expected value of y, all other variables held constant

The parameter β1 is the intercept term.

- We can think of it as being attached to a variable x1 that is always equal to 1

- That is, x1 = 1

The equation for sales revenue can be viewed as a special case of Eq. 5.3 where K = 3, y = SALES, x1 = 1, x2 = PRICE and x3 = ADVERT

We make assumptions similar to those we made before:

- E(e) = 0

- var(e) = σ²

- Errors with this property are said to be homoskedastic

- cov(ei, ej) = 0

- e ~ N(0, σ²)

The statistical properties of y follow from those of e:

- E(y) = β1 + β2x2 + β3x3

- var(y) = var(e) = σ²

- cov(yi, yj) = cov(ei, ej) = 0

- y ~ N[(β1 + β2x2 + β3x3), σ²]

- This is equivalent to assuming that e ~ N(0, σ²)

MR1.

MR2.

MR3.

MR4.

MR5. The values of each xik are not random and are not exact linear functions of the other explanatory variables

MR6.

Assumptions of the Multiple Regression Model

We will discuss estimation in the context of the following model

(5.4)

- This model is simpler than the full model, yet all the results we present carry over to the general case with only minor modifications

- Mathematically we minimize the sum of squares function S(β1, β2, β3), which is a function of the unknown parameters, given the data:

(5.5)

Estimating the Parameters of the Multiple Regression Model

Formulas for b1, b2, and b3, obtained by minimizing Eq. 5.5, are estimation procedures, which are called the least squares estimators of the unknown parameters

- In general, since their values are not known until the data are observed and the estimates calculated, the least squares estimators are random variables
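The minimization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made up (generated exactly from y = 120 - 8 x2 + 2 x3, with no error term), and the function names `sum_of_squares` and `least_squares` are hypothetical. The estimates are found by solving the normal equations (X'X) b = X'y, which is one standard way to minimize S(β1, β2, β3).

```python
def sum_of_squares(b, y, x2, x3):
    """S(b1, b2, b3): sum of squared deviations of y from the fitted plane."""
    return sum((yi - b[0] - b[1] * u - b[2] * v) ** 2
               for yi, u, v in zip(y, x2, x3))


def least_squares(y, x2, x3):
    """Minimize S by solving the normal equations (X'X) b = X'y."""
    rows = [(1.0, u, v) for u, v in zip(x2, x3)]
    # Build X'X and X'y for the three-parameter model.
    a = [[sum(r[p] * r[q] for r in rows) for q in range(3)] for p in range(3)]
    c = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(3)]
    # Gaussian elimination with partial pivoting.
    for p in range(3):
        m = max(range(p, 3), key=lambda r: abs(a[r][p]))
        a[p], a[m] = a[m], a[p]
        c[p], c[m] = c[m], c[p]
        for r in range(p + 1, 3):
            f = a[r][p] / a[p][p]
            a[r] = [arq - f * apq for arq, apq in zip(a[r], a[p])]
            c[r] -= f * c[p]
    # Back substitution.
    b = [0.0, 0.0, 0.0]
    for p in (2, 1, 0):
        b[p] = (c[p] - sum(a[p][q] * b[q] for q in range(p + 1, 3))) / a[p][p]
    return b


# Made-up data (not Table 5.1), generated without an error term.
price = [5.0, 6.0, 5.5, 6.5, 4.5, 5.8]
advert = [1.0, 2.0, 0.5, 1.5, 2.5, 3.0]
sales = [120 - 8 * p + 2 * a for p, a in zip(price, advert)]

b = least_squares(sales, price, advert)
```

Because the toy data contain no error, the estimates recover the generating coefficients and S is essentially zero at the minimum; with real data the estimates vary from sample to sample, which is why the estimators are random variables.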

Estimates along with their standard errors and the equation’s R² are typically reported in equation format as:

Table 5.2 Least Squares Estimates for Sales Equation for Big Andy’s Burger Barn

The negative coefficient on PRICE suggests that demand is price elastic; we estimate that, with advertising held constant, an increase in price of $1 will lead to a fall in monthly revenue of $7,908

The coefficient on advertising is positive; we estimate that with price held constant, an increase in advertising expenditure of $1,000 will lead to an increase in sales revenue of $1,863

The estimated intercept implies that if both price and advertising expenditure were zero the sales revenue would be $118,914. This does not make any economic sense.

Interpretation

The predicted value of sales revenue for PRICE = 5.5 and ADVERT = 1.2 is $77,656.
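The prediction can be checked with a short calculation. The coefficients below are the unrounded estimates used in the slides' prediction (118.914, -7.9079, 1.8626); the function name `predict_sales` is hypothetical.

```python
# Least squares estimates for the sales equation (SALES in $1,000s).
b1, b2, b3 = 118.914, -7.9079, 1.8626


def predict_sales(price, advert):
    """Predicted monthly sales revenue in $1,000s."""
    return b1 + b2 * price + b3 * advert


sales_hat = predict_sales(5.5, 1.2)  # roughly 77.656, i.e. about $77,656
```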

The negative sign attached to price implies that reducing the price will increase sales revenue.

If taken literally, why should we not keep reducing the price to zero?

Obviously that would not keep increasing total revenue

This makes the following important point:

- Estimated regression models describe the relationship between the economic variables for values similar to those found in the sample data

- Extrapolating the results to extreme values is usually not a good idea

- Predicting the value of the dependent variable for values of the explanatory variables far from the sample values may provide absurd results

Interpreting regression results

The squared errors are unobservable, so we need an estimator for σ² based on the squares of the least squares residuals:

An estimator for σ² that uses the information from estimated residuals and has good statistical properties is:

where K is the number of β parameters being estimated in the multiple regression model.
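As a rough numerical check for the hamburger example, the estimator is the sum of squared residuals divided by N - K. The sum of squared residuals (1718.943) is the value used in the slides' R² calculation; N = 75 is an assumption inferred from K = 3 and the 72 degrees of freedom used in the t-tests.

```python
from math import sqrt

sse = 1718.943   # sum of squared least squares residuals (from the R² slide)
n_obs = 75       # assumption: inferred from N - K = 72 with K = 3
k_params = 3     # beta1, beta2, beta3

sigma2_hat = sse / (n_obs - k_params)  # estimated error variance
sigma_hat = sqrt(sigma2_hat)           # standard error of the regression
```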

THE GAUSS-MARKOV THEOREM

Sampling Properties of the Least Squares Estimators

For the multiple regression model, if assumptions MR1–MR5 hold, then the least squares estimators are the best linear unbiased estimators (BLUE) of the parameters.

If the errors are not normally distributed, then the least squares estimators are approximately normally distributed in large samples

What constitutes ‘‘large’’ is not definitive

It depends on a number of factors specific to each application

Frequently, N – K = 50 will be large enough

It can be shown that:

where

It can be noted that:

- Larger error variances 2 lead to larger variances of the least squares estimators

- Larger sample sizes N imply smaller variances of the least squares estimators

- More variation in an explanatory variable around its mean, leads to a smaller variance of the least squares estimator

- A larger correlation between x2 and x3 leads to a larger variance of b2
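The four points above can be illustrated numerically with the formula var(b2) = σ² / [(1 - r23²) Σ(xi2 - x̄2)²]. The data below are invented purely for illustration, and the helper names `corr` and `var_b2` are hypothetical.

```python
from math import sqrt


def corr(x, z):
    """Sample correlation between two explanatory variables."""
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    return sxz / sqrt(sxx * szz)


def var_b2(sigma2, x2, x3):
    """var(b2) = sigma^2 / [(1 - r23^2) * sum((x2 - mean(x2))^2)]."""
    mean2 = sum(x2) / len(x2)
    r23 = corr(x2, x3)
    return sigma2 / ((1 - r23 ** 2) * sum((a - mean2) ** 2 for a in x2))


# Invented data: x3_weak is weakly related to x2, x3_strong almost collinear.
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3_weak = [2.0, 1.0, 2.0, 1.0, 2.0, 1.0]
x3_strong = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]

v_weak = var_b2(4.0, x2, x3_weak)
v_strong = var_b2(4.0, x2, x3_strong)   # much larger than v_weak
```

Doubling σ² doubles the variance, and repeating every observation (a crude way to double N while keeping the same spread and correlation) halves it.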

The covariance matrix can be written as

Using the hamburger data:

Hence, we have

Table 5.3 Covariance Matrix for Coefficient Estimates
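The standard errors reported with the estimated equation are the square roots of the diagonal elements of this covariance matrix. A minimal check, using the estimated covariance matrix values given in the slides:

```python
from math import sqrt

# Estimated covariance matrix of (b1, b2, b3) for the hamburger data.
cov_b = [
    [40.343, -6.795, -0.7484],
    [-6.795, 1.201, -0.0197],
    [-0.7484, -0.0197, 0.4668],
]

# Standard errors: square roots of the estimated variances on the diagonal.
se = [sqrt(cov_b[k][k]) for k in range(3)]
```

The results agree, up to rounding, with the standard errors 6.35, 1.096 and 0.683 reported for the sales equation.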

Consider the general form of a multiple regression model:

- If we add assumption MR6, that the random errors ei have normal probability distributions, then the dependent variable yi is normally distributed:

- Since the least squares estimators are linear functions of dependent variables, it follows that the least squares estimators are also normally distributed:

We can now form the standard normal variable Z:

(5.11)

Replacing the variance of bk with its estimate:

(5.12)

- Notice that the number of degrees of freedom for t-statistics is N - K

A linear combination of the coefficients can be formed as:

So we have

(5.13)

When there are three independent variables

where

What happens if the errors are not normally distributed?

- Then the least squares estimator will not be normally distributed and Eq. 5.11, Eq. 5.12, and Eq. 5.13 will not hold exactly

- They will, however, be approximately true in large samples

- Thus, having errors that are not normally distributed does not stop us from using Eq. 5.12 and Eq. 5.13, but it does mean we have to be cautious if the sample size is not large

For the hamburger example, we need:

Using tc = 1.993, we can rewrite (5.15) as:

So, we have

Interval Estimation

Using our data, we have b2 = -7.908 and se(b2) = 1.096, so that:

- This interval estimate suggests that decreasing price by $1 will lead to an increase in revenue somewhere between $5,723 and $10,093.

- In terms of a price change whose magnitude is more realistic, a 10-cent price reduction will lead to a revenue increase between $572 and $1,009

Similarly for advertising, we get:

- We estimate that an increase in advertising expenditure of $1,000 leads to an increase in sales revenue of between $501 and $3,225

This interval is a relatively wide one; it implies that extra advertising expenditure could be unprofitable (the revenue increase is less than $1,000) or could lead to a revenue increase more than three times the cost of the advertising
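Both intervals follow the same recipe, estimate ± tc × standard error. The sketch below assumes tc = 1.9935, the four-decimal critical value that appears in the advertising calculation (the price slide quotes it as 1.993); the function name `interval` is hypothetical.

```python
def interval(estimate, se, t_c):
    """Two-sided interval estimate: estimate -/+ t_c * se."""
    half_width = t_c * se
    return estimate - half_width, estimate + half_width


T_C = 1.9935  # assumed 97.5th percentile of t(72), as used in the slides

price_ci = interval(-7.9079, 1.096, T_C)    # approximately (-10.093, -5.723)
advert_ci = interval(1.8626, 0.6832, T_C)   # approximately (0.501, 3.225)
```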

We can write the general expression for a 100(1-α)% confidence interval as:

Suppose Big Andy wants to increase advertising expenditure by $800 and drop the price by 40 cents. Then the change in expected sales is:

Point estimate:

A 90% interval would be:

The standard error is:

The 90% interval is then:

- We estimate, with 90% confidence, that the expected increase in sales will lie between $3,471 and $5,835
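The calculation for the linear combination -0.4β2 + 0.8β3 can be reproduced as follows. This is a sketch: the helper `lincom` is a hypothetical name, the variance of the combination is computed as c'Vc (which expands to the variance and covariance terms used in the slides), and var(b2) is entered as 1.2012, the value used in the slides' standard error calculation.

```python
from math import sqrt


def lincom(c, b, cov):
    """Point estimate and standard error of sum(c_k * b_k)."""
    k = len(c)
    est = sum(ck * bk for ck, bk in zip(c, b))
    var = sum(c[i] * c[j] * cov[i][j] for i in range(k) for j in range(k))
    return est, sqrt(var)


b = [118.914, -7.9079, 1.8626]
cov_b = [
    [40.343, -6.795, -0.7484],
    [-6.795, 1.2012, -0.0197],
    [-0.7484, -0.0197, 0.4668],
]

# Price down 40 cents (-0.4) and advertising up $800 (+0.8).
est, se = lincom([0.0, -0.4, 0.8], b, cov_b)

t_c = 1.666  # 95th percentile of t(72), as quoted in the slides
lower, upper = est - t_c * se, est + t_c * se
```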

Hypothesis Testing

A null hypothesis H0

An alternative hypothesis H1

A test statistic

A rejection region

A conclusion

We need to ask whether the data provide any evidence to suggest that y is related to each of the explanatory variables

- If a given explanatory variable, say xk, has no bearing on y, then βk = 0

- Testing this null hypothesis is sometimes called a test of significance for the explanatory variable xk

Null hypothesis:

Alternative hypothesis:

Test statistic:

t values for a test with level of significance α:

We now are in a position to state the following questions as testable hypotheses and ask whether the hypotheses are compatible with the data

- Is demand price-elastic or price-inelastic?

- Would additional sales revenue from additional advertising expenditure cover the costs of the advertising?

For the demand elasticity, we would like to know if:

- β2 ≥ 0: a decrease in price leads to a decrease in sales revenue (demand is price-inelastic or has an elasticity of unity), or

- β2 < 0: a decrease in price leads to an increase in sales revenue (demand is price-elastic)

The null and alternative hypotheses are:

The test statistic, if the null hypothesis is true, is:

At a 5% significance level, we reject H0 if t ≤ -1.666 or if the p-value ≤ 0.05

The test statistic is:

and the p-value is:

Since -7.215 < -1.666, we reject H0: β2 ≥ 0 and conclude that H1: β2 < 0 holds (demand is elastic)

Another hypothesis of interest is whether an increase in advertising expenditure will bring an increase in sales revenue that is sufficient to cover the increased cost of advertising

- Such an increase will be achieved if β3 > 1

- The null and alternative hypotheses are:

- The test statistic, if the null hypothesis is true, is:

- At a 5% significance level, we reject H0 if t ≥ 1.666 or if the p-value ≤ 0.05

- The test statistic is:

and the p-value is:

- Since 1.263 < 1.666, we do not reject H0
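The two one-tail tests (price elasticity and advertising effectiveness) use the same statistic, t = (bk - c) / se(bk), compared with the critical value 1.666. A minimal sketch using the estimates quoted in the slides; `t_stat` is a hypothetical helper name.

```python
def t_stat(b, se, hypothesised=0.0):
    """t-statistic for testing a single coefficient against a value."""
    return (b - hypothesised) / se


T_CRIT = 1.666  # one-tail 5% critical value for t(72), as quoted in the slides

# H0: beta2 >= 0 against H1: beta2 < 0 (left-tail test).
t_price = t_stat(-7.908, 1.096)
reject_price = t_price <= -T_CRIT

# H0: beta3 <= 1 against H1: beta3 > 1 (right-tail test).
t_advert = t_stat(1.8626, 0.6832, hypothesised=1.0)
reject_advert = t_advert >= T_CRIT
```

The first null hypothesis is rejected and the second is not, matching the conclusions in the slides.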

The marketing adviser claims that dropping the price by 20 cents will be more effective for increasing sales revenue than increasing advertising expenditure by $500

- In other words, she claims that -0.2β2 > 0.5β3, or -0.2β2 - 0.5β3 > 0

- We want to test a hypothesis about the linear combination -0.2β2 - 0.5β3

- The null and alternative hypotheses are:

- The test statistic, if the null hypothesis is true, is:

At a 5% significance level, we reject H0 if t ≥ 1.666 or if the p-value ≤ 0.05

- We need to get the standard error

- The test statistic is:

and the p-value is:

- Since 1.622 < 1.666, we do not reject H0
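The steps of this test can be traced numerically. The variances and covariance are those used in the slides' standard error calculation; the variable names are illustrative only.

```python
from math import sqrt

b2, b3 = -7.9079, 1.8626
var_b2, var_b3, cov_b23 = 1.2012, 0.4668, -0.0197
c2, c3 = -0.2, -0.5   # weights in the combination -0.2*beta2 - 0.5*beta3

estimate = c2 * b2 + c3 * b3
se = sqrt(c2 ** 2 * var_b2 + c3 ** 2 * var_b3 + 2 * c2 * c3 * cov_b23)

t = estimate / se          # about 1.622
reject = t >= 1.666        # right-tail test at the 5% level with 72 df
```

The statistic falls just short of the critical value, so the marketing adviser's claim is not supported at the 5% level, although it would be at 10%, given the reported p-value of 0.055.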

A better model to capture diminishing returns in advertising expenditure might be:

The change in expected sales to a change in advertising is:

Capturing non-linearity

FIGURE 5.4 A model where sales exhibits diminishing returns to advertising expenditure

The least squares estimates are:

And

Upon substitution, we find that when advertising is at its minimum value in the sample of $500 (ADVERT = 0.5), the marginal effect of advertising on sales is 9.383

- When advertising is at a level of $2,000 (ADVERT = 2), the marginal effect is 1.079

- Allowing for diminishing returns to advertising expenditure has improved our model both statistically and in terms of meeting expectations about how sales will respond to changes in advertising
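The marginal effects quoted above come from evaluating b3 + 2 b4 ADVERT at the two advertising levels. A short check using the estimates b3 = 12.151 and b4 = -2.768 from the quadratic model (the function name is hypothetical):

```python
b3, b4 = 12.151, -2.768  # estimates for ADVERT and ADVERT squared


def marginal_effect(advert):
    """Estimated change in sales ($1,000s) per extra $1,000 of advertising."""
    return b3 + 2 * b4 * advert


low = marginal_effect(0.5)    # advertising at $500
high = marginal_effect(2.0)   # advertising at $2,000
```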

The marginal benefit from advertising is the marginal revenue from more advertising

- The required marginal revenue is given by the marginal effect of more advertising β3 + 2β4ADVERT

- The marginal cost of $1 of advertising is $1 plus the cost of preparing the additional products sold due to effective advertising

- Ignoring the latter costs, the marginal cost of $1 of advertising expenditure is $1

Advertising should be increased to the point where

with ADVERT0 denoting the optimal level of advertising

Using the least squares estimates, a point estimate of ADVERT0 is:

implying that the optimal monthly advertising expenditure is $2,014

Suppose λ = (1 - β3)/(2β4) and

Then, the approximate variance expression is:

Using the above equation to find an approximate expression for a variance is called the delta method

The required derivatives are:

Thus, for the estimated variance of the optimal level of advertising, we have:

and

An approximate 95% interval estimate for ADVERT0 is:

We estimate with 95% confidence that the optimal level of advertising lies between $1,757 and $2,271
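The point estimate and the delta-method interval can be reproduced in a few lines. This sketch uses the rounded estimates b3 = 12.151 and b4 = -2.768 and the estimated variances and covariance that appear in the slides' variance calculation (12.646, 0.88477 and -3.2887); the variable names are illustrative.

```python
from math import sqrt

b3, b4 = 12.151, -2.768
var_b3, var_b4, cov_b34 = 12.646, 0.88477, -3.2887

# Point estimate of the optimal advertising level, lambda = (1 - b3) / (2 b4).
advert_opt = (1 - b3) / (2 * b4)

# Derivatives of lambda with respect to beta3 and beta4, evaluated at b3, b4.
d3 = -1 / (2 * b4)
d4 = -(1 - b3) / (2 * b4 ** 2)

# Delta-method approximation to var(lambda_hat).
var_opt = d3 ** 2 * var_b3 + d4 ** 2 * var_b4 + 2 * d3 * d4 * cov_b34
se_opt = sqrt(var_opt)

t_c = 1.994  # 97.5th percentile of t(71), as quoted in the slides
lower, upper = advert_opt - t_c * se_opt, advert_opt + t_c * se_opt
```

The two variance terms are large and nearly offset by the covariance term, so the result is sensitive to rounding in the inputs.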

Let us say that we are interested in studying the effect of income and age on an individual’s expenditure on pizza

We could start with the following linear model:

For a given level of income, the expected expenditure on pizza changes by the amount β2 with an additional year of age

For individuals of a given age, an increase in income of $1,000 increases expected expenditures on pizza by β3

Interaction Variables

Table 5.4 Pizza Expenditure Data

The estimated model is:

- The signs of the estimated parameters are as we anticipated

- Both AGE and INCOME have significant coefficients, based on their t-statistics

Is it reasonable to expect that, regardless of the age of the individual, an increase in income of $1,000 should lead to an increase in pizza expenditure of $1.83?

- It would seem more reasonable to assume that as a person grows older, his/her marginal propensity to spend on pizza declines

- That is, as a person ages, less of each extra dollar is expected to be spent on pizza

- This is a case in which the effect of income depends on the age of the individual.

- That is, the effect of one variable is modified by another

- One way of accounting for such interactions is to include an interaction variable that is the product of the two variables involved

The new model can be written as:

Implications of this revised model are:

Estimated model:

The estimated marginal effect of age upon pizza expenditure for two individuals, one with $25,000 income and one with $90,000 income, is:

We expect that an individual with $25,000 income will reduce pizza expenditures by $6.06 per year, whereas the individual with $90,000 income will reduce pizza expenditures by $14.07 per year
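These two figures follow from the estimated marginal effect b2 + b4 INCOME, with INCOME measured in $1,000s. A quick check using b2 = -2.977 and b4 = -0.1232 from the estimated interaction model (the function name is hypothetical):

```python
b2, b4 = -2.977, -0.1232  # estimates for AGE and AGE x INCOME


def age_effect(income):
    """Estimated change in pizza expenditure per extra year of age."""
    return b2 + b4 * income


effect_25 = age_effect(25)   # income of $25,000
effect_90 = age_effect(90)   # income of $90,000
```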

Consider a wage equation where ln(WAGE) depends on years of education (EDUC) and years of experience (EXPER):

- If we believe the effect of an extra year of experience on wages will depend on the level of education, then we can add an interaction variable

- The effect of another year of experience, holding education constant, is roughly:

- The approximate percentage change in wage given a one-year increase in experience is 100(β3+β4EDUC)%
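The approximation can be written as a small function. The coefficient values below are read from the estimated wage equation reported in the slides; the negative sign on the interaction coefficient is an assumption, because the sign is not legible in the extracted equation. The education level of 16 years is a made-up example, and the optional `b5` term covers the case where a quadratic EXPER term is also included.

```python
def pct_wage_change(educ, exper=0.0, b3=0.00633, b4=-0.0000364, b5=0.0):
    """Approximate % change in WAGE from one more year of experience.

    Implements 100 * (b3 + b4*EDUC + 2*b5*EXPER); with b5 = 0 this is the
    interaction-only model.
    """
    return 100 * (b3 + b4 * educ + 2 * b5 * exper)


# Hypothetical individual with 16 years of education.
effect_16 = pct_wage_change(16)  # percent per additional year of experience
```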

An estimated model is:

If there is an interaction and a quadratic term on the right-hand side, as in

then we find that a one-year increase in experience leads to an approximate percentage wage change of:

In the multiple regression model the R² is relevant and the same formulas are valid, but now we talk of the proportion of variation in the dependent variable explained by all the explanatory variables included in the linear model

The coefficient of determination is:

(5.31)

Measuring Goodness-of-fit

The predicted value of y is:

and

For the hamburger example:

44.8% of the variation in sales revenue is explained by the variation in price and by the variation in the level of advertising expenditure

In our sample, 55.2% of the variation in revenue is left unexplained and is due to variation in the error term or to variation in other variables that implicitly form part of the error term
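The two percentages come directly from R² = 1 - SSE/SST, using the sums of squares reported in the slides for the hamburger example:

```python
sse = 1718.943   # sum of squared least squares residuals
sst = 3115.482   # total sum of squares of SALES around its mean

r_squared = 1 - sse / sst      # share of variation explained
unexplained = sse / sst        # share left unexplained
```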

If the model does not contain an intercept parameter, then the measure R² given in Eq. 5.31 is no longer appropriate

- The reason it is no longer appropriate is that without an intercept term in the model,

so SST ≠ SSR + SSE

Under these circumstances it does not make sense to talk of the proportion of total variation that is explained by the regression

- When your model does not contain a constant, it is better not to report R², even if your computer displays one
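A toy example shows the breakdown of the decomposition. The four data points are made up, and a single regressor is used to keep the arithmetic short; with an intercept SST = SSR + SSE holds, and without one it does not.

```python
# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 4.0, 6.0, 7.0]
n = len(y)
x_bar, y_bar = sum(x) / n, sum(y) / n


def decomposition(fitted):
    """Return (SST, SSR, SSE) for a list of fitted values."""
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return sst, ssr, sse


# Least squares with an intercept.
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
sst, ssr_c, sse_c = decomposition([intercept + slope * xi for xi in x])

# Least squares through the origin (no intercept).
slope0 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
_, ssr_0, sse_0 = decomposition([slope0 * xi for xi in x])
```

Here SST is 10 in both cases, but SSR + SSE is about 20 for the no-intercept fit, so 1 - SSE/SST and SSR/SST no longer agree.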

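Equations for the slides above, in order of appearance

Economic model for sales:

$$SALES = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT$$

Interpretation of β2 and β3:

$$\beta_2 = \left.\frac{\Delta SALES}{\Delta PRICE}\right|_{ADVERT\ \text{held constant}} = \frac{\partial SALES}{\partial PRICE}$$

$$\beta_3 = \left.\frac{\Delta SALES}{\Delta ADVERT}\right|_{PRICE\ \text{held constant}} = \frac{\partial SALES}{\partial ADVERT}$$

Taking expectations:

$$E(SALES) = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT$$

Adding the random error term:

$$SALES = E(SALES) + e = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT + e$$

General multiple regression model:

$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_K x_K + e \tag{5.3}$$

Interpretation of βk:

$$\beta_k = \left.\frac{\Delta E(y)}{\Delta x_k}\right|_{\text{other } x\text{s held constant}} = \frac{\partial E(y)}{\partial x_k}$$

Special case with K = 3:

$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + e$$

Assumptions of the Multiple Regression Model:

MR1.

$$y_i = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} + e_i, \quad i = 1, \ldots, N$$

MR2.

$$E(y_i) = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} \iff E(e_i) = 0$$

MR3.

$$\operatorname{var}(y_i) = \operatorname{var}(e_i) = \sigma^2$$

MR4.

$$\operatorname{cov}(y_i, y_j) = \operatorname{cov}(e_i, e_j) = 0$$

MR6.

$$y_i \sim N\left[(\beta_1 + \beta_2 x_{i2} + \cdots + \beta_K x_{iK}),\ \sigma^2\right] \iff e_i \sim N(0, \sigma^2)$$

Model used for estimation:

$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + e_i \tag{5.4}$$

Sum of squares function:

$$S(\beta_1, \beta_2, \beta_3) = \sum_{i=1}^{N} \left(y_i - E(y_i)\right)^2 = \sum_{i=1}^{N} \left(y_i - \beta_1 - \beta_2 x_{i2} - \beta_3 x_{i3}\right)^2 \tag{5.5}$$

Estimated sales equation, with standard errors in parentheses:

$$\widehat{SALES} = \underset{(6.35)}{118.91} - \underset{(1.096)}{7.908}\, PRICE + \underset{(0.683)}{1.863}\, ADVERT, \qquad R^2 = 0.448$$

Predicted sales for PRICE = 5.5 and ADVERT = 1.2:

$$\begin{aligned} \widehat{SALES} &= 118.91 - 7.908\, PRICE + 1.863\, ADVERT \\ &= 118.914 - 7.9079 \times 5.5 + 1.8626 \times 1.2 \\ &= 77.656 \end{aligned}$$

Least squares residuals:

$$\hat{e}_i = y_i - \hat{y}_i = y_i - (b_1 + b_2 x_{i2} + b_3 x_{i3})$$

Estimator of the error variance:

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{N} \hat{e}_i^2}{N - K}$$

Variance of b2:

$$\operatorname{var}(b_2) = \frac{\sigma^2}{(1 - r_{23}^2) \sum_{i=1}^{N} (x_{i2} - \bar{x}_2)^2}$$

where

$$r_{23} = \frac{\sum (x_{i2} - \bar{x}_2)(x_{i3} - \bar{x}_3)}{\sqrt{\sum (x_{i2} - \bar{x}_2)^2 \sum (x_{i3} - \bar{x}_3)^2}}$$

Covariance matrix of the least squares estimators:

$$\operatorname{cov}(b_1, b_2, b_3) = \begin{bmatrix} \operatorname{var}(b_1) & \operatorname{cov}(b_1, b_2) & \operatorname{cov}(b_1, b_3) \\ \operatorname{cov}(b_1, b_2) & \operatorname{var}(b_2) & \operatorname{cov}(b_2, b_3) \\ \operatorname{cov}(b_1, b_3) & \operatorname{cov}(b_2, b_3) & \operatorname{var}(b_3) \end{bmatrix}$$

Using the hamburger data:

$$\widehat{\operatorname{cov}}(b_1, b_2, b_3) = \begin{bmatrix} 40.343 & -6.795 & -0.7484 \\ -6.795 & 1.201 & -0.0197 \\ -0.7484 & -0.0197 & 0.4668 \end{bmatrix}$$

Hence:

$$\begin{aligned} \widehat{\operatorname{var}}(b_1) &= 40.343 & \widehat{\operatorname{cov}}(b_1, b_2) &= -6.795 \\ \widehat{\operatorname{var}}(b_2) &= 1.201 & \widehat{\operatorname{cov}}(b_1, b_3) &= -0.7484 \\ \widehat{\operatorname{var}}(b_3) &= 0.4668 & \widehat{\operatorname{cov}}(b_2, b_3) &= -0.0197 \end{aligned}$$

General form of the multiple regression model:

$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \cdots + \beta_K x_{iK} + e_i$$

Distribution of yi under MR6:

$$y_i \sim N\left[(\beta_1 + \beta_2 x_{i2} + \cdots + \beta_K x_{iK}),\ \sigma^2\right] \iff e_i \sim N(0, \sigma^2)$$

Distribution of the least squares estimators:

$$b_k \sim N\left[\beta_k, \operatorname{var}(b_k)\right]$$

Standard normal variable:

$$z = \frac{b_k - \beta_k}{\sqrt{\operatorname{var}(b_k)}} \sim N(0, 1), \quad \text{for } k = 1, 2, \ldots, K \tag{5.11}$$

Replacing the variance of bk with its estimate:

$$t = \frac{b_k - \beta_k}{\sqrt{\widehat{\operatorname{var}}(b_k)}} = \frac{b_k - \beta_k}{\operatorname{se}(b_k)} \sim t_{(N-K)} \tag{5.12}$$

Linear combination of the coefficients:

$$\lambda = c_1 \beta_1 + c_2 \beta_2 + \cdots + c_K \beta_K = \sum_{k=1}^{K} c_k \beta_k$$

$$t = \frac{\hat{\lambda} - \lambda}{\operatorname{se}(\hat{\lambda})} = \frac{\sum c_k b_k - \sum c_k \beta_k}{\operatorname{se}\left(\sum c_k b_k\right)} \sim t_{(N-K)} \tag{5.13}$$

With three coefficients (K = 3):

$$\operatorname{se}(c_1 b_1 + c_2 b_2 + c_3 b_3) = \sqrt{\widehat{\operatorname{var}}(c_1 b_1 + c_2 b_2 + c_3 b_3)}$$

where

$$\begin{aligned} \widehat{\operatorname{var}}(c_1 b_1 + c_2 b_2 + c_3 b_3) ={}& c_1^2 \widehat{\operatorname{var}}(b_1) + c_2^2 \widehat{\operatorname{var}}(b_2) + c_3^2 \widehat{\operatorname{var}}(b_3) \\ &+ 2 c_1 c_2 \widehat{\operatorname{cov}}(b_1, b_2) + 2 c_1 c_3 \widehat{\operatorname{cov}}(b_1, b_3) + 2 c_2 c_3 \widehat{\operatorname{cov}}(b_2, b_3) \end{aligned}$$

Interval estimation for the hamburger example:

$$P(-t_c < t_{(72)} < t_c) = 0.95$$

$$P\left(-1.993 \le \frac{b_2 - \beta_2}{\operatorname{se}(b_2)} \le 1.993\right) = 0.95$$

$$P\left[b_2 - 1.993 \times \operatorname{se}(b_2) \le \beta_2 \le b_2 + 1.993 \times \operatorname{se}(b_2)\right] = 0.95$$

Interval estimate for β2:

$$(-7.9079 - 1.993 \times 1.096,\ -7.9079 + 1.993 \times 1.096) = (-10.093,\ -5.723)$$

Interval estimate for β3:

$$(1.8626 - 1.9935 \times 0.6832,\ 1.8626 + 1.9935 \times 0.6832) = (0.501,\ 3.225)$$

General 100(1-α)% interval estimate:

$$\left[b_k - t_{(1-\alpha/2,\ N-K)} \times \operatorname{se}(b_k),\ b_k + t_{(1-\alpha/2,\ N-K)} \times \operatorname{se}(b_k)\right]$$

Change in expected sales from raising advertising by $800 and cutting price by 40 cents:

$$\begin{aligned} \lambda &= E(SALES_1) - E(SALES_0) \\ &= \left[\beta_1 + \beta_2 (PRICE_0 - 0.4) + \beta_3 (ADVERT_0 + 0.8)\right] - \left[\beta_1 + \beta_2 PRICE_0 + \beta_3 ADVERT_0\right] \\ &= -0.4 \beta_2 + 0.8 \beta_3 \end{aligned}$$

Point estimate:

$$\hat{\lambda} = -0.4 b_2 + 0.8 b_3 = -0.4 \times (-7.9079) + 0.8 \times 1.8626 = 4.6532$$

90% interval:

$$\begin{aligned} &\left(\hat{\lambda} - t_c \times \operatorname{se}(\hat{\lambda}),\ \hat{\lambda} + t_c \times \operatorname{se}(\hat{\lambda})\right) \\ &= \left((-0.4 b_2 + 0.8 b_3) - t_c \times \operatorname{se}(-0.4 b_2 + 0.8 b_3),\ (-0.4 b_2 + 0.8 b_3) + t_c \times \operatorname{se}(-0.4 b_2 + 0.8 b_3)\right) \end{aligned}$$

Standard error:

$$\begin{aligned} \operatorname{se}(-0.4 b_2 + 0.8 b_3) &= \sqrt{\widehat{\operatorname{var}}(-0.4 b_2 + 0.8 b_3)} \\ &= \left[(-0.4)^2\, \widehat{\operatorname{var}}(b_2) + (0.8)^2\, \widehat{\operatorname{var}}(b_3) - 2 \times 0.4 \times 0.8 \times \widehat{\operatorname{cov}}(b_2, b_3)\right]^{1/2} \\ &= \left[0.16 \times 1.2012 + 0.64 \times 0.4668 - 0.64 \times (-0.0197)\right]^{1/2} \\ &= 0.7096 \end{aligned}$$

The 90% interval is then:

$$(4.6532 - 1.666 \times 0.7096,\ 4.6532 + 1.666 \times 0.7096) = (3.471,\ 5.835)$$

Test of significance for a single coefficient:

$$H_0: \beta_k = 0$$

$$H_1: \beta_k \ne 0$$

$$t = \frac{b_k}{\operatorname{se}(b_k)} \sim t_{(N-K)}$$

$$t_c = t_{(1-\alpha/2,\ N-K)} \quad \text{and} \quad -t_c = t_{(\alpha/2,\ N-K)}$$

Test for elastic demand:

$$H_0: \beta_2 \ge 0 \ \text{(demand is unit-elastic or inelastic)}$$

$$H_1: \beta_2 < 0 \ \text{(demand is elastic)}$$

$$t = \frac{b_2}{\operatorname{se}(b_2)} \sim t_{(N-K)}$$

$$t = \frac{b_2}{\operatorname{se}(b_2)} = \frac{-7.908}{1.096} = -7.215$$

$$P(t_{(72)} < -7.215) = 0.000$$

Test for advertising effectiveness:

$$H_0: \beta_3 \le 1$$

$$H_1: \beta_3 > 1$$

$$t = \frac{b_3 - 1}{\operatorname{se}(b_3)} \sim t_{(N-K)}$$

$$t = \frac{b_3 - \beta_3}{\operatorname{se}(b_3)} = \frac{1.8626 - 1}{0.6832} = 1.263$$

$$P(t_{(72)} > 1.263) = 0.105$$

Test of the marketing adviser's claim:

$$H_0: -0.2 \beta_2 - 0.5 \beta_3 \le 0 \ \text{(marketer's claim is not correct)}$$

$$H_1: -0.2 \beta_2 - 0.5 \beta_3 > 0 \ \text{(marketer's claim is correct)}$$

$$t = \frac{-0.2 b_2 - 0.5 b_3}{\operatorname{se}(-0.2 b_2 - 0.5 b_3)} \sim t_{(72)}$$

$$\begin{aligned} \operatorname{se}(-0.2 b_2 - 0.5 b_3) &= \sqrt{\widehat{\operatorname{var}}(-0.2 b_2 - 0.5 b_3)} \\ &= \left[(-0.2)^2\, \widehat{\operatorname{var}}(b_2) + (-0.5)^2\, \widehat{\operatorname{var}}(b_3) + 2 \times (-0.2) \times (-0.5) \times \widehat{\operatorname{cov}}(b_2, b_3)\right]^{1/2} \\ &= \left[0.04 \times 1.2012 + 0.25 \times 0.4668 + 0.2 \times (-0.0197)\right]^{1/2} \\ &= 0.4010 \end{aligned}$$

$$t = \frac{-0.2 b_2 - 0.5 b_3}{\operatorname{se}(-0.2 b_2 - 0.5 b_3)} = \frac{1.58158 - 0.9313}{0.4010} = 1.622$$

$$P(t_{(72)} > 1.622) = 0.055$$

Model with diminishing returns to advertising:

$$SALES = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT + \beta_4 ADVERT^2 + e$$

Marginal effect of advertising:

$$\left.\frac{\Delta E(SALES)}{\Delta ADVERT}\right|_{PRICE\ \text{held constant}} = \frac{\partial E(SALES)}{\partial ADVERT} = \beta_3 + 2 \beta_4 ADVERT$$

Least squares estimates, with standard errors in parentheses:

$$\widehat{SALES} = \underset{(6.80)}{109.72} - \underset{(1.046)}{7.640}\, PRICE + \underset{(3.556)}{12.151}\, ADVERT - \underset{(0.941)}{2.768}\, ADVERT^2$$

$$\frac{\partial \widehat{SALES}}{\partial ADVERT} = 12.151 - 5.536\, ADVERT$$

Optimal level of advertising:

$$\beta_3 + 2 \beta_4 ADVERT_0 = 1$$

$$\widehat{ADVERT}_0 = \frac{1 - b_3}{2 b_4} = \frac{1 - 12.1512}{2 \times (-2.76796)} = 2.014$$

Delta method:

$$\hat{\lambda} = \frac{1 - b_3}{2 b_4}$$

$$\operatorname{var}(\hat{\lambda}) = \left(\frac{\partial \lambda}{\partial \beta_3}\right)^2 \operatorname{var}(b_3) + \left(\frac{\partial \lambda}{\partial \beta_4}\right)^2 \operatorname{var}(b_4) + 2 \left(\frac{\partial \lambda}{\partial \beta_3}\right) \left(\frac{\partial \lambda}{\partial \beta_4}\right) \operatorname{cov}(b_3, b_4)$$

$$\frac{\partial \lambda}{\partial \beta_3} = -\frac{1}{2 \beta_4}, \qquad \frac{\partial \lambda}{\partial \beta_4} = -\frac{1 - \beta_3}{2 \beta_4^2}$$

$$\begin{aligned} \widehat{\operatorname{var}}(\hat{\lambda}) ={}& \left(-\frac{1}{2 b_4}\right)^2 \widehat{\operatorname{var}}(b_3) + \left(-\frac{1 - b_3}{2 b_4^2}\right)^2 \widehat{\operatorname{var}}(b_4) + 2 \left(-\frac{1}{2 b_4}\right) \left(-\frac{1 - b_3}{2 b_4^2}\right) \widehat{\operatorname{cov}}(b_3, b_4) \\ ={}& \left(\frac{1}{2 \times 2.768}\right)^2 \times 12.646 + \left(\frac{1 - 12.151}{2 \times 2.768^2}\right)^2 \times 0.88477 \\ &- 2 \left(\frac{1}{2 \times 2.768}\right) \left(\frac{12.151 - 1}{2 \times 2.768^2}\right) \times 3.2887 \\ ={}& 0.016567 \end{aligned}$$

$$\operatorname{se}(\hat{\lambda}) = \sqrt{0.016567} = 0.1287$$

Approximate 95% interval estimate for ADVERT0:

$$\begin{aligned} &\left(\hat{\lambda} - t_{(0.975,\ 71)}\, \operatorname{se}(\hat{\lambda}),\ \hat{\lambda} + t_{(0.975,\ 71)}\, \operatorname{se}(\hat{\lambda})\right) \\ &= (2.014 - 1.994 \times 0.1287,\ 2.014 + 1.994 \times 0.1287) \\ &= (1.757,\ 2.271) \end{aligned}$$

Pizza expenditure model:

$$PIZZA = \beta_1 + \beta_2 AGE + \beta_3 INCOME + e$$

$$\frac{\partial E(PIZZA)}{\partial AGE} = \beta_2$$

$$\frac{\partial E(PIZZA)}{\partial INCOME} = \beta_3$$

Estimated model, with t-statistics in parentheses:

$$\widehat{PIZZA} = 342.88 - \underset{(-3.27)}{7.576}\, AGE + \underset{(3.95)}{1.832}\, INCOME$$

Model with an interaction variable:

$$PIZZA = \beta_1 + \beta_2 AGE + \beta_3 INCOME + \beta_4 (AGE \times INCOME) + e$$

$$\frac{\partial E(PIZZA)}{\partial AGE} = \beta_2 + \beta_4 INCOME$$

$$\frac{\partial E(PIZZA)}{\partial INCOME} = \beta_3 + \beta_4 AGE$$

Estimated model, with t-statistics in parentheses:

$$\widehat{PIZZA} = 161.47 - \underset{(-0.89)}{2.977}\, AGE + \underset{(2.47)}{6.980}\, INCOME - \underset{(-1.85)}{0.1232}\, (AGE \times INCOME)$$

Estimated marginal effect of age:

$$\frac{\partial \widehat{E(PIZZA)}}{\partial AGE} = b_2 + b_4 INCOME = -2.977 - 0.1232\, INCOME = \begin{cases} -6.06 & \text{for } INCOME = 25 \\ -14.07 & \text{for } INCOME = 90 \end{cases}$$

Wage equation:

$$\ln(WAGE) = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + e$$

With an interaction variable:

$$\ln(WAGE) = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + \beta_4 (EDUC \times EXPER) + e$$

$$\left.\frac{\Delta \ln(WAGE)}{\Delta EXPER}\right|_{EDUC\ \text{fixed}} = \beta_3 + \beta_4 EDUC$$

Estimated model:

$$\widehat{\ln(WAGE)} = 1.392 + 0.09494\, EDUC + 0.00633\, EXPER - 0.0000364\, (EDUC \times EXPER)$$

With an interaction and a quadratic term:

$$\ln(WAGE) = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + \beta_4 (EDUC \times EXPER) + \beta_5 EXPER^2 + e$$

$$\% \Delta WAGE \cong 100 (\beta_3 + \beta_4 EDUC + 2 \beta_5 EXPER)\%$$

Coefficient of determination:

$$R^2 = \frac{SSR}{SST} = \frac{\sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} = 1 - \frac{SSE}{SST} = 1 - \frac{\sum_{i=1}^{N} \hat{e}_i^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \tag{5.31}$$

Predicted value of y:

$$\hat{y}_i = b_1 + b_2 x_{i2} + b_3 x_{i3} + \cdots + b_K x_{iK}$$

$$SST = (N - 1) s_y^2$$

For the hamburger example:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} \hat{e}_i^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} = 1 - \frac{1718.943}{3115.482} = 0.448$$

Without an intercept term in the model:

$$\sum_{i=1}^{N} (y_i - \bar{y})^2 \ne \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{N} \hat{e}_i^2$$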