Urgent Econometrics test

Lecture02.pptx

Simple Linear Regression

Economists are interested in studying relationships between economic variables

For example, economic theory tells us that expenditure on goods depends on income

Expenditure on goods is the dependent variable, and income is the independent (or ‘‘explanatory’’) variable.

In order to study the relationship between expenditure and income, we need to construct an econometric model. This econometric model is also called a regression model.

An Economic Model

The simple regression function is written as

where β1 is the intercept and β2 is the slope

A simple regression model has only one independent variable

Figure 2.2 The economic model: a linear relationship between average per person food expenditure and income

Figure 2.3 The probability density function for y at two levels of income

Assumption 1: The mean value of y, for a given x, can be written as

Assumption 2: For each x, the variance of y is given by

Assumption 3: The sample values of y are uncorrelated with each other, implying that no linear association exists among them

Simple linear regression – Assumptions-I

Assumption 4: The variable x is not random, and must take at least two different values

Assumption 5: (optional) The values of y are normally distributed about their mean for each value of x


Assumption SR1: The value of y, for each value of x, is:

Assumption SR2: The expected value of the random error e is:

as

If SR2 holds good then

Simple linear regression – Assumptions-II

Figure 2.4 Probability density functions for e and y

Assumption SR3: The variance of the random error e is:

Assumption SR4: The covariance between any pair of random errors, ei and ej is:

If we make the stronger assumption that the random errors e are statistically independent, then the values of the dependent variable y are also statistically independent


Assumption SR5: The variable x is not random, and must take at least two different values

Assumption SR6: (optional) The values of e are normally distributed about their mean if the values of y are normally distributed, and vice versa


Figure 2.5 The relationship among y, e and the true regression line

Table 2.1 Food Expenditure and Income Data

Estimating the Regression Parameters

Figure 2.6 Data for food expenditure example

The fitted regression line is given by:

The least squares residual is given by:

Figure 2.7 The relationship among y, ê and the fitted regression line

The least squares line has the smaller sum of squared residuals:

Least squares estimates of the unknown parameters β1 and β2 are obtained by minimizing the sum of squares function:

Therefore, the fitted regression line is given by

Least Squares estimators & an example
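As a minimal sketch of the least squares calculation, the Python below computes the slope as the sum of cross-deviations divided by the sum of squared x-deviations, and the intercept as the mean of y minus the slope times the mean of x. The data set and the function names `least_squares` and `sse` are illustrative assumptions; this is not the food expenditure data of Table 2.1.

```python
def least_squares(x, y):
    """Return (b1, b2) for the fitted line y_hat = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx            # slope estimate
    b1 = y_bar - b2 * x_bar   # intercept estimate
    return b1, b2


def sse(x, y, b1, b2):
    """Sum of squared residuals for the line y_hat = b1 + b2 * x."""
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

b1, b2 = least_squares(x, y)
sse_ls = sse(x, y, b1, b2)                  # SSE of the least squares line
sse_other = sse(x, y, b1 + 0.5, b2 - 0.1)   # SSE of some other line
print(b1, b2, sse_ls, sse_other)
```

Any other choice of intercept and slope, such as the perturbed line above, gives a larger sum of squared residuals than the least squares line.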

Figure 2.8 The fitted regression line

If income increases by $100, expected weekly expenditure on food will increase by approximately $10.21

The intercept estimate b1 = 83.42 is an estimate of weekly food expenditure for a household with zero income

The elasticity of a variable y with respect to another variable x is:

The elasticity of mean expenditure with respect to income is:

Example:

Interpretation of regression coefficients and elasticity
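The elasticity at the sample means can be checked with a few lines of Python. This is a sketch that assumes the rounded values reported on the slides (b2 = 10.21, mean income 19.60, mean food expenditure 283.57); the function name `elasticity_at` is hypothetical.

```python
def elasticity_at(b2, x, y):
    """Elasticity of y with respect to x at the point (x, y): slope * x / y."""
    return b2 * x / y


b2 = 10.21      # estimated slope
x_bar = 19.60   # sample mean of income (in $100 units)
y_bar = 283.57  # sample mean of weekly food expenditure

elasticity = elasticity_at(b2, x_bar, y_bar)
print(round(elasticity, 2))  # 0.71
```

An elasticity of about 0.71 means that a 1% increase in income is associated with roughly a 0.71% increase in expected food expenditure at the point of the means.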

If we want to predict weekly food expenditure for a household with a weekly income of $2000, we substitute x = 20 (income is measured in units of $100) into the estimated equation to obtain:

Based on the above prediction, a household with a weekly income of $2000 will spend $287.61 per week on food

Prediction
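The prediction can be reproduced as a sketch in Python. The coefficients below are the four-decimal estimates from the worked example (83.4160 and 10.2096); the function name `predict` is an assumption made for illustration.

```python
def predict(b1, b2, x):
    """Predicted value of y from the fitted line y_hat = b1 + b2 * x."""
    return b1 + b2 * x


b1, b2 = 83.4160, 10.2096  # least squares estimates, four decimals
income = 20                # weekly income of $2000, in $100 units

y_hat = predict(b1, b2, income)
print(round(y_hat, 2))  # 287.61
```

Using the two-decimal coefficients 83.42 and 10.21 instead gives 287.62, so the reported $287.61 reflects the unrounded estimates.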

Figure 2.9 EViews Regression Output

The estimator b2 can be rewritten as:

where

It can also be written as:

Properties of least squares estimators
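The weighted-sum form of b2 can be verified numerically. The sketch below uses hypothetical data and checks that the weights w_i = (x_i - x̄) / Σ(x_i - x̄)² reproduce the usual slope formula. It also checks that the weights sum to zero and that Σ w_i x_i = 1, the two facts that turn b2 = Σ w_i y_i into b2 = β2 + Σ w_i e_i.

```python
# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# Weights depend only on x, so they are fixed when x is not random
w = [(xi - x_bar) / sxx for xi in x]

b2_weighted = sum(wi * yi for wi, yi in zip(w, y))
b2_direct = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

sum_w = sum(w)                                   # should be 0
sum_wx = sum(wi * xi for wi, xi in zip(w, x))    # should be 1
print(b2_weighted, b2_direct, sum_w, sum_wx)
```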

As $E(w_i e_i) = w_i E(e_i)$ and $E(e_i) = 0$, we have $E(b_2) = \beta_2 + \sum w_i E(e_i) = \beta_2$

Unbiasedness does not mean that an estimate based on a particular sample is close to the true population parameter

However, it can be said that the least squares estimator is unbiased

Averages of estimates from many samples would approach the true parameter values

Unbiasedness property of least squares estimators
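Unbiasedness can be illustrated with a small simulation. The sketch below assumes hypothetical true parameters (β1 = 80, β2 = 10, σ = 5), fixed x values, and normal errors. It draws many samples, estimates the slope in each, and averages the estimates. Individual estimates vary around the true slope, while their average lies close to it.

```python
import random


def slope_estimate(x, y):
    """Least squares slope b2 for the line y_hat = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


random.seed(0)  # fixed seed so the run is repeatable

beta1, beta2, sigma = 80.0, 10.0, 5.0       # hypothetical true values
x = [float(i) for i in range(1, 41)]        # x is fixed across samples

estimates = []
for _ in range(2000):
    # New random errors in each sample, same x values
    y = [beta1 + beta2 * xi + random.gauss(0.0, sigma) for xi in x]
    estimates.append(slope_estimate(x, y))

mean_b2 = sum(estimates) / len(estimates)
print(min(estimates), max(estimates), mean_b2)
```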

Table 2.2 Estimates from 10 Samples

If assumptions SR1-SR5 are satisfied then the variances and covariance of b1 and b2 can be written as:

(2.14)

(2.15)

(2.16)

Variances and covariances of b1 & b2

The larger the variance term σ², the larger the variances and covariance of the least squares estimators.

The larger the sum of squares, $\sum (x_i - \bar{x})^2$, the smaller the variances of the least squares estimators and the more precisely we can estimate the unknown parameters.

The larger the sample size N, the smaller the variances and covariance of the least squares estimators.

The larger the term $\sum x_i^2$, the larger the variance of the least squares estimator b1.

The absolute magnitude of the covariance increases with the magnitude of the sample mean $\bar{x}$, and the covariance has the opposite sign to $\bar{x}$.


Figure 2.11 The influence of variation in the explanatory variable x on precision of estimation (a) Low x variation, low precision (b) High x variation, high precision
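The effect of variation in x on precision can be sketched numerically from Eqs. 2.14 to 2.16: var(b1) = σ² Σx_i² / (N Σ(x_i - x̄)²), var(b2) = σ² / Σ(x_i - x̄)², and cov(b1, b2) = σ² (-x̄) / Σ(x_i - x̄)². The x values, the error variance of 1, and the function name `ls_variances` are assumptions chosen for illustration.

```python
def ls_variances(x, sigma2):
    """Return (var_b1, var_b2, cov_b1_b2) for fixed x and error variance sigma2."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)   # sum of squared deviations of x
    sum_x2 = sum(xi ** 2 for xi in x)
    var_b1 = sigma2 * sum_x2 / (n * sxx)
    var_b2 = sigma2 / sxx
    cov_b1_b2 = sigma2 * (-x_bar) / sxx
    return var_b1, var_b2, cov_b1_b2


# Two hypothetical designs with the same mean but different spread in x
low_spread = [9.0, 10.0, 11.0]
high_spread = [5.0, 10.0, 15.0]

low = ls_variances(low_spread, sigma2=1.0)
high = ls_variances(high_spread, sigma2=1.0)
print(low)
print(high)
```

With the wider spread of x values, all three quantities shrink in magnitude. In both designs the covariance is negative because the sample mean of x is positive.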

If assumptions SR1-SR5 of the linear regression model hold, then the estimators b1 and b2 have the smallest variance of all linear and unbiased estimators of β1 and β2. They are the Best Linear Unbiased Estimators (BLUE) of β1 and β2. Note that the theorem does not imply that b1 and b2 are the best of all possible estimators.

The theorem remains valid even if assumption SR6 (normality) does not hold, provided SR1-SR5 hold.

The Gauss-Markov theorem applies to the least squares estimators.

Gauss-Markov Theorem
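One way to see what the theorem claims is to compare least squares with another linear unbiased estimator of the slope. The sketch below uses the ‘‘endpoint’’ estimator (y_N - y_1) / (x_N - x_1), which is linear in y and unbiased when x is fixed, and simulates both under hypothetical parameter values. The comparison estimator and all numbers are assumptions chosen for illustration.

```python
import random


def ls_slope(x, y):
    """Least squares slope estimate."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def endpoint_slope(x, y):
    """Slope of the line through the first and last observations."""
    return (y[-1] - y[0]) / (x[-1] - x[0])


def variance(values):
    """Sample variance with divisor n - 1."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


random.seed(1)
beta1, beta2, sigma = 1.0, 2.0, 3.0       # hypothetical true values
x = [float(i) for i in range(1, 21)]      # fixed, ordered x values

ls, endpoint = [], []
for _ in range(2000):
    y = [beta1 + beta2 * xi + random.gauss(0.0, sigma) for xi in x]
    ls.append(ls_slope(x, y))
    endpoint.append(endpoint_slope(x, y))

var_ls = variance(ls)
var_endpoint = variance(endpoint)
print(var_ls, var_endpoint)
```

Both estimators are centred on the true slope, but the least squares estimates are noticeably less variable, which is the pattern the theorem predicts for any competing linear unbiased estimator.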

If the error term has a normal distribution then the least squares estimators have the following normal distribution:

In large samples (N sufficiently large), the least squares estimators are approximately normally distributed as above even if the error term is not normally distributed. This follows from the Central Limit Theorem (CLT).

The Probability Distributions of the Least Squares Estimators

The least squares residuals are obtained by replacing the unknown parameters by their least squares estimates:

The unbiased estimate is slightly different and is given by

for simple linear regression

The estimated variances and covariance of the least squares estimators are obtained by replacing the unknown error variance σ² in Eqs. 2.14 to 2.16 with the estimate $\hat{\sigma}^2$

Estimating the Variance of the Error Term

Table 2.3 Least Squares Residuals

Estimated covariance matrix for the food expenditure data is:

              C             Income

C             1884.442      -85.90316

Income        -85.90316     4.381752

The standard errors of b1 and b2 are measures of the sampling variability of the least squares estimates b1 and b2 in repeated samples.
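The error variance estimate and the standard errors can be reproduced with a short sketch. It uses the sum of squared residuals from the worked example (304505.2) and the estimated covariance matrix shown above. The sample size N = 40 is inferred from the denominator N - 2 = 38 in that calculation and should be read as an assumption.

```python
import math

sse = 304505.2   # sum of squared least squares residuals
n = 40           # sample size, inferred from N - 2 = 38
k = 2            # number of regression parameters (b1 and b2)

sigma2_hat = sse / (n - k)   # unbiased estimate of the error variance

# Estimated covariance matrix of (b1, b2): rows/columns are C and Income
cov = [
    [1884.442, -85.90316],
    [-85.90316, 4.381752],
]

# Standard errors are the square roots of the diagonal elements
se_b1 = math.sqrt(cov[0][0])
se_b2 = math.sqrt(cov[1][1])
print(round(sigma2_hat, 2), round(se_b1, 2), round(se_b2, 2))
```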

The quadratic function y = β1 + β2x² is a parabola

The elasticity, or the percentage change in y given a 1% change in x, is:

Non-linear models

Figure 2.13 A quadratic function

A non-linear model for house prices includes the squared value of SQFT, that is:

The slope can be written as:

If $\hat{\alpha}_2 > 0$, then the slope increases as the area (SQFT) of the house increases.

For 1080 houses sold in Baton Rouge, LA during mid-2005, the estimated quadratic equation is:

The estimated slope for the above equation can be written as:

The elasticity is:

Example

For houses of 2000, 4000 and 6000 square feet, the estimated elasticities are 1.05, 1.63 and 1.82, respectively, using predicted prices of $117,461.77, $302,517.39 and $610,943.42

For a 2000-square-foot house, we estimate that a 1% increase in house size will increase price by 1.05%
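The quadratic-model elasticities can be checked with the sketch below. It assumes the estimated equation for the Baton Rouge data with coefficients as they appear on the slide, 55776.56 for the intercept and 0.0154 for SQFT squared. The function names are hypothetical.

```python
A1_HAT = 55776.56   # estimated intercept
A2_HAT = 0.0154     # estimated coefficient on SQFT squared (rounded)


def predicted_price(sqft):
    """Predicted price from the quadratic model."""
    return A1_HAT + A2_HAT * sqft ** 2


def slope(sqft):
    """Estimated slope d(PRICE)/d(SQFT) = 2 * a2 * SQFT."""
    return 2 * A2_HAT * sqft


def elasticity(sqft):
    """Estimated elasticity = slope * SQFT / predicted PRICE."""
    return slope(sqft) * sqft / predicted_price(sqft)


elasticities = [round(elasticity(s), 2) for s in (2000, 4000, 6000)]
print(elasticities)  # [1.05, 1.63, 1.82]
```

Because 0.0154 is a rounded coefficient, the predicted prices from this sketch differ slightly from the reported ones, although the elasticities agree to two decimals.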

The log-linear equation ln(y) = a + bx has a logarithmic term on the left-hand side of the equation and a linear variable on the right-hand side

The slope can be written as:

The elasticity, the percentage change in y given a 1% increase in x, at a point on this curve is:

The semi-elasticity, which tells us the percentage change in y given a 1-unit increase in x, can be written as:

Log-linear regression model

Logarithmic transformation can regularize data that is skewed with a long tail to the right

Figure 2.16 (a) Histogram of PRICE (b) Histogram of ln(PRICE)

For the Baton Rouge data, the fitted log-linear model is given by:

The predicted price is given by:

The slope of the log-linear model is:

Example

The estimated elasticity is:

For a 2000-square-foot house, the estimated elasticity is 0.823:

A 1% increase in house size is estimated to increase selling price by 0.823%

Using the ‘‘semi-elasticity’’ defined above, we can conclude that if area increases by one square foot, the estimated price increases by 0.04%
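These log-linear quantities can be reproduced with a short sketch. It assumes the fitted coefficients reported for the Baton Rouge data, 10.8386 for the intercept and 0.0004113 for SQFT. The function names are hypothetical.

```python
import math

G1_HAT = 10.8386      # estimated intercept of ln(PRICE)
G2_HAT = 0.0004113    # estimated coefficient on SQFT


def predicted_price(sqft):
    """Predicted price: exponential of the fitted ln(PRICE)."""
    return math.exp(G1_HAT + G2_HAT * sqft)


def slope(sqft):
    """Estimated slope d(PRICE)/d(SQFT) = g2 * predicted PRICE."""
    return G2_HAT * predicted_price(sqft)


def elasticity(sqft):
    """Estimated elasticity = g2 * SQFT."""
    return G2_HAT * sqft


# Percentage change in price for a one-square-foot increase in area
semi_elasticity = 100 * G2_HAT

print(round(elasticity(2000), 3), round(semi_elasticity, 2))
print(predicted_price(2000), slope(2000))
```

Unlike the linear model, the slope here grows with the predicted price, while the semi-elasticity is the same at every house size.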

In a simple regression model an indicator variable on the right-hand side allows us to estimate the differences between population means

Dummy variable regression

Figure 2.18 Distributions of house prices
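The link between an indicator-variable regression and group means can be illustrated with a sketch. The prices below are hypothetical (in thousands of dollars), and the indicator is coded 1 for a house in University Town and 0 for a house in Golden Oaks. The least squares intercept equals the mean price in the group coded 0, and the slope equals the difference between the two group means.

```python
def least_squares(x, y):
    """Return (b1, b2) for the fitted line y_hat = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


# Hypothetical data: indicator and house prices in $1000s
utown = [0, 0, 0, 1, 1, 1]
price = [200.0, 210.0, 220.0, 270.0, 280.0, 290.0]

b1, b2 = least_squares(utown, price)

# Group means computed directly, for comparison
mean_0 = sum(p for u, p in zip(utown, price) if u == 0) / utown.count(0)
mean_1 = sum(p for u, p in zip(utown, price) if u == 1) / utown.count(1)
print(b1, b2, mean_0, mean_1)
```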

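Equations for the slides above, in order of appearance

Economic model (simple regression function): $E(y|x) = \mu_y = \beta_1 + \beta_2 x$

Assumption 1: $E(y|x) = \beta_1 + \beta_2 x$

Assumption 2: $\mathrm{var}(y|x) = \sigma^2$

Assumption 3: $\mathrm{cov}(y_i, y_j) = 0$

Assumption 5: $y \sim N(\beta_1 + \beta_2 x, \sigma^2)$

Assumption SR1: $y = \beta_1 + \beta_2 x + e$

Assumption SR2: $E(e) = 0$, as $E(y) = \beta_1 + \beta_2 x$

If SR2 holds: $E(e|x) = E(y|x) - \beta_1 - \beta_2 x = 0$

Assumption SR3: $\mathrm{var}(e) = \sigma^2 = \mathrm{var}(y)$

Assumption SR4: $\mathrm{cov}(e_i, e_j) = \mathrm{cov}(y_i, y_j) = 0$

Assumption SR6: $e \sim N(0, \sigma^2)$

Fitted regression line: $\hat{y}_i = b_1 + b_2 x_i$

Least squares residual: $\hat{e}_i = y_i - \hat{y}_i = y_i - b_1 - b_2 x_i$

Smaller sum of squared residuals: if $SSE = \sum_{i=1}^{N} \hat{e}_i^2$ and $SSE^* = \sum_{i=1}^{N} \hat{e}_i^{*2}$, then $SSE < SSE^*$, where $\hat{e}_i^*$ are the residuals from any other line

Sum of squares function: $S(\beta_1, \beta_2) = \sum_{i=1}^{N} (y_i - \beta_1 - \beta_2 x_i)^2$

Least squares estimators: $b_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $b_1 = \bar{y} - b_2 \bar{x}$

Food expenditure example, slope: $b_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{18671.2684}{1828.7876} = 10.2096$

Food expenditure example, intercept: $b_1 = \bar{y} - b_2 \bar{x} = 283.5735 - (10.2096)(19.6048) = 83.4160$

Fitted line for the food expenditure data: $\hat{y}_i = 83.42 + 10.21 x_i$

Elasticity of y with respect to x: $\varepsilon = \frac{\text{percentage change in } y}{\text{percentage change in } x} = \frac{\Delta y}{\Delta x} \cdot \frac{x}{y}$

Elasticity of mean expenditure with respect to income: $\varepsilon = \frac{\Delta E(y)/E(y)}{\Delta x / x} = \frac{\Delta E(y)}{\Delta x} \cdot \frac{x}{E(y)} = \beta_2 \cdot \frac{x}{E(y)}$

Elasticity example: $\hat{\varepsilon} = b_2 \frac{\bar{x}}{\bar{y}} = 10.21 \times \frac{19.60}{283.57} = 0.71$

Prediction: $\hat{y}_i = 83.42 + 10.21 x_i = 83.42 + 10.21(20) = 287.61$

The estimator b2 rewritten: $b_2 = \sum_{i=1}^{N} w_i y_i$, where $w_i = \frac{x_i - \bar{x}}{\sum (x_i - \bar{x})^2}$

Alternative form: $b_2 = \beta_2 + \sum w_i e_i$

Unbiasedness: as $E(w_i e_i) = w_i E(e_i)$ and $E(e_i) = 0$,

$E(b_2) = E(\beta_2 + \sum w_i e_i) = E(\beta_2 + w_1 e_1 + w_2 e_2 + \dots + w_N e_N) = E(\beta_2) + E(w_1 e_1) + E(w_2 e_2) + \dots + E(w_N e_N) = E(\beta_2) + \sum E(w_i e_i) = \beta_2 + \sum w_i E(e_i) = \beta_2$

(2.14) $\mathrm{var}(b_1) = \sigma^2 \left[ \frac{\sum x_i^2}{N \sum (x_i - \bar{x})^2} \right]$

(2.15) $\mathrm{var}(b_2) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}$

(2.16) $\mathrm{cov}(b_1, b_2) = \sigma^2 \left[ \frac{-\bar{x}}{\sum (x_i - \bar{x})^2} \right]$

Terms referred to in the discussion of the variances: $\sum (x_i - \bar{x})^2$ and $\sum x_i^2$

Definition of the variance of b2: $\mathrm{var}(b_2) = E[b_2 - E(b_2)]^2$

Distribution of b1 under normal errors: $b_1 \sim N\left( \beta_1, \frac{\sigma^2 \sum x_i^2}{N \sum (x_i - \bar{x})^2} \right)$

Distribution of b2 under normal errors: $b_2 \sim N\left( \beta_2, \frac{\sigma^2}{\sum (x_i - \bar{x})^2} \right)$

Least squares residuals and the simple variance estimate: $\hat{e}_i = y_i - \hat{y}_i = y_i - b_1 - b_2 x_i$ and $\hat{\sigma}^2 = \frac{\sum \hat{e}_i^2}{N}$

Unbiased estimate of the error variance: $\hat{\sigma}^2 = \frac{\sum \hat{e}_i^2}{N - K} = \frac{\sum \hat{e}_i^2}{N - 2}$, with $K = 2$ for simple linear regression

Estimate used in place of σ² in Eqs. 2.14 to 2.16: $\hat{\sigma}^2$

Food expenditure example: $\hat{\sigma}^2 = \frac{\sum \hat{e}_i^2}{N - 2} = \frac{304505.2}{38} = 8013.29$

Elasticity for the quadratic function: $\varepsilon = \text{slope} \times \frac{x}{y} = \frac{2bx^2}{y}$, where b is the coefficient on $x^2$

Quadratic model for house prices: $PRICE = \alpha_1 + \alpha_2 SQFT^2 + e$

Slope: $\frac{d\,\widehat{PRICE}}{d\,SQFT} = 2 \hat{\alpha}_2 SQFT$

Condition for an increasing slope: $\hat{\alpha}_2 > 0$

Estimated quadratic equation: $\widehat{PRICE} = 55776.56 + 0.0154\, SQFT^2$

Estimated slope: $\widehat{\text{slope}} = 2(0.0154)\, SQFT$

Elasticity: $\hat{\varepsilon} = \widehat{\text{slope}} \times \frac{SQFT}{\widehat{PRICE}} = (2 \hat{\alpha}_2 SQFT) \times \frac{SQFT}{\widehat{PRICE}}$

Predicted prices at 2000, 4000 and 6000 square feet: $\widehat{PRICE} = \$117{,}461.77$, $\widehat{PRICE} = \$302{,}517.39$ and $\widehat{PRICE} = \$610{,}943.42$

Log-linear slope: $\frac{dy}{dx} = by$

Log-linear elasticity: $\varepsilon = \text{slope} \times \frac{x}{y} = bx$

Log-linear semi-elasticity: $\eta = \frac{100\,(dy/y)}{dx} = 100b$

Fitted log-linear model: $\widehat{\ln(PRICE)} = 10.8386 + 0.0004113\, SQFT$

Predicted price: $\widehat{PRICE} = \exp\left[ \widehat{\ln(PRICE)} \right] = \exp(10.8386 + 0.0004113\, SQFT)$

Slope of the log-linear model: $\frac{d\,\widehat{PRICE}}{d\,SQFT} = \hat{\gamma}_2 \widehat{PRICE} = 0.0004113\, \widehat{PRICE}$

Estimated elasticity: $\hat{\varepsilon} = \hat{\gamma}_2 SQFT = 0.0004113\, SQFT$

Indicator variable: $UTOWN = 1$ if the house is in University Town, $UTOWN = 0$ if the house is in Golden Oaks

Dummy variable regression: $PRICE = \beta_1 + \beta_2 UTOWN + e$

Regression function: $E(PRICE) = \begin{cases} \beta_1 + \beta_2 & \text{if } UTOWN = 1 \\ \beta_1 & \text{if } UTOWN = 0 \end{cases}$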