Urgent Econometrics test
Simple Linear Regression
Economists are interested in studying relationships between economic variables
For example, economic theory tells us that expenditure on goods depends on income
Expenditure on goods is the dependent variable and income is an independent (or "explanatory") variable.
In order to study the relationship between expenditure and income, we need to construct an econometric model. This econometric model is also called a regression model.
An Economic Model
The simple regression function is written as
where β1 is the intercept and β2 is the slope
A simple regression model has only one independent variable
Figure 2.2 The economic model: a linear relationship between average per person food expenditure and income
Figure 2.3 The probability density function for y at two levels of income
Assumption 1: The mean value of y, for a given x, can be written as
Assumption 2: For each x, the variance of y is given by
Assumption 3: The sample values of y are uncorrelated with each other, implying that no linear association exists among them
Simple linear regression – Assumptions-I
Assumption 4: The variable x is not random, and must take at least two different values
Assumption 5: (optional) The values of y are normally distributed about their mean for each value of x
Assumption SR1: The value of y, for each value of x, is:
Assumption SR2: The expected value of the random error e is $E(e) = 0$
as $E(e|x) = E(y|x) - \beta_1 - \beta_2 x = 0$
If SR2 holds, then $E(y) = \beta_1 + \beta_2 x$
Simple linear regression – Assumptions-II
Figure 2.4 Probability density functions for e and y
Assumption SR3: The variance of the random error e is:
Assumption SR4: The covariance between any pair of random errors, ei and ej is:
If we make the stronger assumption that the random errors e are statistically independent, then the values of the dependent variable y are also statistically independent
Assumption SR5: The variable x is not random, and must take at least two different values
Assumption SR6: (optional) The values of e are normally distributed about their mean if the values of y are normally distributed, and vice versa
Figure 2.5 The relationship among y, e and the true regression line
Table 2.1 Food Expenditure and Income Data
Estimating the Regression Parameters
Figure 2.6 Data for food expenditure example
The fitted regression line is given by:
The least squares residual is given by:
Figure 2.7 The relationship among y, ê and the fitted regression line
The least squares line has a smaller sum of squared residuals than any other line:
Least squares estimates for the unknown parameters β1 and β2 are obtained by minimizing the sum of squares function:
Therefore, the fitted regression line is given by
Least Squares estimators & an example
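The closed-form least squares estimators can be sketched in a few lines of Python. This is a minimal illustration, not the computation used to produce the slides: the function name `ols_fit` and the four-point dataset are hypothetical, while the summary statistics in the second half are the ones quoted for the food expenditure example.

```python
def ols_fit(x, y):
    """Return (b1, b2), the intercept and slope that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx           # slope
    b1 = y_bar - b2 * x_bar  # intercept
    return b1, b2


# Hypothetical data lying exactly on the line y = 1 + 2x
b1_toy, b2_toy = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])

# Summary statistics quoted for the food expenditure data
sxy_food = 18671.2684   # sum of (x_i - x_bar)(y_i - y_bar)
sxx_food = 1828.7876    # sum of (x_i - x_bar)^2
y_bar_food = 283.5735
x_bar_food = 19.6048

b2_food = sxy_food / sxx_food
b1_food = y_bar_food - b2_food * x_bar_food
print(round(b1_food, 2), round(b2_food, 2))
```

Rounded to two decimals, the second calculation gives the intercept and slope of the fitted food expenditure line.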
Figure 2.8 The fitted regression line
If income increases by $100, expected weekly expenditure on food will increase by approximately $10.21
The intercept estimate b1 = 83.42 is an estimate of the weekly expenditure on food, in dollars, for a household with no income
The elasticity of a variable y with respect to another variable x is:
The elasticity of mean expenditure with respect to income is:
Example:
Interpretation of regression coefficients and elasticity
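As a quick check of the elasticity calculation, the sketch below evaluates the slope times x/y at the sample means quoted in the example. The function name `elasticity_at` is hypothetical; the three numbers are the rounded values from the slides.

```python
def elasticity_at(slope, x, y):
    """Elasticity of y with respect to x at the point (x, y): slope * x / y."""
    return slope * x / y


# Rounded estimates and sample means from the food expenditure example
eps_hat = elasticity_at(10.21, 19.60, 283.57)
print(round(eps_hat, 2))
```

Read as an elasticity at the sample means, the result says that a 1% increase in weekly income is associated with an increase of roughly 0.71% in expected weekly food expenditure.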
If we want to predict weekly food expenditure for a household with a weekly income of $2000, we substitute x = 20 (income is measured in units of $100) into the estimated equation to obtain:
Based on the above prediction, a household with a weekly income of $2000 will spend $287.61 per week on food
Prediction
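The prediction step can be sketched as follows. The function name `predict_food_exp` is hypothetical, and the defaults are the four-decimal estimates 83.4160 and 10.2096; with the two-decimal coefficients 83.42 and 10.21 the arithmetic gives 287.62 instead, so the quoted $287.61 appears to rest on the less rounded values.

```python
def predict_food_exp(income_dollars, b1=83.4160, b2=10.2096):
    """Predicted weekly food expenditure in dollars for a given weekly income in dollars."""
    x = income_dollars / 100  # the model measures income in units of $100
    return b1 + b2 * x


y_hat = predict_food_exp(2000)
print(round(y_hat, 2))
```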
Figure 2.9 EViews Regression Output
The estimator b2 can be rewritten as:
where
It can also be written as:
Properties of least squares estimators
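The weighted-sum form of b2 relies on two properties of the weights: they sum to zero, and the sum of w_i times x_i equals one. Substituting y_i = β1 + β2 x_i + e_i into the weighted sum then leaves β2 plus the weighted sum of the errors. The sketch below checks these properties numerically; the data and the function name `ls_weights` are hypothetical.

```python
def ls_weights(x):
    """Weights w_i = (x_i - x_bar) / sum((x_j - x_bar)^2)."""
    x_bar = sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [(xi - x_bar) / sxx for xi in x]


# Hypothetical data
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.5, 3.0, 8.0]

w = ls_weights(x)
sum_w = sum(w)                                        # should be 0
sum_wx = sum(wi * xi for wi, xi in zip(w, x))         # should be 1
b2_weighted = sum(wi * yi for wi, yi in zip(w, y))    # b2 as a weighted sum of y

# The same slope from the deviation-from-means formula
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
b2_direct = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
print(sum_w, sum_wx, b2_weighted, b2_direct)
```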
As $E(w_i e_i) = w_i E(e_i)$ and $E(e_i) = 0$, we have
Unbiasedness does not mean that an estimate based on a particular sample is close to the true population parameter
However, it can be said that the least squares estimator is unbiased
Averages of estimates from many samples would approach the true parameter values
Unbiasedness property of least squares estimators
Table 2.2 Estimates from 10 Samples
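Unbiasedness is a statement about repeated sampling, which a small simulation can illustrate. Everything in the sketch below is an assumption made for illustration: the true parameters, the error standard deviation, the fixed x values, and the number of replications are hypothetical and are not taken from the slides' table.

```python
import random


def ols_slope(x, y):
    """Least squares slope b2."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


random.seed(0)                           # fixed seed so the run is repeatable
BETA1, BETA2, SIGMA = 80.0, 10.0, 50.0   # hypothetical true values
x = [float(i) for i in range(1, 41)]     # x is held fixed across samples (SR5)

slopes = []
for _ in range(2000):
    # Draw a fresh sample of y from the same model each time
    y = [BETA1 + BETA2 * xi + random.gauss(0, SIGMA) for xi in x]
    slopes.append(ols_slope(x, y))

mean_slope = sum(slopes) / len(slopes)
print(min(slopes), max(slopes), mean_slope)
```

Individual estimates scatter around the true slope, sometimes by a wide margin, while their average across samples sits close to it.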
If assumptions SR1-SR5 are satisfied then the variances and covariance of b1 and b2 can be written as:
(2.14)
(2.15)
(2.16)
Variances and covariances of b1 & b2
The larger the variance term σ², the larger the variances and covariance of the least squares estimators.
The larger the sum of squares, $\sum (x_i - \bar{x})^2$, the smaller the variances of the least squares estimators and the more precisely we can estimate the unknown parameters.
The larger the sample size N, the smaller the variances and covariance of the least squares estimators.
The larger the term $\sum x_i^2$, the larger the variance of the least squares estimator b1.
The absolute magnitude of the covariance increases with the magnitude of the sample mean $\bar{x}$, and the covariance has the opposite sign to $\bar{x}$.
Figure 2.11 The influence of variation in the explanatory variable x on precision of estimation (a) Low x variation, low precision (b) High x variation, high precision
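The effect of variation in x on precision can be seen by evaluating the variance and covariance formulas for two designs. This is a sketch under stated assumptions: the function name `ls_variances`, the error variance of 4, and both sets of x values are hypothetical.

```python
def ls_variances(x, sigma2):
    """Return (var_b1, var_b2, cov_b1_b2) for fixed x values and error variance sigma2."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    var_b1 = sigma2 * sum(xi ** 2 for xi in x) / (n * sxx)
    var_b2 = sigma2 / sxx
    cov_b1_b2 = -sigma2 * x_bar / sxx
    return var_b1, var_b2, cov_b1_b2


SIGMA2 = 4.0                 # hypothetical error variance
low_x = [9.0, 10.0, 11.0]    # little variation in x
high_x = [0.0, 10.0, 20.0]   # same mean, much more variation in x

var_b1_low, var_b2_low, cov_low = ls_variances(low_x, SIGMA2)
var_b1_high, var_b2_high, cov_high = ls_variances(high_x, SIGMA2)
print(var_b2_low, var_b2_high)
```

With the same mean and error variance, spreading the x values out shrinks the variance of b2, and the covariance is negative because the sample mean of x is positive.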
If assumptions SR1-SR5 of the linear regression model hold, then the estimators b1 and b2 have the smallest variance of all linear and unbiased estimators of β1 and β2. They are the Best Linear Unbiased Estimators (BLUE) of β1 and β2. Note that the theorem does not imply that b1 and b2 are the best of all possible estimators.
The theorem is valid even if assumption SR6 (normality) does not hold; assumptions SR1-SR5 must hold, however.
The Gauss-Markov theorem applies to the least squares estimators.
Gauss-Markov Theorem
If the error term has a normal distribution then the least squares estimators have the following normal distribution:
In large samples (N sufficiently large), the least squares estimators are approximately normally distributed as above even if the error term is not normally distributed. This follows from the Central Limit Theorem (CLT).
The Probability Distributions of the Least Squares Estimators
The least squares residuals are obtained by replacing the unknown parameters by their least squares estimates:
The unbiased estimator is slightly different: it divides the sum of squared residuals by N − K rather than N,
where K = 2 for simple linear regression
The estimated variances and covariance of the least squares estimators are obtained by replacing the unknown error variance σ² in Eq. 2.14 – Eq. 2.16 by $\hat{\sigma}^2$
Estimating the Variance of the Error Term
Table 2.3 Least Squares Residuals
Estimated covariance matrix for the food expenditure data is:
            C            Income
C           1884.442     -85.90316
Income      -85.90316    4.381752
The standard errors of b1 and b2 are measures of the sampling variability of the least squares estimates b1 and b2 in repeated samples.
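The estimated error variance, covariance matrix and standard errors can be reproduced, up to rounding, from summary statistics quoted for the food expenditure example. This is a sketch under stated assumptions: N = 40 is inferred from the divisor N − 2 = 38, the sum of squared x values is recovered from the identity Σx² = Σ(x − x̄)² + N x̄², and the variable names are illustrative.

```python
import math

SSE = 304505.2        # sum of squared least squares residuals
N, K = 40, 2          # N inferred from N - 2 = 38; K = 2 parameters
SXX = 1828.7876       # sum of (x_i - x_bar)^2
X_BAR = 19.6048       # sample mean of income (units of $100)

sigma2_hat = SSE / (N - K)            # unbiased estimate of the error variance
sum_x2 = SXX + N * X_BAR ** 2         # sum of x_i^2 via the identity above

var_b1 = sigma2_hat * sum_x2 / (N * SXX)
var_b2 = sigma2_hat / SXX
cov_b1_b2 = -sigma2_hat * X_BAR / SXX

# Standard errors are the square roots of the estimated variances
se_b1 = math.sqrt(var_b1)
se_b2 = math.sqrt(var_b2)
print(round(sigma2_hat, 2), round(se_b1, 2), round(se_b2, 2))
```

Small differences from the printed matrix in the later decimals come from using rounded summary statistics as inputs.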
The quadratic function y = β1 + β2x² is a parabola
The elasticity, or the percentage change in y given a 1% change in x, is:
Non-linear models
Figure 2.13 A quadratic function
A non-linear model for house prices includes the squared value of SQFT, that is:
The slope can be written as:
If $\hat{\alpha}_2 > 0$, then the slope increases as the area (SQFT) of the house increases.
For 1080 houses sold in Baton Rouge, LA during mid-2005, the estimated quadratic equation is:
The estimated slope for the above equation can be written as:
The elasticity is:
Example
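For houses of 2000, 4000 and 6000 square feet, the estimated elasticities are 1.05, 1.63 and 1.82, respectively:
1.05 using a predicted price of $117,461.77
1.63 using a predicted price of $302,517.39
1.82 using a predicted price of $610,943.42
For a 2000-square-foot house, we estimate that a 1% increase in house size will increase price by 1.05%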
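The three elasticities can be recomputed from the estimated quadratic equation. This is a sketch that uses the coefficients as printed (55776.56 and 0.0154); because the slope coefficient is rounded, the predicted prices it produces differ slightly from the quoted ones, although the elasticities agree to two decimals. The function names are hypothetical.

```python
A1_HAT = 55776.56   # estimated intercept
A2_HAT = 0.0154     # estimated coefficient on SQFT^2 (rounded as printed)


def price_hat(sqft):
    """Predicted price from the quadratic model."""
    return A1_HAT + A2_HAT * sqft ** 2


def slope_hat(sqft):
    """Estimated slope d(PRICE)/d(SQFT) = 2 * a2 * SQFT."""
    return 2 * A2_HAT * sqft


def elasticity_hat(sqft):
    """Estimated elasticity: slope * SQFT / PRICE."""
    return slope_hat(sqft) * sqft / price_hat(sqft)


elasticities = {s: round(elasticity_hat(s), 2) for s in (2000, 4000, 6000)}
print(elasticities)
```

Both the slope and the elasticity rise with house size, which is what a positive coefficient on the squared term implies.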
The log-linear equation ln(y) = a + bx has a logarithmic term on the left-hand side of the equation and a linear variable on the right-hand side
The slope can be written as:
The elasticity, the percentage change in y given a 1% increase in x, at a point on this curve is:
The semi-elasticity, which tells us the percentage change in y given a 1-unit increase in x, can be written as:
Log-linear regression model
Logarithmic transformation can regularize data that is skewed with a long tail to the right
Figure 2.16 (a) Histogram of PRICE (b) Histogram of ln(PRICE)
For the Baton Rouge data, the fitted log-linear model is given by:
The predicted price is given by:
The slope of the log-linear model is:
Example
The estimated elasticity is:
For a 2000-square-foot house, the estimated elasticity is 0.823:
A 1% increase in house size is estimated to increase selling price by 0.823%
Using the "semi-elasticity" defined above, we can conclude that if area increases by one square foot, the estimated price increases by 0.04%
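The log-linear quantities follow directly from the two estimated coefficients. The sketch below uses the coefficients as printed (10.8386 and 0.0004113); the function names are hypothetical, and the predicted price is the simple exponential of the fitted log price, as in the slides.

```python
import math

G1_HAT = 10.8386      # estimated intercept of ln(PRICE)
G2_HAT = 0.0004113    # estimated coefficient on SQFT


def price_hat(sqft):
    """Predicted price: exponential of the fitted log price."""
    return math.exp(G1_HAT + G2_HAT * sqft)


def slope_hat(sqft):
    """Estimated slope d(PRICE)/d(SQFT) = g2 * PRICE."""
    return G2_HAT * price_hat(sqft)


def elasticity_hat(sqft):
    """Estimated elasticity: slope * SQFT / PRICE, which simplifies to g2 * SQFT."""
    return G2_HAT * sqft


semi_elasticity = 100 * G2_HAT   # percent change in price per extra square foot
print(round(elasticity_hat(2000), 3), round(semi_elasticity, 2))
```

In this model the slope depends on the price level, while the semi-elasticity is the same at every house size.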
In a simple regression model an indicator variable on the right-hand side allows us to estimate the differences between population means
Dummy variable regression
Figure 2.18 Distributions of house prices
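With a single indicator variable on the right-hand side, the least squares intercept equals the sample mean of the group coded 0 and the slope equals the difference between the two group means. The sketch below illustrates this with hypothetical prices (in thousands of dollars) for six houses; the data and the function name `ols_fit` are assumptions made for illustration.

```python
def ols_fit(x, y):
    """Return (b1, b2), the least squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


# Hypothetical data: UTOWN = 1 for University Town, 0 for Golden Oaks
utown = [0, 0, 0, 1, 1, 1]
price = [200.0, 210.0, 220.0, 270.0, 280.0, 290.0]

b1, b2 = ols_fit(utown, price)

# Group means computed directly for comparison
mean_golden_oaks = sum(p for u, p in zip(utown, price) if u == 0) / utown.count(0)
mean_univ_town = sum(p for u, p in zip(utown, price) if u == 1) / utown.count(1)
print(b1, b2, mean_golden_oaks, mean_univ_town)
```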
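Equations referenced in the slides above, in order of appearance:
Simple regression function: $E(y|x) = \mu_{y|x} = \beta_1 + \beta_2 x$
Assumption 1: $E(y|x) = \beta_1 + \beta_2 x$
Assumption 2: $\operatorname{var}(y|x) = \sigma^2$
Assumption 3: $\operatorname{cov}(y_i, y_j) = 0$
Assumption 5: $y \sim N(\beta_1 + \beta_2 x, \sigma^2)$
Assumption SR1: $y = \beta_1 + \beta_2 x + e$
Assumption SR2: $E(e) = 0$
Conditional mean of the error: $E(e|x) = E(y|x) - \beta_1 - \beta_2 x = 0$
Mean of y under SR2: $E(y) = \beta_1 + \beta_2 x$
Assumption SR3: $\operatorname{var}(e) = \sigma^2 = \operatorname{var}(y)$
Assumption SR4: $\operatorname{cov}(e_i, e_j) = \operatorname{cov}(y_i, y_j) = 0$
Assumption SR6: $e \sim N(0, \sigma^2)$
Fitted regression line: $\hat{y}_i = b_1 + b_2 x_i$
Least squares residual: $\hat{e}_i = y_i - \hat{y}_i = y_i - b_1 - b_2 x_i$
Least squares principle: if $SSE = \sum_{i=1}^{N} \hat{e}_i^2$ and $SSE^* = \sum_{i=1}^{N} \hat{e}_i^{*2}$, then $SSE < SSE^*$ (starred quantities refer to any other fitted line)
Sum of squares function: $S(\beta_1, \beta_2) = \sum_{i=1}^{N} (y_i - \beta_1 - \beta_2 x_i)^2$
Least squares estimator of the slope: $b_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
Least squares estimator of the intercept: $b_1 = \bar{y} - b_2 \bar{x}$
Slope estimate, food expenditure data: $b_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{18671.2684}{1828.7876} = 10.2096$
Intercept estimate, food expenditure data: $b_1 = \bar{y} - b_2 \bar{x} = 283.5735 - (10.2096)(19.6048) = 83.4160$
Fitted regression line, food expenditure data: $\hat{y}_i = 83.42 + 10.21 x_i$
Elasticity of y with respect to x: $\varepsilon = \frac{\text{percentage change in } y}{\text{percentage change in } x} = \frac{\Delta y / y}{\Delta x / x} = \frac{\Delta y}{\Delta x} \cdot \frac{x}{y}$
Elasticity of mean expenditure with respect to income: $\varepsilon = \frac{\Delta E(y) / E(y)}{\Delta x / x} = \frac{\Delta E(y)}{\Delta x} \cdot \frac{x}{E(y)} = \beta_2 \cdot \frac{x}{E(y)}$
Elasticity example: $\hat{\varepsilon} = b_2 \frac{\bar{x}}{\bar{y}} = 10.21 \times \frac{19.60}{283.57} = 0.71$
Prediction: $\hat{y}_i = 83.42 + 10.21 x_i = 83.42 + 10.21(20) = 287.61$ (computed from the unrounded estimates 83.4160 and 10.2096)
Slope estimator as a weighted sum: $b_2 = \sum_{i=1}^{N} w_i y_i$
Weights: $w_i = \frac{x_i - \bar{x}}{\sum (x_i - \bar{x})^2}$
Slope estimator in terms of the errors: $b_2 = \beta_2 + \sum w_i e_i$
Expectation of a weighted error: $E(w_i e_i) = w_i E(e_i)$
Zero-mean error: $E(e_i) = 0$
Unbiasedness of b2: $E(b_2) = E(\beta_2 + \sum w_i e_i) = E(\beta_2 + w_1 e_1 + w_2 e_2 + \dots + w_N e_N) = E(\beta_2) + E(w_1 e_1) + E(w_2 e_2) + \dots + E(w_N e_N) = E(\beta_2) + \sum E(w_i e_i) = \beta_2 + \sum w_i E(e_i) = \beta_2$
Variance of b1 (2.14): $\operatorname{var}(b_1) = \sigma^2 \left[ \frac{\sum x_i^2}{N \sum (x_i - \bar{x})^2} \right]$
Variance of b2 (2.15): $\operatorname{var}(b_2) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}$
Covariance of b1 and b2 (2.16): $\operatorname{cov}(b_1, b_2) = \sigma^2 \left[ \frac{-\bar{x}}{\sum (x_i - \bar{x})^2} \right]$
Sum of squares of x about its mean: $\sum (x_i - \bar{x})^2$
Sum of squared x values: $\sum x_i^2$
Definition of the variance of b2: $\operatorname{var}(b_2) = E[b_2 - E(b_2)]^2$
Distribution of b1: $b_1 \sim N\left( \beta_1, \frac{\sigma^2 \sum x_i^2}{N \sum (x_i - \bar{x})^2} \right)$
Distribution of b2: $b_2 \sim N\left( \beta_2, \frac{\sigma^2}{\sum (x_i - \bar{x})^2} \right)$
Residuals and the average squared residual: $\hat{e}_i = y_i - \hat{y}_i = y_i - b_1 - b_2 x_i$, $\hat{\sigma}^2 = \frac{\sum \hat{e}_i^2}{N}$
Unbiased estimator of the error variance: $\hat{\sigma}^2 = \frac{\sum \hat{e}_i^2}{N - K} = \frac{\sum \hat{e}_i^2}{N - 2}$, with $K = 2$ for simple linear regression
Error variance estimate, food expenditure data: $\hat{\sigma}^2 = \frac{\sum \hat{e}_i^2}{N - 2} = \frac{304505.2}{38} = 8013.29$
Elasticity for the quadratic function: $\varepsilon = \text{slope} \times \frac{x}{y} = \frac{2bx^2}{y}$, where b is the coefficient on $x^2$
Quadratic model for house prices: $PRICE = \alpha_1 + \alpha_2 SQFT^2 + e$
Slope of the quadratic model: $\frac{d\widehat{PRICE}}{dSQFT} = 2 \hat{\alpha}_2 SQFT$
Condition for an increasing slope: $\hat{\alpha}_2 > 0$
Estimated quadratic equation: $\widehat{PRICE} = 55776.56 + 0.0154\, SQFT^2$
Estimated slope: $\widehat{\text{slope}} = 2(0.0154)\, SQFT$
Estimated elasticity: $\hat{\varepsilon} = \widehat{\text{slope}} \times \frac{SQFT}{\widehat{PRICE}} = (2 \hat{\alpha}_2 SQFT) \times \frac{SQFT}{\widehat{PRICE}}$
Predicted prices at 2000, 4000 and 6000 square feet: $117,461.77, $302,517.39 and $610,943.42
Slope of the log-linear function: $\frac{dy}{dx} = by$
Elasticity of the log-linear function: $\varepsilon = \text{slope} \times \frac{x}{y} = bx$
Semi-elasticity of the log-linear function: $\eta = \frac{100 (dy / y)}{dx} = 100b$
Fitted log-linear model: $\widehat{\ln(PRICE)} = 10.8386 + 0.0004113\, SQFT$
Predicted price: $\widehat{PRICE} = \exp\left[ \widehat{\ln(PRICE)} \right] = \exp(10.8386 + 0.0004113\, SQFT)$
Slope of the fitted log-linear model: $\frac{d\widehat{PRICE}}{dSQFT} = \hat{\gamma}_2 \widehat{PRICE} = 0.0004113\, \widehat{PRICE}$
Estimated elasticity of the fitted log-linear model: $\hat{\varepsilon} = \hat{\gamma}_2 SQFT = 0.0004113\, SQFT$
Indicator variable: $UTOWN = \begin{cases} 1 & \text{house is in University Town} \\ 0 & \text{house is in Golden Oaks} \end{cases}$
Dummy variable regression model: $PRICE = \beta_1 + \beta_2 UTOWN + e$
Regression function with the indicator variable: $E(PRICE) = \begin{cases} \beta_1 + \beta_2 & \text{if } UTOWN = 1 \\ \beta_1 & \text{if } UTOWN = 0 \end{cases}$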