Econometrics homework


Simple Linear Regression

Class 5 for Econometrics 1

Vincent Geloso

Recap

In this unit, we will expand on what we saw with correlations and try to understand regressions (bivariate – i.e. only two variables).

I know some of you are wondering what we will do with theme 3 (normal distribution, central limit theorem, t-stat, z-score). Hang on! What we see in this unit will be combined with t-stats and elements from previous classes in the units on « hypothesis testing ».

After we are done with hypothesis testing, we will move to multiple regressions.

Concept of regression

With correlation, we simply described the relation between variables.

Regressions are different: you must define which variable is affecting the other.

A given variable’s movements (Y) are explained by movements in another (X).

If so, we can « predict » how further changes in the other variable (X) will affect the given variable (Y).

Thus, if I say Y is influenced by X, I am saying Y = f(X).


I am saying that Y is a dependent variable and X is an independent variable. The dependent variable is the explained variable and the independent variable is the explanatory variable.

This requires an a priori decision about the direction of the relationship.

Textbook example: crime rates in Britain during the 19th century and wheat prices (Question: do you think crime rates affect wheat prices or the reverse?)

Textbook counter-example (to invite caution): did the government dole cause high unemployment during the Great Depression in Britain, or did high unemployment cause the government dole? (p. 94)

Linear Regression Model

The relationship between X and Y is described by a linear function

Changes in Y are assumed to be influenced by changes in X

Linear regression population equation model:

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

where $\beta_0$ is the intercept, $\beta_1$ is the slope and $\varepsilon_i$ is the random error term.

Important assumptions that you must know, but which will be discussed more in Econometrics 2:
a) the relationship is linear;
b) the errors are independent of the explanatory variable;
c) the errors have a constant variance and a mean of zero (they cancel out);
d) the errors are not correlated with each other.
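To make the population model concrete, here is a minimal Python sketch that simulates data from it. The parameter values (BETA_0 = 2, BETA_1 = 0.5, SIGMA = 1), the uniform range for X and the function name simulate are assumptions made up for illustration; none of them come from the textbook or the slides. The errors are drawn with a mean of zero and a constant variance, independently of X and of each other, which mirrors assumptions b) to d).

```python
import random

# Hypothetical population parameters, chosen only for illustration.
BETA_0 = 2.0   # intercept
BETA_1 = 0.5   # slope
SIGMA = 1.0    # standard deviation of the random error (constant for all X)


def simulate(n, seed=0):
    """Draw n observations from Y_i = BETA_0 + BETA_1 * X_i + eps_i."""
    rng = random.Random(seed)
    xs, ys, errors = [], [], []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)    # explanatory variable
        eps = rng.gauss(0.0, SIGMA)   # error: mean zero, drawn independently of x
        xs.append(x)
        errors.append(eps)
        ys.append(BETA_0 + BETA_1 * x + eps)
    return xs, ys, errors


xs, ys, errors = simulate(1000)
mean_error = sum(errors) / len(errors)
print(f"average error over {len(errors)} draws = {mean_error:.3f}")
```

With 1,000 draws the average error should sit close to zero, though not exactly at zero in any given sample.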

Simple Linear Regression Model

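(continued)

[Figure: the population regression line drawn with X on the horizontal axis and Y on the vertical axis. Labels on the figure: Intercept = β0; Slope = β1; for a given xi, the observed value of Y and the predicted value of Y on the line; the vertical gap between the two is the random error εi for this xi value.]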

Remember the notation!

This is what we discuss today

Simple Linear Regression Equation

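The simple linear regression equation provides an estimate of the population regression line:

$\hat{y}_i = b_0 + b_1 x_i$

where $\hat{y}_i$ is the estimated (or predicted) y value for observation i, $b_0$ is the estimate of the regression intercept, $b_1$ is the estimate of the regression slope, and $x_i$ is the value of x for observation i.

The individual random error terms $e_i$ have a mean of zero:

$e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_i)$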

Simple Linear Regression Equation

Notice something here: if the simple linear regression equation is an estimate of the population line, do you think we can later use the t-stat to see how well this estimate (from a sample) speaks to the population?

Least Squares Coefficient Estimators

b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared residuals (errors), SSE:

$\min SSE = \min \sum_{i=1}^{n} e_i^2 = \min \sum (y_i - \hat{y}_i)^2 = \min \sum [y_i - (b_0 + b_1 x_i)]^2$

Differential calculus is used to obtain the coefficient estimators b0 and b1 that minimize SSE (but we won’t do that here; we will do it in Econometrics II)

The image on this slide illustrates why we take the « squares » of the distances between the observed and the predicted values: their sum is what we want to minimize!

Least Squares Coefficient Estimators

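The slope coefficient estimator is

$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{Cov(x, y)}{s_x^2} = r \frac{s_y}{s_x}$

And the constant or y-intercept is

$b_0 = \bar{y} - b_1 \bar{x}$

The regression line always goes through the point of means $(\bar{x}, \bar{y})$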

Simple Linear Regression Example

A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)

A random sample of 10 houses is selected

Dependent variable (Y) = house price in $1000s

Independent variable (X) = square feet

Sample Data for House Price Model

House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
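As a check on the mechanics, the least squares estimators can be computed by hand from this sample. The sketch below is a minimal Python version, not the way the class does it (the class uses Excel). It uses the usual formulas: b1 is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and b0 is the mean of y minus b1 times the mean of x. The variable names (price, sqft, s_xy, s_xx) are my own.

```python
# Sample data from the house price table (price in $1000s, size in square feet).
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(price)
x_bar = sum(sqft) / n
y_bar = sum(price) / n

# Numerator and denominator of the slope estimator b1.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
s_xx = sum((x - x_bar) ** 2 for x in sqft)

b1 = s_xy / s_xx          # estimated slope
b0 = y_bar - b1 * x_bar   # estimated intercept

print(f"b1 = {b1:.5f}")
print(f"b0 = {b0:.5f}")

# The fitted line passes through the point of means (x_bar, y_bar).
passes_through_means = abs((b0 + b1 * x_bar) - y_bar) < 1e-9
print("line goes through the means:", passes_through_means)
```

The two coefficients should agree with the Excel output for this example (slope 0.10977, intercept 98.24833).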

Graphical Presentation

House price model: scatter plot

Regression Using Excel


Excel will be used to generate the coefficients and measures of goodness of fit for regression

Data / Data Analysis / Regression

Provide the desired input in the dialog box: the Y range (house price) and the X range (square feet).

Excel Output

We will deal with these later (theme 6 on hypothesis testing)

Excel Output

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df    SS            MS            F         Significance F
Regression    1     18934.9348    18934.9348    11.0848   0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%    Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720    232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374      0.18580

The regression equation is:

house price = 98.24833 + 0.10977 (square feet)

Graphical Presentation

House price model: scatter plot and regression line

Slope = 0.10977

Intercept = 98.248

Interpretation of the Intercept, b0

b0 is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values)

Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet

Interpretation of the Slope Coefficient, b1

b1 measures the estimated change in the average value of Y as a result of a one-unit change in X

Here, b1 = .10977 tells us that the price of a house increases by .10977($1000) = $109.77, on average, for each additional square foot of size
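The two interpretations can be tried out numerically. The sketch below plugs the Excel coefficients into the regression equation; the function name predict_price and the choice to refuse sizes outside the sampled range (1,100 to 2,450 square feet) are my own illustration of the caution about X = 0, not something the textbook prescribes.

```python
# Coefficients taken from the Excel output (price in $1000s, size in square feet).
B0 = 98.24833
B1 = 0.10977

# Smallest and largest house in the sample; anything outside is an extrapolation.
MIN_SQFT, MAX_SQFT = 1100, 2450


def predict_price(sqft):
    """Return the predicted price, in $1000s, for a house of the given size."""
    if not MIN_SQFT <= sqft <= MAX_SQFT:
        raise ValueError("size is outside the observed range; the fitted line may not apply")
    return B0 + B1 * sqft


p_2000 = predict_price(2000)
p_2001 = predict_price(2001)
print(f"predicted price at 2000 sq ft: ${p_2000 * 1000:,.2f}")
print(f"effect of one extra square foot: ${(p_2001 - p_2000) * 1000:,.2f}")
```

The second line is just the slope again: moving from 2,000 to 2,001 square feet changes the prediction by b1 thousand dollars, whatever the starting size.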

Explanatory Power of a Linear Regression Equation

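Total variation is made up of two parts:

$SST = SSR + SSE$

Total Sum of Squares: $SST = \sum (y_i - \bar{y})^2$

Regression Sum of Squares: $SSR = \sum (\hat{y}_i - \bar{y})^2$

Error (residual) Sum of Squares: $SSE = \sum (y_i - \hat{y}_i)^2$

where:

$\bar{y}$ = Average value of the dependent variable

$y_i$ = Observed values of the dependent variable

$\hat{y}_i$ = Predicted value of y for the given $x_i$ value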

Analysis of Variance

SST = total sum of squares: measures the variation of the $y_i$ values around their mean, $\bar{y}$

SSR = regression sum of squares: explained variation attributable to the linear relationship between x and y

SSE = error sum of squares: variation attributable to factors other than the linear relationship between x and y
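For the house price sample, the three sums of squares can be computed directly from their definitions (squared deviations of y from its mean, of the fitted values from the mean of y, and of y from the fitted values). This is a minimal Python sketch with variable names of my own choosing; it refits the line first so that it can run on its own.

```python
# Sample data from the house price example (price in $1000s, size in square feet).
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(price)
x_bar = sum(sqft) / n
y_bar = sum(price) / n

# Least squares slope and intercept.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price)) / sum(
    (x - x_bar) ** 2 for x in sqft
)
b0 = y_bar - b1 * x_bar

# Predicted value of y for each observed x.
fitted = [b0 + b1 * x for x in sqft]

sst = sum((y - y_bar) ** 2 for y in price)               # total variation
ssr = sum((f - y_bar) ** 2 for f in fitted)              # explained variation
sse = sum((y - f) ** 2 for y, f in zip(price, fitted))   # unexplained variation

print(f"SST = {sst:.4f}")
print(f"SSR = {ssr:.4f}")
print(f"SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
```

The totals should line up with the SS column of the ANOVA block in the Excel output (32600.5, 18934.9348 and 13665.5652), and SSR plus SSE should give back SST up to rounding.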

Analysis of Variance

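(continued)

[Figure: for a given $x_i$, the distance between the observed $y_i$ and the mean $\bar{y}$ is split at the regression line into an explained part and an unexplained part. Explained variation: $SSR = \sum (\hat{y}_i - \bar{y})^2$. Unexplained variation: $SSE = \sum (y_i - \hat{y}_i)^2$. Total: $SST = \sum (y_i - \bar{y})^2$.]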

Coefficient of Determination, $R^2$

The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable

The coefficient of determination is also called R-squared and is denoted as $R^2$:

$R^2 = \frac{SSR}{SST}$

note: $0 \le R^2 \le 1$

Examples of Approximate $r^2$ Values

$r^2 = 1$: perfect linear relationship between X and Y. 100% of the variation in Y is explained by variation in X.

$0 < r^2 < 1$: weaker linear relationships between X and Y. Some but not all of the variation in Y is explained by variation in X.

$r^2 = 0$: no linear relationship between X and Y. The value of Y does not depend on X (none of the variation in Y is explained by variation in X).

[Figures: plots of Y against X illustrating each of the three cases.]

Excel Output

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df    SS            MS            F         Significance F
Regression    1     18934.9348    18934.9348    11.0848   0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%    Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720    232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374      0.18580

$R^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082$

58.08% of the variation in house prices is explained by variation in square feet
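Several lines of the Excel output can be rebuilt from the ANOVA block alone. The sketch below does this in Python. R-squared follows the definition used in this class; the formulas for adjusted R-squared, the standard error of the regression and the F statistic are the usual textbook definitions, which these slides do not spell out, so treat them as an addition of mine rather than part of the lecture.

```python
import math

# Values read from the ANOVA block of the Excel output.
SSR = 18934.9348   # regression sum of squares
SSE = 13665.5652   # error (residual) sum of squares
SST = 32600.5000   # total sum of squares
N = 10             # observations
K = 1              # number of explanatory variables

resid_df = N - K - 1   # residual degrees of freedom (8 here)

r_squared = SSR / SST
adj_r_squared = 1 - (1 - r_squared) * (N - 1) / resid_df
std_error = math.sqrt(SSE / resid_df)     # square root of the residual mean square
f_stat = (SSR / K) / (SSE / resid_df)     # regression mean square over residual mean square

print(f"R Square          = {r_squared:.5f}")
print(f"Adjusted R Square = {adj_r_squared:.5f}")
print(f"Standard Error    = {std_error:.5f}")
print(f"F                 = {f_stat:.4f}")
```

Each number should match the corresponding entry in the Excel output to the digits shown there.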

Correlation and $R^2$

The coefficient of determination, $R^2$, for a simple regression is equal to the simple correlation squared:

$R^2 = r^2$
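This equality is easy to verify on the house price sample. The sketch below computes the correlation coefficient r from the deviations of x and y, computes R-squared separately as SSR over SST from the fitted line, and compares the two. It is a minimal Python illustration with my own variable names.

```python
import math

# Sample data from the house price example (price in $1000s, size in square feet).
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(price)
x_bar = sum(sqft) / n
y_bar = sum(price) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
s_xx = sum((x - x_bar) ** 2 for x in sqft)
s_yy = sum((y - y_bar) ** 2 for y in price)   # this is also SST

# Simple correlation between x and y.
r = s_xy / math.sqrt(s_xx * s_yy)

# R-squared from the regression: explained variation over total variation.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in sqft)
r_squared = ssr / s_yy

print(f"r         = {r:.5f}")
print(f"r squared = {r ** 2:.5f}")
print(f"R squared = {r_squared:.5f}")
```

The value of r corresponds to the « Multiple R » line of the Excel output, and its square to the « R Square » line.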

Correlation and $R^2$

Take the “Relief” dataset in OWL and do the regression in two ways.

1st: do all the steps to calculate the estimator (b) of the relation between unemployment (y) and relief (x). That means getting your variance, your covariance, your means, etc. Also, calculate the R-squared by using all the tools one by one. That means calculating the standard deviations, the correlation coefficient, etc.

2nd: once you are done with the first step, just run the regression in Excel (if you want to use Stata, just copy and paste the data and have it run the following line: reg unemp relief). Compare the results you got in the first step to those the software churns out for you.

Note: I know this sounds tedious, but it is incredibly useful because you will understand the ingredients (and tools) of a regression.

Regression of a time series (fitting a time-trend)

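Recap of the equations used in this class:

Population regression model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

Estimated regression equation: $\hat{y}_i = b_0 + b_1 x_i$

Residual: $e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_i)$

Least squares criterion: $\min SSE = \min \sum_{i=1}^{n} e_i^2 = \min \sum (y_i - \hat{y}_i)^2 = \min \sum [y_i - (b_0 + b_1 x_i)]^2$

Slope estimator: $b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{Cov(x, y)}{s_x^2} = r \frac{s_y}{s_x}$

Intercept estimator: $b_0 = \bar{y} - b_1 \bar{x}$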

[Figure: Seigneurial rents (private taxes) per arpent and land quality in Lower Canada, 1831. Vertical axis: rents per arpent in livres (monetary), from 0 to .6; horizontal axis: length of growing season in days, from 170 to 210. Series: rents per arpent (livres) and fitted values.]

[Figure: scatter plot of House Price ($1000s) against Square Feet for the 10 houses in the sample.]

RESIDUAL OUTPUT

Observation   Predicted House Price   Residuals
1             251.9231625835          -6.9231625835
2             273.8767101495          38.1232898505
3             284.8534839325          -5.8534839325
4             304.0628380528          3.9371619472
5             218.9928412345          -19.9928412345
6             268.388323258           -49.388323258
7             356.2025135221          48.7974864779
8             367.1792873051          -43.1792873051
9             254.6673560293          64.3326439707
10            284.8534839325          -29.8534839325
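The residual output can be rebuilt from the data: each predicted price is b0 + b1 times the square footage, and each residual is the observed price minus the predicted one. The Python sketch below does this and also adds up the residuals, which illustrates the statement that the error terms of the fitted line have a mean of zero. Variable names are my own.

```python
# Sample data from the house price example (price in $1000s, size in square feet).
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(price)
x_bar = sum(sqft) / n
y_bar = sum(price) / n

# Least squares slope and intercept.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price)) / sum(
    (x - x_bar) ** 2 for x in sqft
)
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * x for x in sqft]
residuals = [y - p for y, p in zip(price, predicted)]

# One row per observation: number, predicted price, residual.
for i, (p, e) in enumerate(zip(predicted, residuals), start=1):
    print(f"{i:2d}  {p:10.4f}  {e:10.4f}")

residual_sum = sum(residuals)
print("residuals sum to (approximately) zero:", abs(residual_sum) < 1e-8)
```

Positive residuals are houses that sold above the fitted line and negative residuals are houses that sold below it.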

[Figure: Square Feet Line Fit Plot, showing House Price and Predicted House Price against Square Feet.]

Estimated regression equation for the house price model: house price = 98.24833 + 0.10977 (square feet)


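Sums of squares and $R^2$:

$SST = SSR + SSE$

$SST = \sum (y_i - \bar{y})^2$

$SSE = \sum (y_i - \hat{y}_i)^2$

$SSR = \sum (\hat{y}_i - \bar{y})^2$

where $\bar{y}$ is the average value of the dependent variable and $\hat{y}_i$ is the predicted value of y for the given $x_i$.

$R^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082$

$R^2 = r^2$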

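[Figure: Fitting a Time Trend in real wages in Canada, 1688 to 1790. Vertical axis from .5 to 2.5; horizontal axis (var2) from 1700 to 1800. Series: real wage and fitted values.]

[Figure: Real Wages in Canada, 1850 to 1900. Vertical axis: dollars per day, from .4 to 1; horizontal axis (var1) from 1850 to 1900. Series: var3 and fitted values.]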