Econometrics homework

Class5.pptx

Simple Linear Regression

Class 5 for Econometrics 1

Vincent Geloso

Recap

In this unit, we will build on what we saw with correlations and try to understand bivariate regressions (i.e. regressions with only two variables).

I know some of you are wondering what we will do with theme 3 (normal distribution, central limit theorem, t-stat, z-score). Hold on! What we see in this unit will be combined with t-stats and elements from previous classes in the units on « hypothesis testing ».

After we are done with hypothesis testing, we will move to multiple regressions.

Concept of regression

With correlation, we simply described how two variables move together, without saying which one affects the other.

Regressions are different: you must define which variable is affecting the other.

Movements in a given variable (Y) are explained by movements in another variable (X).

If so, we can « predict » how a further change in the other variable (X) will affect the given variable (Y).

Thus, if I say Y is influenced by X, I am saying Y = f(X).

I am saying that Y is a dependent variable and X is an independent variable. The dependent variable is the explained variable and the independent variable is the explanatory variable.

This requires an a priori decision about the direction of the relationship.

Textbook example: crime rates in Britain during the 19th century and wheat prices (Question: do you think crime rates affect wheat prices, or the reverse?)

Textbook counter-example (to invite caution): did the government dole cause high unemployment during the British Great Depression, or did high unemployment cause the government dole? (p. 94)

Linear Regression Model

The relationship between X and Y is described by a linear function.

Changes in Y are assumed to be influenced by changes in X.

Linear regression population equation model:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

Important assumptions that you must know but which will be discussed more in econometrics 2:
a) the relationship is linear;
b) the errors are independent of the explanatory variable;
c) the errors have a constant variance and a mean of zero (they cancel out);
d) the errors are not correlated with each other.

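Simple Linear Regression Model

(continued)

[Figure: the population regression line, with intercept = $\beta_0$ and slope = $\beta_1$. For a given $x_i$, the figure marks the observed value of Y, the predicted value of Y, and the random error $\varepsilon_i$ for this $x_i$ value.]

Remember the notation!

This is what we discuss today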

Simple Linear Regression Equation

The simple linear regression equation provides an estimate of the population regression line:

$$\hat{y}_i = b_0 + b_1 x_i$$

where
$\hat{y}_i$ = estimated (or predicted) y value for observation i
$b_0$ = estimate of the regression intercept
$b_1$ = estimate of the regression slope
$x_i$ = value of x for observation i

The individual random error terms $e_i = y_i - \hat{y}_i$ have a mean of zero

Simple Linear Regression Equation

The simple linear regression equation provides an estimate of the population regression line. Notice something here: if this is an estimate of the population line obtained from a sample, do you think we can later use t-stats to see how well this estimate speaks to the population?

Least Squares Coefficient Estimators

b0 and b1 are obtained by finding the values that minimize the sum of the squared residuals (errors), SSE:

$$\min \text{SSE} = \min \sum_{i=1}^{n} e_i^2 = \min \sum_{i=1}^{n} \left[ y_i - (b_0 + b_1 x_i) \right]^2$$

Differential calculus is used to obtain the coefficient estimators b0 and b1 that minimize SSE (but we won't do that here; we will do it in econometrics II)

This image illustrates well why we take the « squares » of the distances between the observed and the predicted values: the sum of these squares is what we want to minimize!
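To see the least-squares idea at work, here is a minimal Python sketch (not part of the original slides). It uses the house price sample that appears later in this class and the coefficients reported in its Excel output, and checks that nudging either coefficient raises SSE. The function name `sse` and the size of the nudges are illustrative choices, not anything prescribed by the course.

```python
# House price sample from these slides: size in square feet (x), price in $1000s (y).
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

def sse(b0, b1):
    """Sum of squared residuals for the candidate line y = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Least-squares estimates as reported (rounded) in the Excel output of these slides.
b0_hat, b1_hat = 98.24833, 0.10977

best = sse(b0_hat, b1_hat)

# Illustrative perturbations: moving either coefficient away from the
# least-squares value should give a larger SSE.
nudged = [
    sse(b0_hat + 10, b1_hat),
    sse(b0_hat - 10, b1_hat),
    sse(b0_hat, b1_hat + 0.01),
    sse(b0_hat, b1_hat - 0.01),
]

print(round(best, 2))
print(all(value > best for value in nudged))
```

The first printed number should be close to the residual sum of squares in the ANOVA table, and the second line reports whether every nudged line fits worse.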

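Least Squares Coefficient Estimators

(continued)

The slope coefficient estimator is

$$b_1 = \frac{\text{Cov}(x, y)}{s_x^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

And the constant or y-intercept is

$$b_0 = \bar{y} - b_1 \bar{x}$$

The regression line always goes through the point of means $(\bar{x}, \bar{y})$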

Simple Linear Regression Example

A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)

A random sample of 10 houses is selected

Dependent variable (Y) = house price in $1000s

Independent variable (X) = square feet

Sample Data for House Price Model

House Price in $1000s (Y)	Square Feet (X)
245	1400
312	1600
279	1700
308	1875
199	1100
219	1550
405	2350
324	2450
319	1425
255	1700
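As a hedged illustration (not part of the original slides), the estimators can be computed by hand from this sample in a few lines of Python: the slope as the sample covariance divided by the sample variance of x, and the intercept from the two means. The variable names are illustrative.

```python
# House price sample: size in square feet (x), price in $1000s (y).
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance and sample variance (both divide by n - 1,
# so the divisor cancels in the slope).
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

b1 = cov_xy / var_x        # slope estimator
b0 = y_bar - b1 * x_bar    # intercept estimator

# The fitted line evaluated at the mean of x gives back the mean of y.
y_hat_at_mean = b0 + b1 * x_bar

print(round(b1, 5), round(b0, 5))
print(abs(y_hat_at_mean - y_bar) < 1e-9)
```

The two coefficients can be compared with the Coefficients column of the Excel output for the same data.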

Graphical Presentation

House price model: scatter plot

Regression Using Excel

Excel will be used to generate the coefficients and measures of goodness of fit for the regression.

Data / Data Analysis / Regression

Provide the desired input in the dialog box (Input Y Range: house price; Input X Range: square feet).

Excel Output

We will deal with the parts of this output that relate to inference later (theme 6 on hypothesis testing)

Regression Statistics
Multiple R	0.76211
R Square	0.58082
Adjusted R Square	0.52842
Standard Error	41.33032
Observations	10

ANOVA	df	SS	MS	F	Significance F
Regression	1	18934.9348	18934.9348	11.0848	0.01039
Residual	8	13665.5652	1708.1957
Total	9	32600.5000

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%
Intercept	98.24833	58.03348	1.69296	0.12892	-35.57720	232.07386
Square Feet	0.10977	0.03297	3.32938	0.01039	0.03374	0.18580

The regression equation is:

house price = 98.24833 + 0.10977 (square feet)

Graphical Presentation

House price model: scatter plot and regression line

Slope = 0.10977

Intercept = 98.248

Interpretation of the Intercept, b0

b0 is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values)

Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet

Interpretation of the Slope Coefficient, b1

b1 measures the estimated change in the average value of Y as a result of a one-unit change in X

Here, b1 = 0.10977 tells us that the value of a house increases by 0.10977 × $1000 = $109.77, on average, for each additional square foot of size
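The interpretation of the two coefficients can be made concrete with a small Python sketch (an illustration, not part of the original slides). The function name `predict_price` and the 2000 square foot house are hypothetical; the size was picked to lie inside the observed range of 1100 to 2450 square feet.

```python
# Estimated coefficients from the Excel output (price in $1000s, size in square feet).
b0, b1 = 98.24833, 0.10977

def predict_price(square_feet):
    """Predicted house price in $1000s for a given size in square feet."""
    return b0 + b1 * square_feet

size = 2000                                # hypothetical house, within the observed range
price = predict_price(size)                # predicted price in $1000s
extra = predict_price(size + 1) - price    # effect of one more square foot, in $1000s

print(round(price, 2))
print(round(extra * 1000, 2))              # same effect expressed in dollars
```

The second printed value is the slope converted to dollars, which is why it does not depend on the size chosen.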

Explanatory Power of a Linear Regression Equation

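Total variation is made up of two parts:

$$\text{SST} = \text{SSR} + \text{SSE}$$

Total Sum of Squares: $\text{SST} = \sum (y_i - \bar{y})^2$

Regression Sum of Squares: $\text{SSR} = \sum (\hat{y}_i - \bar{y})^2$

Error (residual) Sum of Squares: $\text{SSE} = \sum (y_i - \hat{y}_i)^2$

where:

$\bar{y}$ = Average value of the dependent variable

$y_i$ = Observed values of the dependent variable

$\hat{y}_i$ = Predicted value of y for the given $x_i$ value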

Analysis of Variance

SST = total sum of squares

Measures the variation of the $y_i$ values around their mean, $\bar{y}$

SSR = regression sum of squares

Explained variation attributable to the linear relationship between x and y

SSE = error sum of squares

Variation attributable to factors other than the linear relationship between x and y

Analysis of Variance

(continued)

SST = $\sum (y_i - \bar{y})^2$ (total variation)

SSR = $\sum (\hat{y}_i - \bar{y})^2$ (explained variation)

SSE = $\sum (y_i - \hat{y}_i)^2$ (unexplained variation)
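A short Python sketch (added for illustration, not from the original slides) computes the three sums of squares for the house price sample and checks that the total variation splits into the explained and unexplained parts. Variable names are illustrative.

```python
# House price sample: size in square feet (x), price in $1000s (y).
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares coefficients computed from the data.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Predicted value for each observation.
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # unexplained variation

print(round(sst, 4), round(ssr, 4), round(sse, 4))
print(abs(sst - (ssr + sse)) < 1e-6)
```

The three values can be compared with the SS column of the ANOVA table in the Excel output (Total, Regression and Residual rows).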

Coefficient of Determination, R²

The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable

The coefficient of determination is also called R-squared and is denoted as R²

$$R^2 = \frac{\text{SSR}}{\text{SST}}$$

note: $0 \le R^2 \le 1$

Examples of Approximate r² Values

r² = 1

Perfect linear relationship between X and Y: 100% of the variation in Y is explained by variation in X

0 < r² < 1

Weaker linear relationships between X and Y: some but not all of the variation in Y is explained by variation in X

r² = 0

No linear relationship between X and Y: the value of Y does not depend on X (none of the variation in Y is explained by variation in X)

Excel Output

In the regression statistics reported by Excel, R Square = 0.58082:

$$R^2 = \frac{\text{SSR}}{\text{SST}} = \frac{18934.9348}{32600.5000} = 0.58082$$

58.08% of the variation in house prices is explained by variation in square feet
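The same number can be recovered from the SS column of the ANOVA table with a couple of lines of Python (an illustrative sketch, not part of the original slides). The second expression, 1 - SSE/SST, is not on the slide; it is included here only because it follows from SST = SSR + SSE.

```python
# Sums of squares copied from the ANOVA table in the Excel output.
ssr = 18934.9348   # Regression
sse = 13665.5652   # Residual
sst = 32600.5000   # Total

r_squared = ssr / sst            # share of total variation that is explained
r_squared_alt = 1 - sse / sst    # equivalent form, since SST = SSR + SSE

print(round(r_squared, 5), round(r_squared_alt, 5))
```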

Correlation and R²

The coefficient of determination, R², for a simple regression is equal to the simple correlation squared:

$$R^2 = r_{xy}^2$$
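This equality can be checked on the house price sample with a minimal Python sketch (added for illustration, not from the original slides). The correlation coefficient is computed directly from the data, and R-squared is computed separately from the fitted values so that the two routes are independent. Variable names are illustrative.

```python
import math

# House price sample: size in square feet (x), price in $1000s (y).
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Route 1: simple correlation coefficient between x and y.
r = s_xy / math.sqrt(s_xx * s_yy)

# Route 2: R-squared from the regression, as explained over total variation.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
ssr = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
r_squared = ssr / s_yy

print(round(r, 5), round(r ** 2, 5), round(r_squared, 5))
```

The first value can be compared with "Multiple R" in the Excel output and the last two with "R Square".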

Correlation and R2

Take the “Relief” dataset in OWL and do the regression in two ways.

1st: do all the steps by hand to calculate the estimator (b) of the relation between unemployment (y) and relief (x). That means getting your variance, your covariance, your means, etc. Also, calculate the R-squared by using all the tools one by one. That means calculating the standard deviations, the correlation coefficient, etc.

2nd: once you are done with the first step, just run the regression in Excel (if you want to use Stata, just copy and paste the data and have it run the following line: reg unemp relief). Compare the results you got in the first step to those the software churns out for you.

Note: I know this sounds tedious, but it is incredibly useful because you will understand the ingredients (and tools) of a regression.

Regression of a time series (fitting a time-trend)


[Figure: Seigneurial rents (private taxes) per arpent and land quality in Lower Canada, 1831. Rents per arpent (livres) and fitted values plotted against the length of the growing season (days).]

Residual output (Excel) for the house price model

Observation	Predicted House Price	Residuals
1	251.9231625835	-6.9231625835
2	273.8767101495	38.1232898505
3	284.8534839325	-5.8534839325
4	304.0628380528	3.9371619472
5	218.9928412345	-19.9928412345
6	268.388323258	-49.388323258
7	356.2025135221	48.7974864779
8	367.1792873051	-43.1792873051
9	254.6673560293	64.3326439707
10	284.8534839325	-29.8534839325



[Figure: Fitting a time trend in real wages in Canada, 1688 to 1790. Real wage and fitted values plotted against year.]

[Figure: Real wages in Canada, 1850 to 1900. Dollars per day and fitted values plotted against year.]