Binomial variable – free throws
Refer back to HW#1 functions all_pdfs and cdf. We can graph the pdf function of this variable:
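import math
import pandas as pd
import matplotlib.pyplot as plt

# One free throw is a Bernoulli variable: 0 = miss (prob 0.6), 1 = make (prob 0.4)
Bernoulli = [0, 1]
p_Bernoulli = [0.6, 0.4]
Bernoulli_dict = {'outcome': Bernoulli, 'prob': p_Bernoulli}
Bernoulli_df = pd.DataFrame(data=Bernoulli_dict, index=('miss', 'make'))

def choose(x, n):
    # Number of ways to choose x successes out of n trials
    return math.factorial(n) / (math.factorial(x) * math.factorial(n - x))

def pdf(x, n, df):
    # Binomial probability of exactly x makes in n attempts
    return choose(x, n) * (df.loc['make', 'prob']**x) * (df.loc['miss', 'prob']**(n - x))

def all_pdfs(n, df):
    # Probabilities of 0, 1, ..., n makes, rounded to 6 decimals
    PDF = []
    for i in range(n + 1):
        PDF.append(pdf(i, n, df))
    PDF = [round(num, 6) for num in PDF]
    return PDF

P = all_pdfs(20, Bernoulli_df)
plt.plot(P)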
Introductory Econometrics: A Modern Approach (7e)
© 2020 Cengage. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
On the graph, the x-axis is free throws made (out of 20) and the y-axis gives the probability of each outcome. Note that the probability peaks at 8 free throws made.
This is also the expected value, since the expected value of a binomial variable is np = 20 × 0.4 = 8.
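As a cross-check on the hand-written functions, the same probabilities can be obtained from a library. The sketch below assumes SciPy is installed and uses `scipy.stats.binom`, which is not part of the HW#1 code; the names `most_likely` and `expected` are introduced here only for illustration.

```python
import numpy as np
from scipy.stats import binom

n, p = 20, 0.4                        # 20 attempts, 40% chance of a make
k = np.arange(n + 1)                  # possible numbers of makes: 0..20
pmf = binom.pmf(k, n, p)              # P(X = k) for each k

most_likely = int(k[np.argmax(pmf)])  # outcome with the highest probability
expected = float(np.sum(k * pmf))     # E[X] = sum of k * P(X = k)

print(most_likely, round(expected, 6))
```

Both numbers should agree with np for this choice of n and p.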
We can also plot the cdf – which will give us the cumulative probabilities. So at 8 free throws made, for example, the cdf will tell us the probability the player makes from 0 to 8 free throws.
def cdf(p):
    # Running sum of the pdf values: C[k] = P(X <= k)
    C = []
    C.append(p[0])
    for i in range(len(p) - 1):
        C.append(p[i + 1] + C[i])
    C = [round(num, 4) for num in C]
    return C

CDF = cdf(P)
plt.plot(CDF)
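The running sum in `cdf` can also be written with `np.cumsum`. The following is a minimal sketch, assuming NumPy and SciPy are available; `cdf_manual` and `cdf_scipy` are illustrative names, and `binom.cdf` is used only as an independent check.

```python
import numpy as np
from scipy.stats import binom

n, p = 20, 0.4
k = np.arange(n + 1)
pmf = binom.pmf(k, n, p)

cdf_manual = np.cumsum(pmf)       # running total of the pmf
cdf_scipy = binom.cdf(k, n, p)    # library cdf for comparison

print(round(float(cdf_manual[8]), 4))   # probability of 0 to 8 makes
```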
Testing
Let’s assume that a player has made 40% of their three-point attempts over a number of years.
This player, however, spent the off-season training with Steph Curry, arguably the best shooter of all time.
Through the first 20 games of the season, this player has attempted 200 three-pointers and made 96 (48%).
We want to test: did training with Steph make the player a better shooter?
The player has made 96 out of 200, but shooting is a random variable, so we can’t simply conclude that the player has gotten better. Instead, we test:
H0 (our “null”): the player has not gotten better:
p = three-point percentage = 40%.
H1 (our alternative): the player HAS gotten better!
p = three-point percentage > 40%.
Note: this is a one-sided test.
NOTE: We are viewing the sample as 200 observations of a Bernoulli variable!
First – Two-sided Confidence Interval
We will create two: a 95% and a 99% confidence interval.
The 95% interval is constructed so that, across repeated samples, 95% of such intervals contain the “true” probability.
The 99% interval is constructed the same way, so that 99% of such intervals contain the “true” probability.
Confidence Interval
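Our two-sided confidence intervals are:
$\hat{p} \pm z_{\alpha/2} \times \text{standard error}$, OR
$\hat{p} \pm z_{\alpha/2} \times \text{sample standard deviation} / \sqrt{n}$
What is the sample standard deviation here?
We are testing a Bernoulli variable here. Standard Dev $= \sqrt{\hat{p}(1-\hat{p})}$
The standard deviation of the variable is $\sqrt{0.48 \times 0.52} \approx 0.50$
Our standard error then is $0.50/\sqrt{200} \approx 0.0353$
95%: $0.48 \pm 1.96 \times 0.0353$
95%: $0.48 \pm 0.0692$
99%: $0.48 \pm 2.58 \times 0.0353$
99%: $0.48 \pm 0.0911$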
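The interval arithmetic can be reproduced in a few lines. This is a sketch under the assumption that SciPy is available; `two_sided_ci` is a hypothetical helper, not a function from the course material, and `norm.ppf` returns unrounded critical values (about 1.960 and 2.576), so the 99% half-width differs slightly from the figure obtained with the rounded value 2.58.

```python
from math import sqrt
from scipy.stats import norm

n, p_hat = 200, 96 / 200
se = sqrt(p_hat * (1 - p_hat) / n)       # Bernoulli std dev divided by sqrt(n)

def two_sided_ci(level):
    # z_{alpha/2} for the requested confidence level
    z = norm.ppf(1 - (1 - level) / 2)
    half_width = z * se
    return p_hat - half_width, p_hat + half_width

ci95 = two_sided_ci(0.95)
ci99 = two_sided_ci(0.99)
print(ci95, ci99)
```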
Confidence Interval
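Our one-sided confidence intervals (lower bounds) are:
$\hat{p} - z_{\alpha} \times \text{standard error}$, OR
$\hat{p} - z_{\alpha} \times \text{sample standard deviation} / \sqrt{n}$
What is the sample standard deviation here?
We are testing a Bernoulli variable here. Standard Dev $= \sqrt{\hat{p}(1-\hat{p})}$
The standard deviation of the variable is $\sqrt{0.48 \times 0.52} \approx 0.50$
Our standard error then is $0.50/\sqrt{200} \approx 0.0353$
95%: $0.48 - 1.645 \times 0.0353$
95%: $0.48 - 0.058 = 0.422$
99%: $0.48 - 2.33 \times 0.0353$
99%: $0.48 - 0.082 = 0.398$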
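A short sketch of the one-sided bounds, again assuming SciPy; the variable names (`lower95`, `reject_95`, and so on) are illustrative. The null is rejected at a given level when the null value 0.40 falls below the lower bound.

```python
from math import sqrt
from scipy.stats import norm

n, p_hat, p0 = 200, 0.48, 0.40
se = sqrt(p_hat * (1 - p_hat) / n)

lower95 = p_hat - norm.ppf(0.95) * se   # z_0.05 is about 1.645
lower99 = p_hat - norm.ppf(0.99) * se   # z_0.01 is about 2.33

reject_95 = lower95 > p0   # True when 0.40 lies below the 95% lower bound
reject_99 = lower99 > p0   # True when 0.40 lies below the 99% lower bound
print(round(lower95, 3), round(lower99, 3), reject_95, reject_99)
```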
Intervals
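Using the one-sided confidence interval, the 95% lower bound (0.48 - 0.058 = 0.422) is above 0.40 while the 99% lower bound (0.48 - 0.082 = 0.398) is below it, so we can reject our null hypothesis at the 95% level but not at the 99% level.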
Using Python:
Our t-statistic:
SE=(0.48*0.52/200)**.5
SE
Tstat=(0.48-0.40)/SE
Tstat
Our p-value:
from scipy import stats
# binom_test was removed in SciPy 1.12; binomtest (SciPy 1.7+) returns a result object
stats.binomtest(96,200,p=0.4,alternative="greater").pvalue
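To see how the test statistic and the p-value relate, the sketch below computes a one-sided p-value two ways: from the normal approximation and from the exact binomial tail. It assumes SciPy is installed and uses `binom.sf` for the exact tail, which is a swapped-in alternative to the binomial test call above; the variable names are illustrative.

```python
from math import sqrt
from scipy.stats import norm, binom

n, makes, p0 = 200, 96, 0.40
p_hat = makes / n

se = sqrt(p_hat * (1 - p_hat) / n)     # standard error using the sample proportion
z = (p_hat - p0) / se                  # test statistic

p_normal = norm.sf(z)                  # one-sided p-value, normal approximation
p_exact = binom.sf(makes - 1, n, p0)   # exact P(X >= 96) when p = 0.40

print(round(z, 3), round(p_normal, 4), round(p_exact, 4))
```

A p-value between 0.01 and 0.05 matches the conclusion from the intervals: reject at the 95% level, but not at the 99% level.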
Testing – Bernoulli vs Binomial
Our test was that of a Bernoulli variable, and we had 200 observations of it.
We can also view it as a test of a single binomial variable. You will redo this test on the HW viewing it as such.
_______________________________________________________________________
Let’s say we had 100 players attend Steph’s new “Shooting Academy.” Before the academy the players shot 40% from three-point range.
After 500 shots each, the 100 players have made:
import numpy as np
MADE=np.random.binomial(500,0.45,size=100)
plt.hist(MADE)
Does the Academy make you a better shooter?
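One way to approach the question is to pool all shots and test the overall percentage against 40%. The sketch below is only one possible approach, not the assigned solution: it assumes the same simulated data-generating process as above, fixes a seed for reproducibility, and uses the standard error implied by the null. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)                    # seed chosen only for reproducibility
shots, players, p0 = 500, 100, 0.40
made = rng.binomial(shots, 0.45, size=players)    # simulated makes per player

p_hat = made.sum() / (shots * players)            # pooled shooting percentage
se0 = np.sqrt(p0 * (1 - p0) / (shots * players))  # standard error if H0: p = 0.40 holds
z = (p_hat - p0) / se0                            # one-sided test statistic

print(round(float(p_hat), 4), round(float(z), 2))
```

With 50,000 pooled shots the standard error is tiny, so even a modest improvement produces a very large test statistic.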
Testing – Continuous Variable
We want to examine whether the mention of a stock in certain social media circles precedes a period of outperformance.
Over the past six months, 15 equities have been mentioned. We wish to test whether their outperformance is greater than 0.
H0: There is no outperformance.
The outperformance of the 15 stocks for the month following mention is:
outperformance=np.random.normal(loc=0.01,scale=0.05,size=(15,1))
outperformance
import matplotlib.pyplot as plt
import seaborn as sns
sns.displot(outperformance)
Confidence Interval (one-sided)
What is your sample standard deviation?
What is your standard error?
What is your confidence interval if:
$t_{14,\,\alpha=0.05} = 1.761$ (one-sided; 2.15 is the two-sided value $t_{14,\,0.025}$)
Check with:
test_out=stats.ttest_1samp(outperformance,0, axis=0, alternative='greater')
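The three questions above can be worked through step by step. This is a sketch on simulated data (seed and variable names are assumptions, and the sample is drawn as a 1-D array for simplicity), so the numbers will differ from any particular draw of `outperformance`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)                    # seed is arbitrary
out = rng.normal(loc=0.01, scale=0.05, size=15)   # simulated outperformance, 1-D

n = out.size
s = out.std(ddof=1)                  # sample standard deviation
se = s / np.sqrt(n)                  # standard error of the mean
t_stat = out.mean() / se             # test of H0: mean = 0

t_crit = stats.t.ppf(0.95, df=n - 1)         # one-sided 5% critical value
lower_bound = out.mean() - t_crit * se       # one-sided 95% lower bound
p_one_sided = stats.t.sf(t_stat, df=n - 1)   # P(T >= t_stat) under H0

check = stats.ttest_1samp(out, 0)            # two-sided by default
print(t_stat, check.statistic, lower_bound, p_one_sided)
```

The hand-computed statistic should match the one reported by `ttest_1samp`; only the p-value depends on whether the test is one- or two-sided.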
Comparison: t vs. normal distribution
import numpy as np
from scipy.stats import t
import matplotlib.pyplot as plt
%matplotlib inline
rv = t(df=10, loc=0, scale=1)
x = np.linspace(rv.ppf(0.01), rv.ppf(0.99), 100)
y = rv.pdf(x)
rv2=t(df=100000,loc=0,scale=1)
#Use t here with high degree of freedom as proxy for normal distribution
xx = np.linspace(rv2.ppf(0.01), rv2.ppf(0.99), 100)
yy = rv2.pdf(xx)  # evaluate on xx, the grid this curve is plotted against
#plt.xlim(-5,5)
plt.plot(x,y)
plt.plot(xx,yy)
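The difference between the two curves also shows up in the critical values. The sketch below, assuming SciPy, compares two-sided 5% cutoffs from the t distribution at several degrees of freedom with the normal cutoff; `t10` and `t_big` are illustrative names.

```python
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)              # two-sided 5% critical value, normal
for df in (5, 10, 30, 100000):
    print(df, round(t.ppf(0.975, df), 4), round(z_crit, 4))

t10 = t.ppf(0.975, 10)                # fatter tails give a larger cutoff
t_big = t.ppf(0.975, 100000)          # nearly identical to the normal cutoff
```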
The Simple Regression Model
Chapter 2
Definition of the simple regression model
“Explains variable y in terms of variable x”
Interpretation of the simple linear regression model
Explains how y varies with changes in x
The simple linear regression model is rarely applicable in practice, but its discussion is useful for pedagogical reasons.
Example: A simple wage equation
Example: Soybean yield and fertilizer
When is there a causal interpretation?
Conditional mean independence assumption
Example: wage equation
Population regression function (PRF)
The conditional mean independence assumption implies that $E(y \mid x) = \beta_0 + \beta_1 x$.
This means that the average value of the dependent variable can be expressed as a linear function of the explanatory variable.
Deriving the ordinary least squares estimates
In order to estimate the regression model one needs data
A random sample of n observations
Deriving the ordinary least squares (OLS) estimators
Defining regression residuals
Minimize the sum of the squared regression residuals
OLS estimators
OLS fits a regression line through the data points as well as possible.
CEO Salary regression
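Access data
# Run once in a terminal (or prefix with ! in a notebook): pip install wooldridge
import wooldridge as woo
import numpy as np
ceosal1=woo.dataWoo('ceosal1')
x=ceosal1['roe']
y=ceosal1['salary']
# Slope = sample covariance of x and y divided by sample variance of x
cov_xy=np.cov(x,y)[1,0]
var_x=np.cov(x,y)[0,0]
b1=cov_xy/var_x
# Intercept from the sample means
b0=np.mean(y)-b1*np.mean(x)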
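The covariance-over-variance recipe can be checked without the `wooldridge` package. The sketch below uses synthetic data (the coefficients 900 and 20 and the noise level are assumptions chosen only for illustration) and compares the hand-computed estimates with `np.polyfit`, which fits a least-squares line.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 30, size=200)                  # synthetic regressor
y = 900 + 20 * x + rng.normal(0, 300, size=200)   # synthetic outcome

cov = np.cov(x, y)               # 2x2 sample covariance matrix
b1 = cov[0, 1] / cov[0, 0]       # slope = Cov(x, y) / Var(x)
b0 = y.mean() - b1 * x.mean()    # intercept from the sample means

slope_check, intercept_check = np.polyfit(x, y, 1)   # least-squares line
print(b1, b0)
```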
Example of a simple regression
CEO salary and return on equity
Fitted regression
Causal interpretation?
CEO Salary regression
import statsmodels.formula.api as smf
reg=smf.ols(formula='salary ~ roe',data=ceosal1)
results=reg.fit()
b=results.params
plt.plot('roe','salary', data=ceosal1,color='grey',marker='o',linestyle='')
plt.plot(ceosal1['roe'], results.fittedvalues, color='black', linestyle='-')
plt.ylabel('salary')
plt.xlabel('roe')
plt.savefig('Example-2-3-3.pdf')
Example of a simple regression
Wage and education
Fitted regression
Causal interpretation?
Wage and Education
wage1=woo.dataWoo('wage1')
reg=smf.ols(formula='wage~educ',data=wage1)
results=reg.fit()
b=results.params
b
Voting and Campaign Expenditures
vote1=woo.dataWoo('vote1')
reg=smf.ols(formula='voteA~shareA',data=vote1)
results=reg.fit()
b=results.params
b
plt.plot('shareA','voteA',data=vote1,color='grey',marker='o',linestyle='')
plt.plot(vote1['shareA'],results.fittedvalues, color='black',linestyle='-')
plt.ylabel('voteA')
plt.xlabel('shareA')
Example of a simple regression
Voting outcomes and campaign expenditures (two parties)
Fitted regression
Causal interpretation?
Algebraic properties of OLS regression
Properties of OLS on any sample of data
Fitted values and residuals
This table presents fitted values and residuals for 15 CEOs, with salary measured in thousands of dollars.
For example, the 12th CEO’s predicted salary is $526,023 higher than their actual salary.
By contrast the 5th CEO’s predicted salary is $149,493 lower than their actual salary.
| obsno | roe | salary | salaryhat | uhat |
| 1 | 14.1 | 1095 | 1224.058 | -129.058 |
| 2 | 10.9 | 1001 | 1164.854 | -163.854 |
| 3 | 23.5 | 1122 | 1397.969 | -275.969 |
| 4 | 5.9 | 578 | 1072.348 | -494.348 |
| 5 | 13.8 | 1368 | 1218.508 | 149.493 |
| 6 | 20.0 | 1145 | 1333.215 | -188.215 |
| 7 | 16.4 | 1078 | 1266.611 | -188.611 |
| 8 | 16.3 | 1094 | 1264.761 | -170.761 |
| 9 | 10.5 | 1237 | 1157.454 | 79.546 |
| 10 | 26.3 | 833 | 1449.773 | -616.773 |
| 11 | 25.9 | 567 | 1442.372 | -875.372 |
| 12 | 26.8 | 933 | 1459.023 | -526.023 |
| 13 | 14.8 | 1339 | 1237.009 | 101.991 |
| 14 | 22.3 | 937 | 1375.768 | -438.768 |
| 15 | 56.3 | 2011 | 2004.808 | 6.192 |
Fitted Values
reg=smf.ols(formula='salary ~ roe', data=ceosal1)
results=reg.fit()
b=results.params
# Fitted values and residuals computed by hand from the coefficients
salary_hat=b['Intercept']+b['roe']*ceosal1['roe']
u_hat=ceosal1['salary']-salary_hat
# The same quantities taken directly from the fitted results
salary_hat2=results.fittedvalues
u_hat2=results.resid
import pandas as pd
table=pd.DataFrame({'roe':ceosal1['roe'],
                    'salary':ceosal1['salary'],
                    'salary_hat':salary_hat2,
                    'u_hat': u_hat2})
table
Confirm Properties
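np.mean(u_hat2)  # residuals average to zero (up to rounding)
covmat=np.cov(u_hat2,ceosal1['roe'])  # off-diagonal entries: covariance of residuals and roe, zero up to rounding
np.mean(salary_hat2)-np.mean(ceosal1['salary'])  # fitted values have the same mean as salary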
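These three checks hold for any OLS fit with an intercept, not just the CEO data. A minimal sketch on synthetic data (all numbers and names below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(15, 8, size=100)
y = 1000 + 18 * x + rng.normal(0, 400, size=100)

b1, b0 = np.polyfit(x, y, 1)      # OLS slope and intercept
y_hat = b0 + b1 * x               # fitted values
u_hat = y - y_hat                 # residuals

mean_resid = u_hat.mean()               # property 1: residuals average to zero
cov_x_resid = np.cov(x, u_hat)[0, 1]    # property 2: zero covariance with x
mean_gap = y_hat.mean() - y.mean()      # property 3: fitted values share the mean of y
print(mean_resid, cov_x_resid, mean_gap)
```

All three printed values should be zero up to floating-point error.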
Goodness of fit
How well does an explanatory variable explain the dependent variable?
Measures of variation:
Goodness-of-fit measure (R-squared)
Decomposition of total variation
CEO Salary and return on equity
Voting outcomes and campaign expenditures
Caution: A high R-squared does not necessarily mean that the regression has a causal interpretation!
Goodness of Fit
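# These are sample variances (sums of squares divided by n-1); the n-1 cancels in the ratio
SST=np.var(ceosal1['salary'],ddof=1)  # total variation
SSE=np.var(salary_hat2 ,ddof=1)  # explained variation
SSR=np.var(u_hat2,ddof=1)  # residual variation
R2=SSE/SST
R2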
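The decomposition of total variation can be verified directly with sums of squares. The sketch below uses synthetic data and illustrative names; it computes R-squared three equivalent ways for a simple regression with an intercept.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(0, 1, size=150)
y = 2 + 0.5 * x + rng.normal(0, 1, size=150)

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x
u_hat = y - y_hat

sst = np.sum((y - y.mean()) ** 2)        # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)    # explained sum of squares
ssr = np.sum(u_hat ** 2)                 # residual sum of squares

r2 = sse / sst
r2_alt = 1 - ssr / sst                   # same number, via the residuals
r2_corr = np.corrcoef(x, y)[0, 1] ** 2   # squared sample correlation of x and y
print(r2, r2_alt, r2_corr)
```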
Incorporating nonlinearities: Semi-logarithmic form
Regression of log wages on years of education
This changes the interpretation of the regression coefficient:
Fitted regression
Incorporating nonlinearities: Log-logarithmic form
CEO salary and firm sales
This changes the interpretation of the regression coefficient:
CEO salary and firm sales: fitted regression
The log-log form postulates a constant elasticity model, whereas the semi-log form assumes a semi-elasticity model.
Nonlinear models
##Semi-log
reg=smf.ols(formula='np.log(salary) ~sales',data=ceosal1)
results=reg.fit()
b=results.params
b
#log-log
reg=smf.ols(formula='np.log(salary) ~ np.log(sales)',data=ceosal1)
results=reg.fit()
b=results.params
b
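To illustrate how the two slopes are read, the sketch below simulates data with a known elasticity of 0.25 (an assumed value, as are the other numbers and names) and recovers it with a log-log fit. `np.polyfit` stands in for `smf.ols` so the example runs without the dataset.

```python
import numpy as np

rng = np.random.default_rng(3)
sales = np.exp(rng.normal(8, 1, size=1000))     # synthetic, strictly positive
log_salary = 4.8 + 0.25 * np.log(sales) + rng.normal(0, 0.1, size=1000)

# log-log: the slope is an elasticity (% change in salary per 1% change in sales)
elasticity, _ = np.polyfit(np.log(sales), log_salary, 1)

# semi-log: 100 * slope approximates the % change in salary per one-unit change in sales
semi_slope, _ = np.polyfit(sales, log_salary, 1)
print(elasticity, 100 * semi_slope)
```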
Expected values and variances of the OLS estimators
The estimated regression coefficients are random variables because they are calculated from a random sample.
The question is what the estimators will estimate on average and how large their variability will be in repeated samples.
Standard assumptions for the linear regression model
Assumption SLR.1 (Linear in parameters)
Assumption SLR.2 (Random sampling)
Discussion of random sampling: Wage and education
The population consists, for example, of all workers of country A
In the population, there is a linear relationship between wages (or log wages) and years of education.
Draw a worker completely at random from the population.
The wage and the years of education of the worker drawn are random because one does not know beforehand which worker is drawn.
Throw that worker back into the population and repeat the random draw n times.
The wages and years of education of the sampled workers are used to estimate the linear relationship between wages and education.
Assumptions for the linear regression model (cont.)
Assumption SLR.3 (Sample variation in the explanatory variable)
Assumption SLR.4 (Zero conditional mean)
Theorem 2.1 (Unbiasedness of OLS)
Interpretation of unbiasedness
The estimated coefficients may be smaller or larger, depending on the sample that is the result of a random draw.
However, on average, they will be equal to the values that characterize the true relationship between y and x in the population.
“On average” means if sampling was repeated, i.e. if drawing the random sample and doing the estimation was repeated many times.
In a given sample, estimates may differ considerably from true values.
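Unbiasedness is a statement about repeated sampling, which a small simulation can make concrete. The sketch below assumes a population with intercept 1 and slope 2 (arbitrary choices), draws many random samples, and estimates the slope in each; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 1.0, 2.0           # "true" population parameters (assumed)
n, reps = 50, 2000

slopes = np.empty(reps)
for r in range(reps):
    x = rng.normal(0, 1, size=n)              # fresh random sample each time
    u = rng.normal(0, 1, size=n)              # error drawn independently of x
    y = beta0 + beta1 * x + u
    slopes[r] = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # OLS slope

print(slopes.mean(), slopes.min(), slopes.max())
```

Individual estimates scatter noticeably around 2, while their average across samples sits very close to it.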
Assumption SLR.5 (Homoskedasticity)
Variances of the OLS estimators
Depending on the sample, the estimates will be nearer or farther away from the true population values.
How far can we expect our estimates to be away from the true population values on average (= sampling variability)?
Sampling variability is measured by the estimators' variances.
The Simple Regression Model (29 of 39)
Graphical illustration of homoskedasticity
The Simple Regression Model (30 of 39)
An example of heteroskedasticity: Wage and education
The Simple Regression Model (31 of 39)
Conclusion:
The sampling variability of the estimated regression coefficients is higher the larger the variability of the unobserved factors, and lower the larger the variation in the explanatory variable.
Theorem 2.2 (Variances of the OLS estimators)
Under assumptions SLR.1 – SLR.5:
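The standard textbook statement of this result is Var(β1-hat | x) = σ² / SST_x and Var(β0-hat | x) = σ² (n⁻¹ Σ x_i²) / SST_x, where SST_x = Σ (x_i − x-bar)² and σ² is the error variance. The sketch below compares these formulas with the variance of the estimates across simulated samples. The parameter values, the uniform design for x, and the normal errors are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

n, sigma = 40, 2.0                   # assumed sample size and error standard deviation
beta0, beta1 = 1.0, 0.5              # assumed population parameters
x = rng.uniform(0.0, 10.0, size=n)   # regressor values, held fixed across samples
x_dev = x - x.mean()
sst_x = np.sum(x_dev ** 2)

# Variances implied by the formulas (conditional on the x values)
var_b1_formula = sigma ** 2 / sst_x
var_b0_formula = sigma ** 2 * np.mean(x ** 2) / sst_x

# Simulate many samples with homoskedastic errors; each row of y is one sample
n_reps = 20000
u = rng.normal(0.0, sigma, size=(n_reps, n))
y = beta0 + beta1 * x + u
b1 = (y - y.mean(axis=1, keepdims=True)) @ x_dev / sst_x
b0 = y.mean(axis=1) - b1 * x.mean()

print("slope:     formula", var_b1_formula, " simulated", b1.var())
print("intercept: formula", var_b0_formula, " simulated", b0.var())
```

Raising sigma inflates both variances, while spreading the x values further apart (a larger SST_x) shrinks them, which matches the conclusion stated on this slide.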
The Simple Regression Model (32 of 39)
Estimating the error variance
The Simple Regression Model (33 of 39)
The estimated standard deviations of the regression coefficients are called “standard errors.” They measure how precisely the regression coefficients are estimated.
Calculation of standard errors for regression coefficients
Theorem 2.3 (Unbiasedness of the error variance)
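As a hedged sketch of the calculation: the usual unbiased estimator of the error variance is σ²-hat = SSR / (n − 2), and the standard errors replace σ² with σ²-hat in the variance formulas, so se(β1-hat) = σ-hat / sqrt(SST_x). The function name simple_ols_with_se and the four data points are hypothetical, chosen so the arithmetic can be checked by hand.

```python
import numpy as np

def simple_ols_with_se(x, y):
    """Simple OLS of y on x with the error-variance estimate and standard errors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    x_dev = x - x.mean()
    sst_x = np.sum(x_dev ** 2)

    b1 = np.sum(x_dev * (y - y.mean())) / sst_x
    b0 = y.mean() - b1 * x.mean()

    resid = y - b0 - b1 * x
    sigma2_hat = np.sum(resid ** 2) / (n - 2)   # two degrees of freedom used by b0 and b1

    se_b1 = np.sqrt(sigma2_hat / sst_x)
    se_b0 = np.sqrt(sigma2_hat * np.mean(x ** 2) / sst_x)
    return {"b0": b0, "b1": b1, "sigma2_hat": sigma2_hat,
            "se_b0": se_b0, "se_b1": se_b1}

# Tiny made-up data set
result = simple_ols_with_se([0, 1, 2, 3], [1, 3, 2, 5])
for name, value in result.items():
    print(name, round(value, 4))
```

For these four points SST_x = 5 and SSR = 2.7, so σ²-hat = 1.35 and se(β1-hat) = sqrt(1.35 / 5).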
The Simple Regression Model (34 of 39)
Note that the statistical properties of OLS are no different when x is binary
This regression allows the mean value of y to differ depending on the state of x
Regression on a binary explanatory variable
Suppose that x is either equal to 0 or 1
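With a binary x, the OLS intercept equals the sample mean of y in the x = 0 group and the OLS slope equals the difference between the two group means. The following sketch checks this on simulated data. The data-generating values are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.integers(0, 2, size=n)                      # binary regressor: 0 or 1
y = 2.0 + 1.0 * x + rng.normal(0.0, 1.0, size=n)    # assumed population model

# Usual simple-regression OLS formulas
x_dev = x - x.mean()
b1 = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)
b0 = y.mean() - b1 * x.mean()

# Group means of y
mean_y0 = y[x == 0].mean()
mean_y1 = y[x == 1].mean()

print("intercept:", b0, " mean of y when x = 0:", mean_y0)
print("slope:    ", b1, " difference in means: ", mean_y1 - mean_y0)
```

The two printed pairs agree up to floating-point rounding, whatever seed or parameter values are used.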
The Simple Regression Model (35 of 39)
Note that we never actually observe an individual's treatment effect, since for a given i we observe either yi(1) or yi(0), but never both.
Let the average treatment effect be defined as:
Counterfactual outcomes, causality and policy analysis
In policy analysis, define a treatment effect as:
The Simple Regression Model (36 of 39)
Therefore, regressing y on x will give us an estimate of the (constant) treatment effect.
As long as we have random assignment, OLS will yield an unbiased estimator for the treatment effect τ.
This can be written as:
Counterfactual outcomes, causality and policy analysis (contd.)
Let xi be a binary policy variable.
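A minimal simulation of this argument, under stated assumptions: the observed outcome is yi = (1 − xi) yi(0) + xi yi(1), and the treatment effect is a constant τ, so yi(1) = yi(0) + τ. The value of τ, the distribution of yi(0), and the selection rule in the second case are hypothetical choices for illustration. The helper slope_on_binary uses the fact that the OLS slope on a binary regressor equals the difference in group means.

```python
import numpy as np

rng = np.random.default_rng(2)
n, tau = 10_000, 1.5                   # assumed sample size and constant treatment effect

y0 = rng.normal(10.0, 2.0, size=n)     # potential outcome without treatment
y1 = y0 + tau                          # potential outcome with treatment

def slope_on_binary(x, y):
    """OLS slope of y on a binary x, computed as the difference in group means."""
    return y[x == 1].mean() - y[x == 0].mean()

# Case 1: random assignment, x is independent of the potential outcomes
x_random = rng.integers(0, 2, size=n)
y_obs_random = np.where(x_random == 1, y1, y0)    # only one potential outcome is observed
tau_hat_random = slope_on_binary(x_random, y_obs_random)

# Case 2: non-random assignment, units with high y(0) select into treatment
x_selected = (y0 > np.median(y0)).astype(int)
y_obs_selected = np.where(x_selected == 1, y1, y0)
tau_hat_selected = slope_on_binary(x_selected, y_obs_selected)

print("true effect:", tau)
print("estimate under random assignment:", tau_hat_random)
print("estimate under self-selection:   ", tau_hat_selected)
```

Under random assignment the estimate is close to τ. Under the selection rule it is far too large, because the treated group would have had higher outcomes even without treatment.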
The Simple Regression Model (37 of 39)
Random assignment
Subjects are randomly assigned into treatment and control groups such that there are no systematic differences between the two groups other than the treatment.
In practice, randomized control trials (RCTs) are expensive to implement and may raise ethical issues.
Though RCTs are often not feasible in economics, it is useful to think about the kind of experiment you would run if random assignment were a possibility. This helps in identifying the potential impediments to random assignment (that we could conceivably control for in a multivariate regression).
The Simple Regression Model (38 of 39)
Those who participated in the training program had average earnings $1,790 higher than those who did not participate.
This represents a 39.3% increase over the $4,550 average earnings of those who did not participate.
Example: The effects of a job training program on earnings
Real earnings are regressed on a binary variable indicating participation in a job training program.
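The percentage quoted for this example follows directly from the two dollar figures. A short check, using only the numbers given on the slide (the variable names are illustrative):

```python
# Figures from the slide, in dollars
control_mean = 4550.0      # average earnings of non-participants (the intercept)
earnings_gain = 1790.0     # estimated coefficient on the participation dummy

treated_mean = control_mean + earnings_gain           # implied average for participants
pct_increase = 100.0 * earnings_gain / control_mean   # gain relative to non-participants

print(f"implied average earnings of participants: ${treated_mean:,.0f}")
print(f"percentage increase: {pct_increase:.1f}%")
```

This prints an implied participant average of $6,340 and an increase of 39.3%.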
The Simple Regression Model (39 of 39)