Binomial variable – free throws

Refer back to the HW#1 functions all_pdfs and cdf. We can graph the pdf of this variable, the number of free throws made out of 20 by a player who makes each one with probability 0.4:

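import math
import pandas as pd
import matplotlib.pyplot as plt

# One free throw is a Bernoulli variable: 0 = miss, 1 = make
Bernoulli = [0, 1]
p_Bernoulli = [0.6, 0.4]
Bernoulli_dict = {'outcome': Bernoulli, 'prob': p_Bernoulli}
Bernoulli_df = pd.DataFrame(data=Bernoulli_dict, index=('miss', 'make'))

def choose(x, n):
    # Number of ways to pick x successes out of n trials
    return math.factorial(n) / (math.factorial(x) * math.factorial(n - x))

def pdf(x, n, df):
    # Binomial probability of exactly x makes in n attempts
    return choose(x, n) * (df.loc['make', 'prob']**x) * (df.loc['miss', 'prob']**(n - x))

def all_pdfs(n, df):
    # Probabilities of 0, 1, ..., n makes, rounded to 6 decimals
    PDF = []
    for i in range(n + 1):
        PDF.append(pdf(i, n, df))
    PDF = [round(num, 6) for num in PDF]
    return PDF

P = all_pdfs(20, Bernoulli_df)

plt.plot(P)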

Introductory Econometrics: A Modern Approach (7e)

© 2020 Cengage. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.

Binomial variable – free throws

On the graph, the x axis is the number of free throws made (out of 20) and the y axis gives the probability of each outcome. The probability is highest at 8 free throws made.

This peak coincides with the expected value, since the expected value of a binomial variable is np = 20 × 0.4 = 8.

We can also plot the cdf, which gives the cumulative probabilities. At 8 free throws made, for example, the cdf tells us the probability that the player makes anywhere from 0 to 8 free throws.

def cdf(p):
    # Running sum of the pdf values: C[k] = P(X <= k)
    C = []
    C.append(p[0])
    for i in range(len(p) - 1):
        C.append(p[i + 1] + C[i])
    C = [round(num, 4) for num in C]
    return C

CDF = cdf(P)

plt.plot(CDF)
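As a cross-check on the hand-written pdf and cdf functions, the same quantities can be pulled from scipy.stats.binom. This is only a sketch; the variable names (k, pmf, cdf_values, mode, mean) are illustrative and not part of the homework code.

```python
import numpy as np
from scipy import stats

n, p = 20, 0.4                           # 20 free throws, 40% chance of a make
k = np.arange(n + 1)                     # possible numbers of makes: 0..20

pmf = stats.binom.pmf(k, n, p)           # P(X = k)
cdf_values = stats.binom.cdf(k, n, p)    # P(X <= k)

mode = int(k[np.argmax(pmf)])            # most likely number of makes
mean = stats.binom.mean(n, p)            # expected value n*p

print(mode, mean)
print(cdf_values[8])                     # probability of 0 to 8 makes
```

The cdf is simply the running sum of the pmf, which is what the loop in cdf(p) computes.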

Testing

Let’s assume that a player has shot 40% on three-pointers for a number of years.

This player, however, spent the off-season training with Steph Curry, arguably the best shooter of all time.

Through the first 20 games of the season, this player has taken 200 three-pointers and made 96 of them (48%).

We want to test: did training with Steph make the player a better shooter?

Testing

The player has made 96 out of 200, but shooting is a random variable, so we can’t simply conclude that the player has gotten better. Instead, we test:

H0 (our “null”): the player has not gotten better: p = three-point percentage = 40%.

H1 (our alternative): the player HAS gotten better: p = three-point percentage > 40%.

Note that this is a one-sided test.

NOTE: We are viewing the sample as 200 Bernoulli variables!

First – Two-sided Confidence Interval

We will create two: a 95% and a 99% confidence interval.

The 95% interval is constructed so that, across repeated samples, 95% of the intervals built this way contain the “true” probability.

Likewise, 99% of the intervals built at the 99% level contain the “true” probability.

Confidence Interval

Our two-sided confidence intervals are:

$\hat{p} \pm z_{\alpha/2} \cdot SE$, OR

$\hat{p} \pm z_{\alpha/2} \cdot s/\sqrt{n}$, where $s$ is the sample standard deviation.

What is the sample standard deviation here?

We are testing a Bernoulli variable here, so the standard deviation is $s = \sqrt{\hat{p}(1-\hat{p})}$.

The standard deviation of the variable is $\sqrt{0.48 \times 0.52} \approx 0.50$.

Our standard error is then $0.50/\sqrt{200} \approx 0.0353$.

95%: $0.48 \pm 1.96 \times 0.0353 = 0.48 \pm 0.0692$

99%: $0.48 \pm 2.58 \times 0.0353 = 0.48 \pm 0.0911$

Confidence Interval

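Our one-sided confidence intervals (lower bounds) are:

$\hat{p} - z_{\alpha} \cdot SE$, OR

$\hat{p} - z_{\alpha} \cdot s/\sqrt{n}$, where $s$ is the sample standard deviation.

What is the sample standard deviation here?

We are testing a Bernoulli variable here, so the standard deviation is $s = \sqrt{\hat{p}(1-\hat{p})}$.

The standard deviation of the variable is $\sqrt{0.48 \times 0.52} \approx 0.50$.

Our standard error is then $0.50/\sqrt{200} \approx 0.0353$.

95%: $0.48 - 1.645 \times 0.0353 = 0.48 - 0.058 = 0.422$

99%: $0.48 - 2.33 \times 0.0353 = 0.48 - 0.082 = 0.398$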
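The interval arithmetic can be reproduced in a few lines. This is a minimal sketch that takes the critical values from scipy.stats.norm.ppf (so roughly 2.576 and 2.326 rather than the rounded 2.58 and 2.33); the variable names and the results dictionary are illustrative assumptions.

```python
import math
from scipy import stats

n, makes, p0 = 200, 96, 0.40
p_hat = makes / n                              # sample proportion, 0.48
se = math.sqrt(p_hat * (1 - p_hat) / n)        # standard error of p_hat

results = {}
for level in (0.95, 0.99):
    alpha = 1 - level
    z_two = stats.norm.ppf(1 - alpha / 2)      # two-sided critical value
    z_one = stats.norm.ppf(1 - alpha)          # one-sided critical value
    two_sided = (p_hat - z_two * se, p_hat + z_two * se)
    lower_bound = p_hat - z_one * se           # one-sided lower bound
    results[level] = (two_sided, lower_bound)
    # The null value p0 is rejected (one-sided) when it lies below the lower bound
    print(level, two_sided, lower_bound, lower_bound > p0)
```

Comparing each lower bound with 0.40 is the one-sided test decision at that confidence level.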

Intervals

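Using the one-sided confidence interval, we can reject our null hypothesis at the 95% level but not at the 99% level: the null value 0.40 lies below the 95% lower bound (0.48 − 0.058 = 0.422) but above the 99% lower bound (0.48 − 0.082 = 0.398).

Using Python:

Our t-statistic:

SE=(0.48*0.52/200)**.5   # standard error of the sample proportion

Tstat=(0.48-0.40)/SE

Tstat

Our p-value:

from scipy import stats

# binom_test was removed in SciPy 1.12; binomtest returns a result object with a .pvalue attribute
stats.binomtest(96,200,p=0.4,alternative="greater").pvalue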

Testing – Bernoulli vs Binomial

Our test treated the data as 200 observations of a Bernoulli variable.

We can also view the number of makes as a single binomial variable; you will redo this test on the HW viewing it as such.

_______________________________________________________________________

Let’s say we had 100 players attend Steph’s new “Shooting Academy.” Before the academy, the players shot 40% from three-point range.

After 500 shots each, the 100 players have made:

import numpy as np

MADE=np.random.binomial(500,0.45,size=100)   # simulated number of makes for each player

plt.hist(MADE)

Does the Academy make you a better shooter?
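One way to approach this question, offered as a sketch rather than the intended solution: treat each player's shooting percentage as one observation and run a one-sided, one-sample t-test against 0.40. The seed, the generator rng, and the names pct, mean_pct and t_stat are assumptions made here for reproducibility.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)                 # arbitrary seed
shots, p0 = 500, 0.40
made = rng.binomial(shots, 0.45, size=100)     # simulated makes, as on the slide
pct = made / shots                             # each player's shooting percentage

mean_pct = pct.mean()
se = pct.std(ddof=1) / np.sqrt(len(pct))       # standard error of the mean percentage
t_stat = (mean_pct - p0) / se

# One-sided test of H0: mean percentage = 0.40 against H1: mean percentage > 0.40
result = stats.ttest_1samp(pct, p0, alternative="greater")
print(t_stat, result.statistic, result.pvalue)
```

The hand-computed t_stat and the statistic returned by ttest_1samp should agree, which is a useful check on the standard error formula.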

Testing – Continuous Variable

We want to examine whether the mention of a stock in certain social media circles precedes a period of outperformance.

Over the past six months, 15 equities have been mentioned. We wish to test whether their outperformance is greater than 0.

H0: There is no outperformance.

The outperformance of the 15 stocks for the month following mention is:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

outperformance=np.random.normal(loc=0.01,scale=0.05,size=(15,1))   # simulated monthly outperformance

outperformance

sns.displot(outperformance)

Confidence Interval (one-sided)

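What is your sample standard deviation?

What is your standard error?

What is your confidence interval if:

$t_{14,\,\alpha=0.05} = 1.761$ (one-sided; the two-sided 5% critical value is $t_{14,\,\alpha/2=0.025} = 2.145$)

Check with:

test_out=stats.ttest_1samp(outperformance,0,axis=0,alternative='greater')   # one-sided, matching the interval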
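The three questions can be worked through numerically. The sketch below uses its own simulated sample (an arbitrary seed and a flat array of 15 values, both assumptions made here), so the numbers will differ from any particular draw of outperformance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)                       # arbitrary seed for reproducibility
outperformance = rng.normal(loc=0.01, scale=0.05, size=15)

n = outperformance.size
mean = outperformance.mean()
s = outperformance.std(ddof=1)                       # sample standard deviation
se = s / np.sqrt(n)                                  # standard error of the mean

t_crit = stats.t.ppf(0.95, df=n - 1)                 # one-sided 5% critical value, 14 df
lower_bound = mean - t_crit * se                     # one-sided 95% lower bound

t_stat = mean / se                                   # test statistic for H0: mean = 0
check = stats.ttest_1samp(outperformance, 0, alternative="greater")
print(lower_bound, t_stat, check.pvalue)
```

H0 is rejected at the 5% level exactly when the lower bound is above 0, which is the same condition as a one-sided p-value below 0.05.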

Comparison: t vs. normal distribution

import numpy as np
from scipy.stats import t
import matplotlib.pyplot as plt

%matplotlib inline
# (Jupyter notebook magic; omit this line in a plain script)

rv = t(df=10, loc=0, scale=1)
x = np.linspace(rv.ppf(0.01), rv.ppf(0.99), 100)
y = rv.pdf(x)

# Use t with a very high degree of freedom as a proxy for the normal distribution
rv2 = t(df=100000, loc=0, scale=1)
xx = np.linspace(rv2.ppf(0.01), rv2.ppf(0.99), 100)
yy = rv2.pdf(xx)   # evaluate on xx, the grid this curve is plotted against

#plt.xlim(-5,5)
plt.plot(x, y)
plt.plot(xx, yy)

The Simple Regression Model

Chapter 2

Definition of the simple regression model

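“Explains variable y in terms of variable x”: $y = \beta_0 + \beta_1 x + u$, where $y$ is the dependent variable, $x$ the explanatory variable, $u$ the error term, $\beta_0$ the intercept and $\beta_1$ the slope.

The Simple Regression Model (1 of 39)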

Interpretation of the simple linear regression model

Explains how y varies with changes in x

The simple linear regression model is rarely applicable in practice, but discussing it is useful for pedagogical reasons.

The Simple Regression Model (2 of 39)

Example: A simple wage equation

Example: Soybean yield and fertilizer

The Simple Regression Model (3 of 39)

Example: wage equation

When is there a causal interpretation?

Conditional mean independence assumption

The Simple Regression Model (4 of 39)

Population regression function (PRF)

The conditional mean independence assumption implies that $E(y \mid x) = \beta_0 + \beta_1 x$.

This means that the average value of the dependent variable can be expressed as a linear function of the explanatory variable.

The Simple Regression Model (5 of 39)

The Simple Regression Model (6 of 39)

Deriving the ordinary least squares estimates

In order to estimate the regression model one needs data

A random sample of n observations

The Simple Regression Model (7 of 39)

Deriving the ordinary least squares (OLS) estimators

Defining regression residuals

Minimize the sum of the squared regression residuals

OLS estimators

The Simple Regression Model (8 of 39)
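A sketch of the three steps named on this slide, written in standard textbook notation. The symbols $\hat{u}_i$, $\bar{x}$ and $\bar{y}$ are assumed here rather than taken from the slide; the resulting slope is the sample covariance of x and y divided by the sample variance of x.

```latex
% Step 1: regression residuals
\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i

% Step 2: choose the coefficients that minimize the sum of squared residuals
\min_{\hat{\beta}_0,\,\hat{\beta}_1} \; \sum_{i=1}^{n} \hat{u}_i^{2}
  = \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^{2}

% Step 3: the resulting OLS estimators
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
                     {\sum_{i=1}^{n} (x_i - \bar{x})^{2}},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```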

OLS fits a regression line through the data points as well as possible

The Simple Regression Model (9 of 39)

CEO Salary regression

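Access data

pip install wooldridge   (run this in a terminal, not inside Python)

import wooldridge as woo
import numpy as np

ceosal1 = woo.dataWoo('ceosal1')
x = ceosal1['roe']
y = ceosal1['salary']

cov_xy = np.cov(x, y)[1, 0]          # sample covariance of roe and salary
var_x = np.cov(x, y)[0, 0]           # sample variance of roe
b1 = cov_xy / var_x                  # slope estimate
b0 = np.mean(y) - b1 * np.mean(x)    # intercept estimate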
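To see that the covariance-over-variance calculation really is the least-squares line, it can be compared with a generic line fit. The sketch below uses synthetic data (the true intercept 3 and slope 2, the seed, and the sample size are all assumptions) so that it runs without the wooldridge package.

```python
import numpy as np

rng = np.random.default_rng(42)              # arbitrary seed
x = rng.uniform(0, 50, size=200)             # synthetic regressor
u = rng.normal(0, 5, size=200)               # synthetic error term
y = 3.0 + 2.0 * x + u                        # assumed true intercept 3 and slope 2

cov_xy = np.cov(x, y)[0, 1]                  # sample covariance of x and y
var_x = np.var(x, ddof=1)                    # sample variance of x
b1 = cov_xy / var_x                          # slope estimate
b0 = y.mean() - b1 * x.mean()                # intercept estimate

slope, intercept = np.polyfit(x, y, deg=1)   # least-squares line fit for comparison
print(b0, b1)
print(intercept, slope)
```

Both approaches solve the same minimization problem, so the two pairs of numbers agree up to floating-point error.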

Example of a simple regression

CEO salary and return on equity

Fitted regression

Causal interpretation?

The Simple Regression Model (10 of 39)

CEO Salary regression

import statsmodels.formula.api as smf

reg=smf.ols(formula='salary ~ roe',data=ceosal1)

results=reg.fit()

b=results.params

plt.plot('roe','salary', data=ceosal1,color='grey',marker='o',linestyle='')

plt.plot(ceosal1['roe'], results.fittedvalues, color='black', linestyle='-')

plt.ylabel('salary')

plt.xlabel('roe')

plt.savefig('Example-2-3-3.pdf')

The Simple Regression Model (11 of 39)

Example of a simple regression

Wage and education

Fitted regression

Causal interpretation?

The Simple Regression Model (12 of 39)

Wage and Education

wage1=woo.dataWoo('wage1')

reg=smf.ols(formula='wage~educ',data=wage1)

results=reg.fit()

b=results.params

Voting and Campaign Expenditures

vote1=woo.dataWoo('vote1')

reg=smf.ols(formula='voteA~shareA',data=vote1)

results=reg.fit()

b=results.params

plt.plot('shareA','voteA',data=vote1,color='grey',marker='o',linestyle='')

plt.plot(vote1['shareA'],results.fittedvalues, color='black',linestyle='-')

plt.ylabel('voteA')

plt.xlabel('shareA')

Example of a simple regression

Voting outcomes and campaign expenditures (two parties)

Fitted regression

Causal interpretation?

The Simple Regression Model (13 of 39)

Algebraic properties of OLS regression

Properties of OLS on any sample of data

Fitted values and residuals

The Simple Regression Model (14 of 39)

This table presents fitted values and residuals for 15 CEOs. Salary is measured in thousands of dollars, and uhat = salary − salaryhat.

For example, the 12th CEO’s predicted salary is $526,023 higher than their actual salary (uhat = −526.023).

By contrast, the 5th CEO’s predicted salary is $149,493 lower than their actual salary (uhat = 149.493).

obsno	roe	salary	salaryhat	uhat
1	14.1	1095	1224.058	-129.058
2	10.9	1001	1164.854	-163.854
3	23.5	1122	1397.969	-275.969
4	5.9	578	1072.348	-494.348
5	13.8	1368	1218.508	149.493
6	20.0	1145	1333.215	-188.215
7	16.4	1078	1266.611	-188.611
8	16.3	1094	1264.761	-170.761
9	10.5	1237	1157.454	79.546
10	26.3	833	1449.773	-616.773
11	25.9	567	1442.372	-875.372
12	26.8	933	1459.023	-526.023
13	14.8	1339	1237.009	101.991
14	22.3	937	1375.768	-438.768
15	56.3	2011	2004.808	6.192

The Simple Regression Model (15 of 39)

Fitted Values

reg = smf.ols(formula='salary ~ roe', data=ceosal1)
results = reg.fit()
b = results.params

# Fitted values and residuals computed by hand from the coefficients
salary_hat = b['Intercept'] + b['roe'] * ceosal1['roe']
u_hat = ceosal1['salary'] - salary_hat

# The same quantities as stored by statsmodels
salary_hat2 = results.fittedvalues
u_hat2 = results.resid

import pandas as pd

table = pd.DataFrame({'roe': ceosal1['roe'],
                      'salary': ceosal1['salary'],
                      'salary_hat': salary_hat2,
                      'u_hat': u_hat2})

table

Confirm Properties

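np.mean(u_hat2)   # residuals average to zero (up to rounding error)

covmat=np.cov(u_hat2,ceosal1['roe'])   # off-diagonal entries: covariance of residuals and roe, also zero

np.mean(salary_hat2)-np.mean(ceosal1['salary'])   # fitted values have the same mean as salary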
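The same three properties hold for any sample, not just the CEO data. The following sketch checks them on synthetic data with NumPy only; the data-generating values, the seed, and the variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)                   # arbitrary seed
x = rng.normal(10, 3, size=100)                  # synthetic data, for illustration only
y = 1.0 + 0.5 * x + rng.normal(0, 2, size=100)

b1 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)      # OLS slope
b0 = y.mean() - b1 * x.mean()                    # OLS intercept
y_hat = b0 + b1 * x                              # fitted values
u_hat = y - y_hat                                # residuals

sum_resid = u_hat.sum()                          # property 1: residuals sum to zero
cov_resid_x = np.cov(u_hat, x)[0, 1]             # property 2: zero covariance with x
gap_at_means = y.mean() - (b0 + b1 * x.mean())   # property 3: line passes through the means
print(sum_resid, cov_resid_x, gap_at_means)
```

All three printed values are zero up to floating-point error, regardless of how the data were generated.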

Goodness of fit

How well does an explanatory variable explain the dependent variable?

Measures of variation:

The Simple Regression Model (16 of 39)

Goodness-of-fit measure (R-squared)

Decomposition of total variation

The Simple Regression Model (17 of 39)

Caution: A high R-squared does not necessarily mean that the regression has a causal interpretation!

Voting outcomes and campaign expenditures

CEO Salary and return on equity

The Simple Regression Model (18 of 39)

Goodness of Fit

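# Sample variances stand in for the sums of squares here: each is the
# corresponding sum divided by n-1, so the ratio R2 is unchanged
SST=np.var(ceosal1['salary'],ddof=1)   # total variation

SSE=np.var(salary_hat2 ,ddof=1)   # explained variation

SSR=np.var(u_hat2,ddof=1)   # residual variation

R2=SSE/SST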
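A short sketch on synthetic data shows that several routes to R-squared agree: explained over total sum of squares, one minus residual over total, the ratio of sample variances used above, and (in simple regression only) the squared correlation between x and y. The data and names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)                      # arbitrary seed
x = rng.uniform(0, 10, size=150)                    # synthetic data
y = 2.0 + 1.5 * x + rng.normal(0, 4, size=150)

b1, b0 = np.polyfit(x, y, deg=1)                    # OLS slope and intercept
y_hat = b0 + b1 * x                                 # fitted values
u_hat = y - y_hat                                   # residuals

sst = np.sum((y - y.mean()) ** 2)                   # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)               # explained sum of squares
ssr = np.sum(u_hat ** 2)                            # residual sum of squares

r2_explained = sse / sst
r2_residual = 1 - ssr / sst
r2_variances = np.var(y_hat, ddof=1) / np.var(y, ddof=1)
r2_corr = np.corrcoef(x, y)[0, 1] ** 2              # squared correlation of x and y
print(r2_explained, r2_residual, r2_variances, r2_corr)
```

The agreement between the first two values reflects the decomposition SST = SSE + SSR.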

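Incorporating nonlinearities: Semi-logarithmic form

Regression of log wages on years of education

This changes the interpretation of the regression coefficient: multiplied by 100, it gives the approximate percentage change in wages for one additional year of education.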

The Simple Regression Model (19 of 39)

Fitted regression

The Simple Regression Model (20 of 39)

Incorporating nonlinearities: Log-logarithmic form

CEO salary and firm sales

This changes the interpretation of the regression coefficient: it is now an elasticity, the percentage change in salary associated with a 1% change in sales.

The Simple Regression Model (21 of 39)

CEO salary and firm sales: fitted regression

The log-log form postulates a constant elasticity model, whereas the semi-log form assumes a semi-elasticity model.

The Simple Regression Model (22 of 39)
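To make the constant-elasticity reading concrete, here is a small simulation sketch. The data are synthetic, and the true elasticity of 0.25, the seed, and the variable names are assumptions, not estimates from the CEO data.

```python
import numpy as np

rng = np.random.default_rng(5)                          # arbitrary seed
sales = rng.lognormal(mean=8.0, sigma=1.0, size=300)    # synthetic firm sales
true_elasticity = 0.25                                  # assumed value for the simulation
log_salary = 4.0 + true_elasticity * np.log(sales) + rng.normal(0, 0.3, size=300)

# Log-log regression: the slope is the elasticity of salary with respect to sales
elasticity, intercept = np.polyfit(np.log(sales), log_salary, deg=1)

# Implied effect of a 10% increase in sales on predicted salary, in percent
pct_change = 100 * (1.10 ** elasticity - 1)
print(elasticity, pct_change)
```

The rule of thumb "10% more sales means about 10 times the elasticity percent more salary" is an approximation; the last line computes the exact implied change.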

Nonlinear models

# Semi-log: log(salary) on sales
reg=smf.ols(formula='np.log(salary) ~ sales',data=ceosal1)
results=reg.fit()
b=results.params

# Log-log: log(salary) on log(sales)
reg=smf.ols(formula='np.log(salary) ~ np.log(sales)',data=ceosal1)
results=reg.fit()
b=results.params

Expected values and variances of the OLS estimators

The estimated regression coefficients are random variables because they are calculated from a random sample

The question is what the estimators will estimate on average and how large their variability will be in repeated samples

The Simple Regression Model (23 of 39)

Standard assumptions for the linear regression model

Assumption SLR.1 (Linear in parameters)

Assumption SLR.2 (Random sampling)

The Simple Regression Model (24 of 39)

Discussion of random sampling: Wage and education

The population consists, for example, of all workers of country A

In the population, there is a linear relationship between wages (or log wages) and years of education.

Draw a worker from the population completely at random

The wage and the years of education of the worker drawn are random because one does not know beforehand which worker is drawn.

Throw that worker back into the population and repeat the random draw n times.

The wages and years of education of the sampled workers are used to estimate the linear relationship between wages and education.

The Simple Regression Model (25 of 39)

The Simple Regression Model (26 of 39)

Assumptions for the linear regression model (cont.)

Assumption SLR.3 (Sample variation in the explanatory variable)

Assumption SLR.4 (Zero conditional mean)

The Simple Regression Model (27 of 39)

Theorem 2.1 (Unbiasedness of OLS)

Interpretation of unbiasedness

The estimated coefficients may be smaller or larger, depending on the sample that results from the random draw.

However, on average, they will be equal to the values that characterize the true relationship between y and x in the population.

“On average” means if sampling were repeated, i.e., if drawing the random sample and doing the estimation were repeated many times.

In a given sample, estimates may differ considerably from the true values.

The Simple Regression Model (28 of 39)
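Repeated sampling can be mimicked with a small Monte Carlo experiment. This is a sketch under assumed population values (intercept 1, slope 0.5, normal errors independent of x); the sample size, number of repetitions, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(11)                 # arbitrary seed
beta0, beta1 = 1.0, 0.5                         # assumed "true" population values
n, reps = 50, 2000                              # sample size and number of repeated samples

slopes = np.empty(reps)
for r in range(reps):
    x = rng.normal(4, 1, size=n)                # a fresh random sample each time
    u = rng.normal(0, 2, size=n)                # error drawn independently of x
    y = beta0 + beta1 * x + u
    slopes[r] = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # OLS slope for this sample

print(slopes.mean())                            # close to beta1 on average
print(slopes.min(), slopes.max())               # single samples can be far off
```

The average of the slope estimates sits near the true value even though individual estimates scatter widely around it, which is what unbiasedness means.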

Variances of the OLS estimators

Depending on the sample, the estimates will be nearer or farther away from the true population values.

How far can we expect our estimates to be away from the true population values on average (= sampling variability)?

Sampling variability is measured by the estimators’ variances

Assumption SLR.5 (Homoskedasticity)

The Simple Regression Model (29 of 39)

Graphical illustration of homoskedasticity

The Simple Regression Model (30 of 39)

An example for heteroskedasticity: Wage and education

The Simple Regression Model (31 of 39)

Theorem 2.2 (Variances of the OLS estimators)

Under assumptions SLR.1 to SLR.5:

Conclusion:

The sampling variability of the estimated regression coefficients increases with the variability of the unobserved factors and decreases with the variation in the explanatory variable.

The Simple Regression Model (32 of 39)
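For reference, the variance formulas usually stated for this theorem, conditional on the sample values of x, are sketched below in standard textbook notation. The symbols $\sigma^2 = \operatorname{Var}(u \mid x)$ and $SST_x$ are assumed names, not taken from the slide.

```latex
\operatorname{Var}(\hat{\beta}_1)
  = \frac{\sigma^{2}}{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}
  = \frac{\sigma^{2}}{SST_x}

\operatorname{Var}(\hat{\beta}_0)
  = \frac{\sigma^{2} \, n^{-1} \sum_{i=1}^{n} x_i^{2}}{SST_x}
```

A larger error variance $\sigma^2$ raises both variances, while more variation in x (a larger $SST_x$) lowers them, which is the conclusion stated on the slide.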

Estimating the error variance

The Simple Regression Model (33 of 39)

Theorem 2.3 (Unbiasedness of the error variance)

Calculation of standard errors for regression coefficients

The estimated standard deviations of the regression coefficients are called “standard errors.” They measure how precisely the regression coefficients are estimated.

The Simple Regression Model (34 of 39)
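A sketch of the standard-error calculation on synthetic data, using the usual textbook estimator of the error variance (residual sum of squares divided by n − 2). The data-generating values, the seed, and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(21)                     # arbitrary seed
n = 120
x = rng.uniform(0, 20, size=n)                      # synthetic regressor
y = 5.0 + 0.8 * x + rng.normal(0, 3, size=n)        # assumed error standard deviation of 3

b1, b0 = np.polyfit(x, y, deg=1)                    # OLS slope and intercept
u_hat = y - (b0 + b1 * x)                           # residuals

sigma2_hat = np.sum(u_hat ** 2) / (n - 2)           # estimated error variance
sst_x = np.sum((x - x.mean()) ** 2)                 # total variation in x

se_b1 = np.sqrt(sigma2_hat / sst_x)                       # standard error of the slope
se_b0 = np.sqrt(sigma2_hat * np.mean(x ** 2) / sst_x)     # standard error of the intercept
print(np.sqrt(sigma2_hat), se_b0, se_b1)
```

The first printed number estimates the standard deviation of the error term; the other two are the standard errors that regression software reports next to the coefficients.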

Regression on a binary explanatory variable

Suppose that x is either equal to 0 or 1

This regression allows the mean value of y to differ depending on the state of x

Note that the statistical properties of OLS are no different when x is binary

The Simple Regression Model (35 of 39)
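With a binary regressor, the OLS intercept equals the mean of y in the x = 0 group and the slope equals the difference in group means. The sketch below verifies this on synthetic data; the group means, the seed, and the names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)                      # arbitrary seed
x = rng.integers(0, 2, size=400)                    # binary regressor: 0 or 1
y = 10.0 + 2.0 * x + rng.normal(0, 1.5, size=400)   # assumed data-generating process

b1 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)         # OLS slope
b0 = y.mean() - b1 * x.mean()                       # OLS intercept

mean_0 = y[x == 0].mean()                           # average y when x = 0
mean_1 = y[x == 1].mean()                           # average y when x = 1
print(b0, mean_0)
print(b1, mean_1 - mean_0)
```

Each printed pair matches up to floating-point error; this is an algebraic identity, not a feature of the simulated data.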

Counterfactual outcomes, causality and policy analysis

In policy analysis, define a treatment effect as: $te_i = y_i(1) - y_i(0)$

Note that we will never actually observe this, since we observe either $y_i(1)$ or $y_i(0)$ for a given i, but never both.

Let the average treatment effect be defined as: $\tau_{ate} = E[te_i] = E[y_i(1) - y_i(0)]$

The Simple Regression Model (36 of 39)

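Counterfactual outcomes, causality and policy analysis (contd.)

Let $x_i$ be a binary policy variable, so that the observed outcome is $y_i = (1 - x_i)\,y_i(0) + x_i\,y_i(1)$.

With a constant treatment effect τ, this can be written as: $y_i = \alpha_0 + \tau x_i + u_i(0)$, where $\alpha_0 = E[y_i(0)]$ and $u_i(0) = y_i(0) - \alpha_0$.

Therefore, regressing y on x will give us an estimate of the (constant) treatment effect.

As long as we have random assignment, OLS will yield an unbiased estimator for the treatment effect τ.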

The Simple Regression Model (37 of 39)
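A simulation sketch of this argument: generate both potential outcomes, assign treatment at random, keep only the outcome that would be observed, and regress y on x. The constant effect of 1.5, the seed, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(13)                     # arbitrary seed
n, tau = 5000, 1.5                                  # tau: assumed constant treatment effect
y0 = 4.0 + rng.normal(0, 2, size=n)                 # potential outcome without treatment
y1 = y0 + tau                                       # potential outcome with treatment
x = rng.integers(0, 2, size=n)                      # random assignment, independent of y0

y = (1 - x) * y0 + x * y1                           # only one potential outcome is observed
tau_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)    # OLS slope from regressing y on x
print(tau_hat)
```

Because assignment is independent of the potential outcomes, the slope lands close to the assumed effect; if x were correlated with y0, the same regression would be biased.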

Random assignment

Subjects are randomly assigned into treatment and control groups such that there are no systematic differences between the two groups other than the treatment.

In practice, randomized control trials (RCTs) are expensive to implement and may raise ethical issues.

Though RCTs are often not feasible in economics, it is useful to think about the kind of experiment you would run if random assignment were a possibility. This helps in identifying the potential impediments to random assignment (that we could conceivably control for in a multivariate regression).

The Simple Regression Model (38 of 39)

Example: The effects of a job training program on earnings

Real earnings are regressed on a binary variable indicating participation in a job training program.

Those who participated in the training program have earnings $1,790 higher than those who did not participate.

This represents a 39.3% increase over the $4,550 average earnings of those who did not participate.

The Simple Regression Model (39 of 39)
