elem econ homework

shazuanzhe
Simpleregression.pptx

Simple regression

The simple linear regression model

The simple regression model (also called the bivariate regression model) is a (very unrealistic) model in which the dependent variable is a function of only one variable (and a constant term):

Y = α + βX + e

This is a little different from what we had before, since there are no control variables.

While we are working with this model, we will assume that it is “true”. That is, we assume it exactly describes the relationship between Y and X. This is of course an unrealistic assumption for most real-world examples, but it provides a nice starting point for us to take a closer look at regression.

The error term

I am going to switch up notation here to emphasize that we are assuming that this is a “true” model. Here e does not refer to the residual, but to the error term (or disturbance).

The error term is a random variable that captures all unobserved components that have an effect on our dependent variable.

We can without much fuss make a simplifying assumption:

E[e]=0

Moreover, we make another very important assumption, mean independence:

E[e|X]=E[e]

Technically we could have combined the two assumptions into one: E[e|X]=0, and that assumption would buy us E[e]=0 with the law of iterated expectations.
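For reference, the iterated-expectations step can be written out in one line. This is only a sketch in LaTeX notation, using the same symbols as above:

```latex
E[e] = E\big[\,E[e \mid X]\,\big] = E[0] = 0
```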

Mean independence… ceteris paribus

It’s worth it to unpack the mean-independence assumption and understand what it means…

If E[e|X]=E[e], then the expected value of e does not change if we condition on X.

In other words, all other factors that also affect our dependent variable are on average the same regardless of the value of X.

Let’s think back to our example of health insurance and health, where our treatment variable is binary. The mean independence assumption tells us that the average value of all factors other than health insurance should be the same in the treatment and control groups.

This was exactly what we needed for a simple difference in group averages to buy us the average causal effect. This is no coincidence: it turns out that if we regress just health on health insurance, we end up with a simple difference of group averages (more on that later).
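A quick numerical check of that claim, as a sketch on simulated data. The numbers, the variable names, and the helper `ols_slope_intercept` are made up for illustration; the helper uses the standard simple-regression slope (sample covariance over sample variance).

```python
import random

def ols_slope_intercept(x, y):
    """Return (alpha_hat, beta_hat) from the sample covariance/variance formula."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

random.seed(0)
# Hypothetical data: X = 1 if insured, 0 otherwise; Y = a made-up health index.
x = [random.randint(0, 1) for _ in range(200)]
y = [10 + 2 * xi + random.gauss(0, 1) for xi in x]

treated = [yi for xi, yi in zip(x, y) if xi == 1]
control = [yi for xi, yi in zip(x, y) if xi == 0]
control_mean = sum(control) / len(control)
diff_in_means = sum(treated) / len(treated) - control_mean

alpha_hat, beta_hat = ols_slope_intercept(x, y)
print("difference in group means:", round(diff_in_means, 6))
print("regression slope:         ", round(beta_hat, 6))
print("control mean vs intercept:", round(control_mean, 6), round(alpha_hat, 6))
```

With a binary X the two numbers agree up to floating-point error, and the intercept equals the control-group average.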

Regression and CEFs

With our assumptions in hand, we are ready to look at the regression equation as a CEF:

E[Y|X] = α + βX

This tells us that conditional on X, the expected value of Y is a linear function of X (and only X).
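The step from the model to the CEF is short. A sketch, assuming the model is Y = α + βX + e and using the combined assumption E[e|X]=0:

```latex
E[Y \mid X] = E[\alpha + \beta X + e \mid X]
            = \alpha + \beta X + E[e \mid X]
            = \alpha + \beta X
```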

Assuming this model is the truth, we call the conditional expectation function above the population regression function (PRF).

Note that the PRF tells us how the average value of Y changes with X, not how Y itself changes with X.

In other words, if X increases by one unit, then on average Y increases by β units. This relationship need not hold for every member of the relevant population, as long as the deviations from the PRF value are not systematic. In other words, deviations of observed Y values from the PRF values are due solely to “noise”.

Solving the PRF

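We assumed that E[e]=0. With this assumption combined with the mean-independence assumption, we also have E[eX]=0. These two equations actually determine the parameters α and β (don’t worry about the reason why). Substituting e = Y - α - βX, they read:

E[Y - α - βX] = 0

E[X(Y - α - βX)] = 0

We can solve for the parameters by solving the above system of equations.

From the first equation we have

α = E[Y] - βE[X]

Plugging this into the second equation we have:

E[X(Y - E[Y] - β(X - E[X]))] = 0

Rearranging:

E[X(Y - E[Y])] = βE[X(X - E[X])]

With some more algebraic manipulation we get

β = E[(X - E[X])(Y - E[Y])] / E[(X - E[X])^2] = Cov(X,Y) / Var(X)

You don’t need to memorize the derivation, but for the curious: in the last step we use the fact that E[X-E[X]]=0 (and likewise E[Y-E[Y]]=0), which lets us replace X with X-E[X] on both sides.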
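To see that the two moment conditions really do pin down the parameters, here is a small sketch. It treats a made-up list of (x, y) pairs as an entire population with equal probabilities, computes β as Cov(X,Y)/Var(X) and α as E[Y] - βE[X], and checks that the implied e satisfies E[e]=0 and E[eX]=0. The data and the helper `expect` are hypothetical.

```python
# A tiny made-up "population": each (x, y) pair is equally likely.
population = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

def expect(values):
    """Expected value under equal probabilities."""
    values = list(values)
    return sum(values) / len(values)

ex = expect(x for x, _ in population)
ey = expect(y for _, y in population)
cov_xy = expect((x - ex) * (y - ey) for x, y in population)
var_x = expect((x - ex) ** 2 for x, _ in population)

beta = cov_xy / var_x
alpha = ey - beta * ex

# Implied error term e = Y - alpha - beta * X for every member of the population.
errors = [y - alpha - beta * x for x, y in population]
mean_e = expect(errors)
mean_ex = expect(e * x for e, (x, _) in zip(errors, population))

print("beta =", round(beta, 4), " alpha =", round(alpha, 4))
print("E[e] =", round(mean_e, 10), " E[eX] =", round(mean_ex, 10))
```

Both expectations come out as zero up to floating-point error. No other pair (α, β) satisfies both conditions for this population.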

β and correlation

You may notice that the formula for β looks very similar to the formula for a correlation coefficient (except for the denominator). We can think of β as being a measure of how X and Y are linearly correlated. Some clear implications arise:

If X and Y are positively correlated, then β is positive.

If X and Y are negatively correlated, then β is negative.

If X and Y have no correlation (zero covariance), then β=0.
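The link between β and the correlation coefficient can be made explicit. A sketch in LaTeX, assuming the standard slope formula β = Cov(X,Y)/Var(X); the symbols σ_X and σ_Y for the standard deviations of X and Y are notation introduced here, not part of the slides:

```latex
\beta = \frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)}
      = \operatorname{Corr}(X,Y)\,\frac{\sigma_Y}{\sigma_X},
\qquad
\operatorname{Corr}(X,Y) = \frac{\operatorname{Cov}(X,Y)}{\sigma_X\,\sigma_Y}
```

Since σ_X and σ_Y are positive, β always has the same sign as the correlation, which is where the three implications above come from.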

Estimation: thinking graphically

When working with simple regression, it is intuitive to think about estimating the parameters graphically.

We start with a scatter plot of data. A scatter plot places one variable on the y-axis (in our case the dependent variable) and another on the x-axis (the treatment variable). Each point represents a single observation in the data, with the y-value representing the value of the y variable for that observation and the x-value representing the value of the x variable.

In this plot, I am using data from a pre-loaded dataset in R with Motor Trend car information.

Y-axis: average miles per gallon of a car

X-axis: weight of the car (in 1000’s of lbs)

Given the assumption that the PRF is linear, we wish to estimate the parameters by drawing a straight line through the data points that comes as close as possible to each point.

Note here that inclusion of the constant term, α, is extremely important. Without it, we would be forced to draw a line that goes through the origin, and such a line generally cannot both come close to the points and display the correct relationship.
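To illustrate why the constant term matters, here is a sketch that fits a line with and without it. The data are made up to resemble an mpg-versus-weight scatter plot; they are not the real Motor Trend values. The through-origin slope uses the standard least-squares formula for a line with no constant, sum(x*y) / sum(x^2).

```python
# Made-up data shaped like the mpg-vs-weight scatter plot (not the real dataset).
weight = [1.8, 2.2, 2.6, 3.0, 3.4, 3.8, 4.2, 5.0]   # 1000s of lbs
mpg = [31.0, 27.5, 24.0, 21.5, 19.0, 17.5, 15.0, 11.5]

n = len(weight)
x_bar = sum(weight) / n
y_bar = sum(mpg) / n

# Line with a constant term: slope from the usual covariance/variance ratio.
beta_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, mpg))
            / sum((x - x_bar) ** 2 for x in weight))
alpha_hat = y_bar - beta_hat * x_bar

# Line forced through the origin: least-squares slope is sum(x*y) / sum(x^2).
slope_origin = (sum(x * y for x, y in zip(weight, mpg))
                / sum(x ** 2 for x in weight))

print("with constant:  mpg = %.2f + %.2f * weight" % (alpha_hat, beta_hat))
print("through origin: mpg = %.2f * weight" % slope_origin)
```

In this example the line with a constant slopes downward, as the data do. The line through the origin is forced to slope upward, because every weight and every mpg value is positive.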

We choose where to draw our line by seeking to minimize the sum of squared residuals, so as to get the line as close as possible to all points.

The residual in the graphical context is the vertical distance between the actual observed value of mpg for an observation and the fitted value from the regression line.

Why are we minimizing a sum of squared residuals?

Some residuals are negative, so if we simply added them up, positive and negative residuals would cancel out. Squaring makes every residual count as a positive number.

Squaring the residuals also places a hefty penalty on large deviations from the regression line.
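A small sketch of both points, again on made-up mpg-versus-weight data (not the real dataset). It compares the least-squares line with a flat line at the average of mpg. The helper names `residuals` and `ssr` are hypothetical.

```python
# Made-up data shaped like the mpg-vs-weight scatter plot (not the real dataset).
weight = [1.8, 2.2, 2.6, 3.0, 3.4, 3.8, 4.2, 5.0]
mpg = [31.0, 27.5, 24.0, 21.5, 19.0, 17.5, 15.0, 11.5]

def residuals(alpha, beta):
    """Vertical distances between observed mpg and the line alpha + beta * weight."""
    return [y - (alpha + beta * x) for x, y in zip(weight, mpg)]

def ssr(alpha, beta):
    """Sum of squared residuals for a candidate line."""
    return sum(r ** 2 for r in residuals(alpha, beta))

n = len(weight)
x_bar = sum(weight) / n
y_bar = sum(mpg) / n
beta_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, mpg))
            / sum((x - x_bar) ** 2 for x in weight))
alpha_hat = y_bar - beta_hat * x_bar

# A flat line at the mean of mpg: its raw residuals also sum to zero,
# even though it ignores the relationship entirely.
flat_sum = sum(residuals(y_bar, 0.0))
ols_sum = sum(residuals(alpha_hat, beta_hat))

print("sum of residuals, flat line:", round(flat_sum, 8))
print("sum of residuals, OLS line: ", round(ols_sum, 8))
print("SSR, flat line:", round(ssr(y_bar, 0.0), 3))
print("SSR, OLS line: ", round(ssr(alpha_hat, beta_hat), 3))
```

The raw residuals sum to (essentially) zero for both lines, so that sum cannot tell a good line from a bad one. The sum of squared residuals can, and nudging the least-squares intercept or slope in either direction only increases it.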

Thinking about scatterplots

It is also useful in general to examine scatterplots in simple regressions to determine if it is reasonable to assume that the PRF is correctly specified and linear.

In our example, there is a clear negative relationship that looks like it may plausibly be linear.

However, sometimes the scatter plot may look more like a cluster%#@!, as in the comic to the right. In these cases we should be skeptical that we are estimating a “true” model.

Further, there could be a clear relationship that is non-linear. This would tell us that OLS will get us a linear approximation of the true PRF, but it will nevertheless be a biased estimate.

Estimation: algebraically

We won’t work through mathematical derivations for the formulas for the estimates of the simple regression parameters. Instead I will casually note that whenever we have a good idea as to what the population parameter may be, we can often replace expected values in the parameter with sample averages to get the corresponding estimate. For the simple regression parameters this gives:

β̂ = Σ(X_i - X̄)(Y_i - Ȳ) / Σ(X_i - X̄)^2

α̂ = Ȳ - β̂X̄
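The plug-in idea can be sketched in a few lines. Every expected value in β = Cov(X,Y)/Var(X) and α = E[Y] - βE[X] is swapped for a sample average. The function names `mean` and `sample_analog_ols` and the data are hypothetical, chosen so the answer is easy to check by hand.

```python
def mean(values):
    """Sample average: the stand-in for an expected value."""
    values = list(values)
    return sum(values) / len(values)

def sample_analog_ols(x, y):
    """Plug sample averages into beta = Cov(X,Y)/Var(X), alpha = E[Y] - beta*E[X]."""
    x_bar, y_bar = mean(x), mean(y)
    cov_hat = mean((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_hat = mean((xi - x_bar) ** 2 for xi in x)
    beta_hat = cov_hat / var_hat
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

# Hypothetical sample lying exactly on the line y = 3 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
alpha_hat, beta_hat = sample_analog_ols(x, y)
print(alpha_hat, beta_hat)  # prints 3.0 2.0
```

Because the points sit exactly on a line, the estimates recover its intercept and slope. With noisy data they would only come close.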

Note that this buys us that

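β̂, biased or unbiased?

We can show that given our assumptions on the error term, e, our estimates are unbiased (recall what it means for an estimate to be unbiased). Start from the sample-analog estimate of the slope:

β̂ = Σ(X_i - X̄)(Y_i - Ȳ) / Σ(X_i - X̄)^2 = Σ(X_i - X̄)Y_i / Σ(X_i - X̄)^2

Plugging in Y_i = α + βX_i + e_i:

β̂ = Σ(X_i - X̄)(α + βX_i + e_i) / Σ(X_i - X̄)^2

Rearranging and reducing:

β̂ = β + Σ(X_i - X̄)e_i / Σ(X_i - X̄)^2

Using the law of iterated expectations, E[β̂] = E[E[β̂|X]]:

E[β̂] = β + E[ E[ Σ(X_i - X̄)e_i / Σ(X_i - X̄)^2 | X ] ]

And recalling that functions of X can be pulled out of the CEF:

E[β̂] = β + E[ Σ(X_i - X̄)E[e_i|X] / Σ(X_i - X̄)^2 ] = β

Here conditioning on X means conditioning on all the X_i in the sample, and E[e_i|X]=0 follows from our mean-independence assumption (together with random sampling).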
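Unbiasedness can also be checked by simulation. This is a sketch under stated assumptions: the "true" parameters, the sample size, the number of repetitions, and the distributions of X and e are all chosen arbitrarily. The error is drawn independently of X with mean zero, so E[e|X]=0 holds by construction.

```python
import random

def ols_slope(x, y):
    """Slope estimate: sample covariance of x and y over sample variance of x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

random.seed(42)
ALPHA, BETA = 1.0, 2.0      # "true" parameters, chosen arbitrarily
N_OBS, N_REPS = 30, 2000    # sample size and number of simulated samples

slopes = []
for _ in range(N_REPS):
    x = [random.uniform(0, 10) for _ in range(N_OBS)]
    # e is independent of X with mean zero, so E[e|X] = 0 holds.
    y = [ALPHA + BETA * xi + random.gauss(0, 1) for xi in x]
    slopes.append(ols_slope(x, y))

avg_slope = sum(slopes) / N_REPS
print("average slope estimate:", round(avg_slope, 3))
print("true slope:            ", BETA)
```

Any single estimate misses the true slope by a little, but the average across many samples lands very close to it, which is what unbiasedness means.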