econ sse 2
2nd Statistical computing assignment
1) The data in 401K are a subset of data analyzed by Papke (1995) to study the relationship between participation in a 401(k) pension plan and the generosity of the plan. The variable prate is the percentage of eligible workers with an active account; this is the variable we would like to explain. The measure of generosity is the plan match rate, mrate. This variable gives the average amount the firm contributes to each worker's plan for each $1 contribution by the worker. For example, if mrate = 0.50, then a $1 contribution by the worker is matched by a 50-cent contribution by the firm.
a. Find the average participation rate and the average match rate in the sample of plans.
b. Now estimate the simple regression equation
$$\widehat{prate} = \hat\beta_0 + \hat\beta_1\, mrate$$
and report the results along with the sample size and R-squared.
c. Interpret the intercept in your equation. Interpret the coefficient on mrate.
d. Find the predicted prate when mrate = 3.5. Is this a reasonable prediction? Explain what is happening here.
e. How much of the variation in prate is explained by the variation in mrate? Is this a lot, in your opinion?
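The quantities in parts (b) through (e) all come from the simple OLS formulas. As a minimal sketch, assuming the two columns have already been loaded into Python lists (the names simple_ols and predicted_rate are hypothetical helpers, not part of any library), the intercept, slope, R-squared, and a feasibility check on a prediction could be computed like this:

```python
def simple_ols(x, y):
    """OLS fit of y on x: returns (intercept, slope, r_squared, n)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no sample variation")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R-squared = 1 - SSR / SST: share of the variation in y explained by x.
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    if sst == 0:
        raise ValueError("y has no sample variation")
    return intercept, slope, 1.0 - ssr / sst, n


def predicted_rate(intercept, slope, x, lower=0.0, upper=100.0):
    """Fitted value at x, plus whether it is a feasible percentage."""
    y_hat = intercept + slope * x
    return y_hat, lower <= y_hat <= upper


if __name__ == "__main__":
    # Hypothetical toy numbers for illustration only, NOT the 401K data.
    mrate = [0.2, 0.5, 1.0, 2.0]
    prate = [78.0, 84.0, 88.0, 96.0]
    b0, b1, r2, n = simple_ols(mrate, prate)
    print(f"prate-hat = {b0:.2f} + {b1:.2f} mrate, n = {n}, R^2 = {r2:.3f}")
    print(predicted_rate(b0, b1, 3.5))
```

Because prate is a percentage and cannot exceed 100, a straight-line fit can produce infeasible fitted values for large match rates; the flag returned by predicted_rate makes that visible.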
2) The data set CEOSAL2 contains information on chief executive officers for U.S. corporations. The variable salary is annual compensation, in thousands of dollars, and ceoten is the prior number of years as company CEO.
a. Find the average salary and the average tenure in the sample.
b. How many CEOs are in their very first year as CEO (that is, ceoten = 0)? What is the longest tenure as a CEO?
c. Estimate the simple regression model
$$\log(salary) = \beta_0 + \beta_1\, ceoten + u$$
and report your results in the usual form. What is the (approximate) predicted percentage increase in salary given one more year as a CEO?
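Part (c) uses a log-level model, so 100 times the slope is only an approximate percentage change. The sketch below (hypothetical helper names) counts first-year CEOs, finds the longest tenure, and contrasts the approximate and exact percentage effects of a given slope. The slope itself would come from regressing log(salary), not salary, on ceoten with any OLS routine.

```python
import math


def tenure_summary(ceoten):
    """Number of first-year CEOs (ceoten == 0) and the longest tenure."""
    first_year = sum(1 for t in ceoten if t == 0)
    return first_year, max(ceoten)


def log_level_effect(slope, delta_x=1.0):
    """Percentage change in y implied by log(y) = b0 + b1*x + u.

    Returns (approximate, exact): 100*b1*dx, which is accurate only when
    b1*dx is small, and the exact 100*(exp(b1*dx) - 1).
    """
    approx = 100.0 * slope * delta_x
    exact = 100.0 * (math.exp(slope * delta_x) - 1.0)
    return approx, exact


if __name__ == "__main__":
    # Hypothetical tenures and slope for illustration, NOT CEOSAL2 results.
    print(tenure_summary([0, 2, 7, 0, 15]))
    print(log_level_effect(0.02))
```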
3) Use the data in COUNTYMURDERS to answer the following parts. Use only the data for 1996.
a. How many counties had zero murders in 1996? How many counties had at least one execution? What is the largest number of executions?
b. Estimate the equation
$$murders = \beta_0 + \beta_1\, execs + u$$
by OLS and report the results in the usual way, including the sample size and R-squared.
c. Interpret the slope coefficient in part (b). Does the estimated equation suggest a deterrent effect of capital punishment?
d. What is the smallest number of murders that can be predicted by the equation? What is the residual for a county with zero executions and zero murders?
e. Explain why a simple regression analysis is not well suited for determining whether capital punishment has a deterrent effect on murders.
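The counting in part (a) and the arithmetic in part (d) can be sketched in a few lines. This assumes, hypothetically, that each county-year is a dict with 'year', 'murders' and 'execs' keys; the function names are illustrative, and the intercept and slope would come from the part (b) regression.

```python
def county_counts(rows, year=1996):
    """Part (a): counts using only rows from the given year.

    Assumes each row is a dict with 'year', 'murders' and 'execs' keys
    (a hypothetical layout; adapt it to how the data are loaded).
    """
    sample = [r for r in rows if r["year"] == year]
    zero_murders = sum(1 for r in sample if r["murders"] == 0)
    any_execution = sum(1 for r in sample if r["execs"] >= 1)
    max_execs = max(r["execs"] for r in sample)
    return zero_murders, any_execution, max_execs


def smallest_fitted(intercept, slope, x_values):
    """Part (d): smallest fitted value over the given regressor values."""
    return min(intercept + slope * x for x in x_values)


def residual(y, intercept, slope, x):
    """Residual = actual value minus fitted value."""
    return y - (intercept + slope * x)
```

Since execs cannot be negative, passing range(0, max_execs + 1) as x_values covers the feasible values. With a positive slope the minimum is the intercept, at execs = 0, and the residual for a county with zero murders and zero executions is then minus the intercept.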
4) The data set CATHOLIC includes test score information on over 7,000 students in the U.S. who were in eighth grade in 1988. The variables math12 and read12 are scores on twelfth-grade standardized math and reading tests, respectively.
a. How many students are in the sample? Find the means and standard deviations of math12 and read12.
b. Run the simple regression of math12 on read12 to obtain the OLS intercept and slope estimates. Report the results in the form
$$\widehat{math12} = \hat\beta_0 + \hat\beta_1\, read12, \qquad n = ?, \qquad R^2 = ?$$
where you fill in the values for $\hat\beta_0$ and $\hat\beta_1$ and also replace the question marks.
c. Does the intercept reported in (b) have a meaningful economic interpretation? Explain.
d. Are you surprised by the $\hat\beta_1$ you found? What about $R^2$?
e. Suppose that you present your findings to a superintendent of a school district, and the superintendent says, "Your findings show that to improve math scores we just need to improve reading scores, so we should hire more reading tutors." How would you respond to this comment?
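For parts (a) and (d), it helps to recall that in simple regression the OLS slope equals the sample correlation times $s_y/s_x$, and R-squared equals the squared correlation. A minimal sketch using Python's standard statistics module, assuming math12 and read12 have been loaded as lists (function names are hypothetical):

```python
import statistics


def describe(values):
    """Sample size, mean and sample standard deviation (n - 1 divisor)."""
    return len(values), statistics.mean(values), statistics.stdev(values)


def slope_and_r2_from_moments(x, y):
    """OLS intercept and slope of y on x via slope = r * s_y / s_x.

    Also returns R-squared, which in simple regression equals r ** 2.
    """
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)
    # Sample covariance with the same n - 1 divisor as stdev.
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    r = cov / (s_x * s_y)
    slope = r * s_y / s_x
    intercept = y_bar - slope * x_bar
    return intercept, slope, r ** 2
```

Written this way, the slope and R-squared can be read directly against the means and standard deviations from part (a); the intercept is the fitted math12 at read12 = 0, which may lie far outside the observed range of reading scores.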
5) A problem of interest to health officials is to determine the effects of smoking during pregnancy on infant health. One measure of infant health is birth weight; a birth weight that is too low can put an infant at risk for contracting various illnesses. Since factors other than cigarette smoking that affect birth weight are likely to be correlated with smoking, we should take those factors into account. An equation that recognizes this is
$$bwght = \beta_0 + \beta_1\, cigs + \beta_2\, faminc + u,$$
where bwght is birth weight in ounces, cigs is the number of cigarettes the mother smoked per day while pregnant, and faminc is annual family income in thousands of dollars.
a) What is the most likely sign for $\beta_2$?
b) Do you think cigs and faminc are likely to be correlated? Explain why this correlation might be positive or negative.
c) Now estimate the equation with and without faminc, using the data in BWGHT. Report the results in equation form, including the sample size and R-squared. Discuss your results, focusing on whether adding faminc substantially changes the estimated effect of cigs on bwght.
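Part (c) compares a simple and a multiple regression, and in-sample OLS algebra links the two exactly. If $\tilde\beta_1$ is the slope on cigs from the simple regression, $\hat\beta_1$ and $\hat\beta_2$ are the slopes from the regression that also includes faminc, and $\tilde\delta_1$ is the slope from regressing faminc on cigs, then $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2\tilde\delta_1$. Adding faminc therefore moves the cigs coefficient little when $\hat\beta_2$ or $\tilde\delta_1$ is small. A pure-Python sketch (hypothetical helper names) that computes both regressions from centered sums and lets you check the identity:

```python
def _centered(v):
    """Deviations from the sample mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def short_and_long(y, x1, x2):
    """Simple regression of y on x1 versus the regression on x1 and x2.

    Returns (b1_short, b1_long, b2_long, delta1), where delta1 is the
    slope from regressing x2 on x1. OLS algebra implies
    b1_short == b1_long + b2_long * delta1 (up to rounding).
    """
    y_c, x1_c, x2_c = _centered(y), _centered(x1), _centered(x2)
    s11, s22, s12 = _dot(x1_c, x1_c), _dot(x2_c, x2_c), _dot(x1_c, x2_c)
    s1y, s2y = _dot(x1_c, y_c), _dot(x2_c, y_c)
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("x1 and x2 are perfectly collinear")
    # Solve the 2x2 centered normal equations for the two slopes.
    b1_long = (s22 * s1y - s12 * s2y) / det
    b2_long = (s11 * s2y - s12 * s1y) / det
    b1_short = s1y / s11   # y on x1 only
    delta1 = s12 / s11     # x2 on x1
    return b1_short, b1_long, b2_long, delta1


if __name__ == "__main__":
    # Hypothetical toy numbers, NOT the BWGHT data.
    cigs = [0, 0, 5, 10, 20, 0, 15]
    faminc = [45.0, 60.0, 30.0, 25.0, 15.0, 35.0, 20.0]
    bwght = [122.0, 125.0, 116.0, 112.0, 105.0, 119.0, 110.0]
    b1s, b1l, b2l, d1 = short_and_long(bwght, cigs, faminc)
    print(b1s, b1l + b2l * d1)  # the two numbers should agree
```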
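6) Use the data in HPRICE1 to estimate the model
$$price = \beta_0 + \beta_1\, sqrft + \beta_2\, bdrms + u,$$
where price is the house price measured in thousands of dollars, sqrft is the size of the house in square feet, and bdrms is the number of bedrooms.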
a) Write out the results in equation form.
b) What is the estimated increase in price for a house with one more bedroom, ceteris paribus?
c) What is the estimated increase in price for a house with an additional bedroom that is 140 square feet in size? Compare this to your answer in (b).
d) What percentage of the variation in price is explained by variation in the two independent variables?
e) The first house in the sample has 2,438 square feet (sqrft = 2,438) and four bedrooms (bdrms = 4). Find the predicted selling price for this house from the OLS regression line.
f) The actual selling price of the first house in the sample was $300,000 (so price = 300, since price is in thousands). Find the residual for this house. Does it suggest that the buyer underpaid or overpaid for the house?
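Parts (b), (c), (e) and (f) are all arithmetic on the fitted line. A minimal sketch, with hypothetical function names, that takes the estimated coefficients as inputs (plug in your own estimates from part (a); none are supplied here) and keeps price in thousands of dollars:

```python
def fitted_price(b0, b1, b2, sqrft, bdrms):
    """Fitted price (in $1,000s) from price-hat = b0 + b1*sqrft + b2*bdrms."""
    return b0 + b1 * sqrft + b2 * bdrms


def change_in_price(b1, b2, d_sqrft, d_bdrms):
    """Ceteris paribus change in fitted price for given regressor changes."""
    return b1 * d_sqrft + b2 * d_bdrms


def residual_and_verdict(actual, fitted):
    """Residual = actual - fitted, with a plain-language reading of its sign."""
    u_hat = actual - fitted
    if u_hat < 0:
        verdict = "below the fitted price (buyer underpaid relative to the fit)"
    elif u_hat > 0:
        verdict = "above the fitted price (buyer overpaid relative to the fit)"
    else:
        verdict = "exactly on the fitted line"
    return u_hat, verdict
```

For part (b) the change is change_in_price(b1, b2, 0, 1), which is just b2; for part (c) it is change_in_price(b1, b2, 140, 1). For parts (e) and (f), call fitted_price(b0, b1, b2, 2438, 4) and then residual_and_verdict(300, ...).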