
Rev. February 12, 2019

Homework #3

STAT 654, Spring 2019

Due: Monday, February 18, 2019

Note that there are several parts in this assignment that you do not have to turn in. These parts are clearly marked.

Total Points: 100

1. (40 Points) For this problem, use the data in Problem 2.15 of the text. Using SAS for all of this exercise (except the plots, for which you may use another application), fit the model as specified in Problem 2.15(d) of the text. Note that you will need to create a new variable x2 for this part. Attach the SAS code and output as an appendix when you turn in your homework, but include your answers to the questions on a separate sheet with the rest of the homework.

(a) How would you interpret the model coefficients?

(b) Produce the plot of the actual and predicted values. The x-axis should be the number of surgical cases and the y-axis should be worker-hours per month.

(c) Produce the residual plot, with the residuals plotted against the fitted values.

(d) Produce 90% confidence intervals for all model coefficients (use the ALPHA= option of the PROC REG statement and the CLB option of the MODEL statement).

(e) Produce a 95% confidence interval for the mean response (i.e., the predicted value) of the first observation (use the LCLM and UCLM options of the OUTPUT statement).

(f) Another measure of goodness-of-fit that is used in evaluating model fits is the Mean Absolute Percent Error (MAPE). It is defined as:

$$\text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{e_i}{y_i} \right| \times 100\%,$$

where $e_i$ is the $i$th residual and $y_i$ is the value of the dependent variable for the $i$th observation. Calculate the MAPE.
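As a rough cross-check of a hand or SAS calculation, the MAPE formula can be sketched in Python with NumPy. This assumes the observed and fitted values are available as arrays (for example, exported from the SAS OUTPUT data set); the function name `mape` and the sample numbers are illustrative only, not the Problem 2.15 data.

```python
import numpy as np

def mape(y, y_hat):
    """Mean Absolute Percent Error, in percent.

    y     : observed values of the dependent variable (must be nonzero)
    y_hat : fitted values from the model
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    e = y - y_hat                          # residuals e_i = y_i - yhat_i
    return np.mean(np.abs(e / y)) * 100.0  # average of |e_i / y_i|, times 100%

# Illustrative numbers only (not the Problem 2.15 data); result is about 8.33.
print(mape([100.0, 200.0, 50.0], [90.0, 210.0, 55.0]))
```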

(g) Comment on the quality of the model.

(h) This is a general question, not specific to this problem. Comment on the appropriateness of using MAPE as a measure of goodness-of-fit. Under what circumstances may using MAPE be problematic?

(i) Do not turn in. Produce the Q-Q plot.

(j) Do not turn in. Fit the model as specified in Problem 2.15(b). Note that the dependent variable is the log of y. After you fit the model, convert the predicted values back to the original units and, using these, calculate the MAPE and $R^2$. How does this model compare with the model in Problem 2.15(d)?
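A minimal sketch of the back-transformation step, assuming the dependent variable was transformed with the natural logarithm (SAS's LOG function), so that `exp` returns predictions in the original units. The helper name `back_transformed_fit` is hypothetical, and $R^2$ is taken here as $1 - SSE/SST$ computed on the original scale; other definitions (such as the squared correlation between y and ŷ) can give different values once the fit was not done on y itself.

```python
import numpy as np

def back_transformed_fit(y, log_y_hat):
    """MAPE (%) and R^2 computed in the original units of y.

    y         : observed values on the original scale
    log_y_hat : fitted values from a model whose response is log(y)
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.exp(np.asarray(log_y_hat, dtype=float))  # undo the natural log
    e = y - y_hat                                       # residuals in original units
    mape = np.mean(np.abs(e / y)) * 100.0
    sse = np.sum(e ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - sse / sst                                # assumed definition: 1 - SSE/SST
    return mape, r2
```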


2. (20 Points) A researcher wants to fit a model in which a specific relationship between the intercept and the slope is assumed. Specifically, she considers the following model under the usual OLS assumptions: the $\epsilon_i$ are uncorrelated random variables with mean zero and constant variance $\sigma^2$.

$$y_i = \beta + 2\beta x_i + \epsilon_i, \quad i = 1, 2, 3, \ldots, n.$$

(a) Derive the least squares estimator of β (denote it by β̂). That is, set up the function S(β) that you need to minimize and obtain the OLS estimator. Make sure to justify your claims and steps.

(b) Compute E(β̂) and V(β̂).

(c) The researcher seeks to fit this model to the following data:

    y   x
    3   0
   10   1
   14   2

Calculate β̂.
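Because $\beta + 2\beta x_i = \beta(1 + 2x_i)$, the model is a regression through the origin on the single derived regressor $z_i = 1 + 2x_i$. The NumPy sketch below uses that observation to minimize S(β) numerically on the Part (c) data, which can serve as a check on the closed-form estimator derived in Part (a). The helper names `S` and `beta_hat_numeric` are illustrative.

```python
import numpy as np

def S(beta, x, y):
    """Least squares criterion S(beta) = sum_i (y_i - beta - 2*beta*x_i)^2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sum((y - beta - 2.0 * beta * x) ** 2)

def beta_hat_numeric(x, y):
    """Minimize S(beta) by least squares on the single column z = 1 + 2x (no intercept)."""
    z = 1.0 + 2.0 * np.asarray(x, float)
    coef, *_ = np.linalg.lstsq(z.reshape(-1, 1), np.asarray(y, float), rcond=None)
    return coef[0]

x = [0.0, 1.0, 2.0]
y = [3.0, 10.0, 14.0]
b = beta_hat_numeric(x, y)
# S should not decrease when beta moves slightly away from b in either direction.
print(b, S(b, x, y) <= min(S(b - 1e-3, x, y), S(b + 1e-3, x, y)))
```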

3. (20 Points) Let $H = X(X'X)^{-1}X' \in \mathbb{R}^{n \times n}$ be the hat matrix.

(a) Show that the diagonal elements of H satisfy the following condition:

$$0 \le h_{ii} \le 1, \quad i = 1, 2, \ldots, n.$$

(Hint: Write the expression for $h_{ii}$ and remember that H is a symmetric matrix.)
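A numerical illustration (not a proof) on a hypothetical, randomly generated design matrix: the diagonal of H stays within [0, 1], and each $h_{ii}$ equals the sum of squares of row i of H, an identity that follows from $H = HH'$ (symmetry together with idempotency).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 3
# Hypothetical full-rank design: a column of ones plus two random predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
H = X @ np.linalg.inv(X.T @ X) @ X.T

h = np.diag(H)
print(np.round(h, 4))
print(np.all(h >= -1e-12), np.all(h <= 1 + 1e-12))
# Since H = H H', h_ii = sum_j h_ij^2 (row i of H dotted with itself).
print(np.allclose(h, np.sum(H ** 2, axis=1)))
```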

(b) Do not turn in. For square matrices, the trace is defined as the sum of the diagonal elements. If $A = (a_{ij})$ is an $n \times n$ matrix, then

$$\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}.$$

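Consider the standard linear regression model with n observations and p model parameters, so that the design matrix X is n × p (including the column of ones if the model has an intercept), and let $H = X(X'X)^{-1}X' \in \mathbb{R}^{n \times n}$ be the hat matrix. Show that

$$\operatorname{tr}(H) = p.$$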

(Hint: Use the following result. For any two matrices A and B for which both AB and BA are defined, tr(AB) = tr(BA). Note that A and B do not have to be square matrices. For example, if A is 2 × 5 and B is 5 × 2, then AB is 2 × 2 and BA is 5 × 5 and the above identity holds.)

(c) What is the average value of the diagonal elements of H? Express it in terms of n and p. You may use the conclusion of Part (b).
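A quick numerical check of the hint and of Parts (b) and (c), using NumPy with random matrices. The design matrix is hypothetical, and p here counts the columns of X (including the column of ones).

```python
import numpy as np

rng = np.random.default_rng(1)

# tr(AB) = tr(BA) with non-square factors, as in the hint (2 x 5 and 5 x 2).
A = rng.normal(size=(2, 5))
B = rng.normal(size=(5, 2))
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))

# Hypothetical n x p design matrix (p columns, including the column of ones).
n, p = 12, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
H = X @ np.linalg.inv(X.T @ X) @ X.T
print(np.trace(H))            # approximately p = 4
print(np.mean(np.diag(H)))    # approximately p / n
```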


4. (10 Points) Let $H = X(X'X)^{-1}X' \in \mathbb{R}^{n \times n}$ be the hat matrix and let $I \in \mathbb{R}^{n \times n}$ be the identity matrix. Show that I − H is symmetric and idempotent, i.e., show that $(I - H)' = I - H$ and $(I - H)^2 = I - H$. You may use the fact that H is idempotent.
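A numerical sanity check (not a substitute for the algebraic proof), using a hypothetical design with an intercept and one random predictor.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
# Hypothetical design: intercept plus one random predictor.
X = np.column_stack([np.ones(n), rng.normal(size=n)])
H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - H               # I - H, which maps y to the residual vector e

print(np.allclose(M, M.T))      # symmetric
print(np.allclose(M @ M, M))    # idempotent
```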

5. (10 Points) A linear regression model is built to describe the weekly relationship between the dependent measure y (pounds of ice cream sold) and two independent variables: F (average weekly temperature in degrees Fahrenheit) and P (total weekly precipitation in mm). The estimated equation is as follows:

$$\hat{y} = 12000 + 100F - 250P.$$

How would the equation change if the following measures were used in the model instead: k (kilograms of ice cream sold), C (average weekly temperature in degrees Celsius), and I (total weekly precipitation in cm)? That is, determine the three coefficients in the following linear regression:

$$\hat{k} = \beta_0 + \beta_1 C + \beta_2 I.$$
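One way to check a hand conversion: every unit change here is affine (1 lb = 0.45359237 kg exactly, F = 1.8C + 32, and precipitation in mm = 10 × precipitation in cm), so k̂ is an exact linear function of C and I. Its three coefficients can therefore be recovered by evaluating the original equation at three (C, I) points and solving a 3 × 3 linear system. The evaluation points below are an arbitrary choice.

```python
import numpy as np

LB_TO_KG = 0.45359237           # exact definition of the avoirdupois pound

def y_hat(F, P):
    """Original fitted equation: pounds sold from deg F and mm of precipitation."""
    return 12000 + 100 * F - 250 * P

def k_hat(C, I):
    """Same prediction expressed in kg, from deg C and cm of precipitation."""
    F = 1.8 * C + 32            # Celsius -> Fahrenheit
    P = 10 * I                  # cm -> mm
    return LB_TO_KG * y_hat(F, P)

# Any three points with a non-singular design identify (beta0, beta1, beta2).
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
design = np.column_stack([np.ones(3), pts])
targets = np.array([k_hat(C, I) for C, I in pts])
beta = np.linalg.solve(design, targets)
print(beta)
```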

6. Do not turn in. Consider the simple linear regression model:

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, 2, \ldots, n,$$

where the $\epsilon_i$ are uncorrelated random variables with mean zero and constant variance $\sigma^2$ (again, this is the usual least squares setup). Let $SST = S_{yy}$ denote the total sum of squares. Show that

$$E(SST) = (n - 1)\sigma^2 + \beta_1^2 S_{xx}.$$
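A Monte Carlo sketch that compares the simulated mean of SST with the formula above. The sample size, x values, β0, β1, and σ are arbitrary choices made for illustration, and normal errors are assumed only to generate data; the result itself needs only the mean and variance assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical settings for the simulation.
n, beta0, beta1, sigma = 10, 1.0, 0.5, 2.0
x = np.arange(n, dtype=float)
Sxx = np.sum((x - x.mean()) ** 2)

reps = 200_000
eps = rng.normal(0.0, sigma, size=(reps, n))
y = beta0 + beta1 * x + eps                         # each row is one simulated data set
sst = np.sum((y - y.mean(axis=1, keepdims=True)) ** 2, axis=1)

print(sst.mean())                                   # Monte Carlo estimate of E(SST)
print((n - 1) * sigma**2 + beta1**2 * Sxx)          # (n - 1) sigma^2 + beta1^2 Sxx = 56.625
```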

7. Do not turn in. Classify the following matrices as positive/negative (semi)definite or indefinite. Fully support your answer.

(a)

$$A = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}$$

(b)

$$B = \begin{pmatrix} 2 & -4 \\ -1 & -3 \end{pmatrix}$$
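A numerical check via eigenvalues. Because B is not symmetric, the sketch classifies the quadratic form $x'Mx$, which depends only on the symmetric part $(M + M')/2$; B is entered row by row as [[2, −4], [−1, −3]], and since B and B' share the same symmetric part, the classification does not depend on that reading. The function name `classify` and the tolerance are illustrative choices.

```python
import numpy as np

def classify(M, tol=1e-12):
    """Classify the quadratic form x'Mx using the eigenvalues of the symmetric part of M."""
    M = np.asarray(M, dtype=float)
    S = (M + M.T) / 2                 # x'Mx = x'Sx, so only the symmetric part matters
    ev = np.linalg.eigvalsh(S)        # real eigenvalues of a symmetric matrix
    if np.all(ev > tol):
        return "positive definite"
    if np.all(ev < -tol):
        return "negative definite"
    if np.all(ev >= -tol):
        return "positive semidefinite"
    if np.all(ev <= tol):
        return "negative semidefinite"
    return "indefinite"

A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
B = [[2, -4], [-1, -3]]
print(classify(A), "/", classify(B))
```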

8. Do not turn in. For this problem, you are asked to calculate the least squares estimates for a linear regression using the matrix formulas. You do not need to derive the formulas; you only need to apply them to this problem.

The data include five observations on the dependent variable y and the single predictor variable x. We seek to fit the model $y = \beta_0 + \beta_1 x + \epsilon$.


    y   x
    9   6
   14   8
    5   2
    6   4
  8.5   5

(a) Define the design matrix X and the vector corresponding to the dependent variable y. If you need additional notation, introduce it.

(b) Using the formula for the least squares estimates, calculate the vector of model coefficients

$$\hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix}.$$

Make sure to apply the column reduction technique when calculating inverse(s) of matrices and show the steps.
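A NumPy check of the arithmetic in Part (b). It uses `np.linalg.inv` for brevity rather than the column reduction steps the problem asks you to show, so it only confirms the final numbers.

```python
import numpy as np

y = np.array([9.0, 14.0, 5.0, 6.0, 8.5])
x = np.array([6.0, 8.0, 2.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])   # design matrix: intercept column, then x

XtX = X.T @ X                                # [[5, 25], [25, 145]]
Xty = X.T @ y                                # [42.5, 242.5]
beta_hat = np.linalg.inv(XtX) @ Xty          # (X'X)^{-1} X'y
print(beta_hat)
```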

(c) Directly verify that the hat matrix is symmetric and idempotent, i.e., $H' = H$ and $H^2 = H$.
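A numerical check of Part (c) for this data set; the printed diagonal elements are the leverages $h_{ii}$, which also appear in Part (g).

```python
import numpy as np

x = np.array([6.0, 8.0, 2.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T        # 5 x 5 hat matrix

print(np.allclose(H, H.T))                   # H' = H
print(np.allclose(H @ H, H))                 # H^2 = H
print(np.round(np.diag(H), 4))               # leverages h_ii
```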

(d) Calculate the residuals using

$$e = (I - H)y.$$

(e) Calculate the predicted values using

$$\hat{y} = Hy.$$

(f) Calculate the residual sum of squares SSE and σ̂ using

$$SSE = e'e, \qquad \hat{\sigma}^2 = \frac{SSE}{n - 2}, \qquad \hat{\sigma} = \sqrt{\frac{SSE}{n - 2}}.$$
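A sketch that carries out Parts (d) through (f) numerically, following the matrix formulas directly.

```python
import numpy as np

y = np.array([9.0, 14.0, 5.0, 6.0, 8.5])
x = np.array([6.0, 8.0, 2.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
n = len(y)
H = X @ np.linalg.inv(X.T @ X) @ X.T

e = (np.eye(n) - H) @ y                      # residuals
y_hat = H @ y                                # predicted values
sse = e @ e                                  # SSE = e'e
sigma2_hat = sse / (n - 2)                   # estimate of sigma^2
sigma_hat = np.sqrt(sigma2_hat)              # estimate of sigma
print(e, y_hat, sse, sigma_hat)
```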

(g) Show that the variance of the $i$th residual is $(1 - h_{ii})\sigma^2$, where $h_{ii}$ is the $i$th diagonal element of the hat matrix H.
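A Monte Carlo illustration (not a derivation) for the x values of this problem: simulate many response vectors from assumed true parameters, compute the residuals $e = (I - H)y$ for each, and compare their empirical variances with $(1 - h_{ii})\sigma^2$. The true β and σ are arbitrary choices, and normal errors are assumed only for the simulation.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.array([6.0, 8.0, 2.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
n = len(x)
H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - H

# Hypothetical true parameters for the simulation.
beta, sigma = np.array([1.0, 1.5]), 2.0
reps = 200_000
Y = X @ beta + rng.normal(0.0, sigma, size=(reps, n))   # one simulated y per row
E = Y @ M.T                                             # residuals e = (I - H) y, row-wise
print(E.var(axis=0))                                    # empirical Var(e_i)
print((1 - np.diag(H)) * sigma**2)                      # (1 - h_ii) sigma^2
```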
