At the end of this document, I have included some SAS code for multiple linear regression. Your task for this data analysis assignment is to come up with a good model for housing prices using the data “KutnerRealEstate.txt” provided on Canvas. Note that there is more than one good model. Matching the model that I obtained matters less than justifying your choice of model. A good model has the following characteristics:
1. All assumptions are met, at least approximately.
2. All explanatory variables in the model are statistically significant.
3. VIFs for the model are “small”, indicating that no predictor variable gives information that is largely redundant with the other predictors.
4. There are as few predictors in the model as possible.
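One possible way to check characteristics 2 through 4 is sketched below. It assumes the data set houses created by the code at the end of this document. The predictor subset in the first model is an arbitrary illustration and not a recommended model, and the 0.05 cutoff for backward elimination is likewise an assumption.

```sas
* Sketch only. The predictor subset is illustrative and is not a suggested final model *;
proc reg data=houses;
   * vif prints variance inflation factors and tol prints tolerances *;
   reduced: model logprice = sqft baths garag year lotsz / vif tol;
   * backward elimination keeps only predictors significant at the assumed 0.05 level *;
   backwd: model logprice = sqft bedrms baths ac garag pool year qual style lotsz hway
           / selection=backward slstay=0.05;
run;
quit;
```

Automated selection is only a starting point. The assumptions in characteristic 1 still need to be checked for whichever model it suggests.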
Now, here’s the hard part. Run the following code for these data and any other code you need to obtain a model meeting the above conditions. The deliverables and guidelines are:
1. A formal write up (no colloquial language, third person pronouns only) in .doc, .docx, or .pdf format that is no longer than three pages including graphs. The write up should describe the final model, the justification for that particular model, an interpretation of the model, and how you determined your final model. You may use information from SAS output, but you may not include tables or code from SAS. You may include a MAXIMUM of three graphs (the graph matrix that SAS prints out when Proc Reg is run counts as NINE graphs). Choose wisely. This must be a paper, not a list of answers to the points I asked you to include.
2. A text file containing your code. Note that .sas files will not be accepted. It must be a .txt file. The code within the text file must be properly commented. Give only the code that you used to determine the final model. In other words, do not include code that does not run or code that is not pertinent to the final result.
3. You are not limited to linear regression. If you decide that a robust regression or a weighted regression leads to the best model, you may use one, provided that you can interpret the results appropriately. Don’t get fancy just for the sake of getting fancy.
4. You will do better on this project if you write up your results at least two days before submission. Let the paper sit for a day, and then look at it again with fresh eyes to make sure that the language is precise and accurate. Even better, have a classmate or the TA look it over for any glaring errors. I will not look at your write up before you submit it.
5. For an extra five points, explain the purpose and output for the “Proc rank” and “proc gplot” procedures at the end of the code. Why is this code necessary? This explanation does not count in the page count for the write up.
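For item 3 above, the syntax for a robust or a weighted fit might look like the following sketch. It assumes the data set houses created by the code below. The predictor subset, the choice of M estimation, the data set name houses_w, and the weight variable w are all hypothetical and appear only to show the syntax.

```sas
* Sketch only. Predictors and estimation method are illustrative assumptions *;
proc robustreg data=houses method=m;
   model logprice = sqft baths garag year lotsz;
run;

* Hypothetical weights chosen only to demonstrate the weight statement *;
data houses_w;
   set houses;
   w = 1 / sqft;
run;

proc reg data=houses_w;
   model logprice = sqft baths garag year lotsz;
   weight w;
run;
quit;
```

In practice the weights should come from an examination of how the residual variance changes, not from an arbitrary formula such as the one used here.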
data houses;
infile 'c:\Pathname\KutnerRealEstate.txt' firstobs=1;
input id price sqft bedrms baths ac garag pool year qual style lotsz hway;
logprice = log(price);
proc means mean stddev min max n;
var logprice sqft bedrms baths ac garag pool year qual style lotsz hway;
* *;
* next few lines are used to see effect of categorical variable "ac" *;
* *;
proc sort;
by ac;
proc means mean stddev min max n;
var logprice price;
by ac;
* *;
* plots to explore relationships *;
* *;
proc gplot;
plot price*sqft logprice*sqft logprice*bedrms;
proc reg;
model logprice = sqft bedrms baths ac garag pool year qual style lotsz hway / p r clm cli influence;
output out = resids p = yhat r = resid student = studres cookd = cooks h = lev rstudent = extstud ;
proc rank normal=blom data = resids out = norm;
var resid;
ranks nrm;
proc gplot data=norm;
plot studres*nrm studres*yhat;
run;
quit;
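The output statement in the code above saves Cook's distance, leverage, and externally studentized residuals in the data set resids, but the code never lists them. A minimal sketch for flagging unusual observations follows. The cutoffs (an absolute externally studentized residual above 3, a Cook's distance above 4/n) are common rules of thumb and are assumptions here, and the names nobs and flagged are hypothetical.

```sas
* Sketch only. Cutoffs are rules of thumb and not course requirements *;
proc sql noprint;
   select count(*) into :nobs from resids;
quit;

* keep observations with a large studentized residual or a large Cook distance *;
data flagged;
   set resids;
   if abs(extstud) > 3 or cooks > 4 / &nobs;
run;

proc print data=flagged;
   var id price sqft yhat extstud cooks lev;
run;
```

A flagged observation is not automatically an error. It marks a house worth examining before deciding whether it affects the final model.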