JUMP software


HW 2

Applied Business Statistics II

Due: November 9, 2020

This assignment uses the Boston Housing data, one of the most popular datasets in statistics and analytics. The dataset contains information collected by the U.S. Census Bureau on housing in the Boston area. It is a small dataset with only 506 cases. The main purpose is to apply the simple linear regression technique using MEDV as the dependent variable.
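As a reminder of what a simple linear regression fits, the model can be written as follows. The notation is the usual textbook one and is an assumption here, not something the assignment specifies: $x_i$ stands for whichever single independent variable is chosen, and $\varepsilon_i$ is the error term.

$$
\mathrm{MEDV}_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, 506
$$

Here $\beta_1$ is the slope, which corresponds to the parameter estimate (beta) reported in the software output, and $\beta_0$ is the intercept.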

Here is the list of variables in the data:

1. CRIM - per capita crime rate by town

2. ZN - proportion of residential land zoned for lots over 25,000 sq.ft.

3. INDUS - proportion of non-retail business acres per town

4. CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)

5. NOX - nitric oxides concentration (parts per 10 million)

6. RM - average number of rooms per dwelling

7. AGE - proportion of owner-occupied units built prior to 1940

8. DIS - weighted distances to five Boston employment centres

9. RAD - index of accessibility to radial highways

10. TAX - full-value property-tax rate per $10,000

11. PTRATIO - pupil-teacher ratio by town

12. B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town

13. LSTAT - % lower status of the population

14. MEDV - median value of owner-occupied homes in $1000's

The instructions for this assignment are the same as last time. Remember to copy and paste the most important pieces of software output and describe them in detail. You are required to write an analysis report answering the following questions.

Use MEDV as the dependent variable and run FIVE separate simple linear regression models, choosing the independent variables you consider most appropriate. Compare the models by describing the salient features of each one, such as R^2, MSE, the parameter estimate (beta), the significance of the parameter, and the residuals. Also create plots to test the accuracy of each regression model. Lastly, make sure that the MEDV variable is normally distributed before using it as the dependent variable, and if it is not, transform it toward a normal distribution.
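The quantities named above can be illustrated with a minimal Python sketch. This is not the course software's method and it is not required for the assignment; it only shows how the slope, R^2, MSE, and a significance test are computed for one model, and how skewness can flag a non-normal variable. Several things here are assumptions: the function names are hypothetical, the toy numbers are made up and are not Boston Housing values, the p-value uses a normal approximation to the t distribution, and the log transform is just one common choice for right-skewed positive data.

```python
import math
from statistics import NormalDist, mean


def simple_linear_regression(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares; return summary statistics."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope estimate (beta)
    b0 = y_bar - b1 * x_bar             # intercept estimate
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)        # error sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)    # total sum of squares
    mse = sse / (n - 2)                 # mean squared error on n - 2 degrees of freedom
    se_b1 = math.sqrt(mse / sxx)        # standard error of the slope
    t_b1 = b1 / se_b1                   # t statistic for H0: slope = 0
    # Assumption: normal approximation to the t distribution (reasonable for large n).
    p_approx = 2 * (1 - NormalDist().cdf(abs(t_b1)))
    return {
        "b0": b0,
        "b1": b1,
        "r2": 1 - sse / sst,
        "mse": mse,
        "t_b1": t_b1,
        "p_approx": p_approx,
        "residuals": residuals,         # plot these against x to inspect the fit
    }


def skewness(values):
    """Moment-based sample skewness; values near 0 suggest a symmetric distribution."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m3 = mean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Toy numbers for illustration only (NOT taken from the Boston Housing data).
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = simple_linear_regression(toy_x, toy_y)

# A right-skewed toy sample and its log transform (requires positive values).
skewed = [1.0, 1.0, 2.0, 2.0, 3.0, 50.0]
logged = [math.log(v) for v in skewed]
skew_before, skew_after = skewness(skewed), skewness(logged)
```

To compare five models, the same function would be called once per chosen independent variable with MEDV as `y`, and the returned `r2`, `mse`, and `b1` values placed side by side. With only a few observations the normal approximation for the p-value is too optimistic; with 506 cases it should be close to the t-based value that statistical software reports.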