PYTHON PROGRAMMING


For HW02

For HW02, you need to reduce the train data to include only single-family homes.  Here is the code to do that:

train_sf = train[train.bldgtype == '1Fam']

Most of you caught that and made the change for HW01.  I purposely left that in HW01 to see whether you would make that reduction in the data.

For HW02, make sure that when you correct the train data you also correct the test data.  If you correct NaN values for a variable in the train data, include that variable in your model, and fail to correct the NaN values in the test data, the prediction for those test rows will be a missing value.  The assignment is to make predictions for all of the test data.  My scoring code will find this error and penalize your score.  The bigger issue is that if you do this in your future work and others find the error, you will lose credibility and have a very hard time regaining it.
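As a minimal sketch of correcting both files the same way, the example below fills NaN in one variable using a value learned from the train data. The column name lotfrontage, the toy values, and the choice of the median as the fill value are assumptions for illustration only; use whatever correction you chose for your own model.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the train and test frames (hypothetical values).
train = pd.DataFrame({"lotfrontage": [60.0, np.nan, 80.0, 70.0],
                      "saleprice": [150000, 180000, 210000, 175000]})
test = pd.DataFrame({"lotfrontage": [65.0, np.nan, 90.0]})

# Learn the fill value from the train data only.
fill_value = train["lotfrontage"].median()

# Apply the same correction to BOTH frames.
train["lotfrontage"] = train["lotfrontage"].fillna(fill_value)
test["lotfrontage"] = test["lotfrontage"].fillna(fill_value)

# Confirm that no NaN remains in the model variable in either frame.
n_missing_train = int(train["lotfrontage"].isna().sum())
n_missing_test = int(test["lotfrontage"].isna().sum())
print(n_missing_train, n_missing_test)
```

Running the same check on every variable in your model before you predict is a quick way to catch a missed correction in the test data.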

Linear regression is well known to make predictions that are out of range.  You have to look at your predictions and make sure that never happens.  Several of you submitted predictions that were less than 0.  That should never happen.  You have to inspect your work before others do.
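One way to inspect predictions before submitting them is sketched below. The column names grlivarea and saleprice and the toy numbers are assumptions chosen so that the fitted line goes negative for a very small house. Flooring at the lowest train price is only one possible remedy, shown here as an illustration and not as the required fix.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy data (hypothetical): price rises linearly with living area.
train = pd.DataFrame({"grlivarea": [800, 1200, 1600, 2000],
                      "saleprice": [40000, 100000, 160000, 220000]})
test = pd.DataFrame({"grlivarea": [300, 1000, 1800]})

model = LinearRegression().fit(train[["grlivarea"]], train["saleprice"])
pred = pd.Series(model.predict(test[["grlivarea"]]), index=test.index)

# Inspect the range of the predictions and count impossible values.
print(pred.describe())
n_negative = int((pred < 0).sum())
print("predictions below 0:", n_negative)

# One possible remedy (an assumption, not the only option):
# floor the predictions at the lowest price seen in the train data.
floor = train["saleprice"].min()
pred_floored = pred.clip(lower=floor)
```

Whatever remedy you choose, the point is to look at the minimum and maximum of your predictions before anyone else does.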

I have posted a recording plus PowerPoint slides to help you with some of the cryptic reading in the text for HW02.  See it at:

Course: Recordings    http://nwuniversity.adobeconnect.com/p3p51ctu7giw/

2] If you are attempting to use the code for SelectKBest and are getting an error message, it may be due to a sneaky NaN.  To find it, double-click the dataframe you are using in the upper-right box labeled Variable Explorer.  This will open the dataframe for viewing.  Then click the top of each column to sort the data.  Any sneaky NaNs will show up at the bottom of the column.
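If you would rather find the NaNs in code than by sorting columns in the Variable Explorer, a sketch like the following lists the columns and rows that contain them. The dataframe and its column names are hypothetical stand-ins for the one you pass to SelectKBest.

```python
import numpy as np
import pandas as pd

# Toy dataframe (hypothetical) with a NaN hidden in two columns.
df = pd.DataFrame({"grlivarea": [1500, 1700, 1200],
                   "garagearea": [400.0, np.nan, 250.0],
                   "lotfrontage": [60.0, 75.0, np.nan]})

# Count NaN per column and keep only the columns that have any.
nan_counts = df.isna().sum()
cols_with_nan = nan_counts[nan_counts > 0].index.tolist()

# Show the rows that contain at least one NaN.
rows_with_nan = df[df.isna().any(axis=1)]

print(cols_with_nan)
print(rows_with_nan)
```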

Also, the SelectKBest code will list the data in the best columns without naming the variables.  To identify the variables it shows, look at the dataframe in the Variable Explorer again and match the listed values to the columns.
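The variable names can also be recovered in code. The sketch below assumes the predictors are held in a pandas dataframe and uses f_regression as the score function, which may differ from the code you were given. The data is simulated, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

# Simulated predictors (hypothetical): two informative, two pure noise.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({"grlivarea": rng.normal(1500, 300, n),
                  "noise_a": rng.normal(0, 1, n),
                  "overallqual": rng.integers(1, 11, n).astype(float),
                  "noise_b": rng.normal(0, 1, n)})
y = 100 * X["grlivarea"] + 15000 * X["overallqual"] + rng.normal(0, 5000, n)

# Fit the selector, then use its boolean mask to look up column names.
selector = SelectKBest(score_func=f_regression, k=2).fit(X, y)
selected = X.columns[selector.get_support()].tolist()
print(selected)
```

The get_support() method returns one True or False per input column, so indexing the dataframe's columns with it gives the names of the variables that were kept.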

3] The assignment is to:

"Compute actual and estimated mean price per square foot for each neighborhood.   

Group the neighborhoods by actual price per square foot. Create between 3 and 6 groups. Code a family of indicator variables for the neighborhoods to include in your multiple regression model."  

If you have trouble coding this in Python, remember you can save the file as csv and do the editing in Excel.
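If you want to stay in Python, the grouping step can be sketched as follows. The column names, the toy prices, and the cut points of 150 and 250 dollars per square foot are all assumptions for illustration; choose your own cut points after looking at the actual neighborhood means. The estimated mean price per square foot could be computed the same way by replacing the sale price with your model's predictions.

```python
import numpy as np
import pandas as pd

# Toy train data (hypothetical neighborhoods and prices).
train = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "saleprice": [100000, 120000, 200000, 220000,
                  300000, 330000, 150000, 165000],
    "grlivarea": [1000] * 8,
})

# Actual mean price per square foot for each neighborhood.
train["ppsf"] = train["saleprice"] / train["grlivarea"]
nbhd_ppsf = train.groupby("neighborhood")["ppsf"].mean()

# Group the neighborhoods using illustrative cut points (3 groups).
nbhd_group = pd.cut(nbhd_ppsf, bins=[0, 150, 250, np.inf],
                    labels=["low", "mid", "high"]).astype(str)

# Attach each home's neighborhood group, then code indicator variables.
train["nbhd_group"] = train["neighborhood"].map(nbhd_group)
indicators = pd.get_dummies(train["nbhd_group"], prefix="nbhd", dtype=int)

# Drop one group as the baseline to avoid perfect collinearity.
indicators = indicators.drop(columns="nbhd_low")
train = pd.concat([train, indicators], axis=1)
print(train[["neighborhood", "nbhd_group", "nbhd_high", "nbhd_mid"]])
```

Dropping one indicator keeps the family of dummies from duplicating the regression intercept; the dropped group becomes the reference level for the other coefficients.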