python modeling

K513 Final Project Guidance – Part 2

Suggestions on how to tackle the final project part 2

Build and evaluate your model

You can choose to build the regression model or the classification model or both.

It is likely that you found that most of the variables in the cleaned data do not correlate strongly with your target variable. That does not mean you should drop them all. The power of machine learning lies in its ability to squeeze predictive signal out of multiple features, even when each individual effect is small, and to combine those effects in the final model. Of course, you should keep the model simple and include only features that contribute to it. This means you might need to build multiple models with different combinations of features and pick the one with the best performance. Luckily, this is quick and easy once the data is prepared. Keep all the features that might be useful in your dataframe, but include only the ones you want to use for a particular model in its set of predictors (X). Keep track of the models, their associated predictor sets, and their performance on appropriate evaluation metrics. When building the models, try the following:

1. Try different models. We learned a number of models for both regression and classification. Try different ones to get a feel for how they perform.

2. Try different hyperparameter values to get the best model you can, which means reducing overfitting and underfitting as much as possible.

3. If you have a model that fits just right with decent performance, you probably don’t need to explore further. But if you want to improve model performance, you can go back, review your features, and tinker with your predictors. We don’t have any more data to increase the dataset size, so you can rule out that possibility.

4. When partitioning the data, use the default training/test split (75 : 25) and set random_state = 0 so that all of our models are built under similar conditions.
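The workflow above (one dataframe, several predictor sets, several models, one results table) could be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic, and every column name except Unit_Sold (price, rating, shipping_cost) is made up for the example. The models and the alpha and n_neighbors values are arbitrary starting points, not recommendations.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the cleaned data.
# Every column except Unit_Sold is hypothetical.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "price": rng.uniform(1, 50, n),
    "rating": rng.uniform(1, 5, n),
    "shipping_cost": rng.uniform(0, 10, n),
})
df["Unit_Sold"] = 500 - 6 * df["price"] + 80 * df["rating"] + rng.normal(0, 40, n)

# Candidate predictor sets (X); all features stay in df regardless.
feature_sets = {
    "price_only": ["price"],
    "price_rating": ["price", "rating"],
    "all_three": ["price", "rating", "shipping_cost"],
}

# Candidate models; hyperparameter values are arbitrary examples.
models = {
    "linear": LinearRegression(),
    "ridge_alpha_1": Ridge(alpha=1.0),
    "knn_k_5": KNeighborsRegressor(n_neighbors=5),
}

records = []
for fs_name, cols in feature_sets.items():
    # Default split is 75% training / 25% test; random_state=0 as required.
    X_train, X_test, y_train, y_test = train_test_split(
        df[cols], df["Unit_Sold"], random_state=0
    )
    for m_name, model in models.items():
        model.fit(X_train, y_train)
        records.append({
            "features": fs_name,
            "model": m_name,
            "train_r2": model.score(X_train, y_train),  # R^2 on training data
            "test_r2": model.score(X_test, y_test),     # R^2 on held-out data
        })

# One table that tracks every model / predictor-set combination.
results = pd.DataFrame(records).sort_values("test_r2", ascending=False)
print(results.to_string(index=False))
```

Keeping training and test scores side by side in the same table makes it easier to spot overfitting (training score much higher than test score) or underfitting (both scores low).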

Build Regression Model

You can set Unit_Sold as your target variable for regression.

Build Classification Model

You can convert Unit_Sold into a binary variable or a multi-class categorical variable using the cut() function of Pandas. As long as you can justify your split points, any categorical variable generated would be fine.
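A minimal sketch of the conversion with pandas cut() is shown below. The sales figures and the split points (100 and 1000 units) are invented for illustration, as are the new column names sold_binary and sold_level. Your own split points should come from your data and business objective.

```python
import pandas as pd

# Made-up sales figures; in the project this would be the real Unit_Sold column.
df = pd.DataFrame({"Unit_Sold": [5, 50, 120, 800, 1500, 20000]})

# Binary target: hypothetical threshold at 100 units.
# By default each bin includes its right edge, so exactly 100 would be "low".
df["sold_binary"] = pd.cut(
    df["Unit_Sold"],
    bins=[-float("inf"), 100, float("inf")],
    labels=["low", "high"],
)

# Multi-class target: hypothetical thresholds at 100 and 1000 units.
df["sold_level"] = pd.cut(
    df["Unit_Sold"],
    bins=[-float("inf"), 100, 1000, float("inf")],
    labels=["low", "medium", "high"],
)

print(df)
# Check class balance before modeling; very uneven classes affect metric choice.
print(df["sold_level"].value_counts())
```

Checking the class counts after binning is a quick way to see whether the chosen split points leave any class too small to model.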

What should be included in your Slides?

Overview of the Models (1 slide)

What type(s) of ML model did you choose to build (regression vs. classification)? Why did you make this decision? The choice should support your business objective.

Overview of model building and evaluation (a few slides)

· At a very high level, talk about the predictors chosen and the hyperparameter selection.

· If multiple models were tried to find the best one, briefly explain how you arrived at it.

· If different types of models were tried (for example, linear regression, ridge, KNN, etc.), introduce all of them.

· Evaluate model performance using one or more metrics. Explain the rationale behind the choice of metrics. Compare the models using the chosen metrics. Comment on model fit (overfit, underfit, or just right?).

· If other steps were taken to boost model performance, talk about them (for example, adding polynomial features).
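One possible way to back up the comments on model fit is to tabulate training and test metrics across a range of hyperparameter values. The sketch below does this for a KNN classifier. It assumes synthetic data from scikit-learn's make_classification rather than the project data, and the n_neighbors values and the choice of accuracy and F1 are arbitrary examples.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data standing in for the project data.
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rows = []
for k in [1, 5, 15, 50, 200]:  # example values only
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    test_pred = clf.predict(X_test)
    rows.append({
        "n_neighbors": k,
        "train_acc": accuracy_score(y_train, clf.predict(X_train)),
        "test_acc": accuracy_score(y_test, test_pred),
        "test_f1": f1_score(y_test, test_pred),
    })

report = pd.DataFrame(rows)
# A large positive gap suggests overfitting; low scores on both sides suggest underfitting.
report["gap"] = report["train_acc"] - report["test_acc"]
print(report.round(3).to_string(index=False))
```

A table like this can go straight onto a slide: with n_neighbors = 1 the training accuracy is perfect because each point is its own nearest neighbor, which is the classic overfitting signature, while very large values tend to smooth the model toward underfitting.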

Insights drawn from the EDA and Model

What hidden patterns or relationships have been revealed? If you were to advise sellers listing products on Wish, what would you tell them? Make sure only the important findings, insights, and recommendations are in the main body of the PPT. Other slides can go in the appendix.

Future directions

Obviously, you are limited by the data provided. If you had all the resources you needed, what could you do to build a better model? Get more data? Get more variables? What additional variables would you want in order to improve model performance?
