Guidelines for Regression Project
This project is designed to help you gain experience and build skills in the diagnosis
and prediction of employee turnover. The dataset we will use (see the Excel file
named “Regression Project”) contains a wide variety of workforce data (employee
demographics and attitudes) on approximately 1,000 employees. The primary
dependent variables are “Attrition” and “Probability of Turnover.”
Your CEO wants to better understand the factors driving employee turnover, and she
has asked you to take the lead in conducting the analyses.
You should begin with data cleaning and range checks. Expect to find problems
here, as with almost any large dataset! Please address any problems that you
find, and document any changes that you make in your memo to me.
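As one way to picture a range check, here is a minimal sketch in plain Python. The rows, the column names (EmployeeID, Age, JobSatisfaction), and the plausible ranges are all made up for illustration; take the real ones from the Excel file and the variable descriptions in the Appendix.

```python
# Made-up rows and hypothetical column names, for illustration only.
rows = [
    {"EmployeeID": 1, "Age": 34, "JobSatisfaction": 4, "Attrition": "No"},
    {"EmployeeID": 2, "Age": 212, "JobSatisfaction": 3, "Attrition": "Yes"},
    {"EmployeeID": 3, "Age": 45, "JobSatisfaction": 9, "Attrition": "No"},
    {"EmployeeID": 4, "Age": None, "JobSatisfaction": 2, "Attrition": "yes"},
]

# Assumed plausible ranges and allowed codes; adjust to the actual variables.
NUMERIC_RANGES = {"Age": (18, 75), "JobSatisfaction": (1, 5)}
ALLOWED_VALUES = {"Attrition": {"Yes", "No"}}


def find_problems(rows):
    """Return a list of (employee_id, column, value, reason) tuples."""
    problems = []
    for row in rows:
        # Numeric variables: flag missing values and values outside the range.
        for col, (low, high) in NUMERIC_RANGES.items():
            value = row.get(col)
            if value is None:
                problems.append((row["EmployeeID"], col, value, "missing"))
            elif not low <= value <= high:
                problems.append((row["EmployeeID"], col, value, "out of range"))
        # Categorical variables: flag codes that are not in the allowed set.
        for col, allowed in ALLOWED_VALUES.items():
            value = row.get(col)
            if value not in allowed:
                problems.append((row["EmployeeID"], col, value, "unexpected value"))
    return problems


problems = find_problems(rows)
for problem in problems:
    print(problem)
```

A list like this doubles as the change log for your memo: each flagged value is something you either corrected, recoded, or excluded, and you can say which.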
Then move on to the basics (e.g., are departing employees older or younger, and
do they have higher education levels or lower job satisfaction?). You should also
determine whether attrition differs across departments and, if so, why.
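The basic comparisons described above amount to group means and group rates. A minimal sketch follows, again with made-up data and hypothetical column names (Department, Age); it is one possible starting point, not a required method.

```python
from collections import defaultdict
from statistics import mean

# Made-up employees and hypothetical column names, for illustration only.
employees = [
    {"Department": "Sales", "Age": 29, "Attrition": "Yes"},
    {"Department": "Sales", "Age": 41, "Attrition": "No"},
    {"Department": "Sales", "Age": 33, "Attrition": "Yes"},
    {"Department": "R&D", "Age": 38, "Attrition": "No"},
    {"Department": "R&D", "Age": 45, "Attrition": "No"},
    {"Department": "R&D", "Age": 27, "Attrition": "Yes"},
]


def mean_by_group(rows, value_col, group_col):
    """Average of value_col within each level of group_col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    return {group: mean(values) for group, values in groups.items()}


def attrition_rate_by(rows, group_col):
    """Share of employees with Attrition == "Yes" within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(1 if row["Attrition"] == "Yes" else 0)
    return {group: mean(flags) for group, flags in groups.items()}


mean_age = mean_by_group(employees, "Age", "Attrition")
dept_rates = attrition_rate_by(employees, "Department")
print("Mean age by attrition status:", mean_age)
print("Attrition rate by department:", dept_rates)
```

Differences in raw rates across departments do not explain themselves. To get at the "why," compare the departments on the other variables as well (age, satisfaction, and so on).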
Once you have outlined the basics, develop a multivariate regression model to
determine which factors appear to be the most important predictors of Probability of
Turnover and/or Attrition (be sure to use a logistic regression model if you focus on
the latter). Note that you have a lot of discretion in how you approach this problem,
so I am intentionally not providing step-by-step details on what you should do with
this project. I want you to show me how you would approach the problem.
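To make the logistic regression idea concrete, here is a minimal sketch that fits one by plain gradient ascent on the log-likelihood, using only the Python standard library. The data are made up, the two predictors (job satisfaction and years at the company) are hypothetical stand-ins, and the fitting routine is for showing the mechanics only. For the project itself you would normally use a statistics package, which also reports standard errors and significance tests.

```python
import math

# Made-up toy data, for illustration only.
# Each row: (job satisfaction on a 1-5 scale, years at company); names are hypothetical.
X = [(1, 1), (2, 1), (2, 3), (3, 2), (4, 6), (5, 8),
     (4, 4), (1, 2), (5, 5), (3, 7), (2, 2), (4, 1)]
# 1 = employee left, 0 = employee stayed.
y = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(beta, row):
    """Predicted probability of leaving; beta[0] is the intercept."""
    z = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
    return sigmoid(z)


def log_likelihood(beta, X, y):
    """Sum of log predicted probabilities of the observed outcomes."""
    return sum(
        math.log(predict(beta, row)) if label == 1
        else math.log(1.0 - predict(beta, row))
        for row, label in zip(X, y)
    )


def fit_logistic(X, y, learning_rate=0.05, n_iter=5000):
    """Maximize the average log-likelihood by plain gradient ascent."""
    n = len(X)
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for row, label in zip(X, y):
            error = label - predict(beta, row)
            grad[0] += error
            for j, x in enumerate(row, start=1):
                grad[j] += error * x
        beta = [b + learning_rate * g / n for b, g in zip(beta, grad)]
    return beta


beta = fit_logistic(X, y)
# exp(coefficient) is the odds ratio for a one-unit increase in that predictor.
odds_ratios = [math.exp(b) for b in beta[1:]]
print("coefficients (intercept first):", beta)
print("odds ratios:", odds_ratios)
```

An odds ratio below 1 means higher values of that predictor go with lower odds of leaving, holding the other predictor fixed; above 1 means the opposite. If you model Probability of Turnover instead, an ordinary linear regression is the more natural starting point.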
Please summarize your findings in a double-spaced memo to the CEO of no more
than five pages. Any tables, figures, etc., can be placed in an Appendix to your memo.
Appendix
Variable Descriptions