DSBA/MBAD 6211 Assignment 1
Due: 11:59pm @ 2/18/2021
In the fall of 2019, the administration of a large private university requested that the Office of Enrollment Management and the Office of Institutional Research work together to identify prospective students who would most likely enroll as new freshmen in the Fall 2020 semester. Historically, the university received more than 90,000 inquiries and enrolled 2,400 to 2,800 new freshmen each Fall semester. It was decided that inquiries for Fall 2019 would be used to build the model to help shape the Fall 2020 freshman class. The data set INQ2019 was built over a period of several months in consultation with Enrollment Management. Please carefully explore all variables and build a predictive model for better enrollment management. Please apply regression and decision tree models to analyze the data.
· Variable and model naming requirements:
· Please include your initials in the data frame names as well as the model names in your R code.
· For instance, in my code, I would name the data frames dfKZ, dfKZ.train, and dfKZ.valid. I would also name the models regressionKZ, treeKZ, etc.
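The naming convention above can be sketched in R as follows. This is only an illustration: the initials KZ come from the example, the small data frame is made up and is not INQ2019, and the 60/40 split ratio and the single predictor are assumptions rather than requirements.

```r
# Illustrative only: "KZ" stands in for your own initials, and this tiny
# made-up data frame is a placeholder for the real INQ2019 data.
set.seed(1)
dfKZ <- data.frame(
  ENROLL         = rep(c(0, 1), times = 10),
  TOTAL_CONTACTS = c(1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, 2, 1, 1)
)

# Training and validation data frames carry the same initials.
# The 60/40 split is an assumption, not part of the assignment.
train.idx  <- sample(nrow(dfKZ), size = round(0.6 * nrow(dfKZ)))
dfKZ.train <- dfKZ[train.idx, ]
dfKZ.valid <- dfKZ[-train.idx, ]

# Model objects are named the same way (a decision tree would be treeKZ).
regressionKZ <- glm(ENROLL ~ TOTAL_CONTACTS, data = dfKZ.train, family = binomial)
```

Replace KZ with your own initials everywhere it appears, including in any additional data frames or models you create.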
Please submit a Word document including:
1. A table showing the overall structure of the dataset, including variable names, data types, and whether the variables will be used in your analyses. Also, please answer questions c, d, e.
a. The nominal variables ACADEMIC_INTEREST_1, ACADEMIC_INTEREST_2, and IRSCHOOL were rejected because they were replaced by the interval variables INT1RAT, INT2RAT, and HSCRAT, respectively. For example, academic interest codes 1 and 2 were replaced by the percentage of inquirers over the past five years who indicated those interest codes and then enrolled. The variable IRSCHOOL is the high school code of the student, and it was replaced by the percentage of inquirers from that high school over the last five years who enrolled.
b. CONTACT_CODE1 and CONTACT_DATE1 were also rejected because Enrollment Management indicated that they are irrelevant.
c. Should any other variables be rejected from your analyses? If so, please explain the reason for each additionally rejected variable.
d. Which variable is your target variable?
e. Do you need to change data types or measurement levels of your existing variables (e.g., binary, numeric, factor)? Why?
2. Explain whether variable imputation and transformation are needed for the regression model. If so, please explain which variables have been imputed or transformed, and how.
3. Please provide the following results for each model:
a. Model result summary for the regression model (e.g., coefficients, significance levels)
b. Tree plot for the decision tree model
4. Which model will you choose? Why? Please provide support for your answer.
5. Based on the selected model, please explain and summarize your major findings to the director of the Office of Enrollment Management.
6. Copy and paste your R code at the end of the document.
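The rate replacement described in item 1a (a nominal code replaced by the share of past inquirers with that code who enrolled) can be sketched as below. This is a minimal illustration under stated assumptions: the historical records are made up, the helper name rate_by_code is hypothetical, and the actual construction of HSCRAT, INT1RAT, and INT2RAT in INQ2019 may differ in detail.

```r
# Made-up historical inquiries; column names follow the data dictionary,
# but the values and the helper name rate_by_code are hypothetical.
history <- data.frame(
  IRSCHOOL = c("A", "A", "A", "A", "B", "B", "C", "C", "C", "C"),
  ENROLL   = c(1, 0, 0, 1, 0, 0, 1, 1, 1, 0)
)

# Share of past inquirers with each code who went on to enroll.
rate_by_code <- function(code, enrolled) {
  tapply(enrolled, as.character(code), mean)
}

hsc_rates <- rate_by_code(history$IRSCHOOL, history$ENROLL)

# Attach the historical rate to new inquiries by looking up each code.
current <- data.frame(IRSCHOOL = c("B", "A", "C"))
current$HSCRAT <- as.numeric(hsc_rates[as.character(current$IRSCHOOL)])
print(current)
```

Because the rates come from earlier years rather than from the records being modeled, the interval variable summarizes each code without using the current outcome.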
| Name | Description |
|---|---|
| ACADEMIC_INTEREST_1 | Primary academic interest code |
| ACADEMIC_INTEREST_2 | Secondary academic interest code |
| CAMPUS_VISIT | Campus visit code |
| CONTACT_CODE1 | First contact code |
| CONTACT_DATE1 | First contact date |
| ETHNICITY | Ethnicity |
| ENROLL | 1=Enrolled F2014, 0=Not enrolled F2014 |
| IRSCHOOL | High school code |
| INSTATE | 1=In state, 0=Out of state |
| LEVEL_YEAR | Student academic level |
| REFERRAL_CNTCTS | Referral contact count |
| SELF_INIT_CNTCTS | Self-initiated contact count |
| SOLICITED_CNTCTS | Solicited contact count |
| TERRITORY | Recruitment area |
| TOTAL_CONTACTS | Total contact count |
| TRAVEL_INIT_CNTCTS | Travel initiated contact count |
| AVG_INCOME | Commercial HH income estimate |
| DISTANCE | Distance from university |
| HSCRAT | 5-year high school enrollment rate |
| INIT_SPAN | Time from first contact to enrollment date |
| INT1RAT | 5-year primary interest code rate |
| INT2RAT | 5-year secondary interest code rate |
| INTEREST | Number of indicated extracurricular interests |
| MAILQ | Mail qualifying score (1=very interested) |
| PREMIERE | 1=Attended campus recruitment event, 0=Did not |
| SATSCORE | SAT (original) score |
| SEX | Sex |
| STUCELL | 1=Have a cell phone, 0=Do not |
| TELECQ | Telecounseling qualifying score (1=very interested) |