DSBA/MBAD 6211 Assignment 1

Due: 11:59pm @ 2/18/2021

In the fall of 2019, the administration of a large private university requested that the Office of Enrollment Management and the Office of Institutional Research work together to identify prospective students who would most likely enroll as new freshmen in the Fall 2020 semester. Historically, the university received inquiries from about 90,000 or more students and enrolled 2,400 to 2,800 new freshmen each Fall semester. It was decided that inquiries for Fall 2019 would be used to build the model to help shape the Fall 2020 freshman class. The data set INQ2019 was built over a period of several months in consultation with Enrollment Management. Please carefully explore all variables and build a predictive model for better enrollment management. Please apply regression and decision tree models to analyze the data.

· Variable and model naming requirements:

· Please include your initials in the data frame names as well as the model names in your R code.

· For instance, in my code, I would name the data frames dfKZ, dfKZ.train, and dfKZ.valid. I would also name the models regressionKZ, treeKZ, etc.
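As a minimal sketch of this naming convention, the R code below reuses the initials KZ from the example above. The toy data frame standing in for INQ2019, the seed, and the 70/30 split are all assumptions made for illustration; the assignment does not prescribe a split ratio.

```r
# Toy stand-in for INQ2019 (values are simulated, not real data).
set.seed(6211)
dfKZ <- data.frame(
  ENROLL   = rbinom(200, 1, 0.3),
  INSTATE  = rbinom(200, 1, 0.6),
  DISTANCE = round(runif(200, 1, 500))
)

# Hypothetical 70/30 partition into training and validation sets.
train.idx  <- sample(nrow(dfKZ), size = round(0.7 * nrow(dfKZ)))
dfKZ.train <- dfKZ[train.idx, ]
dfKZ.valid <- dfKZ[-train.idx, ]

# Every row lands in exactly one of the two partitions.
nrow(dfKZ.train) + nrow(dfKZ.valid) == nrow(dfKZ)
```

With your own initials, dfKZ would become, for example, dfAB, and the same suffixes (.train, .valid) would carry over.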

Please submit a Word document including:

1. A table showing the overall structure of the dataset, including variable names, data types, and whether each variable will be used in your analyses. Also, please answer questions c, d, and e below.

a. The nominal variables ACADEMIC_INTEREST_1, ACADEMIC_INTEREST_2, and IRSCHOOL were rejected because they were replaced by the interval variables INT1RAT, INT2RAT, and HSCRAT, respectively. For example, academic interest codes 1 and 2 were replaced by the percentage of inquirers over the past five years who indicated those interest codes and then enrolled. The variable IRSCHOOL is the high school code of the student, and it was replaced by the percentage of inquirers from that high school over the last five years who enrolled.

b. CONTACT_CODE1 and CONTACT_DATE1 are also rejected because Enrollment Management considers them irrelevant.

c. Should any other variables be rejected from your analyses? If so, please explain the reason for each additionally rejected variable.

d. Which variable is your target variable?

e. Do you need to change data types or measurement levels of your existing variables (e.g., binary, numeric, factor)? Why?
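One possible way to assemble the structure table in R is sketched below. The toy data frame and its values are made up, structureKZ is a hypothetical name, and converting the 0/1 flags and codes to factors is shown only as one assumed choice, not as the required answer to question e.

```r
# Toy stand-in for a few INQ2019 columns (values are made up).
dfKZ <- data.frame(
  ENROLL    = c(1, 0, 0, 1),
  INSTATE   = c(1, 1, 0, 0),
  TERRITORY = c("A", "2", "N", "5"),
  DISTANCE  = c(12.5, 300, 87, 4.2),
  IRSCHOOL  = c("S01", "S02", "S01", "S03"),
  stringsAsFactors = FALSE
)

# Variables rejected in items a and b of the assignment.
rejected <- c("ACADEMIC_INTEREST_1", "ACADEMIC_INTEREST_2", "IRSCHOOL",
              "CONTACT_CODE1", "CONTACT_DATE1")

# One row per variable: name, R type, and whether it is kept.
structureKZ <- data.frame(
  Name = names(dfKZ),
  Type = vapply(dfKZ, function(x) class(x)[1], character(1)),
  Used = ifelse(names(dfKZ) %in% rejected, "No", "Yes"),
  row.names = NULL
)
print(structureKZ)

# Assumed conversion: treat 0/1 flags and nominal codes as factors.
dfKZ$ENROLL    <- factor(dfKZ$ENROLL)
dfKZ$INSTATE   <- factor(dfKZ$INSTATE)
dfKZ$TERRITORY <- factor(dfKZ$TERRITORY)
```

Any variable you reject under question c can be appended to the rejected vector so that the table stays in sync with your analysis.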

2. Explain whether variable imputation and transformation are needed for the regression model. If so, please explain which variables have been imputed or transformed, and how.
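The sketch below illustrates one common approach, assuming median imputation with a missing-value indicator and a log transform for a right-skewed variable. The toy values, the M_ prefix, and the LOG_DISTANCE name are hypothetical; which variables in INQ2019 actually need this treatment has to be determined from the real data.

```r
# Toy stand-in with missing values (values are made up).
dfKZ <- data.frame(
  AVG_INCOME = c(52000, NA, 71000, 48000, NA, 150000),
  DISTANCE   = c(12, 340, NA, 5, 1200, 60)
)

# Count missing values per variable before deciding what to impute.
colSums(is.na(dfKZ))

# Median imputation, keeping a 0/1 flag that records which values were missing.
for (v in c("AVG_INCOME", "DISTANCE")) {
  dfKZ[[paste0("M_", v)]] <- as.integer(is.na(dfKZ[[v]]))
  dfKZ[[v]][is.na(dfKZ[[v]])] <- median(dfKZ[[v]], na.rm = TRUE)
}

# Log transform for a right-skewed variable; log1p(x) = log(1 + x) handles zeros.
dfKZ$LOG_DISTANCE <- log1p(dfKZ$DISTANCE)
```

Decision trees in rpart can handle missing values through surrogate splits, which is one reason the question is framed around the regression model.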

3. Please provide the following results for each model:

a. Model result summary for the regression model (e.g., coefficients, significance levels)

b. Tree plot for the decision tree model
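A minimal sketch of both outputs follows, assuming that a logistic regression is the intended regression model for a binary target and that the tree is fit with the rpart package. The training data are simulated and the predictor subset is arbitrary; base plot() and text() are used for the tree here, though other plotting packages would also work.

```r
library(rpart)  # recommended package included with standard R installations

# Simulated training data standing in for dfKZ.train (not real INQ2019 values).
set.seed(6211)
n <- 500
dfKZ.train <- data.frame(
  SELF_INIT_CNTCTS = rpois(n, 2),
  INSTATE          = factor(rbinom(n, 1, 0.6)),
  HSCRAT           = runif(n, 0, 0.3)
)
p <- plogis(-2 + 0.5 * dfKZ.train$SELF_INIT_CNTCTS + 4 * dfKZ.train$HSCRAT)
dfKZ.train$ENROLL <- factor(rbinom(n, 1, p))

# a. Logistic regression: coefficients and significance levels.
regressionKZ <- glm(ENROLL ~ ., data = dfKZ.train, family = binomial)
summary(regressionKZ)

# b. Classification tree and its plot.
treeKZ <- rpart(ENROLL ~ ., data = dfKZ.train, method = "class")
if (nrow(treeKZ$frame) > 1) {  # plot() fails on a root-only tree
  plot(treeKZ, margin = 0.1)
  text(treeKZ, use.n = TRUE, cex = 0.8)
}
```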

4. Which model will you choose? Why? Please provide support for your answer.
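One way to support the choice is to compare the two models on the validation set. The sketch below uses the misclassification rate at a 0.5 cutoff on simulated data. The cutoff, the metric, the 420/180 split, and the name misclassKZ are assumptions; other criteria (for example ROC curves or lift) may be just as appropriate.

```r
library(rpart)

# Simulated data standing in for INQ2019 (not real values).
set.seed(6211)
n <- 600
dfKZ <- data.frame(
  SELF_INIT_CNTCTS = rpois(n, 2),
  HSCRAT           = runif(n, 0, 0.3)
)
p <- plogis(-2 + 0.5 * dfKZ$SELF_INIT_CNTCTS + 4 * dfKZ$HSCRAT)
dfKZ$ENROLL <- factor(rbinom(n, 1, p))

# Hypothetical 70/30 partition.
train.idx  <- sample(n, size = 420)
dfKZ.train <- dfKZ[train.idx, ]
dfKZ.valid <- dfKZ[-train.idx, ]

regressionKZ <- glm(ENROLL ~ ., data = dfKZ.train, family = binomial)
treeKZ       <- rpart(ENROLL ~ ., data = dfKZ.train, method = "class")

# Predicted classes on the validation set (0.5 cutoff for the regression).
prob.reg  <- predict(regressionKZ, newdata = dfKZ.valid, type = "response")
pred.reg  <- ifelse(prob.reg > 0.5, "1", "0")
pred.tree <- as.character(predict(treeKZ, newdata = dfKZ.valid, type = "class"))
actual    <- as.character(dfKZ.valid$ENROLL)

# Share of validation cases each model classifies incorrectly.
misclassKZ <- c(
  regression = mean(pred.reg  != actual),
  tree       = mean(pred.tree != actual)
)
print(misclassKZ)
```

Because only a small share of inquirers enroll in the real data, a 0.5 cutoff may classify almost everyone as a non-enrollee, so the cutoff and metric deserve some thought before drawing conclusions.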

5. Based on the selected model, please explain and summarize your major findings to the director of the Office of Enrollment Management.

6. Copy and paste your R code at the end of the document.

Name                 Description
ACADEMIC_INTEREST_1  Primary academic interest code
ACADEMIC_INTEREST_2  Secondary academic interest code
CAMPUS_VISIT         Campus visit code
CONTACT_CODE1        First contact code
CONTACT_DATE1        First contact date
ETHNICITY            Ethnicity
ENROLL               1=Enrolled F2014, 0=Not enrolled F2014
IRSCHOOL             High school code
INSTATE              1=In state, 0=Out of state
LEVEL_YEAR           Student academic level
REFERRAL_CNTCTS      Referral contact count
SELF_INIT_CNTCTS     Self-initiated contact count
SOLICITED_CNTCTS     Solicited contact count
TERRITORY            Recruitment area
TOTAL_CONTACTS       Total contact count
TRAVEL_INIT_CNTCTS   Travel-initiated contact count
AVG_INCOME           Commercial household (HH) income estimate
DISTANCE             Distance from university
HSCRAT               5-year high school enrollment rate
INIT_SPAN            Time from first contact to enrollment date
INT1RAT              5-year primary interest code rate
INT2RAT              5-year secondary interest code rate
INTEREST             Number of indicated extracurricular interests
MAILQ                Mail qualifying score (1=very interested)
PREMIERE             1=Attended campus recruitment event, 0=Did not
SATSCORE             SAT (original) score
SEX                  Sex
STUCELL              1=Have a cell phone, 0=Do not
TELECQ               Telecounseling qualifying score (1=very interested)
