Identifying Students Likely to Fail Their Math SOL

Predicting the Likelihood of a Student Failing Their Math SOL

Student learning is assessed at the end of the year on the Standards of Learning (SOL) test, a high-stakes test for both the student and the teacher. Students are required to pass SOLs for high school graduation, and scores are used to evaluate teacher performance. The problem is identifying the students who are likely to fail the SOL early enough to give them support throughout the year. The solution is a model that uses data on the students to determine which groups of students are most likely to fail the SOL, so we can begin supporting those students on day one and get them involved in programs that show an increased likelihood of passing the SOL.

This problem can be solved with predictive modeling because there are behaviors (attributes) that can be categorized to determine whether a student is likely to fail or pass the SOL. Being able to predict whether a student will fail the SOL before the start of school will allow teachers, parents, and administrators to put support in place for those specific students and help change their predicted outcomes. For example, if the model shows that a student in a low socioeconomic class who does not participate in after-school activities is likely to fail, we can work with parents and students to get them into after-school programs. We cannot change the economic challenges the student faces, but with the correct model, we can identify attributes that we can change to make a student more successful.

The data mining will be supervised, and the best model for the data is a logistic regression whose target variable is whether the student failed the math SOL. I picked logistic regression because it will be valuable to know the probability of a student failing. As a team, we will have to determine the probability threshold at which we intervene. Using the logistic regression model will also allow us to see which classes have a correlation with passing the SOL. The importance of this project to our school is enormous. If we can use this model to identify students in jeopardy of failing and help change their fate, then we are doing work that is in the best interest of the child.
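To make the threshold idea concrete, here is a minimal sketch in Python of how a logistic regression turns yes/no attributes into a probability of failing and then into an intervene-or-not decision. Everything specific in it is an assumption for illustration: the ten rows are invented toy data (not real student records), the two features and their order are hypothetical, the 0.5 threshold is only a placeholder for whatever value the team chooses, and the hand-written gradient-descent fit stands in for whatever statistics package we end up using.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Estimated probability of failing for one student's 0/1 features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def fit(rows, labels, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def needs_intervention(probability, threshold=0.5):
    """Flag a student when the estimated probability reaches the threshold."""
    return probability >= threshold


# Invented toy data, NOT real student records.
# Hypothetical feature order: [missed_over_18_days, in_after_school_activities]
rows = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1],
        [0, 0], [0, 0], [0, 1], [0, 1], [0, 1]]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]  # 1 = failed the math SOL

weights, bias = fit(rows, labels)

# Two hypothetical students: frequent absences and no activities,
# versus good attendance and after-school participation.
p_high = predict_proba(weights, bias, [1, 0])
p_low = predict_proba(weights, bias, [0, 1])
print(f"estimated probabilities of failing: {p_high:.2f} vs {p_low:.2f}")
print("intervene:", needs_intervention(p_high), needs_intervention(p_low))
```

Lowering the threshold would flag more students at the cost of more false alarms, which is the trade-off the team would need to settle before relying on the model.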

We already have much of the data that we need to build this model. The attributes that we will want to take a closer look at are:

1. Did the student fail the math SOL? Yes or No? (Target Variable)

2. Did the student miss more than 18 days of school the previous year? Yes or No?

3. Does the student participate in after-school activities? Yes or No?

4. Does the student come from a low socioeconomic class? Yes or No?

5. Did the student attend summer school the previous year? Yes or No?

6. Did the student have a study hall or tutoring class? Yes or No?
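As a rough illustration of how the attributes above could be coded for a model, the sketch below stores one student's answers and converts them to 0/1 values. The class name, field names, and the example student are all hypothetical; only the "more than 18 days" cutoff comes from the list itself.

```python
from dataclasses import dataclass

ABSENCE_CUTOFF_DAYS = 18  # "missed more than 18 days" from attribute 2


@dataclass
class StudentRecord:
    # Field names are hypothetical; the school's real column names will differ.
    failed_math_sol: bool                   # attribute 1 (target variable)
    days_absent_last_year: int              # source for attribute 2
    in_after_school_activities: bool        # attribute 3
    low_socioeconomic_class: bool           # attribute 4
    attended_summer_school_last_year: bool  # attribute 5
    had_study_hall_or_tutoring: bool        # attribute 6


def to_features(record):
    """Return attributes 2 through 6 as a list of 0/1 values."""
    return [
        int(record.days_absent_last_year > ABSENCE_CUTOFF_DAYS),
        int(record.in_after_school_activities),
        int(record.low_socioeconomic_class),
        int(record.attended_summer_school_last_year),
        int(record.had_study_hall_or_tutoring),
    ]


def to_target(record):
    """Return 1 if the student failed the math SOL, otherwise 0."""
    return int(record.failed_math_sol)


# An invented example student, not a real record.
example = StudentRecord(
    failed_math_sol=True,
    days_absent_last_year=21,
    in_after_school_activities=False,
    low_socioeconomic_class=True,
    attended_summer_school_last_year=False,
    had_study_hall_or_tutoring=False,
)
print(to_features(example), to_target(example))
```

Keeping the raw number of days absent and deriving the yes/no flag from it leaves room to revisit the 18-day cutoff later without collecting the data again.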

This data is all readily available to us. The school keeps demographic data on students, the attendance office has data on student absences, and the school has sign-in forms for students who participate in after-school activities. This proposal suggests collecting this data and examining it to find correlations we have previously missed.
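A simple first look at the collected data, before any model is built, could compare failure rates between students who have a given attribute and those who do not. The Python sketch below does this for one attribute. The eight records are invented toy data and the key names are hypothetical; they are not drawn from the school's files.

```python
def fail_rate_by_attribute(records, attribute):
    """Return the share of students who failed, split by a yes/no attribute.

    The result maps True and False to a failure rate, or to None when
    no student falls in that group.
    """
    rates = {}
    for value in (True, False):
        group = [r for r in records if r[attribute] == value]
        if group:
            failures = sum(1 for r in group if r["failed_math_sol"])
            rates[value] = failures / len(group)
        else:
            rates[value] = None
    return rates


# Invented toy data, NOT real student records; key names are hypothetical.
records = [
    {"failed_math_sol": True, "in_after_school_activities": False},
    {"failed_math_sol": True, "in_after_school_activities": False},
    {"failed_math_sol": False, "in_after_school_activities": False},
    {"failed_math_sol": False, "in_after_school_activities": False},
    {"failed_math_sol": True, "in_after_school_activities": True},
    {"failed_math_sol": False, "in_after_school_activities": True},
    {"failed_math_sol": False, "in_after_school_activities": True},
    {"failed_math_sol": False, "in_after_school_activities": True},
]

rates = fail_rate_by_attribute(records, "in_after_school_activities")
print("participates:", rates[True], "does not participate:", rates[False])
```

A gap between the two rates would only suggest a correlation worth examining further; on its own it would not show that the attribute causes students to pass or fail.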