Logistic Regression in R

Stat 6306 Introduction to Data Science

Week 1.2

Due: 10am Friday

Philosophy

On the next slide you will see the activities and the estimated time you should spend on each one.

Note that the goal of each activity is to become familiar with the methods, ideas and implementation it involves, so that we can efficiently iron out the details in the live session.

Analogy: You are building the pieces of the puzzle in the For Live Session Activity, and we put them together in class to see the big picture.

You are not expected to have all the correct answers. The expectation is that you spend the allotted time (indicated next to the activity) on each activity so that we can discuss the details in class.

If you reach the indicated time without finishing the activity and have no more time to spend, simply write up what you have learned so far and record any questions you have. We will address those in class!

We want to develop the questions before class so that we can use the live session time to effectively answer them!

First Things First

In this first week we will make sure everyone is polished up and on the same page. We will review base R, RStudio, R Markdown, Linear Regression and Logistic Regression.

Please:

Check out the videos on Canvas … since this week is review, you can look at just the ones you think you might need a refresher on.

Complete Part 1

Part 1 (2-3 hours)

Find the data (titanic_train.csv) on Canvas.

Use Logistic Regression to classify passengers as survived or died based on gender, age and ticket class.
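As a starting point, here is a minimal sketch of fitting such a model with base R's glm(). It runs on simulated data so that it works on its own; the column names Survived, Pclass, Sex and Age are assumptions about titanic_train.csv, and the simulated coefficients are made up for illustration only.

```r
# Simulated stand-in for titanic_train.csv so the sketch runs on its own.
# With the real file you would instead start from something like:
#   train <- read.csv("titanic_train.csv")
# The column names Survived, Pclass, Sex and Age are assumptions.
set.seed(1)
n <- 300
train <- data.frame(
  Pclass = sample(1:3, n, replace = TRUE),
  Sex    = sample(c("female", "male"), n, replace = TRUE),
  Age    = round(runif(n, 1, 70))
)

# Made-up outcome: survival is more likely for women, younger passengers
# and higher ticket classes
lp <- 3 - 1.0 * train$Pclass - 2.5 * (train$Sex == "male") - 0.03 * train$Age
train$Survived <- rbinom(n, 1, plogis(lp))

# Treat ticket class and sex as categorical predictors
train$Pclass <- factor(train$Pclass)
train$Sex    <- factor(train$Sex)

# Logistic regression: binomial family with its default logit link
fit <- glm(Survived ~ Sex + Age + Pclass, data = train, family = binomial)
summary(fit)
```

If the real Age column contains missing values, glm() drops those rows under R's default na.action setting, so it is worth checking how many rows the fit actually used.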

Using your own age and gender, predict your survival for each of the ticket classes.
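One way to approach this is to build a small data frame with one row per ticket class and pass it to predict(). The sketch below refits a model on simulated data so it is self-contained; the column names and the 30-year-old male passenger are placeholders, so substitute your own values and the real training data.

```r
# Simulated training data (a stand-in for titanic_train.csv; column names assumed)
set.seed(1)
n <- 300
train <- data.frame(
  Pclass = factor(sample(1:3, n, replace = TRUE)),
  Sex    = factor(sample(c("female", "male"), n, replace = TRUE)),
  Age    = round(runif(n, 1, 70))
)
lp <- 3 - as.integer(train$Pclass) - 2.5 * (train$Sex == "male") - 0.03 * train$Age
train$Survived <- rbinom(n, 1, plogis(lp))
fit <- glm(Survived ~ Sex + Age + Pclass, data = train, family = binomial)

# One row per ticket class for a hypothetical 30-year-old male passenger.
# Factor levels are copied from the training data so predict() can match them.
me <- data.frame(
  Sex    = factor("male", levels = levels(train$Sex)),
  Age    = 30,
  Pclass = factor(1:3, levels = levels(train$Pclass))
)

# type = "response" returns probabilities rather than log-odds
me$p_survive <- predict(fit, newdata = me, type = "response")
me
```

Each value of p_survive is the model's estimated probability of survival in that class, holding age and gender fixed.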

Create a confusion matrix and calculate the accuracy, misclassification rate, sensitivity and specificity. Be prepared to explain these statistics. (It is ok if you have questions here … it is expected and is a good thing if we can identify and record them… we will answer them in class… just do your best in the time allotted.)
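To make the four statistics concrete, here is a small sketch that computes them from a confusion matrix in base R. The ten outcomes and predicted probabilities are invented for illustration, and the 0.5 cutoff is a common but arbitrary choice; "survived" is treated as the positive class.

```r
# Hypothetical outcomes (1 = survived) and predicted probabilities for 10 passengers
actual <- c(1, 0, 1, 1, 0, 0, 1, 0, 0, 1)
p_hat  <- c(0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3, 0.65, 0.55)

# Classify as survived when the predicted probability exceeds 0.5
predicted <- as.integer(p_hat > 0.5)

# Fixing the factor levels keeps the table 2 x 2 even if one class is never predicted
cm <- table(
  Predicted = factor(predicted, levels = c(0, 1)),
  Actual    = factor(actual, levels = c(0, 1))
)
cm

TP <- cm["1", "1"]  # predicted survived, actually survived
TN <- cm["0", "0"]  # predicted died, actually died
FP <- cm["1", "0"]  # predicted survived, actually died
FN <- cm["0", "1"]  # predicted died, actually survived

accuracy    <- (TP + TN) / sum(cm)   # share of all passengers classified correctly
misclass    <- (FP + FN) / sum(cm)   # 1 - accuracy
sensitivity <- TP / (TP + FN)        # share of actual survivors correctly identified
specificity <- TN / (TN + FP)        # share of actual deaths correctly identified

c(accuracy = accuracy, misclass = misclass,
  sensitivity = sensitivity, specificity = specificity)
```

For these made-up numbers the counts are TP = 4, TN = 3, FP = 2 and FN = 1, giving accuracy 0.7, misclassification rate 0.3, sensitivity 0.8 and specificity 0.6.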

Use your model to classify the 418 randomly selected passengers in the test set (titanic_test.csv) on Canvas.
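A rough sketch of this step: fit on the training data, then call predict() on the test data and apply a cutoff. Everything here is simulated so the code runs on its own; simulate_passengers() is a hypothetical helper, the column names are assumptions about the Canvas files, and the test file is assumed to contain no outcome column.

```r
set.seed(2)

# Hypothetical helper that simulates passengers; it stands in for the two CSV files
simulate_passengers <- function(n) {
  d <- data.frame(
    Pclass = factor(sample(1:3, n, replace = TRUE), levels = 1:3),
    Sex    = factor(sample(c("female", "male"), n, replace = TRUE),
                    levels = c("female", "male")),
    Age    = round(runif(n, 1, 70))
  )
  lp <- 3 - as.integer(d$Pclass) - 2.5 * (d$Sex == "male") - 0.03 * d$Age
  d$Survived <- rbinom(n, 1, plogis(lp))
  d
}

train <- simulate_passengers(300)
test  <- simulate_passengers(418)
test$Survived <- NULL  # assumption: the real test file has no outcome column

# Fit on the training data only
fit <- glm(Survived ~ Sex + Age + Pclass, data = train, family = binomial)

# Predicted survival probability for each test passenger, then a 0.5 cutoff
test$p_survive <- predict(fit, newdata = test, type = "response")
test$Predicted <- as.integer(test$p_survive > 0.5)

# How many test passengers are classified as died (0) and survived (1)
table(test$Predicted)
```

With real data, any test passenger with a missing Age would get a missing prediction, so those rows need a decision (for example, imputing an age) before all 418 passengers can be classified.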

Make a PowerPoint to present in class and to turn in on Canvas. (At least 1 slide per question above; more is fine … we just don’t want to crowd the slides.)

Note: there are videos and slide decks on Canvas to use as resources and reviews of logistic regression and how to implement it in R. The videos describe logistic regression and mostly use SAS, but the slides show the implementation in R.

That’s It! See you in Class!