Data Science Assignment


DSC 341 Foundations of Data Science Assignment 3

Instructor: Ilyas Ustun, PhD

You need to solve the questions by manual calculation. Show your calculations to get credit. You will not get credit for just providing the result.

You can show the calculations in either of these ways:

- Use Word Equations (Insert -> Equation): show the formula, then the formula with the numeric values properly placed, and then the final solution. Example: What is the length of the hypotenuse c of a right triangle with sides a = 3 and b = 4?

  $a^2 + b^2 = c^2$, with $a = 3$, $b = 4$

  $3^2 + 4^2 = 25 = c^2 \Rightarrow c = \sqrt{25} = 5$

- Or do the calculations on paper, take a picture, and paste it into the Word file under the corresponding question.
- At the end, submit only a single Word file.

Submission Instructions

1. Answer the problems and write your answers in a Word document. Submit a single Word file.
2. For full credit on each problem, make sure you provide enough detail in each of your answers.
3. Submit your file online at http://d2l.depaul.edu and check your submission.
4. Keep a copy of all your submissions.
5. If you have questions about the homework, email me BEFORE the deadline.
6. Write your name and ID at the top of your assignment.

row  weather  temperature  jogging
1    sunny    warm         yes
2    sunny    warm         yes
3    rainy    cold         no
4    rainy    warm         yes
5    rainy    cold         no
6    rainy    cold         no
7    sunny    hot          no
8    rainy    warm         no
9    rainy    hot          no
10   sunny    cold         no
11   sunny    cold         no
12   sunny    cold         yes
13   sunny    warm         yes
14   sunny    hot          no
15   sunny    hot          yes
16   rainy    hot          yes
17   rainy    hot          no
18   rainy    cold         yes
19   sunny    cold         yes
20   sunny    cold         no
21   sunny    hot          yes
22   sunny    warm         no
23   sunny    warm         no
24   sunny    warm         yes
25   sunny    hot          yes
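Every impurity calculation below starts from class counts, so it can help to double-check those counts. The following is a minimal Python sketch for checking hand counts only (the assignment itself still requires manual calculation); the names `DATA`, `COLUMNS`, `class_counts`, and `counts_by_value` are hypothetical, chosen for illustration.

```python
from collections import Counter

# The 25 rows of the data table, in row order, as (weather, temperature, jogging) tuples.
DATA = [
    ("sunny", "warm", "yes"), ("sunny", "warm", "yes"), ("rainy", "cold", "no"),
    ("rainy", "warm", "yes"), ("rainy", "cold", "no"), ("rainy", "cold", "no"),
    ("sunny", "hot", "no"), ("rainy", "warm", "no"), ("rainy", "hot", "no"),
    ("sunny", "cold", "no"), ("sunny", "cold", "no"), ("sunny", "cold", "yes"),
    ("sunny", "warm", "yes"), ("sunny", "hot", "no"), ("sunny", "hot", "yes"),
    ("rainy", "hot", "yes"), ("rainy", "hot", "no"), ("rainy", "cold", "yes"),
    ("sunny", "cold", "yes"), ("sunny", "cold", "no"), ("sunny", "hot", "yes"),
    ("sunny", "warm", "no"), ("sunny", "warm", "no"), ("sunny", "warm", "yes"),
    ("sunny", "hot", "yes"),
]

# Position of each explanatory variable inside a row tuple.
COLUMNS = {"weather": 0, "temperature": 1}


def class_counts(rows):
    """Count the jogging outcomes ('yes' / 'no') in a list of rows."""
    return Counter(row[2] for row in rows)


def counts_by_value(rows, column):
    """Group rows by one explanatory variable and count outcomes within each group."""
    idx = COLUMNS[column]
    groups = {}
    for row in rows:
        groups.setdefault(row[idx], []).append(row)
    return {value: class_counts(group) for value, group in groups.items()}


print("all rows:", dict(class_counts(DATA)))
for column in COLUMNS:
    for value, counts in counts_by_value(DATA, column).items():
        print(f"{column}={value}:", dict(counts))
```

On this table the counts (yes/no) come out as 12/13 overall; warm 5/3, cold 3/6, hot 4/4; sunny 9/7, rainy 3/6.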

Problem 1 [74]:

Use the data table above to answer the following questions. “jogging” is the outcome variable; “weather” and “temperature” are the explanatory variables.

1. [10] How does the decision tree algorithm work? Explain in your own words. What is the goal at each split? How can you define impurity?

2. [4] What is the formula for the Gini index calculation?

3. [4] What is the formula for the entropy calculation?

4. [4] What is the Gini index at the initial state? (Expected answer: 0.4992)

5. [10] What is the information gain of splitting on temperature as your first split, using the Gini index? Split into each group separately (warm, cold, and hot). (Expected answer: 0.0292)

6. [10] What is the information gain of splitting on weather as your first split, using the Gini index? Split into each group separately.

7. [4] Based on questions 5 and 6, which variable would be chosen for the first split using the Gini index, weather or temperature?

8. [4] What is the entropy at the initial state?

9. [10] What is the information gain of splitting on temperature as your first split, using the entropy measure? Split into each group separately (warm, cold, and hot).

10. [10] What is the information gain of splitting on weather as your first split, using the entropy measure? Split into each group separately. (Expected answer: 0.035486)

11. [4] Based on questions 9 and 10, which variable would be chosen for the first split using the entropy measure, weather or temperature?
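To check hand calculations for Problem 1, the sketch below uses the standard definitions: Gini = 1 − Σ p_k² and entropy = −Σ p_k log2 p_k, where p_k is the fraction of rows in class k, and information gain = parent impurity − size-weighted impurity of the children. Base-2 logarithms are an assumption here, though they are consistent with the expected answer quoted in question 10. This is a minimal illustration, not a required method; `gini`, `entropy`, and `information_gain` are hypothetical helper names.

```python
from collections import Counter
from math import log2

# The 25 rows of the data table, in row order, as (weather, temperature, jogging) tuples.
DATA = [
    ("sunny", "warm", "yes"), ("sunny", "warm", "yes"), ("rainy", "cold", "no"),
    ("rainy", "warm", "yes"), ("rainy", "cold", "no"), ("rainy", "cold", "no"),
    ("sunny", "hot", "no"), ("rainy", "warm", "no"), ("rainy", "hot", "no"),
    ("sunny", "cold", "no"), ("sunny", "cold", "no"), ("sunny", "cold", "yes"),
    ("sunny", "warm", "yes"), ("sunny", "hot", "no"), ("sunny", "hot", "yes"),
    ("rainy", "hot", "yes"), ("rainy", "hot", "no"), ("rainy", "cold", "yes"),
    ("sunny", "cold", "yes"), ("sunny", "cold", "no"), ("sunny", "hot", "yes"),
    ("sunny", "warm", "no"), ("sunny", "warm", "no"), ("sunny", "warm", "yes"),
    ("sunny", "hot", "yes"),
]
COLUMNS = {"weather": 0, "temperature": 1}


def gini(rows):
    """Gini index: 1 - sum over classes of p_k^2."""
    n = len(rows)
    counts = Counter(row[2] for row in rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def entropy(rows):
    """Entropy in bits: -sum over classes of p_k * log2(p_k).

    Counter only holds classes that occur, so log2(0) is never evaluated.
    """
    n = len(rows)
    counts = Counter(row[2] for row in rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_gain(rows, column, impurity):
    """Parent impurity minus the size-weighted impurity of the child groups."""
    idx = COLUMNS[column]
    children = {}
    for row in rows:
        children.setdefault(row[idx], []).append(row)
    weighted = sum(len(child) / len(rows) * impurity(child) for child in children.values())
    return impurity(rows) - weighted


for name, measure in (("Gini", gini), ("entropy", entropy)):
    print(f"{name} at the initial state: {measure(DATA):.6f}")
    for column in ("temperature", "weather"):
        gain = information_gain(DATA, column, measure)
        print(f"  gain from splitting on {column}: {gain:.6f}")
```

With these definitions the sketch reproduces the expected answers quoted in questions 4 and 5 (0.4992 and 0.0292), and its entropy gain for weather agrees with the quoted 0.035486 to four decimal places.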

Problem 2 [26]

A decision tree was trained on the data given in the table, and the tree shown in the diagram was obtained. Answer the following questions based on the diagram.

1. [16] Based on the decision tree diagram, the first split is on temperature, which produces three nodes: Warm, Cold, and Hot. Make a second split on the node “Cold” using the weather variable.

a. [5] What is the information gain using the Gini index as the impurity measure?
b. [5] What is the information gain using entropy as the impurity measure?
c. [6] Is there any point in doing this split? Explain your reasoning.

(Here the parent is the node “Cold”, and the children are the nodes obtained by splitting the parent node on weather type.)

2. [10] Based on the decision tree diagram, which class would each of the following observations belong to? You can point out the path followed by each data point.

a. temperature = ‘warm’, weather = ‘sunny’
b. temperature = ‘hot’, weather = ‘rainy’
c. temperature = ‘warm’, weather = ‘rainy’
d. temperature = ‘cold’, weather = ‘sunny’
e. temperature = ‘hot’, weather = ‘sunny’
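For Problem 2, question 1, the parent node “Cold” holds rows 3, 5, 6, 10, 11, 12, 18, 19, and 20 of the table. The sketch below applies the same impurity definitions (base-2 logarithms assumed) to just those rows, split by weather, as a way to check hand calculations. `COLD` and `split_gain` are hypothetical names. Question 2 depends on the tree diagram, so it is not covered here.

```python
from collections import Counter
from math import log2

# Rows with temperature == "cold" (rows 3, 5, 6, 10, 11, 12, 18, 19, 20),
# kept as (weather, jogging) pairs.
COLD = [
    ("rainy", "no"), ("rainy", "no"), ("rainy", "no"),
    ("sunny", "no"), ("sunny", "no"), ("sunny", "yes"),
    ("rainy", "yes"), ("sunny", "yes"), ("sunny", "no"),
]


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_gain(pairs, impurity):
    """Information gain from splitting (feature, label) pairs on the feature value."""
    parent = [label for _, label in pairs]
    children = {}
    for value, label in pairs:
        children.setdefault(value, []).append(label)
    weighted = sum(len(child) / len(parent) * impurity(child) for child in children.values())
    return impurity(parent) - weighted


print(f"Gini gain:    {split_gain(COLD, gini):.6f}")
print(f"entropy gain: {split_gain(COLD, entropy):.6f}")
```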