ps2.docx

Problem 1

Consider a problem from your current or a past job, a hobby, or an interest that would make for a good application of classification using supervised segmentation. Think about the relevant concepts:

· Supervised Versus Unsupervised Methods

· Data Mining and Its Results

· The Data Mining Process

· Predictive Modeling

· Supervised Segmentation

· Visualizing Segmentations

Refer to these concepts as appropriate in your answers.

Please do not choose a hypothetical example, such as something from the textbook; it should be something with which you have personal experience. This is also a good way to start thinking ahead to your data science proposal (though you are not committing to anything here).

Once you have something in mind, answer the following questions:

A) Describe why this would be an appropriate example of a classification problem that can be solved with supervised segmentation methods. What is the target variable that you need to predict?

B) What is the use you want to support with this solution?

C) What are at least 3 attributes that would help you predict your target variable? For each one, briefly explain why it would be useful and how you could obtain the data.

Problem 2

Suppose that you work for a large American hotel chain that wants to use data analytics to improve its business decision making. The chain wants to better understand customer cancelation patterns to improve its booking and profitability. You have access to the following historical data on cancelations:

Target variable:

· is_canceled: whether the reservation was canceled

Attributes:

· hotel_type: whether the hotel is a “resort” or “city” hotel

· summer: whether the reservation was made for the summer season or not

· children: whether children are listed on the reservation

· previous_cancelations: whether the person who made the reservation has canceled before

A) Explain how you can use the concepts of entropy and information gain to determine which attributes in this data set are the most informative, and how this information can be used to build a classification decision tree.

B) In about 8-10 sentences, report the key findings of your analysis (see the link below), including who seems most likely to cancel a reservation. Your answer should include:

· A discussion of at least one probability taken from a leaf node on the decision tree and its decision-making importance.

· How the hotel chain can use this decision tree to improve its business decision making.

· The limits of the results of this analysis when considering the use of the model.

Link to the analysis:

hotel_bookings - upRHmFJKEj1WH0nOPSYTMD7Sxsd | BigML.com

You can see the details of each node on the decision tree by hovering the mouse over the node or selecting it.
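
As a reminder of the mechanics behind part A of Problem 2, the following is a minimal Python sketch of how entropy and information gain could be computed for a binary target such as is_canceled. The helper names (entropy, information_gain) and the four toy rows are assumptions made up for illustration; they are not taken from the hotel data or from the BigML analysis, and the numbers will differ on the real data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(rows, attribute, target="is_canceled"):
    """Parent entropy minus the size-weighted entropy of the child segments."""
    parent = entropy([row[target] for row in rows])
    # Group the target values by the value of the splitting attribute.
    segments = {}
    for row in rows:
        segments.setdefault(row[attribute], []).append(row[target])
    weighted_children = sum(
        len(seg) / len(rows) * entropy(seg) for seg in segments.values()
    )
    return parent - weighted_children


# Hypothetical toy rows, invented for illustration only (not the real data).
toy_rows = [
    {"previous_cancelations": True, "summer": True, "is_canceled": True},
    {"previous_cancelations": True, "summer": False, "is_canceled": True},
    {"previous_cancelations": False, "summer": True, "is_canceled": False},
    {"previous_cancelations": False, "summer": False, "is_canceled": False},
]

# Information gain per attribute, then attributes ranked from most to least informative.
gains = {
    attr: information_gain(toy_rows, attr)
    for attr in ("previous_cancelations", "summer")
}
ranked = sorted(gains, key=gains.get, reverse=True)
print(gains)
print(ranked)
```

In this toy case, splitting on previous_cancelations yields two pure segments, so it has the higher information gain and would be chosen as the first split. A tree-induction procedure typically repeats the same calculation within each resulting segment until the segments are pure enough or too small to split further.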