PS3
ECON-575-Data Analytics-Problem Set 2 – Peter Dorner
Problem 1
I was an intern at an insurance firm where all staff worked in an open office, so
everyone could see what went on in the office. The secretary was asked to provide a list
of all clients with arrears of more than $200. In addition, the regional manager needed
the list classified by the client's gender and location. The secretary instructed the IT
intern to produce the list within an hour. The IT intern searched the company's data
files but could not locate the list in the required time. The secretary was angry and
blamed the intern for incompetence. The manager summoned the intern, who explained the
problem: the list could not be located because clients' data had been uploaded to the
system without following any format.
This is an appropriate example of a classification problem, because the data available
in the organization at the time had been uploaded to the system without following any
required format or procedure, such as organizing it by gender, by the client's
geographical location, or by the payments clients had made. A supervised segmentation
method can help by classifying the data into predetermined classes or groups.
Target variables:
Arrears over $200: whether the client's arrears exceed $200
Male or female: the gender of the client
From New York: whether the client is located within the New York region or outside it
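The three target variables above can be sketched in code. The following is a minimal illustration only, assuming hypothetical client records with made-up field names (arrears, gender, location) and made-up values; it is not the firm's actual data or system. It labels each record with the three yes/no targets and then builds the list the regional manager asked for, grouped by gender and location.

```python
from collections import defaultdict

# Hypothetical client records; field names and values are illustrative only.
clients = [
    {"name": "Client A", "arrears": 350.0, "gender": "F", "location": "New York"},
    {"name": "Client B", "arrears": 120.0, "gender": "M", "location": "New York"},
    {"name": "Client C", "arrears": 275.5, "gender": "M", "location": "Boston"},
    {"name": "Client D", "arrears": 200.0, "gender": "F", "location": "Boston"},
]

ARREARS_THRESHOLD = 200.0  # from the request: arrears of more than $200


def add_targets(record):
    """Return a copy of the record with the three binary target variables."""
    labeled = dict(record)
    labeled["arrears_over_200"] = record["arrears"] > ARREARS_THRESHOLD
    labeled["is_female"] = record["gender"] == "F"
    labeled["from_new_york"] = record["location"] == "New York"
    return labeled


def clients_in_arrears_by_group(records):
    """Group clients whose arrears exceed the threshold by (gender, location)."""
    groups = defaultdict(list)
    for record in map(add_targets, records):
        if record["arrears_over_200"]:
            groups[(record["gender"], record["location"])].append(record["name"])
    return dict(groups)


report = clients_in_arrears_by_group(clients)
for (gender, location), names in sorted(report.items()):
    print(gender, location, names)
```

Once each record carries well-defined targets, producing the requested list is a simple filter and grouping step rather than a manual search through unformatted files.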
The use that the solution needs to support is easier retrieval of records from the mass
of data. With well-defined target variables, it becomes easier to retrieve records that
share common features.
There are three attributes that can help in setting the target variables. The first is
the nature of the distribution: data with a similar distribution, such as a uniform
distribution, is classified under a single target variable. The second is the purpose of
the data, that is, what the retrieved data is meant to achieve. The third is the level of
independence of the variable, that is, whether it is independent or dependent.
Problem 2
One probability that can be taken from the decision tree is how strongly past
occurrences predict future occurrences. A good example is the following target variable
from the list provided: ‘previous_cancelations: if the person who made the reservation
has canceled before.’
The hotel can use the decision tree to determine its current status and ways in which it
can be improved. For instance, by identifying the pattern in guest cancellations, it is
possible to determine whether past data on guests who canceled their visits has anything
to do with future cancellations. This would enable the hotel's management to determine
the areas that need improvement and the strategies that can be applied to prevent future
guest cancellations.
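To illustrate how such a probability could be read off a single split of a decision tree, here is a minimal sketch. The reservations are invented for illustration and are not taken from the hotel's data. Splitting on whether the guest has canceled before and treating the current cancellation as the outcome is an assumption made for this example.

```python
# Hypothetical reservations as (canceled_before, canceled_this_time) pairs.
# These values are invented for illustration, not taken from the hotel's data.
bookings = [
    (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, True), (False, False),
]


def cancel_rate(rows):
    """Share of reservations in `rows` that were canceled (0.0 if empty)."""
    if not rows:
        return 0.0
    return sum(1 for _, canceled in rows if canceled) / len(rows)


def split_on_previous_cancellations(rows):
    """One decision-tree split: cancellation rate in each branch."""
    canceled_before = [r for r in rows if r[0]]
    never_canceled = [r for r in rows if not r[0]]
    return {
        "canceled_before": cancel_rate(canceled_before),
        "never_canceled": cancel_rate(never_canceled),
    }


rates = split_on_previous_cancellations(bookings)
for branch, rate in rates.items():
    print(f"{branch}: {rate:.2f}")
```

A large gap between the two branch rates would suggest that past cancellations carry information about future ones. Similar rates would suggest they do not.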
The results of the analysis are limited by the need for a clear definition of the target
variables. The presence of multiple target variables would make defining the required
dataset much more complex.