CIS 575 Project Report

December 7, 2019

Client Satisfaction Prediction Models

Contents

Executive Summary
Business Problem
Business Objective
Process followed for Selecting and Gathering Data
Preliminary Data Exploration and Findings
Further Exploration of the Data
· CaseOrigin
· Severity
· CustomerPriority
· DayOfWeek
· TimeOfDay
· Product
· Rep_Product
· Rep_TimeToResolve
Description of Data Preparation
· Microsoft Excel
· Enterprise Miner
  · Rep_TimeToResolve
  · Rep_Product
Description of Data Modeling and Assessments
· Decision Tree
  · Configuration of the Decision Tree Node
  · Decision Tree Results
· Regression Model
  · Configuration
  · Regression Results
· Neural Network
  · Configuration
  · Neural Network results
  · AutoNeural Node results
Model Comparisons and Model Selection
· Model Selection
Conclusions and Recommendations
· Lessons Learned
· Next Steps
Appendix A – Figures and Tables
Appendix B – Replacement Values for Rep_Product

Executive Summary: Support organizations are cost centers. They do not typically bring money into a business, but rather are a cost associated with supporting the customer through their lifecycle journey. While revenue is not a typically expected outcome of a support organization, there are ways that support organizations can decrease client effort, reduce costs, and improve client retention, all of which help to improve the profits of a business. Improving client satisfaction is one such method. Neil Patel states that focusing on client satisfaction provides many benefits, but his main point is “companies that prioritize customer satisfaction grow and increase revenue. Those that do not, don’t.” [1] By working to better understand our historical client satisfaction data, we can make strides toward improving client satisfaction by analyzing which aspects of the support experience lead to a satisfied customer.

[1] https://neilpatel.com/blog/benefits-and-importance-of-customer-satisfaction/

Using our own historical case and survey data, we look to discover which aspects of a case are the biggest drivers of either a positive or a negative support experience. If done correctly, we can then deliver a more substantiated strategy aimed at improving long-term client satisfaction metrics, and we can also use the results to look for red flags among active cases and head off potential negative client experiences before they occur.

Business Problem: I work for a technical support organization. We handle incoming client phone calls and emails, working with our customers to help resolve issues and concerns they have with our software. Our number one client-facing metric is client satisfaction (rated on a scale of 1-5). We also utilize the Net Promoter Score (NPS, on a scale of 0-10) to measure client sentiment. Clients that are highly satisfied with a support organization and give a business a high NPS are more likely not only to continue as clients, but also to expand their business and invest more money with that organization.

Business Objective: What I propose is to define a model that can help to predict a client’s satisfaction with a given case. By examining this information in real time, we can identify cases that are flagged as at risk of negative client satisfaction and increase our attention on those cases. By doing this, we can not only improve the overall satisfaction of our clients, but also prevent case escalations and ensure continued business with our clients.

Process followed for Selecting and Gathering Data

Our technical support organization has 10 years of historical net promoter score data. The net promoter program asks, “How likely are you to recommend [Company] to your friends or colleagues?” Responses are scored on a 0-10 scale: a score of 9 or 10 is considered a promoter, a score of 7 or 8 a passive, and a score of 0-6 a detractor.
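The promoter, passive, and detractor bands can be written as a small lookup. This is a minimal sketch, assuming the 0-10 scale on which the NPS column is recorded in this dataset; the function name `nps_category` is illustrative and not part of any system used in the project.

```python
def nps_category(score: int) -> str:
    """Classify a 0-10 'likelihood to recommend' answer into its NPS band."""
    if not 0 <= score <= 10:
        raise ValueError(f"NPS answer must be between 0 and 10, got {score}")
    if score >= 9:
        return "promoter"   # 9 or 10
    if score >= 7:
        return "passive"    # 7 or 8
    return "detractor"      # 6 or below


print([nps_category(s) for s in (10, 8, 6)])
```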

Whether a customer is sent a survey depends on a variety of rules, but the basic tenets are: 1) Has the client received a survey in the past 90 days? 2) Did the client have a case closed? If the answer to question 1 is no and the answer to question 2 is yes, the client is emailed a survey upon closing of their support case.
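The two basic tenets can be sketched as a single check. This is an illustration only: the names `should_send_survey` and `SURVEY_COOLDOWN` are hypothetical, the production rules include more conditions than these two, and treating exactly 90 days as still "within the past 90 days" is an assumption.

```python
from datetime import date, timedelta
from typing import Optional

# Assumed cooldown window taken from the "past 90 days" rule.
SURVEY_COOLDOWN = timedelta(days=90)


def should_send_survey(case_closed: bool,
                       last_survey_sent: Optional[date],
                       today: date) -> bool:
    """Return True when a closed case should trigger a survey email."""
    if not case_closed:
        return False            # rule 2: a case must have been closed
    if last_survey_sent is None:
        return True             # the client has never been surveyed
    # rule 1: no survey within the past 90 days (boundary is an assumption)
    return today - last_survey_sent > SURVEY_COOLDOWN
```

For example, a client last surveyed on 1 September 2019 whose case closes on 7 December 2019 (97 days later) would receive a survey, while one surveyed on 1 November 2019 would not.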

The customer does not have to answer the survey of course, but a business can encourage customers to answer these surveys by having an effective closed loop process and by showing tangible evidence of responding to these surveys and actioning client concerns or feedback. Our business has a history of reaching out to every client (the closed loop process) to thank them for their feedback and ask for any additional clarity regarding their responses. We then work to bucket all responses into a feedback loop which we can then take to the appropriate business units and form a plan on if/how we can effectively change or encourage the behaviors that are reported upon.

Because all surveys are prompted from a case, the survey data is tied to our support case data. This allows us to look at aspects of the case at a holistic level to determine what may have influenced a positive or negative score. While the surveys are not launched or created from within the same system, the data from those surveys is tied back to our case system (salesforce.com), so we can gather all data for this project from our internal salesforce.com instances.

Using salesforce.com, the data was gathered by running a report filtered to only those cases that have attached survey responses. The report columns are the variables believed to have the biggest impact on satisfaction. The data was then exported from the system as a Microsoft Excel .xlsx file.

Preliminary Data Exploration and Findings

The dataset gathered is survey data from 2010 to present, totaling 10,434 records. The table includes a total of 14 columns:

· AccountID (Nominal) - Unique identifier of a given account that the case/survey are related to.

· Of the 10,000+ records, the Mode account ID of 0018000000PZDZ8 is responsible for 3.72% of the surveys gathered in this dataset.

· CaseNumber (Nominal)

· Unique identifier of the case that was opened that the survey was created from.

· There is no true mode as a case can only have 1 NPS survey spawned from it.

· CaseOrigin (Nominal) – initial channel the case is opened through

· Possible values: Email, Phone, Portal

· Mode: Email – 53.55% of cases

· CustomerPriority (Nominal) – The level of importance to the client business

· Possible values include: Green, Amber, Red

· Mode: Green – 95.28% of cases

· DayOfWeek (Nominal) – Day of the week the case was opened.

· Possible values: Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday

· Mode: Tuesday – 21.62% of cases

· Product (Nominal) – Product the case was opened for

· Possible values: over 100 individual products within the system

· Mode: MapInfo Pro 32-BIT – 25.61% of cases

· Reasons (Nominal) – Reason why the case was opened

· Possible Values: Support, Defect, Enhancement, Licensing

· Mode: Support – 93.92% of cases

· ResponseNumber (Nominal) - Unique Survey ID

· No Mode, only 1 Survey ID per record

· Satisfied (Binary)

· Target variable: 1 = Satisfied, 0 = Dissatisfied

· Mean = .73, indicating that 73% of the cases resulted in a survey of Satisfied

· 1 = Satisfied defined as NPS score of 8, 9, or 10

· 0 = Dissatisfied defined as NPS score of 0 - 7

· Severity (Nominal) – Defined in the service level agreement (SLA), indicates the level of criticality and impact that the issue the case addresses has on the client

· Possible values: Critical, High, Medium, Low

· Mode: Medium – 64.78% of cases

· NPS (Interval) – Net Promoter Score: How likely is client to recommend us to a friend/colleague?

· Scale of 0 – 10

· Mean – 8.04; Percent Missing – 2.19%

· OpenedDate (Date) – Date/Time that the case was created/opened.

· Time (Time) – Time field individually separated from OpenedDate Column. Shows hour of day case was opened

· TimeOfDay (Nominal)

· Possible Values: Overnight, Morning, Midday, Evening

· Mode: Midday – 35.14% of the cases

· TimeToResolve (Interval) – The amount of time a case was open before being resolved

· Mean – 10.42 days

Further Exploration of the Data

CaseOrigin

Understanding the case origin can be important. Support processes state that we communicate with the client via their preferred method of communication, determined by the method they used to open the case. For example, if they contacted us by phone, we would work with the client via phone conversations. Figure 1 shows the breakdown of case data by case origin: 53.55% of cases came in via email, 39.43% came in via phone, and 6.91% came in via the online case management system (Portal). By including this information in the analysis, the goal is to determine whether the way the client opens the case has any impact on their level of satisfaction.

Severity

The Severity field is used to indicate the impact of the case on the client’s business. Based upon the Service Level Agreements with clients, a case is defined as Critical, High, Medium, or Low. The assumption here is that a client with a Critical or High severity issue is potentially more likely to be dissatisfied, given that our software is creating a business-impacting event for them. Figure 2 is a bar chart that shows the distribution of cases across these 4 severities. Examining the chart, it is evident that a large portion of cases are Medium or Low severity. This aligns well with what is expected. Low severity cases are less likely to be raised because they do not present as much of a business impact. Clients do not want to spend resources and effort investigating issues that do not overtly impact the business. Medium issues are the most plentiful because it takes a larger amount of impact for a case to rise above Medium into High or Critical. When cases reach this level, they are given more priority within the business as determined by the SLAs.

CustomerPriority

The customer priority field is intended to be used by the support organization as a pairing to the Severity column. While Severity is a technically defined field, Customer Priority is a customer sentiment field. For example, if a case has a Low severity (little to no impact on the client’s business outcomes) but the client expresses a high level of frustration and discontent, the support team should classify the case as Amber or Red priority. Figure 3 shows that over 95% of cases are set to the default value of Green. While this may not be a perfect representation of the customer’s sentiment, it does mean that if a value is set to something other than Green, the client’s frustration was truly magnified in those instances.

DayOfWeek

The DayOfWeek variable shows that, as expected, weekdays account for the majority of the data. In Figure 4, Tuesday (2,256 cases) is the day on which the most cases were opened, and Monday, Wednesday, and Thursday each saw more than 2,000 cases. Friday had just under 1,500 cases. Saturday and Sunday each had a small number of cases, with Sunday having more because the data is recorded in Eastern time and Australia comes online for its Monday on Sunday evening in the Eastern time zone.

TimeOfDay

This field indicates whether a case was opened in the Overnight hours (Eastern time) or during the Morning, Midday, or Evening. By examining this data, we are looking to see whether there is any correlation between the satisfaction of a client and the time of day at which they initially contact support. Initially, this analysis was to be done on the raw time, but to simplify the results and make them easier to understand, the time values were bucketed into these bins. Figure 5 shows the breakdown of cases by the time of day they were opened. The data does tend to favor Midday, but overall the cases are well dispersed across all 4 levels.

Product

The product field represents the software product that the client is calling in with questions about. MapInfo Pro is by far our most sold product, as it is the least expensive offering and has an extensive legacy client base that continues to use and upgrade the software. There are cases from 167 different products within this dataset. In its current form, the data is too varied to provide much value, so it needs to be bucketed into a more concise set of groups for the analysis we’re looking to perform. More details on how the Rep_Product variable was created are available in the Description of Data Preparation section of this report.

Rep_Product

The Rep_Product field is a grouping of like values from the product field. This field contains 14 values (down from 167) and will provide a much better basis for examining whether the product in question is more or less likely to result in a satisfied or dissatisfied customer. Figure 6 shows that the top 5 products by number of cases in this data set are MapInfo, Coding, Spectrum, EngageOne, and Confirm. Internal employees would agree that this aligns well with the overall distribution of our case volumes in Support.

Rep_TimeToResolve

It is assumed that TimeToResolve is one of the largest drivers of a satisfied or dissatisfied customer. The longer it takes for an issue to be solved, the longer the business impact of that issue persists. Clients lose more time and money the longer an active case is being investigated, and the impact of that case tends to build up over time. By utilizing the Rep_TimeToResolve field, we can focus the analysis on a more reasonable range of days. Figure 7 shows a histogram of the Rep_TimeToResolve field. The mean of this new field is 7.96 days, but the data is heavily skewed, with a skewness of 3.692 and a kurtosis of 14.633.

Description of Data Preparation

Preparation of the data was done in both Microsoft Excel and in SAS Enterprise Miner.

Microsoft Excel

The first step in Microsoft Excel was the removal of ancillary columns that were pulled in the initial report from Salesforce.com. Not all of the fields that were pulled were needed, so 6 columns of data were removed to help focus the analysis within the Enterprise Miner system.

The OpenedDate field is the basis of two other columns of data within this report. The DayOfWeek field was created from the date field using the formula =TEXT([OpenedDate], “dddd”). This lets Excel read the value in the adjacent OpenedDate column and populate DayOfWeek with the name of the day for that date. This will allow us to analyze whether the day of the week has any impact on the likelihood of a satisfied or dissatisfied customer.

The other field derived from the OpenedDate field is the Time field. By using the HOUR function in Excel, we were able to extract the hour the case was opened from the OpenedDate field so we could then determine whether time of day had any real impact on a client’s level of satisfaction with their support experience. To further this analysis, another variable called “TimeOfDay” was created, in which the hour dictates which portion of the day the case was opened in. The variable is defined as follows:

· 0-6 and 21-24 = Overnight

· 7-10 = Morning

· 11-15 = Midday

· 16-20 = Evening
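The two derived columns can be reproduced outside Excel. The sketch below is a Python equivalent under stated assumptions: the function names are illustrative, the hour is the 0-23 value that Excel's HOUR function returns (so the "24" in the Overnight range never occurs), and the day name is taken from a fixed English list rather than from locale settings.

```python
from datetime import datetime

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def day_of_week(opened: datetime) -> str:
    """Day name for OpenedDate, like =TEXT([OpenedDate], "dddd") in Excel."""
    return DAY_NAMES[opened.weekday()]  # weekday(): Monday is 0


def time_of_day(hour: int) -> str:
    """Bucket the opening hour (0-23) into the four TimeOfDay levels."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be between 0 and 23, got {hour}")
    if 7 <= hour <= 10:
        return "Morning"
    if 11 <= hour <= 15:
        return "Midday"
    if 16 <= hour <= 20:
        return "Evening"
    return "Overnight"  # hours 0-6 and 21-23


opened = datetime(2019, 12, 7, 14, 30)
print(day_of_week(opened), time_of_day(opened.hour))
```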

The other item created in the Excel spreadsheet before import into SAS Enterprise Miner is the Satisfied field, which is based on the NPS field. As noted above, an NPS score of 8, 9, or 10 indicates a Satisfied client; an NPS score of 0-7 is considered a Dissatisfied client. Bucketing the data into a binary field lets us use it as the Target for our modelling and analysis. Initially, the thought was to use a Nominal field with three levels, but that path did not yield favorable models, so the target was changed to a binary variable.
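The binary target can be expressed as a one-line rule. This is a sketch only: the function name `satisfied_flag` is illustrative, and returning a missing value when the NPS answer itself is missing is an assumption, since the report does not state how the 2.19% of records with a missing NPS were handled.

```python
from typing import Optional


def satisfied_flag(nps: Optional[int]) -> Optional[int]:
    """Map an NPS answer to the binary target: 1 = Satisfied, 0 = Dissatisfied."""
    if nps is None:
        return None  # assumption: a missing NPS yields a missing target
    if not 0 <= nps <= 10:
        raise ValueError(f"NPS answer must be between 0 and 10, got {nps}")
    return 1 if nps >= 8 else 0


print([satisfied_flag(score) for score in (10, 8, 7, 0, None)])
```

Note that this cut-off counts a score of 8 as Satisfied even though the standard NPS bands treat 8 as a passive.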

Enterprise Miner

Rep_TimeToResolve

While most of the data preparation took place in Microsoft Excel prior to import into Enterprise Miner, one alteration was needed in order to run the models and analysis accurately. In the initial dataset, there are 19 records with a negative TimeToResolve value. TimeToResolve is a metric that takes the “Resolution Date” of a case and subtracts the “Opened Date” of that case. This number cannot be negative because that would indicate the case was resolved (a resolution that answers the client’s question or solves the client’s issue has been delivered) before the case was ever opened. These values are incorrect and misleading.

To rectify this, a Data Replacement node was utilized within Enterprise Miner. The Replacement node used the following SAS code to take all of the negative values for TimeToResolve and replace them with a “missing” or “null” value:

* ;
* Variable: TimeToResolve ;
* ;
Label REP_TimeToResolve='Replacement: TimeToResolve';
Length REP_TimeToResolve 8;
* Start from the original value ;
REP_TimeToResolve = TimeToResolve;
* Keep missing values missing, and set negative durations to missing ;
if TimeToResolve eq . then REP_TimeToResolve = .;
else if TimeToResolve < 0 then REP_TimeToResolve = .;

The results of this node show that 19 replacements were made to the dataset.

Additionally, there are some outliers at the upper end of this variable as well. Some of the TimeToResolve values exceed 100 days, reaching into the 1,000-day range. For purposes of this analysis, all cases with a TimeToResolve of 100 or greater are grouped together. Any case open that long will likely cause frustration for a client, so distinguishing values in the 500-plus day range does not provide much additional value.
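Both adjustments to TimeToResolve can be sketched in a few lines. This is an illustration, not the Replacement node's generated code: the name `rep_time_to_resolve` is hypothetical, and implementing "grouped together" as capping every value at 100 days is an assumption about how the upper limit was applied.

```python
from typing import Optional

CAP_DAYS = 100.0  # assumed upper limit, from the "100 or greater" grouping


def rep_time_to_resolve(days: Optional[float]) -> Optional[float]:
    """Clean one TimeToResolve value (in days)."""
    if days is None or days < 0:
        return None             # missing stays missing; negatives become missing
    return min(days, CAP_DAYS)  # values of 100 days or more share one value


raw = [1.5, -3.0, None, 64.45, 100.0, 1250.0]
print([rep_time_to_resolve(value) for value in raw])
```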

Rep_Product

Rep_Product was created using the same Replacement node as above. The goal of this variable is to condense the 167 product classes into a more functional set of grouped products on which the analysis can be run. Utilizing the Replacement node’s “Replacement Editor” utility, each new value for a product was manually entered into the replacement value column. The node was then run and reported that 10,427 changes were made to the Product variable, meaning the products are now bucketed into the new groups. Appendix B contains a list of all the individual product values along with the new group each product was placed into.
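The grouping amounts to a lookup from each raw product name to its replacement value, plus a count of how many records changed. The sketch below is hypothetical throughout: the real mapping is the one in Appendix B, the pairing of "MapInfo Pro 32-BIT" with "MapInfo" is assumed from names that appear in this report, the other raw product names are invented placeholders, and leaving unmapped products unchanged is an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping for illustration only; see Appendix B for the real one.
PRODUCT_GROUPS: Dict[str, str] = {
    "MapInfo Pro 32-BIT": "MapInfo",         # assumed pairing
    "Hypothetical Geocoder v1": "Coding",    # placeholder raw name
    "Hypothetical Composer v2": "EngageOne"  # placeholder raw name
}


def apply_groups(products: List[str],
                 groups: Dict[str, str]) -> Tuple[List[str], int]:
    """Return the grouped values and the number of records that changed."""
    replaced = [groups.get(product, product) for product in products]
    changes = sum(1 for old, new in zip(products, replaced) if old != new)
    return replaced, changes


grouped, n_changed = apply_groups(
    ["MapInfo Pro 32-BIT", "Hypothetical Geocoder v1", "Unmapped Product"],
    PRODUCT_GROUPS,
)
print(grouped, n_changed)
```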

Description of Data Modeling and Assessments

For this analysis, three different data models were created with SAS Enterprise Miner. The purpose of using three distinct models is to understand which modelling architecture provides the most accurate and representative model. The models selected for this analysis are Decision Tree, Regression, and Neural Networks.

All of the models will utilize the same set of variables. This list is below:

· CaseOrigin – Input

· CustomerPriority – Input

· DayOfWeek – Input

· REP_Product – Input

· REP_TimeToResolve – Input

· Reasons – Input

· Satisfied – Target

· Severity – Input

· TimeOfDay – Input

Decision Tree

A decision tree is a tool that allows users to model a set of given data in a “tree like” form, with the various decision points creating new branches and the outcomes of those decisions being referred to as leaves. According to Wikipedia, this model type is often used to “help identify a strategy most likely to reach a goal.” [2]

[2] https://en.wikipedia.org/wiki/Decision_tree

Configuration of the Decision Tree Node

This node’s path from the data source is as follows:

ProjData -> Replacement -> Data Partition -> Decision Tree

Setting the node up this way ensures that the model utilizes the fully prepared dataset and is run using the equally segmented Train data and Validate data. By configuring the node this way, we are following standard best practices. The node was configured to create an optimal tree automatically by setting the Subtree Method to Assessment and the Assessment Measure to Average Square Error. The settings under Splitting Rule were left at the defaults of the Decision Tree node within SAS Enterprise Miner.
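The equal Train/Validate segmentation performed by the Data Partition node can be pictured as a shuffled 50/50 split. This is a minimal sketch under assumptions: the function name and seed are illustrative, and it uses simple random sampling, which may differ from the sampling method the Data Partition node actually applied.

```python
import random
from typing import List, Sequence, Tuple


def partition(records: Sequence[int],
              train_fraction: float = 0.5,
              seed: int = 12345) -> Tuple[List[int], List[int]]:
    """Shuffle the records reproducibly and split them into Train and Validate."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Row indices stand in for the 10,434 case records.
train, validate = partition(range(10434))
print(len(train), len(validate))
```

Holding out half of the records means each model's ASE can be checked on data it was not trained on, which is what the Validation figures in the results sections report.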

Decision Tree Results

Running the Decision Tree node resulted in a maximal tree being created that contained 15 leaves. The ASE for this model is .188632 for the Train data and .197024 for the Validation data. The optimal path shows as leaf 8. The rule for this node is: WHERE Replacement: TimeToResolve < 2.15 AND Replacement: Product MAPINFO, CODING, MAILSTREAM, ... Or Missing AND TimeOfDay OVERNIGHT, MORNING, EVENING Or Missing.

Essentially, this rule states that the most likely scenario leading to a satisfied customer is one where the case is resolved in less than 2.15 days, the product is MapInfo, Coding, MailStream, or unknown, and the case is opened in the morning, evening, overnight, or at an unknown time (meaning not midday). It follows that the most likely dissatisfied customer would be the opposite of that rule.

One byproduct of the decision tree model is the “Variable Importance” statistics that are derived from it. Based on this information, of the variables input into the model, the order of importance is:

1. REP_TimeToResolve – Importance = 1.0000 (Train) / 1.0000 (Validation)

2. REP_Product – Importance = .6859 (Train) / .4315 (Validation)

3. TimeOfDay – Importance = .4218 (Train) / .2844 (Validation)

4. CaseOrigin – Importance = .3023 (Train) / .4833 (Validation)

5. Reasons – Importance = .2263 (Train) / .1072 (Validation)

The other input variables were not recognized as providing any importance to the building of the decision tree. What this data indicates is that TimeToResolve is the most important factor in creating a satisfied client. The longer a case remains open, the more likely that case is to result in a dissatisfied client. The initial splitting rule sets the split at 2.15 days, which means that cases closed within that time frame are more likely to satisfy a client and cases open longer are more likely to lead to unhappy clients. Cases open more than 64.45 days are even more likely to generate a dissatisfied client. A full view of the Decision Tree model can be found in Appendix A as Figure 8.

Regression Model

A Regression model is another form of statistical modeling, one that offers a different approach to prediction than decision trees. Page 2 of the Logistic Regression PDF provided in CIS 575 indicates that regression is a parametric model which assumes a specific association between the input variables and the target. In SAS Enterprise Miner, the model is generated using a prediction formula that creates a series of models of increasing complexity; fit statistics calculated from the validation data are then used to select a best model.

Configuration

Given that the target variable is binary, a logistic regression model was selected within Enterprise Miner. After running a few different models, the final model was configured to use the Stepwise selection method, with selection options of:

· Entry Significance Level = 1.0

· Stay Significance Level = .5

· Maximum Number of Steps = 30

These settings were selected based upon previous instruction to generate a larger model from which to gather fit statistics. Setting the entry significance level to 1 allows any input to enter the model, and a stay significance level of .5 lets the stepwise selection process continue longer rather than terminate prematurely. Setting the maximum number of steps to 30 places an upper bound on the selection process so that the model does not run for an overly long time. To allow Enterprise Miner to optimize the complexity of the model, the Selection Criterion was changed from the default to Validation Error.

Regression Results

For the optimized model, the iteration plot shows that the model at iteration 7 is the best fit for this set of data. The ASE for this model is .188706 for the Train data and .194423 for the Validation data.

As the plot progresses through the iterations, the lowest point for Average Square Error for both the Train data and Validation data is at step 7. There’s a slight increase after that step which means the model becomes less accurate from that point forward.

In further analyzing the model, the output provides an analysis of effects as seen below:

Type 3 Analysis of Effects

Effect	DF	Sum of Squares	F Value	PR > F
CaseOrigin	6	2.2371	1.96	0.0671
CustomerPriority	2	0.5518	1.45	0.2338
REP_Product	13	8.5307	3.46	<.0001
REP_TimeToResolve	1	6.7412	35.52	<.0001
Reasons	3	0.4487	0.79	0.5003
Severity	3	1.4268	2.51	0.0572
TimeOfDay	3	1.1742	2.06	0.103

Examining this information further, we see evidence that the REP_TimeToResolve variable is by far the most impactful input into this model. Its F value of 35.52 (p < .0001) indicates that its effect on the model is significant. The next most significant effect is REP_Product (F = 3.46, p < .0001), which would indicate that certain products are more likely than others to lead to dissatisfaction and, potentially, escalation. Perhaps this should influence the strategy of the support department moving forward.

Neural Network

Neural networks are one of the more popular prediction models utilized in current data mining platforms. Similar to regression models, Neural Networks utilize mathematical algorithms to determine the impact of variables upon a given target. The most typical type of neural network is actually an extension of a standard regression model. Neural networks allow users to model virtually any association between given inputs and a target.

Configuration

Multiple attempts were made to discover which settings would create the best Neural Network model for the given inputs. Three different neural models were created:

1) A standalone neural network model

2) A neural network model with a regression model preceding it to define inputs

3) An auto neural network

For Items 1 and 2, the configurations were identical. Both models utilized Average Error as a Model Selection Criterion. Under the “Network” settings, a value of 6 was used for “Number of Hidden Units.” Under the Optimization settings, Maximum Iterations was set to 100 and the “Preliminary Training” option was disabled.

For the Auto Neural node – a few settings were adjusted from the defaults:

· Train Action > Search enabled

· Number of Hidden Units > 1

· Tolerance > Low

· Direct > No

· Normal > No

Neural Network results

Both the standalone neural network and the neural network connected to the Regression node were executed with the above settings and returned near identical results. Modifying the inputs via the regression node had negligible impact on the performance of the neural model. In examining the ASE for this model, the value for Train is .187575 and for Validation is .193953. The iteration plot below shows that the optimal iteration occurs at iteration 29.

Examining the final weights from this network (shown in Figure 9), it becomes evident that the strongest weight belongs to the BIAS in H14. Among the inputs, REP_Product = Legacy carries the largest weight toward a satisfied customer, at -.655, meaning that Legacy products have a positive effect on satisfaction. REP_TimeToResolve has a weight of .913, which means that the larger that value, the more likely a negative event will occur at the target, meaning a dissatisfied customer. This is consistent with the other models we’ve run to this point.

AutoNeural Node results

In examining the ASE for this model, the value for Train is .189396 and for Validation is .194331. Examining the iteration plot below, it is evident that training step 3 provides the optimal iteration for this model, as the ASE for validation and train seem to stay constant from that point forward without any further improvement or degradation in performance.

Examining the final weights, as shown in Figure 10, the AutoNeural network had different results than the Neural Network node. While REP_TimeToResolve continued to show a negative impact on reaching a binary target of 1, REP_ProductConfirm has the largest weight in this network, at 1.750. The variable with the largest weight toward a satisfied outcome is REP_ProductMailStream, with a weight of -1.464, as opposed to the Neural Network’s identification of REP_ProductLegacy. The AutoNeural and Neural Network nodes produced notably different models.

Model Comparisons and Model Selection

Within SAS Enterprise Miner, a Model Comparison node was configured with the following changes to the default settings:

· Selection Statistic: Average Squared Error

· HP Selection Statistic: Average Squared Error

· Selection Table: Validation

· Model Inputs:

· Decision Tree

· Regression

· Neural Network based on Regression

· (Neural Network Standalone had identical results – no need to include both)

· AutoNeural

Upon running the Model Comparison tool, SAS Enterprise Miner looks at the Average Squared Error (ASE) statistics for the various models’ Validation data runs and selects the best model based upon the lowest ASE.
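For a binary target, ASE is the mean of the squared differences between the actual 0/1 outcome and the predicted probability of a satisfied client. The sketch below shows the calculation on four made-up cases, which are not project data. It also works out, as a hedged point of reference, the ASE a model would earn by predicting the overall satisfied rate for every case; applying the rounded 73% rate from the data exploration to the Validation partition is an assumption.

```python
from typing import Sequence


def average_squared_error(actual: Sequence[int],
                          predicted: Sequence[float]) -> float:
    """Mean of (actual - predicted)^2 over all scored cases."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def constant_prediction_ase(event_rate: float) -> float:
    """ASE when every case is scored with the overall event rate.

    rate * (1 - rate)^2 + (1 - rate) * rate^2 simplifies to rate * (1 - rate).
    """
    return event_rate * (1 - event_rate)


# Four made-up cases: squared errors are 0.01, 0.16, 0.16 and 0.04.
toy_ase = average_squared_error([1, 1, 0, 1], [0.9, 0.6, 0.4, 0.8])
print(round(toy_ase, 4))                        # 0.0925
print(round(constant_prediction_ase(0.73), 4))  # 0.1971
```

Under that assumption, a constant prediction would score roughly .197, which gives a sense of scale for the Validation ASE values reported for each model.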

Model Selection

The model selected by Enterprise Miner was the Neural Network node. The following table of Fit Statistics from the model comparison output shows that the Neural Network had the lowest ASE on both the Validation and Train data.

Selected Model	Model Type	Validation ASE	Train ASE
Y	Neural Network	0.19395	0.18758
	AutoNeural	0.19433	0.1894
	Regression	0.19442	0.18871
	Decision Tree	0.19702	0.18863
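The selection rule applied to this table can be reproduced directly. The sketch below uses the ASE values from the table above; the dictionary layout and the function name `select_model` are illustrative and are not how the Model Comparison node stores its results.

```python
from typing import Dict

# ASE values copied from the fit statistics table above.
fit_statistics: Dict[str, Dict[str, float]] = {
    "Neural Network": {"valid_ase": 0.19395, "train_ase": 0.18758},
    "AutoNeural":     {"valid_ase": 0.19433, "train_ase": 0.1894},
    "Regression":     {"valid_ase": 0.19442, "train_ase": 0.18871},
    "Decision Tree":  {"valid_ase": 0.19702, "train_ase": 0.18863},
}


def select_model(stats: Dict[str, Dict[str, float]],
                 statistic: str = "valid_ase") -> str:
    """Return the model with the lowest value of the chosen statistic."""
    return min(stats, key=lambda name: stats[name][statistic])


selected = select_model(fit_statistics)
valid_values = [row["valid_ase"] for row in fit_statistics.values()]
spread = max(valid_values) - min(valid_values)
print(selected, round(spread, 5))
```

The gap between the best and worst Validation ASE is about .003, so the ranking rests on small differences between the four models.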

Further investigation shows that the models perform at nearly the same level. While the neural network does outperform the other models on ASE, all of these models would provide useful detail in moving towards a prediction of satisfied vs. dissatisfied for a set of clients.

Viewing the Score Rankings Overlay: Satisfied chart below, it’s apparent that NN Regression (the green line) provides the best mean prediction at initial depths and the second-best mean prediction at a depth of 100. This indicates that the model performs very well at both initial and full depth.

Conclusions and Recommendations

This analysis provides a lot of insight into Net Promoter Score survey data gathered over the past 10 years of Support cases from our clients. The goal of this analysis was to create a model that would allow for us to better understand the variables that drive a satisfied client, and thereby understand what variables would drive a dissatisfied client. Client dissatisfaction can cost the business undue time and resources in working to resolve account concerns and keep the client as a customer or can result in a lost client altogether. Dissatisfied clients increase costs for support organizations as well when the concerns are not addressed appropriately.

This analysis is also a precursor to a potential utility that would allow a support organization to identify customers that are likely to go into “escalation.” Escalated customers require attention from business executives, cause damage to the business/client relationship, and drastically increase the workload for everyone involved. Increased workloads mean increased costs and declining profits. Identifying customers at risk of escalation and addressing their concerns prior to a full-blown escalation can lead to substantial cost savings and a much better client satisfaction experience.

Lessons Learned

From this analysis, it was proven that certain variables have much more impact upon a client’s satisfaction level than others. Examining the results of the various models, it’s clear that things like Day of Week and Time of Day that a case is opened have little to no longstanding impact upon the likelihood of a satisfied client.

Conversely, and not surprisingly, the variables with the most impact on a client’s satisfaction level appear to be TimeToResolve and Product. The analysis indicates that the longer a case is open, the more likely the client is to be dissatisfied. While severity and priority certainly influence how long a case stays open, the overall finding is that clients who have to wait for resolution are less satisfied. A support strategy focused on resolution times is likely to benefit overall client satisfaction levels.

Additionally, this analysis showed that some products have little effect on satisfaction levels, while others are more polarizing to our client base. A product like MapInfo, a desktop software product with a mature user base and strong community, results in higher satisfaction, whereas a product like EngageOne, a much more complex enterprise-level software with virtually no community support, results in lower satisfaction. By examining which products result in lower satisfaction levels, the business can invest resources into investigating and fixing the pitfalls of those products and work to improve satisfaction with the product itself, not just with the support service surrounding it.

Next Steps

The business needs to build this model out further by expanding the base of data used to train and validate it. Once the model is strengthened, it should be run weekly against all currently active/open cases. Each run should identify cases that are creeping towards scenarios that would lead to a dissatisfied client, so that appropriate actions can be identified. Does the support engineer need to prioritize a case? Has a case fallen off the radar, and does it need to be re-engaged? Do other departments need to be included to progress an issue towards resolution? Are certain support engineers in need of additional support to prevent issues from resulting in dissatisfied clients? With the right model, support management can be much better equipped to make small adjustments in a given workday or work week to help improve overall client experiences.
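The weekly run described above could be sketched as follows. This is only an outline under stated assumptions: `OpenCase`, `placeholder_risk_score`, the 30-day scaling, and the 0.5 threshold are all hypothetical, and the scoring function is a stand-in for the trained model rather than anything built in this project. It borrows only the finding that risk of dissatisfaction rises the longer a case stays open.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class OpenCase:
    case_id: str
    opened_on: date
    product_group: str


def placeholder_risk_score(case, today):
    """Hypothetical stand-in for the trained model: risk grows with days open, capped at 1.0."""
    days_open = (today - case.opened_on).days
    return min(days_open / 30.0, 1.0)


def flag_at_risk_cases(cases, today, threshold=0.5):
    """Return (case, score) pairs at or above the threshold, highest risk first."""
    scored = [(case, placeholder_risk_score(case, today)) for case in cases]
    at_risk = [(case, score) for case, score in scored if score >= threshold]
    return sorted(at_risk, key=lambda pair: pair[1], reverse=True)


# Made-up open cases for illustration only.
today = date(2019, 12, 9)
open_cases = [
    OpenCase("C-001", date(2019, 11, 4), "EngageOne"),
    OpenCase("C-002", date(2019, 12, 6), "MapInfo"),
    OpenCase("C-003", date(2019, 11, 21), "Spectrum"),
]

flagged = flag_at_risk_cases(open_cases, today)
for case, score in flagged:
    print(case.case_id, case.product_group, round(score, 2))
```

In practice the placeholder score would be replaced by the selected model's predicted probability of a dissatisfied client, and the threshold would be tuned to the volume of cases that support management can realistically review each week.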

Appendix A – Figures and Tables

Figure 1 – Case Origin

Figure 2 – Severity

Figure 3 – Customer Priority

Figure 4 – DayOfWeek

Figure 5 – Time of Day

Figure 6 – Rep_Product

Figure 7 – Rep_TimeToResolve

Figure 8 – Decision Tree Model

Figure 9 – Neural Network Weights

Figure 10 – AutoNeural Weights

Figure 11 – Final Diagram

Appendix B – Replacement Values for Rep_Product

Original Value	Replacement Value
CODE-1 Plus	Coding
Finalist	Coding
VeriMove	Coding
Canadian Code-1 Plus	Coding
Address Broker	Coding
SortStream Canada	Coding
Address Validation	Coding
CODE-1 Plus International	Coding
Confirm AM	Confirm
Confirm OnDemand	Confirm
Confirm CF	Confirm
Street Pro	Data
Communications Data Suite	Data
Postal Data	Data
Communities Boundaries Data	Data
Parcels / Cadastre	Data
Centrus GDT Data	Data
Exchange Info	Data
Post Point Pro Data Product	Data
Address Fabric Data	Data
Census Boundaries Data	Data
Centrus Points	Data
POSTNET Barcoding Option	Data
World Boundaries Premium	Data
EngageOne Enrichment	EngageOne
EngageOne Designer (6+)	EngageOne
DOC1 Series 5	EngageOne
EngageOne Server	EngageOne
DOC1 Designer	EngageOne
EngageOne Vault	EngageOne
DFWorks	EngageOne
EDGE	EngageOne
DOC1 Generate	EngageOne
DOC1 Series 3/4	EngageOne
Spectrum Spatial	EngageOne
EngageOne Content Author	EngageOne
EngageOne Deliver	EngageOne
EngageOne Inform	EngageOne
Mail 360 Manager	EngageOne
DOC1 Document Composition Service	EngageOne
e2 - Hosted	EngageOne
OpenEDMS	EngageOne
EngageOne Generate (6+)	EngageOne
EngageOne Video	EngageOne
DOC1 Interactive	EngageOne
e2 Present	EngageOne
EngageOne Liason B2B	EngageOne
Mail.Dat Viewer	EngageOne
MapMarker - USA	GeoCoding
Centrus Desktop	GeoCoding
GeoStan	GeoCoding
MapMarker	GeoCoding
MapMarker - CAN	GeoCoding
Geographic Coding Plus	GeoCoding
GeoCoding	GeoCoding
MapMarker - AUS	GeoCoding
Spectrum - Enterprise Routing Module	GeoCoding
GeoTAX	GeoTax
Centrus GeoTAX Matrix	GeoTax
OnDemand	Hosted
PBS Hosted	Hosted
Software & Data Marketplace	Hosted
zTestProduct	Legacy
PlanWeb	Legacy
(blank)	Legacy
Merge/Purge Plus	Legacy
GeoStore	Legacy
OnRoute	Legacy
Bar-coded Bag	Legacy
GPInfo	Legacy
Consumer Merge/Purge	Legacy
Compass	Legacy
Labels Printing Plus	Legacy
PlanAccess+B37	Legacy
SendPro SaaS	Legacy
ConnectMaster	Legacy
Digital	Legacy
GeoInsight	Legacy
GeoReveal Studio	Legacy
Geographic Determination Library	Legacy
Message1	Legacy
PlanAccess	Legacy
StreamSure	Legacy
iProof	Legacy
MailStream Plus	MailStream
Dispatcher4	MailStream
List Conversion Plus	MailStream
Generalized Selection Plus	MailStream
List Conversion	MailStream
EZ Case Plus	MailStream
MapInfo Pro 32-bit	MapInfo
MapInfo Discover	MapInfo
MapInfo Pro 64-bit	MapInfo
AnySite	MapInfo
MapInfo Discover 3D	MapInfo
MapBasic	MapInfo
Target Pro	MapInfo
Discover PA	MapInfo
Vertical Mapper	MapInfo
ModelVision	MapInfo
MapInfo Raster	MapInfo
Encom Engage	MapInfo
DriveTime	MapInfo
Encom Engage3D Pro	MapInfo
Discover Mobile	MapInfo
MapInfo ProViewer	MapInfo
Spectrum - Location Intelligence Module	MapInfo
Mail 360 Manager Client	MapInfo
MapInfo Manager	MapInfo
Centrus Enhanced Data	MapInfo
AnySite Online	MapInfo
CI Appliance	MapInfo
ArcGIS Plug-In	MapInfo
MapInfo Discovery	MapInfo
Quickmag	MapInfo
Paramics	Portrait
Portrait Dialogue	Portrait
Spectrum Miner	Portrait
Portrait Foundation	Portrait
Portrait HQ	Portrait
Portrait Interaction Optimizer	Portrait
P/I Output Manager	Production Intelligence
P/I Enterprise Manager	Production Intelligence
P/I Output Enhancement	Production Intelligence
JESConnect	Production Intelligence
OfficeMail	Production Intelligence
AddressNow	Production Intelligence
Centrus Coastal Boundaries Data	Production Intelligence
IntelliJet Print Process Manager	Production Intelligence
Sagent Data Flow	Sagent
AutoMate BPA Server	Sagent
Spectrum Technology Platform	Spectrum
Spectrum - Universal Addressing Module	Spectrum
Exponare	Spectrum
Spectrum - Enterprise Geocoding Module	Spectrum
Spectrum - OnDemand	Spectrum
MapXtreme .NET	Spectrum
Stratus/Analyst	Spectrum
Spectrum Spatial Analyst	Spectrum
Envinsa	Spectrum
Spectrum - Client API	Spectrum
Spectrum - Enterprise Tax Module	Spectrum
Envinsa EOLS	Spectrum
Stratus Gallery	Spectrum
Spectrum - Connector	Spectrum
Location Intelligence APIs	Spectrum
MapInfo Stratus (SaaS)	Spectrum
Spectrum - Address Now Module	Spectrum
MapX	Spectrum
SpatialWare	Spectrum
Crime Profiler	Spectrum
MapMarker - Other	Spectrum
MapXtreme Windows	Spectrum
RouteFinder	Spectrum
Spatial+	Spectrum
MapXtreme Java	Spectrum
Spectrum - Advanced Matching Module	Spectrum
Spectrum - Enterprise Data Integration Module	Spectrum
Spectrum - Universal Name Module	Spectrum
Location Based Services	Spectrum
MapXtreme MapX	Spectrum
PB Community	Spectrum
Parcel Property Attributes Data	Spectrum
Portrait EDGE 2020	Spectrum
Schools Data	Spectrum
Spatial Server	Spectrum
Spectrum - Data Normalization Module	Spectrum
e2 Account Management	Spectrum
e2 Present & Pay	Spectrum
Unknown	Spectrum
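For readers who want to reproduce this grouping outside of Enterprise Miner, the table above can be applied as a simple lookup. The sketch below is illustrative only: it includes a handful of rows copied from the table, and the function name `replace_product` and the choice to raise an error on unmapped values are assumptions, not part of the original process.

```python
# A few rows from the replacement table above; the full table would be loaded the same way.
REP_PRODUCT_MAP = {
    "CODE-1 Plus": "Coding",
    "Confirm AM": "Confirm",
    "Street Pro": "Data",
    "EngageOne Server": "EngageOne",
    "MapMarker": "GeoCoding",
    "GeoTAX": "GeoTax",
    "MapInfo Pro 64-bit": "MapInfo",
    "Spectrum Technology Platform": "Spectrum",
}


def replace_product(original, mapping=REP_PRODUCT_MAP):
    """Map an original product name to its replacement group, failing loudly if unmapped."""
    key = original.strip()
    if key not in mapping:
        raise KeyError(f"No replacement value defined for product: {key!r}")
    return mapping[key]


# Stray whitespace around a product name is trimmed before the lookup.
products = ["MapInfo Pro 64-bit", " EngageOne Server ", "GeoTAX"]
rep_products = [replace_product(p) for p in products]
print(rep_products)
```

Raising an error for unmapped products, rather than silently assigning a default group, makes it easier to notice when new products appear in the case data and need a row added to the table.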