
CIS 575 Project Report

December 7, 2019

Client Satisfaction Prediction Models

Contents

· Executive Summary
· Business Problem
· Business Objective
· Process followed for Selecting and Gathering Data
· Preliminary Data Exploration and Findings
· Further Exploration of the Data
    · CaseOrigin
    · Severity
    · CustomerPriority
    · DayOfWeek
    · TimeOfDay
    · Product
    · Rep_Product
    · Rep_TimeToResolve
· Description of Data Preparation
    · Microsoft Excel
    · Enterprise Miner
        · Rep_TimeToResolve
        · Rep_Product
· Description of Data Modeling and Assessments
    · Decision Tree
        · Configuration of the Decision Tree Node
        · Decision Tree Results
    · Regression Model
        · Configuration
        · Regression Results
    · Neural Network
        · Configuration
        · Neural Network results
        · AutoNeural Node results
· Model Comparisons and Model Selection
    · Model Selection
· Conclusions and Recommendations
    · Lessons Learned
    · Next Steps
· Appendix A – Figures and Tables
· Appendix B – Replacement Values for Rep_Product

Executive Summary: Support organizations are cost centers. They do not typically bring money into a business, but rather are a cost associated with supporting the customer through their lifecycle journey. While revenue is not a typically expected outcome of a support organization, there are ways that support organizations can decrease client effort, reduce costs, and improve client retention, all of which help to improve the profits of a business. Improving client satisfaction is one such method. Neil Patel states that focusing on client satisfaction provides many benefits, but his main point is “companies that prioritize customer satisfaction grow and increase revenue. Those that do not, don’t.”[1] By working to better understand our historical client satisfaction data, we can make strides toward improving client satisfaction by analyzing which aspects of the support experience lead to a satisfied customer.

[1] https://neilpatel.com/blog/benefits-and-importance-of-customer-satisfaction/

Using our own historical case and survey data, we look to discover which aspects of a case are the biggest drivers of either a positive or a negative support experience. If done correctly, we can then deliver a more substantiated strategy aimed at improving long-term client satisfaction metrics, and we can also use the results to look for red flags among active cases to head off potential negative client experiences before they occur.

Business Problem: I work for a technical support organization. We handle incoming client phone calls and emails, working with our customers to help resolve issues and concerns they have with our software. Our number one client-facing metric is client satisfaction (rated on a scale of 1-5). We also utilize the Net Promoter Score (NPS, on a scale of 0-10) to measure client sentiment. Clients that are highly satisfied with a support organization and give a business a high NPS are more likely not only to continue as clients, but also to expand their business and invest more money with that organization.

Business Objective: What I propose is to define a model that can help to predict a client’s satisfaction with a given case. By examining this information in real time, we can identify cases that are flagged as at risk of negative client satisfaction and increase our attention on those cases. By doing this, we can not only improve the overall satisfaction of our clients, but also prevent case escalations and ensure continued business with our clients.

Process followed for Selecting and Gathering Data

Our technical support organization has 10 years of historical net promoter score data. The net promoter program asks, “How likely are you to recommend [Company] to your friends or colleagues?” The typical scoring for this is based upon a 0-10 scale, with a score of 9 or 10 considered a promoter, a score of 7 or 8 considered a passive, and a score of 0-6 considered a detractor.

A customer is sent a survey based on a variety of rules, but the basic tenets of these rules are: 1) Has the client received a survey in the past 90 days? 2) Did the client have a case closed? If the answer to question 1 is no and the answer to question 2 is yes, then the client is emailed a survey upon closing of their support case.
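The two basic rules above can be sketched as a small Python function. This is only an illustration, not the organization's actual survey logic: the function name, the parameter names, and the choice to treat a survey sent exactly 90 days ago as still inside the window are all assumptions.

```python
from datetime import date, timedelta
from typing import Optional

SURVEY_COOLDOWN_DAYS = 90  # from the report: no survey in the past 90 days


def should_send_survey(case_closed_on: Optional[date],
                       last_survey_on: Optional[date]) -> bool:
    """Return True when a closing case should trigger a survey email."""
    # Rule 2: the client must have a closed case.
    if case_closed_on is None:
        return False
    # Rule 1: a client who has never been surveyed passes automatically.
    if last_survey_on is None:
        return True
    # Rule 1: otherwise the previous survey must be older than 90 days
    # (assumption: a survey sent exactly 90 days ago still blocks a new one).
    elapsed = case_closed_on - last_survey_on
    return elapsed > timedelta(days=SURVEY_COOLDOWN_DAYS)
```

For example, a case closed on December 7, 2019 for a client last surveyed on November 1, 2019 would not trigger a new survey under these assumptions.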

The customer does not have to answer the survey of course, but a business can encourage customers to answer these surveys by having an effective closed loop process and by showing tangible evidence of responding to these surveys and actioning client concerns or feedback. Our business has a history of reaching out to every client (the closed loop process) to thank them for their feedback and ask for any additional clarity regarding their responses. We then work to bucket all responses into a feedback loop which we can then take to the appropriate business units and form a plan on if/how we can effectively change or encourage the behaviors that are reported upon.

Because all surveys are prompted from a case, the survey data is tied to our support case data. This allows us to look at aspects of the case at a holistic level to determine what may have led to either a positive or a negative score. While the surveys are not launched/created from within the same system, the data from those surveys is tied back to our case system (salesforce.com), so we can effectively gather all data for this project from our internal salesforce.com instances.

Using salesforce.com, the data was gathered by running a report filtered to include only cases that have attached survey responses. The report's columns are the variables that are believed to have the biggest impact on satisfaction. The data was then exported from the system as a Microsoft Excel .xlsx file.

Preliminary Data Exploration and Findings

The dataset gathered is survey data from 2010 to present, totaling 10,434 records. The table includes a total of 14 columns:

· AccountID (Nominal) – Unique identifier of a given account that the case/survey are related to.
    · Of the 10,000+ records, the mode account ID of 0018000000PZDZ8 is responsible for 3.72% of the surveys gathered in this dataset.

· CaseNumber (Nominal) – Unique identifier of the case that was opened that the survey was created from.
    · There is no true mode as a case can only have 1 NPS survey spawned from it.

· CaseOrigin (Nominal) – Initial channel the case is opened through
    · Possible values: Email, Phone, Portal
    · Mode: Email – 53.55% of cases

· CustomerPriority (Nominal) – The level of importance to the client business
    · Possible values: Green, Amber, Red
    · Mode: Green – 95.28% of cases

· DayOfWeek (Nominal) – Day of the week the case was opened.
    · Possible values: Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday
    · Mode: Tuesday – 21.62% of cases

· Product (Nominal) – Product the case was opened for
    · Possible values: over 100 individual products within the system
    · Mode: MapInfo Pro 32-BIT – 25.61% of cases

· Reasons (Nominal) – Reason why the case was opened
    · Possible values: Support, Defect, Enhancement, Licensing
    · Mode: Support – 93.92% of cases

· ResponseNumber (Nominal) – Unique Survey ID
    · No mode, only 1 Survey ID per record

· Satisfied (Binary) – Target variable
    · 1 = Satisfied, defined as NPS score of 8, 9, or 10
    · 0 = Dissatisfied, defined as NPS score of 0 - 7
    · Mean = .73 -> Indicates that 73% of the cases resulted in a survey of Satisfied

· Severity (Nominal) – Defined in the service level agreement (SLA), indicates the level of criticality and impact that the issue the case addresses has on the client
    · Possible values: Critical, High, Medium, Low
    · Mode: Medium – 64.78% of cases

· NPS (Interval) – Net Promoter Score: How likely is client to recommend us to a friend/colleague?
    · Scale of 0 – 10
    · Mean – 8.04; Percent Missing – 2.19%

· OpenedDate (Date) – Date/Time that the case was created/opened.

· Time (Time) – Time field individually separated from OpenedDate column. Shows hour of day case was opened

· TimeOfDay (Nominal)
    · Possible values: Overnight, Morning, Midday, Evening
    · Mode: Midday – 35.14% of the cases

· TimeToResolve (Interval) – The amount of time a case was open before being resolved
    · Mean – 10.42 days

Further Exploration of the Data

CaseOrigin

Understanding the case origin can be important. Support processes state that we communicate with the client via their preferred method of communication, determined by the method through which they opened the case. For example, if they contacted us by phone, we would work with the client via phone conversations. Figure 1 shows the breakdown of case data by case origin. 53.55% of cases came in via email, 39.43% came in via phone, and 6.91% came in via the online case management system (Portal). By including this information in the analysis, the goal is to determine whether the way the client opens the case has any impact on their level of satisfaction.

Severity

The Severity field is used to indicate the impact of the case on the client’s business. Based upon the Service Level Agreements with clients, a case is defined as Critical, High, Medium, or Low. The assumption here is that a client with a Critical or High severity issue is potentially more likely to be dissatisfied as a customer, given that our software is creating a business-impacting event for them. Figure 2 is a bar chart that shows the distribution of cases across these 4 severities. Examining the chart, it is evident that a large portion of cases are Medium or Low severity. This aligns well with what is expected. Low severity cases are less likely to be raised because they do not present as much of a business impact. Clients do not want to utilize resources and effort to investigate issues that do not overtly impact the business. Medium issues are the most plentiful, as it takes a larger amount of impact for a case to rise above Medium into High or Critical. When cases reach this level, they are given more priority within the business as determined by the SLAs.

CustomerPriority

The customer priority field is intended to be used by the support organization as a pairing to the Severity column. While Severity is a technically defined field, Customer Priority is a customer sentiment field. That is, if a case has a Low severity (little to no impact on the client’s business outcomes) but the client expresses a high level of frustration and discontent, the support team should classify the case as Amber or Red priority. Figure 3 shows that over 95% of cases are set with the default value of Green. While this may not be a perfect representation of the customer’s sentiment, it does mean that if a value is set to something other than Green, then the client frustration was truly magnified in those instances.

DayOfWeek

The DayOfWeek variable shows that, of the 7 days in the week, the weekdays hold a majority of the data, as expected. Examining Figure 4, Tuesday (2,256 cases) is the day when most cases are opened. Wednesday, Thursday, and Monday each saw more than 2,000 cases opened. Friday had just below 1,500 cases opened in this data set. Saturday and Sunday each had a small set of cases opened, with Sunday having more because the data is in Eastern time and Australia comes online for its Monday on Sunday evening in the Eastern time zone.

TimeOfDay

This field is intended to indicate whether a case was opened in the Overnight hours in Eastern time, or whether it was opened during the Morning, Midday, or Evening. By examining this data, we are looking to see if there is any correlation between the satisfaction of a client and the time of day at which they initially contact support. Initially, this analysis was to be done on the raw time, but to simplify the results and make them easier to understand, the time values were bucketed into these four bins. Figure 5 shows the breakdown of cases based upon the time of day they were opened. We quickly see that the breakdown of data does tend to favor Midday, but overall the case data is well dispersed across all 4 levels.

Product

The product field represents the software product that the client is calling in with questions about. MapInfo Pro is by far our best-selling product, as it is the least expensive offering and has an extensive legacy client base that continues to use and upgrade the software. There are cases from 167 different products within this dataset. In looking at the data, we quickly realize that in its current form the data is too varied to provide much value. The data needs to be bucketed into a more concise set of groups to provide more value to the analysis we’re looking to perform. More details on how the Rep_Product variable was created are available in the Description of Data Preparation section of this report.

Rep_Product

The Rep_Product field is a grouping of like values from the product field. This field contains 14 values (down from 167) and will provide a much better basis for examining whether the product in question is more or less likely to result in a satisfied or dissatisfied customer. Figure 6 shows that the top 5 products in terms of cases in this data set are MapInfo, Coding, Spectrum, EngageOne, and Confirm. Internal employees would agree that this data aligns well with the overall distribution of our case volumes in Support.

Rep_TimeToResolve

It is assumed that TimeToResolve is one of the largest drivers of a satisfied or dissatisfied customer. The longer it takes for an issue to be solved, the longer the business impact that issue creates persists. Clients lose more time and money the longer an active case is being investigated, and the impact of that case tends to build up over time. By utilizing the Rep_TimeToResolve field, we can focus the analysis on a more reasonable spread of the overall data. Figure 7 shows a histogram of the Rep_TimeToResolve field. The mean of this new field is 7.96, but the data is heavily skewed, with a skewness of 3.692 and a kurtosis of 14.633.

Description of Data Preparation

Preparation of the data was done in both Microsoft Excel and in SAS Enterprise Miner.

Microsoft Excel

The first step in Microsoft Excel was the removal of ancillary columns that were pulled in the initial report from Salesforce.com. Not all of the fields that were pulled were needed, so 6 columns of data were removed to help focus the analysis within Enterprise Miner.

The OpenedDate field is the basis of two other columns of data within this report. The DayOfWeek field was created by applying the formula =TEXT([OpenedDate], "dddd") to the date field. This allowed Excel to look at the value in the OpenedDate column adjacent to the DayOfWeek column and populate DayOfWeek based on that date. This will allow us to analyze whether the day of the week has any impact on the likelihood of a satisfied or dissatisfied customer.
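For readers who want to reproduce this derivation outside Excel, a rough Python equivalent of the weekday formula is sketched below. The function name is an assumption, and a fixed list of English day names is used instead of `strftime("%A")` so the result does not depend on the machine's locale.

```python
from datetime import datetime

# Monday is 0 in Python's datetime.weekday(), so the list starts on Monday.
WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def day_of_week(opened_date: datetime) -> str:
    """Full English weekday name for the date a case was opened."""
    return WEEKDAY_NAMES[opened_date.weekday()]
```

For instance, `day_of_week(datetime(2019, 12, 3, 14, 30))` gives "Tuesday".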

The other field derived from the OpenedDate field is the Time field. By using the HOUR function in Excel, we were able to extract the hour the case was opened from the OpenedDate field so we could then run analysis to determine whether time of day had any real impact on the level of satisfaction of a client in their support experience. To further this analysis, another variable called “TimeOfDay” was created, in which the hour dictates which portion of the day the case was opened in. The variable is defined as follows:

· 0-6 and 21-23 = Overnight

· 7-10 = Morning

· 11-15 = Midday

· 16-20 = Evening
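The bucketing above can be expressed as a short Python function. This is a sketch rather than the spreadsheet logic actually used; the function name is an assumption. Excel's HOUR function returns values from 0 through 23, so the sketch accepts only that range.

```python
def time_of_day(hour: int) -> str:
    """Map the hour a case was opened (0-23) to a TimeOfDay bucket."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be between 0 and 23, got {hour}")
    if 7 <= hour <= 10:
        return "Morning"
    if 11 <= hour <= 15:
        return "Midday"
    if 16 <= hour <= 20:
        return "Evening"
    # Remaining hours (0-6 and 21-23) fall in the Overnight bucket.
    return "Overnight"
```

A case opened at 14:00 Eastern would be labeled Midday, and one opened at 22:00 would be labeled Overnight.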

The other item that was created within the Excel spreadsheet prior to importing into SAS Enterprise Miner is the Satisfied field. This field is based upon the NPS data field. As noted above, any NPS score of 8, 9, or 10 indicates a Satisfied client; any NPS score of 0 - 7 is considered a Dissatisfied client. By bucketing this data into a binary field, we can use this field as a Target for our modelling and analysis. Initially, the thought was to use a Nominal field with three levels, but upon investigation into that path, the results were not yielding favorable models. To rectify that, the change to a binary target variable was made.
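The binary target described here can be sketched in Python as follows. The function name is an assumption, and so is the handling of missing NPS values: the report does not say how the roughly 2% of records with a missing NPS were treated, so this sketch simply leaves the target missing for them.

```python
from typing import Optional


def satisfied_flag(nps: Optional[int]) -> Optional[int]:
    """Return 1 (Satisfied) for NPS 8-10, 0 (Dissatisfied) for NPS 0-7."""
    if nps is None:
        # Assumption: a missing NPS yields a missing target value.
        return None
    if not 0 <= nps <= 10:
        raise ValueError(f"NPS must be between 0 and 10, got {nps}")
    return 1 if nps >= 8 else 0
```

Averaging this flag over all scored records gives the share of satisfied clients, which the report states as .73.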

Enterprise Miner

Rep_TimeToResolve

While most of the data preparation did take place in Microsoft Excel prior to import into Enterprise Miner, one alteration did need to be made in order to accurately run the models and analysis. In the initial dataset, there are 19 records with a TimeToResolve value as a negative number. TimeToResolve is a metric that takes the “Resolution Date” of a case and subtracts the “Opened Date” of that case. This number cannot be negative because that would indicate the case was resolved (a resolution that answers the client’s question or solves the client’s issue has been delivered) before the case was ever opened. These values are incorrect and misleading.

To rectify this, a Data Replacement node was utilized within Enterprise Miner. The Replacement node utilized the following SAS code to take all of the negative values for TimeToResolve and replace them with a “missing” or “null” value:

* ;
* Variable: TimeToResolve ;
* ;
Label REP_TimeToResolve='Replacement: TimeToResolve';
Length REP_TimeToResolve 8;
REP_TimeToResolve = TimeToResolve;
if TimeToResolve eq . then REP_TimeToResolve = . ;
else if TimeToResolve < 0 then REP_TimeToResolve = . ;

The results of this node show that 19 replacements were made to the dataset.

Additionally, there are some outliers in the upper limits of this dataset as well. Some of the TimeToResolve values exceed 100 days, reaching into the 1,000-day range. For purposes of this analysis, any case with a TimeToResolve of 100 or greater is grouped together. Any case open that long will likely cause frustration for a client, so seeing data in the 500-plus-day range does not provide much additional value.
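Both adjustments to TimeToResolve can be summarized in a short Python sketch. It is not the Enterprise Miner Replacement node itself. The function name is an assumption, and capping values at exactly 100 is only one way to "group together" the long-running cases; the report does not state which replacement value was used.

```python
from typing import Optional

CAP_DAYS = 100.0  # cases open 100 days or longer are treated alike


def rep_time_to_resolve(days: Optional[float]) -> Optional[float]:
    """Clean one TimeToResolve value (in days)."""
    # Missing stays missing; negative durations are impossible, so they
    # are also set to missing, mirroring the SAS replacement code.
    if days is None or days < 0:
        return None
    # Assumption: group the long tail by capping at CAP_DAYS.
    return min(days, CAP_DAYS)
```

Under these assumptions a 1,500-day case and a 100-day case receive the same value, while a case resolved in 2.15 days is left unchanged.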

Rep_Product

Rep_Product was created using the same Replacement node as above. The goal of this variable is to condense the 167 product classes into a more functional subset of grouped products on which the analysis can be run. Utilizing the Replacement node’s “Replacement Editor” utility, each new value for a product was manually entered into the replacement value column. The node was then run, showing that 10,427 changes were made to the Product variable, meaning each Product was now bucketed into a new value successfully. Appendix B contains a list of all the individual product values along with the new value each product was placed into.
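The manual replacement amounts to a lookup table from product name to product group. The Python sketch below illustrates the idea only. The real mapping is the one listed in Appendix B; the entries shown here, the "Other" fallback group, and the function name are hypothetical placeholders.

```python
from typing import Optional

# Hypothetical excerpt of a product-to-group mapping (see Appendix B for
# the actual replacement values; these entries are illustrative only).
PRODUCT_GROUPS = {
    "MapInfo Pro 32-BIT": "MapInfo",
    "MapInfo Pro 64-BIT": "MapInfo",    # hypothetical product name
    "EngageOne Server": "EngageOne",    # hypothetical product name
}


def rep_product(product: Optional[str], default: str = "Other") -> Optional[str]:
    """Return the grouped product value for a raw Product value."""
    if product is None or not product.strip():
        return None  # keep missing products missing
    # Assumption: products without an explicit mapping fall into `default`.
    return PRODUCT_GROUPS.get(product.strip(), default)
```

Keeping the mapping in one table makes it easy to audit how many raw products feed each group before the data reaches the models.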

Description of Data Modeling and Assessments

For this analysis, three different data models were created with SAS Enterprise Miner. The purpose of using three distinct models is to understand which modelling architecture provides the most accurate and representative model. The models selected for this analysis are Decision Tree, Regression, and Neural Networks.

All of the models will utilize the same set of variables. This list is below:

· CaseOrigin – Input

· CustomerPriority – Input

· DayOfWeek – Input

· REP_Product – Input

· REP_TimeToResolve – Input

· Reasons – Input

· Satisfied – Target

· Severity – Input

· TimeOfDay – Input

Decision Tree

A decision tree is a tool that allows users to model a set of given data in a “tree like” form, with the various decision points creating new branches and the decisions off those points being referred to as leaves. According to Wikipedia, this model type is often used to “help identify a strategy most likely to reach a goal.”[2]

[2] https://en.wikipedia.org/wiki/Decision_tree

Configuration of the Decision Tree Node

This node’s path from the data source is as follows:

ProjData -> Replacement -> Data Partition -> Decision Tree

Setting the node up this way ensures that the model utilizes the fully prepared dataset and runs using the equally segmented Train and Validate data. By configuring the node this way, we are following standard best practices. The node was configured to create an optimal tree automatically by setting the Subtree Method to Assessment and the Assessment Measure to Average Square Error. The following settings were configured under splitting rule:

These are the default settings of the Decision Tree node within SAS Enterprise Miner.

Decision Tree Results

Running the Decision Tree node resulted in a maximal tree being created that contained 15 leaves. The ASE for this model is .188632 for the Train data and .197024 for the Validation data. The optimal path shows as leaf 8. The rule for this node is: WHERE Replacement: TimeToResolve < 2.15 AND Replacement: Product MAPINFO, CODING, MAILSTREAM, ... Or Missing AND TimeOfDay OVERNIGHT, MORNING, EVENING Or Missing.
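As a reference for the ASE figures quoted throughout this report, the following Python sketch computes average squared error as the mean squared difference between the 0/1 Satisfied outcome and the predicted probability of satisfaction. It assumes this matches the statistic Enterprise Miner reports for a binary target. The function name and the toy data are invented for illustration and do not come from the project dataset.

```python
from typing import Sequence


def average_squared_error(actual: Sequence[int],
                          predicted: Sequence[float]) -> float:
    """Mean of (actual - predicted)^2 across all cases."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


# Toy example (invented values): four cases and their predicted probabilities.
toy_actual = [1, 0, 1, 1]
toy_predicted = [0.9, 0.2, 0.6, 0.7]
toy_ase = average_squared_error(toy_actual, toy_predicted)  # about 0.075

# Reference point: on a sample whose satisfaction rate is p, always
# predicting p gives an ASE of p * (1 - p).
baseline_rate = 0.73
baseline_ase = baseline_rate * (1 - baseline_rate)  # about 0.197
```

Lower values are better. The baseline line is included only as a rough yardstick, assuming the validation data has the same .73 satisfaction rate as the full dataset.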

Essentially, this rule states that the most likely scenario that leads to a satisfied customer is one where the case is resolved in less than 2.15 days, the product is MapInfo, Coding, MailStream, or unknown, and the case is opened in the morning, evening, overnight, or unknown (meaning not midday). Understanding this, it stands to reason that the most likely unsatisfied customer would be the opposite of that rule.
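To make the leaf rule concrete, it can be written as a simple Python predicate. This is a sketch under several assumptions: the product list in the reported rule is truncated with "...", so only the three named groups are included here; a missing TimeToResolve is assumed not to match; and the function and constant names are hypothetical.

```python
from typing import Optional

# Only the groups named in the report; the original rule lists more ("...").
LEAF_PRODUCTS = {"MAPINFO", "CODING", "MAILSTREAM"}
LEAF_TIMES = {"OVERNIGHT", "MORNING", "EVENING"}
RESOLVE_THRESHOLD_DAYS = 2.15


def in_leaf_8(rep_time_to_resolve: Optional[float],
              rep_product: Optional[str],
              time_of_day: Optional[str]) -> bool:
    """True when a case satisfies the (partial) rule reported for leaf 8."""
    # Assumption: a missing resolution time does not satisfy "< 2.15".
    fast = (rep_time_to_resolve is not None
            and rep_time_to_resolve < RESOLVE_THRESHOLD_DAYS)
    # The rule accepts missing values for product and time of day.
    product_ok = rep_product is None or rep_product.upper() in LEAF_PRODUCTS
    time_ok = time_of_day is None or time_of_day.upper() in LEAF_TIMES
    return fast and product_ok and time_ok
```

A MapInfo case opened in the morning and resolved within a day satisfies the rule, while the same case opened at midday does not.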

One byproduct of the decision tree model is the “Variable Importance” statistics that are derived from it. Based on this information, of the variables input into the model, the order of importance is:

1. REP_TimeToResolve – Importance = 1.0000 (Train) / 1.0000 (Validation)

2. REP_Product – Importance = .6859 (Train) / .4315 (Validation)

3. TimeOfDay – Importance = .4218 (Train) / .2844 (Validation)

4. CaseOrigin – Importance = .3023 (Train) / .4833 (Validation)

5. Reasons – Importance = .2263 (Train) / .1072 (Validation)

The other input variables were not recognized as providing any importance to the building of the decision tree. What this data indicates is that TimeToResolve is the most important factor in creating a satisfied client. The longer a case remains open, the more likely that case is to result in a dissatisfied client. The initial splitting rule sets the split at 2.15 days, which means that cases closed within that time frame are more likely to satisfy a client and cases open longer will lead to unhappier clients. Cases open more than 64.45 days are even more likely to generate a dissatisfied client. A full view of the Decision Tree model can be found in Appendix A as Figure 8.

Regression Model

A Regression model is another form of statistical modeling which offers a different approach to prediction modeling as compared to decision trees. Page 2 of the Logistic Regression PDF provided in CIS 575 indicates that regression is a parametric model which assumes a specific association between the input variables and the target. In SAS Enterprise Miner, the model is generated using a prediction formula that creates a series of models with increasing complexity; then, using fit statistics calculated from the validation data, a best model is selected.

Configuration

Given the target variable is binary, a logistic regression model is selected within Enterprise Miner. After running a few different models, the final model was configured to utilize the Stepwise selection model, with selection options of:

· Entry Significance Level = 1.0

· Stay Significance Level = .5

· Maximum Number of Steps = 30

These settings were selected based upon previous instruction, which stated to generate a larger model from which to gather best fit statistics. Setting the entry significance level to 1 allows any input into the model, and a stay significance level of .5 allows the stepwise selection process to continue longer and not terminate prematurely. Setting the maximum number of steps to 30 places an upper bound on how long the model runs. To allow Enterprise Miner to optimize the complexity of the model, the Selection Criterion was changed from the default to Validation Error.

Regression Results

Utilizing the optimized model, viewing the iteration plot we see that the model at iteration 7 is the best fit model for this set of data. The ASE for this model is .188706 for the Train data and .194423 for the Validation data.

As the plot progresses through the iterations, the lowest point for Average Square Error for both the Train data and Validation data is at step 7. There’s a slight increase after that step which means the model becomes less accurate from that point forward.

In further analyzing the model, the output provides an analysis of effects as seen below:

Type 3 Analysis of Effects

Effect               DF    Sum of Squares    F Value    Pr > F
CaseOrigin            6            2.2371       1.96    0.0671
CustomerPriority      2            0.5518       1.45    0.2338
REP_Product          13            8.5307       3.46    <.0001
REP_TimeToResolve     1            6.7412      35.52    <.0001
Reasons               3            0.4487       0.79    0.5003
Severity              3            1.4268       2.51    0.0572
TimeOfDay             3            1.1742       2.06    0.103

Examining this information further, we see evidence that the REP_TimeToResolve variable is by far the most impactful input into this model. With an F value of 35.52, this indicates that the results of this variable upon the model are significant. The next most significant value is REP_Product, which would indicate that certain products are more likely to go into escalation than others. Perhaps this should impact the strategy of the support department moving forward.

Neural Network

Neural networks are one of the more popular prediction models utilized in current data mining platforms. Similar to regression models, Neural Networks utilize mathematical algorithms to determine the impact of variables upon a given target. The most typical type of neural network is actually an extension of a standard regression model. Neural networks allow users to model virtually any association between given inputs and a target.

Configuration

Multiple attempts were made to discover which settings would create the best Neural Network model for the given inputs. Three different neural models were created:

1) A standalone neural network model

2) A neural network model with a regression model preceding it to define inputs

3) An auto neural network

For Items 1 and 2, the configurations were identical. Both models utilized Average Error as a Model Selection Criterion. Under the “Network” settings, a value of 6 was used for “Number of Hidden Units.” Under the Optimization settings, Maximum Iterations was set to 100 and the “Preliminary Training” option was disabled.

For the Auto Neural node – a few settings were adjusted from the defaults:

· Train Action > Search enabled

· Number of Hidden Units > 1

· Tolerance > Low

· Direct > No

· Normal > No

Neural Network results

Both the standalone neural network and the neural network connected to the Regression node were executed with the above settings and returned near identical results. Modifying the inputs via the regression node had negligible impact on the performance of the neural model. In examining the ASE for this model, the value for Train is .187575 and for Validation is .193953. The iteration plot below shows that the optimal iteration occurs at iteration 29.

Examining the final weights from this network (shown in Figure 9), it becomes evident that the strongest weights belong to the BIAS in H14. Among the inputs, REP_Product = Legacy has the largest weight toward satisfaction at -.655, meaning that Legacy products have a positive effect towards a satisfied customer. REP_TimeToResolve has a weight of .913, which means that the larger that value, the more likely a negative event will occur at the target, meaning a dissatisfied customer. This is consistent with the other models we’ve run to this point.

AutoNeural Node results

In examining the ASE for this model, the value for Train is .189396 and for Validation is .194331. Examining the iteration plot below, it is evident that training step 3 provides the optimal iteration for this model, as the ASE for validation and train seem to stay constant from that point forward without any further improvement or degradation in performance.

Examining the final weights, as shown in Figure 10, the AutoNeural network had different results than the Neural Network node. While REP_TimeToResolve continued to show a negative impact on reaching a binary target of 1, REP_ProductConfirm appears to have the largest weight in this dataset at 1.750. The variable with the largest weight towards a satisfied outcome is REP_ProductMailStream with a weight of -1.464, as opposed to the Neural Network’s identification of REP_ProductLegacy. The AutoNeural and Neural Network nodes produced vastly different models.

Model Comparisons and Model Selection

Within SAS Enterprise Miner, a Model Comparison node was configured with the following changes to the default settings:

· Selection Statistic: Average Squared Error

· HP Selection Statistic: Average Squared Error

· Selection Table: Validation

· Model Inputs:
    · Decision Tree
    · Regression
    · Neural Network based on Regression (Neural Network Standalone had identical results, so there was no need to include both)
    · AutoNeural

Upon running the Model Comparison tool, SAS Enterprise Miner looks at the Average Squared Error (ASE) statistics for the various models’ Validation data runs and selects the best model based upon the lowest ASE.

Model Selection

The model selected by Enterprise Miner was the Neural Network node. The Fit Statistics in the model comparison output, summarized in the following table, show that the Neural Network had the lowest Validation ASE of the four models.

Selected Model    Model Type        Validation ASE    Train ASE
Y                 Neural Network    0.19395           0.18758
                  AutoNeural        0.19433           0.1894
                  Regression        0.19442           0.18871
                  Decision Tree     0.19702           0.18863

Further investigation into the models shows that they perform at nearly the same level and that, while the neural network does outperform the other models based upon the ASE, all of these models would provide useful detail in moving towards a prediction of satisfied vs dissatisfied for a set of clients.
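The selection logic of the Model Comparison node, as configured here, reduces to picking the model with the lowest Validation ASE. The Python sketch below reproduces that choice using the Validation ASE values reported in this section; the variable names are assumptions, and the sketch is not part of the Enterprise Miner workflow.

```python
# Validation ASE values as reported in the model comparison output.
VALIDATION_ASE = {
    "Neural Network": 0.19395,
    "AutoNeural": 0.19433,
    "Regression": 0.19442,
    "Decision Tree": 0.19702,
}

# Select the model with the lowest validation ASE.
best_model = min(VALIDATION_ASE, key=VALIDATION_ASE.get)

# Gap between the best and worst model, to show how close they are.
ase_spread = max(VALIDATION_ASE.values()) - min(VALIDATION_ASE.values())

# Models ordered from best to worst validation ASE.
ranking = sorted(VALIDATION_ASE, key=VALIDATION_ASE.get)
```

The spread between the best and worst model is roughly .003, which is consistent with the observation that all four models perform at nearly the same level.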

Viewing the Score Rankings Overlay: Satisfied chart below, it’s apparent that NN Regression (the green line) provides the best mean prediction at initial depths and the second-best mean prediction at a depth of 100. This indicates that the model performs very well at both initial and full depth.

Conclusions and Recommendations

This analysis provides a lot of insight into Net Promoter Score survey data gathered over the past 10 years of Support cases from our clients. The goal of this analysis was to create a model that would allow for us to better understand the variables that drive a satisfied client, and thereby understand what variables would drive a dissatisfied client. Client dissatisfaction can cost the business undue time and resources in working to resolve account concerns and keep the client as a customer or can result in a lost client altogether. Dissatisfied clients increase costs for support organizations as well when the concerns are not addressed appropriately.

This analysis is also a precursor to a potential utility that would allow a support organization to identify customers that are likely to go into “escalation.” Escalated customers require attention from business executives, cause damage to the business/client relationship, and drastically increase the workload for all persons involved. Increased workloads mean increased costs and declining profits. Identifying customers at risk of escalation and addressing those concerns prior to a full-blown escalation can lead to substantial cost savings and a much better client satisfaction experience.

Lessons Learned

From this analysis, it was proven that certain variables have much more impact upon a client’s satisfaction level than others. Examining the results of the various models, it’s clear that things like Day of Week and Time of Day that a case is opened have little to no longstanding impact upon the likelihood of a satisfied client.

Conversely, and not surprisingly, the variables that have the most impact upon a client’s satisfaction level appear to be TimeToResolve and Product. The analysis indicates that the longer a case is open, the more likely that case is to end in a dissatisfied state. While factors such as severity and priority certainly affect how long a case stays open, the overall result shows that a client having to wait for resolution is not ideal. A support strategy focused on resolution times should benefit overall client satisfaction levels.

Additionally, this analysis showed that some products do not impact satisfaction levels greatly, but others do. These products are more polarizing to our client base. While a product like MapInfo, a desktop software product with a mature user base and strong community, results in higher satisfaction, a product like EngageOne, a much more complex enterprise-level software product with virtually no community support, results in lower satisfaction. By examining which products result in lower satisfaction levels, the business can invest resources in investigating and addressing the pitfalls of those products and work to improve satisfaction with the product itself, not just with the support service surrounding it.

Next Steps

The business needs to build this model out further by expanding the base of data used to train and validate the model. Once the model is strengthened, it should be executed weekly against all currently active/open cases. That execution should look to identify cases that are creeping toward scenarios that would lead to a dissatisfied client, so that appropriate actions can then be identified. Does the support engineer need to prioritize a case? Has a case fallen off the radar, and does it need to be re-engaged? Do other departments need to be included to progress an issue toward resolution? Are certain support engineers in need of additional support to prevent issues from resulting in dissatisfied clients? With the right model, support management can be much better equipped to make small adjustments in a given workday or work week to help improve overall client experiences.
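As a rough illustration of the weekly execution described above, the following sketch scores a list of open cases and flags those whose predicted probability of dissatisfaction meets a threshold. Everything in it is an assumption made for illustration: the `OpenCase` fields, the `flag_at_risk` helper, the 0.5 threshold, and especially `placeholder_score`, which is a stand-in that only rises with case age and is not the model built in this project.

```python
from dataclasses import dataclass


@dataclass
class OpenCase:
    case_id: str       # hypothetical case identifier
    days_open: float   # hypothetical age of the case in days


def flag_at_risk(cases, score, threshold=0.5):
    """Return (case, probability) pairs at or above the threshold, highest risk first."""
    scored = [(case, score(case)) for case in cases]
    flagged = [(case, prob) for case, prob in scored if prob >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


def placeholder_score(case):
    # Stand-in for a trained model (assumption): risk grows with case age
    # and is capped at 1.0. A real run would call the selected model here.
    return min(case.days_open / 30.0, 1.0)


open_cases = [
    OpenCase("C-1", 3),
    OpenCase("C-2", 21),
    OpenCase("C-3", 45),
]

flagged = flag_at_risk(open_cases, placeholder_score, threshold=0.5)
for case, prob in flagged:
    print(f"{case.case_id}: predicted dissatisfaction risk {prob:.2f}")
```

Passing the scoring function in as an argument keeps the flagging step independent of whichever model is ultimately selected, so the weekly job would not need to change if the model is retrained or replaced.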

Appendix A – Figures and Tables

Figure 1 – Case Origin

Figure 2 – Severity

Figure 3 – Customer Priority

Figure 4 - DayOfWeek

Figure 5 – Time of Day

Figure 6 – Rep_Product

Figure 7 – Rep_TimeToResolve

Figure 8 - Decision Tree Model

Figure 9 - Neural Network Weights

Figure 10 - AutoNeural Weights

Figure 11 - Final Diagram

Appendix B – Replacement Values for Rep_Product

Original Value | Replacement Value
CODE-1 Plus | Coding
Finalist | Coding
VeriMove | Coding
Canadian Code-1 Plus | Coding
Address Broker | Coding
SortStream Canada | Coding
Address Validation | Coding
CODE-1 Plus International | Coding
Confirm AM | Confirm
Confirm OnDemand | Confirm
Confirm CF | Confirm
Street Pro | Data
Communications Data Suite | Data
Postal Data | Data
Communities Boundaries Data | Data
Parcels / Cadastre | Data
Centrus GDT Data | Data
Exchange Info | Data
Post Point Pro Data Product | Data
Address Fabric Data | Data
Census Boundaries Data | Data
Centrus Points | Data
POSTNET Barcoding Option | Data
World Boundaries Premium | Data
EngageOne Enrichment | EngageOne
EngageOne Designer (6+) | EngageOne
DOC1 Series 5 | EngageOne
EngageOne Server | EngageOne
DOC1 Designer | EngageOne
EngageOne Vault | EngageOne
DFWorks | EngageOne
EDGE | EngageOne
DOC1 Generate | EngageOne
DOC1 Series 3/4 | EngageOne
Spectrum Spatial | EngageOne
EngageOne Content Author | EngageOne
EngageOne Deliver | EngageOne
EngageOne Inform | EngageOne
Mail 360 Manager | EngageOne
DOC1 Document Composition Service | EngageOne
e2 - Hosted | EngageOne
OpenEDMS | EngageOne
EngageOne Generate (6+) | EngageOne
EngageOne Video | EngageOne
DOC1 Interactive | EngageOne
e2 Present | EngageOne
EngageOne Liason B2B | EngageOne
Mail.Dat Viewer | EngageOne
MapMarker - USA | GeoCoding
Centrus Desktop | GeoCoding
GeoStan | GeoCoding
MapMarker | GeoCoding
MapMarker - CAN | GeoCoding
Geographic Coding Plus | GeoCoding
GeoCoding | GeoCoding
MapMarker - AUS | GeoCoding
Spectrum - Enterprise Routing Module | GeoCoding
GeoTAX | GeoTax
Centrus GeoTAX Matrix | GeoTax
OnDemand | Hosted
PBS Hosted | Hosted
Software & Data Marketplace | Hosted
zTestProduct | Legacy
PlanWeb | Legacy
Legacy
Merge/Purge Plus | Legacy
GeoStore | Legacy
OnRoute | Legacy
Bar-coded Bag | Legacy
GPInfo | Legacy
Consumer Merge/Purge | Legacy
Compass | Legacy
Labels Printing Plus | Legacy
PlanAccess+B37 | Legacy
SendPro SaaS | Legacy
ConnectMaster | Legacy
Digital | Legacy
GeoInsight | Legacy
GeoReveal Studio | Legacy
Geographic Determination Library | Legacy
Message1 | Legacy
PlanAccess | Legacy
StreamSure | Legacy
iProof | Legacy
MailStream Plus | MailStream
Dispatcher4 | MailStream
List Conversion Plus | MailStream
Generalized Selection Plus | MailStream
List Conversion | MailStream
EZ Case Plus | MailStream
MapInfo Pro 32-bit | MapInfo
MapInfo Discover | MapInfo
MapInfo Pro 64-bit | MapInfo
AnySite | MapInfo
MapInfo Discover 3D | MapInfo
MapBasic | MapInfo
Target Pro | MapInfo
Discover PA | MapInfo
Vertical Mapper | MapInfo
ModelVision | MapInfo
MapInfo Raster | MapInfo
Encom Engage | MapInfo
DriveTime | MapInfo
Encom Engage3D Pro | MapInfo
Discover Mobile | MapInfo
MapInfo ProViewer | MapInfo
Spectrum - Location Intelligence Module | MapInfo
Mail 360 Manager Client | MapInfo
MapInfo Manager | MapInfo
Centrus Enhanced Data | MapInfo
AnySite Online | MapInfo
CI Appliance | MapInfo
ArcGIS Plug-In | MapInfo
MapInfo Discovery | MapInfo
Quickmag | MapInfo
Paramics | Portrait
Portrait Dialogue | Portrait
Spectrum Miner | Portrait
Portrait Foundation | Portrait
Portrait HQ | Portrait
Portrait Interaction Optimizer | Portrait
P/I Output Manager | Production Intelligence
P/I Enterprise Manager | Production Intelligence
P/I Output Enhancement | Production Intelligence
JESConnect | Production Intelligence
OfficeMail | Production Intelligence
AddressNow | Production Intelligence
Centrus Coastal Boundaries Data | Production Intelligence
IntelliJet Print Process Manager | Production Intelligence
Sagent Data Flow | Sagent
AutoMate BPA Server | Sagent
Spectrum Technology Platform | Spectrum
Spectrum - Universal Addressing Module | Spectrum
Exponare | Spectrum
Spectrum - Enterprise Geocoding Module | Spectrum
Spectrum - OnDemand | Spectrum
MapXtreme .NET | Spectrum
Stratus/Analyst | Spectrum
Spectrum Spatial Analyst | Spectrum
Envinsa | Spectrum
Spectrum - Client API | Spectrum
Spectrum - Enterprise Tax Module | Spectrum
Envinsa EOLS | Spectrum
Stratus Gallery | Spectrum
Spectrum - Connector | Spectrum
Location Intelligence APIs | Spectrum
MapInfo Stratus (SaaS) | Spectrum
Spectrum - Address Now Module | Spectrum
MapX | Spectrum
SpatialWare | Spectrum
Crime Profiler | Spectrum
MapMarker - Other | Spectrum
MapXtreme Windows | Spectrum
RouteFinder | Spectrum
Spatial+ | Spectrum
MapXtreme Java | Spectrum
Spectrum - Advanced Matching Module | Spectrum
Spectrum - Enterprise Data Integration Module | Spectrum
Spectrum - Universal Name Module | Spectrum
Location Based Services | Spectrum
MapXtreme MapX | Spectrum
PB Community | Spectrum
Parcel Property Attributes Data | Spectrum
Portrait EDGE 2020 | Spectrum
Schools Data | Spectrum
Spatial Server | Spectrum
Spectrum - Data Normalization Module | Spectrum
e2 Account Management | Spectrum
e2 Present & Pay | Spectrum
Unknown | Spectrum
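
For readers who want to reproduce the Rep_Product replacement outside of Enterprise Miner, the sketch below applies a lookup built from a handful of the pairs listed above. The `REP_PRODUCT` dictionary and `replace_product` function are hypothetical names, only a few rows are included, and leaving unmapped products unchanged is a choice made for this sketch rather than a documented behavior of the project's replacement step.

```python
# A few original-to-replacement pairs copied from Appendix B.
# A full mapping would include every row of the appendix.
REP_PRODUCT = {
    "CODE-1 Plus": "Coding",
    "Confirm AM": "Confirm",
    "Street Pro": "Data",
    "EngageOne Server": "EngageOne",
    "MapInfo Pro 64-bit": "MapInfo",
    "Unknown": "Spectrum",
}


def replace_product(product, mapping=REP_PRODUCT):
    """Return the replacement group for a product, or the product itself if unmapped."""
    return mapping.get(product, product)


# Hypothetical Product values from three support cases.
products = ["MapInfo Pro 64-bit", "EngageOne Server", "Some Unlisted Product"]
rep_products = [replace_product(p) for p in products]
print(rep_products)
```

Collapsing the many product names into a small set of groups reduces the number of levels the models have to handle for this variable.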