Prepare a report on "Titanic: Machine Learning from Disaster" using any data mining software (i.e., SAS Enterprise Guide, SAS Studio, SAS Enterprise Miner, R)
CIS 575: Final Project
December 2019
Executive Summary
Project Summary
This project analyzes sales data of a private company over a 12-month period. This organization sells talent acquisition and employment branding recruitment products to employers of all sizes. The primary objective of this project is to determine whether there is a correlation between the number of contacts called and the revenue produced from a sales account, or whether other factors contribute significantly to revenue production.
This company is seeking to better understand which of two approaches generates more sales:
1) having its 45-member sales force focus on calling every contact within an account, or
2) calling through accounts to identify stakeholders, and then creating a plan around influencing and nurturing those stakeholders or decision-makers.
With a new (expensive) power dialing tool currently under consideration, the results of this project may help the sales force be as efficient and effective as possible in its sales outreach methodology.
Background
Over the past 10 years, the sales team has been expected to call every contact in every account. Some presume that a different approach is called for: one that stresses calling through contacts within each account with an ear to identifying key decision-makers within each organization, and then building a follow-up or nurture list designed to influence the target group of contacts. This project is the first of its kind.
Currently, a power dialing tool costs $250,000 for a 6-month program that allows for 45 salespeople to utilize the power dial system for up to six hours each week. The opportunity to increase revenue with this dialing tool is based on the understanding that more calls will equal more revenue; however, the team needs to understand if an economy of time is more important than the economics of scale when it comes to calling through the database of contacts.
Business Objectives
As we prepare to negotiate our power dialer contract term, we are looking to ascertain whether more revenue is generated from those accounts with the highest percentage of contacts reached by salespeople. Furthermore, the team would like to know whether there is a correlation between the number of contacts within our system for a given account and sales revenue.
Process
The following departments were involved in this project: sales, marketing, IT, client services, and the COO and CEO.
· Sales EVP & managers: Company sales leadership was involved in providing context as to what this project would encompass. Although there are many pressing questions about contact outreach by salespeople, we narrowed the scope to better understand the relationship between the number of contacts contacted and the amount of revenue generated with the current strategy. Approximately 8 hours of periodic discussions and communications were held over the past two months. A side discussion developed around importing new contacts into the system, which would require new investment and resources; this is part of the reasoning for determining the relevance of the number of contacts to revenue generated. Key challenges in these discussions involved the lack of willingness or capability that a large percentage of our salespeople may exhibit when learning how to identify key individuals within an account and tag them accordingly within our CRM. Opinions expressed were that the time needed to develop this skill set within the sales team would be better spent on the phone calling into accounts, as has always been done. As a group, we agreed to circle back once the project had been completed. At that time, sales leadership would present potential next steps to the COO, who oversees the sales EVP.
· Marketing: Because marketing is looking to apply an account-based marketing methodology, it has a keen interest in this project, which may help it better understand the relationship between marketing to an account in its entirety and/or marketing to designated stakeholders or decision-makers within an account. Approximately four hours over the past two months were invested in gathering input and questions surrounding this project’s objectives. Problems were encountered in understanding how marketing would best implement campaigns designed to influence decision-makers within accounts. It was agreed to discuss that later, as findings were released and discussed.
· Client Services: Because client service personnel interact with clients directly, it is important that they understand the relationship of influencers and key decision-makers within the accounts being served. Without this basic understanding, history has shown that all clients are served in the same fashion. Prior performance data has made it evident that treating high-value or long-term value clients differently from one-off customers has produced exponential results in revenue generation and long-term commitment. Over the course of the last 6 weeks, approximately 2 hours of discussions have been invested in weighing the merits of being made aware of key contacts within accounts.
· CTO: The chief technology officer oversees the developers and CRM techs. This role was involved in the initial comparison query, as his team assembles the data within the CRM involved: Salesforce. This group was instrumental in gathering and assembling the data into one cohesive spreadsheet, to be later imported into SAS. Approximately 6 hours were invested over the past 6 weeks to create cohesiveness between the number of contacts, percentage contacted, and revenue realized during the same period. Challenges arose when competing priorities stalled the data accumulation aspect of this project, which in turn delayed the launch of the plan.
· COO: The COO provides oversight for the executive sales team, marketing leadership and client services department, and is a key figure who influences efforts regarding training and awareness across the enterprise. Incidentally, for the past 10 years, he has been the key proponent of calling every contact in any account for business. He would be a principal individual in getting buy-in across the departments and energizing that effort in the year ahead. Over the past 6 weeks, we have invested approximately 2 hours touching on whether to call every contact or train the sales team to identify influencers and principal stakeholders quickly and efficiently.
· CEO: As the highest-ranking executive within the organization, his main responsibilities include making general corporate judgments and, more directly related to this project, serving as the primary overseer of our operations and resource usage across the entity. The direct dial power tool mentioned earlier is the single biggest investment he has made in the company in his 20 years in the role. The results of this project may provide a fuller 360° view of current and future tools that may be considered, as well as high-level input on the direction that sales, marketing, and client service training may need to take as we lean into low unemployment figures that cause our company to be even more aggressive in meeting revenue goals and strategic objectives. Approximately 30 minutes were invested in gathering input from the CEO regarding what this team should be mindful of when considering the results and the variables to be investigated. The primary challenges in discussions of this nature centered on whether resources are being used wisely and on low confidence in the ability of the average salesperson to make the most of every moment they speak to a contact, which might also include eliciting information about decision-makers within the organizations we serve. Additionally, strained IT resources may present a challenge should a greater investment be required to fulfill the needs of sales, should they look to map organizations for influencers and decision-makers.
Selecting and Gathering Data
This company utilizes Salesforce CRM to manage, report and describe sales transactions over the past 5 years. Our subscription costs more than $1 million annually. Because business analytics is becoming an ever more important part of how our e-commerce-based platform operates, the reporting capabilities that Salesforce provides are instrumental in helping us understand our business and improve our understanding of predictable outcomes related to our client communications, candidate aggregations, customers, prospects and sales activity. Although this enterprise is in the very early stages of moving business analytics beyond descriptive use and into a predictive stage, this project aims to augment management’s understanding of how best to utilize the tools and personnel to maximize revenue.
Although our data is being produced and stored, the company’s ability to warehouse the data is clumsy, and to date there remains a deficit in organizing data and surfacing insight. Prior to selecting the data, opportunities were seen from an operational perspective, in that a better understanding of this dataset would benefit marketing, sales, client services, and the management team. This operational data had not been collected with data analysis in mind; it essentially sits within a silo-based data system. This hampers our ability to interpret business analytics and proved especially challenging. The understanding was that enhancing our ability to predict the essential revenue-producing components was more valuable than continuing to rely on our old-school ways that centered around inferences. Additionally, the models utilized within this project were designed with long-term value in mind, so that they can continue to perform validly over the years as new data is assessed. Within our organization, sales personnel are driven to produce a certain number of phone calls each day; however, over time we’ve seen that some organizations prefer that our organization become a part of their recruitment chemistry, which demands more of our time. This is why selecting and gathering this data was essential to supporting the future growth of our organization. This change in the market potentially means a change in our account and contact targeting, and the data may make clear to management what combination of factors provides the most efficient use of our personnel and tools in order to maximize the profitability of the company. Over time, this data may provide predictions of what may occur should the company decide to do nothing or implement a revised version of the current strategy to limit detriment over the near term.
In this project, a report was created that measured 6155 sales records from 92,357 sales contacts and $15,081,790.17 in sales over a 12-month period. The initial data comprised account ID, account owner, number of employees per account, account organization type, annual revenue, number of contacts, percentage of contacts contacted by sales personnel over the last 12 months, days since last sales activity, and sales amount over the last 12 months.
Two separate organizations make up this company: one segment of the entity sells products and the other sells placement services. In this project, only the product sales side of the organization was measured, due to a higher level of confidence in the validity and accuracy of the data collected. We are looking to apply an analytical approach to what many within our company deem a self-inflicted business problem, one calling for metrics that are reliable and offer tangible benefits to the departments and enterprise involved. Due in part to the significant investment required by the direct dialer platform, management is increasingly seeking cost-benefit and cost-saving ideas that would complement other ongoing analytical applications and the ongoing decision-making on product revisions and development. Much of this effort can help win the competition for revenue against determined competitors within the niche.
Discussion of Preliminary Data Exploration and Findings
Upon downloading the data, it was found that much of it was incomplete, duplicated, or errantly calculating totals. Time-consuming next steps involved reducing the number of variables to those found to be the most accurate: Account ID, # of contacts in an account, % of contacts contacted last 12 months, and sales amount generated last 12 months. A further challenge was that 42% of the cases were duplicates; these were removed within Excel prior to uploading to Enterprise Miner. Preliminary exploration also revealed error entries within hundreds of fields; these were found to be systematic input from outside the sales team, correlated with CRM technician ghost accounts, that caused irregularities and skewed the data.
The final dataset includes 3596 cases from affirmation variables. Considerable effort and support were required for the final dataset to be accepted into SAS Enterprise Miner: spaces, commas, and remnant error input prevented the csv-turned-SAS file from being accepted.
Understanding the insights this data may provide on growing new revenue out of legacy accounts aligns well with our drive to leverage customer loyalty, cross-selling, staffing optimization, churn reduction and customer acquisition efforts.
Description of Data Preparation – Enterprise Miner
(Repairs, Replacements, Reductions, Partitions, Derivations, Transformations, Results and Conclusions, and Variable Clustering)
The goals were to screen for unusual data values, to analyze the dispersion and shape of continuous variables, and to identify the central tendency of the sample. For the objectives that rely on statistical inference, close attention was paid to estimating and predicting unknown population parameter values from the sample contained within this dataset.
Getting error-free data was a time-consuming challenge. Preliminary data preparation required spacing and comma repairs; missing data was replaced with “0”s, and case reductions were executed to remove systematic case errors that significantly biased the calculations. The majority of these cases were test inputs that had not been removed once testing had completed. In other cases, errant field data included placeholder information that had yet to be assigned correctly. In all cases, they were excluded from the final dataset.
Roles and measurement levels were identified for each variable: target, nominal and ordinal. The response variable is the revenue purchased over a 12-month term (SalesAmtLstYR); it is the key focus of this project as the outcome variable and is treated as the dependent variable. Predictor variables, the measures associated with the response variable, are used to predict its value. The predictor variables identified are account owner, number of contacts within our system, and percentage of contacts reached; these are the independent (input) variables. This sample is a subset of the entire population of clients and contacts housed within our CRM, namely those contacts with sales activity over the past 12 months.
Variable exploration confirms within the sample properties that 3595 cases are included within the dataset, held within 5 columns. 555 of these accounts generated $13,649,515.00 in revenue.
· An analysis of the number of accounts assigned to each account owner revealed that 1274 (35.44%) of accounts are assigned to non-sales personnel identified as “Sam Jones” and “TZipp”. This indicates that 2321 (64.56%) of accounts are owned by sales team members.
· Percent contacted last year ranges (5 bins)
· 41.84% of accounts had 80% to 100% of their contacts contacted over the past year
· 17.19% of accounts had 60% to 80% of their contacts contacted over the past year
· 18.14% of accounts had 40% to 60% of their contacts contacted over the past year
· 11.57% of accounts had 20% to 40% of their contacts contacted over the past year
· 11.27% of accounts had 0% to 20% of their contacts contacted over the past year
· Measures of Central Tendency:
· Mode: SAMJONES, 20.25%
· Mean:
· NumContacts: 23.41 contacts per account
· PctContactedLstYR: 65.53% of account contacts contacted
· SalesAmtLstYR: $3,796.80 of sales generated per account
· Median
· Number of contacts per account: 6
· Percent of contacts contacted per account: 66.67%
· Standard Deviation:
· Number of contacts in an account: 59.96
· Percent of contacts contacted per account: 31.96
· Min/Max
· Minimum NumContacts per Account = 1
· Maximum NumContacts per Account = 890
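The summary measures above can also be reproduced outside Enterprise Miner. The following is a minimal Python sketch under stated assumptions: the sample rows are invented for illustration and are not the project data, the helper names `pct_bin` and `summarize` are hypothetical, and the bin boundaries (a value of exactly 80% falling in the top bin) are an assumption about how the 5 bins were drawn.

```python
import statistics
from collections import Counter

# Hypothetical sample rows (NOT the project data): (NumContacts, PctContactedLstYR)
accounts = [(1, 100.0), (6, 66.67), (3, 33.33), (12, 75.0), (40, 10.0), (6, 50.0)]

def pct_bin(pct, width=20.0, n_bins=5):
    """Return the 0-based index of the equal-width bin that pct falls into."""
    return min(int(pct // width), n_bins - 1)  # 100% is folded into the top bin

def summarize(values):
    """Mean, median, sample standard deviation, minimum and maximum."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

num_contacts = [n for n, _ in accounts]
pct_contacted = [p for _, p in accounts]

# Share of accounts (in percent) that falls into each of the 5 bins
bin_counts = Counter(pct_bin(p) for p in pct_contacted)
bin_share = {b: 100.0 * c / len(accounts) for b, c in sorted(bin_counts.items())}

print(summarize(num_contacts))
print(bin_share)
```

A large gap between mean and median, as reported above for NumContacts, is the kind of skew this summary makes visible.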
Creating training and validation data is a critical step in prediction as we prepare to select among competing models in the next phases of analysis. In order to build models that can accurately predict the target value from the set of input values, we assign a training set of data. The objective of the separate validation set is to avoid relying on predictions fitted to just the training data, which may be less accurate on an independent sample.
At this point a Data Partition tool was added to the diagram workspace. Training and validation dataset allocations were assigned 60% training and 40% validation in order to devote more data to training, with the goal of establishing a predictive model that is more stable. The partition summary of metadata reveals 3595 observations divided up into 2355 training observations, as well as 1240 observations assigned to validation data.
Preparing for interactive tree construction: a Metadata node was used to analyze an error message stating that the maximum of 512 target levels had been exceeded, which prevented the Decision Tree node run from completing. The Project Macro Variable Max Levels was updated to 1000. In Transform Variables, the number of bins was changed from 4 to 5. Measurements were updated.
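For readers without Enterprise Miner, a 60/40 split can be sketched as follows. This is an illustration only, assuming a simple random split with a fixed seed; the function name `partition` and the seed value are hypothetical, and the Data Partition node may sample or stratify differently, so its counts need not match this sketch.

```python
import random

def partition(rows, train_fraction=0.60, seed=12345):
    """Shuffle row indices reproducibly, then split into training and validation lists."""
    rng = random.Random(seed)          # fixed seed so the split can be repeated
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = round(train_fraction * len(rows))
    train = [rows[i] for i in indices[:cut]]
    valid = [rows[i] for i in indices[cut:]]
    return train, valid

rows = list(range(3595))               # stand-in for the 3595 observations
train, valid = partition(rows)
print(len(train), len(valid))
```

Every observation lands in exactly one of the two lists, which is the property that lets the validation data act as an independent check on the trained models.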
Description of Data Modeling/Analyses and Assessments – Enterprise Miner
Decision Tree
The Decision Tree results indicate that sales are most likely to come from accounts with more than 2.5 and fewer than 10.5 contacts.
| Node | Average | Standard Deviation | Count |
|---|---|---|---|
| Node 8 Train | 669.04 | 4587.70 | 1397 |
| Node 8 Validation | 978.36 | 6189.67 | 927 |
| Node 13 Train | 1012.17 | 5815.79 | 791 |
| Node 13 Validation | 1275.23 | 6260.50 | 496 |
Decision Tree: Subtree Assessment:
Because these are estimate predictions, model fit on the full data will be assessed by average squared error (ASE). In the validation data, most of the improvement in fit happens at the second split. The 3-leaf tree appears to generate a lower validation ASE than the Maximal Decision Tree: 1.682 as compared to 1.731. However, this Decision Tree’s training ASE is 2.322, higher than that of the Maximal Decision Tree.
In this model, the condition most strongly associated with the highest predicted SalesAmtLstYR is NumContacts less than 27.5.
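Average squared error is the mean of the squared differences between actual and predicted target values. The sketch below shows the calculation for a single tree leaf, which predicts the mean of its training targets. The target values are hypothetical and are not the project data, and the function name `average_squared_error` is chosen for illustration.

```python
def average_squared_error(actual, predicted):
    """Mean of squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical SalesAmtLstYR values for the cases in one leaf (NOT the project data)
train_targets = [0.0, 500.0, 1500.0, 2000.0]
valid_targets = [0.0, 1000.0, 3000.0]

# A regression-tree leaf predicts the mean target of its training cases
leaf_prediction = sum(train_targets) / len(train_targets)

train_ase = average_squared_error(train_targets, [leaf_prediction] * len(train_targets))
valid_ase = average_squared_error(valid_targets, [leaf_prediction] * len(valid_targets))
print(train_ase, valid_ase)
```

Comparing the two numbers for trees of different sizes is what the subtree assessment does: the preferred subtree is the one with the lowest validation ASE.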
| Partition | Average | Standard Deviation | Count | % |
|---|---|---|---|---|
| Training | 1250.21 | 6102.63 | 1778 | 60.19 |
| Validation | 1546.69 | 7888.67 | 1176 | 39.81 |
Initial and Additional Splitting Rules applied:
Because Decision Tree models rely on recursive partitioning of the training data, a splitting rule was implemented to isolate concentrations of cases with identical target values. In the chart below, the worth of a split is measured by its logworth, or -Log(p). In the training data, NumContacts has the highest logworth, followed by PctContactedLstYR at a value significantly far behind.
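Logworth is the negative base-10 logarithm of a split's p-value, so smaller p-values give larger logworth. A minimal sketch of the conversion in both directions follows; the function names `logworth` and `p_from_logworth` are hypothetical and the p-values are illustrative.

```python
import math

def logworth(p_value):
    """Return -log10(p); a larger value indicates a stronger split."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    return -math.log10(p_value)

def p_from_logworth(worth):
    """Invert logworth back to the p-value it represents."""
    return 10.0 ** (-worth)

print(logworth(0.05))    # a p-value at the conventional 5% level
print(logworth(0.0001))  # a much smaller p-value gives a larger logworth
```

Read this way, a logworth in the tens corresponds to a p-value that is vanishingly small, which is why the top-ranked input dominates the split ranking.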
The Splitting Rule identified the Target Variable as SalesAmtLstYR and indicates the partitioning of the two branches that are to be created. The first branch contains cases with a NumContacts of less than 212.5, and the second branch contains cases with a NumContacts of greater than or equal to 212.5.
After the data is partitioned into two subsets, the results of the Splitting Rule indicate that the first subset, which corresponds to cases with NumContacts of less than 212.5, has a higher than average concentration of SalesAmtLstYR. This partition of the data represents a predictive model that assigns the cases in the left branch a single predicted SalesAmtLstYR value; the 60.19 (Training) and 39.81 (Validation) figures are the percentages of cases in each partition, as shown in the table below.
| Partition | Average | Standard Deviation | Count | % |
|---|---|---|---|---|
| Training | 2828.62 | 10,894.00 | 2126 | 60.19 |
| Validation | 2815.62 | 10,530.40 | 1406 | 39.81 |
Additional Splitting Rule Applied:
In order to better understand the data, a second splitting rule was applied to NumContacts, with a logworth of 51.1518. It further partitioned the training and validation data into additional subsets, and indicated that the subset corresponding to cases with NumContacts greater than 27.5 has a higher than average concentration of SalesAmtLstYR. Results were comparable to the previous Decision Tree rules.
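The idea of a splitting rule on a numeric input can be sketched as a search over candidate thresholds. Note the swapped-in criterion: this sketch picks the threshold that minimizes the total squared error of the two branches, which is a simplification and not the logworth ranking that Enterprise Miner reports. The data and the function names `sse` and `best_split` are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty branch)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, total_sse) for the binary split x < t with the lowest
    combined SSE, or None when x has fewer than two distinct values."""
    pairs = list(zip(x, y))
    distinct = sorted(set(x))
    best = None
    # Candidate thresholds are midpoints between adjacent distinct input values
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        left = [yy for xx, yy in pairs if xx < t]
        right = [yy for xx, yy in pairs if xx >= t]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (t, total)
    return best

# Hypothetical NumContacts and SalesAmtLstYR values (NOT the project data)
num_contacts = [1, 2, 3, 10, 11, 12]
sales = [5.0, 6.0, 5.0, 50.0, 52.0, 51.0]
print(best_split(num_contacts, sales))
```

Because candidates are midpoints between adjacent integer counts, thresholds land on half values; this is one plausible reason the rules above use cut points such as 2.5 and 27.5.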
Assessing the Decision Tree – Changing Frozen Tree Property to YES:
In order to perform an assessment of this Maximal Tree, the Use Frozen Tree property was changed from No to Yes to protect the maximal tree from changes made by other property settings when the diagram flow is run. A cumulative lift chart, tree map, and table of fit statistics have now been produced.
Subtree Assessment Plot:
The diagnostic tool shows that the 7-leaf tree improves validation performance, with ASE falling to 1.633 from the previous 1.731. The training ASE, by contrast, rose to 2.215 from the previous 1.957.
It appears that the 7-leaf tree (out of 9 leaves) generates the best validation performance as compared to the Maximal Tree at 1.731. However, this tree’s training ASE is higher than that of the Maximal Tree.
Maximal Tree
The Maximal Tree model identifies node ID 20 as the preferred node for assigning predictions in these cases. Using identical samples of the data both to evaluate the usefulness of the input variables and to assess model performance may lead to overfit models. Because of this, an assessment plot based on validation data will be utilized. The segments ahead address the validation data.
Node 20 covers accounts with 2.5 or more contacts whose account owner is one of BFJE, AoTool, AKearby, TZIPP, SAMJONES, JNICH, JBERT, GNELS, THARR, TVANHOOS, JLEBL, JCORN, JALLEN, DGOLD, LMILL, CNEWS, BCOME, JBEIT, WLUCUS, MROLL, SACKLEY, RJONE, RARNDT.
| TREE VIEW STATISTICS – Node 20 | Average | Standard Deviation | Count | % |
|---|---|---|---|---|
| Training | 847.65 | 5073.50 | 701 | 58.81 |
| Validation | 1280.00 | 6301.06 | 491 | 41.19 |
Subtree Assessment Plot:
In the validation data, most of the improvement in fit happens at the second split. It appears that the maximal, 9-leaf tree generates a lower average squared error than any of its precursors, at 1.731 on validation. The training data average squared error is 1.957.
In this section a predictive model was created that assigns one of nine predicted target values to each case. Subsequent tasks will determine how well this model compares to the validation data.
Note: Unquantified variability in the fit statistics is expected to be due in part to complexity of this model.
Regression Node
Under the Model Information section, the Training Data Set entry shows 75 model parameters, due in part to a single input generating scores of model parameters. The number of observations is 2157.
| Partition | Average Squared Error | Model Parameters |
|---|---|---|
| Training | 2.306 | 75 |
| Validation | 1.638 | - |
Output:
The Type 3 Analysis of Effects tests the statistical significance of the inputs added to the model. With a Pr > F value of <.0001, AcctOwner is a highly significant input, as is, to the greatest degree, NumContacts (<.0001), and to a much lesser degree PctContactedLstYR (<.0005). Linear Regression was selected within the node properties for this analysis.
With estimate predictions being the focus, model fit can be assessed by the average squared error: 2.306 for Training versus 1.638 for Validation. The discrepancy between the values of these two key statistics may indicate a possible overfit of the model under consideration.
Regression: Iteration plot with Stepwise approach
In order to tune the regression model to provide optimized performance on the validation data, an iteration plot was utilized. The objective is to calculate a fit statistic for each step of the input selection process and then choose the step with the optimal fit statistic value.
Output analysis:
This intercept-only regression model began in Step 0, where the value of the intercept parameter is selected. Diagnostics within the model predicted the target values for each case under analysis.
The Stepwise option employs a sequential selection approach in the Regression node. The output diagnostics indicate that NumContacts (<.0001) in Step 1, AcctOwner (<.0001) in Step 2, and then PctContactedLstYR (.005) were selected, in that order.
Ultimately, the selection process indicated AcctOwner, NumContacts, and PctContactedLstYR as the selected model with optimal complexity.
After Stepwise selection, the average squared error is 2.306 for Training and 1.638 for Validation; therefore, applying this selection method made no noticeable difference in model performance.
Neural Network
In order to assess whether more accurate predictions could be obtained via weight estimates, a Neural Network node was applied. This application analyzes the number of values that can potentially be estimated and takes a closer look at the relationship between the weights. Unfortunately, the complexity of the input and target associations made it somewhat difficult to decipher whether the associations were correctly modeled. A prediction estimate was used by applying the logistic function.
| Neural Network | ASE | # of Est. Weights |
|---|---|---|
| Training | 3.164 | 229 |
| Validation | 1.901 | |
In this model, Maximum Iterations was set to 100: the default of 50 iterations was not enough for the network training process to converge, so the Maximum Iterations property was raised to 100. This change did not alter any of the fit statistics. The average squared error is higher than the values observed for the Decision Tree and Regression models in previous sections.
With 229 estimated weights, this is a large model. Despite the large number of weights, the model shows little sign of overfitting when analyzing the weight estimates below.
Fit Statistics
Iteration Plot
The iteration plot shows the average squared error versus the optimization iteration. There is a significant divergence between the training and validation average squared error near iteration 30 and a realignment at iteration 60. Further research indicates that the large number of weights within this neural network model is likely to be the cause.
Regression Node to Neural node:
Stopped Training was implemented to help ensure that the neural network does not overfit, since the number of network weights is so large. The number of hidden units was changed from 3 to 6. The number of estimated weights increased to 457; the average squared error remained unchanged at 3.164 in training and 1.901 in validation. The iteration plot indicates that the optimal validation average squared error occurs at iteration 3, with an average squared error of 1.626.
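Stopped training amounts to keeping the weights from the iteration with the lowest validation error instead of the last iteration. The sketch below shows only that selection step. The error history is hypothetical apart from the 1.626 at iteration 3 quoted above, and the function name `best_iteration` is chosen for illustration.

```python
def best_iteration(validation_ase):
    """Return (iteration_index, ase) for the lowest validation ASE.
    Ties are resolved in favor of the earliest iteration."""
    if not validation_ase:
        raise ValueError("validation_ase must not be empty")
    index = min(range(len(validation_ase)), key=validation_ase.__getitem__)
    return index, validation_ase[index]

# Hypothetical validation ASE per iteration, starting at iteration 0;
# only the value 1.626 at iteration 3 is taken from the text.
history = [2.10, 1.95, 1.80, 1.626, 1.70, 1.90]
print(best_iteration(history))
```

Training error is deliberately ignored here: it usually keeps falling as iterations continue, so only the validation curve can signal when further training stops helping.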
AutoNeural Tool
The AutoNeural tool was applied as a means of exploring alternative network architectures and hidden unit counts. Train Action was changed to Search to enable the AutoNeural node to incrementally increase the complexity of the network. The number of Hidden Units was changed to 1 in order to add one hidden unit per iteration. The Tolerance field was changed to Low to prevent preliminary training from taking place. The Direct field was changed to No in order to deactivate direct connections between the inputs and the target, and the Normal field was changed to No to deactivate the normal distribution activation function. The Sine function was set to No as well. These settings are intended to restrict each iteration to adding only one hidden unit to the neural network, so that only the hyperbolic tangent function is used.
AutoNeural Results - Fit Statistic Results:
| Neural Network | ASE | # of Est. Weights |
|---|---|---|
| Training | 2.519 | 229 |
| Validation | 1.422 | |
The number of weights indicates that the selected model has one hidden unit. With average squared errors of 2.519 for training and 1.422 for validation, the validation error is the lowest of all the models run thus far. However, the number of estimated weights increased sharply, from a previous high of 229 to 913.
The Neural and AutoNeural iteration plots vary significantly. The AutoNeural iteration plot indicates that the best iteration occurs at step 12, with an average squared error of 1.422 for validation and 2.519 for training.
Explanation of Final Model Comparisons and Model Selection
The Model Comparison tool was connected to the Decision Tree, Regression, Maximal Tree and Neural models in order to analyze their performance. The Model Comparison tool allows the user to compare measurements of the models’ performance as well as to analyze information within the nodes themselves.
Fit Statistics
The Decision Tree recorded the lowest validation average squared error, at 1.633, and is therefore the selected model; average squared error is the selection criterion when there is an interval target, as in this project. This is confirmed within the statistics in the output section. The performance of each model as gauged by fit statistics is quite similar, even though the calculations are oversized due to an error that I was unable to resolve.
Using the Decision Tree model, the condition most strongly associated with the highest predicted SalesAmtLstYR is NumContacts less than 27.5.
Score Rankings Overlay
The predicted mean chart sorts the cases in the validation and training data in order of their predicted target values. The resulting deciles correspond to the depth axis of the graph. In this case the mean predicted values indicate that the model is separating the primary and secondary cases effectively. The Decision Tree has the highest mean predicted value, 35,129.52 at a depth of 5. The Maximal Tree is a close second at 31,881.31.
Individual Correlations from Each Model as Experienced thru the Project
| Model | ASE (Train) | ASE (Validate) | # of Est. Weights (Train) | # of Est. Weights (Validate) | Leaves |
|---|---|---|---|---|---|
| Decision tree | 2.322 | 1.682 | | | 3 |
| Maximal Tree | 1.957 | 1.731 | | | 9 |
| Decision Tree - Frozen Tree to Yes | 2.215 | 1.633 | | | 7 |
| Regression - none | 2.306 | 1.638 | 75 | | |
| Regression - Stepwise | 2.306 | 1.638 | | | |
| Neural Network | 3.164 | 1.901 | 229 | | |
| Regression - Neural Network | 3.164 | 1.901 | | | |
| Neural Network - AutoNeural | 2.519 | 1.422 | 913 | | |
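The selection rule used for an interval target, lowest validation ASE, can be sketched directly from the figures in the table above. The list of models treated as inputs to the Model Comparison tool is an assumption made for this illustration, and the function name `select_by_validation_ase` is hypothetical.

```python
# (training ASE, validation ASE) as reported in the comparison table
ase_by_model = {
    "Decision tree": (2.322, 1.682),
    "Maximal Tree": (1.957, 1.731),
    "Decision Tree - Frozen Tree to Yes": (2.215, 1.633),
    "Regression - none": (2.306, 1.638),
    "Regression - Stepwise": (2.306, 1.638),
    "Neural Network": (3.164, 1.901),
    "Regression - Neural Network": (3.164, 1.901),
    "Neural Network - AutoNeural": (2.519, 1.422),
}

# Assumption: the rows that correspond to the models connected to Model Comparison
compared = [
    "Decision Tree - Frozen Tree to Yes",
    "Maximal Tree",
    "Regression - Stepwise",
    "Neural Network",
]

def select_by_validation_ase(names, table):
    """Return the model name with the lowest validation ASE among names."""
    return min(names, key=lambda name: table[name][1])

print(select_by_validation_ase(compared, ase_by_model))
print(select_by_validation_ase(list(ase_by_model), ase_by_model))
```

Among the compared models the Decision Tree with validation ASE 1.633 is selected. Across every row of the table the AutoNeural result of 1.422 is lower, so the outcome depends on which models are connected to the comparison.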
Conclusions and Recommendations
My initial hypothesis, coupled with my domain expertise, led me to believe that accounts with the right types of decision-makers and/or influencers were where our company was likely to generate the most revenue. My presumption about basing a strategy on calling the highest percentage of contacts within an account proved to be accurate, if the models are understood to be valid: the percentage of contacts called within an account was typically the lowest-rated variable in each model. The final selected model, the Decision Tree, indicated that the number of contacts was by far the most important variable, much more so than the account owner as indicated by the Regression model (intercept: AcctOwner, NumContacts, and PctContactedLstYR). It comes as no surprise that having more contacts within an account may increase the likelihood of having the right mix of influencers and decision-makers; however, it was surprising that the winning model indicated that increased sales would likely come from accounts with greater than or equal to 2.5 contacts, and that further analysis in subtree assessments indicated greater than or equal to 26.5 contacts per account.
My experience in the domain suggests that, applying this data, the highest probability of generating revenue would come from accounts with a core group of contacts who are part of the decision-making and influence-driving forces within an organization. I would have liked to add more variables to the models, especially surrounding the overall revenue of the organizations and their numbers of employees. Unfortunately, upon preliminary analysis of the data set, much of this information was missing, which would have certainly biased the results.
The stated business objectives were met, as my counsel to the management team will lend weight to the consideration of utilizing those account owners likely to generate the most revenue for the company and, at least in part, orienting them towards crucial accounts, with a focus on identifying decision-makers and designates who can champion what our company has to offer. Those account owners less likely to produce higher-than-average predicted mean sales can utilize the power dialer and generate transactional revenue through sheer volume of activity.
These objectives correspond to the two approaches the company set out to compare:
1) whether its 45-member sales force should focus on calling every contact within an account, or
2) whether it should call through accounts to identify stakeholders, and then create a plan around influencing and nurturing those previously identified stakeholders or decision-makers
In the future, a more effective analysis would include a timeframe that compares account owners utilizing the power dialer with seasoned account owners who do not use it and instead focus on identifying buying teams within high-quality organizations. This will require skill sets to be taught, as many of our best salespeople are only accustomed to generating high volumes of revenue through transactional sales. There are few account owners who have the willingness and capability to map an organization's stakeholders and decision-makers and to build relationships with the influencers who percolate interest within those organizations.
Next steps: Assemble ideas that appeal to the C-suite within our organization, leverage the predictive reasoning realized in this project, and build the case for the executive sales leadership. This case would include developing skill sets for those account owners who have the propensity and desire to target key accounts but lack a diligent mapping approach. The power dialer contract could be cut in half by limiting it to account owners with less strategic skills. On the other hand, providing missions and assigning accounts that are apparent matches for the services and products we offer seems likely to produce the highest uptick in sales. Because the highest predicted mean is more than $35,000, executive management has an additional incentive to more firmly consider sending their best salespeople to build rapport with the leaders who matter most within the companies we target, rather than wasting their talent on power dialing, which is designed not to build a relationship but to engender a transactional purchase. This would be a big step for our company, considering its high-paced tempo. Sharing the skill sets of those account owners who have better-than-average results finding the right types of people to talk to within an organization, regardless of the business unit or hurdles placed in the way, has been a long time in coming. Embedding our top salespeople within accounts in industries that pair well with what we have to offer is the greatest chance to secure maximum revenue in the year ahead.
Addendum
1. Initial Exploration:
2. Accounts Owned by Percentage:
3. Percent of Contacts Contacted Last Year:
4. Data Partitioning:
5. Metadata Node to assess number of levels error
6. Updating Project Macro Variable Max Levels to 525 (later to 1,000)