Chapter 6:
Techniques for Predictive Modeling
Business Intelligence and Analytics: Systems for Decision Support
(10th Edition)
Copyright © 2014 Pearson Education, Inc.
Learning Objectives
Understand the concept and definitions of artificial neural networks (ANN)
Learn the different types of ANN architectures
Know how learning happens in ANN
Become familiar with ANN applications
Understand sensitivity analysis in ANN
Understand the concept and structure of support vector machines (SVM)
(Continued…)
Learning Objectives
Learn the advantages and disadvantages of SVM compared to ANN
Understand the concept and formulation of k-nearest neighbor algorithm (kNN)
Learn the process of applying kNN
Learn the advantages and disadvantages of kNN compared to ANN and SVM
Opening Vignette…
Predictive Modeling Helps Better Understand and Manage Complex Medical Procedures
Situation
Problem
Solution
Results
Answer & discuss the case questions.
Questions for the Opening Vignette
Why is it important to study medical procedures? What is the value in predicting outcomes?
What factors do you think are the most important in better understanding and managing healthcare?
What would be the impact of predictive modeling on healthcare and medicine? Can predictive modeling replace medical or managerial personnel?
What were the outcomes of the study? Who can use these results? How can they be implemented?
Search the Internet to locate two additional cases in managing complex medical procedures.
Opening Vignette – A Process Map for Training and Testing Four Predictive Models
Opening Vignette – The Comparison of Four Models
Neural Network Concepts
Neural networks (NN): a brain metaphor for information processing
Neural computing
Artificial neural network (ANN)
Many uses for ANN, including
pattern recognition, forecasting, prediction, and classification
Many application areas
finance, marketing, manufacturing, operations, information systems, and so on
Biological Neural Networks
Two interconnected brain cells (neurons)
Processing Information in ANN
A single neuron (processing element – PE) with inputs and outputs
Biology Analogy
Application Case 6.1
Neural Networks Are Helping to Save Lives in the Mining Industry
Questions for Discussion
How did neural networks help save lives in the mining industry?
What were the challenges, the proposed solution, and the obtained results?
Elements of ANN
Processing element (PE)
Network architecture
Hidden layers
Parallel processing
Network information processing
Inputs
Outputs
Connection weights
Summation function
Elements of ANN
Neural Network with One Hidden Layer
Elements of ANN
Summation Function for a Single Neuron (a) and Several Neurons (b)
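The summation function multiplies each input by its connection weight and adds the products. Below is a minimal Python sketch for a single neuron (S = Σ X_i·W_i) and for several neurons, where neuron j receives Y_j = Σ_i X_i·W_ij. The function names and weight values are illustrative assumptions, and a missing connection is modeled as a weight of 0.

```python
def weighted_sum(inputs, weights):
    """Summation function of a single PE: S = sum of x_i * w_i."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


def layer_sums(inputs, weight_matrix):
    """Summation for several PEs at once.

    weight_matrix[i][j] is the weight W_ij from input i to neuron j,
    so neuron j receives Y_j = sum over i of x_i * W_ij.
    """
    n_neurons = len(weight_matrix[0])
    return [
        sum(inputs[i] * weight_matrix[i][j] for i in range(len(inputs)))
        for j in range(n_neurons)
    ]


# (a) Single neuron with two inputs: Y = X1*W1 + X2*W2
print(weighted_sum([1.0, 2.0], [0.5, 0.25]))  # 1.0

# (b) Three neurons fed by two inputs; W13 = 0 means input 1 is not
# connected to neuron 3, so Y3 = X2*W23 (hypothetical weights).
W = [[0.5, 0.25, 0.0],   # W11, W12, W13
     [0.25, 0.5, 0.75]]  # W21, W22, W23
print(layer_sums([1.0, 2.0], W))  # [1.0, 1.25, 1.5]
```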
Elements of ANN
Transformation (Transfer) Function
Linear function
Sigmoid (logistic activation) function: output range [0, 1]
Hyperbolic tangent function: output range [-1, 1]
Threshold value?
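To make the transfer functions concrete, the sketch below implements each one and reuses the single-PE example from this chapter's figures (inputs 3, 1, 2; weights 0.2, 0.4, 0.1), where the weighted sum is 1.2 and the sigmoid output is about 0.77. Reading "Threshold value?" as a step function that fires once the sum reaches a threshold t is an assumption; t is a hypothetical parameter.

```python
import math


def linear(s):
    """Linear transfer: passes the weighted sum through unchanged."""
    return s


def sigmoid(s):
    """Logistic (sigmoid) transfer: squashes s into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))


def hyperbolic_tangent(s):
    """Hyperbolic tangent transfer: squashes s into the range (-1, 1)."""
    return math.tanh(s)


def threshold(s, t=0.0):
    """Step transfer: outputs 1 once s reaches the threshold t, else 0."""
    return 1 if s >= t else 0


# Inputs and weights from the chapter's single-PE example
inputs = [3, 1, 2]
weights = [0.2, 0.4, 0.1]
s = sum(x * w for x, w in zip(inputs, weights))  # summation function
print(round(s, 2), round(sigmoid(s), 2))  # 1.2 0.77
```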
Neural Network Architectures
Architecture of a neural network is driven by the task it is intended to address
Classification, regression, clustering, general optimization, association, …
Most popular architecture: feed-forward multilayer perceptron (MLP) with the backpropagation learning algorithm
Used for both classification and regression type problems
Others – Recurrent, self-organizing feature maps, Hopfield networks, …
Neural Network Architectures: Feed-Forward Neural Networks
Feed-forward MLP with 1 Hidden Layer
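A feed-forward pass can be sketched in a few lines: each layer computes weighted sums plus a bias, applies a sigmoid, and hands the result to the next layer, with no connections looping back. The 3-4-1 shape (three inputs and one output as in the figure, with four hidden PEs assumed), the bias terms, and all weight values are hypothetical; the network is not trained.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def dense(inputs, weights, biases):
    """One fully connected layer; weights[j] holds the incoming weights of PE j."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, w_j)) + b_j)
        for w_j, b_j in zip(weights, biases)
    ]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Feed-forward pass: signals flow input -> hidden -> output with no loops."""
    hidden = dense(x, hidden_w, hidden_b)
    return dense(hidden, out_w, out_b)


# Hypothetical, untrained 3-4-1 network (weights are made up)
hidden_w = [[0.2, -0.5, 0.1], [0.4, 0.3, -0.2], [-0.3, 0.2, 0.5], [0.1, 0.1, 0.1]]
hidden_b = [0.0, 0.1, -0.1, 0.0]
out_w = [[0.6, -0.4, 0.3, 0.2]]
out_b = [0.05]
y = forward([1.0, 0.5, -1.0], hidden_w, hidden_b, out_w, out_b)
print(y)  # a one-element list holding a value between 0 and 1
```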
Neural Network Architectures: Recurrent Neural Networks
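In a recurrent network, some outputs are fed back as inputs at the next step, so the response to an input depends on what came before. The sketch below assumes a simple Elman-style layer (one specific recurrent design, chosen here for brevity) with tanh units; the weights are made up.

```python
import math


def rnn_step(x, h_prev, w_in, w_rec, bias):
    """One time step of a simple (Elman-style) recurrent layer.

    w_in[j][i]  : weight from input i to unit j
    w_rec[j][k] : weight from unit k's previous output back into unit j
    The w_rec term is the feedback loop that feed-forward networks lack.
    """
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_in[j], x))
            + sum(w * hk for w, hk in zip(w_rec[j], h_prev))
            + bias[j]
        )
        for j in range(len(bias))
    ]


def run_sequence(sequence, w_in, w_rec, bias):
    """Feed a sequence one step at a time, carrying the hidden state forward."""
    h = [0.0] * len(bias)  # no memory before the first input
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_in, w_rec, bias)
        states.append(h)
    return states


# Hypothetical layer: 1 input, 2 recurrent units
w_in = [[0.5], [-0.3]]
w_rec = [[0.8, 0.0], [0.2, 0.5]]
bias = [0.0, 0.0]
states = run_sequence([[1.0], [1.0], [1.0]], w_in, w_rec, bias)
# The same input yields a different state at each step because of the feedback.
print(states[0] != states[1])  # True
```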
Other Popular ANN Paradigms: Self-Organizing Maps (SOM)
First introduced by Finnish professor Teuvo Kohonen
Applies to clustering-type problems
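A self-organizing map learns without target outputs: for each sample it finds the best-matching node and pulls that node, and to a lesser degree its map neighbors, toward the sample, so nodes settle on clusters. This is a minimal one-dimensional sketch under stated assumptions: Gaussian neighborhood, linearly decaying learning rate and radius, nodes initialized along the data's bounding-box diagonal, and two made-up clusters.

```python
import math
import random


def best_matching_unit(x, nodes):
    """Index of the node closest to x: the cluster that x is assigned to."""
    return min(range(len(nodes)), key=lambda j: math.dist(nodes[j], x))


def train_som(data, n_nodes=4, epochs=50, lr0=0.5, radius0=1.5, seed=0):
    """Minimal one-dimensional self-organizing (Kohonen) map."""
    rng = random.Random(seed)
    # Assumption: start the nodes evenly spaced along the bounding-box diagonal
    lo = [min(col) for col in zip(*data)]
    hi = [max(col) for col in zip(*data)]
    nodes = [[low + (high - low) * j / (n_nodes - 1) for low, high in zip(lo, hi)]
             for j in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays over time
        radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks over time
        for x in rng.sample(data, len(data)):    # visit samples in random order
            bmu = best_matching_unit(x, nodes)
            for j in range(n_nodes):
                # Neighborhood is measured on the map (node index), not in data space
                influence = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                nodes[j] = [w + lr * influence * (xi - w) for w, xi in zip(nodes[j], x)]
    return nodes


# Two hypothetical clusters, around (0, 0) and (5, 5)
rng = random.Random(1)
data = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(10)]
data += [[rng.gauss(5, 0.3), rng.gauss(5, 0.3)] for _ in range(10)]
nodes = train_som(data)
print(best_matching_unit([0, 0], nodes), best_matching_unit([5, 5], nodes))
```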
Other Popular ANN Paradigms: Hopfield Networks
First introduced by John Hopfield
Highly interconnected neurons
Applies to solving complex computational problems (e.g., optimization problems)
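The sketch below shows the highly interconnected structure of a Hopfield network on its classic associative-memory task: weights are set with the Hebbian rule, and a corrupted pattern is cleaned up by repeatedly updating units. Each update lowers (or keeps) the network's energy, which is the property optimization applications exploit by encoding a problem's cost as that energy; that mapping is not shown here. The stored pattern is hypothetical.

```python
def train_hopfield(patterns):
    """Hebbian learning for a Hopfield network of bipolar (+1/-1) units.

    Every unit connects to every other unit with a symmetric weight, and
    there are no self-connections.
    """
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w, state, sweeps=5):
    """Asynchronous updates: each unit takes the sign of its weighted input.

    With symmetric weights and no self-connections, no update increases the
    network's energy, so the state settles into a stable pattern.
    """
    s = list(state)
    n = len(s)
    for _ in range(sweeps):
        for i in range(n):
            total = sum(w[i][j] * s[j] for j in range(n))
            s[i] = 1 if total >= 0 else -1
    return s


stored = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_hopfield([stored])
noisy = [1, -1, 1, 1, -1, -1, 1, -1]  # two units flipped
print(recall(w, noisy) == stored)  # True
```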
Application Case 6.2
Predictive Modeling is Powering the Power Generators
Questions for Discussion
What are the key environmental concerns in the electric power industry?
What are the main application areas for predictive modeling in the electric power industry?
How was predictive modeling used to address a variety of problems in the electric power industry?
Development Process of an ANN
An MLP ANN Structure for the Box-Office Prediction Problem
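One way the input layer of this MLP can reach 27 PEs is one-hot encoding of the categorical inputs listed in the figure (5 + 3 + 3 + 10 + 3 + 2 = 26) plus one PE for the number of screens. The sketch assumes that encoding. The eight unnamed genre labels are placeholders, and the rule that a boundary value (for example, exactly $10M) goes to the higher class is an assumption, since the figure uses strict inequalities. In practice the screen count would usually be rescaled before training.

```python
import bisect

# One-hot category lists from the box-office figure. Only "Sci-Fi" and
# "Action" are named there, so the other eight genre labels are placeholders.
CATEGORICAL_INPUTS = {
    "mpaa_rating": ["G", "PG", "PG13", "R", "NR"],
    "competition": ["High", "Medium", "Low"],
    "star_value": ["High", "Medium", "Low"],
    "genre": ["Sci-Fi", "Action"] + [f"genre_placeholder_{i}" for i in range(1, 9)],
    "technical_effects": ["High", "Medium", "Low"],
    "sequel": ["Yes", "No"],
}

# Upper bounds (in $ millions) of box-office classes 1-8; class 9 is > 200M.
CLASS_BOUNDS = [1, 10, 20, 40, 65, 100, 150, 200]


def encode_movie(movie):
    """Map one movie record to the 27 values fed to the input-layer PEs."""
    vector = []
    for field, levels in CATEGORICAL_INPUTS.items():
        vector.extend(1.0 if movie[field] == level else 0.0 for level in levels)
    vector.append(float(movie["screens"]))  # numeric input: number of screens
    return vector


def box_office_class(bo_millions):
    """Target class 1 (flop) ... 9 (blockbuster); boundary values go to the higher class."""
    return bisect.bisect_right(CLASS_BOUNDS, bo_millions) + 1


movie = {"mpaa_rating": "PG13", "competition": "High", "star_value": "Low",
         "genre": "Action", "technical_effects": "Medium", "sequel": "No",
         "screens": 3000}
print(len(encode_movie(movie)), box_office_class(50))  # 27 5
```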
Testing a Trained ANN Model
Data is split into three parts
Training (~60%)
Validation (~20%)
Testing (~20%)
k-fold cross validation
Less bias
Time consuming
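The partitioning can be sketched as follows: a shuffled 60/20/20 split, and a generator of k-fold index pairs in which every record is tested exactly once. The seed and function names are illustrative.

```python
import random


def train_val_test_split(records, train=0.6, val=0.2, seed=42):
    """Shuffle, then cut into ~60% training, ~20% validation, ~20% testing."""
    rows = list(records)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * val)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def k_fold_indices(n, k=10, seed=42):
    """Yield (train_idx, test_idx) for each of k folds.

    Every record is tested exactly once, and each model trains on (k-1)/k
    of the data; the price is k separate training runs.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, folds[i]


train_set, val_set, test_set = train_val_test_split(range(100))
print(len(train_set), len(val_set), len(test_set))  # 60 20 20
```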
ANN Learning Process: A Supervised Learning Process
Three-step process:
1. Compute temporary outputs.
2. Compare outputs with desired targets.
3. Adjust the weights and repeat the process.
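The three-step loop can be sketched for a single sigmoid PE learning the logical OR function. The update, weight += learning rate × (target − output) × input, is the delta rule, used here as the simplest concrete instance of step 3; the learning rate, tolerance, and data are assumptions.

```python
import math


def train_neuron(samples, lr=0.5, epochs=1000, tol=0.1):
    """Supervised learning for a single sigmoid PE, following the three steps."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for epoch in range(1, epochs + 1):
        worst = 0.0
        for x, target in samples:
            # 1. Compute the temporary output
            s = sum(wi * xi for wi, xi in zip(w, x)) + b
            y = 1.0 / (1.0 + math.exp(-s))
            # 2. Compare the output with the desired target
            error = target - y
            worst = max(worst, abs(error))
            # 3. Adjust the weights in proportion to the error
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
        if worst < tol:  # every output is close enough to its target
            break
    return w, b, epoch


def predict(w, b, x):
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s >= 0 else 0


# Logical OR as a toy training set
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b, epochs_used = train_neuron(data)
print([predict(w, b, x) for x, _ in data])  # [0, 1, 1, 1]
```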
Backpropagation Learning
Backpropagation of Error for a Single Neuron
Backpropagation Learning
The learning algorithm procedure:
1. Initialize weights with random values and set other network parameters
2. Read in the inputs and the desired outputs
3. Compute the actual output (by working forward through the layers)
4. Compute the error (difference between the actual and desired output)
5. Change the weights by working backward through the hidden layers
6. Repeat steps 2-5 until weights stabilize
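The procedure above can be sketched for a network with one hidden layer learning XOR, a problem a single PE cannot solve. Assumptions: sigmoid PEs, squared-error loss, online (per-case) updates, and a fixed number of epochs standing in for "until weights stabilize". The hidden-layer size, learning rate, and seed are illustrative, and convergence to a good solution is not guaranteed for every seed.

```python
import math
import random


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def train_backprop(data, n_hidden=4, lr=0.5, epochs=5000, seed=1):
    """Backpropagation for a one-hidden-layer network with sigmoid PEs."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Step 1: random initial weights; the last entry of each row is a bias
    w_h = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1]) for row in w_h]
        y = sigmoid(sum(w * hj for w, hj in zip(w_o, h)) + w_o[-1])
        return h, y

    def mse():
        return sum((t - forward(x)[1]) ** 2 for x, t in data) / len(data)

    def predict(x):
        return forward(x)[1]

    initial_error = mse()
    for _ in range(epochs):                  # Step 6: repeat (fixed budget here)
        for x, target in data:               # Step 2: read inputs and desired output
            h, y = forward(x)                # Step 3: forward through the layers
            err = target - y                 # Step 4: error
            # Step 5: work backward; the sigmoid's derivative is y * (1 - y)
            delta_o = err * y * (1 - y)
            delta_h = [delta_o * w_o[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            for j in range(n_hidden):
                w_o[j] += lr * delta_o * h[j]
                for i in range(n_in):
                    w_h[j][i] += lr * delta_h[j] * x[i]
                w_h[j][-1] += lr * delta_h[j]
            w_o[-1] += lr * delta_o
    return predict, initial_error, mse()


xor = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
predict, before, after = train_backprop(xor)
print(round(before, 3), round(after, 3))
print([round(predict(x), 2) for x, _ in xor])
```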
Illuminating the Black Box: Sensitivity Analysis on ANN
A common criticism of ANN: the lack of transparency/explainability
The black-box syndrome!
Answer: sensitivity analysis
Conducted on a trained ANN
The inputs are perturbed while the relative change in the output is measured/recorded
Results illustrate the relative importance of input variables
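A minimal sketch of the idea: nudge one input at a time up and down on a set of records, average the absolute change in the output, and normalize the results into relative importances. The stand-in model with fixed weights is hypothetical (any trained network's predict function could take its place), and this central-difference scheme is one simple variant, not necessarily the book's exact procedure.

```python
import math


def trained_model(x):
    """Stand-in for a trained ANN: a hypothetical fixed network, not fitted to data."""
    s = 2.0 * x[0] - 0.5 * x[1] + 0.05 * x[2]
    return 1.0 / (1.0 + math.exp(-s))


def sensitivity(model, records, delta=0.1):
    """Perturb one input at a time and record the average change in the output.

    Returns relative importances that sum to 1.
    """
    n_inputs = len(records[0])
    scores = []
    for i in range(n_inputs):
        total = 0.0
        for x in records:
            up = list(x)
            up[i] += delta
            down = list(x)
            down[i] -= delta
            total += abs(model(up) - model(down))
        scores.append(total / len(records))
    grand_total = sum(scores)
    return [score / grand_total for score in scores]


records = [[0.0, 0.0, 0.0], [0.5, 1.0, -1.0], [-0.5, 0.2, 0.3]]
importance = sensitivity(trained_model, records)
print([round(v, 3) for v in importance])  # input 1 dominates, input 3 barely matters
```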
Sensitivity Analysis on ANN Models
For a good example, see Application Case 6.3
Sensitivity analysis reveals the most important injury severity factors in traffic accidents
Application Case 6.3
Sensitivity Analysis Reveals Injury Severity Factors in Traffic Accidents
Questions for Discussion
How does sensitivity analysis shed light on the black box (i.e., neural networks)?
Why would someone choose to use a black-box tool like neural networks over theoretically sound, mostly transparent statistical tools like logistic regression?
In this case, how did NNs and sensitivity analysis help identify injury-severity factors in traffic accidents?
Support Vector Machines (SVM)
SVM are among the most popular machine-learning techniques.
SVM belong to the family of generalized linear models… (capable of representing non-linear relationships in a linear fashion).
SVM achieve a classification or regression decision based on the value of the linear combination of input features.
Because of their architectural similarities, SVM are also closely associated with ANN.
Support Vector Machines (SVM)
Goal of SVM: to generate mathematical functions that map input variables to desired outputs for classification or regression type prediction problems.
First, SVM uses nonlinear kernel functions to transform non-linear relationships among the variables into linearly separable feature spaces.
Then, the maximum-margin hyperplanes are constructed to optimally separate different classes from each other based on the training dataset.
SVM has a solid mathematical foundation!
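The second step, finding a maximum-margin hyperplane, can be sketched for the linear case (no kernel). The sketch swaps in Pegasos, a published stochastic subgradient method for the soft-margin SVM objective, rather than a quadratic-programming solver; the regularization strength lam, the epoch count, and the toy data are assumptions.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Soft-margin linear SVM trained with Pegasos-style stochastic subgradient steps.

    Labels must be +1 / -1. A constant 1.0 feature stands in for the bias
    (so the bias is regularized too, a simplification).
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(Xb)), len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            # Regularization step: a smaller ||w|| means a wider margin
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:  # inside the margin or misclassified: hinge loss > 0
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


X = [(2, 2), (3, 3), (2, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3), (-3, -2)]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X] == y)  # True
```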
Support Vector Machines (SVM)
A hyperplane is a geometric concept used to describe the separation surface between different classes of things.
In SVM, two parallel hyperplanes are constructed on each side of the separation space with the aim of maximizing the distance between them.
A kernel function in SVM uses the kernel trick (a method for using a linear classifier algorithm to solve a nonlinear problem)
The most commonly used kernel function is the radial basis function (RBF).
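The kernel trick can be checked numerically: the polynomial kernel (x · z)² equals an ordinary dot product after mapping each 2-D point into a 3-D feature space, yet it never builds that mapping. The RBF kernel is included with a hypothetical gamma = 0.5; it equals 1 for identical points and decays toward 0 as points move apart.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2-D point."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """Polynomial kernel (x . z)^2: same value as phi(x) . phi(z), no mapping needed."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


x, z = (1.0, 2.0), (3.0, -1.0)
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))
print(poly2_kernel(x, z), round(explicit, 9))  # 1.0 1.0
print(rbf_kernel(x, x), rbf_kernel(x, z))      # 1.0 for identical points, close to 0 here
```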
Support Vector Machines (SVM)
Many linear classifiers (hyperplanes) may separate the data
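To see why the choice among separating lines matters, the sketch below scores three hypothetical lines on made-up data by their geometric margin (the distance from the line to the closest point). All three separate the classes, but their margins differ; an SVM searches over all hyperplanes for the largest margin, whereas this example only compares three candidates.

```python
import math


def geometric_margin(w, b, X, y):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    Positive only if every point lies on its correct side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(yi * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x, yi in zip(X, y))


X = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
y = [-1, -1, -1, 1, 1, 1]
candidates = {
    "L1": ((1.0, 0.0), -3.0),   # vertical line x1 = 3
    "L2": ((1.0, 1.0), -5.0),   # diagonal line x1 + x2 = 5
    "L3": ((0.0, 1.0), -2.5),   # horizontal line x2 = 2.5
}
margins = {name: geometric_margin(w, b, X, y) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(best)  # L2
```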
Application Case 6.4
Managing Student Retention with Predictive Modeling
Questions for Discussion
Why is attrition one of the most important issues in higher education?
How can predictive analytics (ANN, SVM, and so forth) be used to better manage student retention?
What are the main challenges and potential solutions to the use of analytics in retention management?
Application Case 6.4
Managing Student Retention with Predictive Modeling
How Does an SVM Work?
Following a machine-learning process, an SVM learns from historical cases.
The Process of Building an SVM
1. Preprocess the data.
Scrub and transform the data.
2. Develop the model.
Select the kernel type (RBF is often a natural choice).
Determine the kernel parameters for the selected kernel type.
If the results are satisfactory, finalize the model; otherwise change the kernel type and/or kernel parameters to achieve the desired accuracy level.
3. Extract and deploy the model.
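The process can be sketched end to end on toy data: min-max normalization as the transform step, an RBF kernel, and a grid search with v-fold cross validation over the kernel width gamma and the regularization strength lam. The trainer is kernelized Pegasos without a bias term, swapped in for brevity instead of the solver a production SVM tool would use; the grid values, data, and iteration counts are assumptions.

```python
import math
import random


def minmax_normalize(X):
    """Transform step: rescale every feature to the range [0, 1]."""
    lows = [min(col) for col in zip(*X)]
    highs = [max(col) for col in zip(*X)]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(x, lows, highs)]
            for x in X]


def rbf(a, b, gamma):
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def train_rbf_svm(X, y, gamma, lam, iters=400, seed=0):
    """Kernelized Pegasos without a bias term; alpha[i] counts margin violations of X[i]."""
    rng = random.Random(seed)
    alpha = [0] * len(X)
    for t in range(1, iters + 1):
        i = rng.randrange(len(X))
        score = sum(a * yj * rbf(xj, X[i], gamma) for a, yj, xj in zip(alpha, y, X) if a)
        if y[i] * score / (lam * t) < 1:  # margin violated: put more weight on X[i]
            alpha[i] += 1

    def classify(x):
        score = sum(a * yj * rbf(xj, x, gamma) for a, yj, xj in zip(alpha, y, X) if a)
        return 1 if score >= 0 else -1

    return classify


def grid_search(X, y, gammas, lams, v=4):
    """Score every (gamma, lam) pair with v-fold cross validation; return the best."""
    best = None
    for gamma in gammas:
        for lam in lams:
            correct = 0
            for fold in range(v):
                train_idx = [i for i in range(len(X)) if i % v != fold]
                clf = train_rbf_svm([X[i] for i in train_idx], [y[i] for i in train_idx],
                                    gamma, lam)
                correct += sum(clf(X[i]) == y[i] for i in range(fold, len(X), v))
            accuracy = correct / len(X)
            if best is None or accuracy > best[0]:
                best = (accuracy, gamma, lam)
    return best


# Hypothetical data: class +1 on an inner circle, class -1 on an outer ring
rng = random.Random(7)
raw, labels = [], []
for k in range(32):
    angle = rng.uniform(0, 2 * math.pi)
    radius = 1.0 if k % 2 == 0 else 3.0
    raw.append([radius * math.cos(angle), radius * math.sin(angle)])
    labels.append(1 if k % 2 == 0 else -1)
X = minmax_normalize(raw)                           # 1. pre-process
accuracy, gamma, lam = grid_search(X, labels,       # 2. develop: RBF kernel + grid search
                                   gammas=[1, 10, 50], lams=[0.01, 0.1])
final_model = train_rbf_svm(X, labels, gamma, lam)  # 3. refit on all data before deploying
print(accuracy, gamma, lam)
```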
The Process of Building an SVM
SVM Applications
SVMs are the most widely used kernel-learning algorithms for a wide range of classification and regression problems
SVMs represent the state of the art by virtue of their excellent generalization performance, superior prediction power, ease of use, and rigorous theoretical foundation
Most comparative studies show their superiority in both regression and classification type prediction problems.
SVM versus ANN?
k-Nearest Neighbor Method (k-NN)
ANNs and SVMs require time-demanding, computationally intensive iterative derivations
k-NN is a simple and logical prediction method that produces very competitive results
k-NN is a prediction method for classification as well as regression-type problems (similar to ANN & SVM)
k-NN is a type of instance-based learning (or lazy learning) – most of the work takes place at the time of prediction (not at modeling)
k : the number of neighbors used
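A minimal k-NN sketch under these definitions: store the cases, and at prediction time sort them by Euclidean distance to the query, then take a majority vote (classification) or an average (regression) of the k nearest targets. Euclidean distance and the tie-breaking rule are assumptions.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3, task="classification"):
    """Lazy learner: nothing is fitted up front; all work happens at prediction time.

    train is a list of (features, target) pairs.
    """
    # Rank every stored case by its Euclidean distance to the query
    neighbors = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    targets = [target for _, target in neighbors]
    if task == "classification":
        # Majority vote; on a tie, Counter keeps first-seen (i.e., nearer) labels first
        return Counter(targets).most_common(1)[0][0]
    return sum(targets) / len(targets)  # regression: average of the k neighbors


cases = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B")]
print(knn_predict(cases, (1.5, 1.5), k=3))  # A

series = [((1,), 1.0), ((2,), 2.0), ((3,), 3.0), ((10,), 10.0)]
print(knn_predict(series, (2,), k=3, task="regression"))  # 2.0
```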
k-Nearest Neighbor Method (k-NN)
The answer depends on the value of k
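The dependence on k is easy to reproduce with five hypothetical neighbors of a query point: two of the three nearest are class B, but three of the five nearest are class A.

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical neighbors of the query (0, 0), at distances 1, 1.2, 1.5, 2 and 2.5
cases = [((1.0, 0.0), "B"), ((0.0, 1.2), "A"), ((-1.5, 0.0), "B"),
         ((0.0, -2.0), "A"), ((2.5, 0.0), "A")]
print(knn_classify(cases, (0, 0), k=3))  # B (2 of the 3 nearest)
print(knn_classify(cases, (0, 0), k=5))  # A (3 of the 5 nearest)
```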
The Process of k-NN Method
k-NN Model Parameter: Similarity Measure (the Distance Metric)
Numeric versus nominal values?
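For numeric attributes, the Minkowski family covers the usual choices (Manhattan, p = 1; Euclidean, p = 2). For records that mix numeric and nominal values, one established option is the heterogeneous Euclidean-overlap metric (HEOM), sketched here in simplified form without its missing-value rule: numeric differences are divided by the attribute's range, and nominal attributes contribute 0 when equal and 1 otherwise. The example records are hypothetical.

```python
def minkowski(a, b, p=2):
    """Minkowski distance: p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def heom(a, b, ranges):
    """Simplified heterogeneous Euclidean-overlap metric for mixed records.

    Numeric attributes: |x - y| scaled by that attribute's range.
    Nominal (string) attributes: 0 if the values match, else 1.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if isinstance(x, str):
            d = 0.0 if x == y else 1.0
        else:
            d = abs(x - y) / r if r else 0.0
        total += d * d
    return total ** 0.5


print(minkowski((0, 0), (3, 4)), minkowski((0, 0), (3, 4), p=1))  # 5.0 7.0
# Hypothetical customers: (age, favorite color); ages span a range of 20 years
print(heom((30, "red"), (40, "blue"), ranges=(20, None)))
```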
k-NN Model Parameter: Number of Neighbors (the value of k)
The best value depends on the data
Larger values reduce the effect of noise but also make the boundaries between classes less distinct
An “optimal” value can be found heuristically
Cross-validation is often used to determine the best value for k and the distance measure
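Choosing k by cross validation can be sketched as follows: for each candidate k, classify every held-out case using only the other folds, and keep the k with the best accuracy. The same loop could also range over distance measures; that extension, the fold count, and the toy data are assumptions. Odd candidate values of k avoid tied votes in two-class problems.

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def choose_k(data, candidate_ks, folds=5):
    """Return the k with the highest cross-validated accuracy, plus all scores."""
    scores = {}
    for k in candidate_ks:
        correct = 0
        for f in range(folds):
            test = data[f::folds]
            train = [row for i, row in enumerate(data) if i % folds != f]
            correct += sum(knn_classify(train, x, k) == label for x, label in test)
        scores[k] = correct / len(data)
    best_k = max(candidate_ks, key=lambda k: scores[k])  # ties go to the k listed first
    return best_k, scores


# Hypothetical, well-separated one-dimensional data
data = [((i * 0.1,), "low") for i in range(10)] + [((10 + i * 0.1,), "high") for i in range(10)]
best_k, scores = choose_k(data, candidate_ks=[1, 3, 5])
print(best_k, scores)
```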
Application Case 6.5
Efficient Image Recognition and Categorization with kNN
Questions for Discussion
Why is image recognition/classification a worthy but difficult problem?
How can k-NN be effectively used for image recognition/classification applications?
End of the Chapter
Questions, comments
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America.
[Figure: two interconnected biological neurons, with soma, axon, synapse, and dendrites labeled]
[Figure: a single neuron (PE) with inputs $x_1, x_2, \dots, x_n$, weights $w_1, w_2, \dots, w_n$, and outputs $Y_1, Y_2, \dots, Y_n$; summation $S = \sum_{i=1}^{n} X_i W_i$, transfer function $f(S)$]
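[Figure: neural network with one hidden layer: inputs $x_1, x_2, x_3$ pass through an input layer, a hidden layer, and an output layer of PEs to produce $Y_1$; each PE forms a weighted sum (S) and applies a transfer function (f)]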
[Figure: summation function for (a) a single neuron and (b) several neurons; PE: processing element (or neuron)]
(a) $Y = X_1 W_1 + X_2 W_2$
(b) $Y_1 = X_1 W_{11} + X_2 W_{21}$, $Y_2 = X_1 W_{12} + X_2 W_{22}$, $Y_3 = X_2 W_{23}$
[Figure: a processing element (PE) with inputs $X_1 = 3$, $X_2 = 1$, $X_3 = 2$ and weights $W_1 = 0.2$, $W_2 = 0.4$, $W_3 = 0.1$]
Summation function: $Y = 3(0.2) + 1(0.4) + 2(0.1) = 1.2$
Transfer function: $Y_T = 1/(1 + e^{-1.2}) = 0.77$
[Figure: feed-forward network with an input layer, a hidden layer, and an output layer]
[Figure: network with socio-demographic, religious, financial, and other inputs and the output Voted “yes” or “no” to legalizing gaming; predicted vs. actual outputs are compared]
[Figure: MLP structure for the box-office prediction problem]
Input layer (27 PEs):
MPAA Rating (5): G, PG, PG13, R, NR
Competition (3): High, Medium, Low
Star Value (3): High, Medium, Low
Genre (10): Sci-Fi, Action, ...
Technical Effects (3): High, Medium, Low
Sequel (2): Yes, No
Number of Screens: positive integer
Hidden layer I (18 PEs)
Hidden layer II (16 PEs)
Output layer (9 PEs), box-office (BO) classes:
Class 1 (Flop): BO < 1M
Class 2: 1M < BO < 10M
Class 3: 10M < BO < 20M
Class 4: 20M < BO < 40M
Class 5: 40M < BO < 65M
Class 6: 65M < BO < 100M
Class 7: 100M < BO < 150M
Class 8: 150M < BO < 200M
Class 9 (Blockbuster): BO > 200M
[Figure: ANN model learning loop: compute output; is desired output achieved? Yes: stop learning. No: adjust weights and repeat]
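[Figure: backpropagation of error for a single neuron: inputs $x_1, \dots, x_n$, weights $w_1, \dots, w_n$, summation $S = \sum_{i=1}^{n} X_i W_i$, output $Y_i = f(S)$, and error $\alpha(Z_i - Y_i)$ between the desired output $Z_i$ and the actual output $Y_i$ fed back to the weights]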
[Figure: systematically perturbed inputs → trained ANN (“the black-box”) → observed change in outputs]
[Figure: maximum-margin hyperplane separating two classes in the $X_1$–$X_2$ plane]
[Figure: several separating lines $L_1$, $L_2$, $L_3$ in the $X_1$–$X_2$ plane, with the margin marked]
[Figure: The Process of Building an SVM]
Pre-process the data (training data → pre-processed data)
Scrub the data: identify and handle missing, incorrect, and noisy data
Transform the data: numericize, normalize, and standardize the data
Develop the model (experimentation with training/testing → validated SVM model)
Select the kernel type: choose from RBF, sigmoid, or polynomial kernel types
Determine the kernel values: use v-fold cross validation or employ grid search
Deploy the model (prediction model)
Extract the model coefficients
Code the trained model into the decision support system
Monitor and maintain the model
[Figure: classifying a new case $(X_i, Y_i)$ in the X–Y plane with k = 3 versus k = 5]
[Figure: The Process of k-NN Method: historic data is split into a training set and a validation set for parameter setting (distance measure, value of “k”); new data is then classified (or forecast) using the k most similar cases]