
Chapter 6:

Techniques for Predictive Modeling

Business Intelligence and Analytics: Systems for Decision Support

(10th Edition)


Copyright © 2014 Pearson Education, Inc.


Learning Objectives

Understand the concept and definitions of artificial neural networks (ANN)

Learn the different types of ANN architectures

Know how learning happens in ANN

Become familiar with ANN applications

Understand sensitivity analysis in ANN

Understand the concept and structure of support vector machines (SVM)


Learn the advantages and disadvantages of SVM compared to ANN

Understand the concept and formulation of the k-nearest neighbor algorithm (kNN)

Learn the process of applying kNN

Learn the advantages and disadvantages of kNN compared to ANN and SVM


Opening Vignette…

Predictive Modeling Helps Better Understand and Manage Complex Medical Procedures

Situation

Problem

Solution

Results

Answer & discuss the case questions.


Questions for the Opening Vignette

Why is it important to study medical procedures? What is the value in predicting outcomes?

What factors do you think are the most important in better understanding and managing healthcare?

What would be the impact of predictive modeling on healthcare and medicine? Can predictive modeling replace medical or managerial personnel?

What were the outcomes of the study? Who can use these results? How can they be implemented?

Search the Internet to locate two additional cases in managing complex medical procedures.


Opening Vignette – A Process Map for Training and Testing Four Predictive Models


Opening Vignette – The Comparison of Four Models


Neural Network Concepts

Neural networks (NN): a brain metaphor for information processing

Neural computing

Artificial neural network (ANN)

Many uses for ANN:

pattern recognition, forecasting, prediction, and classification

Many application areas

finance, marketing, manufacturing, operations, information systems, and so on


Biological Neural Networks

Two interconnected brain cells (neurons)


Processing Information in ANN

A single neuron (processing element – PE) with inputs and outputs


Biology Analogy


Application Case 6.1

Neural Networks Are Helping to Save Lives in the Mining Industry

Questions for Discussion

How did neural networks help save lives in the mining industry?

What were the challenges, the proposed solution, and the obtained results?


Elements of ANN

Processing element (PE)

Network architecture

Hidden layers

Parallel processing

Network information processing

Inputs

Outputs

Connection weights

Summation function


Elements of ANN

Neural Network with One Hidden Layer


Elements of ANN

Summation Function for a Single Neuron (a) and Several Neurons (b)


Elements of ANN

Transformation (Transfer) Function

Linear function

Sigmoid (logistic activation) function, output range [0, 1]

Hyperbolic tangent function, output range [-1, 1]

Threshold value?
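
The summation and transfer functions listed above can be sketched in a few lines. This is a minimal illustration, not the textbook's code: the function names are assumptions, and the input and weight values are the ones used in the deck's single-PE example (X = 3, 1, 2; W = 0.2, 0.4, 0.1).

```python
import math

def weighted_sum(inputs, weights):
    """Summation function: S = sum of X_i * W_i over all inputs."""
    return sum(x * w for x, w in zip(inputs, weights))

def linear(s):
    """Linear transfer function: passes the weighted sum through unchanged."""
    return s

def sigmoid(s):
    """Sigmoid transfer function: squashes S into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))

def tanh_transfer(s):
    """Hyperbolic tangent transfer function: squashes S into (-1, 1)."""
    return math.tanh(s)

inputs = [3, 1, 2]           # X1, X2, X3 from the deck's example
weights = [0.2, 0.4, 0.1]    # W1, W2, W3 from the deck's example

s = weighted_sum(inputs, weights)
y_t = sigmoid(s)
print(round(s, 2), round(y_t, 2))  # 1.2 0.77
```

The transfer function is what keeps a PE's output in a bounded range no matter how large the weighted sum grows.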


Neural Network Architectures

Architecture of a neural network is driven by the task it is intended to address

Classification, regression, clustering, general optimization, association, ...

Most popular architecture: Feedforward, multi-layered perceptron with backpropagation learning algorithm

Used for both classification and regression type problems

Others – Recurrent, self-organizing feature maps, Hopfield networks, …


Neural Network Architectures: Feed-Forward Neural Networks

Feed-forward MLP with 1 Hidden Layer


Neural Network Architectures: Recurrent Neural Networks


Other Popular ANN Paradigms: Self-Organizing Maps (SOM)

First introduced by the Finnish Professor Teuvo Kohonen

Applies to clustering type problems


Other Popular ANN Paradigms: Hopfield Networks

First introduced by John Hopfield

Highly interconnected neurons

Applies to solving complex computational problems (e.g., optimization problems)


Application Case 6.2

Predictive Modeling is Powering the Power Generators

Questions for Discussion

What are the key environmental concerns in the electric power industry?

What are the main application areas for predictive modeling in the electric power industry?

How was predictive modeling used to address a variety of problems in the electric power industry?


Development Process of an ANN


An MLP ANN Structure for the Box-Office Prediction Problem


Testing a Trained ANN Model

Data is split into three parts

Training (~60%)

Validation (~20%)

Testing (~20%)

k-fold cross validation

Less bias

Time consuming
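
As a rough sketch of the two evaluation schemes above, the following splits a dataset roughly 60/20/20 and then builds k-fold index sets. The function names, the fixed random seed, and the toy data are assumptions made for illustration only.

```python
import random

def three_way_split(records, train=0.6, valid=0.2, seed=42):
    """Shuffle the records, then cut them into training, validation, and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * valid)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs so that every record is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, test_idx

data = list(range(100))  # stand-in for 100 records
train_set, valid_set, test_set = three_way_split(data)
print(len(train_set), len(valid_set), len(test_set))  # 60 20 20

folds = list(k_fold_indices(len(data), k=5))
print(len(folds), len(folds[0][0]), len(folds[0][1]))  # 5 80 20
```

k-fold cross validation trains k separate models, which is why it is less biased but more time consuming than a single split.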


ANN Learning Process: A Supervised Learning Process

Three-step process:

1. Compute temporary outputs.

2. Compare outputs with desired targets.

3. Adjust the weights and repeat the process.


Backpropagation Learning

Backpropagation of Error for a Single Neuron


Backpropagation Learning

The learning algorithm procedure

1. Initialize weights with random values and set other network parameters.

2. Read in the inputs and the desired outputs.

3. Compute the actual output (by working forward through the layers).

4. Compute the error (difference between the actual and desired output).

5. Change the weights by working backward through the hidden layers.

6. Repeat steps 2-5 until weights stabilize.
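
The procedure above can be illustrated for the simplest case, a single sigmoid neuron, where "working backward" reduces to one gradient-descent weight update. This is a sketch under stated assumptions, not the textbook's implementation: the bias term, the learning rate, the fixed epoch count (standing in for "until weights stabilize"), and the logical-OR training data are all chosen here for illustration.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_neuron(samples, learning_rate=0.5, epochs=5000, seed=0):
    """Train one sigmoid neuron with gradient descent on squared error."""
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    # Initialize weights (and an assumed bias term) with random values.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):  # fixed count instead of a convergence check
        # Read in the inputs and the desired outputs.
        for inputs, target in samples:
            # Compute the actual output by working forward.
            s = sum(x * w for x, w in zip(inputs, weights)) + bias
            y = sigmoid(s)
            # Compute the error (desired minus actual).
            error = target - y
            # Change the weights; y * (1 - y) is the sigmoid's derivative.
            delta = learning_rate * error * y * (1.0 - y)
            weights = [w + delta * x for w, x in zip(weights, inputs)]
            bias += delta
    return weights, bias

def predict(weights, bias, inputs):
    return sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)

# Hypothetical training set: logical OR of two binary inputs.
or_samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train_neuron(or_samples)
for inputs, target in or_samples:
    print(inputs, target, round(predict(weights, bias, inputs), 2))
```

In a multi-layer network the same error signal is passed back layer by layer, so each hidden weight receives its own share of the correction.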


Illuminating the Black Box: Sensitivity Analysis on ANN

A common criticism of ANN: the lack of transparency/explainability

The black-box syndrome!

Answer: sensitivity analysis

Conducted on a trained ANN

The inputs are perturbed while the relative change in the output is measured/recorded

Results illustrate the relative importance of input variables
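
One simple way to carry out the perturbation idea above is to nudge one input at a time and compare the size of the output changes. The sketch below is only illustrative: `trained_model` is a hypothetical stand-in for a trained ANN with made-up weights, and the step size and normalization are assumptions.

```python
import math

def trained_model(x):
    """Hypothetical stand-in for a trained ANN: fixed weights, sigmoid output."""
    weights = [2.0, 0.5, 0.0]  # made-up values; the third input is ignored
    s = sum(xi * wi for xi, wi in zip(x, weights))
    return 1.0 / (1.0 + math.exp(-s))

def sensitivity(model, baseline, step=0.1):
    """Perturb one input at a time and record the absolute change in output."""
    base_out = model(baseline)
    changes = []
    for i in range(len(baseline)):
        perturbed = list(baseline)
        perturbed[i] += step
        changes.append(abs(model(perturbed) - base_out))
    total = sum(changes)
    # Normalize so the scores sum to 1 and read as relative importance.
    return [c / total if total else 0.0 for c in changes]

scores = sensitivity(trained_model, baseline=[0.0, 0.0, 0.0])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 1, 2]
```

The model is treated strictly as a black box here: only its inputs and outputs are used, never its internal weights.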


Sensitivity Analysis on ANN Models

For a good example, see Application Case 6.3

Sensitivity analysis reveals the most important injury severity factors in traffic accidents


Application Case 6.3

Sensitivity Analysis Reveals Injury Severity Factors in Traffic Accidents

Questions for Discussion

How does sensitivity analysis shed light on the black box (i.e., neural networks)?

Why would someone choose to use a black-box tool like neural networks over theoretically sound, mostly transparent statistical tools like logistic regression?

In this case, how did NNs and sensitivity analysis help identify injury-severity factors in traffic accidents?


Support Vector Machines (SVM)

SVM are among the most popular machine-learning techniques.

SVM belong to the family of generalized linear models (capable of representing non-linear relationships in a linear fashion).

SVM achieve a classification or regression decision based on the value of the linear combination of input features.

Because of their architectural similarities, SVM are also closely associated with ANN.


Support Vector Machines (SVM)

Goal of SVM: to generate mathematical functions that map input variables to desired outputs for classification or regression type prediction problems.

First, SVM uses nonlinear kernel functions to transform non-linear relationships among the variables into linearly separable feature spaces.

Then, the maximum-margin hyperplanes are constructed to optimally separate different classes from each other based on the training dataset.

SVM has a solid mathematical foundation!
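
For reference, the maximum-margin idea is usually written as the following optimization problem for the linearly separable (hard-margin) case. The symbols are assumptions introduced here, not notation from the slides: $\mathbf{w}$ is the normal vector of the hyperplane, $b$ its offset, $\mathbf{x}_i$ a training case, and $y_i \in \{-1, +1\}$ its class label.

```latex
% Separating hyperplane
\mathbf{w} \cdot \mathbf{x} + b = 0

% Hard-margin SVM: maximize the margin 2 / \|\mathbf{w}\|,
% which is equivalent to minimizing \|\mathbf{w}\|^2 / 2
\min_{\mathbf{w},\, b} \ \frac{1}{2} \|\mathbf{w}\|^{2}
\quad \text{subject to} \quad
y_i \, (\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, n
```

The training cases that satisfy the constraint with equality lie on the margin boundaries and are the support vectors.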


Support Vector Machines (SVM)

A hyperplane is a geometric concept used to describe the separation surface between different classes of things.

In SVM, two parallel hyperplanes are constructed on each side of the separation space with the aim of maximizing the distance between them.

A kernel function in SVM uses the kernel trick (a method for using a linear classifier algorithm to solve a nonlinear problem).

The most commonly used kernel function is the radial basis function (RBF).
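
To make the RBF kernel concrete, here is a minimal sketch of its usual form, K(x, z) = exp(-gamma * ||x - z||^2). The parameter name `gamma` and its default value are assumptions for illustration; a linear kernel is included only for comparison.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: similarity decays with squared Euclidean distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def linear_kernel(x, z):
    """Linear kernel: the plain dot product of the two vectors."""
    return sum(a * b for a, b in zip(x, z))

print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # 1.0 (identical points)
near = rbf_kernel([0.0, 0.0], [1.0, 0.0])
far = rbf_kernel([0.0, 0.0], [3.0, 4.0])
print(near > far)  # True: closer points are more similar
```

Because the kernel returns a similarity directly, the classifier never has to compute coordinates in the transformed feature space; that is the kernel trick.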


Support Vector Machines (SVM)

Many linear classifiers (hyperplanes) may separate the data


Application Case 6.4

Managing Student Retention with Predictive Modeling

Questions for Discussion

Why is attrition one of the most important issues in higher education?

How can predictive analytics (ANN, SVM, and so forth) be used to better manage student retention?

What are the main challenges and potential solutions to the use of analytics in retention management?


How Does an SVM Work?

Following a machine-learning process, an SVM learns from the historic cases.

The Process of Building SVM

1. Preprocess the data

Scrub and transform the data.

2. Develop the model.

Select the kernel type (RBF is often a natural choice).

Determine the kernel parameters for the selected kernel type.

If the results are satisfactory, finalize the model; otherwise change the kernel type and/or kernel parameters to achieve the desired accuracy level.

3. Extract and deploy the model.


The Process of Building an SVM


SVM Applications

SVMs are the most widely used kernel-learning algorithms for a wide range of classification and regression problems.

SVMs represent the state of the art by virtue of their excellent generalization performance, superior prediction power, ease of use, and rigorous theoretical foundation.

Most comparative studies show their superiority in both regression and classification type prediction problems.

SVM versus ANN?


k-Nearest Neighbor Method (k-NN)

ANNs and SVMs: time-demanding, computationally intensive iterative derivations

k-NN is a simple and logical prediction method that produces very competitive results

k-NN is a prediction method for classification as well as regression-type problems (similar to ANN & SVM)

k-NN is a type of instance-based learning (or lazy learning) – most of the work takes place at the time of prediction (not at modeling)

k : the number of neighbors used


k-Nearest Neighbor Method (k-NN)

The answer depends on the value of k
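
A small sketch shows how the predicted class can flip as k changes. The data points, the class labels ("square" and "circle"), and the function names are hypothetical, chosen so that the three nearest neighbors favor one class and the five nearest favor the other.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(training, query, k):
    """Return the majority label among the k training cases nearest to query."""
    nearest = sorted(training, key=lambda rec: euclidean(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled cases, listed by increasing distance from the origin.
training = [
    ((1.0, 0.0), "square"),
    ((0.0, 1.5), "square"),
    ((2.0, 0.0), "circle"),
    ((0.0, -2.5), "circle"),
    ((-3.0, 0.0), "circle"),
    ((4.0, 3.0), "square"),
]

query = (0.0, 0.0)
print(knn_classify(training, query, k=3))  # square (2 squares vs. 1 circle)
print(knn_classify(training, query, k=5))  # circle (3 circles vs. 2 squares)
```

Note that no model is built in advance: all of the work (distance calculation, sorting, voting) happens when a prediction is requested.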


The Process of k-NN Method


k-NN Model Parameter

Similarity Measure: The Distance Metric

Numeric versus nominal values?
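
The slide does not name specific metrics, so the following is only a sketch of common choices: Euclidean and Manhattan distance for numeric attributes, a simple mismatch count for nominal attributes, and min-max scaling so that no numeric attribute dominates because of its units. All function names and sample values are assumptions.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute differences between two numeric vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def nominal_mismatch(a, b):
    """Number of nominal attributes whose categories differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def min_max_scale(column):
    """Rescale a numeric column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in column]

print(euclidean([0, 0], [3, 4]))                           # 5.0
print(manhattan([0, 0], [3, 4]))                           # 7
print(nominal_mismatch(["red", "suv"], ["red", "sedan"]))  # 1
print(min_max_scale([10, 20, 30]))                         # [0.0, 0.5, 1.0]
```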


k-NN Model Parameter

Number of Neighbors (the value of k)

The best value depends on the data

Larger values reduce the effect of noise but also make boundaries between classes less distinct

An “optimal” value can be found heuristically

Cross validation is often used to determine the best value for k and the distance measure
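
As an illustration of choosing k by cross validation, the sketch below uses leave-one-out validation (each case is held out once) on a tiny made-up dataset that contains one mislabeled, noisy point. The dataset, the candidate values of k, and the function names are assumptions; ties between candidates go to the smaller k.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(training, query, k):
    nearest = sorted(training, key=lambda rec: euclidean(rec[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out accuracy: predict each case from all the other cases."""
    hits = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, point, k) == label:
            hits += 1
    return hits / len(data)

def best_k(data, candidates):
    """Return the candidate k with the highest accuracy (first one wins ties)."""
    scores = {k: loo_accuracy(data, k) for k in candidates}
    return max(candidates, key=lambda k: scores[k]), scores

# Two hypothetical clusters plus one noisy point labeled "b" inside cluster "a".
data = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"), ((1.0, 1.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"), ((6.0, 6.0), "b"),
    ((0.5, 0.5), "b"),
]

chosen, scores = best_k(data, candidates=[1, 3, 5])
print(chosen)  # 3
```

With k = 1 the noisy point misleads every case in cluster "a"; with k = 3 its single vote is outnumbered, which is the noise-reduction effect described above.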


Application Case 6.5

Efficient Image Recognition and Categorization with kNN

Questions for Discussion

Why is image recognition/classification a worthy but difficult problem?

How can k-NN be effectively used for image recognition/classification applications?


End of the Chapter

Questions, comments


All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America.


Figure labels (two interconnected biological neurons): soma, axon, synapse, dendrites.

Figure labels (a single neuron, or PE, with inputs, weights, and outputs): inputs x_1, x_2, ..., x_n; weights w_1, w_2, ..., w_n; outputs Y, Y_1, Y_2, ..., Y_n.

Summation: S = \sum_{i=1}^{n} X_i W_i

Transfer function: f(S)

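Figure labels (neural network with one hidden layer): inputs x_1, x_2, x_3; processing elements (PE) in the input layer, hidden layer, and output layer; output Y_1. Each PE applies a weighted sum (S) followed by a transfer function (f).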

Figure labels (summation function):

(a) Single neuron: Y = X_1 W_1 + X_2 W_2

(b) Multiple neurons:

Y_1 = X_1 W_{11} + X_2 W_{21}

Y_2 = X_1 W_{12} + X_2 W_{22}

Y_3 = X_2 W_{23}

PE: Processing Element (or neuron)

Figure labels (worked example for one processing element): X_1 = 3, X_2 = 1, X_3 = 2; W_1 = 0.2, W_2 = 0.4, W_3 = 0.1.

Summation function: Y = 3(0.2) + 1(0.4) + 2(0.1) = 1.2

Transfer function: Y_T = 1/(1 + e^{-1.2}) = 0.77

Figure labels (feed-forward MLP with one hidden layer): input layer (socio-demographic, religious, financial, other), hidden layer, output layer (voted “yes” or “no” to legalizing gaming); predicted vs. actual.

Figure labels (network diagrams): Input 1, Input 2, Input 3, ...; Input; Output; numbered nodes 1 to 7, ... and 1 to 9, ...

Figure labels (MLP ANN structure for the box-office prediction problem):

Input layer (27 PEs):
MPAA Rating (5): G, PG, PG13, R, NR
Competition (3): High, Medium, Low
Star Value (3): High, Medium, Low
Genre (10): Sci-Fi, Action, ...
Technical Effects (3): High, Medium, Low
Sequel (2): Yes, No
Number of Screens: positive integer

Hidden layer I (18 PEs)

Hidden layer II (16 PEs)

Output layer (9 PEs), classes by box-office receipts (BO):
Class 1, FLOP: BO < 1M
Class 2: 1M < BO < 10M
Class 3: 10M < BO < 20M
Class 4: 20M < BO < 40M
Class 5: 40M < BO < 65M
Class 6: 65M < BO < 100M
Class 7: 100M < BO < 150M
Class 8: 150M < BO < 200M
Class 9, BLOCKBUSTER: BO > 200M

Figure labels (supervised learning flowchart): ANN model; compute output; is desired output achieved? Yes: stop learning. No: adjust weights.

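Figure labels (backpropagation of error for a single neuron, or PE): inputs x_1, x_2, ..., x_n; weights w_1, w_2, ..., w_n; output Y_i.

Summation: S = \sum_{i=1}^{n} X_i W_i

Transfer function: Y = f(S)

Error: a(Z_i - Y_i)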

Figure labels (sensitivity analysis): systematically perturbed inputs; trained ANN (“the black-box”); observed change in outputs.

Figure labels (SVM, axes X_1 and X_2): maximum-margin hyperplane; candidate separating lines L_1, L_2, L_3; margin.

Figure labels (the process of building an SVM):

Pre-process the data
Scrub the data: “Identify and handle missing, incorrect, and noisy”
Transform the data: “Numerisize, normalize and standardize the data”

Develop the model
Select the kernel type: “Choose from RBF, Sigmoid or Polynomial kernel types”
Determine the kernel values: “Use v-fold cross validation or employ ‘grid-search’”

Deploy the model
Extract the model coefficients
Code the trained model into the decision support system
Monitor and maintain the model

Other labels: training data; pre-processed data; validated SVM model; prediction model; experimentation (“Training/Testing”).

Figure labels (k-NN example, axes X and Y): point (X_i, Y_i); neighborhoods for k = 3 and k = 5.

Figure labels (the process of the k-NN method): historic data (training set, validation set); new data; parameter setting (distance measure, value of “k”); predicting: classify (or forecast) new cases using k number of most similar cases.