Why IT Fumbles Analytics


Business Intelligence, Analytics, and Data Science: A Managerial Perspective

Fourth Edition

Chapter 5

Predictive Analytics II: Text, Web, and Social Media Analytics …

Copyright © 2018, 2014, 2011 Pearson Education, Inc. All Rights Reserved


Learning Objectives (1 of 2)

5.1 Describe text mining and understand the need for text mining

5.2 Differentiate among text analytics, text mining, and data mining

5.3 Understand the different application areas for text mining

5.4 Know the process of carrying out a text mining project

5.5 Appreciate the different methods to introduce structure to text-based data


Learning Objectives (2 of 2)

5.6 Describe sentiment analysis

5.7 Develop familiarity with popular applications of sentiment analysis

5.8 Learn the common methods for sentiment analysis

5.9 Become familiar with speech analytics as it relates to sentiment analysis


OPENING VIGNETTE Machine versus Men on Jeopardy!: The Story of Watson (1 of 3)

IBM Watson going head-to-head with the best of the best in Jeopardy!

OPENING VIGNETTE Machine versus Men on Jeopardy!: The Story of Watson (2 of 3)

IBM Watson – How does it do it?

OPENING VIGNETTE Machine versus Men on Jeopardy!: The Story of Watson (3 of 3)

Discussion Questions for the Opening Vignette

What is Watson? What is special about it?

What technologies were used in building Watson (both hardware and software)?

What are the innovative characteristics of DeepQA architecture that made Watson superior?

Why did IBM spend all that time and money to build Watson? Where is the return on investment (ROI)?


Text Analytics and Text Mining

Text Analytics versus Text Mining

Text Analytics =

Information Retrieval +

Information Extraction +

Data Mining +

Web Mining

or simply

Text Analytics = Information Retrieval + Text Mining


Text Analytics and Text Mining

FIGURE 5.2 Text Analytics, Related Application Areas, and Enabling Disciplines

Text Mining Concepts

85-90 percent of all corporate data is in some kind of unstructured form (e.g., text)

Unstructured corporate data is doubling in size every 18 months

Tapping into these information sources is not optional; it is a necessity for staying competitive

Answer: text mining

A semi-automated process of extracting knowledge from unstructured data sources

a.k.a. text data mining or knowledge discovery in textual databases


Data Mining versus Text Mining

Both seek novel and useful patterns

Both are semi-automated processes

Difference is the nature of the data:

Structured versus unstructured data

Structured data: in databases

Unstructured data: Word documents, PDF files, text excerpts, XML files, and so on

To perform text mining: first impose structure on the data, then mine the structured data


Text Mining Concepts

Benefits of text mining are obvious especially in text-rich data environments

e.g., law (court orders), academic research (research articles), finance (quarterly reports), medicine (discharge summaries), biology (molecular interactions), technology (patent files), marketing (customer comments), etc.

Electronic communication records (e.g., e-mail)

Spam filtering

E-mail prioritization and categorization

Automatic response generation


Text Mining Application Areas

Information extraction

Topic tracking

Summarization

Categorization

Clustering

Concept linking

Question answering


Text Mining Terminology

Unstructured or semistructured data

Corpus (and corpora)

Terms

Concepts

Stemming

Stop words (and include words)

Synonyms (and polysemes)

Tokenizing
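Three of the terms above (tokenizing, stop words, stemming) can be illustrated with a minimal Python sketch. The stop-word list and the suffix rules here are hypothetical and far cruder than a real stemmer; they only show where each step sits in the pipeline.

```python
import re

# Hypothetical stop-word list and suffix rules, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "to", "in"}
SUFFIXES = ("ing", "ed", "es", "s")

def tokenize(text):
    """Split raw text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(token):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, drop stop words, then stem what remains."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]

terms = preprocess("The analysts are modeling the customer comments")
```

The resulting list of terms is what would feed a term dictionary or a term-by-document matrix.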


Text Mining Terminology

Term dictionary

Word frequency

Part-of-speech tagging

Morphology

Term-by-document matrix

Occurrence matrix

Singular value decomposition

Latent semantic indexing


Application Case 5.1 Insurance Group Strengthens Risk Management with Text Mining Solution

Questions for Discussion

How can text analytics and mining be used to keep up with changing business needs of insurance companies?

What were the challenges, the proposed solution, and the obtained results?

Can you think of other uses of text analytics and text mining for insurance companies?


Natural Language Processing (NLP)

Structuring a collection of text

Old approach: bag-of-words

New approach: natural language processing

NLP is …

a very important concept in text mining

a subfield of artificial intelligence and computational linguistics

the study of "understanding" natural human language

Syntax versus semantics-based text mining


Natural Language Processing (NLP)

What is “Understanding”?

Humans understand; what about computers?

Natural language is vague, context driven

True understanding requires extensive knowledge of a topic

Can (or will) computers ever understand natural language as accurately as we do?


Natural Language Processing (NLP)

Challenges in NLP

Part-of-speech tagging

Text segmentation

Word sense disambiguation

Syntax ambiguity

Imperfect or irregular input

Speech acts

Dream of AI community

to have algorithms that are capable of automatically reading and obtaining knowledge from text


Natural Language Processing (NLP)

WordNet

A laboriously hand-coded database of English words, their definitions, sets of synonyms, and various semantic relations between synonym sets

A major resource for NLP

Needs automation to be completed

Sentiment Analysis

A technique used to detect favorable and unfavorable opinions toward specific products and services

SentiWordNet


Application Case 5.2 AMC Networks Is Using Analytics to Capture New Viewers, Predict Ratings, and Add Value for Advertisers in a Multichannel World (1 of 2)

A Web-Based Dashboard Used by AMC Networks [Source: AMC Networks]

Application Case 5.2 AMC Networks Is Using Analytics to Capture New Viewers, Predict Ratings, and Add Value for Advertisers in a Multichannel World (2 of 2)

Questions for Discussion

What are the common challenges broadcasting companies are facing nowadays? How can analytics help to alleviate these challenges?

How did AMC leverage analytics to enhance their business performance?

What were the types of text analytics and text mining solutions developed by AMC Networks? Can you think of other potential uses of text mining applications in the broadcasting industry?


NLP Task Categories

Question answering

Automatic summarization

Natural language generation & understanding

Machine translation

Foreign language reading & writing

Speech recognition

Text proofing

Optical character recognition


Text Mining Applications

Marketing applications

Enables better CRM

Security applications

ECHELON, OASIS

Deception detection (…)

Medicine and biology

Literature-based gene identification (…)

Academic applications

Research stream analysis


Application Case 5.3 Mining for Lies (1 of 4)

Deception detection

A difficult problem

If detection is limited to only text, then the problem is even more difficult

The study

analyzed text-based testimonies of persons of interest at military bases

used only text-based features (cues)

Application Case 5.3 Mining for Lies (2 of 4)

FIGURE 5.3 Text-Based Deception-Detection Process


Application Case 5.3 Mining for Lies (3 of 4)

Table 5.1 Categories and Examples of Linguistic Features Used in Deception Detection


Application Case 5.3 Mining for Lies (4 of 4)

371 usable statements are generated

31 features are used

Different feature selection methods used

10-fold cross validation is used

Results (overall % accuracy)

Logistic regression 67.28

Decision trees 71.60

Neural networks 73.46
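The slide for Application Case 5.3 (4 of 4) mentions 10-fold cross validation on 371 statements. As a rough sketch of what that split looks like, the helper below (a hypothetical name, not from the study) partitions item indices into ten folds and yields one train/test pair per fold. It does not shuffle, which a real experiment normally would.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists for k roughly equal folds."""
    indices = list(range(n_items))
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_items, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(371, k=10))
fold_sizes = [len(test) for _, test in folds]
```

Each statement lands in exactly one test fold, so every model is scored on data it was not trained on, and the reported accuracy is the average over the ten folds.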

Text Mining Applications (Gene/Protein Interaction Identification)


Application Case 5.4 Bringing the Customer into the Quality Equation: Lenovo Uses Analytics to Rethink Its Redesign

Questions for Discussion

How did Lenovo use text analytics and text mining to improve quality and design of their products and ultimately improve customer satisfaction?

What were the challenges, the proposed solution, and the obtained results?


Text Mining Process

A Context Diagram for Text Mining Process


Text Mining Process

FIGURE 5.6 The Three-Step/Task Text Mining Process


Text Mining Process

Step 1: Establish the corpus

Collect all relevant unstructured data (e.g., textual documents, XML files, e-mails, Web pages, short notes, voice recordings…)

Digitize, standardize the collection (e.g., all in ASCII text files)

Place the collection in a common place (e.g., in a flat file, or in a directory as separate files)


Text Mining Process

Step 2: Create the Term–by–Document Matrix


Text Mining Process

Step 2: Create the Term–by–Document Matrix (TDM) (Cont.)

Should all terms be included?

Stop words, include words

Synonyms, homonyms

Stemming

What is the best representation of the indices (values in cells)?

Raw counts; binary frequencies; log frequencies;

Inverse document frequency
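The candidate cell values listed above can be compared on a toy corpus. This is a minimal sketch: the three documents are made up, and the log and inverse-document-frequency formulas are one common set of definitions, assumed here rather than taken from the slides.

```python
import math
from collections import Counter

# Hypothetical three-document corpus, already tokenized.
docs = [
    ["risk", "project", "risk"],
    ["project", "software"],
    ["project", "software", "software", "sap"],
]

# Raw counts: one row per document, one column per term.
terms = sorted({t for doc in docs for t in doc})
raw = [[Counter(doc)[t] for t in terms] for doc in docs]

# Binary frequencies: 1 if the term occurs at all, else 0.
binary = [[1 if c > 0 else 0 for c in row] for row in raw]

# Log frequencies: 1 + log(count) for nonzero counts, dampening repeats.
log_freq = [[1 + math.log(c) if c > 0 else 0.0 for c in row] for row in raw]

# Inverse document frequency: log(N / number of documents with the term).
n_docs = len(docs)
doc_freq = [sum(row[j] > 0 for row in raw) for j in range(len(terms))]
idf = [math.log(n_docs / df) for df in doc_freq]
tf_idf = [[c * w for c, w in zip(row, idf)] for row in raw]
```

In this toy corpus "project" occurs in every document, so its inverse document frequency is zero and it contributes nothing to the weighted matrix, while the rarer terms "risk" and "sap" get the largest weights.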


Text Mining Process

Step 2: Create the Term–by–Document Matrix (TDM) (Cont.)

TDM is a sparse matrix. How can we reduce the dimensionality of the TDM?

Manual - a domain expert goes through it

Eliminate terms with very few occurrences in very few documents (?)

Transform the matrix using singular value decomposition (SVD)

SVD is similar to principal component analysis
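To give a feel for what SVD extracts from a TDM, the sketch below approximates only the first (largest) singular dimension using power iteration. Power iteration is swapped in here to keep the example dependency-free; it is not the method the slides describe, and the matrix and function name are hypothetical. A real project would call a library SVD routine and keep the top few dimensions.

```python
import math

def top_singular_triplet(matrix, iterations=200):
    """Approximate the largest singular value and its vectors by power iteration."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        # u = A v, then w = A^T u: one multiplication by A^T A.
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v

# Hypothetical counts (rows: documents, columns: terms).
tdm = [
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]
sigma, doc_weights, term_weights = top_singular_triplet(tdm)
```

Here the first two terms co-occur across the first two documents, so the leading dimension weights them equally and ignores the third term: correlated terms collapse into a single "concept" axis, which is the idea behind latent semantic indexing.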


Text Mining Process

Step 3: Extract patterns/knowledge

Classification (text categorization)

Clustering (natural groupings of text)

Improve search recall

Improve search precision

Scatter/gather

Query-specific clustering

Association

Trend Analysis (…)


Application Case 5.5 Research Literature Survey with Text Mining (1 of 4)

Mining the published IS literature

MIS Quarterly (MISQ)

Journal of MIS (JMIS)

Information Systems Research (ISR)

Covers 12-year period (1994-2005)

901 papers are included in the study

Only the paper abstracts are used

9 clusters are generated for further analysis


Application Case 5.5 Research Literature Survey with Text Mining (2 of 4)


Application Case 5.5 Research Literature Survey with Text Mining (3 of 4)


Application Case 5.5 Research Literature Survey with Text Mining (4 of 4)


Sentiment Analysis

Sentiment: belief, view, opinion, and conviction

Sentiment analysis is trying to answer the question “What do people feel about a certain topic?”

By analyzing data related to opinions of many using a variety of automated tools

Used in a variety of domains, but its applications in CRM are especially noteworthy (these relate to customers’/consumers’ opinions)


Sentiment Analysis Applications

Voice of the customer (VOC)

Voice of the Market (VOM)

Voice of the Employee (VOE)

Brand Management

Financial Markets

Politics

Government Intelligence

… others


Sentiment Analysis Process


Sentiment Analysis Process

Step 1 – Sentiment Detection

Comes right after the retrieval and preparation of the text documents

It is also called detection of objectivity

Fact [= objectivity] versus Opinion [= subjectivity]

Step 2 – N-P Polarity Classification

Given an opinionated piece of text, the goal is to classify the opinion as falling under one of two opposing sentiment polarities

N [= negative] versus P [= positive]
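Steps 1 and 2 can be sketched with a lexicon lookup. Everything below is an assumption made for illustration: the tiny lexicon, its scores, and the rule that a statement counts as subjective when it contains any lexicon word. Real systems use much larger lexicons or trained classifiers.

```python
# Hypothetical mini-lexicon: word -> polarity score in [-1, 1].
LEXICON = {"great": 0.8, "love": 0.9, "good": 0.5,
           "poor": -0.6, "terrible": -0.9, "slow": -0.4}

def detect_sentiment(tokens):
    """Step 1: treat a statement as subjective if any token is in the lexicon."""
    return any(t in LEXICON for t in tokens)

def polarity(tokens):
    """Step 2: label an opinionated statement N or P from the summed scores."""
    score = sum(LEXICON.get(t, 0.0) for t in tokens)
    return ("P" if score >= 0 else "N"), score

statements = [
    "the battery life is great".split(),
    "the screen is 15 inches wide".split(),
    "support was slow and terrible".split(),
]
results = []
for tokens in statements:
    if detect_sentiment(tokens):
        results.append(polarity(tokens)[0])
    else:
        results.append("objective")
```

The second statement is a plain fact, so it is filtered out in Step 1 and never reaches polarity classification.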


Sentiment Analysis Process

Step 3 – Target Identification

The goal of this step is to accurately identify the target of the expressed sentiment (e.g., a person, a product, an event, etc.)

Level of difficulty depends on the application domain

Step 4 – Collection and Aggregation

Once the sentiments of all text data points in the document are identified and calculated, they are to be aggregated

Word → Statement → Paragraph → Document
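The roll-up in Step 4 can be sketched as follows. The scores are made up, and simple averaging is an assumed aggregation rule; the slides do not say how the levels are combined.

```python
from statistics import mean

# Hypothetical statement-level polarity scores in [-1, 1], grouped by
# paragraph; in practice these would come from Step 2.
document = [
    [0.8, 0.5],        # paragraph 1: two positive statements
    [-0.6],            # paragraph 2: one negative statement
    [0.2, -0.4, 0.5],  # paragraph 3: mixed statements
]

# Statement -> paragraph: average the statement scores.
paragraph_scores = [mean(statements) for statements in document]

# Paragraph -> document: average the paragraph scores.
document_score = mean(paragraph_scores)
```

Other choices, such as weighting by statement length or by sentiment strength, fit the same structure.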


P-N Polarity and S-O Polarity


Web Mining Overview

Web is the largest repository of data

Data is in HTML, XML, text format

Challenges (of processing Web data)

The Web is too big for effective data mining

The Web is too complex

The Web is too dynamic

The Web is not specific to a domain

The Web has everything

Opportunities and challenges are great!


Web Mining

Web mining (or Web data mining) is the process of discovering intrinsic relationships from Web data (textual, linkage, or usage)


Web Content/Structure Mining

Mining the textual content on the Web

Data collection via Web crawlers

Web pages include hyperlinks

Authoritative pages

Hubs

Hyperlink-induced topic search (HITS) algorithm
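The relationship between hubs and authorities is easiest to see in code. This is a minimal sketch of the HITS iteration on a hypothetical four-page link graph; the page names, iteration count, and normalization are illustrative choices.

```python
import math

def hits(links, iterations=50):
    """Return (authority, hub) scores for a dict of page -> outgoing links."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages linking to it.
        auth = {p: sum(hub[q] for q in pages if p in links.get(q, ())) for p in pages}
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {p: a / norm for p, a in auth.items()}
        # Hub: sum of authority scores of the pages it links to.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {p: h / norm for p, h in hub.items()}
    return auth, hub

# Hypothetical link graph: A and B both point to C; C points to D.
links = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
authority, hub_score = hits(links)
best_authority = max(authority, key=authority.get)
```

Page C ends up as the strongest authority because two pages point to it, and A and B become the strongest hubs because they point to that authority.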


Web Usage Mining

Extraction of information from data generated through Web page visits and transactions…

data stored in server access logs, referrer logs, agent logs, and client-side cookies

user characteristics and usage profiles

metadata, such as page attributes, content attributes, and usage data

Clickstream data

Clickstream analysis
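A first step in clickstream analysis is usually to group raw log records into sessions. The sketch below assumes a simplified record format (user id, timestamp in seconds, page) and a 30-minute inactivity cutoff; both are assumptions, not details from the slides.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; a common but assumed cutoff

# Hypothetical log records: (user id, timestamp in seconds, page).
clicks = [
    ("u1", 0, "/home"),
    ("u1", 120, "/products"),
    ("u2", 200, "/home"),
    ("u1", 4000, "/home"),
    ("u1", 4100, "/checkout"),
]

def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group each user's clicks into sessions split by inactivity gaps."""
    by_user = defaultdict(list)
    for user, ts, page in sorted(records, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, page))
    sessions = []
    for user, visits in by_user.items():
        current = [visits[0][1]]
        for (prev_ts, _), (ts, page) in zip(visits, visits[1:]):
            if ts - prev_ts > timeout:
                sessions.append((user, current))
                current = []
            current.append(page)
        sessions.append((user, current))
    return sessions

sessions = sessionize(clicks)
```

Each session is an ordered click path, which is the unit that usage profiles and path analyses are built on.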


Web Usage Mining

Web usage mining applications

Determine the lifetime value of clients

Design cross-marketing strategies across products

Evaluate promotional campaigns

Target electronic ads and coupons at user groups based on user access patterns

Predict user behavior based on previously learned rules and users' profiles

Present dynamic information to users based on their interests and profiles


Search Engines

Google, Bing, Yahoo, …

For what reason do you use search engines?

A search engine is a software program that searches for documents (Internet sites or files) based on the keywords (individual words, multi-word terms, or a complete sentence) that users provide to describe the subject of their inquiry

They are the workhorses of the Internet


Structure of a Typical Internet Search Engine


Anatomy of a Search Engine

Development Cycle

Web Crawler

Document Indexer

Response Cycle

Query Analyzer

Document Matcher/Ranker
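The two cycles can be sketched in a few lines: an indexer that builds an inverted index from crawled pages (development cycle), and a query step that matches and ranks pages (response cycle). The URLs, page texts, and the count-based ranking are hypothetical simplifications; production engines use far richer ranking signals.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> text.
pages = {
    "site.example/a": "text mining extracts knowledge from text",
    "site.example/b": "web mining uses link structure",
    "site.example/c": "sentiment analysis of customer text",
}

def build_index(documents):
    """Document indexer: map each term to {url: term count}."""
    index = defaultdict(dict)
    for url, text in documents.items():
        for term in text.lower().split():
            index[term][url] = index[term].get(url, 0) + 1
    return index

def search(index, query):
    """Query analyzer plus matcher/ranker: score pages by summed term counts."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for url, count in index.get(term, {}).items():
            scores[url] += count
    # Highest score first; ties broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))

index = build_index(pages)
ranked = search(index, "text mining")
```

Because the index is built ahead of time, answering a query only touches the entries for the query terms rather than rescanning every page.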


Search Engine Optimization

It is the intentional activity of affecting the visibility of an e-commerce site or a Web site in a search engine’s natural (unpaid or organic) search results

Part of an Internet marketing strategy

Based on knowing how a Search Engine works

Content, HTML, keywords, external links, …

Indexing based on …

Webmaster submission of URL

Proactively and continuously crawling the Web

Top 15 Most Popular Search Engines (by eBizMBA, August 2016)


Web Usage Mining (Clickstream Analysis)


Web Analytics Metrics

Web site usability

How were the visitors using my Web site?

Traffic sources

Where did they come from?

Visitor profiles

What do my visitors look like?

Conversion statistics

What does it all mean for the business?


Web Analytics Metrics

Web Site Usability

Page views

Time on site

Downloads

Click map

Click paths

Traffic Source

Referral Web sites

Search engines

Direct

Offline campaigns

Online campaigns


Web Analytics Metrics

Visitor Profiles

Keywords

Content groupings

Geography

Time of day

Landing page profiles

Conversion Statistics

New visitors

Returning visitors

Leads

Sales/conversions

Abandonment/exit rate


A Sample Web Analytics Dashboard


Social Analytics: Social Network Analysis

Social Network - social structure composed of individuals linking to each other

Analysis of social dynamics

Interdisciplinary field

Social psychology

Sociology

Statistics

Graph theory


Social Analytics: Social Network Analysis

Social Networks help study relationships between individuals, groups, organizations, societies

Self-organizing

Emergent

Complex

Typical social network types

Communication networks, community networks, criminal networks, innovation networks, …


Application Case 5.8 Tito’s Vodka Establishes Brand Loyalty with an Authentic Social Strategy

Discussion Questions

How can social media analytics be used in the consumer products industry?

What do you think are the key challenges, potential solutions, and probable results in applying social media analytics in consumer products and services firms?

Social Analytics: Social Network Analysis Metrics

Connections

Homophily

Multiplexity

Mutuality/reciprocity

Network closure

Propinquity

Segmentation

Cliques and social circles

Clustering coefficient

Cohesion

Distribution

Bridge

Centrality

Density

Distance

Structural holes
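Three of the metrics above (density, centrality, clustering coefficient) can be computed directly from an adjacency map. The five-person network below is hypothetical, and degree centrality is used as the simplest of several centrality measures.

```python
from itertools import combinations

# Hypothetical undirected friendship network as an adjacency map.
network = {
    "Ann": {"Bob", "Cat", "Dan"},
    "Bob": {"Ann", "Cat"},
    "Cat": {"Ann", "Bob"},
    "Dan": {"Ann", "Eve"},
    "Eve": {"Dan"},
}

n = len(network)
edge_count = sum(len(nbrs) for nbrs in network.values()) // 2

# Density: actual ties divided by the number of possible ties.
density = edge_count / (n * (n - 1) / 2)

# Degree centrality: share of the other nodes each node is tied to.
degree_centrality = {node: len(nbrs) / (n - 1) for node, nbrs in network.items()}

def clustering_coefficient(node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = network[node]
    if len(nbrs) < 2:
        return 0.0
    pairs = list(combinations(sorted(nbrs), 2))
    closed = sum(1 for a, b in pairs if b in network[a])
    return closed / len(pairs)

ann_clustering = clustering_coefficient("Ann")
```

In this toy network Ann is the most central node, and Dan acts as a bridge: removing him would cut Eve off from everyone else.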


Social Media Definitions and Concepts

Enabling technologies of social interactions among people

Relies on enabling technologies of Web 2.0

Takes on many different forms

Internet forums, Web logs, social blogs, microblogging, wikis, social networks, podcasts, pictures, video, and product reviews

Different types of social media

Based on media research and social process


Social versus Industrial Media

Web-based social media are different from traditional/industrial media, such as newspapers, television, and film

Differentiating characteristics

Quality

Reach

Frequency

Accessibility

Usability

Immediacy

Updatability


How Do People Use Social Media?

Different engagement levels


Social Media Analytics

The systematic and scientific means of consuming the vast amount of content created by Web-based social media outlets, tools, and techniques in order to improve an organization’s competitiveness

Tools to measure social media impact:

Descriptive analytics

Social network analysis

Advanced analytics


Best Practices in Social Media Analytics

Think of measurement as a guidance system, not a rating system

Track the elusive sentiment

Continuously improve the accuracy of text analysis

Look at the ripple effect

Look beyond the brand

Identify your most powerful influencers

Look closely at the accuracy of your analytic tool

Incorporate social media intelligence into planning


End of Chapter 5

Questions / Comments


[Figure 5.3, Text-Based Deception-Detection Process. Boxes: Statements Transcribed for Processing; Statements Labeled as Truthful or Deceptive by Law Enforcement; Cues Extracted & Selected; Text Processing Software Identified Cues in Statements; Text Processing Software Generated Quantified Cues; Classification Models Trained and Tested on Quantified Cues]

[Table 5.1, Categories and Examples of Linguistic Features Used in Deception Detection]

Category: Example Cues

Quantity: Verb count, noun-phrase count, ...

Complexity: Avg. no. of clauses, sentence length, …

Uncertainty: Modifiers, modal verbs, ...

Nonimmediacy: Passive voice, objectification, ...

Expressivity: Emotiveness

Diversity: Lexical diversity, redundancy, ...

Informality: Typographical error ratio

Specificity: Spatiotemporal, perceptual information …

Affect: Positive affect, negative affect, etc.

[Figure for the Gene/Protein Interaction Identification slide. Annotation layers shown over an example sentence: Gene/Protein, Ontology, Word, POS, Shallow Parse. Example sentence: "...expression of Bcl-2 is correlated with insufficient white blood cell death and activation of p53." The numeric identifiers and tag strings for each layer are not recoverable from the extraction.]

[Figure 5.6, The Three-Step/Task Text Mining Process]

Inputs: a variety of relevant unstructured (and semi-structured) data sources such as text, XML, HTML, etc.

Task 1, Establish the Corpus: collect and organize the domain-specific unstructured data. Output: a collection of documents in some digitized format for computer processing

Task 2, Create the Term-Document Matrix: introduce structure to the corpus. Output: a flat file called the term-document matrix, where the cells are populated with the term frequencies

Task 3, Extract Knowledge: discover novel patterns from the T-D matrix. Output: a number of problem-specific classification, association, and clustering models and visualizations

Feedback links connect the tasks. Other labels in the figure: Data, Text, Knowledge

[Figure for Step 2, a sample term-by-document matrix. Terms (columns): investment risk, project management, software engineering, development, SAP, ... Documents (rows): Document 1 through Document 6, ... Cells hold term counts; their positions are not recoverable from the extraction.]

[Table for Application Case 5.5, sample records. Fields: Journal, Year, Author(s), Title, Vol/No, Pages, Keywords, Abstract]

Journal: MISQ; Year: 2005; Author(s): A. Malhotra, S. Gosain and O. A. El Sawy; Title: Absorptive capacity configurations in supply chains: Gearing for partner-enabled market knowledge creation; Vol/No: 29/1; Pages: 145-187; Keywords: knowledge management, supply chain, absorptive capacity, interorganizational information systems, configuration approaches; Abstract: The need for continual value innovation is driving supply chains to evolve from a pure transactional focus to leveraging interorganizational partnerships for sharing ...

Journal: ISR; Year: 1999; Author(s): D. Robey and M. C. Boudreau; Title: Accounting for the contradictory organizational consequences of information technology: Theoretical directions and methodological implications; Vol/No: 10/2; Pages: 167-185; Keywords: organizational transformation, impacts of technology, organization theory, research methodology, intraorganizational power, electronic communication, mis implementation, culture, systems; Abstract: Although much contemporary thought considers advanced information technologies as either determinants or enablers of radical organizational change, empirical studies have revealed inconsistent findings to support the deterministic logic implicit in such arguments. This paper reviews the contradictory ...

Journal: JMIS; Year: 2001; Author(s): R. Aron and E. K. Clemons; Title: Achieving the optimal balance between investment in quality and investment in self-promotion for information products; Vol/No: 18/2; Pages: 65-88; Keywords: information products, internet advertising, product positioning, signaling, signaling games; Abstract: When producers of goods (or services) are confronted by a situation in which their offerings no longer perfectly match consumer preferences, they must determine the extent to which the advertised features of ...

[Figure for the Sentiment Analysis Process slide]

Input: Textual Data (a statement)

Step 1: Calculate the O–S Polarity, using a lexicon. Decision: Is there a sentiment? (Yes/No), based on the O–S polarity measure

Step 2: Calculate the N–P Polarity of the sentiment, using a lexicon

Step 3: Identify the target for the sentiment

Step 4: Record the Polarity, Strength, and the Target of the sentiment. Tabulate & aggregate the sentiment analysis results

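[Figure for the P-N Polarity and S-O Polarity slide. One axis is P–N Polarity, running from Negative (N) (-) to Positive (P) (+). The other axis is S-O Polarity, running between Objective (O) and Subjective (S).]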

[Figure for the Web Mining slide. Data Mining and Text Mining feed into Web Mining, which has three branches.]

Web Content Mining. Source: unstructured textual content of the Web pages (usually in HTML format)

Web Structure Mining. Source: the unified resource locator (URL) links contained in the Web pages

Web Usage Mining. Source: the detailed description of a Web site’s visits (sequence of clicks by sessions)

Related application labels: Marketing Attribution, Customer Analytics, 360 Customer View, Voice of the Customer, Search Engine Optimization, Social Network Analysis, Social Media Analytics, Weblog Analysis, Page Rank, Information Retrieval, Graph Mining, Social Analytics, Clickstream Analysis, Web Analytics, Search Engines, Sentiment Analysis, Semantic Webs

[Figure for the Structure of a Typical Internet Search Engine slide]

Components: User, World Wide Web, Scheduler, Web Crawler, Document Indexer, Cached/Indexed Documents DB, Query Analyzer, Document Matcher/Ranker

Development Cycle flows: List of URLs to Crawl, Crawling the Web, Unprocessed Web Pages, Processed Pages

Responding Cycle flows: Search Query, Processed Query, List of Matched Pages, Rank-Ordered Pages

Other labels: Metadata, Index

[Figure for the Web Usage Mining (Clickstream Analysis) slide]

Inputs: Website, Weblogs, User/Customer

Pre-Process Data: Collecting, Merging, Cleaning, Structuring (identify users, identify sessions, identify page views, identify visits)

Extract Knowledge: Usage patterns, User profiles, Page profiles, Visit profiles, Customer value

Outcomes: How to better the data, How to improve the Web site, How to increase the customer value

[Figure for the How Do People Use Social Media? slide. Engagement levels shown: Creators, Critics, Joiners, Collectors, Spectators, Inactives. Axis labels: Time; Level of Social Media Engagement.]