Business Intelligence and Data Mining Assignment

Business Intelligence, Analytics, and Data Science: A Managerial Perspective

Fourth Edition

Chapter 3

Descriptive Analytics II: Business Intelligence and Data Warehousing

Copyright © 2018 Pearson Education Ltd.

Learning Objectives (1 of 2)

3.1 Understand the basic definitions and concepts of data warehousing

3.2 Understand data warehousing architectures

3.3 Describe the processes used in developing and managing data warehouses

3.4 Explain data warehousing operations

3.5 Explain the role of data warehouses in decision support

Slide 3-2

Learning Objectives (2 of 2)

3.6 Explain data integration and the extraction, transformation, and load (ETL) processes

3.7 Understand the essence of business performance management (BPM)

3.8 Learn balanced scorecard and Six Sigma as performance measurement systems

Slide 3-3

OPENING VIGNETTE Targeting Tax Fraud with Business Intelligence and Data Warehousing

Why is it important for the IRS and for U.S. state governments to use data warehousing and business intelligence (BI) tools in managing state revenues?

What were the challenges the state of Maryland was facing with regard to tax fraud?

What was the solution they adopted? Do you agree with their approach? Why?

What were the results that they obtained? Did the investment in BI and data warehousing pay off?

What other problems and challenges do you think federal and state governments are having that can benefit from BI and data warehousing?

Slide 3-4

Business Intelligence and Data Warehousing

BI used to encompass everything related to the use of data for managerial decision support

Now, it is a part of Business Analytics

BI = Descriptive Analytics

Slide 3-5

What is a Data Warehouse?

A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format

A relational database? (so what is the difference?)

“The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time”

Slide 3-6

A Historical Perspective to Data Warehousing

Slide 3-7

Characteristics of DWs

Subject oriented

Integrated

Time-variant (time series)

Nonvolatile

Summarized

Not normalized

Metadata

Web based, relational/multi-dimensional

Client/server, real-time/right-time/active...
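Two of the characteristics above, time-variant and nonvolatile, can be illustrated with a minimal Python sketch. It assumes a hypothetical customer-segment history; the class and field names are illustrative only. Loads only append dated rows, and queries ask for the state of the data as of a given moment.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)  # frozen: a loaded row cannot be modified (nonvolatile)
class CustomerSnapshot:
    customer_id: int
    segment: str
    valid_from: date  # time-variant: each row is tied to a moment in time


class SnapshotStore:
    """Hypothetical append-only history: loads add rows, nothing is updated or deleted."""

    def __init__(self) -> None:
        self._rows: List[CustomerSnapshot] = []

    def load(self, row: CustomerSnapshot) -> None:
        self._rows.append(row)

    def as_of(self, customer_id: int, when: date) -> Optional[CustomerSnapshot]:
        """Return the most recent version of a customer that was valid on `when`."""
        versions = [r for r in self._rows
                    if r.customer_id == customer_id and r.valid_from <= when]
        return max(versions, key=lambda r: r.valid_from, default=None)


store = SnapshotStore()
store.load(CustomerSnapshot(1, "Bronze", date(2017, 1, 1)))
store.load(CustomerSnapshot(1, "Gold", date(2018, 1, 1)))  # a change adds a new row
print(store.as_of(1, date(2017, 6, 30)).segment)  # Bronze
print(store.as_of(1, date(2018, 6, 30)).segment)  # Gold
```

An operational (OLTP) table would typically overwrite the segment in place; keeping every dated version is what lets the warehouse answer historical questions.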

Slide 3-8

Data Mart

A departmental small-scale “DW” that stores only limited/relevant data

Dependent data mart

A subset that is created directly from a data warehouse

Independent data mart

A small data warehouse designed for a strategic business unit or a department
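A dependent data mart can be sketched with Python's built-in sqlite3 module. The table names (dw_sales, marketing_mart) and rows are hypothetical. The mart is created directly from the warehouse table, so it inherits data that have already been cleansed and integrated; an independent data mart would instead be loaded from its own source systems.

```python
import sqlite3

# Hypothetical enterprise warehouse table (names and rows are illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_sales (region TEXT, department TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO dw_sales VALUES (?, ?, ?)",
    [("East", "Marketing", 120.0), ("West", "Marketing", 80.0),
     ("East", "Finance", 50.0), ("West", "Operations", 30.0)],
)

# Dependent data mart: a departmental subset derived directly from the warehouse.
conn.execute("""
    CREATE TABLE marketing_mart AS
    SELECT region, SUM(amount) AS total_amount
    FROM dw_sales
    WHERE department = 'Marketing'
    GROUP BY region
""")

rows = conn.execute(
    "SELECT region, total_amount FROM marketing_mart ORDER BY region"
).fetchall()
print(rows)  # [('East', 120.0), ('West', 80.0)]
```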

Slide 3-9

Other DW Components

Operational data stores (ODS)

A type of database often used as an interim area for a data warehouse

Oper marts

An operational data mart

Enterprise data warehouse (EDW)

A data warehouse for the enterprise

Metadata – “data about data”

In a DW, metadata describe the contents of the data warehouse and how its data are acquired and used

Slide 3-10

Application Case 3.1 A Better Data Plan: Well-Established TELCOs Leverage Data Warehousing and Analytics to Stay on Top in a Competitive Industry

Questions for Discussion

What are the main challenges for TELCOs?

How can data warehousing and data analytics help TELCOs in overcoming their challenges?

Why do you think TELCOs are well suited to take full advantage of data analytics?

Slide 3-11

DW for Data-Driven Decision Making

An example of a DW supporting data-driven decision making in the automotive industry

Slide 3-12

A Generic DW Framework

Slide 3-13

DW Architecture

Three-tier architecture

Data acquisition software (back-end)

The data warehouse that contains the data & software

Client (front-end) software that allows users to access and analyze data from the warehouse

Two-tier architecture

First two tiers in three-tier architecture are combined into one

… sometimes there is only one tier?

Slide 3-14

DW Architectures

3-tier architecture

2-tier architecture

1-tier architecture?

Slide 3-15

Data Warehousing Architectures

Issues to consider when deciding which architecture to use:

Which database management system (DBMS) should be used?

Will parallel processing and/or partitioning be used?

Will data migration tools be used to load the data warehouse?

What tools will be used to support data retrieval and analysis?

Slide 3-16

A Web-based DW Architecture

Slide 3-17

Alternative DW Architectures

Slide 3-18

Alternative DW Architectures

Each architecture has advantages and disadvantages!

Which architecture is the best?

Slide 3-19

Ten Factors that Potentially Affect the Architecture Selection Decision

Information interdependence between organizational units

Upper management’s information needs

Urgency of need for a data warehouse

Nature of end-user tasks

Constraints on resources

Strategic view of the data warehouse prior to implementation

Compatibility with existing systems

Perceived ability of the in-house IT staff

Technical issues

Social/political factors

Slide 3-20

Data Integration and the Extraction, Transformation, and Load Process

ETL = Extract Transform Load

Data integration

Integration that comprises three major processes: data access, data federation, and change capture.

Enterprise application integration (EAI)

A technology that provides a vehicle for pushing data from source systems into a data warehouse

Enterprise information integration (EII)

An evolving tool space that promises real-time data integration from a variety of sources, such as relational or multidimensional databases, Web services, etc.
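The ETL process named above can be sketched in a few lines of Python. The two source systems, their field names, and the sales table are hypothetical, and a real ETL tool would add scheduling, error handling, and metadata capture. The sketch only shows the three stages: extract raw records, transform them into one standardized format, and load the result into the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source systems that use different field names and code conventions.
POS_CSV = "store,date,amount\nS1,2018-03-01,19.50\nS2,2018-03-01,5.00\n"
ERP_ROWS = [{"StoreCode": "s1", "Day": "2018-03-02", "Total": "7.50"}]


def extract():
    """Extract: read raw records from each source system."""
    pos = list(csv.DictReader(io.StringIO(POS_CSV)))
    return pos, ERP_ROWS


def transform(pos, erp):
    """Transform and integrate: map both sources onto one standardized row format."""
    rows = [(r["store"].strip().upper(), r["date"], float(r["amount"])) for r in pos]
    rows += [(r["StoreCode"].strip().upper(), r["Day"], float(r["Total"])) for r in erp]
    return rows


def load(rows, conn):
    """Load: write the cleansed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (store TEXT, day TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(*extract()), conn)
totals = dict(conn.execute(
    "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
).fetchall())
print(totals)  # {'S1': 27.0, 'S2': 5.0}
```

The transform step is where the two sources' codes ("s1" vs. "S1") are reconciled; without it, the warehouse would report the same store twice.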

Slide 3-21

Data Integration and the Extraction, Transformation, and Load Process

Slide 3-22

ETL (Extract, Transform, Load)

Issues affecting the purchase of an ETL tool

Data transformation tools are expensive

Data transformation tools may have a long learning curve

Important criteria in selecting an ETL tool

Ability to read from and write to an unlimited number of data sources/architectures

Automatic capturing and delivery of metadata

A history of conforming to open standards

An easy-to-use interface for the developer and the functional user

Slide 3-23

Application Case 3.2 BP Lubricants Achieves BIGS Success

Questions for Discussion

What is BIGS?

What were the challenges, the proposed solution, and the obtained results with BIGS?

Slide 3-24

Data Warehouse Development

Data warehouse development approaches

Inmon Model: EDW approach (top-down)

Kimball Model: Data mart approach (bottom-up)

Which model is best?

Table 3.3 provides a comparative analysis of the EDW and data mart approaches

Another alternative is a hosted data warehouse

Slide 3-25

Comparing EDW and Data Mart

Slide 3-26

Application Case 3.3 Use of Teradata Analytics for SAP Solutions Accelerates Big Data Delivery

Questions for Discussion

What were the challenges faced by the large Dutch retailer?

What was the proposed multivendor solution? What were the implementation challenges?

What were the lessons learned?

Slide 3-27

Additional DW Considerations: Hosted Data Warehouses

Benefits:

Requires minimal investment in infrastructure

Frees up capacity on in-house systems

Frees up cash flow

Makes powerful solutions affordable

Enables solutions that provide for growth

Offers better quality equipment and software

Provides faster connections

… more in the book

Slide 3-28

Representation of Data in DW

Dimensional Modeling

A retrieval-based system that supports high-volume query access

Star schema

The most commonly used and the simplest style of dimensional modeling

Contains a fact table surrounded by and connected to several dimension tables

Snowflake schema

An extension of star schema where the diagram resembles a snowflake in shape
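A star schema can be sketched with sqlite3: one fact table of numeric measures whose foreign keys point to three dimension tables. All table names, columns, and rows here are hypothetical. The query is a typical star join that filters on one dimension and groups by two others.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes used to filter and group.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, city TEXT, region TEXT);
    CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    -- Fact table: numeric measures plus a foreign key to each dimension.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        store_key   INTEGER REFERENCES dim_store(store_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        units INTEGER, revenue REAL
    );
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gizmo', 'Hardware'),
                                   (3, 'Manual', 'Books');
    INSERT INTO dim_store   VALUES (1, 'Boston', 'East'), (2, 'Denver', 'West');
    INSERT INTO dim_date    VALUES (20180101, 2018, 1), (20180401, 2018, 2);
    INSERT INTO fact_sales  VALUES (1, 1, 20180101, 10, 100.0), (2, 2, 20180101, 5, 75.0),
                                   (3, 1, 20180401, 2, 40.0), (1, 2, 20180401, 4, 40.0);
""")

# Star join: the fact table joined to each dimension, filtered, then grouped.
query = """
    SELECT s.region, p.category, SUM(f.revenue) AS revenue
    FROM fact_sales f
    JOIN dim_product p ON f.product_key = p.product_key
    JOIN dim_store   s ON f.store_key   = s.store_key
    JOIN dim_date    d ON f.date_key    = d.date_key
    WHERE d.year = 2018
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In a snowflake schema, the category attribute would move out of dim_product into its own table (for example, a hypothetical dim_category referenced by dim_product), which adds one more join to the same query.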

Slide 3-29

Multidimensionality

The ability to organize, present, and analyze data by several dimensions, such as sales by region, by product, by salesperson, and by time (four dimensions)

Multidimensional presentation

Dimensions: products, salespeople, market segments, business units, geographical locations, distribution channels, country, or industry

Measures: money, sales volume, head count, inventory profit, actual versus forecast

Time: daily, weekly, monthly, quarterly, or yearly

Slide 3-30

Star Schema versus Snowflake Schema

Slide 3-31

Analysis of Data in DW

OLTP vs. OLAP…

OLTP (Online Transaction Processing)

Capturing and storing data from ERP, CRM, POS, …

The main focus is on efficiency of routine tasks

OLAP (Online Analytical Processing)

Converting data into information for decision support

Data cubes, drill-down / rollup, slice & dice, …

Requesting ad hoc reports

Conducting statistical and other analyses

Developing multimedia-based applications

…more in the book

Slide 3-32

OLAP vs. OLTP

Slide 3-33

OLAP Operations

Slice - a subset of a multidimensional array

Dice - a slice on more than two dimensions

Drill Down/Up - navigating among levels of data ranging from the most summarized (up) to the most detailed (down)

Roll Up - computing all of the data relationships for one or more dimensions

Pivot - used to change the dimensional orientation of a report or an ad hoc query-page display
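The OLAP operations above can be illustrated on a toy cube held in a plain Python dictionary. The dimension names and sales figures are hypothetical, and the code follows one common interpretation of each operation: slice fixes one dimension to a single value, dice keeps a chosen set of values on several dimensions, roll-up aggregates dimensions away, drill-down returns to a finer level of detail, and pivot swaps the axes of a two-dimensional view. A real OLAP server would precompute and index such aggregates rather than scan every cell.

```python
from collections import defaultdict

DIMS = ("product", "region", "quarter")  # hypothetical cube dimensions

# Base cube: one cell per (product, region, quarter) holding a sales volume.
CUBE = {
    ("Widget", "East", "Q1"): 10, ("Widget", "West", "Q1"): 4,
    ("Widget", "East", "Q2"): 7, ("Gizmo", "East", "Q1"): 3,
    ("Gizmo", "West", "Q2"): 6,
}


def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice(cube, **allowed):
    """Dice: keep cells whose values fall in a chosen set on several dimensions."""
    idx = {DIMS.index(d): set(vals) for d, vals in allowed.items()}
    return {k: v for k, v in cube.items() if all(k[i] in s for i, s in idx.items())}


def roll_up(cube, keep):
    """Roll up: sum over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for k, v in cube.items():
        out[tuple(k[i] for i in idx)] += v
    return dict(out)


def pivot(view):
    """Pivot: swap the row and column axes of a two-dimensional view."""
    return {(col, row): v for (row, col), v in view.items()}


by_region = roll_up(CUBE, ["region"])                     # most summarized level
by_region_quarter = roll_up(CUBE, ["region", "quarter"])  # drill down one level
widgets = dice(CUBE, product=["Widget"], quarter=["Q1", "Q2"])
east_q1 = slice_cube(slice_cube(CUBE, "region", "East"), "quarter", "Q1")
print(by_region)  # {('East',): 20, ('West',): 10}
print(pivot(roll_up(CUBE, ["product", "quarter"])))
```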

Slide 3-34

OLAP: Slicing Operations on a Simple Three-Dimensional Data Cube

Slide 3-35

Successful DW Implementation: Things to Avoid

Starting with the wrong sponsorship chain

Setting expectations that you cannot meet

Engaging in politically naive behavior

Loading the data warehouse with information just because it is available

Believing that data warehousing database design is the same as transactional database design

… more in the book

Slide 3-36

Massive DW and Scalability

Scalability

The main issues pertaining to scalability:

The amount of data in the warehouse

How quickly the warehouse is expected to grow

The number of concurrent users

The complexity of user queries

Good scalability means that the time and resources needed for queries and other data-access functions grow (ideally) linearly with the size of the warehouse
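One common route to the roughly linear scaling described above is to partition the data, process the partitions in parallel, and merge the partial results. The Python sketch below shows that divide-and-merge pattern on hypothetical (region, amount) rows. It uses a thread pool only to keep the example self-contained; in CPython the global interpreter lock limits CPU parallelism for this kind of work, and a massively parallel DW would instead spread partitions across separate nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    """Round-robin rows into n partitions of roughly equal size."""
    return [rows[i::n] for i in range(n)]


def partial_sum(part):
    """Aggregate one partition locally."""
    totals = Counter()
    for region, amount in part:
        totals[region] += amount
    return totals


def parallel_sum(rows, n=4):
    """Scan all partitions concurrently, then merge the partial aggregates."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(partial_sum, partition(rows, n)))
    merged = Counter()
    for p in partials:
        merged.update(p)  # Counter.update adds counts rather than replacing them
    return dict(merged)


rows = [("East", 5), ("West", 3), ("East", 2), ("South", 4)]
print(parallel_sum(rows))
```

Because each partition is scanned independently, adding data and partitions together keeps the work per partition roughly constant; only the merge step grows with the number of partitions.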

Slide 3-37

Application Case 3.4 EDW Helps Connect State Agencies in Michigan

Questions for Discussion

Why would a state invest in a large and expensive IT infrastructure (such as an EDW)?

What is the size and complexity of the EDW used by state agencies in Michigan?

What were the challenges, the proposed solution, and the obtained results of the EDW?

Slide 3-38

DW Administration and Security

Data warehouse administrator (DWA)

DWA should…

have knowledge of high-performance software, hardware, and networking technologies

possess solid business knowledge and insight

be familiar with the decision-making processes so as to suitably design/maintain the data warehouse structure

possess excellent communications skills

Security and privacy are pressing issues in DW

Safeguarding the most valuable assets

Government regulations (HIPAA, etc.)

Must be explicitly planned and executed

Slide 3-39

The Future of DW

Sourcing…

Web, social media, and Big Data

Open source software

SaaS (software as a service)

Cloud computing

Data lakes

Infrastructure…

Columnar

Real-time DW

Data warehouse appliances

Data management practices/technologies

In-database & in-memory processing

New DBMS, Advanced analytics, …

Slide 3-40

Data Lakes

Unstructured data storage technology for Big Data

Data Lake versus Data Warehouse

Slide 3-41

Business Performance Management

Business Performance Management (BPM) is…

A real-time system that alerts managers to potential opportunities, impending problems, and threats, and then empowers them to react through models and collaboration

Also called corporate performance management (CPM by Gartner Group), enterprise performance management (EPM by Oracle), and strategic enterprise management (SEM by SAP)

Slide 3-42

Business Performance Management

BPM refers to the business processes, methodologies, metrics, and technologies used by enterprises to measure, monitor, and manage business performance.

BPM encompasses three key components

A set of integrated, closed-loop management and analytic processes, supported by technology …

Tools for businesses to define strategic goals and then measure/manage performance against them

Methods and tools for monitoring key performance indicators (KPIs), linked to organizational strategy

Slide 3-43

A Closed-Loop Process to Optimize Business Performance

Process Steps

Strategize

Plan

Monitor/analyze

Act/adjust

Each with its own sub-process steps

Slide 3-44

1 - Strategize: Where Do We Want to Go?

Strategic planning

Common tasks for the strategic planning process:

Conduct a current situation analysis

Determine the planning horizon

Conduct an environment scan

Identify critical success factors

Complete a gap analysis

Create a strategic vision

Develop a business strategy

Identify strategic objectives and goals

Slide 3-45

2 - Plan: How Do We Get There?

Operational planning

Operational plan: a plan that translates an organization’s strategic objectives and goals into a set of well-defined tactics and initiatives, resource requirements, and expected results for some future time period (usually a year).

Operational planning can be

Tactic-centric (operationally focused)

Budget-centric (financially focused)

Slide 3-46

3 - Monitor/Analyze: How Are We Doing?

A comprehensive framework for monitoring performance should address two key issues:

What to monitor?

Critical success factors

Strategic goals and targets

How to monitor?

Slide 3-47

4 - Act and Adjust: What Do We Need to Do Differently?

Success (or mere survival) depends on new projects: creating new products, entering new markets, acquiring new customers (or businesses), or streamlining some process.

Many new projects and ventures fail!

What is the chance of failure?

60% of Hollywood movies fail

70% of large IT projects fail, …

Slide 3-48

Application Case 3.5 AARP Transforms Its BI Infrastructure and Achieves a 347% ROI in Three Years

Questions for Discussion

What were the challenges AARP was facing?

What was the approach for a potential solution?

What were the results obtained in the short term, and what were the future plans?

Slide 3-49

Performance Measurement

Performance measurement system

A system that assists managers in tracking the implementation of business strategy by comparing actual results against strategic goals and objectives

Comprises systematic comparative methods that indicate progress (or lack thereof) against goals

Slide 3-50

Key performance indicator (KPI)

A KPI represents a strategic objective and metrics that measure performance against a goal

Distinguishing features of KPIs

KPIs and Operational Metrics

Strategy

Targets

Ranges

Encodings

Time frames

Benchmarks
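The distinguishing features listed above can be captured in a small data structure. This is a hypothetical Python sketch, not a standard KPI model: the 90% and 100% range boundaries, the traffic-light encodings, and all example values are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class KPI:
    name: str
    strategy: str             # strategic objective the KPI supports
    target: float             # goal for the time frame
    time_frame: str           # period the target applies to, e.g. "2018-Q3"
    benchmark: float          # reference value, e.g. the same period last year
    yellow_at: float = 0.90   # range boundary: share of target counted as "at risk"
    green_at: float = 1.00    # range boundary: share of target counted as "on target"

    def status(self, actual: float) -> str:
        """Encode performance against the target as a traffic-light color."""
        ratio = actual / self.target
        if ratio >= self.green_at:
            return "green"
        if ratio >= self.yellow_at:
            return "yellow"
        return "red"

    def vs_benchmark(self, actual: float) -> float:
        """Percentage change of the actual value relative to the benchmark."""
        return (actual - self.benchmark) / self.benchmark * 100


revenue = KPI("Quarterly revenue", "Grow market share", target=1_000_000,
              time_frame="2018-Q3", benchmark=900_000)
print(revenue.status(950_000))        # yellow
print(revenue.vs_benchmark(990_000))  # percent change vs. the benchmark
```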

Slide 3-51

Performance Measurement

Key performance indicator (KPI)

Outcome KPIs (lagging indicators, e.g., revenues) vs. driver KPIs (leading indicators, e.g., sales leads)

Operational areas covered by driver KPIs

Customer performance

Service performance

Sales operations

Sales plan/forecast

Slide 3-52

Performance Measurement System

Balanced Scorecard (BSC)

A performance measurement and management methodology that helps translate an organization’s financial, customer, internal process, and learning and growth objectives and targets into a set of actionable initiatives

“The Balanced Scorecard: Measures That Drive Performance” (HBR, 1992)

Slide 3-53

Balanced Scorecard

The meaning of “balance”?

Slide 3-54

Six Sigma as a Performance Measurement System

Six Sigma

A performance management methodology aimed at reducing the number of defects in a business process to as close to zero defects per million opportunities (DPMO) as possible
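Defects per million opportunities is simple arithmetic: DPMO = defects / (units × opportunities per unit) × 1,000,000. The Python sketch below computes it and converts it to a sigma level using the conventional 1.5-sigma shift, under which a six-sigma process corresponds to about 3.4 DPMO. The invoice example values are hypothetical.

```python
from statistics import NormalDist


def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities."""
    return defects / (units * opportunities_per_unit) * 1_000_000


def sigma_level(dpmo_value: float, shift: float = 1.5) -> float:
    """Short-term sigma level, using the conventional 1.5-sigma shift."""
    yield_fraction = 1 - dpmo_value / 1_000_000
    return NormalDist().inv_cdf(yield_fraction) + shift


# Hypothetical example: 25 defects found in 1,000 invoices, each with 5 error opportunities.
d = dpmo(25, 1_000, 5)
print(d)                           # 5000.0
print(round(sigma_level(d), 2))    # 4.08
print(round(sigma_level(3.4), 1))  # 6.0
```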

Slide 3-55

Six Sigma as a Performance Measurement System

The DMAIC performance model

A closed-loop business improvement model that encompasses the steps of defining, measuring, analyzing, improving, and controlling a process

Lean Six Sigma

Lean manufacturing / lean production

Lean production versus Six Sigma?

Slide 3-56

Comparison of BSC and Six Sigma

Slide 3-57

Effective Performance Measurement

Measures should focus on key factors.

Measures should be a mix of past, present, and future.

Measures should balance the needs of shareholders, employees, partners, suppliers, and other stakeholders.

Measures should start at the top and flow down to the bottom.

Measures need to have targets that are based on research and reality rather than being arbitrary.

Slide 3-58

Application Case 3.6 Expedia.com’s Customer Satisfaction Scorecard

Questions for Discussion

Who are the customers for Expedia.com? Why is customer satisfaction a very important part of their business?

How did Expedia.com improve customer satisfaction with scorecards?

What were the challenges, the proposed solution, and the obtained results?

Slide 3-59

Plenty of Resources for DW @ TUN

Teradata University Network (TUN)

Slide 3-60

TeradataUniversityNetwork.com

End of Chapter 3

Questions / Comments

Slide 3-61

Figure: A simple taxonomy of analytics. Business Analytics spans all three categories; Business Intelligence covers descriptive analytics, and Advanced Analytics covers predictive and prescriptive analytics.

Descriptive. Questions: What happened? What is happening? Enablers: business reporting, dashboards, scorecards, data warehousing. Outcomes: well-defined business problems and opportunities.

Predictive. Questions: What will happen? Why will it happen? Enablers: data mining, text mining, Web/media mining, forecasting. Outcomes: accurate projections of future events and outcomes.

Prescriptive. Questions: What should I do? Why should I do it? Enablers: optimization, simulation, decision modeling, expert systems. Outcomes: best possible business decisions and actions.

Figure: A historical perspective to data warehousing, 1970s to 2010s (milestones in chronological order)

Mainframe computers; simple data entry; routine reporting; primitive database structures; Teradata incorporated; mini/personal computers (PCs); business applications for PCs; distributed DBMS; relational DBMS; Teradata ships commercial DBs; “Business Data Warehouse” coined; centralized data storage; data warehousing was born; Inmon, Building the Data Warehouse; Kimball, The Data Warehouse Toolkit; EDW architecture design; exponentially growing data (Web data); consolidation of the DW/BI industry; data warehouse appliances emerged; business intelligence popularized; data mining and predictive modeling; open source software; SaaS, PaaS, cloud computing; Big Data analytics; social media analytics; text and Web analytics; Hadoop, MapReduce, NoSQL; in-memory, in-database

Figure: A DW supporting data-driven decision making in the automotive industry

Data warehouse: one management and analytics platform for product configuration, warranty, and diagnostic readout data

Reduced infrastructure expenses: 2/3 cost reduction through data mart consolidation

Reduced warranty expenses: improved reimbursement accuracy through improved claim data quality

Improved cost of quality: faster identification, prioritization, and resolution of quality issues

Accurate environmental performance reporting

IT architecture standardization: one strategic platform for business intelligence and compliance reporting

Figure: A generic DW framework

Data sources: ERP, legacy, POS, other OLTP/Web, external data

ETL process: select, extract, transform, integrate, load

Enterprise data warehouse, with metadata, replication, and data marts (operations, marketing, finance, ...); a “no data marts” option is also shown

API/middleware connecting the stored data to applications (visualization): routine business reporting; OLAP, dashboards, Web; data/text mining; custom-built applications

Figure: Three-tier DW architecture (Tier 1: client workstation; Tier 2: application server; Tier 3: database server) and two-tier DW architecture (Tier 1: client workstation; Tier 2: application and database server)

Figure: A Web-based DW architecture. Components: client (Web browser), Internet/intranet/extranet, Web server, Web pages, application server, and data warehouse.

Figure: Alternative DW architectures. In (a) to (d), data flow from source systems through a staging area (ETL) to end-user access and applications.

(a) Independent data marts architecture: independent data marts (atomic/summarized data)

(b) Data mart bus architecture with linked dimensional data marts: dimensionalized data marts linked by conformed dimensions (atomic/summarized data)

(c) Hub-and-spoke architecture (Corporate Information Factory): normalized relational warehouse (atomic data) and dependent data marts (summarized/some atomic data)

(d) Centralized data warehouse architecture: normalized relational warehouse (atomic/some summarized data)

(e) Federated architecture: existing data warehouses, data marts, and legacy systems, tied together by data mapping/metadata and logical/physical integration of common data elements, serving end-user access and applications

Figure: Data integration and the ETL process. Data are extracted from packaged applications, legacy systems, other internal applications, and transient data sources into the data warehouse, which in turn feeds data marts.

Figure: A three-dimensional OLAP cube (dimensions: Product, Time, Geography) with slicing operations. Cells are filled with numbers representing sales volumes. The slices show sales volumes of a specific product on variable time and region; of a specific region on variable time and products; and of a specific time on variable region and products.

Figure: The balanced scorecard. Vision and strategy at the center, linked to four perspectives: financial, customer, internal business process, and learning and growth.