Business Intelligence and Data Mining Assignment
Business Intelligence, Analytics, and Data Science: A Managerial Perspective
Fourth Edition
Chapter 3
Descriptive Analytics II: Business Intelligence and Data Warehousing
Copyright © 2018 Pearson Education Ltd.
Learning Objectives (1 of 2)
3.1 Understand the basic definitions and concepts of data warehousing
3.2 Understand data warehousing architectures
3.3 Describe the processes used in developing and managing data warehouses
3.4 Explain data warehousing operations
3.5 Explain the role of data warehouses in decision support
Learning Objectives (2 of 2)
3.6 Explain data integration and the extraction, transformation, and load (ETL) processes
3.7 Understand the essence of business performance management (BPM)
3.8 Learn balanced scorecard and Six Sigma as performance measurement systems
Opening Vignette: Targeting Tax Fraud with Business Intelligence and Data Warehousing
Why is it important for the IRS and for U.S. state governments to use data warehousing and business intelligence (BI) tools in managing state revenues?
What were the challenges the state of Maryland was facing with regard to tax fraud?
What was the solution they adopted? Do you agree with their approach? Why?
What were the results that they obtained? Did the investment in BI and data warehousing pay off?
What other problems and challenges do you think federal and state governments are having that can benefit from BI and data warehousing?
Business Intelligence and Data Warehousing
BI used to cover everything related to the use of data for managerial decision support
Now, it is a part of Business Analytics
BI = Descriptive Analytics
What is a Data Warehouse?
A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format
A relational database? (so what is the difference?)
“The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time”
A Historical Perspective to Data Warehousing
Characteristics of DWs
Subject oriented
Integrated
Time-variant (time series)
Nonvolatile
Summarized
Not normalized
Metadata
Web based, relational/multi-dimensional
Client/server, real-time/right-time/active...
Data Mart
A departmental small-scale “DW” that stores only limited/relevant data
Dependent data mart
A subset that is created directly from a data warehouse
Independent data mart
A small data warehouse designed for a strategic business unit or a department
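As an illustration of a dependent data mart, the following minimal sketch uses Python's built-in sqlite3 module to derive a departmental subset directly from a warehouse table. The table and column names (warehouse_sales, marketing_mart, dept) and the sample rows are hypothetical and are not taken from the textbook.

```python
import sqlite3

# In-memory database standing in for the enterprise data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_sales (dept TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [("marketing", "A", 100.0), ("finance", "B", 250.0), ("marketing", "C", 75.0)],
)

# Dependent data mart: a subset created directly from the warehouse table,
# keeping only the rows and columns one department needs.
conn.execute(
    "CREATE TABLE marketing_mart AS "
    "SELECT product, amount FROM warehouse_sales WHERE dept = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT product, amount FROM marketing_mart ORDER BY product"
).fetchall()
print(mart_rows)  # [('A', 100.0), ('C', 75.0)]
```

An independent data mart would instead be loaded from the source systems without passing through the warehouse table.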
Other DW Components
Operational data stores (ODS)
A type of database often used as an interim area for a data warehouse
Oper marts
An operational data mart
Enterprise data warehouse (EDW)
A data warehouse for the enterprise
Metadata – “data about data”
In a DW, metadata describe the contents of the data warehouse and its acquisition and use
Application Case 3.1 A Better Data Plan: Well-Established TELCOs Leverage Data Warehousing and Analytics to Stay on Top in a Competitive Industry
Questions for Discussion
What are the main challenges for TELCOs?
How can data warehousing and data analytics help TELCOs in overcoming their challenges?
Why do you think TELCOs are well suited to take full advantage of data analytics?
DW for Data-Driven Decision Making
An example of a DW supporting data-driven decision making in the automotive industry
A Generic DW Framework
DW Architecture
Three-tier architecture
Data acquisition software (back-end)
The data warehouse that contains the data & software
Client (front-end) software that allows users to access and analyze data from the warehouse
Two-tier architecture
The first two tiers of the three-tier architecture are combined into one
… sometimes there is only one tier?
DW Architectures
3-tier architecture
2-tier architecture
1-tier architecture?
Data Warehousing Architectures
Issues to consider when deciding which architecture to use:
Which database management system (DBMS) should be used?
Will parallel processing and/or partitioning be used?
Will data migration tools be used to load the data warehouse?
What tools will be used to support data retrieval and analysis?
A Web-based DW Architecture
Alternative DW Architectures
Alternative DW Architectures
Each architecture has advantages and disadvantages!
Which architecture is the best?
Ten Factors that Potentially Affect the Architecture Selection Decision
Information interdependence between organizational units
Upper management’s information needs
Urgency of need for a data warehouse
Nature of end-user tasks
Constraints on resources
Strategic view of the data warehouse prior to implementation
Compatibility with existing systems
Perceived ability of the in-house IT staff
Technical issues
Social/political factors
Data Integration and the Extraction, Transformation, and Load Process
ETL = Extract Transform Load
Data integration
Integration that comprises three major processes: data access, data federation, and change capture.
Enterprise application integration (EAI)
A technology that provides a vehicle for pushing data from source systems into a data warehouse
Enterprise information integration (EII)
An evolving tool space that promises real-time data integration from a variety of sources, such as relational or multidimensional databases, Web services, etc.
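The extract, transform, and load steps can be sketched in a few lines of Python using only the standard library. This is a minimal illustration under stated assumptions, not how a commercial ETL tool works: the CSV text stands in for a source-system export, and the column names, cleansing rules, and the sales table are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source extract (stands in for an ERP or POS export).
SOURCE_CSV = """order_id,region,amount
1, east ,100.50
2,WEST,200
3,east,
"""

def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: standardize region names, cast types, drop incomplete rows."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # assumed rule: skip rows with no amount
        clean.append((int(row["order_id"]), row["region"].strip().lower(), float(amount)))
    return clean

def load(conn, rows):
    """Load: write the cleansed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT order_id, region, amount FROM sales ORDER BY order_id"
).fetchall()
print(loaded)  # [(1, 'east', 100.5), (2, 'west', 200.0)]
```

Keeping the three steps as separate functions mirrors the way the process is usually described, and lets each step be tested on its own.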
Data Integration and the Extraction, Transformation, and Load Process
ETL (Extract, Transform, Load)
Issues affecting the purchase of an ETL tool
Data transformation tools are expensive
Data transformation tools may have a long learning curve
Important criteria in selecting an ETL tool
Ability to read from and write to an unlimited number of data sources/architectures
Automatic capturing and delivery of metadata
A history of conforming to open standards
An easy-to-use interface for the developer and the functional user
Application Case 3.2 BP Lubricants Achieves BIGS Success
Questions for Discussion
What is BIGS?
What were the challenges, the proposed solution, and the obtained results with BIGS?
Data Warehouse Development
Data warehouse development approaches
Inmon Model: EDW approach (top-down)
Kimball Model: Data mart approach (bottom-up)
Which model is best?
Table 3.3 provides a comparative analysis between EDW and Data Mart approach
Another alternative is a hosted data warehouse
Comparing EDW and Data Mart
Application Case 3.3 Use of Teradata Analytics for SAP Solutions Accelerates Big Data Delivery
Questions for Discussion
What were the challenges faced by the large Dutch retailer?
What was the proposed multivendor solution? What were the implementation challenges?
What were the lessons learned?
Additional DW Considerations: Hosted Data Warehouses
Benefits:
Requires minimal investment in infrastructure
Frees up capacity on in-house systems
Frees up cash flow
Makes powerful solutions affordable
Enables solutions that provide for growth
Offers better quality equipment and software
Provides faster connections
… more in the book
Representation of Data in DW
Dimensional Modeling
A retrieval-based system that supports high-volume query access
Star schema
The most commonly used and the simplest style of dimensional modeling
Contains a fact table surrounded by and connected to several dimension tables
Snowflake schema
An extension of star schema where the diagram resembles a snowflake in shape
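A star schema can be sketched with Python's built-in sqlite3 module: one fact table holds the measures and foreign keys, and each dimension table holds descriptive attributes. The table names (fact_sales, dim_product, dim_region) and the sample data are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one row per member.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);

-- Fact table: numeric measures plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    region_id  INTEGER REFERENCES dim_region(region_id),
    amount     REAL
);

INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_region  VALUES (1, 'East'), (2, 'West');
INSERT INTO fact_sales  VALUES (1, 1, 10.0), (1, 2, 20.0), (2, 1, 5.0);
""")

# A typical star-schema query: join the fact table to a dimension and aggregate.
totals = conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(totals)  # [('Gadget', 5.0), ('Widget', 30.0)]
```

In a snowflake schema, a dimension such as dim_product would itself be split into further normalized tables (for example, a separate product category table), which is what gives the diagram its snowflake shape.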
Multidimensionality
The ability to organize, present, and analyze data by several dimensions, such as sales by region, by product, by salesperson, and by time (four dimensions)
Multidimensional presentation
Dimensions: products, salespeople, market segments, business units, geographical locations, distribution channels, country, or industry
Measures: money, sales volume, head count, inventory profit, actual versus forecast
Time: daily, weekly, monthly, quarterly, or yearly
Star Schema versus Snowflake Schema
Analysis of Data in DW
OLTP vs. OLAP…
OLTP (Online Transaction Processing)
Capturing and storing data from ERP, CRM, POS, …
The main focus is on efficiency of routine tasks
OLAP (Online Analytical Processing)
Converting data into information for decision support
Data cubes, drill-down / rollup, slice & dice, …
Requesting ad hoc reports
Conducting statistical and other analyses
Developing multimedia-based applications
…more in the book
OLAP vs. OLTP
OLAP Operations
Slice - a subset of a multidimensional array
Dice - a slice on more than two dimensions
Drill Down/Up - navigating among levels of data ranging from the most summarized (up) to the most detailed (down)
Roll Up - computing all of the data relationships for one or more dimensions
Pivot - used to change the dimensional orientation of a report or an ad hoc query-page display
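The operations above can be sketched on a tiny in-memory cube. This is only an illustration: the cube contents are hypothetical, the helper names dice and roll_up are made up for this example, and a slice is treated here as a dice that fixes a single dimension, which is one common reading of the terms.

```python
from collections import defaultdict

# Hypothetical cube: (product, region, quarter) -> sales volume.
cube = {
    ("Widget", "East", "Q1"): 10,
    ("Widget", "East", "Q2"): 12,
    ("Widget", "West", "Q1"): 7,
    ("Gadget", "East", "Q1"): 4,
    ("Gadget", "West", "Q2"): 9,
}
DIMS = ("product", "region", "quarter")

def dice(cells, **fixed):
    """Keep the cells that match every fixed dimension value.

    Fixing one dimension gives a slice; fixing several gives a dice.
    """
    idx = {DIMS.index(name): value for name, value in fixed.items()}
    return {k: v for k, v in cells.items() if all(k[i] == val for i, val in idx.items())}

def roll_up(cells, keep):
    """Sum the measure over every dimension that is not listed in keep."""
    positions = [DIMS.index(name) for name in keep]
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[tuple(key[i] for i in positions)] += value
    return dict(totals)

east_slice = dice(cube, region="East")                   # slice: one region
widget_q1 = dice(cube, product="Widget", quarter="Q1")   # dice: two dimensions fixed
by_product = roll_up(cube, keep=["product"])             # roll up to product level

print(by_product)  # {('Widget',): 29, ('Gadget',): 13}
```

Drilling down is the reverse of rolling up: calling roll_up with keep=["product", "region"] returns the more detailed level beneath each product total.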
OLAP
Slicing Operations on a Simple Three-Dimensional Data Cube
Successful DW Implementation: Things to Avoid
Starting with the wrong sponsorship chain
Setting expectations that you cannot meet
Engaging in politically naive behavior
Loading the data warehouse with information just because it is available
Believing that data warehousing database design is the same as transactional database design
… more in the book
Massive DW and Scalability
Scalability
The main issues pertaining to scalability:
The amount of data in the warehouse
How quickly the warehouse is expected to grow
The number of concurrent users
The complexity of user queries
Good scalability means that the time and resources needed by queries and other data-access functions grow (ideally) linearly with the size of the warehouse
Application Case 3.4 EDW Helps Connect State Agencies in Michigan
Questions for Discussion
Why would a state invest in a large and expensive IT infrastructure (such as an EDW)?
What is the size and complexity of the EDW used by state agencies in Michigan?
What were the challenges, the proposed solution, and the obtained results of the EDW?
DW Administration and Security
Data warehouse administrator (DWA)
DWA should…
have the knowledge of high-performance software, hardware, and networking technologies
possess solid business knowledge and insight
be familiar with the decision-making processes so as to suitably design/maintain the data warehouse structure
possess excellent communications skills
Security and privacy are pressing issues in DW
Safeguarding the most valuable assets
Government regulations (HIPAA, etc.)
Must be explicitly planned and executed
The Future of DW
Sourcing…
Web, social media, and Big Data
Open source software
SaaS (software as a service)
Cloud computing
Data lakes
Infrastructure…
Columnar
Real-time DW
Data warehouse appliances
Data management practices/technologies
In-database and in-memory processing
New DBMS, advanced analytics, …
Data Lakes
Unstructured data storage technology for Big Data
Data Lake versus Data Warehouse
Business Performance Management
Business Performance Management (BPM) is…
A real-time system that alerts managers to potential opportunities, impending problems, and threats, and then empowers them to react through models and collaboration
Also called corporate performance management (CPM by Gartner Group), enterprise performance management (EPM by Oracle), strategic enterprise management (SEM by SAP)
Business Performance Management
BPM refers to the business processes, methodologies, metrics, and technologies used by enterprises to measure, monitor, and manage business performance.
BPM encompasses three key components
A set of integrated, closed-loop management and analytic processes, supported by technology …
Tools for businesses to define strategic goals and then measure/manage performance against them
Methods and tools for monitoring key performance indicators (KPIs), linked to organizational strategy
A Closed-Loop Process to Optimize Business Performance
Process Steps
Strategize
Plan
Monitor/analyze
Act/adjust
Each with its own sub-process steps
1 - Strategize: Where Do We Want to Go?
Strategic planning
Common tasks for the strategic planning process:
Conduct a current situation analysis
Determine the planning horizon
Conduct an environment scan
Identify critical success factors
Complete a gap analysis
Create a strategic vision
Develop a business strategy
Identify strategic objectives and goals
2 - Plan: How Do We Get There?
Operational planning
Operational plan: plan that translates an organization’s strategic objectives and goals into a set of well-defined tactics and initiatives, resource requirements, and expected results for some future time period (usually a year).
Operational planning can be
Tactic-centric (operationally focused)
Budget-centric (financially focused)
3 - Monitor/Analyze: How Are We Doing?
A comprehensive framework for monitoring performance should address two key issues:
What to monitor?
Critical success factors
Strategic goals and targets
How to monitor?
4 - Act and Adjust: What Do We Need to Do Differently?
Success (or mere survival) depends on new projects: creating new products, entering new markets, acquiring new customers (or businesses), or streamlining some process.
Many new projects and ventures fail!
What is the chance of failure?
60% of Hollywood movies fail
70% of large IT projects fail, …
Application Case 3.5 AARP Transforms Its BI Infrastructure and Achieves a 347% ROI in Three Years
Questions for Discussion
What were the challenges AARP was facing?
What was the approach for a potential solution?
What were the results obtained in the short term, and what were the future plans?
Performance Measurement
Performance measurement system
A system that assists managers in tracking the implementation of business strategy by comparing actual results against strategic goals and objectives
Comprises systematic comparative methods that indicate progress (or lack thereof) against goals
KPIs and Operational Metrics
Key performance indicator (KPI)
A KPI represents a strategic objective and metrics that measure performance against a goal
Distinguishing features of KPIs
Strategy
Targets
Ranges
Encodings
Time frames
Benchmarks
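To make targets, ranges, and encodings concrete, here is a small sketch of a KPI that compares an actual value against its target and encodes the result as a traffic-light color. The class name Kpi, the warning_ratio threshold, the color scheme, and the revenue figures are hypothetical choices for illustration; real scorecard tools define their own ranges and encodings.

```python
from dataclasses import dataclass

@dataclass
class Kpi:
    name: str
    target: float          # the goal the metric is measured against
    warning_ratio: float   # assumed range boundary: below this share of target is "red"

    def encode(self, actual: float) -> str:
        """Map an actual value to a color based on its ratio to the target."""
        ratio = actual / self.target
        if ratio >= 1.0:
            return "green"   # target met or exceeded
        if ratio >= self.warning_ratio:
            return "yellow"  # within the acceptable range, but short of target
        return "red"         # outside the acceptable range

revenue = Kpi(name="quarterly revenue", target=1_000_000, warning_ratio=0.9)
statuses = [revenue.encode(actual) for actual in (1_050_000, 950_000, 700_000)]
print(statuses)  # ['green', 'yellow', 'red']
```

A time frame and a benchmark could be added as further fields in the same way, for example the quarter being measured and an industry-average value to compare against.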
Performance Measurement
Key performance indicator (KPI)
Outcome KPIs (lagging indicators, e.g., revenues) vs. driver KPIs (leading indicators, e.g., sales leads)
Operational areas covered by driver KPIs
Customer performance
Service performance
Sales operations
Sales plan/forecast
Performance Measurement System
Balanced Scorecard (BSC)
A performance measurement and management methodology that helps translate an organization’s financial, customer, internal process, and learning and growth objectives and targets into a set of actionable initiatives
“The Balanced Scorecard: Measures That Drive Performance” (HBR, 1992)
Balanced Scorecard
The meaning of “balance”?
Six Sigma as a Performance Measurement System
Six Sigma
A performance management methodology aimed at reducing the number of defects in a business process to as close to zero defects per million opportunities (DPMO) as possible
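DPMO is conventionally computed as the number of defects divided by the total number of opportunities for a defect, scaled to one million. The sketch below shows that arithmetic; the function name dpmo and the invoice example are hypothetical and are not taken from the textbook.

```python
def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities for a batch of inspected units."""
    total_opportunities = units * opportunities_per_unit
    if total_opportunities <= 0:
        raise ValueError("need at least one opportunity for a defect")
    # Multiply before dividing so this example divides exactly.
    return defects * 1_000_000 / total_opportunities

# Hypothetical batch: 500 invoices, 10 fields on each that could be wrong, 17 errors found.
example = dpmo(defects=17, units=500, opportunities_per_unit=10)
print(example)  # 3400.0
```

Tracking this figure over time is one way to see whether an improvement effort is moving the process closer to zero defects.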
Six Sigma as a Performance Measurement System
The DMAIC performance model
A closed-loop business improvement model that encompasses the steps of defining, measuring, analyzing, improving, and controlling a process
Lean Six Sigma
Lean manufacturing / lean production
Lean production versus Six Sigma?
Comparison of BSC and Six Sigma
Effective Performance Measurement
Measures should focus on key factors.
Measures should be a mix of past, present, and future.
Measures should balance the needs of shareholders, employees, partners, suppliers, and other stakeholders.
Measures should start at the top and flow down to the bottom.
Measures need to have targets that are based on research and reality rather than being arbitrary.
Application Case 3.6 Expedia.com’s Customer Satisfaction Scorecard
Questions for Discussion
Who are the customers for Expedia.com? Why is customer satisfaction a very important part of their business?
How did Expedia.com improve customer satisfaction with scorecards?
What were the challenges, the proposed solution, and the obtained results?
Plenty of Resources for DW @ TUN
Teradata University Network (TUN)
TeradataUniversityNetwork.com
End of Chapter 3
Questions / Comments
Figure: Business Analytics (Descriptive, Predictive, Prescriptive)
Descriptive (Business Intelligence)
Questions: What happened? What is happening?
Enablers: Business reporting, dashboards, scorecards, data warehousing
Outcomes: Well-defined business problems and opportunities
Predictive (Advanced Analytics)
Questions: What will happen? Why will it happen?
Enablers: Data mining, text mining, Web/media mining, forecasting
Outcomes: Accurate projections of future events and outcomes
Prescriptive (Advanced Analytics)
Questions: What should I do? Why should I do it?
Enablers: Optimization, simulation, decision modeling, expert systems
Outcomes: Best possible business decisions and actions
Figure: A historical perspective to data warehousing (timeline, 1970s to 2010s)
1970s: Mainframe computers; simple data entry; routine reporting; primitive database structures; Teradata incorporated
1980s: Mini/personal computers (PCs); business applications for PCs; distributed DBMS; relational DBMS; Teradata ships commercial DBs; "Business Data Warehouse" coined
1990s: Centralized data storage; data warehousing was born; Inmon, Building the Data Warehouse; Kimball, The Data Warehouse Toolkit; EDW architecture design
2000s: Exponentially growing Web data; consolidation of DW/BI industry; data warehouse appliances emerged; business intelligence popularized; data mining and predictive modeling; open source software; SaaS, PaaS, cloud computing
2010s: Big Data analytics; social media analytics; text and Web analytics; Hadoop, MapReduce, NoSQL; in-memory, in-database
Figure: A DW supporting data-driven decision making in the automotive industry
Data Warehouse: One management and analytics platform for product configuration, warranty, and diagnostic readout data
Reduced Infrastructure Expenses: 2/3 cost reduction through data mart consolidation
Reduced Warranty Expenses: Improved reimbursement accuracy through improved claim data quality
Improved Cost of Quality: Faster identification, prioritization, and resolution of quality issues
Accurate Environmental Performance Reporting
IT Architecture Standardization: One strategic platform for business intelligence and compliance reporting
Figure: A generic DW framework
Data sources: ERP; Legacy; POS; Other OLTP/Web; External data
ETL process: Select, Transform, Extract, Integrate, Load
Enterprise data warehouse, with metadata and replication
Data marts: Operations; Marketing; Finance; (...); or the no data marts option
API/Middleware
Applications (Visualization): Data/text mining; custom built applications; OLAP, dashboard, Web; routine business reporting
Figure: DW architectures
3-tier architecture: Tier 1: Client workstation; Tier 2: Application server; Tier 3: Database server
2-tier architecture: Tier 1: Client workstation; Tier 2: Application & database server
Figure: A Web-based DW architecture
Components: Client (Web browser); Internet/Intranet/Extranet; Web server; Web pages; Application server; Data warehouse
Figure: Alternative DW architectures
(a) Independent Data Marts Architecture: Source systems; staging area; ETL; independent data marts (atomic/summarized data); end user access and applications
(b) Data Mart Bus Architecture with Linked Dimensional Data Marts: Source systems; staging area; ETL; dimensionalized data marts linked by conformed dimensions (atomic/summarized data); end user access and applications
(c) Hub and Spoke Architecture (Corporate Information Factory): Source systems; staging area; ETL; normalized relational warehouse (atomic data); dependent data marts (summarized/some atomic data); end user access and applications
(d) Centralized Data Warehouse Architecture: Source systems; staging area; ETL; normalized relational warehouse (atomic/some summarized data); end user access and applications
(e) Federated Architecture: Existing data warehouses, data marts, and legacy systems; data mapping / metadata; logical/physical integration of common data elements; end user access and applications
Figure: The ETL process
Components: Packaged application; legacy system; other internal applications; transient data source; data warehouse; data marts
Process label: Extract (appears four times in the figure text)
Figure: A 3-dimensional OLAP cube with slicing operations
Dimensions: Product, Time, Geography
Cells are filled with numbers representing sales volumes
Slice 1: Sales volumes of a specific Product on variable Time and Region
Slice 2: Sales volumes of a specific Region on variable Time and Products
Slice 3: Sales volumes of a specific Time on variable Region and Products
Figure: Balanced Scorecard
Vision & Strategy, with four perspectives: Internal Business Process Perspective; Customer Perspective; Financial Perspective; Learning and Growth Perspective