Developing a Modern Data Architecture Overview White Paper
DEVELOPING A MODERN ENTERPRISE DATA STRATEGY
Edd Wilder-James, Scott Kurth
March 2017
22 © 2017 SILICON VALLEY DATA SCIENCE LLC. ALL RIGHTS RESERVED. @SVDataScience
TODAY’S SCHEDULE
Introduction
Why Have a Data Strategy?
Connecting Data with the Business
Understanding Data Gaps
The Data Platform Architecture
Break
Identifying Strategic Workloads
The Chief Data Officer
The Experimental Enterprise
INTRODUCTION
To view SVDS speakers and scheduling, or to receive a copy of our slides, go to:
www.svds.com/StrataCA2017
Silicon Valley Data Science is a boutique consulting firm focused on transforming your business through data science and engineering.
WE DO DATA RIGHT
• We work in cross-functional teams made up of data scientists, engineers, and solutions architects.
• We combine enterprise know-how with custom methods derived from Silicon Valley best practices.
• We use an Agile Software Development approach to make rapid progress against difficult problems that require flexibility.
• We focus on delivering business value as early as possible, then iterating toward the larger goal.
OUR SERVICES
DATA STRATEGY
AGILE ENGINEERING
AGILE DATA SCIENCE
ARCHITECTURE
Supports investigative work and builds a solid layer for production.
Conducts experiments and responds to the changing environment.
Makes foundational infrastructure readily accessible.
THE EXPERIMENTAL ENTERPRISE
THE DATA VALUE CHAIN DRAW VALUE FROM YOUR STRATEGIC DATA ASSETS
DISCOVER INGEST PROCESS PERSIST INTEGRATE ANALYZE EXPOSE
WHAT’S ON YOUR MIND? What is preventing your organization from realizing its vision?
WHY HAVE A DATA STRATEGY?
DATA STRATEGY is not for the faint of heart*
* Creating an Enterprise Data Strategy by Wayne Eckerson http://www.enterprisemanagement360.com/white_paper/creating-an-enterprise-data-strategy/
The alternative is to treat data as a cost of business, to be minimized.
Data must serve the strategic imperatives of a business: the key strategic aspirations that define the future vision for an organization.
IS THERE AN ALTERNATIVE?
A modern data strategy is a roadmap to enable data-driven decision-making and applications that help an enterprise achieve its strategic imperatives.
An effective data strategy helps an enterprise make technology choices, grounded in business priorities, to get the most value from its data.
IS THERE AN ALTERNATIVE?
CONNECTING TECHNOLOGY AND BUSINESS VALUE If you find that:
• you can’t articulate how the cost of your data systems relates to the benefits to your business, or
• you can’t articulate how your technology philosophy enables your business aspirations
then your organization would almost certainly benefit from a data strategy.
Poll:
• Does the technology leadership in your organization prioritize investments to meet the ambitions of the business?
• Can your organization clearly articulate the business impact of the data and technology investments it makes?
ARTICULATING THE BUSINESS IMPACT OF DATA & TECHNOLOGY
CONNECTING DATA WITH THE BUSINESS
CLEAN VALIDATE CONTROL PROTECT
CONVENTIONAL DATA STRATEGY “WHAT YOU DO TO DATA”
CONVENTIONAL WISDOM: 10 THINGS A DATA STRATEGY SHOULD INCLUDE*
1. What data should be collected?
2. How long should data be kept?
3. Where should the data be stored?
4. How will data privacy and security be managed?
5. From where can data be accessed?
6. What data can be displayed?
7. What level of detail should be retained?
8. Who is responsible for the data (governance)?
9. How is data integrated?
10. How will data be distributed (virtualization)?
* 10 Key Elements of your Data Strategy by Mike Schiff http://www.tdwi.org//Articles/2012/01/17/10-Elements-Data-Strategy.aspx?Page=1
MODERN DATA STRATEGY “WHAT YOU DO WITH DATA”
TARGET VIP CUSTOMERS ATTRACT NEW CUSTOMERS
AUTOMATE
A NEW ORTHODOXY? FOUR PRINCIPLES OF A SUCCESSFUL DATA STRATEGY*
1. How does data generate value?
2. What are our critical data assets?
3. What is our data ecosystem?
4. How do we govern data?
* The 4 Principles of a Successful Data Strategy by Paul Barth http://www.cioupdate.com/insights/article.php/3936706/The-4-Principles-of-a-Successful-Data-Strategy.htm
EDW Governance Security
NOT ALL DATA IS EQUAL
Conventional data strategy
EDW Governance Security
NOT ALL DATA IS EQUAL
Modern data strategy
WHAT IS A DATA STRATEGY?
Existing data & technology
Possible data & technology
Business strategic ambitions
Constraints Priorities
Roadmap of investments
Tools to update and assess roadmap
Plan to update capabilities
Modern Role of Data: Represents the new role data and analytics play in the enterprise.
Outcomes, not Operations: A strategic notion of maturity should begin with value creation before addressing underlying operational processes.
Transforming Pragmatically: Changes are grounded in the holistic view of the future state of your enterprise.
A NEW NOTION OF MATURITY
An organization’s ability to derive value from its data defines its maturity.
NEW STAGES, NEW DIMENSIONS
ASSETS
CULTURE
DECISIONS
OUTCOMES
Illustrative
Not just the technology! • People • Processes • Systems
DIMENSIONS OF DATA MATURITY
CURIOUS WHERE YOU FALL?
ASSETS
CULTURE
DECISIONS
OUTCOMES
Illustrative
Maturity Mini-Assessment
• 20-question survey (5-10 min)
• Identifies your stage and provides general recommendations
• Creates baseline for future performance and growth
dmm.svds.com
• Infrastructure is holding back growth
• Infrastructure is holding back development
• Analog to digital transformation
• Changing business models
• Unifying fragmented offerings
YOU NEED A DATA STRATEGY WHEN…
BEGIN WITH THE BUSINESS
• First understand what drives your business
• Then make the leap from strategy to tactics
Technologists: This can’t be done without the business leaders in the room
Business Leaders: This can’t be done without the technologists in the room
Understand the strategic imperatives of your organization:
• Annual report
• Investor updates
• Talk to leadership
STRATEGIC IMPERATIVES
Break down the strategic imperatives to make them tangible, achievable, and measurable. These become your business objectives.
Business objectives provide the guide for many other analyses in building your data strategy.
BUSINESS OBJECTIVES
REAL ESTATE MARKETPLACE: ZILLOW
STRATEGIC IMPERATIVES
• Provide products and services to help consumers with every stage of home ownership – buying, selling, renting, borrowing, and remodeling
• Generate more subscription and ad revenue
• Drive more unique users to marketplace
• Become leading real estate and home-related information marketplace on mobile and web
NOTE: Zillow is not an SVDS client.
REAL ESTATE MARKETPLACE: ZILLOW
BUSINESS OBJECTIVES
1. Build and maintain best algorithms for pricing
• Use Hedonic pricing method to incorporate multiple attributes and ‘nearest neighbors’ to create accurate Zestimate®
• Deploy sophisticated and adaptive models, at scale (over 110 million homes) and at timely interval (3 times / week)
• Use scalable infrastructure (cloud) for rapid analysis
NOTE: Zillow is not an SVDS client.
REAL ESTATE MARKETPLACE: ZILLOW
BUSINESS OBJECTIVES
2. Build industry’s best real estate data sets
• Increase completeness of data by including public data sets such as construction listings, foreclosure listings, and market context
• Capture unique data with customer reviews and feedback from real-estate firms
• Manage scale of 110 million properties and growing
NOTE: Zillow is not an SVDS client.
HEALTH PROVIDER: KAISER PERMANENTE
STRATEGIC IMPERATIVES
• Provide seamless, personalized care through an integrated team of care providers
• Enable members to manage their own care through easy-to-use channels
• Transform care and improve outcomes through investments in research and innovation
NOTE: Kaiser Permanente is not an SVDS client.
HEALTH PROVIDER: KAISER PERMANENTE
BUSINESS OBJECTIVES
• Increase data sharing with extended care teams through secure electronic health record access
• Provide quicker, better diagnoses through evidence-based medicine techniques
• Provide mobile access to scheduling, pharmacy interactions, and other related services
NOTE: Kaiser Permanente is not an SVDS client.
HEALTH PROVIDER: KAISER PERMANENTE
BUSINESS OBJECTIVES
• Improve member satisfaction by analyzing web and mobile user interactions, behavior, and feedback data
• Share access to knowledge, innovation, and population data with the public and other health care leaders
NOTE: Kaiser Permanente is not an SVDS client.
UNDERSTANDING DATA GAPS
NO ONE'S DATA IS PERFECT
Commonly-asked questions:
• Do I have gaps in my data?
• How good is my data?
• Is my data clean enough?
None of these questions make sense unless you ask: For what?
FOR WHAT?
• Do I have gaps in my data? …for understanding customer purchase behavior
• How good is my data? …for predicting quarterly sales
• Is my data clean enough? …for automating production
• What are you trying to achieve as a business [with data]? These are your business objectives.
• How do you plan to achieve it [with data]? These are your use cases.
UNDERSTAND YOUR BUSINESS GOALS
UNDERSTAND YOUR AUDIENCE Who is going to use this analysis and how?
• CDO? Heads of Business Units? Data Science Directors? DBAs?
• Project assessment? Operational dashboard? Continuous improvement plan?
Understanding stakeholders and expectations will dictate the level of technical analysis required.
UNDERSTAND YOUR AUDIENCE What are the dimensions of requirements that matter to your audience?
• For a technical application, it might be depth, breadth, latency, frequency.
• For an executive perspective, it might be higher-order requirements like ease of integration or coverage.
What are the questions your audience needs answered? Select the dimensions that provide visibility into those questions.
• Start with an effective catalog of your data.
• Organize the data to be effective. Think about how data is produced AND how it gets used in your organization.
• By data source?
• By entity?
• By organization?
• By data owner?
UNDERSTAND YOUR DATA
LINK IT ALL TOGETHER
Business Objectives
Use Cases
Requirements
Data
VISUALIZE YOUR GAPS
SO… WHAT IS A “GAP”?
Two schools of thought:
• Purists: If a requirement isn’t met, it’s a gap.
• Pragmatists: If you can still get the job done, it isn’t a gap.
Both views can be valuable ways of looking at your analysis.
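The two views can be compared side by side once each requirement is written down against the data that is actually available. The following is a minimal sketch, not the presenters' method: the `Requirement` fields, the `latency_hours` dimension, the `workaround` flag, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    use_case: str
    dimension: str     # hypothetical dimension name, e.g. "latency_hours"
    needed: float      # level the use case requires (lower is better here)
    available: float   # level the current data provides
    workaround: bool   # True if the job can still be done despite the shortfall


def purist_gaps(reqs):
    """Purist view: any unmet requirement is a gap."""
    return [r for r in reqs if r.available > r.needed]


def pragmatist_gaps(reqs):
    """Pragmatist view: an unmet requirement is a gap only if nothing works around it."""
    return [r for r in purist_gaps(reqs) if not r.workaround]


# Hypothetical requirements, loosely named after the "for what?" examples.
requirements = [
    Requirement("predict quarterly sales", "latency_hours", needed=24, available=48, workaround=True),
    Requirement("automate production", "latency_hours", needed=1, available=24, workaround=False),
    Requirement("understand purchase behavior", "latency_hours", needed=24, available=12, workaround=False),
]

for gap in pragmatist_gaps(requirements):
    print(gap.use_case, gap.dimension, gap.needed, gap.available)
```

Keeping both lists is cheap: the purist list shows where the data falls short on paper, while the pragmatist list shows where work is actually blocked.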
THE DATA PLATFORM ARCHITECTURE
WHY BIG DATA? 1. New Capabilities
2. Economic Scalability
Edmunds.com wanted to reduce time-to-market by speeding creation of attribute data for new car models.
We developed a new capability to automatically extract vehicle features from specification guides and categorize the features into appropriate vehicle classes.
DATA PLATFORMS FOR NEW CAPABILITIES
Existing revenue streams:
• Ads
• Price quotes (leads)
Shopping is the focus:
• Need real-time inventory
• Accurately described VINs
DATA PLATFORMS FOR ECONOMIC SCALABILITY at NetApp
NOTE: NetApp is not an SVDS client. http://blogs.wsj.com/cio/2012/06/12/netapp-cio-uses-big-data-to-assess-product-performance
UP VS. OUT — SAAS EDITION
[Chart: money ($, €, ¥, £) against users, plotting revenue, cost to serve, and scale-out cost, with profit and loss regions.]
UP VS. OUT — ENTERPRISE EDITION
[Chart: money ($, €, ¥, £) against data resource usage, plotting scale-up cost and scale-out cost across use cases UC1 to UC5.]
BIG DATA … it’s really about agility
• Linear scale-out cost
• Opex vs. capex
• Ease of purchase
BUYING AGILITY
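To make the "linear scale-out cost" point concrete, here is a rough sketch that compares a scale-up system bought in large capacity tiers with a scale-out system bought node by node. The step-function shape for scale-up, the capacities, and the prices are all assumptions made for illustration; none of the figures come from the slides.

```python
import math


def scale_up_cost(load, tier_capacity=100, tier_price=50_000):
    """Assumed step function: capacity is bought in large tiers, paid for whole."""
    tiers = max(1, math.ceil(load / tier_capacity))
    return tiers * tier_price


def scale_out_cost(load, node_capacity=10, node_price=4_000):
    """Assumed near-linear cost: small nodes are added as load grows."""
    nodes = max(1, math.ceil(load / node_capacity))
    return nodes * node_price


# Compare the two purchasing models at a few hypothetical load levels.
for load in (10, 50, 100, 110):
    print(load, scale_up_cost(load), scale_out_cost(load))
```

Under these assumed numbers, a small first use case pays for a whole tier in the scale-up model but only for one node in the scale-out model, which is the sense in which incremental purchasing buys agility.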
Scale-out systems move us from managing scarcity to promoting utility.
DEVELOPMENT AGILITY
• Architectural factors
  • Schema on read
  • Rapid deployment
  • Mirror production setup
  • Executes faster
• Programmer factors
  • Fun to program
  • Concision
  • Easier to test
  • Faster to write
WHAT IS DOCKER?
• Container technology: bundles every part of an application
• Provides isolation for each application without the overhead of running a virtual machine
• Ships only the parts that are needed, leaving out the operating system
WHY SHOULD BUSINESS CARE?
• Better use of server resources than virtual machines
• A fast and reliable way of deploying applications
• It’s the ideal packaging mechanism for scale-out distributed systems
• Easy for developers to work in an environment identical to production
• Sharing containers leads to innovation
WHAT IS APACHE KAFKA?
• Scale-out fault-tolerant messaging system
• Comes from LinkedIn
• Supported by Confluent
USE CASES
• Stream processing
• Log aggregation
• Creating decoupled evented architectures
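The decoupling in these use cases comes from the log abstraction: producers append events, and each consumer group reads at its own pace by tracking its own offset. The toy model below illustrates that idea in plain Python. It is not Kafka's client API; the `Topic` class, its method names, and the group names are hypothetical, and it ignores partitions, persistence, and fault tolerance.

```python
class Topic:
    """Toy append-only log; each consumer group tracks its own read offset."""

    def __init__(self):
        self._log = []        # events in arrival order
        self._offsets = {}    # consumer group name -> next offset to read

    def publish(self, event):
        """Append an event and return its offset in the log."""
        self._log.append(event)
        return len(self._log) - 1

    def poll(self, group, max_events=10):
        """Return the next unread events for this group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._log[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch


clicks = Topic()
clicks.publish({"user": "u1", "page": "/home"})
clicks.publish({"user": "u2", "page": "/cart"})

# Two independent consumers see the same events without knowing about each other.
analytics_batch = clicks.poll("analytics")
alerting_batch = clicks.poll("alerting", max_events=1)
```

Because the producer never calls the consumers directly, a new consumer group can be added later and replay the log from the start without any change to the producing application.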
WHY SHOULD BUSINESS CARE?
• Scalability in a critical area of distributed applications
• Online reliability, compared to alternatives
• Will be a core building block of distributed data architecture
WHAT IS APACHE SPARK?
• In-memory distributed computing platform
• Comes from Berkeley AMPLab
• In production with early adopters, now integral to every commercial Hadoop distribution
• Doesn’t need Hadoop, but runs easily on top
USE CASES
• Managing a major retailer’s inventory across a diverse network of entities in near real time
• Managing and processing event streams for online gaming
• Supporting data science initiatives across massive data sets at a media analytics company
WHY SHOULD BUSINESS CARE?
• Enables use cases Hadoop didn’t provide, all in one platform
  • Streaming, interactive analytics, machine learning, graphs
• Fast
  • Iteration time down, more productive
• Use existing cluster investment
  • Sits on HDFS, can run under YARN (or use Amazon S3, or Cassandra)
WHY SHOULD BUSINESS CARE?
• SparkSQL
  • Use SQL skills and tools, e.g. Tableau
  • Dataframes integrate external data sources into one context: RDBMS, Hive, JSON…
• Developer-friendly
  • Concise and fluid to program
  • Language integration: Scala, R, Python, Java
WHAT ARE NOTEBOOKS?
• Interactive documents that contain a program and its output
• Long history: Mathematica
• Particularly successful with data science
• Projects to watch
  • Jupyter: https://jupyter.org/
  • Apache Zeppelin: https://zeppelin.incubator.apache.org/
WHY SHOULD BUSINESS CARE?
• Easy collaboration and sharing of data science
  • Think “Docker for analysis”
• Easy access to data and compute resources
• A building block for more self-service analytical capabilities
Commercial version of Notebooks + Spark is the Databricks Cloud
ENTERPRISE DATA ARCHITECTURE
Towards a production
DATA PLATFORM
Data Management Security, Operations, Data Quality, Meta Data Management and Data Lineage
Analytics
Low Latency Access
Data Ingest
Data Repository
Persistence
Offline Processing
Real-Time Processing
Batch Processing
Data Services
External Systems
Data Acquisition
Internal External
CHOICES: TOOLS
CHOICES: DATABASES
Technical use cases by database type:
• Graph: social networks, ontologies, knowledge, property
• Document: logging, document archive, web content
• Key-Value: shopping cart, session data
• Columnar: sensors, network devices, Internet of Things
CHOICES: DATABASES SPECIALIZED
CHOICES: DATABASES GENERAL PURPOSE
CHOICES: VELOCITY
SVDS R&D TRAINS
Batch:
• Using FFT-transformed frequency data, identify the train from the fundamental frequencies of its whistle.
• Construct the decision tree for the train classifier based on minimum and maximum fundamental frequencies
Real-Time:
• Apply FFT to audio signal
• Extract min and max fundamental frequencies
• Classify the train into local or express
• Send data to the Event Detector to alert the APP
• Store results in HBase
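The real-time steps above can be sketched in a few lines. This is a simplified illustration, not the SVDS implementation: it uses a naive discrete Fourier transform from the standard library instead of an optimized FFT, reduces the decision tree to a single hypothetical threshold on one dominant frequency, and the 400 Hz cut-off and the mapping of pitch to "local" or "express" are assumptions.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n


def classify_train(freq_hz, threshold_hz=400.0):
    """Hypothetical one-split decision tree: higher-pitched whistle means express."""
    return "express" if freq_hz >= threshold_hz else "local"


def make_tone(freq_hz, sample_rate=2000, n=200):
    """Synthetic stand-in for a whistle recording: a pure sine wave."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


for tone_hz in (300, 500):
    freq = dominant_frequency(make_tone(tone_hz), sample_rate=2000)
    print(tone_hz, freq, classify_train(freq))
```

In the pipeline described on the slide, the batch side would learn the thresholds from historical recordings, and the real-time side would apply them to each incoming audio window before alerting and storing the result.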
[Amazon] do services because they've come to understand that it's the Right Thing. There are without question pros and cons to the SOA approach, and some of the cons are pretty long. But overall it's the right thing because SOA-driven design enables Platforms. … You wouldn’t really think that an online bookstore needs to be an extensible, programmable platform. Would you?
+Steve Yegge
CHOICES: SERVICES
https://plus.google.com/112678702228711889851/posts/eVeouesvaVX
CHOICES: DATA RESILIENCY
Hard Failure: If the data source is broken, so is the app.
Stovepipe: One-to-one relationship from data source to product.
Multi-sourced: Redundancy of overlapping data sources makes your products more resilient.
Graceful Degradation: If a data source breaks, there is a backup and your app continues to function.
Production data services abstract the probabilistic integration of overlapping data sources. We call this model a Data Mesh.
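Graceful degradation can be illustrated with a small data service that tries overlapping sources in priority order and returns the first usable answer. This sketch shows only simple fallback, not the probabilistic integration the slide describes; the function names, source names, and inventory values are hypothetical.

```python
def fetch_with_fallback(sources, key):
    """Try each (name, fetch) pair in priority order; return the first answer."""
    errors = []
    for name, fetch in sources:
        try:
            return name, fetch(key)
        except Exception as exc:  # any failure moves on to the next source
            errors.append((name, exc))
    raise LookupError(f"all sources failed for {key!r}: {errors}")


def primary_feed(key):
    """Hypothetical primary source, simulated here as broken."""
    raise ConnectionError("primary feed is down")


def backup_feed(key):
    """Hypothetical overlapping source with a small static inventory."""
    return {"sku-1": 12}[key]


source, quantity = fetch_with_fallback(
    [("primary", primary_feed), ("backup", backup_feed)], "sku-1"
)
print(source, quantity)
```

Returning the source name along with the value lets the application report that it is running on backup data instead of failing outright.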
CHOICES: EXTERNAL SYSTEMS Applications, visualization, business intelligence
✓ Incremental revenue
✓ Time to market
✓ Economically viable implementation
✓ Cost avoidance
✓ Brand benefit
✓ Ecosystem friendliness
DEFINING SUCCESS
BREAK
IDENTIFYING STRATEGIC WORKLOADS
HOW SVDS DOES DATA STRATEGY
• We work with your stakeholders to analyze and articulate a data strategy.
• The data strategy provides an actionable roadmap that generates immediate value and serves as the foundation for future capability investments.
• We work to understand your current business and technology landscapes in order to unlock untapped business opportunities.
• Our collaborative approach ensures that your business, product, and technology teams become effective advocates within your organization.
BUSINESS MODEL TRANSFORMATION
PRODUCT RESEARCH & RECOMMENDATION COMPANY
A product research and recommendation company is transforming their core business from content and information services to a referrer of high-value transactions to partners.
SVDS devised a data strategy that enables new analytical capabilities core to their retail ambitions, addressing critical accuracy and timeliness issues with unstructured data.
Based on this data strategy, they are building a solution for near real-time product inventory that increases their value to partners in a complex, multi-tier market.
PERSONALIZED USER EXPERIENCE
MEDIA & ENTERTAINMENT COMPANY
A media and entertainment company seeks to deliver personalized content directly to users on digital entertainment devices.
SVDS developed a data strategy and architecture that enables real-time data ingestion, deeper customer insight, and highly-personalized content recommendations.
The data strategy and architecture design now serve as the foundation for iterative, new product development and guide technology investments and acquisitions.
ACTION PLAN & ROADMAP
OUR METHOD FOR DATA STRATEGY
IDENTIFY STRATEGIC IMPERATIVES
DEFINE BUSINESS OBJECTIVES
DEFINE DATA REQUIREMENTS
IDENTIFY GAPS IN CURRENT SYSTEMS & TECHNOLOGY
MAP BUSINESS OBJECTIVES TO USE CASES
RATIONALIZE USE CASES INTO WORKLOADS
IDENTIFY YOUR STRATEGIC WORKLOADS
[Figure: use cases 1, 2, and 3 each decompose into workloads drawn from A, B, C, and D; several workloads recur across use cases.]
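Rationalizing use cases into workloads amounts to counting how many use cases depend on each workload, so that widely shared workloads surface first. The sketch below assumes a hypothetical mapping; it is not a reconstruction of the diagram on the slide.

```python
from collections import Counter

# Hypothetical mapping of use cases to the workloads they require.
use_case_workloads = {
    "use case 1": ["A", "B"],
    "use case 2": ["B", "C"],
    "use case 3": ["B", "C", "D"],
}


def workload_reuse(mapping):
    """Count the use cases that need each workload, most shared first."""
    counts = Counter(w for workloads in mapping.values() for w in set(workloads))
    # Sort by descending count, then by name, so the order is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


for workload, n_use_cases in workload_reuse(use_case_workloads):
    print(workload, n_use_cases)
```

A workload needed by every use case is a natural candidate for early investment, since building it once unlocks several use cases at the same time.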
AN EXAMPLE DATA STRATEGY FOR THE DOGS
AN EXAMPLE DATA STRATEGY FOR THE DOGS
“We've been investing in new capabilities to help us capture and use customer and pet data, and this year, we will deliver on new methods to use this data to drive growth.”
David Lenhardt, PetSmart CEO
NOTE: PetSmart is not an SVDS client. This is a fictional example based on public information. http://risnews.edgl.com/retail-news/PetSmart-Leverages-Analytics-for-Personalized-Experience91783
STRATEGIC IMPERATIVES
STRATEGIC IMPERATIVES
BUSINESS OBJECTIVES
USE CASES
WORKLOAD
Our strategy: “To be the preferred provider for the lifetime needs of pets.”
Connect with pet parents in a personalized way
Attract and retain our most valuable customers
Provide innovative products & services at fair prices
Drive consistent execution in our stores
AN EXAMPLE
Strategic imperative: Connect with pet parents in a personalized way
Business objective: Deliver personalized recommendations and offers
Use cases:
• Recommend new pet products based on past purchases at point of sale
• Recommend upcoming store/community events based on customer preferences
Workload: Recommendation Engine
BUSINESS OBJECTIVES (illustrative)
Connect with pet parents in a personalized way
1. Learn from consumer interactions
2. Optimize consumer journeys based on insights
3. Deliver personalized content to customers
. . .
USE CASES (illustrative)
Deliver personalized content to customers
1. Identify customers
2. Profile behaviors
3. Understand context
4. Anticipate behaviors
5. Optimize personalization
. . .
WORKLOADS
Data Value Chain: Example Workloads
• Acquire: capture mobile app transactions; accessing streaming web activity data
• Ingest: flexible data ingestion; ingest unstructured data
• Process: data validation; omnichannel data integration
• Persist: heterogeneous data storage; scalable data storage
• Analyze: probabilistic data integration; predictive modeling
• Expose: service based data access; interactive visualization
TECHNICAL WORKLOADS (illustrative)
Data value chain: Acquire, Ingest, Process, Persist, Analyze, Expose
1. Identify customers: technical workloads
• Customer data (Acquire, Ingest, Persist): acquire multiple data sources & formats; flexible data ingestion; flexible & scalable data storage and processing
• Identity resolution: probabilistic data integration
• Data cleansing: data validation
• Householding: probabilistic data integration
• Relationship context: detailed views of entities
• Life-time Value: feature engineering
TECHNICAL WORKLOADS (illustrative)
2. Profile behaviors: technical workloads
• 360 degree view of customer: detailed views of entities
• Views of historical transactions: time series analysis
• Determination of ‘favorites’: predicting customer behavior
• Map to archetype: stream processing
• Evaluate previously unseen transactions and classify: stream processing
• Update archetypes: feature extraction; analyze customer behavior
3. Understand context: Technical Workload
Characterize temporal customer behavior
• Feature engineering
• Analyze customer behavior
Determine goal of next interaction
• Predictive modeling
Categorize content needs
• Predictive modeling
Illustrative
4. Anticipate behaviors: Technical Workload
Score product offers with likelihood to respond
• Integrate internal systems
• Service based data access
Score content options with likelihood to respond
• Integrate internal systems
• Service based data access
Identify next best action
• Third party structured data integration
• Business rules execution
Illustrative
5. Optimize personalization: Technical Workload
Apply business rules, constraints to personalization options
• Business rule execution
Select optimal personalization to achieve goal
• Optimization execution
Illustrative
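Several technical workloads recur across the five use cases above. The Python sketch below is one possible way to surface them: it transcribes the use case mappings from the slides and counts how many use cases need each workload. The names USE_CASES, workload_reuse, and shared_workloads are illustrative and not part of the original deck. Treating "Business rule execution" and "Business rules execution" as the same workload is an assumption.

```python
from collections import Counter

# Use case -> capability -> technical workloads, transcribed from the slides.
# Assumption: "Business rule execution" and "Business rules execution"
# name the same workload, so both are spelled "Business rules execution" here.
USE_CASES = {
    "Identify customers": {
        "Customer data": [
            "Acquire multiple data sources & formats",
            "Flexible data ingestion",
            "Flexible & scalable data storage and processing",
        ],
        "Identity resolution": ["Probabilistic data integration"],
        "Data cleansing": ["Data validation"],
        "Householding": ["Probabilistic data integration"],
        "Relationship context": ["Detailed views of entities"],
        "Life-time Value": ["Feature engineering"],
    },
    "Profile behaviors": {
        "360 degree view of customer": ["Detailed views of entities"],
        "Views of historical transactions": ["Time series analysis"],
        "Determination of 'favorites'": ["Predicting customer behavior"],
        "Map to archetype": ["Stream processing"],
        "Evaluate previously unseen transactions and classify": ["Stream processing"],
        "Update archetypes": ["Feature extraction", "Analyze customer behavior"],
    },
    "Understand context": {
        "Characterize temporal customer behavior": [
            "Feature engineering",
            "Analyze customer behavior",
        ],
        "Determine goal of next interaction": ["Predictive modeling"],
        "Categorize content needs": ["Predictive modeling"],
    },
    "Anticipate behaviors": {
        "Score product offers with likelihood to respond": [
            "Integrate internal systems",
            "Service based data access",
        ],
        "Score content options with likelihood to respond": [
            "Integrate internal systems",
            "Service based data access",
        ],
        "Identify next best action": [
            "Third party structured data integration",
            "Business rules execution",
        ],
    },
    "Optimize personalization": {
        "Apply business rules, constraints to personalization options": [
            "Business rules execution"
        ],
        "Select optimal personalization to achieve goal": ["Optimization execution"],
    },
}


def workload_reuse(use_cases):
    """Count, for each technical workload, how many distinct use cases need it."""
    counts = Counter()
    for capabilities in use_cases.values():
        # A set ensures a workload is counted once per use case,
        # even if several capabilities within that use case rely on it.
        needed = {w for workloads in capabilities.values() for w in workloads}
        counts.update(needed)
    return counts


def shared_workloads(use_cases, min_use_cases=2):
    """Return workloads needed by at least min_use_cases use cases, sorted by name."""
    counts = workload_reuse(use_cases)
    return sorted(w for w, n in counts.items() if n >= min_use_cases)


for name in shared_workloads(USE_CASES):
    print(name)
```

Counting per use case rather than per capability is a design choice: a workload that supports two capabilities inside a single use case is not yet evidence of reuse across the portfolio.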
PRIORITIES DIMENSIONS OVERCOME YOUR ASSUMPTIONS
FOCUS ON THE VALUE
DEVELOPMENT HORIZONS Illustrative
TECHNICAL WORKLOAD PRIORITIZATION
Scoring dimensions: STRATEGIC VALUE, TECHNICAL FEASIBILITY, ACCESSIBILITY OF REQUIRED SKILLS, ARCHITECTURAL FIT, PROD ROLL-OUT EFFORT
TECHNICAL WORKLOAD: STRATEGIC VALUE
Real time recommendations: 10
Omnichannel data integration: 10
Predictive modeling: 9
Unstructured text analytics: 8
Behavioral analytics: 7
Data quality monitoring: 6
Pattern recognition: 5
Heterogeneous data storage: 3
Data ingestion: 3
Illustrative
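A prioritization like the one above can be sketched as a weighted score across the scoring dimensions. The Python below is a minimal sketch under stated assumptions, not the method used in the deck: only the strategic value scores come from the slide, and the names priority_score, rank, and the "strategic_value" key are hypothetical. If other dimensions are added, a cost-like dimension such as roll-out effort would presumably need inverting before weighting. That is also an assumption.

```python
# Strategic value scores as listed on the prioritization slide.
STRATEGIC_VALUE = {
    "Real time recommendations": 10,
    "Omnichannel data integration": 10,
    "Predictive modeling": 9,
    "Unstructured text analytics": 8,
    "Behavioral analytics": 7,
    "Data quality monitoring": 6,
    "Pattern recognition": 5,
    "Heterogeneous data storage": 3,
    "Data ingestion": 3,
}

# One score dictionary per workload. Only "strategic_value" comes from the
# slide; scores for other dimensions would be added under hypothetical keys.
workloads = {
    name: {"strategic_value": value} for name, value in STRATEGIC_VALUE.items()
}


def priority_score(scores, weights):
    """Weighted average over the dimensions that have both a score and a weight."""
    used = [dim for dim in weights if dim in scores]
    total_weight = sum(weights[dim] for dim in used)
    if total_weight == 0:
        raise ValueError("no scored dimension has a non-zero weight")
    return sum(scores[dim] * weights[dim] for dim in used) / total_weight


def rank(scored_workloads, weights):
    """Return workload names ordered from highest to lowest priority score."""
    return sorted(
        scored_workloads,
        key=lambda name: priority_score(scored_workloads[name], weights),
        reverse=True,
    )


# With strategic value as the only weighted dimension, the ranking
# follows the strategic value column of the slide.
for name in rank(workloads, {"strategic_value": 1.0}):
    print(name, priority_score(workloads[name], {"strategic_value": 1.0}))
```

Python's sort is stable, so workloads with equal scores keep the order in which they were listed.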
DEFINE YOUR ROADMAP
Plan Prove Pilot Production
We define a project plan to build a specific capability.
For each capability, we describe a project to build technical workloads that implement use cases that address high-priority business objectives.
Silicon Valley Data Science employs an agile development process as we work with our clients from planning and proofs of concept to pilot implementations and, finally, full-scale production systems.
PROJECT ACTION PLAN
Plan Prove Pilot Production Agile Build Process
Illustrative
PATH FORWARD
Illustrative
[Timeline figure: Horizon I, Horizon II, Horizon III, Horizon IV; duration labels: 2-3 months, 5-6 months, 3-4 months, 3-4 months, 0 months]
DEVISING A PROJECT PLAN: INPUTS & APPROACH
Technical Workload Assessment
Data Gaps
Project Roadmaps
Workload Rationalization Development Horizons
LATHER, RINSE, REPEAT
MAKE SURE IT’S FLEXIBLE
• Technology moves incredibly fast, and competitive landscapes are highly dynamic.
• Your data strategy should be a living document, revisited often and revised as conditions change.
MAKE SURE IT’S ACTIONABLE
• If it isn’t clear how you’re going to execute your strategy, then you don’t have the right one.
• Your strategy must work within the realm of the possible and practical.
FROM IDEA TO PRODUCTION
We identify the business goals, distill those into use cases, and then work in short, iterative cycles to achieve tangible gains.
Plan Prototype Pilot Production
What can we do with data?
MODERNIZING DATA TECHNOLOGY HEALTH MANAGEMENT COMPANY
Aging data infrastructure and brittle application integration were inhibiting growth and business insight for a health management company. Their data strategy focused on creating a concrete roadmap for migrating to a new data platform so that technology and infrastructure are no longer a barrier to growth and transparency. Based on this data strategy, they are building a new data platform in stages that allows them to add new products and services to capture more market opportunity.
Case Study: Data Strategy
Major Pharmaceutical Company
Defined a data strategy to enable business growth and expansion into new markets
Challenge
• Ongoing need to improve discovery and better predict new targets for drug development
• Difficulty integrating new data sources into identification & discovery processes
• Inability to connect business strategy & aims with specific, tangible projects
Solution
• SVDS devised a data strategy with a concrete roadmap for migrating to a new data platform
• Recommended data technology & architecture that supports the highest-value projects
• Outlined cultural, technological, organizational, and collaboration challenges & objectives
Results
• Identified specific opportunity areas to increase GTM efficiency
• Prescribed Common Data and Analytics Platform for Commercial and R&D operations
• Recommended projects for Predictive Modeling & Data Exploration
DATA STRATEGY CHECKLIST
☐ Identify your business objectives
☐ Go from objectives to tactics
☐ Include all stakeholders in the conversation
☐ Look at how technology can support strategic workloads
☐ Exploit patterns and reuse
☐ Prioritize the possibilities to figure out where to start
☐ Define your roadmap with an end-point in mind
☐ Lather, rinse, repeat
THE CHIEF DATA OFFICER
DO YOU NEED EXECUTIVE HELP?
To download a free PDF, go to: www.svds.com/CDOreport
EMERGENCE OF THE CDO
• Started with heavily regulated industries such as government and finance
• Now becoming common in “disruptable” industries such as retail and telecommunications
RESPONSIBILITIES OF THE CDO
Centralization:
• Data from internal silos
• Data from external APIs and real-time streams
• The organization’s priorities
RESPONSIBILITIES OF THE CDO
Evangelization:
• Technical chops, business savvy, and the diplomacy skills to translate between the two
RESPONSIBILITIES OF THE CDO
Facilitation:
• Coordinate stakeholders across the organization
• Free up resources and lower barriers
• Offer tools and training to help others succeed
CHALLENGES FOR THE CDO
Building technical bridges:
• Working with data in different silos, formats, etc.
Mining for business value:
• “If you don’t have good business questions it doesn’t matter what kind of technology you have.” — Joy Bonaguro
UNDERSTANDING THE CDO
“While technology is inevitably involved when working with data, the defining goal of the CDO is not technological, but business-oriented. The ideal CDO exists to drive business value.”
— Julie Steele
DECIDING TO HIRE A CDO
Know why you want one:
• Are you part of a regulated industry?
• Do you need to move from being product-centric to customer-centric?
• Could you add products or services?
• Could your current processes and outcomes be optimized even further?
• Are there insights in one part of your company that could benefit others?
DECIDING TO HIRE A CDO
Look for the right skill set:
• Technical chops
• Business savvy
• Diplomacy and political skills
• Executive-level experience
THE AVAILABILITY GAP
“The spike in demand for Chief Digital Officers has been felt globally. In Europe, the number of search requests for this role has risen by almost a third in the last 24 months. The United States has seen the same growth in half that time.”
— Russell Reynolds Associates
PREPPING FOR SUCCESS
Companies that are eager and prepared for real change will be the most appealing to qualified CDO candidates.
THE EXPERIMENTAL ENTERPRISE
“…let's seek to understand how the new generation of technology companies are doing what they do, what the broader consequences are for businesses and the economy.”
– Marc Andreessen
DIGITAL NERVOUS SYSTEM
Data is your business.
Disruptive Change
[Figure labels: Cloud Computing, Customer Content, Internet of Things, User Experience, SaaS & Apps, Business Intelligence, Consumer IT, Regulation, Employees, Partners, Contractors, Suppliers, ?]
FROM: Innosight Executive Briefing Winter 2012
SILICON VALLEY’S DATA MACHINE
UP VS. OUT
[Figure: cost ($, €, ¥, £) against data resource usage, comparing scale-up cost and scale-out cost across use cases UC1 to UC5]
The legacy of big data is business agility.
BUILD FOR EXPERIMENTS
• Make it cheap
• Failure as a feature
• Ask good questions
• Make it quick
• Both learning and adaptation
• Enable the feedback loop
• Don’t break things
• Make operations a platform for innovation
• APIs, platforms, simulation
THE EXPERIMENTAL ENTERPRISE
Supports investigative work and builds a solid layer for production.
Conducts experiments and responds to the changing environment.
Makes foundational infrastructure readily accessible.
LEAD A DATA REVOLUTION
• You can only win with situational awareness
• New architectures offer new opportunities
• Creation of data-driven value requires a new approach
• Create an Experimental Enterprise
• Business must lead, and understand the potential of the technology
To view SVDS speakers and scheduling, or to receive a copy of our slides, go to:
www.svds.com/StrataCA2017
THANK YOU
Ask how we can help info@svds.com
Edd Wilder-James (@edd)
Scott Kurth (@ScottWKurth)
March 2017