Enabling Digital business with IBM
Governed Data Lake
Karan Sachdeva (karan@sg.ibm.com)
IBM Big Data Analytics
Sales Leader
Asia Pacific
Connect with me at -
https://www.linkedin.com/in/karan20/
1. Digital and AI Challenges
2. How IBM Governed Data Lake can help
3. Industry Use Cases
4. IBM Governed Data Lake Building Blocks
   1. Collect
   2. Govern
   3. Analyze (Data Science and ML)
5. How to start your Governed Data Lake journey
IBM Cloud / © 2018 IBM Corporation
Companies are pouring investment into AI and ML
AI and ML are the basis of Digital Transformation
9.8X increase in demand for ML engineers over the last 5 years
$39 Billion+ investment by companies in AI during 2017, led by tech, finance, and automotive
83% of companies say AI is strategically important to their business
Why are companies struggling to capture the value of AI and ML for their
Digital Transformation projects?
AI and ML are the basis of Digital Transformation
IA and data are the basis of AI and ML
80% say they do not understand the data needed for AI
There is no AI without IA
Only 20% of companies say they have successfully adopted AI
Barriers to successful AI and Digital transformations
There is no Artificial Intelligence (AI) without Information Architecture (IA)
Data Ecosystem
• Data in silos
• Difficult to access
• No lineage
Analytics Tools
• Discrete tools
• Different preferences
• Difficult to manage
Workflow
• Not integrated
• Not governed
• Lack dev/prod parity
Culture
• Not collaborative
• Slow provisioning
• Lack trust in AI
Persist
Analyze
Ingest Deploy
Data | Assets | Pipelines | APIs
Intelligent governance | Metadata Management
What you need is an integrated platform based on open standards
Governed Data Lake
Core Tenets
1. Intelligent by design
2. Based on open standards and extensible
3. Collaborative for data professionals
4. Self-service access to trusted data
5. Best-in-class streaming and real-time analytics
Collaborate
Data steward | Data scientist | Data engineer | Developer
Find | Share
Industry Use Cases
Governed Data Lake Use Cases: Burning Business Problems More Robustly Addressed
Financial Services, Insurance
• Customer 360
• Fraud
• Compliance (GDPR, PDPA, etc.)
• Risk Analytics
• Operational Data Store
• Predictive Analytics
Telco, Media, Energy & Utilities
• Customer 360
• Customer Insights
• Network Optimization
• Data Monetization
• EDW Augmentation
• Predictive Maintenance
Retail, Ecommerce
• Personalized Customer Offers
• Omni-channel Customer Experience
• Loyalty Programs
• Next Best Offer/Action
• Recommendation Engine
• Agile Supply Chain
Manufacturing, Industrial
• Connected: Car, Plane, Equipment
• Agile Supply Chain
• Predictive Maintenance
• IoT Data enabled “Smart Services”
Government, Public Sector
• Border Control
• Public Safety / Intelligence
• 360 Tax Payer
• Tax Optimization
• Cyber Threat
• Citizen Self Service
• Social Services Fraud
IBM Governed Data Lake
Open Standards | Governance | Machine Learning & Data Science
Enterprise data warehouse (EDW) Modernization | EDW Offload | Teradata/Oracle Refresh
Example: Governed Data Lake vision for Retail
Integrate inbound touchpoints: offer the best offer for optimum outcomes
Retail Data | Governed Data Lake | Insight | Action | Business Outcomes

Retail Data
- Behavioural: CRM, Store, POS
- Descriptive: Location, Mobile, Demographic, Weather
- Interaction: Web clickstreams, Call center notes, Emails
- Attitudes: Social media sentiments

Governed Data Lake
- Continual data feeds: to support real-time decision making at every point of customer interaction
- Leverage all data: high volumes
- Integrate data: traditional CRM and POS data, combined with modern sources
- Capture and access all data: paper-based customer notes, call center notes
- Customer identification: single record of customer detail
- Text processing: social media comments, call center logs

Insight
- Understanding your customer: helps retailers understand the “why” question and not only the “who/what/where/when”, providing greater insight into customer behavior and buying patterns; real-time analytics to anticipate customer behavior
- Analytical questioning: exploration of the data
- Business user dashboards and visualisation across multiple departments: upsell/cross-sell options, marketing campaign effectiveness, product analysis, comprehensive view of a customer
- Improve customer segmentation: advanced customer analytics to better define homogeneous customer clusters

Action
- Cross-channel delivery of the best action to address customer need and enhance long-term business revenue
- Relevant & timely marketing offers: highly personalized communications and offers, making your customer relationship management more proactive
- Consistency across all customer interaction points: Web, Mobile, Call Center, Email, Social Media

Business Outcomes
- Improve service delivery and customer satisfaction
- Optimize revenue-generating actions such as upsell, cross-sell and retention
- Increase strategic lifetime value and loyalty

Components
- Hadoop System: Data Federation, Data Science Models, Streaming Data
- Hadoop + Big SQL
- Info Integration & Governance
- Entity Matching
- Predictive Analytics
- Social Media Analytics
- Discovery & Exploration
- Business Intelligence
- Prescriptive Analytics
- IBM Data Science Experience
- Decision Optimization
- Social Merchandise: social data (internal and external), frameworks, models and dashboards

Retail, Ecommerce
• Personalized Customer Offers
• Omni-channel Customer Experience
• Loyalty Programs
• Next Best Offer/Action
• Recommendation Engine
• Agile Supply Chain
Select the entry points to your Governed Data Lake journey
Disruptive | Competitive | Optimized
Data & Insight Accessibility
Collect: Big Data made accessible and simple
Govern: Data assets made understood, protected and trusted
Data Science: ML, AI, Optimize and Automate; Natural Language Visualization and Exploration
Enterprise DW Modernization | EDW Augmentation | Customer 360 | Operational Data Store
Machine Learning | Anomaly Detection | Recommendation Engine | Cognitive Text Analytics
IBM Governed Data Lake Customer Spotlights
Collect | Organize | Analyze
Collect (Hybrid Data Management)
Understand customer behavior to make smarter marketing and programming decisions. Billions of records analyzed in seconds, rather than days, increasing on-demand viewing.
Organize (Unified Governance & Integration)
Provide a governed self-service data lake for fraud detection and customer engagement. Disciplined data classification upon entry, managing access, quality, privacy, and retention.
Analyze (Data Science, Machine Learning)
Reduce unplanned trucking standstills by helping clients better predict maintenance needs. Applied statistical and ML techniques to lower diagnostic time by 70% and repair time by 20%.
Governed Data Lake Use Cases: Burning Business Problems More Robustly Addressed
Under the hood IBM technologies
Collect: Hybrid Data Management
- Hortonworks Data Platform
- Hortonworks Data Flow
- Db2 Big SQL
- IBM Big Replicate
Govern: Unified Governance & Integration
- Information Governance Catalog
- BigIntegrate
- BigQuality
- BigMatch
- CDC for Hadoop
Analyze: Data Science & Business Analytics
- Data Science Experience Local
- Decision Optimization
- Watson Explorer (v12+)
Financial Services, Insurance
• Customer 360
• Fraud
• Compliance (GDPR)
• Risk
• Operational Data Store
• Predictive Analytics
Telco, Media, Energy & Utilities
• Customer 360
• Customer Insights
• Network Optimization
• Data Monetization
• EDW Augmentation
• Predictive Maintenance
Retail, Ecommerce
• Personalized Customer Offers
• Omni-channel Customer Experience
• Loyalty Programs
• Next Best Offer/Action
• Recommendation Engine
• Agile Supply Chain
Manufacturing, Industrial
• Connected: Car, Plane, Equipment
• Agile Supply Chain
• Predictive Maintenance
• IoT Data enabled “Smart Services”
Government, Public Sector
• Border Control
• Risk / Intelligence
• 360 Tax Payer
• Tax Optimization
• Cyber Threat
• Fraud Prevention
Capabilities: Enterprise data warehouse (EDW) Modernization / EDW Offload / Teradata Takeout
IBM Big SQL for Hadoop: Data Federation across data repositories

More intelligent analytics and insights
Embedded machine learning and data science
Drive more value from your data. Run analytics where the data lives using the tools your data professionals prefer.
§ Spark and Jupyter notebooks built-in
§ Integration with model building, BI, and visualization tools

Go at the speed of your business
Transactional and analytic processing, all in one place
Instant insight from real-time operational data for growing revenue, reducing cost and lowering risk.
§ Simplify IT with transactions and reporting (HTAP) within the same system
§ Easy, low-risk offload from an expensive data warehouse such as Teradata or Oracle

Write once, run anywhere, from any source
Common SQL engine with built-in data virtualization
Anchored by a common SQL engine to enable scalable data management solutions with portable analytics.
§ Application and operational compatibility
§ Provide transparent access to other data sources

Deploy your data where you need it
Support for on-premises or cloud, NoSQL or SQL
Offers flexibility in choosing the form factor that best suits your business, enabling a controlled journey to the cloud.
§ A platform that fits your data strategy
§ Bridge data stores for seamless data integration
Governance for Hadoop and Data Lake to drive use cases
like GDPR, PDPA…
Know your data. Trust your data. Use your data.
– Open and extensible platform
– Brings structured and unstructured data together
– Scalability and parallel processing
– Smarter metadata drives embedded governance
– Pre-built industry data models
– Unified platform with adaptive deployment and licensing
Trusted Analytics Foundation
Information lifecycle | Master data & entity insights | Governance, compliance & data protection | Integration & replication
Enterprise Information Catalog | Structured & unstructured data | AI & machine learning capabilities
Public cloud | On-premises | Private cloud
Key Offerings
IBM Information Governance Catalog | IBM BigQuality | CDC for Hadoop | IBM Big Match | IBM BigIntegrate | IBM Industry Models
IBM Data Science and Machine Learning Platform
Watson Explorer: gain deeper insights to see all contributing patterns
Watson Explorer: report, monitor and analyze your data with confidence
Decision Optimization: optimize plans based on prescriptions
Data Science Experience: develop, deploy and manage models
Decision Optimization: optimize business decisions
Learn: built-in learning to get started or go the distance with advanced tutorials
Create: the best of open source and IBM value-add to create state-of-the-art data products
Collaborate: community and social features that provide meaningful collaboration
http://datascience.ibm.com
IBM Data Science Experience
• Find tutorials and datasets
• Connect with Data Scientists
• Ask questions
• Read articles and papers
• Fork and share projects
• Watson Machine Learning
• SPSS Modeler Canvas
• Advanced Visualizations
• Projects and Version Control
• Managed Spark Service
• Code in Scala/Python/R
• Jupyter Notebooks
• RStudio IDE and Shiny
• Apache Spark
• Your favorite libraries
ML Model Lifecycle
[Figure labels: Predictive Power, 100%, Capacity, Model]
1. Build model: auto-generate a model from input data, testing various algorithms for best fit (e.g. CADS, the Model Builder)
2. Deploy model
3. Refresh model: detect loss of predictive power and refresh the model, subject to preferences
Import sources:
§ DSx Notebooks
§ DSx Flow UI
§ External tools
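The build, deploy and refresh steps of the lifecycle can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it is not CADS or any Data Science Experience API, and the candidate models (`fit_mean`, `fit_linear`), the hold-out fraction and the `tolerance` factor are all hypothetical choices made for the example.

```python
import random
import statistics

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Least-squares line y = a*x + b."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    b = my - a * mx
    return lambda x: a * x + b

# Hypothetical pool of algorithms to try during the build step.
CANDIDATES = {"mean": fit_mean, "linear": fit_linear}

def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return statistics.fmean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])

def build_model(xs, ys, holdout=0.25):
    """Step 1: fit every candidate, keep the one with the lowest hold-out error."""
    cut = int(len(xs) * (1 - holdout))
    best = None
    for name, fit in CANDIDATES.items():
        model = fit(xs[:cut], ys[:cut])
        err = mse(model, xs[cut:], ys[cut:])
        if best is None or err < best[2]:
            best = (name, model, err)
    return best  # (algorithm name, fitted model, baseline error)

def needs_refresh(model, baseline_err, xs, ys, tolerance=3.0):
    """Step 3: flag loss of predictive power on fresh data."""
    return mse(model, xs, ys) > tolerance * baseline_err

rng = random.Random(0)
xs = [i / 10 for i in range(200)]
ys = [3 * x + 1 + rng.gauss(0, 0.5) for x in xs]

# Step 2: "deploying" here just means keeping the chosen model and its baseline.
name, deployed, baseline = build_model(xs, ys)
print("selected algorithm:", name)

# Later the data drifts to a different relationship, so the model is rebuilt.
drifted = [-2 * x + 10 + rng.gauss(0, 0.5) for x in xs]
if needs_refresh(deployed, baseline, xs, drifted):
    name, deployed, baseline = build_model(xs, drifted)
    print("model refreshed with:", name)
```

The refresh check compares error on new data against the error recorded at build time, which is one simple way to express "subject to preferences": the `tolerance` value is the preference.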
Open source is a powerful engine, but as with any engine, it needs the full system to accomplish any work
We provide:
§ Security: SSO and code hardening to reduce security gaps
§ Version Currency: we keep up to date as open source quickly iterates
§ Data Connectivity: connect to data sources
§ Scalability: makes tools designed for desktops scalable to enterprise workloads
§ Enterprise IBM Support: world-class support by SMEs
Governed Data Lake – Quick Start Solutions
Data Science Sandbox Quick Start Solution
1. Receive best practices for your organization to get started with a governed data lake
2. Achieve faster time-to-value with pre-built accelerators
3. Leverage world-class data scientists and engineers with proven results at the most mature big data and data science customers
12 nodes of best-in-class open source Hadoop (IBM Hortonworks Data Platform)
+ 5 users for IBM Data Science Experience
+ 1 week of partner services engagement
= 75K USD*
*Commercials for services from third-party Partner Services
*Commercials focused on Data Science environment for Governed Data Lake
*Commercials may vary with local conditions and from country to country
*Offer valid till 31st March 2018
IBM is the industry leader in Open Source and Data Science
platforms
40,000+ Clients
in 160 countries, training
70,000+ client employees
43% Improvement
in Client Satisfaction
IBM Leadership - Total Portfolio
80% of reports, 60% of Forrester
IBM Consensus Leader in Data
Science & Business Analytics
General Excellence
reddot award
Interface Design
NPS 2017
Apache Committers Top 20
OPEN SOURCE
IBM Cloud
Call To Action:
• Identify the use case for your business. Involve us for a free Discovery workshop for Governed Data Lake.
• Read, learn and contribute at www.ibmbigdatahub.com
• Write to us: karan@sg.ibm.com
Persist
Analyze
Ingest Deploy
Projects | Data | Assets | Pipelines | APIs
Intelligent governance | Policy enforcement
What you need is an integrated platform based on open standards
Our Core Tenets
1. Intelligent by design
2. Collaborative for data professionals
3. Self-service access to trusted data
4. Best-in-class streaming and real-time analytics
5. Open and extensible
Collaborate
Data steward | Data scientist | Data engineer | Developer
Find | Share
IBM Big SQL Query Federation = virtualized data access
Transparent
§ Appears to be one source
§ Programmers don’t need to know how or where data is stored
Heterogeneous
§ Accesses data from diverse sources
High Function
§ Full query support against all data
§ Capabilities of sources as well
Autonomous
§ Non-disruptive to data sources, existing applications, systems
High Performance
§ Optimization of distributed queries
SQL tools, applications → Virtualized data → Data sources
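The "appears to be one source" idea can be illustrated with a small Python sketch. It does not use Big SQL: as a stand-in, it uses SQLite's `ATTACH DATABASE` from the standard library so that a single SQL statement joins tables living in two separate databases. The schema names `crm` and `pos`, the tables and the sample rows are all hypothetical.

```python
import sqlite3

def build_federated_connection():
    """Open one connection that can see two independent databases."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH of ':memory:' creates a separate in-memory database.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS pos")

    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE pos.sales (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")]
    )
    conn.executemany(
        "INSERT INTO pos.sales VALUES (?, ?)", [(1, 20.0), (1, 5.0), (2, 7.5)]
    )
    return conn

def total_spend_by_customer(conn):
    """One query spanning both sources; the caller never opens them separately."""
    query = """
        SELECT c.name, SUM(s.amount) AS total
        FROM crm.customers AS c
        JOIN pos.sales AS s ON s.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()

conn = build_federated_connection()
for name, total in total_spend_by_customer(conn):
    print(name, total)
```

A real federation engine additionally pushes work down to remote sources and optimizes the distributed query, which this single-process analogy does not attempt to show.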
Use Case: Social Programs Organizations Need to Be Able to Turn Their Data into Actionable Information
• Gain a comprehensive view of a family’s ongoing needs and program results
• Match citizens’ needs to the right program or service
• Maximize a limited budget: “Am I managing my resources effectively?”
1. Citizen-focused information
2. Outcomes focused
3. Integrated service delivery
Governed Data Lake Reference Architecture
All Data Sources: Streaming Data, Text Data, Applications Data, Time Series, Geo Spatial, Relational, Social Network, Video & Image
Ingest: Sqoop, Flume
Real Time Data Flow: Kafka, Apache NiFi
Governance: CDC for Hadoop, IBM Information Governance Catalog, Atlas, Big Integrate, Big Quality, Big Match
Data Processing: MapReduce, Spark
Analyze: Big SQL, Watson Explorer, DSX
HDFS | YARN
Low Latency Data Feeds | Reports, dashboards, apps | Alerts
Use cases: New / Enhanced Applications, Automated Process, Analytic Applications, Watson, Cloud Services, ISV Solutions
Top 5 Best Practices to set up Governed Data Lake
Five Key Recommendations to Innovate with Machine Learning and Big Data
1. Use Case Generation and Prioritization: make innovation central to business vision, strategy, and execution
2. Metadata Management and Data Lineage
3. Cognitive Search in Natural Language
4. People: Data Engineers, Data Scientists, CDO and LOB Executives
5. Continuous Learning Incorporation with Feedback Loops

Enabling digital business with governed data lake

  • 1. Enabling Digital business with IBM Governed Data Lake Karan Sachdeva ([email protected]) IBM Big Data Analytics Sales Leader Asia Pacific Connect with me at - https://www.linkedin.com/in/karan20/
  • 2. 1.Digital and AI Challenges 2.How IBM Governed Data Lake can help? 3.Industry Use Cases 4.IBM Governed Data Lake Building blocks 1.Collect 2.Govern 3.Analyze (Data Science and ML) 5.How to start your Governed Data Lake journey
  • 3. IBM Cloud / © 2018 IBM Corporation Companies are pouring investment into AI and ML AI and ML is basis of Digital Transformation 9.8X $39 Billion+ 83% Increase in demand for ML Engineers over the last 5 years Investment by companies in AI during 2017 – led by tech, finance, and automotive Of companies say AI is strategically important to their business
  • 4. IBM Cloud / © 2018 IBM Corporation Why are companies struggling to capture the value of AI and ML for their Digital Transformation projects? AI and ML is basis of Digital Transformation IA and Data is basis of AI and ML 80% Say they do not understand the data needed for AI There is no AI without IA Only 20%Of companies say they have successfully adopted AI
  • 5. IBM Cloud / © 2018 IBM Corporation Barriers to successful AI and Digital transformations There is no Artificial Intelligence (AI) without Information Architecture (IA) Data Ecosystem • Data in silos • Difficult to access • No lineage Analytics Tools • Discrete tools • Different preferences • Difficult to manage Workflow • Not integrated • Not governed • Lack dev/prod parity Culture • Not collaborative • Slow provisioning • Lack trust in AI
  • 6. 1.Digital and AI Challenges 2.How IBM Governed Data Lake can help? 3.Industry Use Cases 4.IBM Governed Data Lake Building blocks 1.Collect 2.Govern 3.Analyze (Data Science and ML) 5.How to start your Governed Data Lake journey
  • 7. Persist Analyze Ingest Deploy Data | Assets | Pipelines | APIs Intelligent governance | Metadata Management What you need is a Integrated Platform based on Open Standards Governed Data Lake Core Tenets 1. Intelligent by Design 2. Based on Open Standards and Extensible 3. Collaborative for data Professionals 4. Self-service access to trusted data 5. Best in class streaming and real-time analytics Collaborate Data steward Data scientistData engineer Developer Find Share
  • 8. 1.Digital and AI Challenges 2.How IBM Governed Data Lake can help? 3.Industry Use Cases 4.IBM Governed Data Lake Building blocks 1.Collect 2.Govern 3.Analyze (Data Science and ML) 5.How to start your Governed Data Lake journey
  • 9. Industry Use Cases Governed Data Lake Use Cases: Burning Business Problems More Robustly Addressed Financial Services Insurance ••Customer 360 ••Fraud ••Compliance- GDPR, PDPA etc ••Risk Analytics ••Operational Data Store ••Predictive analytics Telco Media Energy & Utilities ••Customer 360 ••Customer Insights ••Network Optimization ••Data Monetization ••EDW Augmentation ••Predictive maintenance Retail Ecommerce ••Personalized Customer offers ••Omni-channel Customer Experience ••Loyalty programs ••Next Best Offer/Action ••Recommendation Engine ••Agile Supply Chain Manufacturing Industrial ••Connected: Car, Plane, Equipment ••Agile Supply Chain ••Predictive Maintenance ••IoT Data enabled “Smart Services” Government Public Sector ••Border Control ••Public Safety / Intelligence ••360 Tax payer ••Tax Optimization ••Cyber Threat ••Citizen Self Service ••Social Services Fraud IBM Governed Data Lake Open Standards | Governance | Machine Learning & Data Science Enterprise data warehouse(EDW) Modernization | EDW Offload | Teradata/Oracle Refresh
  • 10. Example: Governed Data Lake vision for Retail. Integrate inbound touchpoints: offer the best offer for optimum outcomes.
      Retail Data. Behavioural: CRM, Store, POS. Descriptive: Location, Mobile, Demographic, Weather. Interaction: Web clickstreams, Call Center Notes, Emails. Attitudes: Social Media Sentiments.
      Governed Data Lake. Continual data feeds: to support real-time decision making at every point of customer interaction. Leverage all data: high volumes. Integrate data: traditional CRM and POS data, combined with modern sources. Capture and access all data: paper-based customer notes, call center notes. Customer identification: single record of customer detail. Text processing: social media comments, call center logs.
      Insight. Understanding your customer: helps retailers understand the “why” question and not only the “who/what/where/when”, providing greater insight into customer behavior and buying patterns; real-time analytics to anticipate customer behavior. Analytical questioning: exploration of the data. Business user dashboards and visualisation across multiple departments: upsell/cross-sell options, marketing campaign effectiveness, product analysis, comprehensive view of a customer. Improve customer segmentation: advanced customer analytics to better define homogeneous customer clusters.
      Action. Cross-channel delivery of best action to address customer need and enhance long-term business revenue. Relevant & timely marketing offers: highly personalized communications and offers, making your customer relationship management more proactive. Consistency across all customer interaction points: Web, Mobile, Call Center, Email, Social Media.
      Business Outcomes. Improve service delivery and customer satisfaction. Optimize revenue-generating actions such as upsell, cross-sell and retention. Increase strategic lifetime value and loyalty.
      Components: Hadoop System; Data Federation; Data Science Models; Streaming Data; Hadoop + BigSQL; Info Integration & Governance; Entity Matching; Predictive Analytics; Social Media Analytics; Discovery & Exploration; Business Intelligence; Prescriptive Analytics; IBM Data Science Experience; Decision Optimization; Social Merchandise (social data, internal and external; frameworks, models and dashboards).
      Retail, Ecommerce: Personalized Customer Offers; Omni-channel Customer Experience; Loyalty Programs; Next Best Offer/Action; Recommendation Engine; Agile Supply Chain
  • 11. Select the entry points to your Governed Data Lake journey
      Disruptive | Competitive | Optimized. Data & Insight Accessibility.
      Big Data made accessible and simple; Data assets made understood, protected and trusted; ML, AI, Optimize and Automate; Natural Language Visualization and Exploration
      Collect | Govern | Data Science
      Entry points: Enterprise DW Modernization; EDW Augmentation; Customer 360; Operational Data Store; Machine Learning; Anomaly Detection; Recommendation Engine; Cognitive Text Analytics
  • 12. IBM Governed Data Lake Customer Spotlights
      Collect: Hybrid Data Management | Organize: Unified Governance & Integration | Analyze: Data Science, Machine Learning
      Understand customer behavior to make smarter marketing & programming decisions. Billions of records analyzed in seconds, rather than days, increasing on-demand viewing.
      Provide governed self-service data lake for fraud detection and customer engagement. Disciplined data classification upon entry, managing access, quality, privacy, and retention.
      Reduce unplanned trucking standstills, helping clients better predict maintenance needs. Applied statistical and ML techniques to lower diagnostic time by 70% and repair time by 20%.
  • 13. 1. Digital and AI Challenges; 2. How IBM Governed Data Lake can help?; 3. Industry Use Cases; 4. IBM Governed Data Lake Building blocks (1. Collect, 2. Govern, 3. Analyze: Data Science and ML); 5. How to start your Governed Data Lake journey
  • 14. Governed Data Lake Use Cases: Burning Business Problems More Robustly Addressed
      Under the hood IBM technologies:
      Collect (Hybrid Data Management): Hortonworks Data Platform; Hortonworks Data Flow; Db2 Big SQL; IBM Big Replicate
      Govern (Unified Governance & Integration): Information Governance Catalog; BigIntegrate; BigQuality; BigMatch; CDC for Hadoop
      Analyze (Data Science & Business Analytics): Data Science Experience Local; Decision Optimization; Watson Explorer (v12+)
      Financial Services, Insurance: Customer 360; Fraud; Compliance (GDPR); Risk; Operational Data Store; Predictive Analytics
      Telco, Media, Energy & Utilities: Customer 360; Customer Insights; Network Optimization; Data Monetization; EDW Augmentation; Predictive Maintenance
      Retail, Ecommerce: Personalized Customer Offers; Omni-channel Customer Experience; Loyalty Programs; Next Best Offer/Action; Recommendation Engine; Agile Supply Chain
      Manufacturing, Industrial: Connected Car, Plane, Equipment; Agile Supply Chain; Predictive Maintenance; IoT Data enabled “Smart Services”
      Government, Public Sector: Border Control; Risk / Intelligence; 360 Tax Payer; Tax Optimization; Cyber Threat; Fraud Prevention
      Enterprise Data Warehouse (EDW) Modernization / EDW Offload / Teradata Takeout / Capabilities
  • 15. IBM Big SQL for Hadoop: Data Federation across data repositories
      More intelligent analytics and insights. Embedded machine learning and data science: drive more value from your data. Run analytics where the data lives using the tools your data professionals prefer. Spark and Jupyter notebooks built in; integration with model building, BI, and visualization tools.
      Go at the speed of your business. Transactional and analytic processing, all in one place: instant insight from real-time operational data for growing revenue, reducing cost and lowering risk. Simplify IT with transactions and reporting (HTAP) within the same system; easy, low-risk offload from an expensive data warehouse (Teradata or Oracle).
      Write once, run anywhere, from any source. Common SQL engine with built-in data virtualization: anchored by a common SQL engine to enable scalable data management solutions with portable analytics. Application and operational compatibility; transparent access to other data sources.
      Deploy your data where you need it. Support for on-premises or cloud, NoSQL or SQL: offers flexibility in choosing the form factor that best suits your business, enabling a controlled journey to the cloud. A platform that fits your data strategy; bridge data stores for seamless data integration.
  • 16. Governance for Hadoop and Data Lake to drive use cases like GDPR, PDPA…
      Open and extensible platform; Brings structured and unstructured together; Scalability and parallel processing; Smarter metadata drives embedded governance; Pre-built industry data models; Unified platform with adaptive deployment and licensing
      Key Offerings: IBM Information Governance Catalog; IBM BigQuality; CDC for Hadoop; IBM Big Match; IBM BigIntegrate; IBM Industry Models
      Trusted Analytics Foundation: Information lifecycle; Master data & entity insights; Governance, compliance & data protection; Integration & replication; Enterprise Information Catalog; Structured & unstructured data; AI & machine learning capabilities
      Public cloud | On-premises | Private cloud
      Know your data. Trust your data. Use your data.
  • 17. IBM Data Science and Machine Learning Platform
      Gain deeper insights to see all contributing patterns (Watson Explorer)
      Report, monitor and analyze your data with confidence (Watson Explorer)
      Optimize plans based on prescriptions (Decision Optimization)
      Develop, deploy and manage models (Data Science Experience)
      Optimize business decisions (Decision Optimization)
  • 18. IBM Data Science Experience (http://datascience.ibm.com)
      Learn: Built-in learning to get started or go the distance with advanced tutorials
      Create: The best of open source and IBM value-add to create state-of-the-art data products
      Collaborate: Community and social features that provide meaningful collaboration
      Find tutorials and datasets; Connect with Data Scientists; Ask questions; Read articles and papers; Fork and share projects
      Watson Machine Learning; SPSS Modeler Canvas; Advanced Visualizations; Projects and Version Control; Managed Spark Service
      Code in Scala/Python/R; Jupyter Notebooks; RStudio IDE and Shiny; Apache Spark; Your favorite libraries
      ML Model Lifecycle: 1. Build model; 2. Deploy model; 3. Refresh model. Predictive Power (100%) | Capacity | Model Builder (CADS)
      Import sources: DSx Notebooks; DSx Flow UI; External tools
      Auto-generate model from input data, testing various algorithms for best fit (e.g. CADS). Detect loss of predictive power and refresh model, subject to preferences.
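The build, deploy, and refresh loop on slide 18 can be illustrated with a small sketch. This is not how Data Science Experience or CADS works internally; the one-feature threshold model, the `ManagedModel` class, and the `tolerance` preference are all hypothetical names chosen only to show the idea of refreshing a model when its predictive power drops.

```python
from dataclasses import dataclass
from typing import List, Tuple

Example = Tuple[float, int]  # (feature value, label 0 or 1); hypothetical toy data shape


def accuracy(threshold: float, data: List[Example]) -> float:
    """Share of examples where 'feature >= threshold' matches the label."""
    correct = sum((x >= threshold) == bool(y) for x, y in data)
    return correct / len(data)


def fit_threshold(data: List[Example]) -> float:
    """Build step: try every observed feature value and keep the best threshold."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted({x for x, _ in data}):
        acc = accuracy(t, data)
        if acc > best_acc:  # strict '>' keeps the lowest threshold on ties
            best_t, best_acc = t, acc
    return best_t


@dataclass
class ManagedModel:
    threshold: float         # the deployed model
    baseline: float          # accuracy measured when the model was deployed
    tolerance: float = 0.10  # assumed preference: allowed drop before a refresh
    refreshes: int = 0

    def monitor(self, recent: List[Example]) -> bool:
        """Refresh step: retrain on recent data if predictive power has dropped."""
        if accuracy(self.threshold, recent) < self.baseline - self.tolerance:
            self.threshold = fit_threshold(recent)
            self.baseline = accuracy(self.threshold, recent)
            self.refreshes += 1
            return True
        return False


def build_model(data: List[Example]) -> ManagedModel:
    """Build and 'deploy': fit on training data and record the baseline accuracy."""
    t = fit_threshold(data)
    return ManagedModel(threshold=t, baseline=accuracy(t, data))
```

Calling `monitor` on each new batch of labelled data returns `True` only when a refresh happened, so a scheduler could log or alert on refreshes without inspecting the model itself.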
  • 19. Open source is a powerful engine, but as with any engine, it needs the full system to accomplish any work. We provide:
      Security: SSO and code hardening to reduce security gaps
      Version Currency: we keep up to date as open source quickly iterates
      Data Connectivity: connect to data sources
      Scalability: makes tools designed for desktops scalable to enterprise workloads
      Enterprise IBM Support: world-class support by SMEs
  • 20. 1. Digital and AI Challenges; 2. How IBM Governed Data Lake can help?; 3. Industry Use Cases; 4. IBM Governed Data Lake Building blocks (1. Collect, 2. Govern, 3. Analyze: Data Science and ML); 5. How to start your Governed Data Lake journey
  • 21. Data Science Sandbox Quick Start Solution
      1. Receive best practices for your organization to get started with a governed data lake
      2. Achieve faster time-to-value with pre-built accelerators
      3. Leverage world-class data scientists and engineers with proven results at the most mature big data and Data Science customers
      12 nodes of best-in-class open source Hadoop (IBM Hortonworks Data Platform) + 5 users for IBM Data Science Experience + 1 week of partner services engagement = 75K USD*
      *Commercials for services from third party Partner Services
      *Commercials focused on Data Science environment for Governed Data Lake
      *Commercials may vary with local conditions and from country to country
      *Offer valid till 31st March 2018
  • 22. Governed Data Lake – Quick Start Solutions
  • 23. IBM is the industry leader in Open Source and Data Science platforms
      40,000+ clients in 160 countries, training 70,000+ client employees. 43% improvement in client satisfaction.
      IBM Leadership - Total Portfolio: 80% of reports, 60% of Forrester. IBM Consensus Leader in Data Science & Business Analytics.
      General Excellence. reddot award, Interface Design. NPS 2017. Apache Committers. Top 20. Open Source.
  • 24. Call To Action:
      Identify the use case for your business. Involve us for a free Discovery workshop for Governed Data Lake.
      Read, learn and contribute at www.ibmbigdatahub.com
      Write to us: [email protected]
  • 25. What you need is an Integrated Platform based on Open Standards
      Persist | Analyze | Ingest | Deploy
      Projects | Data | Assets | Pipelines | APIs
      Intelligent governance | Policy enforcement
      Our Core Tenets: 1. Intelligent by Design; 2. Collaborative for data professionals; 3. Self-service access to trusted data; 4. Best-in-class streaming and real-time analytics; 5. Open and Extensible
      Collaborate: Data steward, Data scientist, Data engineer, Developer. Find | Share.
  • 26. IBM Big SQL Query Federation = virtualized data access
      Transparent: appears to be one source; programmers don’t need to know how or where data is stored
      Heterogeneous: accesses data from diverse sources
      High Function: full query support against all data; capabilities of sources as well
      Autonomous: non-disruptive to data sources, existing applications, systems
      High Performance: optimization of distributed queries
      SQL tools, applications | Virtualized data | Data sources
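The "appears to be one source" property on slide 26 can be illustrated with a minimal sketch. It does not use Big SQL; it swaps in SQLite's `ATTACH DATABASE` as a stand-in, so that one SQL statement joins tables held in two separate database files. The file names, table names, and sample rows are hypothetical.

```python
import os
import sqlite3
import tempfile


def build_sources(directory):
    """Create two independent databases standing in for separate data sources."""
    crm_path = os.path.join(directory, "crm.db")
    pos_path = os.path.join(directory, "pos.db")

    crm = sqlite3.connect(crm_path)
    crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
    crm.commit()
    crm.close()

    pos = sqlite3.connect(pos_path)
    pos.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
    pos.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10.0), (1, 5.5), (2, 7.0)])
    pos.commit()
    pos.close()
    return crm_path, pos_path


def federated_totals(crm_path, pos_path):
    """Run one query across both sources; the caller never opens them directly."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("ATTACH DATABASE ? AS crm", (crm_path,))
        conn.execute("ATTACH DATABASE ? AS pos", (pos_path,))
        return conn.execute(
            "SELECT c.name, SUM(s.amount) "
            "FROM crm.customers AS c "
            "JOIN pos.sales AS s ON s.customer_id = c.id "
            "GROUP BY c.name ORDER BY c.name"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        crm_db, pos_db = build_sources(workdir)
        print(federated_totals(crm_db, pos_db))
```

A real federation engine adds what this toy lacks: heterogeneous source types, pushdown of work to each source, and optimization of the distributed query plan.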
  • 27. Use Case: Social Programs. Organizations Need to Be Able to Turn Their Data into Actionable Information
      1. Citizen-focused information; 2. Outcomes Focused; 3. Integrated Service Delivery
      Gain a comprehensive view of a family’s ongoing needs and program results
      Match citizen’s needs to the right program or service
      Maximizing a limited budget: Am I managing my resources effectively?
  • 28. Governed Data Lake Reference Architecture
      All Data Sources: Streaming Data; Text Data; Applications Data; Time Series; Geo Spatial; Relational; Social Network; Video & Image
      Real Time Data Flow: Kafka, Apache NiFi
      Ingest: Sqoop, Flume
      Governance: CDC for Hadoop
      Data Processing: MapReduce, Spark
      Analyze: Big SQL, Watson Explorer
      Other components: IBM Information Governance Catalog; HDFS; Atlas; DSX; Big Integrate; Big Quality; Big Match; Yarn
      Outputs: Low Latency Data Feeds; Reports, dashboards, apps; New / Enhanced Applications; Automated Process; Use cases; Analytic Applications; Watson Cloud Services; ISV Solutions; Alerts
  • 29. Top 5 Best Practices to set up Governed Data Lake. Five Key Recommendations to Innovate with Machine Learning and Big Data
      1. Use Case Generation and Prioritization: make innovation central to business vision, strategy, and execution
      2. Metadata Management and Data Lineage
      3. Cognitive Search in Natural Language
      4. People: Data Engineers, Data Scientists, CDO and LOB Executives
      5. Continuous Learning Incorporation with Feedback Loops