Rethink Analytics: EDH for Advanced Analytics
Josh Wills, Director of Data Science
Sandy Lii, Senior Manager, Solutions Marketing

1
Agenda
• Market Background
• Challenges and Limitations
• EDH for Advanced Analytics
• Case Studies
• How to Get Started

2
Market Background

3
From BI to Advanced Analytics
[Figure: axis labels Time and Data Size; scale from Facts to Interpretations]
• What happened? When? And where?
• How and why did it happen?
• What will happen?
• How can we do better?

4
Advanced Analytics that Saves Us Money
• Customer churn analysis model
• Integrated customer support and services
• Fraud detection

5
Advanced Analytics that Makes Us Money
• Product recommendation engines
• Location-based real-time offers
• Target-based pricing strategy

6
Traditional Advanced Analytics Process

Project Definition: Problem ID
Data Preparation: Data Access Request & Discovery, Data Transformation, Data Sampling
Model Development: Model Creation, Model Evaluation
Model Deployment: Deploy Model

Time-to-Insight
7
Challenges and Requirements

8
Accessing the Right Data is Difficult
Multi-structured or
External Data
Structured
Internal Data
Data
Warehouse

9
“Are we there yet?”
Data Discovery
1. Find the data
2. Get access to data
3. Learn about the data
4. Move data to ADW and process data
5. Data Modeling
6. Model Deployment

10
Silo’d Platforms Challenge Collaboration & Mgmt
Non-Agile Models
Data
Sources

Departmental
Warehouse

Enterprise
Apps

Departmental
Warehouse

Reporting

Silo’d
Analytics

Silo’d
Analytics

Silo’d
Analytics

Opaque schemas accumulate over time
11
Impact of Status Quo
Executives
“We don’t have the information we need to answer key business questions.”

Data Scientists
“I’m sick of waiting for my data, I’m going to make my own copy.”

DBA/DW Admins
“I need to make sure the DW is secure & compliant for the mission critical reports.”

12
Cloudera’s Enterprise Data Hub

13
Use All Your Data
• Use more data, and more types of data, with existing tools
• Reduce the need to limit or move large datasets
• Centralize information security, metadata, management, and governance

14
Shorten Analytics Lifecycle
• Facilitate data discovery
• Track data life-cycle in place
• Define, test, deploy, and update models all within a single platform

15
Do More with Data
• Deliver multi-genre analytics in a single platform
• Apply diverse concurrent analytics to full datasets in place
• Protect existing technology and skillset investments

[Diagram labels: EDH, Search, Machine Learning, BI, SQL Query, In-memory analytics]

16
Cloudera EDH for Analytics

[Architecture diagram]
BATCH PROCESSING, ANALYTIC SQL, SEARCH ENGINE, MACHINE LEARNING, STREAM PROCESSING, 3RD PARTY APPS
WORKLOAD MANAGEMENT
STORAGE FOR ANY TYPE OF DATA: Filesystem, Online NoSQL
DATA MANAGEMENT
SYSTEM MANAGEMENT
UNIFIED, ELASTIC, RESILIENT, SECURE

17
Cloudera EDH for Analytics
Use all data with centralized mgmt & security

[Architecture diagram; product names shown on components]
• Filesystem: HADOOP
• BATCH PROCESSING: MAPREDUCE
• SYSTEM MANAGEMENT: CLOUDERA MANAGER

18
Cloudera EDH for Analytics
Faster data discovery

[Architecture diagram; product names shown on components]
• SEARCH ENGINE: SEARCH
• DATA MANAGEMENT: NAVIGATOR

19
Cloudera EDH for Analytics
Multiple tools on one platform

[Architecture diagram; product names shown on components]
• ANALYTIC SQL: IMPALA
• MACHINE LEARNING: SPARK / ORYX / MAHOUT

20
Cloudera EDH for Analytics
Operationalize Models

[Architecture diagram; product names shown on components]
• STREAM PROCESSING: SPARK STREAMING / FLUME

21
Cloudera Enterprise
[Architecture diagram, labeled CLOUDERA ENTERPRISE]
BATCH PROCESSING, ANALYTIC SQL, SEARCH ENGINE, MACHINE LEARNING, STREAM PROCESSING, 3RD PARTY APPS
WORKLOAD MANAGEMENT
STORAGE FOR ANY TYPE OF DATA: Filesystem, Online NoSQL
DATA MANAGEMENT
SYSTEM MANAGEMENT
UNIFIED, ELASTIC, RESILIENT, SECURE

22
Capabilities of Cloudera Enterprise

APACHE
HADOOP™

23
Capabilities of Cloudera Enterprise

APACHE
HADOOP™

24
Capabilities of Cloudera Enterprise

APACHE
HADOOP™

25
Capabilities of Cloudera Enterprise

APACHE
HADOOP™

26
Analytics Process with EDH

Project Definition: Problem ID
Data Preparation: Data Access Request & Discovery, Data Transformation, Data Sampling
Model Development: Model Creation, Model Evaluation
Model Deployment: Deploy Model

Time-to-Insight
27
Analytics Process with EDH

Project Definition: Problem ID
Data Preparation: Data Access Request & Discovery, Data Transformation, Data Sampling
Model Development: Model Creation, Model Evaluation
Model Deployment: Deploy Model

Time-to-Insight
28
Analytics Process with EDH

Project Definition: Problem ID
Data Preparation: Data Access Request & Discovery, Data Transformation, Data Sampling
Model Development: Model Creation, Model Evaluation
Model Deployment: Deploy Model

Deliver Insights Sooner
29
Business Value Delivered
Data Scientists
• Acquire data necessary for projects
• Develop analysis/models with better lift faster
• Share data sets to empower others

Executives
• Acquire necessary information sooner to make critical business decisions

DBA/DW Admins
• Support both reporting and analytics needs
• Save resources with shared security and management

30
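
The slide mentions models "with better lift" without defining the term. As a minimal sketch, assuming the common definition (the positive rate among the highest-scored cases divided by the overall positive rate), lift could be computed as below. The function name `lift_at_top_fraction` and the sample data are hypothetical illustrations, not taken from the deck.

```python
def lift_at_top_fraction(scores, labels, fraction=0.1):
    """Return lift for the top-scored `fraction` of cases.

    Assumed definition (not from the slides): positive rate in the
    top-scored group divided by the positive rate over all cases.
    `labels` holds 1 for a positive outcome (e.g. churned) and 0 otherwise.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal in length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    overall_rate = sum(labels) / len(labels)
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no positive labels")

    # Rank cases from highest to lowest model score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    # Always evaluate at least one case.
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n

    return top_rate / overall_rate


# Hypothetical scores from a churn model and the observed outcomes.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
example_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
example_lift = lift_at_top_fraction(example_scores, example_labels, fraction=0.2)
```

A lift above 1 means the top-scored group contains proportionally more positives than the population as a whole; a lift of 1 means the model ranks no better than random selection.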
Case Studies

31
Ask Bigger Questions:
How can we prevent
re-admittance?
Kaiser Permanente helps providers
recommend at-home action based on real-time data
to prevent hospital visits.

32
Kaiser Makes Medical Data Actionable
The Challenge:
• Re-admittance is expensive, reflects sub-par provider-to-patient communications
• IT infrastructures can’t accommodate 24x7 data streams from devices
• Diverse medical ontologies present data challenge

Kaiser Permanente helps providers recommend at-home action based on real-time data to prevent hospital visits.

The Solution:
• Cloudera EDH provides a scalable, flexible platform for collection, ingestion & dissemination of healthcare information
• Ingests real-time data streams of multi-structured data

33
Ask Bigger Questions:
How do we feed the world?
Monsanto can automate data-driven R&D
decisions to reduce time to market from
years to months.

34
Monsanto feeds our growing, global population
The Challenge:
• 1,000+ research scientists developing products in silos
• Data processing bottleneck slows development
• Time to market for new product is 5-10 years
Monsanto can automate data-driven
R&D decisions to reduce time to
market to months from years.
The Solution:
• Cloudera Enterprise + Search + Impala: PB-scale
platform for single view of all R&D data
• Integration: Exadata, spatial awareness &
visualization
• Scientists directly access CDH; Navigator offers
auditing & access control
35
ARE YOU READY TO START?

Answer
questions using
ALL YOUR DATA

36
QUESTIONS?
• Type in the “Chat” panel to ask a question
• Recording will be available on-demand at cloudera.com

Try Cloudera today
cloudera.com/downloads

Learn more
http://tinyurl.com/membtaw

Tweet @cloudera
Follow Josh @josh_wills
Follow Sandy @sandyliiwozniak

Register now for Data Analysts Training
university.cloudera.com
• Use discount code Analytics10 to save 10% on new enrollments in classes delivered by Cloudera until May 2014*
• Use discount code 15off2 to save 15% on enrollments in two or more classes delivered by Cloudera until May 2014*

* Excludes classes sold or delivered by Cloudera Partners

37
Thank You!
Josh Wills
@josh_wills
Sandy Lii
@sandyliiwozniak

38

Rethink Analytics with an Enterprise Data Hub


Editor's Notes

  • #11: Challenges and problems. Data discovery is 90% of the project. Long data discovery means you cannot iterate fast and cannot capture business value quickly. Data scientists are expensive! Shortening the analytics lifecycle means you can get more projects done in the same timeframe.