1. Becoming Information-Driven
Introduction to the Enterprise Data Hub
Mike Olson
Cloudera, Inc.
Co-Founder & Chief Strategy Officer
  
2. Expanding Data Requires A New Approach
©2014 Cloudera, Inc. All rights reserved.

1980s: Bring Data to Compute
Process-centric businesses use:
• Structured data mainly
• Internal data only
• “Important” data only

Now: Bring Compute to Data
Information-centric businesses use all data: multi-structured, internal & external data of all types.

[Diagram: Compute and Data blocks for each era, drawn by relative size & complexity]
  
3. The Old Way: Bringing Data to Compute

Complex Architecture
• Many special-purpose systems
• Moving data around
• No complete views

Visibility
• Leaving data behind
• Risk and compliance
• High cost of storage

Time to Data
• Up-front modeling
• Transforms slow
• Transforms lose data

Cost of Analytics
• Existing systems strained
• No agility
• BI backlog

[Diagram with callouts 1 to 4. Systems: SERVERS, MARTS, EDWS, DOCUMENTS, STORAGE, SEARCH, ARCHIVE. Sources: ERP, CRM, RDBMS, MACHINES; FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS; EXTERNAL DATA SOURCES]
  
4. The New Way: Bringing Compute to Data

1. Active archive
• Full fidelity original data
• Indefinite time, any source
• Lowest cost storage

2. Data management, transforms
• One source of data for all analytics
• Persist state of transformed data
• Significantly faster & cheaper

3. Self-service exploratory BI
• Simple search + BI tools
• “Schema on read” agility
• Reduce BI user backlog requests

4. Multi-workload analytic platform
• Bring applications to data
• Combine different workloads on common data (e.g. SQL + Search)
• True BI agility

[Diagram with callouts 1 to 4. Systems: SERVERS, MARTS, EDWS, DOCUMENTS, STORAGE, SEARCH, ARCHIVE. Sources: ERP, CRM, RDBMS, MACHINES; FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS; EXTERNAL DATA SOURCES]
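
Note on “schema on read”: the slide names the idea without showing it. The sketch below is one possible illustration in plain Python, not Cloudera's implementation or any Hadoop API. The sample records, the `read_with_schema` helper and both schemas are hypothetical. Raw records are kept exactly as they arrived, and each consumer applies its own schema only when it reads them.

```python
import json

# Raw events stored as they arrived (full fidelity); no schema was enforced on write.
# These sample records are invented for illustration.
RAW_EVENTS = [
    '{"user": "a17", "action": "view", "sku": "X-100", "ts": 1400000000}',
    '{"user": "b42", "action": "buy", "sku": "X-100", "price": "19.99"}',
    '{"user": "a17", "action": "search", "query": "red shoes"}',
]


def read_with_schema(lines, schema):
    """Project each raw record onto a schema chosen at read time.

    `schema` maps a field name to a converter function. Fields that a
    record does not contain are returned as None instead of failing.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            field: (convert(record[field]) if field in record else None)
            for field, convert in schema.items()
        }


# Two consumers read the same raw data through different schemas.
purchase_schema = {"user": str, "sku": str, "price": float}
activity_schema = {"user": str, "action": str}

purchases = [
    row for row in read_with_schema(RAW_EVENTS, purchase_schema)
    if row["price"] is not None
]
activity = list(read_with_schema(RAW_EVENTS, activity_schema))

print(purchases)
print(activity)
```

Because nothing is discarded on write, a new question only needs a new schema, not a reload of the data.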
  
5. Better, faster, cheaper and multi-framework

Goals: Process Data; Train & Test Models; Respond to Events in Real Time; Explore & Analyze Data

BATCH PROCESSING: MR / Pig / Hive / Cascading
• Highly mature
• Wide range of clients
• Significant advances in speed & usability

IN-MEMORY: SPARK

MACHINE LEARNING: SAS, R, H2O, MLlib
• Integration with the SAS & Revolution product portfolio
• Python / 0xdata / MLlib for advanced users

NOSQL: HBASE
• Very low (~10 ms) latency
• High volumes of single events

STREAM PROCESSING: SPARK STREAMING
• Low (1 second) latency
• Windows (collections) of events

SQL: IMPALA
• High speed
• High concurrency
• Workload management
• Broad BI support

SEARCH: SOLR
• For unstructured & semi-structured data
• For business users
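
Note on “windows (collections) of events”: stream processing as described on this slide works on small batches of events, not on one event at a time. The sketch below shows the idea as fixed, non-overlapping (tumbling) windows in plain Python. It does not use the Spark Streaming API; the function name, the one-second default and the sample events are assumptions made for illustration.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds=1.0):
    """Group (timestamp, key) events into fixed, non-overlapping windows.

    Returns a dict that maps each window's start time to a Counter of
    how often each key occurred inside that window.
    """
    windows = {}
    for timestamp, key in events:
        # Every event belongs to exactly one window, identified by its start time.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows.setdefault(window_start, Counter())[key] += 1
    return dict(sorted(windows.items()))


# Hypothetical events: (seconds since start, event type).
events = [
    (0.10, "login"),
    (0.45, "click"),
    (0.90, "click"),
    (1.20, "click"),
    (1.75, "login"),
]

counts = tumbling_window_counts(events)
for start, counter in counts.items():
    print(start, dict(counter))
```

A one-second window matches the latency the slide gives for stream processing. Lookups of single events at around 10 ms are the HBase case and would not be windowed at all.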
  
6. Rationalizing existing infrastructure
Migrating data sets, workloads or entire systems from more expensive or less flexible systems

Operational Data Store
• Consolidate, cleanse & stage data
• Promote to other operational systems or EDWs

Data Warehouse
• ELT
• Archive
  
7. What do we mean by data discovery?
Providing a flexible analytic sandbox where users can apply multiple tools & techniques to derive insights from new & traditional data

Combine & explore new data sets
• Scripting
• Data blending
• Traditional ETL

Support ad-hoc marts and self-serve BI users
• Tableau, Qlik et al.

Enable data scientists to train & test models
• ML libraries
• SAS, Revolution
  
8. What do we mean by pervasive analytics?
Using predictive analytics to improve business processes or augment professional judgment in an automated way across the organization

Analyze patterns over deep histories
• Recommendations
• Outliers

Automate responses to new data / observations
• Classifying or scoring new data

User exploration / judgment application
• Reviewing outliers
• Overriding suggestions
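
Note on the three steps above: they can be read as a loop in which history defines what is normal, new data is scored automatically, and a person reviews the outliers and may override the suggestion. The sketch below is a minimal illustration that uses a z-score, which is an assumption and not a method named in the deck. The function names, the threshold of 3 and the sample amounts are all hypothetical.

```python
from statistics import mean, stdev


def score_new_value(history, value, threshold=3.0):
    """Score a new observation against its history with a z-score.

    `history` needs at least two values. Observations more than
    `threshold` standard deviations from the mean are flagged for review.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma > 0:
        z = (value - mu) / sigma
    else:
        # A constant history: any different value is treated as an extreme outlier.
        z = 0.0 if value == mu else float("inf")
    return {"value": value, "z_score": z, "needs_review": abs(z) > threshold}


def final_decision(scored, analyst_override=None):
    """Return the automated suggestion unless a human reviewer overrides it."""
    suggestion = "hold" if scored["needs_review"] else "approve"
    return analyst_override or suggestion


# Hypothetical purchase amounts for one customer.
history = [42.0, 38.5, 45.0, 40.0, 41.5, 39.0, 44.0, 43.0]

typical = score_new_value(history, 41.0)
unusual = score_new_value(history, 400.0)

print(final_decision(typical))
print(final_decision(unusual))
print(final_decision(unusual, analyst_override="approve"))
```

The override parameter stands in for the “overriding suggestions” bullet: the automated score augments professional judgment and does not replace it.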
  
9. Big Data in Credit Card Processing

Fraud Detection
“Modern credit card fraud rings operate globally over long time scales – how can we collect, store & analyze the petabytes of data it takes to detect them?”

Regulatory Compliance
“Customer privacy is paramount, but we need to keep vast amounts of information online to run our business. Can we achieve both goals?”

Product & Service Innovation
“We obviously have vast and detailed information about customer purchases. Can we combine it with GPS & mobile data, combined with browsing behavior to offer new products?”

Operational Efficiency
“How can we deliver what the business team wants, and faster, without spending tens of millions of dollars to expand our data warehouse?”

Stakeholders, in slide order: CFO & CRO; CIO & CRO; R&D, CMO; CIO
  
10. Big Data in Retail

360° Customer View
“We want to know what our customers do online and in our stores. How can we combine data from separate analytics silos to understand & serve them better?”

Fraud Prevention
“Theft, or ‘shrinkage’, in our stores is on the increase – can we combine POS data with video surveillance to reduce it without impacting customer service negatively?”

Logistics & Supply Chain
“How can we reduce stock-outs & ensure products are in the right stores at the right time? Can we combine data from our carriers with in-store historical data from thousands of stores?”

Operational Efficiency
“Our EDW infrastructure is being overwhelmed with data and workloads; we are running into capacity limits, and the annual costs of expansion are in the tens of millions. What can we do?”

Stakeholders, in slide order: CMO; CMO & Customer Service; CEO, VP Operations; CIO
  
11. Big Data in Health Care

360° Patient View
“Patient data ends up scattered across many different systems – is there a way to get a complete picture by combining it while ensuring HIPAA compliance?”

Regulatory Compliance
“The move to EMR combined with the strict regulations means we need to keep at least 7 years of data online – how can we afford to do that and make it searchable and available for analysis?”

Maximize Medical Efficacy
“We invest hundreds of millions in new equipment every year. How can we judge the long term efficacy for patient outcomes, and make smarter investment decisions?”

Operational Efficiency
“Our EDW infrastructure is being overwhelmed with data and workloads; we are running into capacity limits, and the annual costs of expansion are in the tens of millions. What can we do?”

Stakeholders, in slide order: VP Operations, Chief of Compliance; VP Operations; Chief Medical Officer; CFO; Chief Medical Officer; CIO
  
14. Mike Olson
@mikeolson
mike.olson@cloudera.com
  


Ask bigger questions
