© Cloudera, Inc. All rights reserved.
Mike Olson | co-founder and chief strategy officer
Spark in the Hadoop Ecosystem
  
Hadoop: From MapReduce to an Enterprise Data Hub

Hadoop delivers:
•  One place for unlimited data
•  Unified, multi-framework data access

Enterprises require:
•  Leading Performance
•  Open Source, Open Standards
•  Enterprise Security
•  Data Governance
•  Complete Management

Security and Administration
Unlimited Storage
Process, Discover, Model, Serve
Deployment Flexibility: On-Premises, Appliances, Engineered Systems, Public Cloud, Private Cloud, Hybrid Cloud

A modern data platform plus what the enterprise requires.
  
Where Spark Fits in the Hadoop Ecosystem

YARN: Shared resource management
HDFS and HBase: Shared storage
Impala, Hive, Pig, MapReduce2, Search, Spark, Spark Streaming, Hive (beta), Pig (beta), …

With common
•  Security
•  Data governance
•  Configuration, deployment and operations
across all components in the stack
  
Process millions of equity and bond market positions, and evaluate against future scenarios in minutes, versus days with MapReduce.
Major Global Financial Institution
  
Monitor online user activity and optimize content delivery and search results in real time.
Large Consumer Company
  
Ingest and analyze complex data from a variety of sources continually, building new risk and value models in real time.
  
Combine genomic and phenotype data with other data sources to understand disease onset and progression.
  
Spark extends the Hadoop ecosystem with new analytic and processing capabilities.
  
Thank you!
Mike Olson, chief strategy officer
mike.olson@cloudera.com
@mikeolson
  
