© Cloudera, Inc. All rights reserved.
From MapReduce to Apache Spark: An Ecosystem Evolves
Doug Cutting (@cutting)
Chief Architect / Apache Hadoop Co-founder
The Merging Ecosystems
The New Use Cases are Amazing
• Genomics (Broad Institute)
• Physics (CERN)
• Healthcare Delivery (CHOA)
New Contribution: Livy, an Open Source REST Service for Spark (Alpha)
Livy.io
• For submitting Spark jobs from any web/mobile app, with no Spark client install, enabling new architectures and use cases
• Provides multi-tenancy and fault tolerance to support multiple users reliably
• Works in standalone mode, with YARN, or with Mesos
• No code changes needed
• Apache License. Initial contributors include employees of Cloudera, Intel, and Microsoft + more wanted!
Bring Your Questions About Livy to the Cloudera Booth
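As a rough illustration of what "submitting Spark jobs with no Spark client install" can look like, the sketch below builds the HTTP requests a client might send to a Livy server using only the Python standard library. It is not taken from the slides: the endpoint paths (/batches, /sessions, /sessions/{id}/statements) and JSON fields follow the REST API as documented by the later Apache Livy project and may differ from the alpha release described here. The host, port 8998, jar path, and class name are placeholder assumptions.

```python
import json
import urllib.request

# Assumption: a Livy server reachable on localhost at port 8998.
LIVY_URL = "http://localhost:8998"


def livy_request(path, payload, base_url=LIVY_URL):
    """Build (but do not send) a JSON POST request for a Livy endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def batch_request(jar_path, class_name, args=()):
    """Request that submits a packaged Spark application as a batch job."""
    payload = {"file": jar_path, "className": class_name, "args": list(args)}
    return livy_request("/batches", payload)


def session_request(kind="pyspark"):
    """Request that opens an interactive session of the given kind."""
    return livy_request("/sessions", {"kind": kind})


def statement_request(session_id, code):
    """Request that runs a code snippet inside an existing session."""
    return livy_request(f"/sessions/{session_id}/statements", {"code": code})


if __name__ == "__main__":
    # Hypothetical jar path and class name, for illustration only.
    req = batch_request("hdfs:///jobs/app.jar", "com.example.App", ["10"])
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually submit the job, pass the request to urllib.request.urlopen(req).
```

The requests are only constructed here, so the sketch runs without a cluster. The client needs nothing beyond an HTTP library, which is the point the first bullet makes about web and mobile apps.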
Thank you
@cutting
cloudera.com/spark
