DYNAMIC RESOURCE ALLOCATION, DO MORE WITH YOUR CLUSTER
Luc Bourlier
Lightbend
● Dynamic Resource Allocation
● Dynamic Resource Allocation in Spark
● External Shuffle Service
● Configuration
● Demo
● Spark Streaming
Dynamic Resource Allocation
Cluster
I’d like some resources for a job
Dynamic Resource Allocation
Cluster
Oki. Thanks
Hmm, actually, I don’t need all this power anymore.
Dynamic Resource Allocation
Cluster
Why?
● Shared cluster
● Optimization of resource usage
When?
● Variable-load jobs
Spark Cluster Architecture
Spark Dynamic Allocation
Cluster Manager Worker Node
Executor
Worker Node
Worker Node
Executor
Driver
Scheduler(s)
need 2 executors
tasks are waiting too long
Spark Dynamic Allocation
Cluster Manager Worker Node
Executor
Worker Node
Worker Node
Executor
Driver
Scheduler(s)
need 1 more executor
Executor
executor has been idle for a while
Spark Dynamic Allocation
Cluster Manager Worker Node
Executor
Worker Node
Worker Node
Executor
Driver
Scheduler(s)
Executor
terminate the executor
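The three diagrams above can be read as a small decision rule: request an executor when tasks have waited too long, release one when it has been idle for a while. The following is only a toy sketch of that rule in Python, not Spark's actual implementation. The function name `decide`, the executor ids, and all default values are hypothetical and chosen for illustration.

```python
def decide(executor_idle_seconds, pending_tasks, backlog_seconds,
           backlog_timeout=1.0, idle_timeout=60.0,
           min_executors=0, max_executors=10):
    """Return ('request', 1), ('release', [ids]) or ('keep', None).

    executor_idle_seconds maps a (hypothetical) executor id to the number
    of seconds that executor has been idle.
    """
    current = len(executor_idle_seconds)

    # Tasks are waiting too long: ask the cluster manager for one more executor.
    if (pending_tasks > 0 and backlog_seconds >= backlog_timeout
            and current < max_executors):
        return ("request", 1)

    # No backlog: release executors that have been idle past the timeout,
    # while never going below min_executors.
    idle = sorted(e for e, s in executor_idle_seconds.items() if s >= idle_timeout)
    releasable = idle[: max(0, current - min_executors)]
    if pending_tasks == 0 and releasable:
        return ("release", releasable)

    return ("keep", None)


if __name__ == "__main__":
    print(decide({"e1": 0.0, "e2": 0.0}, pending_tasks=5, backlog_seconds=2.0))
    print(decide({"e1": 0.0, "e2": 120.0, "e3": 5.0}, pending_tasks=0, backlog_seconds=0.0))
```

The sketch requests a single executor per decision to stay close to the "need 1 more executor" callout; it makes no claim about how many executors a real scheduler asks for at once.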
External Shuffle Service
● Did we lose any data?
External Shuffle Service
Shuffle write
Shuffle fetch
Map Task
Reduce Task
Aggregator
bucket bucket bucket
Aggregator
Map Task
Reduce Task
Aggregator
bucket bucket bucket
Aggregator
Map Task
Reduce Task
Aggregator
bucket bucket bucket
Aggregator
External Shuffle Service
● Extracted from the executor
● Manages the locally aggregated data for shuffle operations
● Maintains the data until the application is done
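To make the three points above concrete, here is a toy Python model, not Spark code, of a worker node whose shuffle data is held outside the executors. The class `WorkerNode` and all of its method names are hypothetical; the only behaviour it illustrates is that shuffle output survives the termination of the executor that wrote it and is dropped when the application is done.

```python
class WorkerNode:
    """Toy worker node: shuffle data lives in the node, not in the executor."""

    def __init__(self):
        self.executors = set()
        # (app_id, map_id) -> list of buckets, owned by the node-level service
        self.shuffle_files = {}

    def start_executor(self, executor_id):
        self.executors.add(executor_id)

    def write_shuffle(self, app_id, executor_id, map_id, buckets):
        if executor_id not in self.executors:
            raise ValueError(f"unknown executor {executor_id}")
        self.shuffle_files[(app_id, map_id)] = list(buckets)

    def terminate_executor(self, executor_id):
        # Only the executor goes away; its shuffle output is left untouched.
        self.executors.discard(executor_id)

    def fetch(self, app_id, map_id, bucket):
        return self.shuffle_files[(app_id, map_id)][bucket]

    def application_done(self, app_id):
        # Shuffle data is kept until the application finishes, then removed.
        self.shuffle_files = {
            key: value for key, value in self.shuffle_files.items()
            if key[0] != app_id
        }


if __name__ == "__main__":
    node = WorkerNode()
    node.start_executor("exec-1")
    node.write_shuffle("app-1", "exec-1", map_id=0, buckets=["a", "b", "c"])
    node.terminate_executor("exec-1")
    print(node.fetch("app-1", 0, 1))  # still readable after the executor is gone
```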
Configuration
● Dynamic Allocation
○ spark.dynamicAllocation.enabled
○ spark.dynamicAllocation.initialExecutors
○ spark.dynamicAllocation.maxExecutors
○ spark.dynamicAllocation.minExecutors
Configuration
● Dynamic Allocation
○ spark.dynamicAllocation.schedulerBacklogTimeout
○ spark.dynamicAllocation.executorIdleTimeout
○ spark.dynamicAllocation.sustainedSchedulerBacklogTimeout
Configuration
● External Shuffle Service
○ spark.shuffle.service.enabled
○ spark.shuffle.service.port
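As a hedged illustration of how the properties listed above might be passed to a job, the Python sketch below turns a settings dictionary into `--conf key=value` arguments for `spark-submit`. Every value is a placeholder assumption rather than a recommendation, and the helper names `check_executor_bounds` and `spark_submit_conf_args` are hypothetical. The bounds check is a sanity check chosen for this sketch, not a description of what Spark itself validates.

```python
# Illustrative values only; appropriate values depend on the workload.
SETTINGS = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.initialExecutors": "2",
    "spark.dynamicAllocation.maxExecutors": "10",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.schedulerBacklogTimeout": "1s",
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
    "spark.dynamicAllocation.sustainedSchedulerBacklogTimeout": "1s",
    "spark.shuffle.service.enabled": "true",
    "spark.shuffle.service.port": "7337",
}


def check_executor_bounds(settings):
    """Raise ValueError unless minExecutors <= initialExecutors <= maxExecutors."""
    prefix = "spark.dynamicAllocation."
    low = int(settings[prefix + "minExecutors"])
    start = int(settings[prefix + "initialExecutors"])
    high = int(settings[prefix + "maxExecutors"])
    if not low <= start <= high:
        raise ValueError(f"expected min <= initial <= max, got {low}, {start}, {high}")


def spark_submit_conf_args(settings):
    """Build the list of '--conf key=value' arguments for spark-submit."""
    check_executor_bounds(settings)
    args = []
    for key, value in settings.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    print(" ".join(["spark-submit"] + spark_submit_conf_args(SETTINGS)))
```

The same keys could equally be written to a properties file; the dictionary form is used here only because it is easy to validate before launching.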
Demo
Dynamic Allocation in Action
Configuration values?
It depends ….
No, seriously
Configuration Values
● spark.dynamicAllocation.initialExecutors
● spark.dynamicAllocation.maxExecutors
● spark.dynamicAllocation.minExecutors
Depends on workload and how many resources are potentially available to you.
Configuration Values
● spark.dynamicAllocation.schedulerBacklogTimeout
Too short: may trigger on a short burst of tasks.
Too long: may be less effective.
● spark.dynamicAllocation.sustainedSchedulerBacklogTimeout
Relate it to how long an executor takes to start.
Defaults to the value of schedulerBacklogTimeout.
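To show how the two backlog timeouts interact, the Python sketch below lays out when executor requests would fire if a backlog persists. It assumes the behaviour described in Spark's job scheduling documentation: the first request is made once the backlog has lasted schedulerBacklogTimeout, further requests follow every sustainedSchedulerBacklogTimeout, and the number requested doubles each round. The function name `request_schedule` is hypothetical, and the sketch ignores the maxExecutors cap and the number of executors actually needed.

```python
def request_schedule(backlog_timeout, sustained_timeout, rounds):
    """Return [(seconds_since_backlog_started, executors_requested), ...]."""
    schedule = []
    when = backlog_timeout   # first request fires after the initial timeout
    to_add = 1               # first round asks for a single executor
    for _ in range(rounds):
        schedule.append((when, to_add))
        when += sustained_timeout   # later rounds use the sustained timeout
        to_add *= 2                 # request size doubles every round
    return schedule


if __name__ == "__main__":
    for when, count in request_schedule(1.0, 1.0, 4):
        print(f"t={when:.0f}s: request {count} executor(s)")
```

A sustained timeout shorter than the executor start time means the next, larger round may fire before the executors from the previous round have had a chance to pick up work.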
Configuration Values
● spark.dynamicAllocation.executorIdleTimeout
Set it relative to the duration of the longer tasks.
No big drawback to setting it too long, except cost.
Spark Streaming
http://spark.apache.org/docs/latest/streaming-programming-guide.html
• In most cases, set schedulerBacklogTimeout longer than the batch interval.
• Set executorIdleTimeout to a fraction of the batch interval.
• This should help manage processing delay.
• Not compatible with the dynamic rate estimator.
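The first two rules of thumb above can be expressed as a small helper. This Python sketch derives both timeouts from the batch interval; the function name `streaming_timeouts` and the factors `backlog_factor` and `idle_fraction` are assumptions made for illustration, since the slides only say "longer than" and "a portion of" without giving numbers.

```python
def streaming_timeouts(batch_interval_s, backlog_factor=2.0, idle_fraction=0.5):
    """Derive dynamic allocation timeouts from a streaming batch interval.

    backlog_factor must be > 1 (backlog timeout longer than the batch interval);
    idle_fraction must be in (0, 1) (idle timeout a fraction of the interval).
    """
    if batch_interval_s <= 0:
        raise ValueError("batch interval must be positive")
    if backlog_factor <= 1.0:
        raise ValueError("backlog_factor must be greater than 1")
    if not 0.0 < idle_fraction < 1.0:
        raise ValueError("idle_fraction must be between 0 and 1")

    backlog_ms = int(round(batch_interval_s * backlog_factor * 1000))
    idle_ms = int(round(batch_interval_s * idle_fraction * 1000))
    return {
        "spark.dynamicAllocation.schedulerBacklogTimeout": f"{backlog_ms}ms",
        "spark.dynamicAllocation.executorIdleTimeout": f"{idle_ms}ms",
    }


if __name__ == "__main__":
    for key, value in streaming_timeouts(10).items():
        print(f"{key}={value}")
```

Values are emitted in milliseconds so that fractional results stay exact in the property string; the factors themselves would need tuning against the observed processing delay.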
More Dynamic?
https://github.com/twosigma/Cook
‘Fair’ job scheduler for Spark on top of Mesos
● Not a recommendation, just a suggestion.
● Some assembly required.
THANK YOU.
github.com/skyluc/tree/master/talks/sparksummit-eu-2016
External Shuffle Service
Cluster Manager Worker Node
Executor
Worker Node
Worker Node
Driver
Scheduler(s)
Executor
External Shuffle Service
External Shuffle Service
External Shuffle Service

Spark Summit EU talk by Luc Bourlier