Spark Summit EU talk by Luc Bourlier

This document discusses dynamic resource allocation in Spark clusters. It explains how Spark can add or remove executors from a cluster based on workload to optimize resource usage for jobs with variable loads. It also describes how the external shuffle service moves shuffle data management out of executors to improve performance and fault tolerance. The document provides details on configuring dynamic allocation and the external shuffle service and demonstrates dynamic allocation in action. It also discusses applying these techniques to Spark Streaming workloads.

DYNAMIC RESOURCE ALLOCATION,
DO MORE WITH YOUR CLUSTER
Luc Bourlier
Lightbend

● Dynamic Resource Allocation
● Dynamic Resource Allocation in Spark
● External Shuffle Service
● Configuration
● Demo
● Spark Streaming

Dynamic Resource Allocation
[Cartoon: a conversation with the Cluster, in three steps]
● "I’d like some resources for a job."
● "Oki. Thanks."
● "Hmm, actually, I don’t need all this power anymore."

Dynamic Resource Allocation
Why?
● Shared cluster
● Optimization of resource usage
When?
● Jobs with a variable load

Spark Dynamic Allocation
[Diagram: the Driver, with its Scheduler(s), works with the Cluster Manager, which runs Executors on Worker Nodes.]
● At startup, the driver asks the cluster manager: "need 2 executors".
● Tasks are waiting too long: "need 1 more executor".
● An executor has been idle for a while: "terminate the executor".
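
The three steps above can be sketched as a single decision rule. This is a toy model, not Spark's implementation (which lives in the driver's `ExecutorAllocationManager`); the `ClusterState` type and the `decide` function are hypothetical, and the two thresholds stand in for `spark.dynamicAllocation.schedulerBacklogTimeout` and `spark.dynamicAllocation.executorIdleTimeout` from the configuration slides.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    """Hypothetical snapshot of what the driver knows; not a Spark API."""
    # Seconds the oldest pending task has waited (0.0 when nothing is pending).
    oldest_pending_wait_s: float = 0.0
    # Executor id -> seconds since that executor last ran a task.
    executor_idle_s: dict = field(default_factory=dict)


def decide(state: ClusterState, backlog_timeout_s: float, idle_timeout_s: float) -> list:
    """Return the scaling actions a dynamic-allocation policy would take (toy model)."""
    actions = []
    # Tasks are waiting too long: ask the cluster manager for more executors.
    if state.oldest_pending_wait_s >= backlog_timeout_s:
        actions.append("request 1 more executor")
    # Executors idle for a while: give them back to the cluster.
    for executor_id, idle_s in sorted(state.executor_idle_s.items()):
        if idle_s >= idle_timeout_s:
            actions.append(f"terminate {executor_id}")
    return actions


# Thresholds use Spark's documented defaults: 1s backlog timeout, 60s idle timeout.
state = ClusterState(oldest_pending_wait_s=2.0,
                     executor_idle_s={"exec-1": 0.0, "exec-2": 75.0})
print(decide(state, backlog_timeout_s=1.0, idle_timeout_s=60.0))
# ['request 1 more executor', 'terminate exec-2']
```

In this sketch, scale-up and scale-down are independent checks run on the same snapshot; the real allocator also enforces `minExecutors`/`maxExecutors` and ramps requests up over several rounds.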

External Shuffle Service
● Did we lose any data?

External Shuffle Service
[Diagram: shuffle write and shuffle fetch. Each of three Map Tasks passes its output through an Aggregator into buckets (shuffle write); each of three Reduce Tasks fetches its buckets and combines them with an Aggregator (shuffle fetch).]
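
To see what the buckets in the diagram hold, here is a toy word count in plain Python. It is an illustration, not Spark's shuffle code: `map_task`, `reduce_task`, and the CRC32-based `partition` function are hypothetical choices. Each map task pre-aggregates its input and writes one bucket per reducer (shuffle write); each reduce task then fetches its bucket from every map output (shuffle fetch).

```python
import zlib
from collections import Counter

NUM_REDUCERS = 3


def partition(key: str, num_reducers: int = NUM_REDUCERS) -> int:
    """Stable key -> reducer mapping (Python's built-in hash() is salted per process)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_task(words: list) -> list:
    """Shuffle write: aggregate locally, then split the result into one bucket per reducer."""
    local_counts = Counter(words)  # map-side aggregator
    buckets = [Counter() for _ in range(NUM_REDUCERS)]
    for word, count in local_counts.items():
        buckets[partition(word)][word] = count
    return buckets


def reduce_task(reducer_id: int, map_outputs: list) -> Counter:
    """Shuffle fetch: pull this reducer's bucket from every map output and merge them."""
    merged = Counter()  # reduce-side aggregator
    for buckets in map_outputs:
        merged.update(buckets[reducer_id])  # Counter.update adds counts
    return merged


splits = [["spark", "shuffle", "spark"], ["shuffle", "executor"], ["spark", "executor"]]
map_outputs = [map_task(split) for split in splits]
results = [reduce_task(r, map_outputs) for r in range(NUM_REDUCERS)]
print(sum(results, Counter()))
```

Every reducer needs a bucket from every map task, so the map outputs must stay reachable until all reducers have fetched them.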

External Shuffle Service
● Extracted from the executor: runs alongside the executors on each worker node
● Manages the locally aggregated data for shuffle operations
● Keeps that data until the application is done
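
Why this matters for dynamic allocation can be shown with a toy model. The `WorkerNode` class below is hypothetical, not Spark code: without an external service, shuffle files are served by the executor that wrote them, so removing an idle executor leaves nothing to serve them; with the service, the node keeps serving them after the executor is gone.

```python
class WorkerNode:
    """Hypothetical model of who serves shuffle files on one worker node."""

    def __init__(self, external_shuffle_service: bool):
        self.external_shuffle_service = external_shuffle_service
        self.served_by_executor = {}    # executor id -> set of shuffle files it serves
        self.served_by_service = set()  # files served by the node-level shuffle service

    def start_executor(self, executor_id: str) -> None:
        self.served_by_executor[executor_id] = set()

    def write_shuffle_file(self, executor_id: str, name: str) -> None:
        if self.external_shuffle_service:
            self.served_by_service.add(name)  # outlives the executor
        else:
            self.served_by_executor[executor_id].add(name)

    def remove_executor(self, executor_id: str) -> None:
        # Dynamic allocation releasing an idle executor.
        self.served_by_executor.pop(executor_id, None)

    def fetchable_files(self) -> set:
        files = set(self.served_by_service)
        for owned in self.served_by_executor.values():
            files |= owned
        return files


for with_service in (False, True):
    node = WorkerNode(external_shuffle_service=with_service)
    node.start_executor("exec-1")
    node.write_shuffle_file("exec-1", "shuffle_0_map_0")
    node.remove_executor("exec-1")
    print(with_service, node.fetchable_files())
# False set()
# True {'shuffle_0_map_0'}
```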

Configuration
● Dynamic Allocation
○ spark.dynamicAllocation.enabled
○ spark.dynamicAllocation.initialExecutors
○ spark.dynamicAllocation.maxExecutors
○ spark.dynamicAllocation.minExecutors
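
One way to pass these settings is `spark-submit --conf key=value`. The `to_conf_args` helper below is a hypothetical convenience for building that argument list, `my_job.py` is a placeholder, and the numbers are illustrative only, not recommendations.

```python
def to_conf_args(conf: dict) -> list:
    """Turn a {key: value} dict into spark-submit '--conf key=value' arguments."""
    args = []
    for key, value in conf.items():  # dicts keep insertion order
        args += ["--conf", f"{key}={value}"]
    return args


# Placeholder values for illustration only.
dynamic_allocation = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.initialExecutors": "2",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "20",
}

print(" ".join(["spark-submit"] + to_conf_args(dynamic_allocation) + ["my_job.py"]))
```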

Configuration
● Dynamic Allocation
○ spark.dynamicAllocation.schedulerBacklogTimeout
○ spark.dynamicAllocation.executorIdleTimeout
○ spark.dynamicAllocation.sustainedSchedulerBacklogTimeout

Configuration
● External Shuffle Service
○ spark.shuffle.service.enabled
○ spark.shuffle.service.port
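
The two groups of settings go together: in Spark releases of this talk's era, enabling dynamic allocation also required `spark.shuffle.service.enabled=true` (Spark 3.0 later added shuffle tracking as an alternative). The checker below is a hypothetical pre-submit sanity check, not part of Spark; the defaults it assumes (`false` for both flags, 0 for `minExecutors`, no upper bound for `maxExecutors`) follow Spark's configuration documentation. The service itself listens on `spark.shuffle.service.port`, 7337 by default.

```python
def check_dynamic_allocation(conf: dict) -> list:
    """Return human-readable problems with a dynamic-allocation config (hypothetical helper)."""
    problems = []
    if conf.get("spark.dynamicAllocation.enabled", "false") != "true":
        return problems  # nothing to check when the feature is off
    if conf.get("spark.shuffle.service.enabled", "false") != "true":
        problems.append("dynamic allocation needs spark.shuffle.service.enabled=true")
    min_executors = int(conf.get("spark.dynamicAllocation.minExecutors", "0"))
    max_executors = conf.get("spark.dynamicAllocation.maxExecutors")  # unset means unbounded
    if max_executors is not None and min_executors > int(max_executors):
        problems.append("minExecutors must not exceed maxExecutors")
    return problems


conf = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "5",
    "spark.dynamicAllocation.maxExecutors": "2",
}
print(check_dynamic_allocation(conf))  # two problems reported

conf["spark.shuffle.service.enabled"] = "true"
conf["spark.dynamicAllocation.maxExecutors"] = "10"
print(check_dynamic_allocation(conf))  # []
```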

Configuration Values?
It depends…
No, seriously

Configuration Values
● spark.dynamicAllocation.initialExecutors
● spark.dynamicAllocation.maxExecutors
● spark.dynamicAllocation.minExecutors
Depends on the workload and on how many resources are potentially available to you.

Configuration Values
● spark.dynamicAllocation.schedulerBacklogTimeout
Too short: may trigger on short bursts of tasks.
Too long: scaling up reacts slowly, so it is less effective.
● spark.dynamicAllocation.sustainedSchedulerBacklogTimeout
Relative to the time an executor takes to start.
Defaults to the value of schedulerBacklogTimeout.
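
How the two backlog timeouts interact is described in Spark's job-scheduling documentation: the first request fires once tasks have been pending for `schedulerBacklogTimeout`, further requests fire every `sustainedSchedulerBacklogTimeout` while the backlog persists, and each round asks for twice as many executors as the previous one (1, 2, 4, 8, ...). The `request_rounds` function below is a hypothetical sketch of that schedule; it ignores the cap Spark also derives from the number of pending tasks, and its `max_executors` only counts executors added during this backlog.

```python
def request_rounds(backlog_s, backlog_timeout_s, sustained_timeout_s=None, max_executors=None):
    """List (time_s, executors_requested) rounds while a task backlog lasts backlog_s seconds."""
    if sustained_timeout_s is None:
        sustained_timeout_s = backlog_timeout_s  # documented default
    rounds = []
    t, batch, total = backlog_timeout_s, 1, 0
    while t <= backlog_s:
        if max_executors is not None:
            batch = min(batch, max_executors - total)  # simplified maxExecutors cap
        if batch <= 0:
            break
        rounds.append((t, batch))
        total += batch
        batch *= 2  # exponential ramp-up between rounds
        t += sustained_timeout_s
    return rounds


print(request_rounds(backlog_s=5, backlog_timeout_s=1))
# [(1, 1), (2, 2), (3, 4), (4, 8), (5, 16)]
print(request_rounds(backlog_s=5, backlog_timeout_s=1, max_executors=10))
# [(1, 1), (2, 2), (3, 4), (4, 3)]
```

Setting `sustained_timeout_s` close to the executor start time, as the slide suggests, spaces the rounds out so that each request has a chance to take effect before the next one: `request_rounds(5, 1, sustained_timeout_s=3)` yields only `[(1, 1), (4, 2)]`.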

Configuration Values
● spark.dynamicAllocation.executorIdleTimeout
Relative to the duration of the longest task.
No big drawback to setting it too long, except cost: idle executors keep holding resources.
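
The link between `executorIdleTimeout` and the longest task can be illustrated with a stage that has one straggler: executors that finish early sit idle until the stage ends, and if that idle time reaches the timeout they are released, only to be requested again for the next stage. The toy function below (hypothetical, not Spark code, and ignoring `minExecutors`) lists which executors that would be.

```python
def released_before_stage_end(last_task_end_s: dict, idle_timeout_s: float) -> list:
    """Executors that hit the idle timeout while the stage waits for its longest task.

    last_task_end_s maps executor id -> time (s) its last task in the stage finished.
    """
    stage_end_s = max(last_task_end_s.values())  # the stage ends with its longest task
    return sorted(
        executor_id
        for executor_id, end_s in last_task_end_s.items()
        if stage_end_s - end_s >= idle_timeout_s
    )


finish_times = {"exec-1": 30.0, "exec-2": 35.0, "exec-3": 200.0}  # exec-3 runs a straggler
print(released_before_stage_end(finish_times, idle_timeout_s=60.0))   # ['exec-1', 'exec-2']
print(released_before_stage_end(finish_times, idle_timeout_s=300.0))  # []
```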

Spark Streaming
http://spark.apache.org/docs/latest/streaming-programming-guide.html

• In most cases, set schedulerBacklogTimeout longer than the batch interval.
• Set executorIdleTimeout to a fraction of the batch interval.
• This should make it possible to manage processing delays.
• Not compatible with the dynamic rate estimator.
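
The first two rules of thumb can be written as a quick check. The helper is hypothetical. A plausible reading of the rules: with a backlog timeout shorter than the batch interval, the short burst of tasks at the start of every batch can trigger executor requests; with an idle timeout longer than the batch interval, executors typically receive work every batch and are never idle long enough to be released. Spark's defaults (1s backlog timeout, 60s idle timeout) fail both rules for a 10-second batch interval.

```python
def streaming_timeout_warnings(batch_interval_s: float,
                               backlog_timeout_s: float,
                               idle_timeout_s: float) -> list:
    """Check dynamic-allocation timeouts against the slide's Spark Streaming rules of thumb."""
    warnings = []
    if backlog_timeout_s <= batch_interval_s:
        warnings.append("schedulerBacklogTimeout should be longer than the batch interval")
    if idle_timeout_s >= batch_interval_s:
        warnings.append("executorIdleTimeout should be a fraction of the batch interval")
    return warnings


print(streaming_timeout_warnings(10, backlog_timeout_s=1, idle_timeout_s=60))  # two warnings
print(streaming_timeout_warnings(10, backlog_timeout_s=15, idle_timeout_s=5))  # []
```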

More Dynamic?
https://github.com/twosigma/Cook
‘Fair’ job scheduler for Spark on top of Mesos
● Not a recommendation, just a suggestion.
● Some assembly required.

THANK YOU.
github.com/skyluc/tree/master/talks/sparksummit-eu-2016

External Shuffle Service
[Diagram: an External Shuffle Service runs on each Worker Node alongside the Executors; the Driver, with its Scheduler(s), works with the Cluster Manager.]
