Rui Jian, Hao Lin, Facebook Inc.
rjian@fb.com, hlin@fb.com
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
#UnifiedAnalytics #SparkAISummit
About Us
• Rui Jian
– Software Engineer at Facebook (Data Warehouse & Graph Indexing)
– Master of Computer Science (Shanghai Jiao Tong University)
• Hao Lin
– Research scientist at Facebook (Data Warehouse Batch Scheduling)
– PhD in Parallel Computing (Purdue ECE)
Agenda
• Overview
• Tangram Architecture
• Scheduling Policies & Resource Allocation
• Future work
What is Tangram?
The scheduling platform for
• reliably running various batch workloads
• with efficient heterogeneous resource management
• at scale
Tangram Scheduling Targets
• Single jobs: ad hoc/periodic
• Batch jobs: ad hoc/periodic, malleable
• Gang jobs: ad hoc/periodic, rigid
• Long-running jobs: steady and regular; e.g. online training
Why Tangram?
• Various workload characteristics
– ML
– Apache Spark
– Apache Giraph
– Single jobs
• Customized scheduling policies
• Scalability
– Fleet size: hundreds of thousands of worker nodes
– Job scheduling throughput: hundreds of millions of jobs per day
Overview
• What is Tangram?
[Figure: Tangram architecture. Labels: Admin, Job Manager, DB, ML, Resource Manager Master, Agents, Single Job, Gang Job, ML Elastic Scheduler, SQL query, Giraph, Spark; interactions numbered 1 to 6.]
Client Library
• Job management
• Request/Release resources
• Resource grant
• Preemption notification
• Launch containers
• Container status change event
[Figure: Application, Tangram client, Resource Manager, and Agent; interactions numbered 1 to 6.]
Agent
• Report schedulable resources and runtime usage
• Health check reports
• Detect labels
• Launch/Kill Containers
• Container recovery
• Resource isolation with cgroup v2
Failure Recovery
• Agent failure
– Scan the recovery directory and recover the running containers
• Resource Manager (RM) failure
– Both agents and clients hold off communication with the RM until the new master shows up
– Clients sync session info to the new master to help it rebuild its state
– Agents add themselves to the new master
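The agent-side recovery step can be sketched as follows. This is only an illustration, assuming the agent writes one JSON record per running container into the recovery directory; the file layout, field names, and the `record_container` and `recover_containers` helpers are hypothetical and not taken from Tangram.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict


def record_container(recovery_dir: Path, container_id: str, pid: int) -> None:
    """Persist one record per launched container (hypothetical layout)."""
    path = recovery_dir / f"{container_id}.json"
    path.write_text(json.dumps({"container_id": container_id, "pid": pid}))


def recover_containers(
    recovery_dir: Path, is_alive: Callable[[int], bool]
) -> Dict[str, int]:
    """Rebuild the agent's container table by scanning the recovery directory.

    `is_alive` is supplied by the caller so the liveness check stays pluggable.
    """
    running: Dict[str, int] = {}
    for path in sorted(recovery_dir.glob("*.json")):
        record = json.loads(path.read_text())
        if is_alive(record["pid"]):
            running[record["container_id"]] = record["pid"]
        else:
            # Stale record: the container exited while the agent was down.
            path.unlink()
    return running


# Simulated restart: container c2 exited while the agent was down.
with tempfile.TemporaryDirectory() as tmp:
    recovery_dir = Path(tmp)
    record_container(recovery_dir, "c1", pid=101)
    record_container(recovery_dir, "c2", pid=202)
    recovered = recover_containers(recovery_dir, is_alive=lambda pid: pid == 101)
    stale_record_left = (recovery_dir / "c2.json").exists()
```

Keeping the records on local disk means the agent can re-adopt containers without asking the Resource Manager, which matters when the RM itself is failing over.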
Scheduling Policies
• Hierarchical queue structure
• Jobs to be queued on leaves
• Queue configs:
– min/max resources
– Policy:
• FIFO
• Dominant Resource Fairness (DRF)
• User fairness
• Global
• …
[Figure: example queue hierarchy rooted at /, with queues ads, feed, pipelines, interactive, user1, and user2; policies DRF, User Fairness, and FIFO; resource splits of 20%/80% and 50%/50%; jobs queued at the leaves.]
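Dominant Resource Fairness ranks each queue by its dominant share, the largest fraction it holds of any single resource, and serves the queue with the smallest dominant share next. The sketch below illustrates that rule for two sibling queues. The cluster size and usage numbers are invented for the example, and the `Resources`, `dominant_share`, and `next_to_serve` names are assumptions rather than Tangram's API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Resources:
    cpu_cores: float
    memory_gb: float


def dominant_share(used: Resources, total: Resources) -> float:
    """Largest fraction of any single resource that this queue holds."""
    return max(used.cpu_cores / total.cpu_cores, used.memory_gb / total.memory_gb)


def next_to_serve(usage_by_queue: Dict[str, Resources], total: Resources) -> str:
    """DRF serves the queue whose dominant share is currently smallest."""
    return min(
        usage_by_queue,
        key=lambda queue: dominant_share(usage_by_queue[queue], total),
    )


# Illustrative numbers only.
cluster = Resources(cpu_cores=100, memory_gb=1000)
usage = {
    "ads": Resources(cpu_cores=30, memory_gb=100),   # CPU-dominant: 30%
    "feed": Resources(cpu_cores=10, memory_gb=400),  # memory-dominant: 40%
}
winner = next_to_serve(usage, cluster)
```

Here `ads` is served next because its dominant share (30% of CPU) is below that of `feed` (40% of memory), even though `ads` uses more CPU.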
Scheduling Policies
• Jobs ordered by priority, then submission time, within a queue
• Gang jobs as first-class citizens in scheduling and resource allocation
• Lookahead scheduling for better throughput and utilization
• Job starvation prevention
[Figure: example job queue: Gang 200, Gang 20, Single, Gang 4, Single.]
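Lookahead scheduling can be illustrated with the example queue above. In this sketch a gang job needs all of its slots at once, and `lookahead` is the number of blocked jobs the scheduler may bypass in one round. The slot counts, the number of free slots, and the `schedule_round` function are assumptions made for the illustration, and starvation prevention is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str
    slots: int  # a gang job needs all slots at once; a single job needs 1


def schedule_round(
    queue: List[Job], free_slots: int, lookahead: int
) -> Tuple[List[str], List[str]]:
    """Return (started, still_waiting) for one scheduling round.

    lookahead=0 is strict FIFO: the first job that does not fit blocks
    everything behind it. A larger value lets later jobs that fit go ahead.
    """
    started: List[str] = []
    waiting: List[str] = []
    blocked = 0
    for job in queue:
        if blocked <= lookahead and job.slots <= free_slots:
            free_slots -= job.slots
            started.append(job.name)
        else:
            waiting.append(job.name)
            blocked += 1
    return started, waiting


queue = [
    Job("gang-200", 200),
    Job("gang-20", 20),
    Job("single-a", 1),
    Job("gang-4", 4),
    Job("single-b", 1),
]
strict = schedule_round(queue, free_slots=25, lookahead=0)
relaxed = schedule_round(queue, free_slots=25, lookahead=1)
```

With 25 free slots, strict FIFO starts nothing because the 200-slot gang job blocks the head of the queue, while a lookahead of one fills all 25 slots. A real scheduler would also have to bound how long the head job can be bypassed, which is what the starvation prevention bullet refers to.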
Resource Allocation
• Fine-grained resource specification:
– {cpuMilliCores: 3000, memoryBytes: 200GB}
• Constraints:
– “dataCenter = dc1 & type in [1,2] & kernelVersion > 4.10”
• Job Affinity:
– inSameDatacenter
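A minimal sketch of how such a constraint could be checked against host labels follows. It assumes the expression has already been parsed into (label, operator, value) triples; the parsed form, the `host_matches` helper, and the sample hosts are hypothetical. Versions are compared as integer tuples because a string or float comparison would wrongly rank 4.9 above 4.10.

```python
from typing import Dict, List, Tuple


def version_key(text: str) -> Tuple[int, ...]:
    """Turn '4.10' into (4, 10) so that 4.10 sorts after 4.9."""
    return tuple(int(part) for part in text.split("."))


OPERATORS = {
    "=": lambda actual, wanted: actual == wanted,
    "in": lambda actual, wanted: actual in wanted,
    ">": lambda actual, wanted: version_key(actual) > version_key(wanted),
}


def host_matches(labels: Dict[str, str], constraints: List[tuple]) -> bool:
    """A host is eligible only if every constraint holds (logical AND)."""
    return all(
        key in labels and OPERATORS[op](labels[key], wanted)
        for key, op, wanted in constraints
    )


# Parsed form of: dataCenter = dc1 & type in [1,2] & kernelVersion > 4.10
constraints = [
    ("dataCenter", "=", "dc1"),
    ("type", "in", ["1", "2"]),
    ("kernelVersion", ">", "4.10"),
]
hosts = {
    "host1": {"dataCenter": "dc1", "type": "2", "kernelVersion": "4.16"},
    "host2": {"dataCenter": "dc1", "type": "3", "kernelVersion": "4.16"},
    "host3": {"dataCenter": "dc1", "type": "1", "kernelVersion": "4.9"},
}
eligible = [name for name, labels in hosts.items() if host_matches(labels, constraints)]
```

Only `host1` passes: `host2` has the wrong machine type and `host3` runs a kernel older than 4.10.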
Resource Allocation
• Prefetched Host Cache
– Bypass the steps of host filtering and scoring
– Speed up the allocation process
• Host Filtering
– Hard & soft constraints
– Resource constraint
– Label constraint
– Job affinity
• Host Scoring and Ordering
– Packing efficiency
– Host healthiness
– Data locality
• Commit Allocation
– Bookkeeping resources
– Update cluster & queue parameters
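The filter, score, and commit stages can be sketched as a small pipeline. This is an illustration under stated assumptions: the `Host` fields, the ordering rule (data locality first, then the tightest fit), and the sample capacities are invented, and the prefetched host cache is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    free_cpu_millicores: int
    free_memory_gb: int
    healthy: bool = True
    has_local_data: bool = False


def filter_hosts(hosts: List[Host], cpu: int, memory_gb: int) -> List[Host]:
    """Hard constraints: drop unhealthy hosts and hosts that cannot fit the request."""
    return [
        h for h in hosts
        if h.healthy and h.free_cpu_millicores >= cpu and h.free_memory_gb >= memory_gb
    ]


def allocate(hosts: List[Host], cpu: int, memory_gb: int) -> Optional[str]:
    """Filter, order, then commit the allocation on the best host."""
    candidates = filter_hosts(hosts, cpu, memory_gb)
    if not candidates:
        return None
    # Ordering (assumed): prefer data locality, then the smallest leftover
    # memory and CPU, which packs requests tightly onto fewer hosts.
    best = min(
        candidates,
        key=lambda h: (
            not h.has_local_data,
            h.free_memory_gb - memory_gb,
            h.free_cpu_millicores - cpu,
        ),
    )
    # Commit: bookkeeping of the resources that were just handed out.
    best.free_cpu_millicores -= cpu
    best.free_memory_gb -= memory_gb
    return best.name


pool = [
    Host("a", free_cpu_millicores=8000, free_memory_gb=256),
    Host("b", free_cpu_millicores=4000, free_memory_gb=208),
    Host("c", free_cpu_millicores=16000, free_memory_gb=512, healthy=False),
    Host("d", free_cpu_millicores=2000, free_memory_gb=64),
]
placements = [allocate(pool, cpu=3000, memory_gb=200) for _ in range(3)]
```

The first request lands on `b` (tightest fit), the second on `a`, and the third finds no host with 200 GB free, so it returns `None`.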
Constraint-based Scheduling
• Machine type
• Datacenter
• Region
• CPU architecture
• Host prefix
• …
[Figure: a queue of jobs scheduled onto a merged host pool of type 1 and type 2 hosts (Host 1 to Host 5). Hosts are labeled {"type":"1"} or {"type":"2"}; jobs carry the constraint type=1 or type=2.]
Preemption
• Guarantee resource availability SLO within and across queues
• Identify the starving jobs and overallocated jobs
• Minimize preemption cost: two-phase protocol
– Only candidates appearing in both phases will be preempted
– The Resource Manager notifies the client of its preemption intent so that necessary action can be taken, e.g. checkpointing
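The two-phase idea can be sketched as a set intersection: a job is preempted only if it is a candidate in both phases, and clients are notified after the first phase so they have time to react. In this sketch a candidate is simply a job using more than its fair share; that rule, the numbers, and the function names are assumptions for illustration.

```python
from typing import Callable, Dict, Set


def overallocated(usage: Dict[str, int], fair_share: Dict[str, int]) -> Set[str]:
    """Candidate rule assumed here: the job uses more than its fair share."""
    return {job for job, used in usage.items() if used > fair_share[job]}


def two_phase_preempt(
    usage_phase1: Dict[str, int],
    usage_phase2: Dict[str, int],
    fair_share: Dict[str, int],
    notify: Callable[[str], None],
) -> Set[str]:
    """Return the jobs to preempt: candidates in phase 1 AND phase 2."""
    first = overallocated(usage_phase1, fair_share)
    for job in sorted(first):
        notify(job)  # preemption intent; the client may checkpoint or shrink
    second = overallocated(usage_phase2, fair_share)
    return first & second


notified = []
fair_share = {"a": 10, "b": 10, "c": 10}
victims = two_phase_preempt(
    usage_phase1={"a": 15, "b": 12, "c": 5},
    usage_phase2={"a": 15, "b": 9, "c": 11},  # b released resources in time
    fair_share=fair_share,
    notify=notified.append,
)
```

Job `b` avoids preemption by releasing resources after the notice, and `c` is spared because it only became a candidate in the second phase; only `a` is preempted.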
Cross Datacenter Scheduling
• The growing demand for computation and storage for Hive tables spans data centers
• Stranded capacity with imbalanced load
• Poor data locality and waste of network bandwidth
• Slow reaction to recover from crisis and disaster
Cross Datacenter Scheduling
• Dispatcher Proxy
– Monitors resource consumption across data centers
– Decides which Resource Manager schedules each job
– Provides location hints to the Resource Manager for enforcement
• Planner
– Decides where the data will be placed based on utilization and available resources
[Figure, shown in three build steps: Datacenter 1, Datacenter 2, and Datacenter 3; Resource Manager 1 and Resource Manager 2; a Dispatcher routing a Job with the constraint datacenter=1; Table Data.]
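A minimal sketch of the dispatcher's decision follows. It assumes each Resource Manager covers a known set of data centers and reports a utilization fraction per data center; the mapping of Resource Managers to data centers, the utilization numbers, and the `dispatch` function are hypothetical, since the slides do not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ResourceManager:
    name: str
    utilization: Dict[str, float]  # data center -> fraction of capacity in use


def dispatch(
    rms: List[ResourceManager], datacenter: Optional[str] = None
) -> Tuple[str, str]:
    """Return (resource manager name, location hint) for one job.

    A job constraint pins the data center; otherwise the least utilized
    data center across all Resource Managers is chosen.
    """
    choices = [
        (load, rm.name, dc)
        for rm in rms
        for dc, load in rm.utilization.items()
        if datacenter is None or dc == datacenter
    ]
    if not choices:
        raise ValueError(f"no Resource Manager covers {datacenter!r}")
    _, rm_name, hint = min(choices)
    return rm_name, hint


rms = [
    ResourceManager("rm1", {"dc1": 0.9, "dc2": 0.6}),
    ResourceManager("rm2", {"dc3": 0.4}),
]
pinned = dispatch(rms, datacenter="dc1")
unpinned = dispatch(rms)
```

The pinned job goes to `rm1` with hint `dc1` despite the high load there, while the unconstrained job is sent to the least loaded location, `dc3` under `rm2`.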
Future Work
• Mix workloads managed by one resource manager
• Run batch workloads with off-peak resources from online services
• Automatic resource tuning for high utilization
• We’re hiring! Contact: rjian@fb.com
