GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
Yonggang Hu, IBM, DE
Junfeng Liu, IBM, Architect
About us
•  Yonggang Hu
Distinguished Engineer, IBM
Chief Architect at Platform Computing, IBM.
Vice President and Application Architect at JPMorgan Chase
Working on distributed computing, grid, cloud and big data for the
past 20 years.
•  Junfeng Liu
IBM Platform Computing Architect, focusing on Big data
platform design and implementation. Successfully delivering
solutions to several key customers.
Agenda
•  GPU and Spark integration motivation
•  The challenges in production deployments
•  Solutions in IBM Conductor with Spark
•  Demo
Spark & GPU
•  Spark apps are CPU intensive
•  Need to handle more data and bigger models
•  Machine Learning
–  Predictive analytics, logistic regression, ALS, K-means, etc.
•  Graph Analytics
–  Security, fraud detection, social network analytics, GraphX
•  Video/Speech Analytics
–  Object recognition, dialog
•  Financial Risk Analytics
–  Market simulation, credit risk; home-grown apps and apps from Murex, Misys
•  Spark-enable existing GPU apps
•  GPU-enable Spark apps
Various ways to enable Spark & GPU
•  Use GPUs for accelerating Spark Libraries and operations without
changing interfaces and underlying programming model.
•  Automatically generate CUDA code from the source Spark Java
code
•  Integrate Spark with GPU-enabled application & system (e.g., Spark
integrated with Caffe, TensorFlow and customer applications)
Production Challenges
•  However
–  Identification of GPU execution vs. CPU
execution in DAG
–  Data preparation for GPU execution
–  Low resource utilization for CPU or GPU or
both
•  Cannot assume all compute hosts are identical and
have GPU resource available
•  GPU is a lot more expensive !!!
–  Overload and contention when running mixed
GPU & CPU workload
–  Long-tail tasks and failover between GPU and CPU tasks
–  Task ratio control on different resources
[Figure: DAG with Stage 1, Stage 2 and a GPU stage (reduceByKey, collect), mapped onto a GPU group and a CPU group]
A typical example –
Personalized Medicine – Adverse Drug Reaction Workload
-  30x faster learning and a 4.3x end-to-end speedup
-  Need to fully utilize both GPU and CPU resources to get economic benefits
Scheduling Granularity
•  Scheduling at application level
–  Mesos and YARN tag GPU machines with a label
–  Schedule the application on GPU hosts based on the resource requirements of the application
–  Coarse-grained scheduling leads to low utilization of CPU/GPU.
•  Scheduling at DAG level
–  Need fine grained sharing for GPU resources rather than reserving entire GPU
machines
–  Identify GPU operation
–  Optimize the DAG tree by decoupling GPU operations from CPU operations and by
inserting new GPU stages
–  Reduce GPU wait time, enable sharing GPU among different jobs and therefore
improve the overall GPU utilization
GPU tasks recognition
•  GPU and CPU tasks mixed together
•  Separating the two workloads is necessary for scheduling
control
[Figure: GPUFunction() calls a Python-C/C++ wrapper, which invokes a native function implemented in CUDA/OpenCL (the GPU library)]
GPU tasks recognition
•  Mark the GPU workload by DAG
operation
–  Go through the DAG tree to identify
the stages with GPU requirement
–  Optimize the distribution by inserting
GPU stage
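The DAG walk described above can be sketched in a few lines of Python. This is only an illustration under assumed names: `Op`, `needs_gpu` and `split_into_stages` are hypothetical and are not part of Spark or IBM Conductor. A linearized chain of operations is cut into stages wherever the resource requirement changes, so a GPU operation ends up in its own GPU stage.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Op:
    """One operation in a linearized DAG; needs_gpu is a hypothetical marker."""
    name: str
    needs_gpu: bool = False


def split_into_stages(ops):
    """Group consecutive ops with the same resource need into stages.

    Returns a list of (resource, [op names]) tuples, where resource is
    "GPU" or "CPU". A stage boundary is placed wherever the need changes.
    """
    stages = []
    for needs_gpu, group in groupby(ops, key=lambda op: op.needs_gpu):
        resource = "GPU" if needs_gpu else "CPU"
        stages.append((resource, [op.name for op in group]))
    return stages


pipeline = [
    Op("map"),
    Op("join"),
    Op("gpu_function", needs_gpu=True),
    Op("reduceByKey"),
    Op("collect"),
]
stages = split_into_stages(pipeline)
for resource, names in stages:
    print(resource, names)
```

A real optimizer would work on a full DAG rather than a linear chain, and would also weigh the cost of the extra stage boundary.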
Policies
•  RM needs capability to identify the GPU hosts and
manage along with CPU resources
•  Prioritization policy - share GPU resource among
applications
•  Allocation policy – control GPU and CPU allocation
independently – multi-dimensional scheduling
•  Fine grained policy to schedule tasks according to GPU
optimized DAG plan
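As a rough illustration of controlling GPU and CPU allocation independently, the sketch below tracks the two dimensions as separate counters per host. The `Host` class, the `allocate` function and the "prefer GPU-free hosts for CPU-only tasks" rule are assumptions made for this example, not the resource manager's actual policy.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_cpu: int
    free_gpu: int = 0  # hosts without GPUs simply report zero


def allocate(hosts, cpu, gpu=0):
    """Return the name of a host that satisfies both dimensions, or None.

    CPU and GPU slots are tracked and decremented independently, so a
    CPU-only task never consumes a GPU slot. CPU-only requests try hosts
    with the fewest free GPUs first, to keep scarce GPU hosts open.
    """
    candidates = sorted(hosts, key=lambda h: h.free_gpu) if gpu == 0 else hosts
    for host in candidates:
        if host.free_cpu >= cpu and host.free_gpu >= gpu:
            host.free_cpu -= cpu
            host.free_gpu -= gpu
            return host.name
    return None


cluster = [Host("node0", free_cpu=4, free_gpu=2), Host("node1", free_cpu=8)]
first = allocate(cluster, cpu=1, gpu=1)   # needs a GPU, lands on node0
second = allocate(cluster, cpu=2)         # CPU-only, steered to node1
third = allocate(cluster, cpu=1, gpu=2)   # no host has 2 free GPUs left
print(first, second, third)
```

Prioritization between applications is left out here; it would decide whose request is passed to `allocate` first.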
Adaptive Scheduling
•  CPU & GPU tasks are convertible in
many applications
•  Scheduling needs adaptive capability
–  If GPU is available, use a portion of GPU
–  Otherwise run rest of tasks on CPU
dstInBlocks.join(merged).mapValues {
  ….
  if (useGPU) {
    loadGPULib()
    callGPU()
  } else {
    // CPU version
  }
}
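The same adaptive idea can be shown as a self-contained Python sketch: use the GPU for as many blocks as there are free GPU slots, and run the rest on the CPU. The names `gpu_kernel`, `cpu_kernel` and `run_blocks` are hypothetical, and `gpu_kernel` is only a stand-in that computes the same result in plain Python.

```python
def cpu_kernel(block):
    """CPU version of the computation (here: a plain sum of squares)."""
    return sum(x * x for x in block)


def gpu_kernel(block):
    """Stand-in for a GPU implementation; a real one would call CUDA/OpenCL."""
    return sum(x * x for x in block)


def run_blocks(blocks, gpu_slots):
    """Send blocks to the GPU while slots remain, and the rest to the CPU."""
    results, placement = [], []
    for block in blocks:
        if gpu_slots > 0:
            gpu_slots -= 1
            results.append(gpu_kernel(block))
            placement.append("GPU")
        else:
            results.append(cpu_kernel(block))
            placement.append("CPU")
    return results, placement


results, placement = run_blocks([[1, 2], [3], [4, 5]], gpu_slots=1)
print(results, placement)
```

For simplicity the sketch never releases a GPU slot; a scheduler would return slots as tasks finish. Because both kernels produce the same result, the placement decision can be made per task at run time.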
Adaptive Scheduling
[Figure: the driver dispatches tasks to executors on Node 0, Node 1, ... Node n, which provide CPU and GPU resources]
Efficiency Considerations
•  Do we need to wait for a GPU resource if a CPU is
available?
•  Do we need to rerun CPU tasks on a GPU if
the tasks on CPU are long-tail?
•  Do we need failover across resource
types?
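One way to make the long-tail question concrete is to flag running CPU tasks whose elapsed time is well above the median of the tasks that already finished. The sketch below is an assumption for illustration: the function name `long_tail_tasks` and the 1.5x multiplier are not taken from the talk.

```python
from statistics import median


def long_tail_tasks(finished_durations, running, multiplier=1.5):
    """Return ids of running CPU tasks that look like stragglers.

    finished_durations: durations (seconds) of completed tasks in the stage.
    running: dict of task id -> elapsed seconds so far.
    """
    if not finished_durations:
        return []
    threshold = multiplier * median(finished_durations)
    return sorted(tid for tid, elapsed in running.items() if elapsed > threshold)


candidates = long_tail_tasks([10, 12, 11, 9], {"t5": 14, "t6": 40, "t7": 16})
print(candidates)
```

Whether a flagged task is then rerun on a GPU is a separate policy decision that depends on GPU availability and on the expected speedup.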
Defer Scheduling
•  Traditional defer scheduling
–  Wait for data locality
–  Cache, host, rack
•  Resource-based defer scheduling
–  Necessary if the GPU can greatly speed up task execution
–  Only when the wait time is acceptable
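The two conditions for resource-based defer scheduling can be written as a small decision function. This is a sketch under stated assumptions: the function name, its parameters and the idea that run times and the GPU wait can be estimated in advance are all hypothetical.

```python
def should_wait_for_gpu(cpu_time, gpu_time, expected_gpu_wait, max_wait):
    """Decide whether to defer a task until a GPU becomes free.

    All arguments are estimates in seconds. Waiting pays off only if the
    expected wait is within the acceptable bound and the wait plus the
    GPU run time still beats running on a CPU right away.
    """
    if expected_gpu_wait > max_wait:
        return False
    return expected_gpu_wait + gpu_time < cpu_time


big_speedup = should_wait_for_gpu(cpu_time=300, gpu_time=10,
                                  expected_gpu_wait=60, max_wait=120)
wait_too_long = should_wait_for_gpu(cpu_time=300, gpu_time=10,
                                    expected_gpu_wait=200, max_wait=120)
small_speedup = should_wait_for_gpu(cpu_time=12, gpu_time=10,
                                    expected_gpu_wait=5, max_wait=120)
print(big_speedup, wait_too_long, small_speedup)
```

With a large speedup and a short queue the task waits for the GPU; with a long queue or a marginal speedup it runs on the CPU immediately.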
Future work
•  Global optimization
–  Consider the cost of the additional shuffle stage
–  Consider data locality of CPU and GPU stages
–  Add a time dimension to multi-dimensional scheduling (MDS)
–  Optimize global DAG tree execution
–  Use historical data to optimize future execution, e.g.,
future iterations
Building Spark Centric Shared Service with IBM Conductor
1. Improve Time to Results
Run Spark natively on a shared infrastructure without the dependency of Hadoop. Reduce application wait time, improving time to results.
2. Increase Resource Utilization
Fine grain, dynamic allocation of resources maximizes efficiency of Spark instances sharing a common resource pool. Multi-tenant, multi-framework support. Eliminates cluster sprawl.
3. Reduce Administration Costs
Proven architecture at extreme scale, with enterprise class workload management, multi-version support for Spark, monitoring, reporting, and security capabilities.
4. End-to-End Enterprise Class Solution
•  IBM STC Spark Distribution
•  IBM Platform Resource Orchestrator / Session Scheduler, application service manager
•  IBM Spectrum Scale FPO
IBM Conductor with Spark
IBM Bluemix Spark Cloud Service in
production – thousands of users and
tenants.
Third party audited benchmark
indicated significant performance/
throughput/SLA advantages
https://stacresearch.com/news/2016/03/29/IBM160229
IBM Systems, © 2016 IBM Corporation
IBM Conductor with Spark
Monitor and Reporting with Elastic (ELK)
•  Integrated Elasticsearch, Logstash and Kibana for customizable monitoring
•  Built-in monitoring metrics
–  Across Spark instance groups
–  Across Spark applications within a Spark instance group
–  Within a Spark application
•  Built-in monitoring inside Zeppelin notebooks
Demo
THANK YOU.
Acceleration Opportunities for GPUs & Spark
Analytics Model             Computational Patterns suitable for GPU Acceleration
Regression Analysis         Cholesky Factorization, Matrix Inversion, Transpose
Clustering                  Cost-based iterative convergence
Nearest-neighbor Search     Distance calculations, Singular Value Decomposition, Hashing
Neural Networks             Matrix Multiplications, Convolutions, FFTs, Pair-wise dot-products
Support Vector Machines     Linear Solvers, Dot-product
Association Rule Mining     Set Operations: Intersection, union
Recommender Systems         Matrix Factorizations, Dot-product
Time-series Processing      FFT, Distance and Smoothing functions
Text Analytics              Matrix multiplication, factorization, Set operations, String computations, Distance functions
Monte Carlo Methods         Random number generators, Probability distribution generators
Mathematical Programming    Linear solvers, Dynamic Programming
OLAP/BI                     Aggregation, Sorting, Hash-based grouping, User-defined functions
Graph Analytics             Matrix multiplications, Path traversals
