Bringing HPC Algorithms to
Big Data Platforms
Nikolay Malitsky
Brookhaven National Laboratory
Outline
• Spark as an integrated platform for experimental facilities
• Ptychographic application
• Spark-MPI approach
• Summary
National Synchrotron Light Source II
• highly optimized 3rd generation synchrotron facility
• started operations in 2014 at Brookhaven National Laboratory, New York State
• suite of six experimental programs:
  • Hard X-Ray Spectroscopy
  • Imaging & Microscopy
  • Structural Biology
  • Soft X-Ray Scattering & Spectroscopy
  • Complex Scattering
  • Diffraction & In Situ Scattering
John Hill et al. NSLS-II Strategic Plan, 2015
DOE Science Drivers
Many years ago …
• Basic Energy Sciences
• Fusion Energy Sciences
• High Energy Physics
• Nuclear Physics
• Biological and Environmental Research
NSLS-II is here
Now
• Data-Intensive Science
Closing the gap between Big Data and HPC computing
Ecosystems*: Big Data and HPC Computing
Leaders: Spark (Big Data) and MPI (HPC Computing)
New Frontiers
*Geoffrey Fox et al. HPC-ABDC High Performance Computing Enhanced Apache Big Data Stack, CCGrid, 2015
Three directions
• Spark + MPI-oriented extension (topic of this talk)
• MPI + Spark-oriented extension
• New model
Spark as an integrated platform for experimental facilities
Beamline Applications
Data-Information-Knowledge Discovery Path
MPI-oriented extension
Receiver(s)
X-Ray Detector
Composite
Connector(s)
Heterogeneous Data Layer
Resilient Distributed Dataset API
Big Volume + Variety
Big Velocity
• Ptychography
• Tomography
• GISAXS
• Deep Learning
• …
Ptychographic Application
Ptychography
Ptychography is one of the essential image reconstruction techniques used in light source facilities.
This method consists of measuring multiple diffraction patterns by scanning a finite illumination
(also called the probe) on an extended specimen (the object).
Stefano Marchesini et al. SHARP: a distributed, GPU-based ptychographic solver, J. Appl. Cryst., 49, 2016
SHARP-NSLS2 application
Probe
Object
Stefano Marchesini et al. SHARP GPU-based Framework, Center for Advanced Mathematics for Energy Research, LBNL
Xiaojing Huang, HXN Approach, NSLS-II, BNL
512 detector frames (384x384)
Next: near-real-time ptychographic pipeline
Tomographic experiment based on 100 ptychographic projections
10 K – 40 K detector frames per ptychographic projection
SHARP Multi-GPU engine based on the MPI Allreduce communication model
Relaxed Averaged Alternating Reflections (RAAR) iteration loop
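A minimal sketch of the RAAR iteration loop named above, assuming the standard update x <- beta/2 (R_S R_M x + x) + (1 - beta) P_M x with reflectors R = 2P - I. The two constraint sets, the projector functions and the starting point are hypothetical stand-ins chosen so the loop can run on its own; this is not SHARP code.

```python
# Toy RAAR loop on two constraint sets in the plane (illustrative only).

def reflect(project, x):
    """Reflector R = 2P - I built from a projector P."""
    p = project(x)
    return [2.0 * pi - xi for pi, xi in zip(p, x)]

def raar_step(x, project_m, project_s, beta):
    """One RAAR update: beta/2 * (R_S R_M x + x) + (1 - beta) * P_M x."""
    rs_rm = reflect(project_s, reflect(project_m, x))
    pm = project_m(x)
    return [0.5 * beta * (r + xi) + (1.0 - beta) * p
            for r, xi, p in zip(rs_rm, x, pm)]

# Hypothetical stand-ins for the two constraints of a reconstruction.
def project_m(x):
    """Projection onto the set M = {x : x[1] == 2}."""
    return [x[0], 2.0]

def project_s(x):
    """Projection onto the set S = {x : x[0] == 1}."""
    return [1.0, x[1]]

def run_raar(x0, beta=0.75, iterations=50):
    """Iterate RAAR from x0 for a fixed number of steps."""
    x = list(x0)
    for _ in range(iterations):
        x = raar_step(x, project_m, project_s, beta)
    return x

# The iterate approaches the intersection of the two sets.
solution = run_raar([10.0, -5.0])
```

In a ptychographic solver the two projectors would instead enforce the measured diffraction magnitudes and the overlap between neighbouring probe positions; the loop structure stays the same.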
Spark-MPI Approach
Deep Learning Parallel Approaches*
Model Parallelism Data Parallelism
*Jeffrey Dean et al. Large Scale Distributed Deep Networks, NIPS, 2012
Stochastic Gradient Descent (SGD)
iteration loop
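A minimal sketch of the data-parallel SGD loop, assuming each worker holds one shard of the data, uses the whole shard as its mini-batch, and averages gradients with the other workers on every step. The one-parameter model, the helper names and the in-memory averaging are illustrative assumptions, not code from any of the cited frameworks.

```python
# Data-parallel gradient descent simulated in a single process.

def local_gradient(w, shard):
    """Gradient of the mean squared error for y = w * x on one shard."""
    return sum(2.0 * x * (w * x - y) for x, y in shard) / len(shard)

def allreduce_mean(values):
    """Stand-in for an inter-worker allreduce: every worker gets the mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def data_parallel_sgd(shards, w0=0.0, lr=0.01, steps=50):
    """Each worker keeps a model replica; replicas stay in sync via averaging."""
    replicas = [w0] * len(shards)
    for _ in range(steps):
        grads = [local_gradient(w, s) for w, s in zip(replicas, shards)]
        averaged = allreduce_mean(grads)
        replicas = [w - lr * g for w, g in zip(replicas, averaged)]
    return replicas

# Eight samples of y = 3x split evenly across four simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i:i + 2] for i in range(0, len(data), 2)]
weights = data_parallel_sgd(shards)
```

The averaging step is the only point where workers communicate, which is why the inter-worker interface dominates the design of the Spark-based frameworks.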
(Some) Spark-Based Distributed Deep Learning Models*
SparkNet. AMPLab, UC Berkeley
TensorSpark, Arimo
CaffeOnSpark, Yahoo
Inter-Worker Interface (C++):
• Ethernet/TCP
• InfiniBand/RDMA
• GPU or CPU
*Yu Cao and Zhe Dong. Which is Deeper – Comparison of Deep Learning Frameworks, Spark Summit, June 6-8, 2016
SHARP-SPARK Benchmark Application*
*Nikolay Malitsky, Bringing the HPC Reconstruction Algorithms to Big Data Platforms, NYSDS, August 14-17, 2016
SHARP Multi-GPU Framework (CAMERA, LBNL)
CaffeOnSpark RDMA Inter-Worker Interface (Big ML Team, Yahoo)
RDMA Address Exchange Server

Benchmark: sum of 2M arrays of floats across the Spark workers

Approach                                        Time, s
MPI Allreduce based on MVAPICH2                 0.013
SHARP-SPARK based on the CaffeOnSpark library   0.016

Benchmark results on a cluster with 8 GPU nodes equally divided among the MPI and Spark applications
SHARP Inter-Worker API
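A minimal sketch of what the benchmarked operation computes: an allreduce sum that leaves every worker with the element-wise total of all workers' arrays. The recursive-doubling exchange pattern and the in-memory "ranks" are assumptions made for illustration; neither MVAPICH2 nor the CaffeOnSpark library is claimed to work this way.

```python
# Allreduce (sum) simulated over in-memory buffers, one buffer per rank.

def allreduce_sum(buffers):
    """Recursive-doubling allreduce; assumes a power-of-two number of ranks."""
    size = len(buffers)
    if size == 0 or size & (size - 1):
        raise ValueError("this sketch assumes a power-of-two number of ranks")
    current = [list(b) for b in buffers]
    distance = 1
    while distance < size:
        # Each rank adds the partial sum held by its partner (rank XOR distance).
        current = [
            [a + b for a, b in zip(current[rank], current[rank ^ distance])]
            for rank in range(size)
        ]
        distance *= 2
    return current

# Four simulated workers, each holding a small array of floats.
workers = [[float(rank + i) for i in range(4)] for rank in range(4)]
reduced = allreduce_sum(workers)
```

After log2(N) exchange rounds every rank holds the same summed array, which is the property the RAAR and SGD loops rely on.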
Message Passing Interface (MPI) Framework
Major open-source implementations:
• MPICH, 1992 - present: Argonne National Laboratory
• MVAPICH, 2001 - present: Ohio State University
• Open MPI, 2003 - present: multiple members
MVAPICH2 architecture*:
*MVAPICH Team. MVAPICH2 2.2 User Guide, 2016
From SHARP-SPARK to the MPI Framework
Application Programming Interface
• SHARP-SPARK: Communicator interface
• MPI Framework: MPI-3 Standard: point-to-point, collective, etc.
Inter-Process Initialization Mechanism
• SHARP-SPARK: RDMA address exchange server
• MPI Framework: Process Manager Interface (PMI-1 and PMI-2) with the support of several internal and external process managers
Inter-Process Communication
• SHARP-SPARK: CaffeOnSpark RDMA library
• MPI Framework: Abstract Device Interface (ADI-3) with multiple communication adapters
Spark-MPI Conceptual Demo
MPICH and MVAPICH Common Process Managers
Driver
Worker
Worker
Worker
Worker
PMI
Server
pm
gforker
hydra
remshell
…
util
…
src
https://github.com/SciDriver/spark-mpi/tree/master/examples/spark/
Spark driver-worker
PMI server-worker
MPI inter-worker
Interfaces
PMI Server variables
MPI interface
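A minimal sketch of how the PMI Server variables could reach each Spark worker before it initializes MPI. The variable names PMI_PORT and PMI_ID are an assumption based on the MPICH simple PMI client convention, and the helper functions, host name and port are hypothetical; the demo repository may wire this differently.

```python
# Build the per-worker environment that points an MPI process at a PMI server.

def pmi_environment(pmi_port, rank, base_env=None):
    """Return a copy of the environment with PMI settings for one worker."""
    env = dict(base_env or {})
    env["PMI_PORT"] = pmi_port   # assumed name: host:port of the PMI server
    env["PMI_ID"] = str(rank)    # assumed name: identifier of this worker
    return env

def plan_workers(pmi_port, num_workers):
    """One environment per worker, all pointing at the same PMI server."""
    return [pmi_environment(pmi_port, rank) for rank in range(num_workers)]

# Hypothetical server address; four workers as in the driver-worker diagram.
plans = plan_workers("node01:40000", 4)
```

Inside a Spark task, a worker would copy its entry into the process environment before the MPI library is loaded, so that MPI initialization contacts the PMI server instead of a conventional launcher.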
Summary: Path towards the Spark-MPI Applications
• CaffeOnSpark: Spark + RDMA inter-worker interface + complex initialization procedure based on the Spark RDD mechanism
• SHARP-SPARK: Spark + CaffeOnSpark inter-worker interface + RDMA address exchange server
• Spark-MPI: Spark + MPI inter-worker interface + PMI Server
Kitware and BNL. An in situ, streaming, data- and compute-intensive platform for experimental data. DOE ASCR SBIR Phase I grant. Feb 21, 2017
Acknowledgement
SHARP Team, CAMERA, LBNL: H. Krishnan, S. Marchesini, T. Perciano, J. Sethian, D. Shapiro
CaffeOnSpark Team, Yahoo: A. Feng, J. Shi, M. Jain
NSLS-II, BNL: M. Cowan, L. Flaks, A. Heroux, X. Huang, L. Li, R. Petkus, T. Smith
Computational Science Initiative, BNL: N. D’Imperio, K. Kleese van Dam, R. D. Zhihua
Information Technology Division, BNL: R. Perez
Scientific Computing, Kitware: A. Chaudhary, P. O’Leary
Funding: National Synchrotron Light Source II, a U.S. Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Brookhaven National Laboratory under Contract No. DE-SC0012704
Thank You.
malitsky@bnl.gov

Bringing HPC Algorithms to Big Data Platforms: Spark Summit East talk by Nikolay Malitsky
