SparkNet: Training Deep
Networks in Spark
Authors: Philipp Moritz, Robert Nishihara, Ion Stoica, Michael I. Jordan
(EECS, University of California, Berkeley)
Paper Presentation By: Sneh Pahilwani (T-13)
(CISE, University of Florida, Gainesville)
Motivation
● Much research has gone into making deep learning models more accurate.
● Deeper models deliver better accuracy, but at the cost of far more computation, which makes distributed training attractive.
● Existing batch-processing frameworks are not designed for the asynchronous, communication-intensive workloads of typical distributed deep-learning systems.
● Claim: SparkNet implements a scalable, distributed algorithm for training deep networks that works within existing batch-processing frameworks like MapReduce and Spark, with minimal hardware requirements.
SparkNet Features
● Provides a convenient interface for accessing Spark RDDs.
● Provides a Scala interface for calling Caffe.
● Includes a lightweight multi-dimensional tensor library.
● Scales well with cluster size and tolerates high communication latency.
● Easy to deploy; requires little to no parameter tuning.
● Compatible with existing Caffe models.
Parameter Server Model
● One or more master nodes hold the latest model parameters in memory and serve them to worker nodes upon request.
● Each worker then computes gradients with respect to these parameters on a minibatch drawn from its local copy of the dataset.
● These gradients are sent back to the server, which uses them to update the model parameters (a toy sketch of this cycle follows the figure below).
[Figure: SparkNet architecture]
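As a concrete illustration of the parameter-server cycle described above, here is a minimal single-process sketch in Scala, with a toy linear model standing in for a deep network. The names (ParameterServer, Worker) and the gradient code are illustrative assumptions, not the API of SparkNet or of any real parameter-server system.

import scala.util.Random

// Toy parameter-server cycle: a server holds the parameters, workers pull them,
// compute a gradient on a local minibatch, and push the gradient back.
// All names here are illustrative only.
class ParameterServer(dim: Int, lr: Double) {
  private var params: Array[Double] = Array.fill(dim)(0.0)

  def pull(): Array[Double] = params.clone()          // serve the latest parameters

  def push(gradient: Array[Double]): Unit = synchronized {
    params = params.zip(gradient).map { case (p, g) => p - lr * g }  // SGD update
  }
}

class Worker(data: Seq[(Array[Double], Double)], server: ParameterServer) {
  // Gradient of squared error for a linear model, computed on one random minibatch.
  def step(batchSize: Int): Unit = {
    val w = server.pull()
    val batch = Random.shuffle(data).take(batchSize)
    val grad = new Array[Double](w.length)
    for ((x, y) <- batch) {
      val err = x.zip(w).map { case (xi, wi) => xi * wi }.sum - y
      for (i <- w.indices) grad(i) += 2 * err * x(i) / batch.size
    }
    server.push(grad)
  }
}

object ParameterServerDemo extends App {
  // Synthetic data for y = 3*x + 1.
  val data = Seq.fill(100)(Array(Random.nextDouble(), 1.0)).map(x => (x, 3.0 * x(0) + 1.0))
  val server = new ParameterServer(dim = 2, lr = 0.1)
  val worker = new Worker(data, server)
  for (_ <- 1 to 200) worker.step(batchSize = 8)
  println(server.pull().mkString(", "))  // should move toward (3.0, 1.0)
}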
Implementations
● SparkNet is built on top of Apache Spark and the Caffe deep learning library.
● Uses a Java interface to access Caffe data.
● Uses a Scala interface to access Caffe model parameters.
● Uses ScalaBuff (a Scala protocol-buffer compiler) so that Caffe network definitions can be built and modified dynamically at run time.
● Compatible with existing Caffe model definition files and can load pretrained Caffe model parameters (a hypothetical sketch of such an interface follows).
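The slides describe the language boundaries (Java, Scala, protocol buffers) but not the resulting interface. The trait below is a hypothetical sketch of the kind of facade such a wrapper might expose; the name DeepNetSolver and its methods are assumptions for illustration, not SparkNet's actual API.

// Hypothetical sketch of a thin Scala facade over a native Caffe solver.
// These names do NOT correspond to SparkNet's real interface.
trait DeepNetSolver {
  /** Run `iterations` steps of SGD on the locally held data shard. */
  def train(iterations: Int): Unit

  /** Return the current parameters as a map from layer name to weight arrays. */
  def getWeights(): Map[String, Array[Float]]

  /** Overwrite the solver's parameters, e.g. with averaged weights from the master. */
  def setWeights(weights: Map[String, Array[Float]]): Unit
}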
Implementations
[Code listing: a map from layer names to weights, and the lightweight multi-dimensional tensor library implementation.]
Implementations
[Code listing: network architecture definition.]
Implementations
[Code listing: a method returning a WeightCollection type.]
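The code listings above are screenshots in the original slides and are not reproduced here. As a stand-in, this is a minimal sketch of the idea: a tiny flat-buffer tensor type plus a WeightCollection that maps layer names to weight tensors and supports the averaging used between rounds. The NDArray shown here is an assumption for illustration, not SparkNet's actual tensor library.

// Minimal illustrative tensor: a flat Float buffer plus a shape.
// This is a sketch of the idea, not SparkNet's actual NDArray/WeightCollection code.
final case class NDArray(shape: Array[Int], data: Array[Float]) {
  require(shape.product == data.length, "shape does not match buffer size")

  // Row-major indexing into the flat buffer.
  def apply(indices: Int*): Float = {
    val flat = indices.zip(shape).foldLeft(0) { case (acc, (i, dim)) => acc * dim + i }
    data(flat)
  }

  // Elementwise addition and scaling, enough to support parameter averaging.
  def +(other: NDArray): NDArray =
    NDArray(shape, data.zip(other.data).map { case (a, b) => a + b })
  def scale(c: Float): NDArray = NDArray(shape, data.map(_ * c))
}

object WeightCollection {
  // A weight collection is just a map from layer name to that layer's tensors.
  type WeightCollection = Map[String, Seq[NDArray]]

  // Average several workers' weights layer by layer (as done between SparkNet rounds).
  def average(all: Seq[WeightCollection]): WeightCollection = {
    val n = all.size.toFloat
    all.head.map { case (layer, tensors) =>
      layer -> tensors.indices.map { i =>
        all.map(_(layer)(i)).reduce(_ + _).scale(1.0f / n)
      }
    }
  }
}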
Stochastic Gradient Descent Comparison
● Conventional:
○ A conventional approach to parallelizing gradient computation broadcasts parameters and communicates gradients between the workers and the parameter server after every stochastic gradient descent (SGD) iteration, which incurs substantial communication overhead.
● SparkNet:
○ Instead, each worker in the Spark cluster is assigned a fixed amount of work (a number of SGD iterations or a time limit); only after completing it are the parameters sent to the master and averaged.
SGD Parallelism
● Spark consists of a single master node and a number of worker nodes.
● The training data is partitioned among the Spark workers.
● In every round, the Spark master broadcasts the model parameters to each worker.
● Each worker then runs SGD on the model with its subset of the data for a fixed number of iterations (say, 50), after which the resulting model parameters on each worker are sent to the master and averaged to form the new model parameters (a minimal sketch of this loop follows).
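A minimal sketch of this broadcast / local-SGD / average loop using plain Spark, with a toy linear model in place of Caffe. The helper localSGD, the learning rate, and the synthetic data are assumptions for illustration, not SparkNet's actual implementation.

import org.apache.spark.sql.SparkSession

object SparkNetStyleLoop {
  // One worker-side step: run T iterations of SGD on a local partition,
  // starting from the broadcast weights. Toy linear regression stands in for Caffe.
  def localSGD(data: Iterator[(Array[Double], Double)], w0: Array[Double],
               iters: Int, lr: Double): Array[Double] = {
    val points = data.toArray
    var w = w0.clone()
    for (_ <- 1 to iters; (x, y) <- points) {
      val err = x.zip(w).map { case (xi, wi) => xi * wi }.sum - y
      w = w.zip(x).map { case (wi, xi) => wi - lr * 2 * err * xi }
    }
    w
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("sparknet-style-loop").getOrCreate()
    val sc = spark.sparkContext

    // Synthetic data for y = 3*x + 1, split across 4 partitions (the "workers").
    val data = sc.parallelize(0 until 4000, numSlices = 4)
      .map { i => val x = i / 4000.0; (Array(x, 1.0), 3.0 * x + 1.0) }
      .cache()

    var weights = Array(0.0, 0.0)
    val rounds = 10
    val T = 50                                             // local SGD iterations per round

    for (_ <- 1 to rounds) {
      val bw = sc.broadcast(weights)                       // master -> workers
      val workerWeights = data
        .mapPartitions(part => Iterator(localSGD(part, bw.value, T, lr = 0.1)))
        .collect()                                         // workers -> master
      // Average the workers' parameters to form the new model.
      weights = workerWeights.transpose.map(ws => ws.sum / ws.length)
      bw.destroy()
    }
    println(s"learned weights: ${weights.mkString(", ")}")  // should approach 3.0, 1.0
    spark.stop()
  }
}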
Naive Parallelization
[Figure: the naive parallelization scheme. N_a(b) denotes the number of serial SGD iterations with batch size b needed to reach test accuracy a.]
Practical Limitations of Naive Parallelization
● Naive parallelization distributes each minibatch of size b over K machines; each machine computes gradients on b/K examples and the results are aggregated on one node.
● Let C(b) be the time for one SGD iteration with batch size b on a single node. The per-iteration cost on each machine is then C(b/K), which satisfies C(b/K) >= C(b)/K, so the total running time to reach test accuracy a is, in theory, N_a(b) C(b)/K.
● Limitation #1: for the approximation C(b/K) ~ C(b)/K to hold we need K << b, which limits the number of machines that can be used for effective parallelization (see the cost-model sketch after this list).
● Limitation #2: the minibatch size b could be increased to get around this, but doing so does not decrease N_a(b) enough to justify the increase.
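To make Limitation #1 concrete, here is a tiny cost model that assumes each iteration pays a fixed overhead c0 plus a per-example cost c1; the constants are made up for illustration. Once K is no longer much smaller than b, the fixed overhead dominates and adding machines stops helping.

object NaiveParallelismCostModel extends App {
  // Assumed toy cost model: C(b) = c0 + c1 * b, i.e. a fixed per-iteration
  // overhead plus a per-example cost. The constants are illustrative only.
  val c0 = 1.0   // fixed overhead per SGD iteration (framework, kernel launches, ...)
  val c1 = 0.01  // cost per training example in the batch
  def C(batch: Double): Double = c0 + c1 * batch

  val b = 512.0                     // minibatch size
  for (k <- Seq(1, 4, 16, 64, 256, 512)) {
    val actual = C(b / k)           // what each of the K machines really pays
    val ideal  = C(b) / k           // perfect linear speedup
    println(f"K=$k%4d  C(b/K)=$actual%6.3f  C(b)/K=$ideal%6.3f  ratio=${actual / ideal}%5.2f")
  }
  // The ratio C(b/K) / (C(b)/K) grows with K: once K is not << b,
  // the fixed overhead c0 dominates and naive parallelization stops scaling.
}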
SparkNet Parallelization
[Figure: the SparkNet parallelization scheme. T is the number of SGD iterations run on each machine per round, and M_a(b, K, T) denotes the number of rounds needed to reach test accuracy a.]
SparkNet parallelization
● Training proceeds in rounds: in each round, every machine runs SGD for T iterations with batch size b.
● Between rounds, the parameters on the workers are gathered on the master, averaged, and broadcast back to the workers.
● In other words, synchronization happens only every T iterations.
● Total number of parallel iterations: T * M_a(b, K, T).
● Total time taken: (T C(b) + S) * M_a(b, K, T), where S is the communication overhead per round (a worked example follows).
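A small sketch that plugs hypothetical values of N_a(b), M_a(b, K, T), C(b), and S into the running-time formulas above to compare serial SGD with SparkNet's scheme. All of the numbers are assumptions for illustration, not measurements from the paper.

object SparkNetSpeedupModel extends App {
  // Hypothetical inputs, chosen only to exercise the formulas from the slides.
  val Na = 100000.0   // N_a(b): serial SGD iterations to reach accuracy a
  val Ma = 500.0      // M_a(b, K, T): SparkNet rounds to reach the same accuracy
  val T  = 50.0       // local SGD iterations per machine per round
  val Cb = 0.02       // C(b): seconds per SGD iteration with batch size b
  val S  = 1.0        // communication/synchronization overhead per round, in seconds

  val serialTime   = Na * Cb                // single-machine SGD: N_a(b) * C(b)
  val sparkNetTime = (T * Cb + S) * Ma      // (T C(b) + S) * M_a(b, K, T)
  val speedup      = serialTime / sparkNetTime

  println(f"serial:   $serialTime%.1f s")
  println(f"sparknet: $sparkNetTime%.1f s")
  println(f"speedup:  $speedup%.2fx")
  // With S = 0 this reduces to N_a(b) / (T * M_a(b, K, T)),
  // the ratio reported in the speedup matrix that follows.
}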
Results
● A matrix of speedups is computed over a grid of T and K values.
● For each parameter pair (T, K), SparkNet is run with a modified AlexNet implementation on a subset of ImageNet (100 classes, roughly 1,000 data points per class) for 20,000 parallel iterations.
● The reported ratio is the speedup N_a(b) / (T * M_a(b, K, T)) with S = 0, i.e., relative to training on a single machine assuming zero communication overhead.
[Figure: speedup matrix over values of T and K. Annotations mark the serial SGD entry, the single-worker column (no speedup), and the zero-communication-overhead assumption.]
Results
[Figure: speedup measured as a function of the communication overhead S on a 5-node cluster, i.e., with non-zero communication overhead.]
Training Benchmarks
● AlexNet trained on ImageNet.
● T = 50.
● Single-GPU nodes.
● Target accuracy: 45%.
● Time to reach target accuracy:
○ Caffe: 55.6 hours (baseline)
○ SparkNet, 3 nodes: 22.9 hours (2.4x speedup)
○ SparkNet, 5 nodes: 14.5 hours (3.8x speedup)
○ SparkNet, 10 nodes: 12.8 hours (4.4x speedup)
Training Benchmarks
● GoogLeNet trained on ImageNet.
● T = 50.
● Multi-GPU nodes.
● Target accuracy: 40%.
● Speedup measured relative to Caffe on a single 4-GPU node:
○ SparkNet, 3 nodes x 4 GPUs: 2.7
○ SparkNet, 6 nodes x 4 GPUs: 3.2
● These gains come on top of the 3.5x speedup Caffe itself already obtains from using 4 GPUs instead of 1.
THANK YOU!
