Innovation and Reinvention Driving Transformation
OCTOBER 9, 2018 | 2018 HPCC Systems® Community Day
Robert K.L. Kennedy
Parallel Distributed Deep Learning on HPCC Systems
Background
• Expand HPCC Systems' complex machine learning capabilities
• The current HPCC Systems ML libraries provide common ML algorithms
• They lack more advanced algorithms, such as Deep Learning
• Goal: extend the HPCC Systems libraries to include Deep Learning
Presentation Outline
• Project Goals
• Problem Statement
• Brief Neural Network Background
• Introduction to Parallel Deep Learning Methods and Techniques
• Overview of the Technologies Used in this Implementation
• Details of the Implementation
• Implementation Validation and Statistical Analysis
• Future Work
Project Goals
• Extend the HPCC Systems libraries to include Deep Learning
  • Specifically, distributed Deep Learning
• Combine HPCC Systems and TensorFlow
  • A widely used open-source DL library
• HPCC Systems provides:
  • The cluster environment
  • Distribution of data and code
  • Communication between nodes
• TensorFlow provides:
  • Deep Learning training algorithms for localized execution
Problem Statement
• Deep Learning models are large and complex
• DL needs large amounts of training data
• Training process:
  • Time requirements increase with data size and model complexity
  • Computation requirements increase as well
• Large multi-node computers are needed to effectively train large, cutting-edge Deep Learning models
Neural Network Background
• Neural network visualization (figure): 2 hidden layers, fully connected, 3-class classification output
• Forward propagation and backpropagation
• Optimize the model with respect to a loss function
  • Gradient Descent, SGD, Batch SGD, Mini-batch SGD (see the sketch after this list)
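The difference between these optimizer variants is easiest to see in code. Below is a minimal mini-batch SGD sketch (an illustration, not the project's code; `grad_fn` and all parameter names are assumptions): a batch size of 1 recovers plain SGD, and a batch size of the full dataset recovers batch gradient descent.

```python
import numpy as np

def minibatch_sgd(grad_fn, w, X, y, lr=0.01, batch_size=32, epochs=10):
    """grad_fn(w, X_batch, y_batch) returns the loss gradient w.r.t. w.

    X and y are NumPy arrays; w is the parameter vector being optimized.
    """
    n = len(X)
    for _ in range(epochs):
        order = np.random.permutation(n)          # reshuffle every epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            # One update per mini-batch; batch_size=1 gives plain SGD,
            # batch_size=n gives full-batch gradient descent.
            w = w - lr * grad_fn(w, X[batch], y[batch])
    return w
```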
Parallel Deep Learning
• Data Parallelism
• Model Parallelism
• Synchronous and Asynchronous
• Parallel SGD
[Figure: side-by-side diagrams of Data Parallelism and Model Parallelism]
Implementation – Overview
• ECL/HPCC Systems handles the 'data parallelism' part of the parallelization
• Pyembed handles the localized neural network training
  • Using Python, TensorFlow, Keras, and other libraries
• The implementation is a synchronous data-parallel stochastic gradient descent (sketched after this list)
  • However, it is not limited to using SGD at the localized level
• The implementation is not limited to TensorFlow
  • Using Keras, other Deep Learning 'backends' can be used with no change in code
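As a hedged sketch of what "synchronous data parallel" means here (assuming simple layer-wise weight averaging as the combining rule; the library's actual rule may differ), every node trains a replica on its own data partition, and each round the driver merges the returned weights:

```python
import numpy as np

def average_weights(per_node_weights):
    """per_node_weights: one list of layer-weight arrays per node.

    Returns a single list of layer-weight arrays: the element-wise mean
    of each layer's weights across all nodes (the synchronous step).
    """
    # zip groups the i-th layer's weights across all nodes;
    # each group is stacked and averaged element-wise.
    return [np.mean(np.stack(layers), axis=0)
            for layers in zip(*per_node_weights)]
```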
TensorFlow | Keras
• TensorFlow
  • Google's popular Deep Learning library
• Keras
  • Deep Learning library API that uses TensorFlow or another 'backend'
  • Much less code to produce the same model (see the example below)
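To illustrate the "much less code" point, here is a hedged Keras sketch of a network like the one visualized earlier (two fully connected hidden layers, 3-class softmax output). The layer widths and input size are assumptions for illustration; the equivalent raw-TensorFlow graph code would be considerably longer.

```python
from keras.models import Sequential
from keras.layers import Dense

# Two fully connected hidden layers and a 3-class classification output.
# Widths (16) and input size (8 features) are assumed, not from the slides.
model = Sequential([
    Dense(16, activation='relu', input_shape=(8,)),
    Dense(16, activation='relu'),
    Dense(3, activation='softmax'),   # 3-class output
])
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```

Because Keras abstracts the backend, this same definition runs unchanged on TensorFlow or another supported backend, which is what keeps the implementation backend-agnostic.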
Implementation – HPCC and ECL
• ECL partitions the training data into N partitions
  • Where N is the number of slave nodes (a rough analogue is sketched after this list)
• Pyembed: the plugin that allows ECL to execute Python code
• ECL distributes the Pyembed code along with the data to each node
  • Passes the data, NN model, and metadata into Pyembed as parameters
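For intuition only, a rough Python analogue of the record-to-node routing that ECL performs on the cluster (the real partitioning is done in ECL, and ECL lets you choose the distribution expression; round-robin here is just one simple choice):

```python
def partition(records, n_nodes):
    """Route each record to one of n_nodes buckets (round-robin here)."""
    parts = [[] for _ in range(n_nodes)]
    for i, rec in enumerate(records):
        parts[i % n_nodes].append(rec)   # record i goes to node i mod N
    return parts
```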
Implementation – Pyembed
• Receives parameters at execution time, passed in from ECL
  • Then converts them to types usable by the Python libraries
• Builds a localized NN model from the inputs
  • Recall this is iterative, so the input model changes on each epoch
• Trains the input model on its partition of the data (sketched after this list)
• Returns the updated model weights once completed
  • Does not return any training data
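A hedged sketch of the kind of function the Pyembed side might run on each node (the name, signature, and serialization format are assumptions, not the project's actual code): rebuild the model from the architecture and weights passed in from ECL, train on the local partition only, and hand back weights, never data.

```python
import numpy as np
from keras.models import model_from_json

def train_partition(model_json, weights, X_part, y_part, batch_size=32):
    """Run one local training pass on this node's data partition.

    model_json : Keras architecture, serialized by the driver (assumed).
    weights    : list of layer-weight arrays from the previous round.
    y_part is assumed one-hot encoded for categorical_crossentropy.
    """
    model = model_from_json(model_json)                # rebuild architecture
    model.set_weights([np.array(w) for w in weights])  # restore global state
    model.compile(optimizer='sgd', loss='categorical_crossentropy')
    model.fit(X_part, y_part, epochs=1,                # one local pass per round
              batch_size=batch_size, verbose=0)
    return model.get_weights()                         # weights only, no data
```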
Code Example – Parallel SGD
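The code example on this slide is ECL and is not reproduced in this export. As a stand-in, here is a hedged Python mock of the control flow it implements, synchronous data-parallel SGD, reusing the `train_partition` sketch above. On HPCC Systems the per-node calls run in parallel via Pyembed rather than in a Python loop; this mock only shows the round structure.

```python
import numpy as np

def parallel_sgd(model_json, init_weights, partitions, epochs=10):
    """partitions: list of (X_part, y_part) pairs, one per node (assumed)."""
    weights = init_weights
    for _ in range(epochs):
        # "Dispatch" one local training job per node; on the cluster this
        # is the step ECL distributes and runs concurrently.
        local = [train_partition(model_json, weights, X_p, y_p)
                 for X_p, y_p in partitions]
        # Synchronous step: average the per-node weights layer by layer.
        weights = [np.mean(np.stack(layers), axis=0)
                   for layers in zip(*local)]
    return weights
```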
Case Study – Training Time – Design
• Uses a 'big-data' dataset of 3,692,556 records
  • Each record is 850 bytes
• 80/20 split into training and testing datasets
• We use 10 dataset sizes (drawn from the 80% training split), each with a different class-imbalance ratio
  • 1:1, 1:5, 1:10, 1:25, 1:50, 1:100, 1:500, 1:1000, 1:1500, 1:2000
  • Ranging from 2,240 records to 2,241,120 records
  • 1.9 MB to 1.9 GB in size (record count × 850 bytes)
• Each dataset is run on 5 different cluster sizes
  • 1 node, 2 nodes, 4 nodes, 8 nodes, and 16 nodes
• The cluster is cloud based, and each node has 1 CPU and 3.5 GB of memory
Case Study – Training Time – Results
• Note: the Y scale of the left graph is logarithmic
[Figure: two 'Training Time Comparison' plots of Training Time (seconds) vs. Training Dataset Size (thousands), one on linear axes and one on logarithmic axes, each with one line per cluster size: 1, 2, 4, 8, and 16 nodes]
Case Study – Model Performance
• Uses the same experimental design as the previous case study
• Model performance is slightly improved by the number of nodes
  • See the slope of the red line
• Model-performance effects for the other dataset sizes are out of scope
  • Due to the severe class imbalance
[Figure: 'Model Performance vs. # Nodes' plot of Performance (AUC) vs. number of nodes (1, 2, 4, 8, 16), with one line per data-size imbalance ratio: 1, 5, 10, 25, 50, 100, 500, 1000, 1500, 2000]
Conclusion
• Successful implementation of a synchronous data-parallel deep learning algorithm
• Case studies show the runtime behaves as expected across a wide spectrum of cluster sizes and dataset sizes
• Leveraged HPCC Systems and TensorFlow to bring Deep Learning to HPCC Systems
• Started a new open-source HPCC Systems library for distributed DL
  • With accompanying documentation, test cases, and performance tests
• Provided possible research avenues for future work
Future Work
• Improved data parallelism
  • For HPCC Systems with multiple slave Thor nodes on a single logical computer
• Model parallelism implementation
• Hybrid approach: model and data parallelism
• Asynchronous parallelism
  • This paradigm has additional challenges on a cluster system