Distributed Deep
Learning with Docker
at Salesforce
Jeff Hajewski
Software Engineer,
Salesforce.
github.com/j-haj
jeff-hajewski-3a1b5a29
Caveats
● These are my own views and opinions, not those of Salesforce
● This is how one team at Salesforce deploys deep learning
models
● When I use the term Docker I am referring to the
technology, not the company
● Some of these designs are simplified
About this talk
● What is deep learning and why is it difficult?
● Deep learning at Salesforce
● Challenges
○ Designing for team specialization
○ Interacting with the model server
○ Testing
● Key takeaways
Deep Learning Review
The core task of deep learning is function approximation.
Neural networks can approximate a very wide class of functions (in theory, any continuous function on a compact domain, to arbitrary accuracy).
Neural networks are expensive to evaluate.
● Linear regression: ~1,000 parameters
● Deep neural network: 100M - 1B parameters (100,000x to 1,000,000x as many as linear regression)
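To make the parameter counts concrete, here is a small sketch that counts weights and biases for a stack of fully connected layers. The layer sizes are hypothetical (999 input features, three hidden layers of 10,000 units), picked only so the totals land in the ranges on the slide; real models also use other layer types.

```python
from typing import Sequence


def dense_param_count(layer_sizes: Sequence[int]) -> int:
    """Count weights + biases for a stack of fully connected layers.

    layer_sizes = [n_inputs, hidden_1, ..., n_outputs]. A layer mapping
    n_in -> n_out has an n_in x n_out weight matrix plus n_out biases.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Linear regression is a single layer with one output:
# 999 weights + 1 bias = 1,000 parameters.
linear = dense_param_count([999, 1])

# Hypothetical deep network: same inputs, three hidden layers of 10,000 units.
deep = dense_param_count([999, 10_000, 10_000, 10_000, 1])

print(f"linear regression: {linear:,} parameters")
print(f"deep network:      {deep:,} parameters ({deep // linear:,}x)")
```

Evaluating a dense layer costs roughly one multiply-add per weight, which is why a parameter count in the hundreds of millions translates directly into expensive, high-latency inference.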
How should we design distributed systems for deep learning, or more generally, for high-latency tasks?
Deep Learning at Salesforce
We use deep learning models to provide our customers with useful information about their sales process.
They send us their data as a firehose of streaming data.
The faster we deliver these insights to our customers, the more useful and actionable they are for their sales teams.
Deep Learning at Salesforce
There are three steps to this process:
1. Preprocessing - cleaning and formatting the data
2. Inference - running the data through the model
3. Postprocessing - interpreting the output from the model
Deep Learning at Salesforce
[Figure: “Hello! My cat is friendly.” → preprocess → [0.2, 0.71, 0.89, 0.6] → inference → [0.85, 0.15] → postprocess → “Discusses cat”, annotated with “Data Science Team” and “Systems Team”]
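The three steps compose as postprocess(inference(preprocess(x))). The sketch below is a toy illustration of that shape only: the feature (fraction of words equal to "cat"), the logistic "model" with made-up weights, and the two labels are all hypothetical stand-ins, not the team's actual model or features, and they do not reproduce the numbers in the figure.

```python
import math
import string
from typing import List

# Hypothetical output labels; the talk does not list the real ones.
LABELS = ["Discusses cat", "Does not discuss cat"]


def preprocess(text: str) -> List[float]:
    """Clean and format raw text into a numeric feature vector.

    Toy stand-in: lower-case, strip punctuation, and use one feature,
    the fraction of words that are "cat".
    """
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    return [words.count("cat") / max(len(words), 1)]


def inference(features: List[float]) -> List[float]:
    """Run the features through the 'model'.

    Toy stand-in for a neural network: logistic regression with made-up
    weights, returning one probability per entry in LABELS.
    """
    z = 10.0 * features[0] - 1.0
    p_cat = 1.0 / (1.0 + math.exp(-z))
    return [p_cat, 1.0 - p_cat]


def postprocess(probs: List[float]) -> str:
    """Interpret the model output: pick the most likely label."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[best]


def run(text: str) -> str:
    # Order matters: each step consumes the previous step's output.
    return postprocess(inference(preprocess(text)))


print(run("Hello! My cat is friendly."))  # Discusses cat
```

Only preprocess and postprocess are cheap; inference is the step that, with a real deep model, dominates latency.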
Challenge 1: designing for team specialization
Designing for team specialization
Requirements
1. The data science team shouldn’t need to know about the system. They just want to define a sequence of computations.
2. The systems engineers shouldn’t need to know anything about the computation. They just want to scale the system.
Designing for team specialization
Challenges
1. Some functions take longer to execute than others (e.g., model inference)
2. The order of execution is important
Designing for team specialization
Solution: map functions to containers
postprocess(inference(preprocess(x)))
[Figure: three containers, preprocess → inference → postprocess]
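A minimal sketch of the "one function per container" idea, assuming threads and in-process queues as stand-ins for containers and the message topics (Kafka, per a later slide) between them. This is an illustration, not the team's implementation: each worker knows only its own function and its input/output queues, so the data science team supplies an ordered list of functions and the plumbing stays the same.

```python
import queue
import threading
from typing import Any, Callable, List, Sequence

_STOP = object()  # end-of-stream marker passed down the pipeline


def stage_worker(fn: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """One 'container': take a message, apply this stage's function, and hand
    the result to the next stage. It knows nothing about the other stages."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown downstream
            return
        outbox.put(fn(item))


def run_stages(stages: Sequence[Callable[[Any], Any]], inputs: Sequence[Any]) -> List[Any]:
    # queues[i] feeds stage i; queues[-1] collects the final results.
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=stage_worker, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for w in workers:
        w.start()
    for x in inputs:
        queues[0].put(x)
    queues[0].put(_STOP)

    results = []
    while (item := queues[-1].get()) is not _STOP:
        results.append(item)
    for w in workers:
        w.join()
    return results


# The data science team supplies only the functions, in order.
print(run_stages([str.strip, len, lambda n: n * 2], ["  cat ", "dog!"]))  # [6, 8]
```

Chaining the queues enforces the execution order. With one worker per stage, results also come out in input order; once a stage has several replicas, messages can finish out of order unless they carry an ID or key.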
What about throughput?
[Figure: binary input data → preprocess → inference → postprocess → “It’s a cat!”, with throughput labels 1,000 QPS, 500 QPS, 300 QPS and 1,000 QPS, and a “Max throughput” marker]
What about throughput?
[Figure: the same pipeline with extra preprocess and inference replicas, labeled “2x” next to 500 QPS and “4x” next to 300 QPS]
Docker enables us to easily scale out each individual stage.
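Reading the figure as per-replica capacities of 500 QPS (preprocess), 300 QPS (inference) and 1,000 QPS (postprocess) against 1,000 QPS of incoming traffic is an interpretation; the slide only shows the numbers. Under that reading, and assuming each stage scales linearly with its replica count, the arithmetic behind the “2x” and “4x” labels looks like this:

```python
import math
from typing import List, Sequence

STAGES = ["preprocess", "inference", "postprocess"]
CAPACITY_QPS = [500, 300, 1_000]  # assumed per-replica capacity, read off the figure
TARGET_QPS = 1_000                # assumed incoming load


def pipeline_throughput(capacity: Sequence[int], replicas: Sequence[int]) -> int:
    """A linear pipeline runs no faster than its slowest stage.
    Assumes each stage's capacity grows linearly with its replica count."""
    return min(c * r for c, r in zip(capacity, replicas))


def replicas_needed(capacity: Sequence[int], target: int) -> List[int]:
    """Smallest replica count per stage that keeps up with `target` QPS."""
    return [math.ceil(target / c) for c in capacity]


print(pipeline_throughput(CAPACITY_QPS, [1, 1, 1]))  # 300
plan = replicas_needed(CAPACITY_QPS, TARGET_QPS)
print(dict(zip(STAGES, plan)))  # {'preprocess': 2, 'inference': 4, 'postprocess': 1}
print(pipeline_throughput(CAPACITY_QPS, plan))  # 1000
```

Without scaling, the whole pipeline is capped at the inference stage's 300 QPS. Scaling only the slow stages (2x preprocess, 4x inference) lifts the cap to 1,000 QPS. Real stages rarely scale perfectly linearly, so in practice you would measure rather than assume.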
What about throughput?
[Figure: the same scaled-out pipeline, labeled “2x” next to 500 QPS and “2x” next to 300 QPS]
Kafka gives stage-wise checkpointing.
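To illustrate what stage-wise checkpointing buys, here is an in-memory simulation. It does not use the Kafka client API: `Topic` and `Stage` are hypothetical classes in which each stage reads from an append-only log and records (commits) the offset of the next unprocessed message. When the inference stage crashes and restarts, it resumes from its own committed offset, and the upstream preprocessing work is not redone.

```python
from typing import Any, Callable, List, Optional


class Topic:
    """In-memory append-only log standing in for a Kafka topic."""

    def __init__(self, messages: Optional[List[Any]] = None) -> None:
        self.log: List[Any] = list(messages or [])


class Stage:
    """A pipeline stage that remembers how far it has read in its input topic."""

    def __init__(self, fn: Callable[[Any], Any], source: Topic, sink: Topic) -> None:
        self.fn, self.source, self.sink = fn, source, sink
        self.committed = 0  # offset of the next unprocessed input message
        self.calls = 0      # how many times fn has run (shows there is no rework)

    def run(self, crash_after: Optional[int] = None) -> None:
        """Process everything past the committed offset.
        `crash_after` simulates the process dying after that many messages."""
        done = 0
        while self.committed < len(self.source.log):
            if crash_after is not None and done == crash_after:
                raise RuntimeError("stage crashed")
            self.sink.log.append(self.fn(self.source.log[self.committed]))
            self.calls += 1
            self.committed += 1  # checkpoint only after the output is written
            done += 1


raw = Topic(["A cat", "A dog", "Two cats", "No pets"])
features, outputs = Topic(), Topic()
pre = Stage(str.lower, raw, features)
inf = Stage(len, features, outputs)

pre.run()
try:
    inf.run(crash_after=2)  # inference dies halfway through
except RuntimeError:
    pass
inf.run()  # restart: resumes at offset 2; preprocessing is not redone

print(outputs.log, pre.calls, inf.calls)  # [5, 5, 8, 7] 4 4
```

In a real Kafka deployment, a crash between writing a stage's output and committing its offset means some messages are processed again (at-least-once delivery), so stages should tolerate duplicates.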
Challenge 2: interacting with the model servers
Serving deep learning models
Model servers provide a way to query the model, typically via gRPC or HTTP.
What is the best way to deploy and interact with these model servers?
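The talk does not say which model server the team uses. As one concrete example of querying a model server over HTTP, the sketch below builds a request for a TensorFlow Serving-style REST endpoint (`POST /v1/models/<name>:predict` with a JSON body of `instances`, answered with `predictions`). The model name is hypothetical, and 8501 is the REST port conventionally used with the TensorFlow Serving Docker image.

```python
import json
import urllib.request
from typing import Any, List


def build_predict_request(host: str, port: int, model: str,
                          instances: List[Any]) -> urllib.request.Request:
    """Build an HTTP POST for a TensorFlow Serving-style REST predict endpoint."""
    url = f"http://{host}:{port}/v1/models/{model}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST")


def predict(req: urllib.request.Request, timeout: float = 5.0) -> List[Any]:
    """Send the request and return the 'predictions' field of the JSON reply."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["predictions"]


# Hypothetical model name and input vector.
req = build_predict_request("localhost", 8501, "topic-classifier",
                            [[0.2, 0.71, 0.89, 0.6]])
print(req.get_method(), req.full_url)
```

Calling `predict(req)` only works with a model server actually listening at that address; building the request is kept separate so it can be inspected or tested without one.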
Interacting with the model server
Challenges:
1. Model servers are designed as standalone processes.
2. How should we best utilize multiple GPUs?
3. What about networking?
We want to keep deployment simple!
Interacting with the model server
Solution: deploy model server images as part of a “pod” or “group” with a coordinator service
[Figure: a pod containing a JVM manager and several model servers (Model Server, Model Server, Model Server, ...)]
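A language-neutral sketch of what the coordinator in the pod might do. In the talk the manager runs on the JVM; this Python version is only illustrative. It assumes one model-server container per GPU, each reachable on localhost at its own port because the containers share the pod's network (in plain Docker, for example, with `--network container:<name>`). The port scheme and round-robin balancing are assumptions, and pinning each container to its GPU (e.g. via `CUDA_VISIBLE_DEVICES`) is left out.

```python
import itertools
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelServer:
    gpu_id: int
    address: str


class PodManager:
    """Hypothetical coordinator for one pod.

    Tracks one model-server container per GPU. All containers share the
    pod's private network, so each server is reachable on localhost at
    its own port.
    """

    def __init__(self, num_gpus: int, base_port: int = 8501) -> None:
        self.servers: List[ModelServer] = [
            ModelServer(gpu, f"http://localhost:{base_port + gpu}")
            for gpu in range(num_gpus)
        ]
        self._rotation = itertools.cycle(self.servers)

    def pick(self) -> ModelServer:
        """Round-robin so every GPU gets an equal share of requests."""
        return next(self._rotation)


manager = PodManager(num_gpus=3)
print([manager.pick().address for _ in range(4)])
```

Because addresses are fixed and local, the rest of the pipeline never needs service discovery for the model servers. It only talks to the manager.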
This solves additional challenges
1. Who owns the model server? The data science team.
2. How should we handle model versions, and where are they stored locally? In a Docker shared volume.
3. What are the addresses of the model servers? http://localhost, via Docker private networking.
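One way to handle model versions on a shared volume is to give each version its own numbered subdirectory and have the server load the highest one. That layout mirrors TensorFlow Serving's `<model>/<version>/` convention, but it is an assumption here, as are the model name and paths. The sketch compares versions as integers and skips non-numeric entries.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_model_version(model_dir: Path) -> Optional[int]:
    """Return the highest numeric version directory under model_dir, or None.

    Compares versions as integers, so "10" beats "2"; non-numeric entries
    (e.g. a partially copied upload) are ignored.
    """
    versions = [int(p.name) for p in model_dir.iterdir()
                if p.is_dir() and p.name.isdigit()]
    return max(versions, default=None)


# Demo: a throwaway directory stands in for the Docker shared volume.
with tempfile.TemporaryDirectory() as shared_volume:
    model_dir = Path(shared_volume) / "topic-classifier"  # hypothetical model name
    for name in ["1", "2", "10", "tmp-upload"]:
        (model_dir / name).mkdir(parents=True)
    chosen = latest_model_version(model_dir)

print(chosen)  # 10
```

Sorting directory names as strings would pick "2" over "10", which is the usual bug this avoids.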
Challenge 3: testing
Testing
Challenge: how should we test these systems?
1. Deep learning models are probabilistic
2. Interservice interactions can be quite complex
Testing
Solution: Docker Compose
● Makes it easy to swap out the model server for a mock service
● Makes it easy to deploy the entire system locally
● Integrates well with Maven and Gradle
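What a mock model server might look like: a minimal sketch, assuming the same TensorFlow Serving-style request shape used earlier. It answers every request with a fixed prediction, which removes the model's variability from integration tests. Under Docker Compose, you would run something like this as the model-server service in the test configuration. The endpoint path, model name, and canned values are assumptions; the values are borrowed from the earlier figure.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Canned, deterministic output (values borrowed from the earlier figure).
FIXED_PREDICTION = [0.85, 0.15]


class MockModelHandler(BaseHTTPRequestHandler):
    """Answers every POST with one fixed prediction per input instance."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        n = len(payload.get("instances", []))
        body = json.dumps({"predictions": [FIXED_PREDICTION] * n}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging in test output


def start_mock_server(port: int = 0) -> HTTPServer:
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), MockModelHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_mock_server()
port = server.server_address[1]
req = urllib.request.Request(
    f"http://127.0.0.1:{port}/v1/models/topic-classifier:predict",
    data=json.dumps({"instances": [[0.2, 0.71, 0.89, 0.6], [0.1, 0.2, 0.3, 0.4]]}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req, timeout=5) as resp:
    result = json.loads(resp.read())
server.shutdown()
server.server_close()

print(result)  # {'predictions': [[0.85, 0.15], [0.85, 0.15]]}
```

Because the mock speaks the same protocol as the real server, the rest of the system runs unchanged in tests. Only the address it is configured with differs.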
We haven’t spent a lot of time
discussing the details of Docker
That is precisely the point!
Docker simplifies our lives
● Docker allows us to simplify many aspects of our design.
● Docker stays out of the way.
● Docker provides a simple alternative to much more complex solutions.
