A Journey through the ML Model
Deployment to Production

STANKO KUVELJIC
A Journey through the ML Model Deployment to Production HELL
Stanko Kuveljić
“Expectations were like fine pottery. The harder you held them, the more likely they were to crack.”, The Way of Kings, Brandon Sanderson
1. ML Deployment
2. Deploy on the edge
3. Database deployment
4. REST API - Flask
5. REST API - Flask + uWSGI
6. TensorFlow Serving
7. Message Queues
8. Tidal Waves
9. Summary
MACHINE LEARNING
[Meme: “People with no idea about AI saying it will take over the world” vs. “My network”]
ML DEPLOYMENT
“I know the pieces fit 'cause I watched them tumble down”, Schism, Tool
[Figure: cosmology meme with the axes Light/Shadow, Order/Chaos, Life/Death, heaven/abyss and Reality/astral, where an “AI system & ML” astral rift leads into the VOID. Source: https://imgur.com/t/elon/5aTN9vc]
[Figure: the same diagram zoomed in on the “AI system” rift: a small “ML” box surrounded by Feature Engineering, Data Verification, Data Storage, Scheduler, Data Collection, Automation, Monitoring, Configuration, BI, Serving, Voodoo and Task Managers]
Some Serving Approaches
1. On the edge (device deployment)
2. Database batch inference
3. REST API
4. Streaming
ON THE EDGE
Mobile Device Deployment
Calls model on the device
Predictions: On the Fly
Latency: Low
Constraint: Model complexity
Animal: Cat
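To make “calls model on the device” concrete, here is a minimal Python sketch, assuming a tiny logistic-regression model whose weights ship inside the app. The weights, the `MAX_PARAMS` budget and both function names are hypothetical; a real mobile deployment would typically use a dedicated on-device runtime (for example TensorFlow Lite) rather than hand-written math.

```python
import math
from typing import Sequence, Tuple

# Hypothetical tiny model bundled with the app: logistic-regression weights
# for a 3-feature "cat / not cat" classifier. The values are made up.
WEIGHTS = (0.8, -1.2, 2.5)
BIAS = -0.3

# Assumed on-device parameter budget; real limits depend on device and runtime.
MAX_PARAMS = 1_000_000


def check_budget(n_params: int, max_params: int = MAX_PARAMS) -> None:
    """Fail early if the model is too big to ship to the device."""
    if n_params > max_params:
        raise ValueError(f"model has {n_params} params, budget is {max_params}")


def predict_on_device(features: Sequence[float]) -> Tuple[str, float]:
    """Run inference locally: no network round trip, so latency stays low."""
    z = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    p_cat = 1.0 / (1.0 + math.exp(-z))
    label = "cat" if p_cat >= 0.5 else "not cat"
    return label, p_cat


# The model is checked once (e.g. at build time) before it goes into the app.
check_budget(len(WEIGHTS) + 1)
```

The budget check is where the “model complexity” constraint bites: anything over the limit has to be shrunk or served remotely instead.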
DATABASE
BIG DATA
Batch Inference
[Diagram: a Scheduler/CLI triggers the App, which reads Raw Data, runs Preproc → Inference, and writes Scored Data]
Batch Inference
Predictions: On demand/scheduled
Latency: “less important”
Constraint: not real time
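A minimal sketch of one scheduled batch run, assuming raw and scored data live in tables named `raw_data` and `scored_data` (hypothetical names; `sqlite3` stands in for whatever database or big-data store is actually used). `preprocess` and `model` are placeholders; a scheduler such as cron, or a CLI call, would invoke `run_batch` periodically.

```python
import sqlite3


def preprocess(value):
    # Hypothetical preprocessing: scale a raw reading into [0, 1].
    return min(max(value / 100.0, 0.0), 1.0)


def model(x):
    # Placeholder model: the "score" is the preprocessed value itself.
    return round(x, 4)


def run_batch(conn, batch_size=500):
    """One scheduled run: score every raw row that has no score yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scored_data (id INTEGER PRIMARY KEY, score REAL)"
    )
    # Read the pending rows first, so we never write to scored_data while a
    # SELECT that joins on it is still open.
    pending = conn.execute(
        "SELECT r.id, r.value FROM raw_data AS r "
        "LEFT JOIN scored_data AS s ON s.id = r.id "
        "WHERE s.id IS NULL ORDER BY r.id"
    ).fetchall()
    # Score and write in chunks of batch_size rows.
    for start in range(0, len(pending), batch_size):
        chunk = pending[start:start + batch_size]
        conn.executemany(
            "INSERT INTO scored_data (id, score) VALUES (?, ?)",
            [(row_id, model(preprocess(value))) for row_id, value in chunk],
        )
    conn.commit()
    return len(pending)
```

Because only unscored rows are selected, rerunning the job is harmless: a second run with no new raw data scores nothing.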
REST API
Flask
[Diagram: Client → Flask → ML Models (on GPU); Flask also talks to Other Services and a DB]
Flask
• Web framework - NOT A SERVER
• Easy for development
• Single request at a time
• Can’t scale
• Not for production
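Flask is a web framework, not a server: it exposes a WSGI callable that a server invokes once per request. The sketch below shows that contract with a bare, stdlib-only WSGI callable instead of Flask (so nothing here is Flask's own API); the `/predict` route, the JSON shape and `predict` are hypothetical.

```python
import io
import json
from wsgiref.util import setup_testing_defaults


def predict(features):
    # Placeholder for a real model call.
    return sum(features)


def app(environ, start_response):
    """WSGI callable: the same interface a Flask app exposes to a server."""
    if environ["REQUEST_METHOD"] != "POST" or environ["PATH_INFO"] != "/predict":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    request = json.loads(environ["wsgi.input"].read(length) or b"{}")
    body = json.dumps({"prediction": predict(request.get("features", []))}).encode()
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call(method, path, payload=None):
    """Invoke the app in-process, without any server (handy for tests)."""
    data = json.dumps(payload).encode() if payload is not None else b""
    environ = {"REQUEST_METHOD": method, "PATH_INFO": path,
               "CONTENT_LENGTH": str(len(data)), "wsgi.input": io.BytesIO(data)}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

To try it over HTTP, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` works, but that server is single-threaded, much like the limitation listed above; a production server such as uWSGI runs the same kind of callable across several worker processes.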
uWSGI
[Diagram: one Master (M) process managing several Workers (W)]
• Application server implementing WSGI (Web Server Gateway Interface)
• Mostly used for Python applications
uWSGI - Forking (copy on write)
[Diagram: the Master loads the app; each Worker is forked from it and starts as a copy of the parent]
uWSGI - Lazy Apps
[Diagram: the Master doesn’t load the app; each Worker loads the app itself]
uWSGI - Postfork fix
[Diagram: the Master loads the app; each Worker is forked as a copy of the parent and then runs postfork()]
uWSGI - Lazy and Postforked Summary
• TF requires postfork (or lazy apps)
• Each process makes its own copy of the ML model
• Each process maintains its own session
• High memory footprint
• GPU doesn’t always help
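Why the footprint grows: with one model copy per process, memory scales linearly with the number of workers. A back-of-the-envelope helper, assuming float32 weights (4 bytes per parameter) and a per-process runtime overhead that you would have to measure yourself; the function name and defaults are illustrative.

```python
def estimate_memory_mb(workers, params, bytes_per_param=4, runtime_overhead_mb=0.0):
    """Rough RAM estimate (in MB = 10**6 bytes) when every worker holds its own model copy.

    bytes_per_param=4 assumes float32 weights; runtime_overhead_mb is a
    per-process allowance (framework, session, buffers) that must be measured.
    """
    per_worker_mb = params * bytes_per_param / 1_000_000 + runtime_overhead_mb
    return workers * per_worker_mb
```

For the 1M-parameter LSTM among the tested models, float32 weights are only about 4 MB per worker, so in practice the measured per-process overhead is likely what dominates.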
TENSORFLOW SERVING
[Diagram: App 1, App 2 and App 3 send requests to a single TensorFlow Serving instance that hosts the Models on the GPU]
• Separated from the apps
• Multiple models and versions
• Manages the GPU session
• Allows batch processing
• Scalable
• REST/gRPC endpoints
• Versioning policy
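“Allows batch processing” is about grouping many small requests into one model call, which is usually much cheaper per example on a GPU. A simplified, synchronous sketch of the idea, not TF Serving's implementation (its server-side batching is enabled with the `--enable_batching` flag and also flushes on a timeout); `MicroBatcher` and its parameters are hypothetical.

```python
class MicroBatcher:
    """Collects single requests and runs the model once per batch.

    Simplified and synchronous: a batch is flushed when it reaches
    max_batch_size or when flush() is called. No timeout handling.
    """

    def __init__(self, model_fn, max_batch_size=4):
        self.model_fn = model_fn          # takes a list of inputs, returns a list of outputs
        self.max_batch_size = max_batch_size
        self.pending = []                 # inputs waiting for the next batch
        self.results = []                 # outputs, in submission order
        self.model_calls = 0

    def submit(self, x):
        self.pending.append(x)
        if len(self.pending) >= self.max_batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        self.results.extend(self.model_fn(self.pending))
        self.model_calls += 1
        self.pending = []
```

With `max_batch_size=4`, ten submissions plus a final `flush()` cost three model calls instead of ten.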
[Diagram: Client → Flask + uWSGI (with a TF client) → TensorFlow Serving (Models on GPU); Flask + uWSGI also talks to a DB and Other Services]
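On the Flask + uWSGI side, the “TF client” can be as small as an HTTP call to TF Serving's REST predict endpoint (gRPC is the other option; the official Docker image exposes gRPC on 8500 and REST on 8501). A sketch using only `urllib`, assuming a model named `cnn` on `localhost:8501`; host, model name and version are placeholders.

```python
import json
import urllib.request


def build_predict_request(model, instances, host="localhost", port=8501, version=None):
    """Build a POST to TensorFlow Serving's REST predict endpoint.

    URL shape: /v1/models/<model>[/versions/<version>]:predict
    Body: {"instances": [...]}; the reply carries {"predictions": [...]}.
    """
    path = f"/v1/models/{model}"
    if version is not None:
        path += f"/versions/{version}"  # pin a specific model version
    url = f"http://{host}:{port}{path}:predict"
    data = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )


def predict(model, instances, **kwargs):
    """Send the request (needs a running server) and return the predictions."""
    req = build_predict_request(model, instances, **kwargs)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["predictions"]
```

Omitting `version` lets the server's versioning policy pick which version answers.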
STREAMING
Message queues
[Diagram: Client apps (producers) → Request Queue → consumer/producer running the ML MODELS & INFERENCE → Response Queue → Client app (consumer)]
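The request/response queue pattern can be sketched in-process, with Python's `queue` and `threading` modules standing in for a real message broker (the slides don't name one). The function names are hypothetical, and tagging each request with a correlation ID is an assumption about how replies get matched back to requests.

```python
import queue
import threading
import uuid


def model(x):
    return x * x  # placeholder for real inference


def inference_worker(request_q, response_q):
    """Consumer of the request queue and producer for the response queue."""
    while True:
        msg = request_q.get()
        if msg is None:  # sentinel: shut down
            break
        correlation_id, payload = msg
        response_q.put((correlation_id, model(payload)))


def run(payloads, n_workers=2):
    request_q, response_q = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=inference_worker, args=(request_q, response_q))
               for _ in range(n_workers)]
    for w in workers:
        w.start()

    # Client app as producer: tag every request so replies can be matched.
    order = []
    for p in payloads:
        cid = str(uuid.uuid4())
        order.append(cid)
        request_q.put((cid, p))
    for _ in workers:
        request_q.put(None)  # one sentinel per worker
    for w in workers:
        w.join()

    # Client app as consumer: replies may arrive out of order.
    replies = dict(response_q.get() for _ in range(len(payloads)))
    return [replies[cid] for cid in order]
```

Adding inference workers scales consumption without the client apps knowing how many there are, which is the main draw of the pattern.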
Lyrics: Ænima, Tool
LEARN TO SWIM
'CAUSE I'M PRAYING FOR RAIN
AND I'M PRAYING FOR TIDAL WAVES
I WANNA SEE IT COME DOWN
BURN IT DOWN
FLASH IT DOWN
WITH LOAD TESTS
Tested Models
Model | Parameters | Notes
Inception | 4M | InceptionV4
CNN (image) | 0.5M | 6 conv layers
LSTM (text) | 1M | 2 × LSTM (128), 256-step unroll

Machine
OS | Ubuntu 16
RAM | 32 GB
GPU | Nvidia GTX 1050 Ti
CPU | i7
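The talk doesn't say which tool produced the numbers below. To illustrate what the throughput and response-time columns measure, here is a minimal closed-loop load generator using only the standard library; `fake_model_endpoint` is a hypothetical stand-in for an HTTP call to the service under test. Note that it caps concurrency rather than driving a fixed request rate, so its output is not directly comparable with the Load RPS column.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def load_test(handler, n_requests=50, concurrency=10):
    """Fire n_requests at handler from `concurrency` threads.

    Returns throughput (completed requests per second) and the mean and
    max response time in seconds.
    """
    def timed_call(i):
        start = time.perf_counter()
        handler(i)
        return time.perf_counter() - start

    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(n_requests)))
    elapsed = time.perf_counter() - t0
    return {
        "throughput_rps": n_requests / elapsed,
        "mean_response_s": statistics.mean(latencies),
        "max_response_s": max(latencies),
    }


def fake_model_endpoint(_):
    time.sleep(0.01)  # pretend inference takes about 10 ms
```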
CNN - Results
Workers | GPU/CPU | Throughput (e/s) | Load (RPS) | Response Time (s) | Breakpoint (RPS)
1 (Flask) | CPU | 104 | 200 | 18.1 | 200
2 | CPU | 200 | 200 | 1.08 | 500
4 | CPU | 200 | 200 | 0.07 | 600
1 (Flask) | GPU | 650 | 750 | 11.1 | 750
2 | GPU | 750 | 750 | 1.0 | 900
4 | GPU | 750 | 750 | 0.04 | 1000
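Reading the CNN results: whenever the offered load λ (requests per second) exceeds the measured throughput μ (examples per second), requests queue up faster than they are served. As a rough fluid approximation that ignores timeouts and client-side limits, the backlog after t seconds of load is:

$$
Q(t) \approx (\lambda - \mu)\,t, \qquad \lambda = 200,\ \mu = 104 \;\Rightarrow\; Q(t) \approx 96\,t
$$

So the single-worker Flask CPU setup falls about 96 requests further behind every second, which helps explain its 18.1 s response time, while the 2- and 4-worker CPU rows (μ = λ = 200) keep up.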
Inception - Results
Workers | GPU/CPU | Throughput (e/s) | Load (RPS) | Response Time (s) | Breakpoint (RPS)
1 (Flask) | CPU | 4 | 10 | 32 | 10
2 | CPU | 5 | 10 | 18.3 | 10
4 | CPU | 3 | 10 | 26.5 | 10
1 (Flask) | GPU | 20 | 50 | 30 | 50
2 | GPU | 20 | 50 | 21 | 50
RNN - Results
Workers | GPU/CPU | Throughput (e/s) | Load (RPS) | Response Time (s) | Breakpoint (RPS)
1 (Flask) | CPU | 35 | 100 | 24.6 | 100
2 | CPU | 50 | 100 | 19.7 | 100
4 | CPU | 60 | 100 | 18.0 | 100
1 (Flask) | GPU | 10 | 50 | 36.59 | 50
2 | GPU | 10 | 50 | 27.7 | 50
4 | GPU | 15 | 50 | 26.5 | 50
“The purpose of a storyteller is not to tell you how to think, but to give you questions to think upon.”, The Way of Kings, Brandon Sanderson
• Business?
• Does accuracy matter?
• Real time?
• Retraining?
• Data size?
• Algorithm to use?
• Demo vs Production?
THANK YOU
Journey through the ML model deployment to production @DSC5
