HIGH PERFORMANCE DISTRIBUTED TENSORFLOW
IN PRODUCTION WITH GPUS AND KUBERNETES
HPC ADVISORY COUNCIL, FEB 2018
CHRIS FREGLY
FOUNDER @ PIPELINE.AI
KEY TAKE-AWAYS
Optimize Your Models After Training
Validate Models Online in Live Production (Safely!)
Evaluate Model Performance Offline *and* Online
Monitor and Tune Your Model Serving Runtime
INTRODUCTIONS: ME
§ Chris Fregly, Founder & Engineer @PipelineAI
§ Formerly Netflix, Databricks, IBM Spark Tech
§ Advanced Spark and TensorFlow Meetup
§ Please Join Our 60,000+ Global Members!!
Contact Me
chris@pipeline.ai
@cfregly
Global Locations
* San Francisco
* Chicago
* Austin
* Washington DC
* Dusseldorf
* London
INTRODUCTIONS: YOU
§ Data Scientist, Data Engineer, Data Analyst, Data Curious
§ Want to Deploy ML/AI Models Rapidly and Safely
§ Need to Trace or Explain Model Predictions
§ Have a Decent Grasp of Computer Science Fundamentals
PIPELINE.AI IS 100% OPEN SOURCE
§ https://github.com/PipelineAI/pipeline/
§ Please Star this GitHub Repo!
§ VCs Value GitHub Stars @ $1,500 Each (?!)
GitHub Repo Geo Heat Map: http://jrvis.com/red-dwarf/
PIPELINE.AI OVERVIEW
500,000 Docker Downloads
60,000 Registered Users
60,000 Meetup Members
30,000 LinkedIn Followers
2,400 GitHub Stars
15 Enterprise Beta Users
RECENT PIPELINE.AI NEWS
Sept 2017
Dec 2017
Jan 2018
PipelineAI Becomes Google ML/AI Expert
Register to Install PipelineAI in Your Own Environment (Starting March 2018)
http://pipeline.ai
Try GPU Community Edition Today!
http://community.pipeline.ai
WHY HEAVY FOCUS ON MODEL SERVING?
Model Training <<< Model Serving
Model Training: 100s of Training Jobs per Day
§ Batch & Boring
§ Offline in Research Lab
§ Pipeline Ends at Training
§ No Insight into Live Production
§ Small Number of Data Scientists
§ Optimizations Very Well-Known
Model Serving: 1,000,000s of Predictions per Sec
§ Real-Time & Exciting!!
§ Online in Live Production
§ Pipeline Extends into Production
§ Continuous Insight into Live Production
§ Huuuuuuge Number of Application Users
§ Runtime Optimizations Not Yet Explored
CLOUD-BASED MODEL SERVING OPTIONS
§ AWS SageMaker
§ Released Nov 2017 @ re:Invent
§ Custom Docker Images for Training/Serving (e.g., PipelineAI Images)
§ Distributed TensorFlow Training through Estimator API
§ Traffic Splitting for A/B Model Testing
§ Google Cloud ML Engine
§ Mostly Command-Line Based
§ Driving TensorFlow Open Source API (e.g., Estimator API)
§ Azure ML
PipelineAI Supports Hybrid-Cloud Deployments
BUILD MODEL WITH THE RUNTIME
§ Package Model + Runtime into 1 Docker Image
§ Emphasizes Immutable Deployment and Infrastructure
§ Same Image Across All Environments
§ No Library or Dependency Surprises from Laptop to Production
§ Allows Tuning Model + Runtime Together
pipeline predict-server-build --model-name=mnist \
  --model-tag=A \
  --model-type=tensorflow \
  --model-runtime=tfserving \
  --model-chip=gpu \
  --model-path=./tensorflow/mnist/
Build Local Model Server A
TUNE MODEL + RUNTIME TOGETHER
§ Model Training Optimizations
§ Model Hyper-Parameters (e.g., Learning Rate)
§ Reduced Precision (e.g., FP16 Half Precision)
§ Model Serving (Post-Train) Optimizations
§ Quantize Model Weights + Activations From 32-bit to 8-bit
§ Fuse Neural Network Layers Together
§ Model Runtime Optimizations
§ Runtime Config: Request Batch Size, etc.
§ Different Runtimes: TensorFlow Serving CPU/GPU, Nvidia TensorRT
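Layer fusion can be illustrated with batch-norm folding: at inference time, a batch-normalization step that follows a dense layer can be folded into that layer's weights and bias, so one operation disappears per layer. The sketch below is a minimal pure-Python illustration assuming a dense layer y = W·x + b followed by per-output batch norm with parameters gamma, beta, mean, var, and eps; the function names are hypothetical and this is not TensorFlow's or PipelineAI's implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]

def dense(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Dense layer: y[i] = sum_j w[i][j] * x[j] + b[i]."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]

def batch_norm(y: Vector, gamma: Vector, beta: Vector,
               mean: Vector, var: Vector, eps: float = 1e-3) -> Vector:
    """Inference-time batch norm using stored (moving) mean and variance."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]

def fold_batch_norm(w: Matrix, b: Vector, gamma: Vector, beta: Vector,
                    mean: Vector, var: Vector, eps: float = 1e-3) -> Tuple[Matrix, Vector]:
    """Fold batch norm into the dense layer: W' = W * scale, b' = (b - mean) * scale + beta."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    w_folded = [[wij * s for wij in row] for row, s in zip(w, scale)]
    b_folded = [(bi - m) * s + bt for bi, m, s, bt in zip(b, mean, scale, beta)]
    return w_folded, b_folded

# Hypothetical example values: the fused layer should match dense + batch norm.
w = [[0.5, -1.0], [2.0, 0.25]]
b = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.0, 0.3]
mean, var = [0.2, -0.1], [0.9, 1.1]
x = [1.0, 2.0]
unfused = batch_norm(dense(w, b, x), gamma, beta, mean, var)
fused = dense(*fold_batch_norm(w, b, gamma, beta, mean, var), x)
```

The folded layer produces the same outputs as the original two operations (up to floating-point rounding) while doing only one matrix multiply.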
SERVING (POST-TRAIN) OPTIMIZATIONS
§ Prepare Model for Serving
§ Simplify Network, Reduce Size
§ Reduce Precision -> Fast Math
§ Some Tools
§ Graph Transform Tool (GTT)
§ tfcompile
[Figure: model graph after training vs. after optimizing]
pipeline optimize --optimization-list="['quantize_weights','tfcompile']" \
  --model-name=mnist \
  --model-tag=A \
  --model-path=./tensorflow/mnist/model \
  --model-inputs="['x']" \
  --model-outputs="['add']" \
  --output-path=./tensorflow/mnist/optimized_model
Linear Regression Model Size: 70MB -> 70KB (!)
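The idea behind a `quantize_weights` step can be sketched with affine (min/max) 8-bit quantization: each float32 weight is mapped to one of 256 levels spanning the tensor's min and max, which cuts weight storage roughly 4x at the cost of a bounded rounding error. This is a minimal pure-Python sketch under that assumption; it is not necessarily the exact scheme the Graph Transform Tool uses, and the function names are hypothetical.

```python
from typing import List, Tuple

def quantize_uint8(weights: List[float]) -> Tuple[List[int], float, float]:
    """Map floats onto 256 evenly spaced levels between min and max."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # avoid division by zero for a constant tensor
    # Round to the nearest level and clamp into the uint8 range [0, 255].
    q = [min(255, max(0, round((w - lo) / scale))) for w in weights]
    return q, scale, lo

def dequantize_uint8(q: List[int], scale: float, lo: float) -> List[float]:
    """Recover approximate float values from the 8-bit codes."""
    return [lo + qi * scale for qi in q]

weights = [-0.75, -0.1, 0.0, 0.33, 1.2]
q, scale, lo = quantize_uint8(weights)
restored = dequantize_uint8(q, scale, lo)
# Rounding to the nearest level bounds the error by half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Only the 8-bit codes plus one `scale` and one `lo` per tensor need to be stored, which is where the size reduction for weights comes from.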
NVIDIA TENSORRT RUNTIME
§ Post-Training Model Optimizations
§ Specific to Nvidia GPUs
§ GPU-Optimized Prediction Runtime
§ Alternative to TensorFlow Serving
§ PipelineAI Supports TensorRT!
TENSORFLOW LITE RUNTIME
§ Post-Training Model Optimizations
§ Currently Supports iOS and Android
§ On-Device Prediction Runtime
§ Low-Latency, Fast Startup
§ Selective Operator Loading
§ 70KB Min - 300KB Max Runtime Footprint
§ Supports Accelerators (GPU, TPU)
§ Falls Back to CPU without Accelerator
§ Java and C++ APIs
3 DIFFERENT RUNTIMES, SAME MODEL
pipeline predict-server-build --model-name=mnist \
  --model-tag=A \
  --model-type=tensorflow \
  --model-runtime=tfserving \
  --model-chip=cpu \
  --model-path=./tensorflow/mnist/
Build Local Model Server A
pipeline predict-server-build --model-name=mnist \
  --model-tag=B \
  --model-type=tensorflow \
  --model-runtime=tfserving \
  --model-chip=gpu \
  --model-path=./tensorflow/mnist/
Build Local Model Server B
pipeline predict-server-build --model-name=mnist \
  --model-tag=C \
  --model-type=tensorflow \
  --model-runtime=tensorrt \
  --model-chip=gpu \
  --model-path=./tensorflow/mnist/
Build Local Model Server C
Same Model, Different Runtime
RUN A LOADTEST LOCALLY!
§ Perform Mini-Load Test on Local Model Server
§ Immediate, Local Prediction Performance Metrics
§ Compare to Previous Model + Runtime Variations
§ Gain Intuition Before Push to Prod
pipeline predict-server-start --model-name=mnist \
  --model-tag=A \
  --memory-limit=2G
Start Local Model Servers
pipeline predict-http-test --model-endpoint-url=http://localhost:8080 \
  --test-request-path=test_request.json \
  --test-request-concurrency=1000
Start Local LoadTest
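The idea behind a local mini-load test can be sketched without any network: fire many concurrent calls at a prediction function and summarize latency percentiles and throughput. In this minimal sketch, `fake_predict` is a hypothetical stand-in for an HTTP call to the local model server, and the request and concurrency counts are illustrative, not the CLI's defaults.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def fake_predict(request: dict) -> dict:
    """Hypothetical stand-in for an HTTP round trip to the model server."""
    time.sleep(0.001)  # simulate ~1 ms of inference latency
    return {"prediction": sum(request["x"]) % 10}

def timed_call(request: dict) -> float:
    """Return the wall-clock latency of one prediction in seconds."""
    start = time.perf_counter()
    fake_predict(request)
    return time.perf_counter() - start

def load_test(num_requests: int = 200, concurrency: int = 20) -> dict:
    request = {"x": [1, 2, 3]}
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, [request] * num_requests))
    elapsed = time.perf_counter() - start

    def pct(p: float) -> float:
        # Simple index-based percentile over the sorted latencies.
        return latencies[min(len(latencies) - 1, int(p / 100 * len(latencies)))]

    return {
        "requests": num_requests,
        "throughput_rps": num_requests / elapsed,
        "p50_ms": pct(50) * 1000,
        "p99_ms": pct(99) * 1000,
        "mean_ms": statistics.mean(latencies) * 1000,
    }

stats = load_test()
```

Running the same harness against each model + runtime variation gives comparable numbers before anything is pushed to production.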
PUSH IMAGE TO DOCKER REGISTRY
§ Supports All Public + Private Docker Registries
§ DockerHub, Artifactory, Quay, AWS, Google, …
§ Or Self-Hosted, Private Docker Registry
pipeline predict-server-push --model-name=mnist \
  --model-tag=A \
  --image-registry-url=<your-registry> \
  --image-registry-repo=<your-repo>
Push Images to Docker Registry
DEPLOY MODELS SAFELY TO PROD
§ Deploy from CLI or Jupyter Notebook
§ Tear-Down and Rollback Models Quickly
§ Shadow Canary: Deploy to 20% Live Traffic
§ Split Canary: Deploy to 97-2-1% Live Traffic
pipeline predict-kube-start --model-name=mnist \
  --model-tag=A
Start Cluster A
pipeline predict-kube-start --model-name=mnist \
  --model-tag=B
Start Cluster B
pipeline predict-kube-start --model-name=mnist \
  --model-tag=C
Start Cluster C
pipeline predict-kube-route --model-name=mnist \
  --model-split-tag-and-weight-dict='{"A":97, "B":2, "C":1}' \
  --model-shadow-tag-list='[]'
Route Live Traffic
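Split and shadow canaries can be sketched as weighted random routing: each request is served by one tag chosen in proportion to its weight, while every shadow tag receives a copy whose result is logged but never returned to the caller. This is a hypothetical in-process sketch (the `Router` class and the stand-in model functions are illustrative), not how PipelineAI or Kubernetes actually implement routing.

```python
import random
from collections import Counter
from typing import Callable, Dict, List

class Router:
    """Hypothetical split + shadow router for illustration only."""

    def __init__(self, models: Dict[str, Callable], split: Dict[str, int],
                 shadow: List[str], seed: int = 0):
        self.models = models
        self.tags = list(split)
        self.weights = [split[t] for t in self.tags]
        self.shadow = shadow
        self.rng = random.Random(seed)
        self.shadow_log = []  # (tag, prediction) pairs, never returned to callers

    def predict(self, request):
        # Pick the serving model in proportion to its traffic weight.
        tag = self.rng.choices(self.tags, weights=self.weights, k=1)[0]
        # Mirror the request to shadow models and only record their answers.
        for s in self.shadow:
            self.shadow_log.append((s, self.models[s](request)))
        return tag, self.models[tag](request)

# Stand-in models that just echo their tag and input.
models = {t: (lambda r, t=t: f"{t}:{r}") for t in "ABC"}
router = Router(models, split={"A": 97, "B": 2, "C": 1}, shadow=["C"])
served = Counter(router.predict(i)[0] for i in range(100_000))
```

Over many requests the served counts approach the 97-2-1 split, and the shadow model sees all traffic without affecting any response.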
COMPARE MODELS OFFLINE & ONLINE
§ Offline, Batch Metrics
§ Validation + Training Accuracy
§ CPU + GPU Utilization
§ Online, Live Prediction Values
§ Compare Relative Precision
§ Newly-Seen, Streaming Data
§ Online, Real-Time Metrics
§ Response Time, Throughput
§ Cost ($) Per Prediction
ENSEMBLE PREDICTION AUDIT TRAIL
§ Necessary for Model Explainability
§ Fine-Grained Request Tracing
§ Used for Model Ensembles
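An ensemble audit trail can be kept by recording every member model's output, keyed by a request ID, next to the combined prediction. This minimal sketch assumes a soft-voting ensemble (averaging class probabilities); the `AuditRecord` layout and `ensemble_predict` function are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class AuditRecord:
    request_id: str
    inputs: List[float]
    member_outputs: Dict[str, List[float]] = field(default_factory=dict)
    ensemble_output: List[float] = field(default_factory=list)
    predicted_class: int = -1

def ensemble_predict(models: Dict[str, Callable[[List[float]], List[float]]],
                     inputs: List[float], audit_log: List[AuditRecord]) -> int:
    record = AuditRecord(request_id=str(uuid.uuid4()), inputs=inputs)
    # Record each member model's class probabilities for later tracing.
    for tag, model in models.items():
        record.member_outputs[tag] = model(inputs)
    n = len(models)
    num_classes = len(next(iter(record.member_outputs.values())))
    # Soft voting: average the probabilities per class.
    record.ensemble_output = [
        sum(p[c] for p in record.member_outputs.values()) / n for c in range(num_classes)
    ]
    record.predicted_class = max(range(num_classes), key=record.ensemble_output.__getitem__)
    audit_log.append(record)
    return record.predicted_class

models = {"A": lambda x: [0.6, 0.4], "B": lambda x: [0.3, 0.7], "C": lambda x: [0.2, 0.8]}
log: List[AuditRecord] = []
label = ensemble_predict(models, [0.5, 1.5], log)
```

Because each record keeps the per-model outputs, a surprising ensemble prediction can be traced back to the member model that drove it.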
REAL-TIME PREDICTION STREAMS
§ Visually Compare Real-time Predictions
[Figure: features and inputs alongside predictions and confidences, streamed side by side for Model A, Model B, and Model C]
PREDICTION PROFILING AND TUNING
§ Pinpoint Performance Bottlenecks
§ Fine-Grained Prediction Metrics
§ 3 Steps in Real-Time Prediction
1. transform_request()
2. predict()
3. transform_response()
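Timing each of the three steps separately shows where a prediction spends its time. This minimal sketch reuses the step names from the slide, but their bodies are toy stand-ins (JSON parsing, a sum as "inference", JSON encoding); the per-step timings come from `time.perf_counter`.

```python
import json
import time

def transform_request(raw: bytes) -> list:
    """Parse the raw request into model inputs (toy stand-in)."""
    return json.loads(raw)["x"]

def predict(x: list) -> dict:
    """Stand-in for model inference."""
    return {"sum": sum(x)}

def transform_response(result: dict) -> bytes:
    """Serialize the model output for the caller (toy stand-in)."""
    return json.dumps(result).encode()

def profiled_prediction(raw: bytes):
    t0 = time.perf_counter()
    x = transform_request(raw)
    t1 = time.perf_counter()
    result = predict(x)
    t2 = time.perf_counter()
    response = transform_response(result)
    t3 = time.perf_counter()
    timings = {
        "transform_request_ms": (t1 - t0) * 1000,
        "predict_ms": (t2 - t1) * 1000,
        "transform_response_ms": (t3 - t2) * 1000,
    }
    return response, timings

response, timings = profiled_prediction(b'{"x": [1, 2, 3]}')
```

If `transform_request_ms` dominates, the bottleneck is input parsing rather than the model itself, which changes what is worth optimizing.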
SHIFT TRAFFIC TO MAX(REVENUE)
§ Shift Traffic to Winning Model with Multi-armed Bandits
LIVE, ADAPTIVE TRAFFIC ROUTING
§ A/B Tests
§ Inflexible and Boring
§ Multi-Armed Bandits
§ Adaptive and Exciting!
pipeline predict-kube-route --model-name=mnist \
  --model-split-tag-and-weight-dict='{"A":1, "B":2, "C":97}' \
  --model-shadow-tag-list='[]'
Route Traffic Dynamically
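Adaptive routing of this kind can be sketched with Thompson sampling, one common multi-armed bandit strategy: each model keeps a Beta posterior over its success rate (for example clicks or conversions), each request goes to the model with the highest sampled rate, and the resulting traffic shares can be turned into a split dict like the one in the command above. The success rates below are simulated and hypothetical; the slides do not say which bandit algorithm PipelineAI uses.

```python
import random
from typing import Dict

def thompson_route(true_rates: Dict[str, float], rounds: int = 20_000,
                   seed: int = 42) -> Dict[str, int]:
    rng = random.Random(seed)
    successes = {tag: 0 for tag in true_rates}
    failures = {tag: 0 for tag in true_rates}
    pulls = {tag: 0 for tag in true_rates}
    for _ in range(rounds):
        # Sample a plausible success rate for each model from Beta(1 + s, 1 + f).
        sampled = {tag: rng.betavariate(1 + successes[tag], 1 + failures[tag])
                   for tag in true_rates}
        tag = max(sampled, key=sampled.get)
        pulls[tag] += 1
        # Simulated user feedback for the chosen model.
        if rng.random() < true_rates[tag]:
            successes[tag] += 1
        else:
            failures[tag] += 1
    return pulls

def to_split_weights(pulls: Dict[str, int]) -> Dict[str, int]:
    """Convert traffic counts into rounded percentages (may not sum to exactly 100)."""
    total = sum(pulls.values())
    return {tag: round(100 * n / total) for tag, n in pulls.items()}

pulls = thompson_route({"A": 0.02, "B": 0.03, "C": 0.06})
weights = to_split_weights(pulls)
```

Unlike a fixed A/B split, the bandit keeps exploring the weaker models a little while sending most traffic to the one that is currently winning.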
SHIFT TRAFFIC TO MIN(CLOUD CO$T)
§ Based on Cost ($) Per Prediction
§ Cost Changes Throughout Day
§ Lose AWS Spot Instances
§ Google Cloud Becomes Cheaper
§ Shift Across Clouds & On-Prem
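Cost per prediction is the hourly price of a deployment divided by the number of predictions it serves per hour, so traffic can be shifted toward whichever option is currently cheapest. The deployment names, prices, and throughputs below are made-up placeholders, not real AWS, Google Cloud, or on-prem figures.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Deployment:
    name: str
    hourly_cost_usd: float
    predictions_per_sec: float

    def cost_per_prediction(self) -> float:
        # Dollars per hour divided by predictions per hour.
        return self.hourly_cost_usd / (self.predictions_per_sec * 3600)

def cheapest(deployments: List[Deployment]) -> Deployment:
    return min(deployments, key=Deployment.cost_per_prediction)

# Hypothetical options; real prices change throughout the day.
options = [
    Deployment("aws-spot-gpu", hourly_cost_usd=0.90, predictions_per_sec=2000),
    Deployment("gcp-gpu", hourly_cost_usd=1.20, predictions_per_sec=2500),
    Deployment("on-prem-cpu", hourly_cost_usd=0.30, predictions_per_sec=400),
]
best = cheapest(options)
```

With these numbers the spot GPU costs $0.90 / 7,200,000 = $1.25e-7 per prediction. If the spot instances are reclaimed, dropping that entry and calling `cheapest` again selects the next-cheapest option.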
PSEUDO-CONTINUOUS TRAINING
§ Identify and Fix Borderline (Unconfident) Predictions
§ Fix Predictions Along Class Boundaries
§ Facilitate "Human in the Loop"
§ Retrain with Newly-Labeled Data
§ Gamify the Labeling Process
§ Path to Crowd-Sourced Labeling
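Borderline predictions can be flagged by the margin between the top two class probabilities: a small margin means the input sits near a class boundary and is a good candidate for human labeling. In this minimal sketch, the 0.2 threshold and the request-ID-to-probabilities format are arbitrary assumptions.

```python
from typing import Dict, List, Tuple

def margin(probs: List[float]) -> float:
    """Difference between the two most likely classes."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2

def select_for_labeling(predictions: Dict[str, List[float]],
                        threshold: float = 0.2) -> List[Tuple[str, float]]:
    """Return (request_id, margin) pairs below the threshold, least confident first."""
    flagged = [(rid, margin(p)) for rid, p in predictions.items() if margin(p) < threshold]
    return sorted(flagged, key=lambda item: item[1])

predictions = {
    "req-1": [0.05, 0.90, 0.05],  # confident
    "req-2": [0.45, 0.40, 0.15],  # near the class-0 / class-1 boundary
    "req-3": [0.30, 0.25, 0.45],  # moderately unsure
}
queue = select_for_labeling(predictions)
```

The queued requests would go to human labelers, and the newly labeled examples would then feed the next retraining run.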
CONTINUOUS MODEL TRAINING
§ The Holy Grail of Machine Learning!
§ PipelineAI Supports Continuous Model Training!
§ Kafka, Kinesis
§ Spark Streaming, Flink
§ Storm, Heron
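At its simplest, continuous training means updating a model incrementally as each labeled event arrives from a stream instead of retraining in batch. This minimal sketch uses online stochastic gradient descent on a one-feature linear regression; `event_stream` is a hypothetical stand-in for a Kafka or Kinesis consumer, and the learning rate is an arbitrary choice.

```python
import random
from typing import Iterator, Tuple

def event_stream(n: int, seed: int = 7) -> Iterator[Tuple[float, float]]:
    """Hypothetical stand-in for a streaming consumer of labeled events."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.random()
        yield x, 2.0 * x + 1.0  # true relationship: y = 2x + 1

class OnlineLinearModel:
    def __init__(self, learning_rate: float = 0.1):
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def update(self, x: float, y: float) -> None:
        # One SGD step on the squared error 0.5 * (prediction - y) ** 2.
        error = self.predict(x) - y
        self.w -= self.lr * error * x
        self.b -= self.lr * error

model = OnlineLinearModel()
for x, y in event_stream(20_000):
    model.update(x, y)
```

Each event adjusts the weights immediately, so the model tracks new data as it streams in rather than waiting for the next batch training job.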
THANK YOU!!
§ Please Star this GitHub Repo!
§ All slides, code, notebooks, and Docker images here:
https://github.com/PipelineAI/pipeline
Contact Me
chris@pipeline.ai
@cfregly