Using MLOps to Bring ML to Production/The Promise of MLOps

Foundation for ML
Your data +
Microsoft data
Breakthrough
advancements
Data Cloud Models
Power of Azure

Speech Vision Language
2016 2017 2018 2018
Microsoft ML breakthroughs

Microsoft 365
ML at Microsoft
| Research

ML at scale
Monthly active
Office 365 users
using AI
180
million
Questions Asked
of Cortana
18
Billion
Number of Signals
Analyzed to Block
Emerging Threats
DAILY
6.5
Trillion

Building
a model
Data ingestion Data analysis
Data
transformation
Data validation Data splitting
Trainer
Model
validation
Training
at scale
Logging Roll-out Serving Monitoring

Ok, but, like, I’m
a data scientist. IDGAF
I don’t care
about all that.

Cowboys and Ranchers Can Be Friends!
SRE/ML Engineers Data Scientist
• Quick iteration
• Frameworks they
understand
• Best of breed tools
• No management
headaches
• Unlimited scale
• Reuse of tooling and
platforms
• Corporate compliance
• Observability
• Uptime

Haven’t I Heard This Before?

GitOps
== VELOCITY and SECURITY

MLOps = ML + DEV + OPS
Experiment
Data Acquisition
Business Understanding
Initial Modeling
Develop
Modeling
Operate
Continuous Delivery
Data Feedback Loop
System + Model Monitoring
ML
+ Testing
Continuous Integration
Continuous Deployment

MLOps Benefits
• Code drives generation
and deployments
• Pipelines are
reproducible and
verifiable
• All artifacts can be
tagged and audited
• SWE best practices for
quality control
• Offline comparisons of
model quality
• Minimize bias and
enable explainability
• Controlled rollout
capabilities
• Live comparison of
predicted vs. expected
performance
• Results fed back to
watch for drift and
improve model
Automation /
Observability Validation
Reproducibility
/Auditability
== VELOCITY and SECURITY (For ML)

Internal MLOps Platforms
FBLearner Flow
TensorFlow Extended
Uber’s Michelangelo
Microsoft Aether

But I Don’t Work at a
Big Company With
Thousands of
ML Engineers!

Build Your Own MLOps Platform
And many MANY more…

Cloud Provider
MLOps Platforms

Real World Multi-Cloud
CI/CD Pipeline
Process Train Stage Serve
Data
Distributed Cloud
SRE/ML Engineers
Data Scientist
ENV
#1
ENV
#2

Azure DevOps Pipelines
Cloud-hosted pipelines for Linux, Windows and macOS.
Any language, any platform, any cloud
Build, test, and deploy Node.js, Python,  Java, PHP,
Ruby, C/C++, .NET, Android, and iOS apps. Run in
parallel on Linux, macOS, and Windows. Deploy to
Azure, AWS, GCP or on-premises
Extensible
Explore and implement a wide range of community-
built build, test, and deployment tasks, along with
hundreds of extensions from Slack to SonarCloud.
Support for YAML, reporting and more
Containers and Kubernetes
Easily build and push images to container registries
like Docker Hub and Azure Container Registry.
Deploy containers to individual hosts or Kubernetes.

First Class Model Training Tasks
CI pipeline captures:
1. Create sandbox
2. Run unit tests and code quality checks
3. Attach to compute
4. Run training pipeline
5. Evaluate model
6. Register model

Automated Deployment
CD pipeline captures:
1. Package model into container
image
2. Validate and profile model
3. Deploy model to DevTest (ACI)
4. If all is well, proceed to rollout
to AKS
Everything is done via the CLI

Model Versioning & Storage
• which data,
• which experiment / previous model(s),
• where’s the code / notebook?
• Was it converted / quantized?
• Private / compliant data
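
One way to make the questions above answerable is to store them as a small, immutable record next to each model and derive a short fingerprint from it. The sketch below is an assumption about what such a record could look like; the field names, the example values, and the 13-character fingerprint length are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical metadata stored alongside a registered model."""
    data_uri: str            # which data
    experiment_id: str       # which experiment
    parent_model: str        # previous model, or "" if none
    code_ref: str            # where the code / notebook lives
    quantized: bool          # was it converted / quantized?
    contains_private_data: bool

    def fingerprint(self) -> str:
        # Hash a canonical JSON form so equal records always hash equally.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:13]


record = ModelRecord(
    data_uri="example://datasets/ducks/v3",   # made-up location
    experiment_id="exp-42",                    # made-up identifier
    parent_model="",
    code_ref="example-repo@abc123",            # made-up commit reference
    quantized=False,
    contains_private_data=True,
)
print(record.fingerprint())
```

Because the dataclass is frozen, a change to any field means creating a new record, which in turn yields a different fingerprint.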

Model Validation
• Data (changes to shape / profile)
• Model in isolation (offline A/B)
• Model + app (functional testing)
• Only deploy after initial validation passes
• Ramp up traffic to new model using A/B
experimentations
• Functional behavior
• Performance characteristics

Model Deployment
• Focus on ML, not DevOps
• Get telemetry for service health and model behavior
• code-generation
• API specifications / interfaces
• Cloud Services
• Mobile / Embedded Applications
• Edge Devices
• Quantize / optimize models for target platform
• Compliant + Safe

MLOps Gets You to Production
• End-to-end ownership by data science teams
using SWE best practices
• Continuously deliver value to end users.
• Enables lineage, auditability and regulatory
compliance through consistency

What Does All This Stuff Solve For?
1. Does My Model Actually Work?
2. What Did My Customers See?
3. Is My Model Still Good?

Does My Model Actually Work?
Time to test out
my model…
Laptop The Cloud

Laptop The Cloud

Looks good to
me! To Production!
Laptop The Cloud

Laptop The Cloud
Wait, what?
Oh… oh no…

Laptop The Cloud
WOAH there.

Laptop The Cloud
WOAH there.
Source Control

What is
happening…
Source Control
Laptop The Cloud

A Small Example of Issues You Can Have…
• Inappropriate HW/SW stack
• Mismatched driver versions
• Crash looping deployment
• Data/model versioning [Nick Walsh]
• Non-standard images/OS version
• Pre-processing code doesn’t match
production pre-processing
• Production data doesn’t match
training/test data
• Output of the model doesn’t match
application expectations
• Hand-coded heuristics better than model
[Adam Laiacano]
• Model freshness (train on out-of-date
data/input shape changed)
• Test/production statistics/population
shape skew
• Overfitting on training/test data
• Bias introduction (or not tested)
• Over/under HW provisioning
• Latency issues
• Permissions/certs
• Failure to obey health checks
• Killed production model before roll out
of new/in wrong order
• Thundering herd for new model
• Logging to the wrong location
• Storage for model not allocated
properly/accessible by deployment
tooling
• Route to artifacts not available for
download
• API signature changes not
propagated/expected
• Cross-data center latency
• Expected benefit doesn’t materialize (e.g.
multiple components in the app change
simultaneously)
• Get wrong/no traffic because A/B config
didn’t roll out
• Get too much traffic too soon (expected
to canary/exponential roll out)
• Lack of visibility into real-time model
behavior (detecting data drift, live data
distribution vs train data, etc) [Nick
Walsh]
• Outliers not predicted [MikeBSilverman]
• Change was a good change, but didn’t
communicate with the rest of the team
(so you must roll back)
• No dates! (date to measure
impact/improvement against a pre-
agreed measure; date scheduled to
assess data changes) [Mary Branscombe]
• No CI/CD; manual changes untracked
[Jon Peck]
• LACK OF DOCUMENTATION!! (the
problem, the testing, the solution, lots
more) [Terry Christiani]
• Successful model causes pain elsewhere
in the organization (e.g. detecting faults
previously missed) [Mark Round]
Or It Just Doesn’t Work!
At All!

Laptop The Cloud
Source Control
Automated
Validation &
Profiling
Package
For Rollout
Explain Model
& Look for
Bias
Clean/
Minimize
Code
Sane
Deployment
Nice. Nice.
✓

But I Can Do All
These Manually…

MLOps is a Platform and a Philosophy
Even if:
o Every data scientist trained...
o And you had all the tools necessary...
o And they all worked together...
o And your SREs understood ML modeling...
o And and and and ...
You’d still need a permanent, repeatable
record of what you did

What Did My Customers See?
SRE/ML Engineers
The Cloud
Front End
Model Server
Customer
I’d Like a loan,
please.
Source Control

SRE/ML Engineers
The Cloud
Front End
Model Server
Customer
No.
Source Control

SRE/ML Engineers
The Cloud
Front End
Model Server
Customer
Ok, but why?
Source Control

Source Control
SRE/ML Engineers
The Cloud
Front End
Model Server
Customer
Uh oh.
Lawyer (x18)

It’s Not Just About Explainability!
• Yes, models are complicated
• But, that’s not enough:
o What data did you train on?
o How did you transform/exclude outliers?
o What are the data statistics?
o Did anything change between code and production?
o What model did you actually serve (to this person)?
• MLOps can help!
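
The last question in the list, which model was served to a given person, can only be answered if every prediction is logged together with the model version that produced it. A minimal sketch of such an append-only log follows; the class names, the request ID, and the model identifier are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServedPrediction:
    request_id: str   # identifies the customer request
    model_hash: str   # identifies the exact model version that answered
    decision: str


class AuditLog:
    """Hypothetical append-only log: entries can be added and read, never changed."""

    def __init__(self):
        self._entries = []

    def append(self, entry: ServedPrediction) -> None:
        # Refuse to overwrite history for a request that was already logged.
        if any(e.request_id == entry.request_id for e in self._entries):
            raise ValueError(f"request {entry.request_id} already logged")
        self._entries.append(entry)

    def lookup(self, request_id: str) -> Optional[ServedPrediction]:
        for entry in self._entries:
            if entry.request_id == request_id:
                return entry
        return None


log = AuditLog()
log.append(ServedPrediction("req-001", "model-v7", "loan denied"))
print(log.lookup("req-001"))
```

With the model identifier in hand, the remaining questions (training data, transformations, statistics) become lookups against the model's own metadata.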

SRE/ML Engineers
The Cloud
Front End
Model Server
Customer
Source Control
Automated
Validation &
Profiling
Package
For Rollout
Explain Model
& Look for
Bias
Clean/
Minimize
Code
Sane
Deployment

32c04681d7573
SRE/ML Engineers
The Cloud
Front End
Model Server
Customer
Automated
Validation &
Profiling
Package
For Rollout
Explain Model
& Look for
Bias
Clean/
Minimize
Code
Sane
Deployment
Source Control
Immutable
Metadata Store
b151f8e65b32a c7f4e7607b4b7 0ef1d58921d89 e2e1e994c4251 786c8e57a6d51 9ce88802f0759
9ce88802f0759

SRE/ML Engineers
The Cloud
Front End
Model Server
Customer
Automated
Validation &
Profiling
Package
For Rollout
Explain Model
& Look for
Bias
Clean/
Minimize
Code
Sane
Deployment
Source Control
Immutable
Metadata Store
32c04681d7573
Why didn’t I get
a loan?
9ce88802f0759

SRE/ML Engineers
The Cloud
Front End
Model Server
Customer
Automated
Validation &
Profiling
Package
For Rollout
Explain Model
& Look for
Bias
Clean/
Minimize
Code
Sane
Deployment
Source Control
Immutable
Metadata Store
32c04681d7573
32c04681d7573
9ce88802f0759

Is My Model Still Good?
SRE/ML Engineers
The Cloud
There is a
blue or
orange
DUCK inside
this barn.
What color
is the duck?

Let’s Use Machine
Learning!!

SRE/ML Engineers
The Cloud
Front End
Model Server
f7c5f9fe7b762
It’s a
duck!
BLUE
There is a
blue or
orange
DUCK inside
this barn.
What color
is the duck?

SRE/ML Engineers
The Cloud
Front End
Model Server
f7c5f9fe7b762
It’s a
duck!
BLUE
5 Blue Ducks
995 Yellow Ducks
Accuracy = 99%
False Positive = 1%
???????????????????

$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$
Bayes’ Theorem

Accuracy depends on
the population
distribution!

SRE/ML Engineers
The Cloud
Front End
Model Server
f7c5f9fe7b762
It’s a
duck!
BLUE
995 Yellow Ducks
5 Blue Ducks
WRONG 2/3 of the Time!
Accuracy = 99%
False Positive = 1%
???????????????????
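
The arithmetic behind this slide can be checked directly with Bayes' theorem. The sketch below assumes that "Accuracy = 99%" means the model says BLUE for 99% of blue ducks, and that "False Positive = 1%" means it says BLUE for 1% of yellow ducks; the slide does not spell out either definition.

```python
def precision_for_blue(n_blue, n_yellow, true_positive_rate, false_positive_rate):
    """P(duck is blue | model says BLUE) for a given population."""
    true_positives = n_blue * true_positive_rate
    false_positives = n_yellow * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Skewed population from the slide: 5 blue ducks, 995 yellow ducks.
skewed = precision_for_blue(5, 995, 0.99, 0.01)

# Balanced population for comparison: 500 blue ducks, 500 yellow ducks.
balanced = precision_for_blue(500, 500, 0.99, 0.01)

print(f"{skewed:.3f} {balanced:.3f}")  # prints "0.332 0.990"
```

With only 5 blue ducks in 1,000, about 4.95 BLUE answers are correct and about 9.95 are false alarms, so a BLUE answer is right roughly one time in three. The same model on a balanced population is right 99% of the time.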

SRE/ML Engineers
The Cloud
Front End
Model Server
f7c5f9fe7b762
It’s a
duck!
BLUE
995 Yellow Ducks
5 Blue Ducks
Model Server
d4093cc84b267

SRE/ML Engineers
The Cloud
Front End
Model Server
995 Yellow Ducks
5 Blue Ducks
d4093cc84b267

SRE/ML Engineers
The Cloud
Front End
Model Server 500 Yellow Ducks
500 Blue Ducks
d4093cc84b267

• Models != Code – they can go stale... QUICKLY.
• IMPORTANT:
o Watch your model & data for drift from training
o Regularly (if not continuously) retrain, even before
performance begins to fail
o Rollbacks across multiple versions are not uncommon!
• Without an e2e MLOps pipeline, many of the
above are O(really really hard)!
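
Watching for drift can start with something as simple as comparing the class mix seen in production against the mix seen during training. The sketch below uses total variation distance on the duck counts from the earlier slides; the choice of metric and the 0.1 threshold are assumptions for illustration, not recommendations from the talk.

```python
def total_variation(p: dict, q: dict) -> float:
    """Total variation distance between two count distributions (0 = same, 1 = disjoint)."""
    labels = set(p) | set(q)
    p_total = sum(p.values())
    q_total = sum(q.values())
    return 0.5 * sum(
        abs(p.get(label, 0) / p_total - q.get(label, 0) / q_total)
        for label in labels
    )


training = {"yellow": 995, "blue": 5}   # population the model was tuned for
live = {"yellow": 500, "blue": 500}     # population it now sees

drift = total_variation(training, live)
needs_retraining = drift > 0.1          # hypothetical alert threshold
print(round(drift, 3), needs_retraining)
```

A check like this runs on inputs or predictions alone, so it can raise an alarm before labeled outcomes arrive and before measured performance begins to fail.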

MLOps Gives* You…
• Software best practices for building machine
learning solutions
• Repeatable workflow for training a model and
rolling it out to production
• An immutable record of what’s actually running
• Lineage of model creation including data sources
• Acceleration from code to customer benefits
* Requires some human and software work

What’s Next for MLOps
• Simplify monitoring and retraining
• Extend MLOps to data, including prep and profiling
• Enterprise features
o Test cases
o Auditing
o Security
o Resource management (bin packing / resource optimization)
o Network isolation
• Metadata and API standards
Or, better yet, you tell us!

It’s a whole new world
• Data science will touch
EVERY industry.
• But we can’t ask everyone to
get a PhD in statistics.
• How do WE help everyone
take advantage of this
transformation?

me: David Aronchick (david.aronchick@microsoft.com)
twitter: @aronchick
github:
• https://github.com/aronchick/kubeflow-and-mlops
• https://aka.ms/mlops
THANK YOU!
