Intro to ML-Ops
Presented by Avinash Patil, DevOps and budding ML-Ops engineer
“Machine Learning means building a model from example inputs to make
data-driven predictions vs. following strictly static program instructions.”
Machine Learning Workflow
An orchestrated and repeatable pattern which systematically transforms and
processes information to create prediction solutions.
1. Asking the right question
2. Preparing the data
3. Selecting the algorithm
4. Training the model
5. Testing the model
What is ML-Ops
★ MLOps is about building a scalable team: ML researchers, data engineers, product managers, and DevOps engineers.
★ An extension of DevOps that treats ML as a first-class citizen.
★ Infrastructure and tooling to productionize ML.
ML-Ops sits at the intersection of Software Engineering, Developer Operations, and Machine Learning.
Continuous Delivery for Machine Learning (CD4ML):
a software engineering approach in which a cross-functional team produces machine learning
applications based on code, data, and models in small and safe increments that can be reproduced and
reliably released at any time, in short adaptation cycles.
Challenges in a Typical Organization
Common functional silos in large organizations create barriers, stifling the ability to automate the end-to-end process of
deploying ML applications to production.
I. Organizational challenges: different teams own different stages, and handover between them resembles throwing work over the wall.
II. Technical challenges: how to make the process reproducible and auditable. Because these teams use different
tools and follow different workflows, it becomes hard to automate end-to-end.
Technical Components of CD4ML
1. Discoverable and Accessible Data: a data pipeline that collects data and makes it available, e.g. as a data lake.
2. Reproducible Model Training: an ML pipeline that splits data into training and validation sets.
3. Model Serving: embedded model / model published as a service / model published as data.
4. Testing and Quality in Machine Learning: validating data schemas, component integration, model quality, and model bias and fairness.
5. Experiment Tracking: version-control the data and apply Git-style versioning to data science experiments.
6. Model Deployment: deploy trained models so they can make significant decisions in production.
7. Continuous Delivery Orchestration: provision and execute the ML pipeline, manage releases, and automate governance stages.
8. Model Monitoring and Observability: integrate tools for log aggregation, metrics, and ML model behavioural data.
Discoverable and Accessible Data
★ Gather data from your core transactional systems.
★ Also bring in data sources from outside your organization.
★ Organize data volumes as a data lake or a collection of real-time data streams.
★ Data pipeline: transform, clean up, and de-normalize multiple files.
★ Use Amazon S3 / Google Cloud Storage.
★ Version-control the derived/transformed data as an artifact, as sketched below.
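A minimal sketch of such a data-pipeline step in Python with pandas. The file names and columns are hypothetical, not taken from the deck; the point is the transform / cleanup / de-normalize / persist shape, with the output saved as a versionable artifact.

```python
import pandas as pd

# Raw extracts from core transactional systems (file names are hypothetical).
orders = pd.read_csv("raw/orders.csv", parse_dates=["order_date"])
products = pd.read_csv("raw/products.csv")

# Cleanup: drop rows with missing keys and duplicate order records.
orders = orders.dropna(subset=["order_id", "product_id"]).drop_duplicates("order_id")

# De-normalize: join the two files into one flat table for downstream training.
flat = orders.merge(products, on="product_id", how="left")

# Persist the derived data as an artifact that can be version-controlled
# (e.g. with `dvc add data/train_input.csv`) and pushed to S3/GCS.
flat.to_csv("data/train_input.csv", index=False)
```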
Reproducible Model Training
★ A process that takes data and code as input and produces a trained ML model
as output. It usually involves data cleaning and pre-processing,
feature engineering, model and algorithm selection, and model optimization and
evaluation. A minimal sketch follows.
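A minimal training sketch with scikit-learn, assuming the derived artifact from the previous step and a hypothetical "label" column; a fixed random seed keeps the train/validation split reproducible.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Input: the derived data artifact; the "label" column name is an assumption.
df = pd.read_csv("data/train_input.csv")
X, y = df.drop(columns=["label"]), df["label"]

# Split into training and validation sets; fixing the seed makes the split reproducible.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a simple model and evaluate on the held-out validation set.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))
```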
Model Serving
★ Embedded Model: the model artifact is packaged together with the consuming application, e.g. as a
serialized object file (Pickle in Python) or via MLeap, a common serialization format for TensorFlow and
scikit-learn models. A minimal Pickle sketch follows this list.
★ Model Deployed as a Separate Service: the model is decoupled and wrapped in its own service that
consuming applications can call. This makes it easy to upgrade release versions independently, but as a
distinct service it may introduce some latency. E.g. wrap your model for deployment into an MLaaS
offering such as AWS SageMaker.
★ Model Published as Data: the model is also treated and published independently, but the consuming
application ingests it as data at runtime. We have seen this used in streaming/real-time scenarios
where the application subscribes to events published whenever a new model version is released,
ingests the new model into memory, and continues predicting with the previous version in the meantime.
E.g. Apache Spark model serving through a REST API.
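A minimal sketch of the embedded pattern using Pickle, as named in the slide; `model` and `X_val` are assumed to come from the training sketch above.

```python
import pickle

# At build time: serialize the trained model so it ships inside the
# application artifact (the embedded-model pattern).
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Inside the consuming application: load once at startup, predict per request.
with open("model.pkl", "rb") as f:
    embedded_model = pickle.load(f)

print(embedded_model.predict(X_val[:1]))
```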
Testing and Quality in ML
★ Validating Data (a minimal schema-check sketch follows this list)
★ Validating Component Integration
★ Validating Model Quality
★ Validating Model Fairness and Bias
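A minimal data-validation sketch using plain assertions; the expected columns and constraints are illustrative assumptions, not taken from the deck.

```python
import pandas as pd

df = pd.read_csv("data/train_input.csv")

# Validate the data schema before training.
expected = {"order_id", "product_id", "label"}
missing = expected - set(df.columns)
assert not missing, f"missing columns: {missing}"

# Validate value and null constraints.
assert df["label"].isin([0, 1]).all(), "label must be binary"
assert df["order_id"].notna().all(), "order_id must not be null"
```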
Experiment Tracking
★ Because ML model development is research-centric, data scientists continually
conduct new experiments to analyse data.
★ Track experiments following a version-control philosophy.
★ Integrate experiment branches back into model training.
★ Tools such as DVC and MLflow Tracking can be used; see the MLflow sketch below.
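A minimal MLflow Tracking sketch: log the parameters and metrics of one experiment run alongside the model artifact. `model`, `X_val`, and `y_val` reuse the names from the training sketch above; the parameter values are assumptions.

```python
import mlflow
import mlflow.sklearn

# One experiment run: parameters, metrics, and the model artifact are logged
# so the run can be compared, reproduced, and promoted later.
with mlflow.start_run(run_name="baseline-logreg"):
    mlflow.log_param("model_type", "LogisticRegression")
    mlflow.log_param("max_iter", 1000)
    mlflow.log_metric("val_accuracy", model.score(X_val, y_val))
    mlflow.sklearn.log_model(model, "model")
```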
Model Deployment
★ Multiple Models: publish APIs for different models so that consuming
applications can request predictions.
★ Shadow Models: deploy the new version alongside the current production model
as a shadow, feeding it production traffic to observe its behaviour before promoting it.
★ Competing Models: run multiple model versions in production at once, as in an
A/B test, routing traffic between them to gather enough data for statistically
significant decisions; managing the versions and routing adds complexity. A
hash-based routing sketch follows this list.
★ Online Learning Models: models that make online, real-time decisions and
continuously improve their performance with the sequential arrival of data.
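A hypothetical sketch of the competing-models pattern: a stable hash-based split routes each user to the same variant on every request, so outcomes per variant can be compared for statistical significance. The stub models stand in for two trained versions.

```python
import hashlib

class StubModel:
    """Stand-in for a trained model version (an assumption for the sketch)."""
    def __init__(self, name: str):
        self.name = name
    def predict(self, features):
        return f"prediction from {self.name}"

MODELS = {"A": StubModel("model-v1"), "B": StubModel("model-v2")}

def route(user_id: str) -> str:
    # Deterministic 50/50 split: the same user always hits the same variant.
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return "A" if bucket < 50 else "B"

def predict(user_id: str, features):
    variant = route(user_id)
    return variant, MODELS[variant].predict(features)

print(predict("user-42", {"age": 34}))
```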
Continuous Delivery Orchestration
★ Model automated and manual ML governance stages into the deployment pipeline, to help detect
model bias and fairness issues, or to introduce explainability so humans can decide whether the model
should progress towards production.
★ Machine Learning Pipeline: performs model training and evaluation within the GoCD agent and
executes a basic threshold test to decide whether the model can be promoted. If the model is
good, we run a dvc push command to publish it as an artifact (see the sketch after this list).
★ Application Deployment Pipeline: builds and tests the application code, fetches the promoted model
from the upstream pipeline with dvc pull, packages a new combined artifact containing the
model and the application as a Docker image, and deploys it to a Kubernetes production
cluster.
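A minimal sketch of the threshold-test stage the deck describes, reusing `model`, `X_val`, and `y_val` from the training sketch; the 0.9 threshold is an assumption. `dvc add` and `dvc push` are the DVC commands for publishing the artifact.

```python
import pickle
import subprocess

# Basic threshold test: promote the model only if it clears a minimum
# validation accuracy (the 0.9 value is an assumption for the sketch).
ACCURACY_THRESHOLD = 0.9
val_accuracy = model.score(X_val, y_val)

if val_accuracy < ACCURACY_THRESHOLD:
    raise SystemExit(f"model rejected: accuracy {val_accuracy:.3f} < {ACCURACY_THRESHOLD}")

# Model is good: serialize it and publish it as a DVC-tracked artifact.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
subprocess.run(["dvc", "add", "model.pkl"], check=True)
subprocess.run(["dvc", "push"], check=True)
```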
Model Monitoring and Observability
★ Model inputs: what data is being fed to the models, giving visibility into any training-serving skew.
★ Model outputs: what predictions and recommendations the models are making from these inputs, to
understand how the model is performing with real data.
★ Model interpretability outputs: metrics such as model coefficients, ELI5, or LIME outputs that allow
further investigation into how the models make predictions, to identify potential
overfitting or bias not found during training.
★ Model outputs and decisions: what predictions our models are making given the production input
data, and which decisions are being made with those predictions. Sometimes the application
might choose to ignore the model and make a decision based on predefined rules (or to avoid future
bias).
★ User action and rewards: based on further user action, we can capture reward metrics to
understand whether the model is having the desired effect. For example, if we display product
recommendations, we can track when the user purchases the recommended product as a
reward.
★ Model fairness: analysing input data and output predictions against known features that could carry
bias, such as race, gender, age, or income group. A structured-logging sketch for prediction events
follows this list.
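A minimal sketch of emitting model inputs and outputs as structured events for a log-aggregation stack; the field names and values are hypothetical.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("model_monitoring")

def log_prediction(features: dict, prediction, model_version: str) -> None:
    """Emit one structured event per prediction for log aggregation."""
    event = {
        "model_version": model_version,
        "inputs": features,        # visibility into training-serving skew
        "prediction": prediction,  # model behaviour on real production data
    }
    logger.info(json.dumps(event))

log_prediction({"age": 34, "income_group": "B"}, "approved", "v1.2.0")
```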
End-to-End CD4ML Process
Practical Example
References:
➢ https://mlflow.org
➢ https://martinfowler.com/articles/cd4ml.html
➢ https://github.com/ThoughtWorksInc/cd4ml-workshop
➢ https://www.slideshare.net/ThoughtWorks/continuous-delivery-for-machine-learning-198815316
➢ https://dvc.org/
➢ https://mleap-docs.combust.ml/getting-started/