MLlib with MLFlow
Michelle Hoogenhout
July 17th 2021
What I’ll cover
Use MLlib with MLflow end to end to:
● Prepare data in PySpark for use with MLlib
● Train and evaluate several classifier models
● Log model performance with MLflow Tracking
What you’ll need
● PySpark / Docker
● MLflow
https://github.com/michellehoog/mllib-example
Why pyspark?
Enables scalable analysis (without having to know Scala!)
Allows distributed processing
Creates ML pipelines
Interacts with Pandas
What algorithms are available in PySpark MLlib?
● Variety of classification and regression models, including:
○ Linear & Logistic Regression
○ Tree-based models
○ Multilayer Perceptron
○ Naive Bayes
● Clustering
● Collaborative filtering
● Frequent pattern mining
Spark workflow
● DataFrame: Spark ML uses DataFrame from Spark SQL as an ML dataset, which
can hold a variety of data types. E.g., a DataFrame could have different columns
storing text, feature vectors, true labels, and predictions.
● Transformer: A Transformer is an algorithm which can transform one DataFrame
into another DataFrame. E.g., an ML model is a Transformer which transforms a
DataFrame with features into a DataFrame with predictions.
● Estimator: An Estimator is an algorithm which can be fit on a DataFrame to
produce a Transformer. E.g., a learning algorithm is an Estimator which trains on a
DataFrame and produces a model.
● Pipeline: A Pipeline chains multiple Transformers and Estimators together to
specify an ML workflow.
● Parameter: All Transformers and Estimators share a common API for
specifying parameters.
Things to note
Data format
● Dense format
● Labels numeric and zero-indexed (features non-negative for Naive
Bayes)
● Columns named ‘label’ and ‘features’ (the default column names)
Pipelines
MLflow
pip install mlflow[extras]
https://www.mlflow.org/docs/latest/tutorials-and-examples/tutorial.html
Open source tracking and deployment of ML models
Not specific to Spark / MLlib
MLflow can log:
● Git commit hash
● Start & end time
● Source
● Parameters
● Metrics
● Artifacts (output)
MLflow tracking overview
Step 1. Create experiment
Step 2. Add runs to your code
Step 3. View logs
MLflow tracking overview
All MLflow runs are logged to the active experiment, which can be set using any of the
following ways:
● Use the mlflow.set_experiment() command.
● Use the experiment_id parameter in the mlflow.start_run() command.
● Set one of the MLflow environment variables MLFLOW_EXPERIMENT_NAME or
MLFLOW_EXPERIMENT_ID.
If no active experiment is set, runs are logged to the notebook experiment (on Databricks) or to the Default experiment (elsewhere).
Viewing the Tracking MLflow UI
By default, the tracking API writes data to a local ./mlruns directory.
To view:
Start the MLflow UI by running mlflow ui in the same directory
MLflow’s Tracking UI: http://localhost:5000/#/
