Unifying Twitter Around a Single ML Platform
Yi Zhuang (@yz), Nicholas Leonard (@strife076)
April 17, 2019
Overview
• ML Use Cases at Twitter
• ML Platform Requirements & Challenges
• Unifying Twitter Around a Single ML Platform
• Technology Migrations
• Health ML Use Case
• Summary of Lessons Learned
• Future of Our ML Platform
ML Use Cases: Tweet Ranking
ML Use Cases at Twitter: Ads
pCTR = p(“click” | we show this Candidate Ad to this User in this Context)
• Diagram labels: User, Candidate Ad, Context, “Click”
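A minimal sketch of what the pCTR formula describes, assuming a simple logistic model over binary features. The feature names, weights, bias, and the `predict_ctr` function are hypothetical and chosen for illustration; the slides do not describe the actual model.

```python
import math

# Hypothetical weights for illustration only; a real model would learn these from click logs.
WEIGHTS = {
    "user:follows_advertiser": 1.2,
    "ad:has_video": 0.4,
    "context:home_timeline": 0.3,
}
BIAS = -4.0  # clicks are rare, so the baseline log-odds are strongly negative


def predict_ctr(user, ad, context):
    """Return p("click" | user, candidate ad, context) under a logistic model."""
    active = (
        [f"user:{f}" for f in user]
        + [f"ad:{f}" for f in ad]
        + [f"context:{f}" for f in context]
    )
    logit = BIAS + sum(WEIGHTS.get(name, 0.0) for name in active)
    return 1.0 / (1.0 + math.exp(-logit))


p = predict_ctr(user=["follows_advertiser"], ad=["has_video"], context=["home_timeline"])
```

The three argument lists mirror the three conditioning terms in the formula: the user, the candidate ad, and the context.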
ML Use Cases at Twitter
• Other use cases
• Recommending Tweets, Users, Hashtags, News, etc.
• Detecting Abusive Tweets and Spam
• Detecting NSFW Images and Videos
• And so on …
ML Use Cases at Twitter
ML is everywhere
Requirements of ML Platform
Data Scale
PBs of data per day
Some models train on tens of TBs of data per day
Requirements of ML Platform
Prediction Throughput
Tens of millions of predictions per second
Requirements of ML Platform
Prediction Latency Budget
tens of milliseconds
Example Use Case: Ads Prediction
• 1+B training examples every day
• 1+M features
• 40 ms serving latency
• 10+M predictions every second
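As a rough back-of-the-envelope reading of these figures, the sketch below converts them into per-second and per-day rates. It treats the "1+B" and "10+M" values as lower bounds and assumes each prediction uses the full 40 ms budget, so the in-flight count is an upper-end estimate rather than a measured number.

```python
SECONDS_PER_DAY = 24 * 60 * 60

training_examples_per_day = 1_000_000_000  # "1+B" on the slide, taken as a lower bound
predictions_per_second = 10_000_000        # "10+M" on the slide, taken as a lower bound
serving_latency_s = 0.040                  # 40 ms, assumed fully used by every request

# Average rate at which training examples accumulate.
examples_per_second = training_examples_per_day / SECONDS_PER_DAY

# Little's law: requests in flight = arrival rate x time spent in the system.
in_flight_predictions = predictions_per_second * serving_latency_s

predictions_per_day = predictions_per_second * SECONDS_PER_DAY

print(f"{examples_per_second:,.0f} training examples per second")
print(f"{in_flight_predictions:,.0f} predictions in flight at any moment")
print(f"{predictions_per_day:,} predictions per day")
```

Under these assumptions, about 11,600 training examples arrive per second and roughly 400,000 predictions are in flight at once, which is why both throughput and latency appear as separate platform requirements.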
Challenges of Old ML Platform
Fragmentation of ML Practice
• PyTorch
• scikit-learn
• In-house frameworks
• VW
• Lua Torch
• TensorFlow
Challenges of Old ML Platform
Difficulty Sharing
• Models
• Tooling & Resources
• Knowledge
Challenges of Old ML Platform
Inefficiencies
Work Duplication
Example Duplicate Work
Various ways to do:
• Model Training & Serving
• Model Refreshes
• Data Cleaning and Preprocessing
• Experiment Tracking
• Etc.
New Unified ML Platform Overview
A Single Consistent ML Platform Across Twitter
1. Pipeline Orchestration
2. Preprocessing and Featurization
3. Model Training and Evaluation
4. Experimentation Tracking
5. Production Model Serving
Technology Migrations
● Data Analysis: Scalding + PySpark/Notebooks
● Featurization: Feature Store
● ML Frameworks: Java ML -> Lua Torch -> TensorFlow
● Training and deployment cycles: Apache Airflow
Data Analysis: Scalding
● Scala
● Abstraction over Hadoop
● Distributed data processing
● Great for large scale data
● Slow iteration
Data Analysis: Notebook + Spark
● IPython Notebook + PySpark
● Easier for Python engineers
● Data visualization
● Faster iteration
Lessons learned
ML Practitioner Diversity
Production ML Engineers
Deep Learning Researchers
Data Scientists
Featurization: Ad Hoc
● Teams use common data sources
○ E.g. user data, tweet data, engagement data
● Every team does their own featurization
○ Duplication of effort
● Difficult to validate features at serving time
○ Inconsistent featurization schemes for training vs serving
Featurization: Feature Store
● Teams can share, discover and access features
● Consistent training-time vs serving-time featurization
Lessons learned
Consistency
Consistency across teams => sharing & efficiency
Important: feature consistency between training and serving
ML Frameworks: Java ML
● Logistic regression
○ Relies on feature discretization
● Typically used in an online learning environment:
○ Model learns new data as it becomes available (~15 min delay)
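The combination described here, logistic regression over discretized features with online updates, can be sketched in plain Python as follows. This is an illustration written for this transcript, not the Java ML implementation: the bucket boundaries, learning rate, and toy data stream are assumptions, and the per-example update stands in for whatever batching the ~15 minute delay implies.

```python
import bisect
import math
from collections import defaultdict

# Hypothetical bucket boundaries for one continuous feature.
BOUNDARIES = [10, 100, 1_000, 10_000]


def discretize(name, value):
    """Map a continuous value to a sparse binary feature such as 'followers_bin2'."""
    return f"{name}_bin{bisect.bisect_right(BOUNDARIES, value)}"


class OnlineLogisticRegression:
    def __init__(self, learning_rate=0.1):
        self.weights = defaultdict(float)
        self.learning_rate = learning_rate

    def predict(self, features):
        logit = sum(self.weights[f] for f in features)
        return 1.0 / (1.0 + math.exp(-logit))

    def update(self, features, label):
        """One SGD step on log loss for a single newly arrived example."""
        error = self.predict(features) - label
        for f in features:
            self.weights[f] -= self.learning_rate * error


model = OnlineLogisticRegression()
stream = [(50_000, 1), (5, 0), (20_000, 1), (3, 0)] * 50  # toy (followers, clicked) pairs
for followers, clicked in stream:
    model.update([discretize("followers", followers)], clicked)

high = model.predict([discretize("followers", 30_000)])
low = model.predict([discretize("followers", 7)])
```

Discretization turns each continuous value into a one-hot bucket, so the linear model can learn a separate weight per bucket instead of a single slope.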
ML Frameworks: Lua Torch
● Deep learning
● Feature discretization parity
● ML Engineers didn’t want to learn Lua:
○ Lua hidden via YAML
○ Hard to debug and unit test
● Complex production setup
○ JVM -> JNI -> Lua VMs -> C/C++
ML Frameworks: TensorFlow
● Google support
● Production ready
○ Export graphs as protobuf
○ Serve graphs from Java/Scala:
■ JVM -> TensorFlow
● TensorBoard
● Large ecosystem (e.g., TFX)
Lessons learned
Reproducibility is hard
... across different ML frameworks: small differences, large impacts
Online experiments take time
Need simple setup, fast iterations
Train and Deploy Cycles
Different approaches to productionizing training algorithms:
● Manually re-train and re-deploy the model periodically
○ Retraining frequency varies
● Automate training and deployment cycles:
○ Cron, Aurora, Airflow Jobs
○ Helps reduce model staleness
Train and Deploy Cycle
Apache Airflow: DAGs
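To illustrate the DAG idea without depending on an Airflow installation, the sketch below models a train-and-deploy cycle as a dependency graph and runs the tasks in dependency order using Python's standard `graphlib` module. It is not Airflow code: the task names and the `run_task` stub are hypothetical, and a real scheduler would add retries, scheduling intervals, and monitoring.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task names; each task maps to the set of tasks it depends on.
PIPELINE = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}

executed = []


def run_task(name):
    # Stand-in for real work such as launching a job and waiting for it to finish.
    executed.append(name)


def run_pipeline(dag):
    """Run every task only after all of its upstream dependencies have run."""
    for task in TopologicalSorter(dag).static_order():
        run_task(task)


run_pipeline(PIPELINE)
```

Expressing the cycle as a graph rather than a script makes the dependencies explicit, so a scheduler can rerun the whole chain on a timer and keep the deployed model from going stale.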
Hyperparameter Tuning
Lessons learned
Automation is crucial
ML models become stale over time
Hyperparameter tuning is often tedious
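As one example of automating the tedious part, here is a minimal random-search sketch. Random search is chosen here only for simplicity; the slides do not say which tuning method the platform uses. The `validation_loss` function is a hypothetical stand-in for training a model and measuring its validation loss.

```python
import random


def validation_loss(learning_rate, dropout):
    # Hypothetical stand-in for "train a model and measure validation loss";
    # by construction its minimum is at learning_rate=0.01, dropout=0.2.
    return (learning_rate - 0.01) ** 2 * 1e4 + (dropout - 0.2) ** 2


def random_search(n_trials, seed=0):
    """Sample hyperparameters at random and keep the best-scoring trial."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform over [1e-4, 1e-1]
            "dropout": rng.uniform(0.0, 0.5),
        }
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


best_loss, best_params = random_search(n_trials=200)
```

Fixing the seed makes a search run repeatable, which helps when comparing tuning runs across teams.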
Health ML Case Study
● Situation:
○ Models still running using Lua Torch
○ Retrained manually every ~6 months.
● Mission:
○ Migrate Health ML models to new ML Platform
○ Reach metric parity with existing models (minimum)
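A metric-parity requirement like this can be expressed as a simple gate that compares the migrated model against the existing one. The sketch below is an assumption about how such a check might look: the metric names, the tolerance, and the numbers are made up for illustration and are not results from the talk.

```python
def parity_failures(baseline, candidate, tolerance=0.005):
    """List metrics where the candidate falls short of the baseline by more than `tolerance`.

    Both arguments map metric name -> value, and higher is assumed to be better.
    A metric missing from the candidate counts as a failure.
    """
    failures = []
    for name, old_value in baseline.items():
        new_value = candidate.get(name)
        if new_value is None or new_value < old_value - tolerance:
            failures.append(name)
    return failures


# Made-up numbers for illustration; they are not results from the talk.
lua_torch_model = {"pr_auc": 0.812, "recall_at_precision_90": 0.433}
tensorflow_model = {"pr_auc": 0.815, "recall_at_precision_90": 0.431}

failures = parity_failures(lua_torch_model, tensorflow_model)
ready_for_ab_test = not failures
```

A small tolerance allows for run-to-run noise, so that a migrated model is not rejected over differences smaller than the metric's natural variation.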
ML Pipeline Overview
Diagram labels:
● Training Data
● Preprocessing
● Feature Store
● Data Exploration
● Training
● Offline Evaluation
● Model Tuning
● Experiment Loop
● Online A/B Testing
● Production Experiment
● Prediction Servers
Lessons Learned
Teamwork: Platform, Modeling, Product
Integration of All Components
Summary of Lessons Learned
● Consistency brings efficiency
● DL reproducibility is hard
● Automation is crucial
● ML practitioner diversity
○ ML engineers vs DL researchers
○ Production vs exploration
● Collaboration of platform, modeling, product teams
Future
2018 Strategy: Consistency & Adoption
2019 Strategy: Ease of Use & Velocity
10x, 50x training speed
Auto model evaluation & validation
Auto model deploy & auto scaling
Auto hyperparameter tuning & architecture search
Continuous Deep Learning Model Training
and so on ...
Thank You
If you are interested in learning more about Twitter Cortex, please contact: @yz @strife076
