Distributing Large-scale ML Algorithms: from GPUs to the Cloud
MMDS 2014
June, 2014
Xavier Amatriain
Director - Algorithms Engineering @xamat
Outline
■ Introduction
■ Emmy-winning Algorithms
■ Distributing ML Algorithms in Practice
■ An example: ANN over GPUs & AWS Cloud
What we were interested in:
■ High quality recommendations
Proxy question:
■ Accuracy in predicted rating
■ Improve by 10% = $1 million!
Data size:
■ 100M ratings (back then “almost massive”)
2006 2014
Netflix Scale
▪ > 44M members
▪ > 40 countries
▪ > 1000 device types
▪ > 5B hours in Q3 2013
▪ Plays: > 50M/day
▪ Searches: > 3M/day
▪ Ratings: > 5M/day
▪ Log 100B events/day
▪ 31.62% of peak US downstream
traffic
Smart Models
■ Regression models (Logistic, Linear, Elastic nets)
■ GBDT/RF
■ SVD & other MF models
■ Factorization Machines
■ Restricted Boltzmann Machines
■ Markov Chains & other graphical
models
■ Clustering (from k-means to
modern non-parametric models)
■ Deep ANN
■ LDA
■ Association Rules
■ …
Netflix Algorithms
“Emmy Winning”
Rating Prediction
2007 Progress Prize
▪ Top 2 algorithms
▪ MF/SVD - Prize RMSE: 0.8914
▪ RBM - Prize RMSE: 0.8990
▪ Linear blend - Prize RMSE: 0.88
▪ Currently in use as part of Netflix's rating prediction component
▪ Limitations
▪ Designed for 100M ratings, we have 5B ratings
▪ Not adaptable as users add ratings
▪ Performance issues
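
The RMSE figures and the linear blend above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy ratings, the two prediction lists standing in for MF/SVD and RBM output, and the helper names `rmse`, `fit_blend_weights`, and `blend`. It is not the Prize-winning implementation, only the general idea of combining two predictors with least-squares weights.

```python
import math

def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))

def fit_blend_weights(pred_a, pred_b, targets):
    """Least-squares weights (w_a, w_b) for w_a * pred_a + w_b * pred_b.

    Solves the 2x2 normal equations with Cramer's rule; no intercept term.
    """
    saa = sum(a * a for a in pred_a)
    sbb = sum(b * b for b in pred_b)
    sab = sum(a * b for a, b in zip(pred_a, pred_b))
    sat = sum(a * t for a, t in zip(pred_a, targets))
    sbt = sum(b * t for b, t in zip(pred_b, targets))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("predictors are collinear; blend weights are not unique")
    w_a = (sat * sbb - sbt * sab) / det
    w_b = (sbt * saa - sat * sab) / det
    return w_a, w_b

def blend(pred_a, pred_b, w_a, w_b):
    """Apply the blend weights to two prediction lists."""
    return [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]

# Toy data, made up for illustration (not Netflix Prize data).
ratings = [4.0, 3.0, 5.0, 2.0, 4.0, 1.0]
model_a = [3.6, 3.4, 4.4, 2.5, 3.9, 1.8]  # stands in for MF/SVD predictions
model_b = [4.3, 2.5, 4.7, 2.6, 3.5, 1.2]  # stands in for RBM predictions

w_a, w_b = fit_blend_weights(model_a, model_b, ratings)
blended = blend(model_a, model_b, w_a, w_b)
print(rmse(model_a, ratings), rmse(model_b, ratings), rmse(blended, ratings))
```

Here the weights are fit and evaluated on the same toy ratings, so the blend cannot do worse than either input. In practice the weights would be fit on held-out data.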
Ranking
Page composition
Similarity
Search Recommendations
Postplay
Gamification
Distributing ML algorithms in practice
1. Do I need all that data?
2. At what level should I distribute/parallelize?
3. What latency can I afford?
Do I need all that data?
Really?
Anand Rajaraman: Former Stanford Prof. &
Senior VP at Walmart
Sometimes, it’s not
about more data
[Banko and Brill, 2001]
Norvig: “Google does not
have better Algorithms,
only more Data”
Many features/
low-bias models
Sometimes, it’s not
about more data
At what level should I parallelize?
The three levels of Distribution/Parallelization
1. For each subset of the population (e.g.
region)
2. For each combination of the
hyperparameters
3. For each subset of the training data
Each level has different requirements
Level 1 Distribution
■ We may have subsets of the
population for which we need to
train an independently optimized
model.
■ Training can be fully distributed
requiring no coordination or data
communication
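
A minimal sketch of Level 1, assuming a hypothetical `train_region_model` function and made-up per-region ratings. The "model" is just a mean rating, and a thread pool stands in for separate machines; the point is only that each job reads its own subset and nothing is exchanged between jobs.

```python
from concurrent.futures import ThreadPoolExecutor

def train_region_model(region, ratings):
    """Hypothetical 'model': the mean rating for one region."""
    return region, sum(ratings) / len(ratings)

# Made-up per-region training sets; each one is self-contained.
data_by_region = {
    "us": [4.0, 5.0, 3.0],
    "latam": [3.0, 4.0],
    "europe": [2.0, 4.0, 4.0, 2.0],
}

# Each task touches only its own region's data, so the only "coordination"
# is launching the jobs and collecting the finished models.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(train_region_model, region, ratings)
               for region, ratings in data_by_region.items()]
    models = dict(future.result() for future in futures)

print(models)
```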
Level 2 Distribution
■ For a given subset of the population we
need to find the “optimal” model
■ Train several models with different
hyperparameter values
■ Worst-case: grid search
■ Can do much better than this (e.g. Bayesian
Optimization with Gaussian Process Priors)
■ This process *does* require coordination
■ Need to decide on next “step”
■ Need to gather final optimal result
■ Requires data distribution, not sharing
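
A minimal sketch of Level 2 using the worst case named above, a grid search. The 1-D ridge regression, the toy data, and the names `fit_ridge_1d` and `validation_error` are assumptions chosen to keep the example short. Each worker receives the same training data and one hyperparameter value; the coordinator only gathers the results and picks the best. A Bayesian optimizer would replace the fixed grid with a loop that chooses the next value from the results so far.

```python
from concurrent.futures import ThreadPoolExecutor

def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression without intercept: w = sum(xy) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def validation_error(lam, train, valid):
    """Train with one hyperparameter value; return (mean squared error, lam)."""
    w = fit_ridge_1d(train[0], train[1], lam)
    errors = [(w * x - y) ** 2 for x, y in zip(valid[0], valid[1])]
    return sum(errors) / len(errors), lam

# Made-up data. Every worker gets the same training set handed to it
# ("data distribution, not sharing"); workers never talk to each other.
train = ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
valid = ([5.0, 6.0], [10.1, 11.9])
grid = [0.0, 0.1, 1.0, 10.0]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda lam: validation_error(lam, train, valid), grid))

# Coordination step: gather all results and keep the best one.
best_error, best_lam = min(results)
print(best_lam, best_error)
```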
Level 3 Distribution
■ For each combination of
hyperparameters, model training may
still be expensive
■ Process requires coordination and
data sharing/communication
■ Can distribute computation over
machines splitting examples or
parameters (e.g. ADMM)
■ Or parallelize on a single multicore
machine (e.g. Hogwild)
■ Or… use GPUs
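
A minimal sketch of Level 3, splitting the examples across workers. This is plain synchronous data-parallel gradient descent, chosen for brevity; it is neither ADMM nor Hogwild. The shards, the one-parameter model y ≈ w * x, and the names `partial_gradient` and `train_data_parallel` are assumptions, and threads stand in for machines or GPU blocks. It shows why this level needs both coordination and communication: every step broadcasts the current parameter and waits for all partial gradients.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_gradient(w, shard):
    """Sum of squared-error gradients for the model y = w * x over one data shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard)

def train_data_parallel(shards, steps=200, lr=0.01):
    """Gradient descent where each step sums per-shard gradients computed in parallel."""
    n = sum(len(shard) for shard in shards)
    w = 0.0
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        for _ in range(steps):
            # Communication: send the current w out, collect one partial gradient per shard.
            partials = list(pool.map(lambda shard: partial_gradient(w, shard), shards))
            # Coordination: all workers must finish before the shared parameter moves.
            w -= lr * sum(partials) / n
    return w

# Made-up examples generated from y = 3x, split into two shards.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = train_data_parallel(shards)
print(w)
```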
ANN Training over GPUs and AWS
■ Level 1 distribution: machines over different AWS
regions
■ Level 2 distribution: machines within the same AWS
region
■ Use coordination tools
■ Spearmint or similar for parameter optimization
■ Condor, StarCluster, Mesos… for distributed cluster coordination
■ Level 3 parallelization: highly optimized parallel CUDA
code on GPUs
What latency can I afford?
3 shades of latency
▪ Blueprint for multiple
algorithm services
▪ Ranking
▪ Row selection
▪ Ratings
▪ Search
▪ …
▪ Multi-layered Machine
Learning
Matrix Factorization Example
Xavier Amatriain (@xamat)
xavier@netflix.com
Thanks!
(and yes, we are hiring)
