Recommendations for Building Machine Learning Software
Justin Basilico
Page Algorithms Engineering, Netflix
May 19, 2016
@JustinBasilico
Introduction
Change of focus
2006 → 2016
Netflix Scale
 > 81M members
 > 190 countries
 > 1000 device types
 > 3B hours/month
 > 36% of peak US downstream traffic
Goal
Help members find content to watch and enjoy
to maximize member satisfaction and retention
Everything is a Recommendation
Rows
Ranking
Over 80% of what people watch comes from our recommendations
Recommendations are driven by Machine Learning
Machine Learning Approach
Problem → Data → Algorithm → Model → Metrics
Models & Algorithms
 Regression (linear, logistic, elastic net)
 SVD and other Matrix Factorizations
 Factorization Machines
 Restricted Boltzmann Machines
 Deep Neural Networks
 Markov Models and Graph Algorithms
 Clustering
 Latent Dirichlet Allocation
 Gradient Boosted Decision Trees / Random Forests
 Gaussian Processes
 …
Design Considerations
Recommendations
• Personal
• Accurate
• Diverse
• Novel
• Fresh
Software
• Scalable
• Responsive
• Resilient
• Efficient
• Flexible
Software Stack
http://techblog.netflix.com
Recommendations
Recommendation 1
Be flexible about where and when computation happens
System Architecture
 Offline: Process data
  Batch learning
 Nearline: Process events
  Model evaluation
  Online learning
  Asynchronous
 Online: Process requests
  Real-time
[Architecture diagram: the Member's UI Client queries the Algorithm Service (Online Computation) for recommendations; Play, Rate, Browse... events flow through the User Event Queue and Event Distribution into Nearline Computation; Offline Data feeds Model training; trained Models are published to the nearline and online layers via Netflix.Hermes and Netflix.Manhattan; the Online Data Service serves query results]
More details on the Netflix Techblog
Where to place components?
 Example: Matrix Factorization (X ≈ UV^t)
 Offline:
  Collect sample of play data (X)
  Run batch learning algorithm like SGD to produce factorization
  Publish video factors (V)
 Nearline:
  Solve user factors (A u_i = b)
  Compute user-video dot products (s_ij = u_i · v_j)
  Store scores (s_ij) in cache
 Online:
  Presentation-context filtering (s_ij > t)
  Serve recommendations
[Same architecture diagram as the previous slide, annotated with where each matrix-factorization step runs]
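As a toy illustration of this placement, here is a minimal sketch in plain Python (hypothetical data and helper names, not Netflix code): offline training has published video factors V, the nearline layer solves a user's factors and caches user-video scores, and the online layer only filters and serves.

```python
# Toy sketch of the offline/nearline/online split for matrix factorization.
# All data and factor values below are invented for illustration.

def solve_2x2(A, b):
    # Cramer's rule for the 2x2 normal equations A u = b.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

def solve_user_factors(ratings, V, reg=0.1):
    # Regularized least squares: minimize sum_j (r_j - u.v_j)^2 + reg*|u|^2,
    # i.e. (V^T V + reg*I) u = V^T r, the "A u_i = b" step on the slide.
    A = [[reg * (a == b2) + sum(V[j][a] * V[j][b2] for j in ratings)
          for b2 in range(2)] for a in range(2)]
    b = [sum(r * V[j][a] for j, r in ratings.items()) for a in range(2)]
    return solve_2x2(A, b)

# Offline: the factorization X ~= UV^t has produced 2-d video factors V.
V = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.7, 0.7]}

# Nearline: solve this user's factors from recent plays, cache the scores.
u = solve_user_factors({"A": 5.0, "C": 4.0}, V)
scores = {vid: sum(ui * vi for ui, vi in zip(u, v)) for vid, v in V.items()}

# Online: presentation-context filtering (s_ij > t) and serving.
recs = sorted((vid for vid, s in scores.items() if s > 3.0),
              key=scores.get, reverse=True)
```

The point of the sketch is where each step runs, not the math: the expensive factorization is batch, the per-user solve is event-driven, and the request path does only a lookup and a threshold.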
Recommendation 2
Design application software for experimentation
Example development process
Experimentation environment: Idea → Data → Offline Modeling (R, Python, MATLAB, …) → Iterate
Production environment (A/B test): Implement in production system (Java, C++, …) → Actual output → Final model
Gaps between the two environments: data discrepancies, code discrepancies, missing post-processing logic, performance issues
Solution: Share and lean towards production
 Developing machine learning is iterative
  Need a short pipeline to rapidly try ideas
  Want to see output of the complete system
 So make the application easy to experiment with
  Share components between online, nearline, and offline
  Use the real code whenever possible
  Have well-defined interfaces and formats that let you go off the beaten path
Shared Engine
Avoid dual implementations
Shared between experiment code and production code:
• Models
• Features
• Algorithms
• …
Recommendation 3
Make algorithms extensible and modular
Make algorithms and models extensible and modular
 Algorithms often need to be tailored for a specific application
 Treating an algorithm as a black box is limiting
 Better to make algorithms extensible and modular to allow for customization
 Separate models and algorithms
  Many algorithms can learn the same model (e.g. a linear binary classifier)
  Many algorithms can be trained on the same types of data
 Support composing algorithms
[Diagram: (Data, Parameters) → Algorithm → Model, vs. a single opaque box from data to predictions]
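A minimal sketch of the model/algorithm separation (illustrative class names, not from any particular library): two different training algorithms that both produce the same linear binary classifier model type, so the deployed artifact stays the same regardless of how it was fit.

```python
# Model: what you deploy and call at serving time.
class LinearBinaryClassifier:
    def __init__(self, weights, bias):
        self.weights, self.bias = weights, bias

    def predict(self, x):
        s = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1 if s >= 0 else -1

# Algorithm 1: perceptron updates, returns the shared model type.
class PerceptronAlgorithm:
    def train(self, data, epochs=10, lr=0.1):
        dim = len(data[0][0])
        w, b = [0.0] * dim, 0.0
        for _ in range(epochs):
            for x, y in data:
                if LinearBinaryClassifier(w, b).predict(x) != y:
                    w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                    b += lr * y
        return LinearBinaryClassifier(w, b)

# Algorithm 2: a cruder learner for the same model type; point the weight
# vector from the negative-class mean to the positive-class mean.
class AveragingAlgorithm:
    def train(self, data):
        pos = [x for x, y in data if y == 1]
        neg = [x for x, y in data if y == -1]
        mean = lambda pts, i: sum(p[i] for p in pts) / len(pts)
        dim = len(data[0][0])
        w = [mean(pos, i) - mean(neg, i) for i in range(dim)]
        mid = [(mean(pos, i) + mean(neg, i)) / 2 for i in range(dim)]
        b = -sum(wi * mi for wi, mi in zip(w, mid))
        return LinearBinaryClassifier(w, b)

data = [([2.0, 2.0], 1), ([1.5, 2.5], 1),
        ([-2.0, -1.0], -1), ([-1.0, -2.0], -1)]
models = [algo.train(data)
          for algo in (PerceptronAlgorithm(), AveragingAlgorithm())]
```

Because both algorithms emit the same model interface, serving code, caching, and composition never need to know which one was used.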
Provide building blocks
 Don't start from scratch
 Linear algebra: Vectors, Matrices, …
 Statistics: Distributions, tests, …
 Models, features, metrics, ensembles, …
 Loss, distance, kernel, … functions
 Optimization, inference, …
 Layers, activation functions, …
 Initializers, stopping criteria, …
 …
 Domain-specific components
Build abstractions on familiar concepts
Make the software put them together
Example: Tailoring Random Forests
Using Cognitive Foundry: http://github.com/algorithmfoundry/Foundry
• Use a custom tree split
• Customize to run it for an hour
• Report a custom metric each iteration
• Inspect the ensemble
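The Foundry calls themselves are not shown on the slide, so here is a generic sketch in the same spirit (a hypothetical trainer, not the Cognitive Foundry API): an iterative bagged ensemble whose stopping criterion and per-iteration metric are pluggable, and whose members can be inspected afterwards.

```python
import random
import time

def train_stump(sample):
    # Trivial base learner: pick the threshold on feature 0 that best
    # separates the labels in this bootstrap sample.
    best = None
    for x, _ in sample:
        t = x[0]
        acc = sum((1 if z[0] >= t else -1) == y
                  for z, y in sample) / len(sample)
        if best is None or acc > best[0]:
            best = (acc, t)
    t = best[1]
    return lambda z: 1 if z[0] >= t else -1

def train_forest(data, stop, on_iteration, rng):
    # stop() and on_iteration() are injected, so callers can customize
    # the time budget and report any metric they like each iteration.
    ensemble = []
    while not stop(ensemble):
        sample = [rng.choice(data) for _ in data]   # bootstrap sample
        ensemble.append(train_stump(sample))
        on_iteration(ensemble)
    return ensemble

def vote(ensemble, x):
    return 1 if sum(m(x) for m in ensemble) >= 0 else -1

data = [([v], 1 if v >= 0 else -1) for v in (-3, -2, -1, 1, 2, 3)]
deadline = time.time() + 1.0          # "run it for an hour" in miniature
stop = lambda ens: len(ens) >= 25 or time.time() > deadline
accuracies = []                       # custom metric, reported per iteration
on_iteration = lambda ens: accuracies.append(
    sum(vote(ens, x) == y for x, y in data) / len(data))

forest = train_forest(data, stop, on_iteration, random.Random(0))
# Inspect the ensemble: forest is a plain list of member models.
```

The design choice mirrors the slide: because stopping, metrics, and members are exposed rather than hidden inside a monolithic `fit()`, tailoring the algorithm does not require forking it.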
Recommendation 4
Describe your input and output transformations with your model
Putting learning in an application
Application: Feature Encoding → Machine Learned Model (Rd ⟶ Rk) → Output Decoding
Where do the encoding and decoding transformations live: application or model code?
Example: Simple ranking system
 High-level API: List<Video> rank(User u, List<Video> videos)
 Example model description file:
{
  "type": "ScoringRanker",
  "scorer": {
    "type": "FeatureScorer",
    "features": [
      {"type": "Popularity", "days": 10},
      {"type": "PredictedRating"}
    ],
    "function": {
      "type": "Linear",
      "bias": -0.5,
      "weights": {
        "popularity": 0.2,
        "predictedRating": 1.2,
        "predictedRating*popularity": 3.5
      }
    }
  }
}
Ranker → Scorer → Features → Linear function, including the feature transformations
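One way such a description file can be interpreted, as a hedged sketch (the video fields and feature-handling code below are invented for illustration, not the Netflix implementation): parse the JSON, compute the named features plus the `*` cross features, and score with the linear function.

```python
import json

# The same model description as on the slide, as a parseable string.
MODEL_JSON = """
{"type": "ScoringRanker",
 "scorer": {"type": "FeatureScorer",
   "features": [{"type": "Popularity", "days": 10},
                {"type": "PredictedRating"}],
   "function": {"type": "Linear", "bias": -0.5,
     "weights": {"popularity": 0.2, "predictedRating": 1.2,
                 "predictedRating*popularity": 3.5}}}}
"""

def extract_features(video, feature_specs):
    feats = {}
    for spec in feature_specs:
        if spec["type"] == "Popularity":
            feats["popularity"] = video["plays_last_n_days"][spec["days"]]
        elif spec["type"] == "PredictedRating":
            feats["predictedRating"] = video["predicted_rating"]
    # Cross features named "a*b" are products of base features.
    feats["predictedRating*popularity"] = (
        feats["predictedRating"] * feats["popularity"])
    return feats

def score(video, scorer):
    feats = extract_features(video, scorer["features"])
    fn = scorer["function"]
    return fn["bias"] + sum(w * feats[name]
                            for name, w in fn["weights"].items())

def rank(videos, model):
    return sorted(videos, key=lambda v: score(v, model["scorer"]),
                  reverse=True)

model = json.loads(MODEL_JSON)
videos = [
    {"id": "A", "plays_last_n_days": {10: 0.9}, "predicted_rating": 3.0},
    {"id": "B", "plays_last_n_days": {10: 0.1}, "predicted_rating": 4.5},
]
ranked = rank(videos, model)
```

Because the description names the transformations as well as the weights, the experimentation and production systems can load the exact same file and agree on what the model means.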
Recommendation 5
Max out a single machine before distributing your algorithms
Problem: Your great new algorithm doesn't scale
 Want to run your algorithm on larger data
 Temptation to go distributed
  Spark/Hadoop/etc. seem to make it easy
  But building distributed versions of non-trivial ML algorithms is hard
  Often means changing the algorithm or making lots of approximations
 So try to squeeze as much out of a single machine first
  Have a lot more communication bandwidth via memory than network
  You will be surprised how far one machine can go
  Example: Amazon announced today an X1 instance type with 2TB memory and 128 virtual CPUs
How?
 Profile your code and think about memory cache layout
  Small changes can have a big impact
  Example: Transposing a matrix can drop computation from 100ms to 3ms
 Go multicore
  Algorithms like HogWild for SGD-type optimization can make this very easy
 Use specialized resources like GPU (or TPU?)
 Only go distributed once you've optimized on these dimensions (often you won't need to)
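The memory-layout point can be sketched as a loop-order choice: summing a matrix in the order it is laid out in memory versus striding across rows. In Python the absolute difference is small, but in languages like C++ or Java the cache-friendly order can be dramatically faster, which is the kind of change behind the slide's 100ms to 3ms example.

```python
# Same computation, two traversal orders over a row-major matrix.
N = 300
matrix = [[float(i * N + j) for j in range(N)] for i in range(N)]

def sum_row_major(m):
    total = 0.0
    for row in m:               # contiguous: walks each row in layout order
        for v in row:
            total += v
    return total

def sum_col_major(m):
    total = 0.0
    for j in range(len(m[0])):  # strided: jumps a full row between reads
        for i in range(len(m)):
            total += m[i][j]
    return total
```

Profiling first tells you which of these accidental layout mismatches your code actually contains; fixing them is usually far cheaper than distributing.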
Example: Training Neural Networks
 Level 1: Machines in different AWS regions
 Level 2: Machines in the same AWS region
  Simple: Grid search
  Better: Bayesian optimization using Gaussian Processes
  Mesos, Spark, etc. for coordination
 Level 3: Highly optimized, parallel CUDA code on GPUs
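The "Simple: Grid search" level can be sketched in a few lines (the objective below is a made-up stand-in for validation loss; in practice a coordination layer like Mesos or Spark fans the independent trials out to machines):

```python
import itertools

def objective(lr, hidden_units):
    # Hypothetical validation loss, lowest at lr=0.1, hidden_units=64.
    return (lr - 0.1) ** 2 + ((hidden_units - 64) / 64.0) ** 2

# Cartesian product of the hyperparameter values under consideration.
grid = {"lr": [0.01, 0.1, 1.0], "hidden_units": [16, 64, 256]}
trials = [dict(zip(grid, values))
          for values in itertools.product(*grid.values())]

# Each trial is independent ("embarrassingly parallel"), so this loop is
# exactly what gets distributed across machines in a region.
results = [(objective(**t), t) for t in trials]
best_loss, best_params = min(results, key=lambda r: r[0])
```

Bayesian optimization improves on this by choosing the next trial from the results so far instead of enumerating the full grid.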
Recommendation 6
Don't just rely on metrics for testing
Machine Learning and Testing
 Temptation: Use validation metrics to test software
  When things work and metrics go up, this seems great
  When metrics don't improve, was it the code, the data, the metric, the idea, …?
Reality of Testing
 Machine learning code involves intricate math and logic
  Rounding issues, corner cases, …
  Is that a + or a -? (The math or the paper could be wrong.)
 Solution: Unit test
  Testing of metric code is especially important
 Test the whole system: just unit testing is not enough
  At a minimum, compare output for unexpected changes across versions
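For example, unit tests for metric code might look like this (`rmse` is a stand-in metric written for illustration; the `golden` fixture at the end shows a minimal version-to-version regression check):

```python
import math

def rmse(predictions, targets):
    # The metric under test: root-mean-square error.
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("need equal-length, non-empty inputs")
    se = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(se / len(predictions))

# Hand-checkable cases, including the "is that a + or a -?" class of bug:
assert rmse([1.0, 2.0], [1.0, 2.0]) == 0.0
assert rmse([0.0, 0.0], [3.0, 4.0]) == math.sqrt(12.5)
assert abs(rmse([2.0], [-1.0]) - 3.0) < 1e-12   # sign handling

# Minimal cross-version regression check: pin the output on a fixed input
# and fail loudly if a "refactor" silently changes it.
golden = {"rmse_fixture": 1.224744871391589}
value = rmse([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 3.0, 5.0])
assert abs(value - golden["rmse_fixture"]) < 1e-9
```

Tests like these localize the failure: when a validation metric moves unexpectedly, a green metric-code suite rules one suspect out immediately.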
Conclusions
Two ways to solve computational problems
Software Development: Know solution → Write code → Compile code → Test code → Deploy code
Machine Learning (steps may involve Software Development): Know relevant data → Develop algorithmic approach → Train model on data using algorithm → Validate model with metrics → Deploy model
Take-aways for building machine learning software
 Building machine learning is an iterative process
 Make experimentation easy
 Take a holistic view of the application where you are placing learning
 Design your algorithms to be modular
 Optimize how your code runs on a single machine before going distributed
 Testing can be hard but is worthwhile
Thank You
Justin Basilico
jbasilico@netflix.com
@JustinBasilico
We're hiring
Editor's Notes
 • System Architecture: http://techblog.netflix.com/2013/03/system-architectures-for.html
 • Example: Training Neural Networks: http://techblog.netflix.com/2014/02/distributed-neural-networks-with-gpus.html