10 More Lessons Learned from building real-life Machine Learning Systems
Xavier Amatriain (@xamat) 10/13/2015
Machine Learning
@Quora
Our Mission
“To share and grow the world’s knowledge”
● Millions of questions & answers
● Millions of users
● Thousands of topics
● ...
What we care about: Demand, Quality, Relevance
Lots of data relations
ML Applications @ Quora
● Answer ranking
● Feed ranking
● Topic recommendations
● User recommendations
● Email digest
● Ask2Answer
● Duplicate Questions
● Related Questions
● Spam/moderation
● Trending now
● ...
Models
● Logistic Regression
● Elastic Nets
● Gradient Boosted Decision Trees
● Random Forests
● (Deep) Neural Networks
● LambdaMART
● Matrix Factorization
● LDA
● ...
10 More Lessons Learned from implementing real-life ML systems
1. Implicit signals beat explicit ones (almost always)
Implicit vs. Explicit
● Many have acknowledged
that implicit feedback is more useful
● Is implicit feedback really always
more useful?
● If so, why?
● Implicit data is (usually):
○ More dense, and available for all users
○ More representative of actual user behavior than of user reflection
○ More related to final objective function
○ Better correlated with AB test results
● E.g. Rating vs watching
Implicit vs. Explicit
● However
○ It is not always the case that
direct implicit feedback correlates
well with long-term retention
○ E.g. clickbait
● Solution:
○ Combine different forms of
implicit + explicit to better represent
long-term goal
Implicit vs. Explicit
2. Your model will learn what you teach it to learn
Training a model
● Model will learn according to:
○ Training data (e.g. implicit and explicit)
○ Target function (e.g. probability of user reading an answer)
○ Metric (e.g. precision vs. recall)
● Example 1 (made up):
○ Optimize probability of a user going to the cinema to
watch a movie and rate it “highly” by using purchase history
and previous ratings. Use NDCG of the ranking as final
metric using only movies rated 4 or higher as positives.
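As a rough illustration of the metric in this made-up example, here is a minimal Python sketch of NDCG@k that counts only movies rated 4 or higher as positives (the function name and cutoff handling are my own, not from the talk):

```python
import numpy as np

def ndcg_at_k(ranked_ratings, k=10, positive_threshold=4):
    """NDCG@k where only items rated >= positive_threshold count as relevant.

    ranked_ratings: true ratings listed in the order the model ranked the items.
    """
    rels = np.array([1.0 if r >= positive_threshold else 0.0 for r in ranked_ratings])
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    dcg = (rels[:k] * discounts[: len(rels[:k])]).sum()
    ideal = np.sort(rels)[::-1][:k]                 # best possible ordering
    idcg = (ideal * discounts[: len(ideal)]).sum()
    return dcg / idcg if idcg > 0 else 0.0

# E.g. a ranking that puts a 5-star movie first scores higher than one that buries it:
print(ndcg_at_k([5, 2, 3, 4]), ndcg_at_k([2, 3, 5, 4]))
```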
Example 2 - Quora’s feed
● Training data = implicit + explicit
● Target function: value of showing a story to a user ~ weighted sum of actions: v = ∑_a v_a · 1{y_a = 1}
○ Predict probabilities for each action, then compute the expected value: v_pred = E[V | x] = ∑_a v_a · p(a | x) (see the sketch below)
● Metric: any ranking metric
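A minimal sketch of computing that expected value in Python (the action names and weights below are made up for illustration, not Quora’s actual values):

```python
# Hypothetical per-action values v_a; in practice these weights are a product decision.
ACTION_VALUES = {"upvote": 5.0, "share": 10.0, "click": 1.0, "hide": -20.0}

def expected_story_value(action_probs):
    """v_pred = E[V | x] = sum_a v_a * p(a | x), given one probability per action."""
    return sum(ACTION_VALUES[a] * p for a, p in action_probs.items())

# Rank candidate stories for a user by their predicted expected value.
candidates = [
    {"upvote": 0.10, "share": 0.01, "click": 0.40, "hide": 0.02},
    {"upvote": 0.30, "share": 0.05, "click": 0.20, "hide": 0.01},
]
ranked = sorted(candidates, key=expected_story_value, reverse=True)
```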
3. Supervised plus Unsupervised Learning
Supervised/Unsupervised Learning
● Unsupervised learning as dimensionality reduction
● Unsupervised learning as feature engineering
● The “magic” behind combining
unsupervised/supervised learning
○ E.g. 1: clustering + kNN
○ E.g. 2: Matrix Factorization (see the sketch after this list)
■ MF can be interpreted as
● Unsupervised:
○ Dimensionality Reduction a la PCA
○ Clustering (e.g. NMF)
● Supervised
○ Labeled targets ~ regression
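As a rough sketch of those combinations, here is one way (with scikit-learn, on synthetic data; not the talk’s actual setup) to use clustering and matrix factorization as feature engineering for a supervised model:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline, make_union

rng = np.random.RandomState(0)
X = np.abs(rng.randn(200, 20))            # non-negative features, as NMF requires
y = (X[:, 0] + X[:, 1] > 2).astype(int)   # synthetic labels

# Unsupervised stages produce new features; the supervised model consumes them.
unsupervised_features = make_union(
    KMeans(n_clusters=8, n_init=10, random_state=0),      # distances to cluster centers
    NMF(n_components=5, init="nndsvda", random_state=0),  # latent factors, PCA-like
)
model = make_pipeline(unsupervised_features, LogisticRegression(max_iter=1000))
model.fit(X, y)
```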
Supervised/Unsupervised Learning
● One of the “tricks” in Deep Learning is how it
combines unsupervised/supervised learning
○ E.g. Stacked Autoencoders
○ E.g. training of convolutional nets
4. Everything is an ensemble
Ensembles
● Netflix Prize was won by an ensemble
○ Initially BellKor was using GBDTs
○ BigChaos introduced ANN-based ensemble
● Most practical applications of ML run an ensemble
○ Why wouldn’t you?
○ At least as good as the best of your methods
○ Can add completely different approaches (e.g. CF and content-based)
○ You can use many different models at the ensemble layer: LR, GBDTs, RFs, ANNs...
Ensembles & Feature Engineering
● Ensembles are the way to turn any model into a feature!
● E.g. Don’t know if the way to go is to use Factorization
Machines, Tensor Factorization, or RNNs?
○ Treat each model as a “feature”
○ Feed them into an ensemble
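A minimal stacking sketch of this idea with scikit-learn (illustrative models and synthetic data, not the ones from the talk): each base model’s out-of-fold predictions become a feature for a logistic regression at the ensemble layer.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

base_models = [
    ("gbdt", GradientBoostingClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
]
ensemble = StackingClassifier(estimators=base_models,
                              final_estimator=LogisticRegression(),
                              cv=5)  # out-of-fold predictions avoid leaking training labels
ensemble.fit(X, y)
```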
The Master Algorithm?
It definitely is an ensemble!
5. The output of your model will be the input of another one (and other design problems)
Outputs will be inputs
● Ensembles turn any model into a feature
○ That’s great!
○ That can be a mess!
● Make sure the output of your model is ready to
accept data dependencies
○ E.g. can you easily change the distribution of the
value without affecting all other models
depending on it?
● Avoid feedback loops
● Can you treat your ML infrastructure as you would
your software one?
ML vs Software
● Can you treat your ML infrastructure as you would
your software one?
○ Yes and No
● You should apply best Software Engineering
practices (e.g. encapsulation, abstraction, cohesion,
low coupling…)
● However, Design Patterns for Machine Learning
software are not well known/documented
6. The pains & gains of Feature Engineering
Feature Engineering
● Main properties of a well-behaved ML feature
○ Reusable
○ Transformable
○ Interpretable
○ Reliable
● Reusability: You should be able to reuse features in different
models, applications, and teams
● Transformability: Besides directly reusing a feature, it should be easy to use a transformation of it (e.g. log(f), max(f), ∑_t f_t over a time window…)
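A minimal pandas sketch of that transformability idea, on made-up data (column and feature names are hypothetical): one raw feature reused as log(f), max(f), and a sum over a time window.

```python
import numpy as np
import pandas as pd

events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "day": pd.to_datetime(["2015-10-01", "2015-10-02", "2015-10-05",
                           "2015-10-01", "2015-10-03"]),
    "upvotes": [3, 0, 7, 1, 4],   # the raw feature f
}).set_index("day")

features = events.groupby("user_id")["upvotes"].agg(
    log_upvotes=lambda f: np.log1p(f.sum()),   # log of the total, i.e. log(f)
    max_upvotes="max",                         # max(f)
)
# sum of f_t over a rolling 3-day time window, keeping each user's latest value
features["upvotes_3d"] = (events.groupby("user_id")["upvotes"]
                          .rolling("3D").sum()
                          .groupby(level="user_id").last())
```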
Feature Engineering
● Main properties of a well-behaved ML feature
○ Reusable
○ Transformable
○ Interpretable
○ Reliable
● Interpretability: In order to do any of the previous, you
need to be able to understand the meaning of features and
interpret their values.
● Reliability: It should be easy to monitor and detect bugs/issues
in features
Feature Engineering Example - Quora Answer Ranking
What is a good Quora answer?
• truthful
• reusable
• provides explanation
• well formatted
• ...
Feature Engineering Example - Quora Answer Ranking
How are those dimensions translated
into features?
• Features that relate to the answer
quality itself
• Interaction features
(upvotes/downvotes, clicks,
comments…)
• User features (e.g. expertise in topic)
7. The two faces of your ML infrastructure
Machine Learning Infrastructure
● Whenever you develop any ML infrastructure, you need to
target two different modes:
○ Mode 1: ML experimentation
■ Flexibility
■ Ease of use
■ Reusability
○ Mode 2: ML production
■ All of the above + performance & scalability
● Ideally you want the two modes to be as similar as possible
● How to combine them?
Machine Learning Infrastructure: Experimentation & Production
● Option 1:
○ Favor experimentation and only invest in productionizing
once something shows results
○ E.g. Have ML researchers use R and then ask Engineers
to implement things in production when they work
● Option 2:
○ Favor production and have “researchers” struggle to figure
out how to run experiments
○ E.g. Implement highly optimized C++ code and have ML
researchers experiment only through data available in logs/DB
● Good intermediate options:
○ Have ML “researchers” experiment on IPython notebooks using
Python tools (scikit-learn, Theano…). Use same tools in
production whenever possible, implement optimized versions
only when needed.
○ Implement abstraction layers on top of optimized
implementations so they can be accessed from regular/friendly
experimentation tools
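A minimal sketch of such an abstraction layer (all names here are hypothetical): a thin fit/predict wrapper that lets researchers work from a notebook while production swaps in an optimized backend.

```python
class RankerWrapper:
    """Uniform fit/predict interface over interchangeable ranking backends."""

    def __init__(self, backend):
        # backend: any object exposing train(X, y) and score(X),
        # e.g. a pure-Python prototype or a binding to optimized C++ code.
        self.backend = backend

    def fit(self, X, y):
        self.backend.train(X, y)
        return self

    def predict(self, X):
        return self.backend.score(X)

# Researchers experiment in a notebook, production swaps the backend (hypothetical names):
#   model = RankerWrapper(PythonPrototypeRanker())   # experimentation
#   model = RankerWrapper(cpp_bindings.Ranker())     # production
```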
8. Why you should care about answering questions (about your model)
Model debuggability
● Value of a model = value it brings to the product
● Product owners/stakeholders have expectations on
the product
● It is important to be able to answer questions about why something failed
● Bridge gap between product design and ML algos
● Model debuggability is so important it can
determine:
○ Particular model to use
○ Features to rely on
○ Implementation of tools
Model debuggability
● E.g. Why am I seeing or not seeing
this on my homepage feed?
9. You don’t need to distribute your ML algorithm
Distributing ML
● Most of what people do in practice can fit into a multi-core machine
○ Smart data sampling
○ Offline schemes
○ Efficient parallel code
● Dangers of “easy” distributed approaches such
as Hadoop/Spark
● Do you care about costs? How about latencies?
Distributing ML
● Example of optimizing computations to fit them into
one machine
○ Spark implementation: 6 hours, 15 machines
○ Developer time: 4 days
○ C++ implementation: 10 minutes, 1 machine
● Most practical applications of Big Data can fit into
a (multicore) implementation
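As a rough sketch of the single-machine approach (synthetic data, illustrative numbers): sample the data smartly and use every core via scikit-learn’s n_jobs instead of reaching for a cluster.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.randn(200_000, 40)
y = (X[:, 0] > 1.5).astype(int)          # rare positive class

# Smart data sampling: keep every positive, downsample negatives to match.
pos = np.where(y == 1)[0]
neg = rng.choice(np.where(y == 0)[0], size=len(pos), replace=False)
idx = np.concatenate([pos, neg])

# Efficient parallel code on one multicore machine: n_jobs=-1 uses all cores.
model = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
model.fit(X[idx], y[idx])
```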
10. The untold story of Data Science vs. ML Engineering
Data Scientists and ML Engineers
● We all know the definition of a Data Scientist
● Where do Data Scientists fit in an organization?
○ Many companies struggling with this
● Valuable to have strong DS who can bring value
from the data
● Strong DS with solid engineering skills are
unicorns and finding them is not scalable
○ DS need engineers to bring things to production
○ Engineers have enough on their plate already, so they are rarely willing to
“productionize” cool DS projects
The data-driven ML innovation funnel: Data Research → ML Exploration & Product Design → AB Testing
Data Scientists and ML Engineers
● Solution:
○ (1) Define different parts of the innovation funnel
■ Part 1. Data research & hypothesis
building -> Data Science
■ Part 2. ML solution building &
implementation -> ML Engineering
■ Part 3. Online experimentation, AB
Testing analysis-> Data Science
○ (2) Broaden the definition of ML Engineers to include everyone from
coding experts with high-level ML knowledge to ML experts with good
software skills
[Funnel diagram: Data Research → Data Science; ML Solution → ML Engineering; AB Testing → Data Science]
Conclusions
● Make sure you teach your model what you
want it to learn
● Ensembles and the combination of
supervised/unsupervised techniques are key
in many ML applications
● Important to focus on feature engineering
● Be thoughtful about
○ your ML infrastructure/tools
○ how you organize your teams