Olivier Koch, Criteo
RecSys London Meetup - Nov 8th, 2018
Large-scale recommendation for new users
Joint work with Ivan Lobov, Mohamed Amine
Benhalloum, Dmitry Parfenchik, Alexandre Gillotte, Alois
Bissuel, Vincent Grosbois, Sergei Lebedev, Flavian Vasile
Outline
1. Context
2. Large-scale matrix factorization with randomized SVD
3. Offline evaluation methods
4. What's next?
What / Who is Criteo again?
Buy ad space on publishers’ websites.
Build banners showing products that users will like / want to buy.
Get paid if users click / buy the product.
3 billion ads/day
5 billion products
100 ms
Retargeting
~ a few hours
Acquisition
?
~ a few days/weeks
At scale
2B users
20K partners
~1M products/partner
Hundreds of possible campaigns per user
In 50 ms!
The Acquisition pipeline
Campaign selection
Product selection (Recommendation)
Bidding
The Recommendation problem
Instead of letting a separate model handle bidding and campaign selection,
how about computing recommendations for all user-partner pairs?
200B recommendations, anyone?
Large-scale MF with R-SVD
Singular value decomposition
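$$A = U S V^T, \quad A \in \mathbb{R}^{m \times n},\; U \in \mathbb{R}^{m \times m},\; S \in \mathbb{R}^{m \times n},\; V^T \in \mathbb{R}^{n \times n}$$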
The catch
m = n = hundreds of millions of items
Randomized SVD
Trick: approximate A with a tall-and-skinny matrix Q
How do we find Q?
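One common way to find Q, following the Halko, Martinsson and Tropp paper, is to multiply A by a random Gaussian matrix and orthonormalize the result. The sketch below is a minimal single-machine NumPy illustration under that assumption, not the Spark-RSVD implementation; the function name `randomized_svd` and the parameters `oversample` and `n_power_iter` are illustrative choices.

```python
import numpy as np


def randomized_svd(A, rank, oversample=10, n_power_iter=2, seed=0):
    """Approximate the top-`rank` singular triplets of A (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    k = min(rank + oversample, n)

    # Stage A: sample the range of A with a Gaussian test matrix,
    # then orthonormalize to get the tall-and-skinny Q (m x k).
    omega = rng.standard_normal((n, k))
    Q, _ = np.linalg.qr(A @ omega)

    # Optional power iterations sharpen Q when singular values decay slowly.
    for _ in range(n_power_iter):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # Stage B: project A onto the small subspace and run an exact SVD there.
    B = Q.T @ A  # k x n, cheap to decompose
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]


# Toy usage on an exactly rank-5 matrix (synthetic data, for illustration only).
rng = np.random.default_rng(42)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
U, s, Vt = randomized_svd(A, rank=5)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(f"relative reconstruction error: {rel_err:.2e}")
```

The expensive operations are products of A with thin matrices, which is what makes the approach attractive when m and n are very large.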
Randomized SVD
[Figure: plot of singular values]
Randomized SVD
Finding structure with randomness: Probabilistic algorithms for constructing
approximate matrix decompositions, Nathan Halko, Per-Gunnar Martinsson, Joel A.
Tropp, SIAM Review, 53(2), 2011
spark-rsvd
https://github.com/criteo/Spark-RSVD
spark-rsvd (blog post)
https://medium.com/@alois.bissuel/6695b649f519
Point-wise mutual information
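For reference, the usual definition is PMI(i, j) = log(p(i, j) / (p(i) p(j))), where the probabilities are estimated from co-occurrence counts. The sketch below computes it from a small dense count matrix; the function name `pmi_matrix`, the `positive` flag, and the choice to map zero counts to 0 are assumptions made for illustration, since the slides do not say which variant is used.

```python
import numpy as np


def pmi_matrix(C, positive=False):
    """Pointwise mutual information from a co-occurrence count matrix C."""
    C = np.asarray(C, dtype=float)
    total = C.sum()
    row = C.sum(axis=1, keepdims=True)  # marginal counts of row items
    col = C.sum(axis=0, keepdims=True)  # marginal counts of column items

    # log( p(i,j) / (p(i) p(j)) ) = log( C_ij * total / (row_i * col_j) )
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(C * total / (row * col))

    # Assumption: pairs never observed together get 0 (keeps the matrix sparse).
    pmi[~np.isfinite(pmi)] = 0.0
    if positive:
        # Optional "positive PMI" variant: clip negative associations to 0.
        pmi = np.maximum(pmi, 0.0)
    return pmi


# Toy counts: two products that only ever co-occur with themselves.
C = np.array([[4, 0],
              [0, 4]])
pmi = pmi_matrix(C)
print(pmi)
```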
Approximate nearest neighbors with Annoy
https://erikbern.com/2015/10/01/nearest-neighbors-and-vector-models-part-2-how-to-search-in-high-dimensional-spaces.html
Credits: Erik Bernhardsson
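To make the search step concrete, here is an exact (brute-force) cosine nearest-neighbor lookup in NumPy. It is only a reference for what an approximate index such as Annoy speeds up: Annoy avoids the exhaustive scan by searching a forest of random-projection trees. The function name `top_k_cosine` and the toy vectors are hypothetical.

```python
import numpy as np


def top_k_cosine(query, item_vectors, k):
    """Return indices and cosine scores of the k items closest to `query`."""
    items = item_vectors / np.linalg.norm(item_vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = items @ q              # cosine similarity to every item
    top = np.argsort(-scores)[:k]   # exhaustive scan: O(number of items)
    return top, scores[top]


# Toy product vectors (made up for illustration).
item_vectors = np.array([[1.0, 0.0],
                         [0.0, 1.0],
                         [1.0, 1.0],
                         [-1.0, 0.0]])
ids, sims = top_k_cosine(np.array([1.0, 0.1]), item_vectors, k=2)
print(ids, sims)
```

The exhaustive scan is fine for a handful of items; with ~1M products per partner and a latency budget in milliseconds, an approximate index is the practical choice.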
Putting it all together
Training: User timelines -> CoEvent matrix -> PMI matrix -> R-SVD -> Product vectors -> KNN Indexing -> KNN Indices
Inference: User timelines -> User embedding -> KNN Search -> Recommendations
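The first and last steps of this pipeline can be sketched on toy data. The slides do not define a co-event or say how the user embedding is computed, so two assumptions are made here and should be read as such: a co-event is counted once per timeline in which two products both appear, and a user embedding is the mean of the vectors of the products in the user's timeline. The names `coevent_matrix` and `user_embedding` are hypothetical.

```python
from collections import Counter
from itertools import combinations

import numpy as np


def coevent_matrix(timelines, n_products):
    """Symmetric co-event counts (assumption: one count per shared timeline)."""
    counts = Counter()
    for timeline in timelines:
        for i, j in combinations(sorted(set(timeline)), 2):
            counts[(i, j)] += 1
    C = np.zeros((n_products, n_products))
    for (i, j), c in counts.items():
        C[i, j] = C[j, i] = c
    return C


def user_embedding(timeline, product_vectors):
    """Assumption: a user is the average of the products they interacted with."""
    return product_vectors[list(timeline)].mean(axis=0)


# Toy timelines: each list holds the product ids seen by one user.
timelines = [[0, 1, 2], [0, 1], [2, 3]]
C = coevent_matrix(timelines, n_products=4)
print(C)

# With one-hot product vectors, the embedding shows which products were averaged.
emb = user_embedding([0, 1], np.eye(4))
print(emb)
```

In the full pipeline the count matrix would be turned into a PMI matrix, factorized with R-SVD to obtain product vectors, and the user embedding would be used as the query of the KNN search.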
Putting it all together
memcache
Recommendations
HDFS
All users x partners
RecoService
Campaign
selection
users x ~50 partners
Simpler (« no model »)
Evolvable (reco-based)
Putting it all together
Offline pipeline runs at scale in 5-10 hours with 100 Spark executors on ~300M timelines
Spark, Scala, Python
Scheduled every day
The best is the enemy of the good (good enough for an A/B test)
Good vs Best trade-off
Not scalable, not prod-grade: a few weeks
Scalable, prod-grade: many months
Scalable, not-quite-prod-grade: several months
Offline evaluation
Baselines
• Global best-of (per partner)
• Mixture of « sources » (best-of-by-X) merged into a pClick model
Offline metrics
Precision@k over pairs of partners
[Figure: train / validation split]
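As a reminder of the metric itself, precision@k is the fraction of the top k recommended items that are relevant for the user. The sketch below shows the standard definition for a single user; the function name and the toy ids are hypothetical, and how pairs of partners define the train and validation sets is not detailed in the slides, so it is not modeled here.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that appear in `relevant`."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    relevant = set(relevant)
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


# Toy example: ranked recommendations vs. items seen in the validation period.
recommended = [10, 4, 7, 1, 9]
relevant = {4, 9, 3}
p_at_3 = precision_at_k(recommended, relevant, k=3)
p_at_5 = precision_at_k(recommended, relevant, k=5)
print(p_at_3, p_at_5)
```

In practice the per-user values are averaged over all evaluated users.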
Qualitative evaluation
What’s next?
Fusing CF and metadata (content2vec)
Deeper representations of users and products (graph
convolutions, recurrent neural nets)
Train at scale with TF
tf-yarn: train TensorFlow models on YARN in just a few lines of code!
https://github.com/criteo/tf-yarn
Summary
Acquisition provides new challenges for Recommendation algorithms
MF (via R-SVD) is an attractive approach to try
We built a pipeline leveraging R-SVD and KNN at scale (~300M users, hundreds of partners) with promising offline results
Qualitative evaluation matters (on top of the quantitative one)
There are many things coming up next!
Thank you!
o.koch@criteo.com
ailab.criteo.com
