An introduction to recommendation algorithms
Collaborative filtering: how does it work?
Arnaud de Myttenaere
About me
Data Scientist, PhD
Founder of Uchidata
Consultant at Octo Technology, Sydney
Several projects on recommendation
algorithms (Viadeo social network,
e-commerce, news website, . . . )
How do recommendation algorithms work?
Context
Available data
Personal information / Historical behavior
Objective
→ If a young man is a fan of Daft Punk, what’s the best artist we
could recommend to him?
The different approaches
Model Based
Memory Based
Collaborative filtering using graph libraries
Context
A simple example
Cosine similarity
R code
More formally
Notations
Similarity function
Cosine similarity
Conclusion
Model Based approach
1. Build a dataset which summarizes the data
UserId  Like  Gender  Age  Artist       Style    Country
  1      1      M     25   Daft Punk    Electro  France
  1      0      M     25   Lady Gaga    Pop      USA
  2      1      F     20   The Beatles  Rock     UK
(Like is the target variable; Gender and Age are user information; Artist, Style and Country are item information.)
2. Learn a model to predict the target variable using your favorite
algorithm: Linear Regression, Random Forest, XGBoost, . . .
3. For each new customer, apply the model on a set of artists
and recommend the ones with the highest scores.
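The deck's later examples use R; as a language-neutral sketch, step 1 can be illustrated in plain Python. The dictionaries and the build_dataset helper below are hypothetical, not from the talk:

```python
# Hypothetical data mirroring the slide's table; field names are illustrative.
users = {1: {"gender": "M", "age": 25}, 2: {"gender": "F", "age": 20}}
artists = {
    "Daft Punk":   {"style": "Electro", "country": "France"},
    "Lady Gaga":   {"style": "Pop",     "country": "USA"},
    "The Beatles": {"style": "Rock",    "country": "UK"},
}
# (user_id, artist, like): 'like' is the target variable to predict.
interactions = [(1, "Daft Punk", 1), (1, "Lady Gaga", 0), (2, "The Beatles", 1)]

def build_dataset(users, artists, interactions):
    """Step 1: one flat row per interaction, joining user and item features."""
    rows = []
    for user_id, artist, like in interactions:
        row = {"user_id": user_id, "like": like, "artist": artist}
        row.update(users[user_id])      # user information
        row.update(artists[artist])     # item information
        rows.append(row)
    return rows

dataset = build_dataset(users, artists, interactions)
```

Step 2 would feed `dataset` to any supervised learner; step 3 scores every candidate artist for a new user and keeps the top-scoring ones.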
Memory Based approach
How to recommend items to a particular customer or user?
For each new customer:
Search for similar customers in
historical data
Recommend popular items among
similar customers
Example: Collaborative Filtering.
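A minimal sketch of those two steps, in Python rather than the R used later in the deck (the toy history and the recommend helper are made up for illustration):

```python
from collections import Counter

# Toy historical data (same users as in the later slides).
history = {
    "John": {"Rock"},
    "Mike": {"Pop", "Electro"},
    "Dan":  {"Pop", "R&B", "Rock"},
}

def recommend(new_user_items, history, k=1):
    """Find users sharing an item, then rank their other items by popularity."""
    similar = [u for u, items in history.items() if items & new_user_items]
    counts = Counter()
    for u in similar:
        counts.update(history[u] - new_user_items)  # exclude already-known items
    return [item for item, _ in counts.most_common(k)]

recs = recommend({"Pop"}, history)
```

For a new user who likes Pop, the similar users are Mike and Dan, and their other items (Electro, R&B, Rock) become candidates.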
Summary
Model based
Memory based
Collaborative filtering: why?
Collaborative Filtering algorithms1:
are intuitive
are simple to implement
scale relatively well
capture many implicit signals
are hard to beat
1
Product Recommendation Beyond Collaborative Filtering - Welcome to
the Twilight Zone!, Olivier Koch (Criteo), RecSys Meetup London, Sept 20, 2017
But Collaborative Filtering algorithms also have some limitations2:
do not scale that well in practice
do not capture temporal signals
do not solve cold start
do not address exploration in the long tail
2
Product Recommendation Beyond Collaborative Filtering - Welcome to
the Twilight Zone!, Olivier Koch (Criteo), RecSys Meetup London, Sept 20, 2017
Context
Let us consider the following example:
John likes Rock
Mike likes Pop and Electro
Dan likes Pop, R&B and Rock
Lea likes Pop
This information can be loaded into the following dataset:
Customer Item
John Rock
Mike Pop
Mike Electro
Dan Pop
Dan R&B
Dan Rock
Lea Pop
Objective: find the best recommendation for Lea.
Graph visualization
The data can be visualized as a (bipartite) graph.
Code : R
Library : igraph
(Figure: bipartite graph linking users John, Mike, Dan, Lea to genres Rock, Pop, Electro, R&B.)

library(igraph)
d = read.csv("data.csv")                  # Load data
g = graph.data.frame(d)                   # Load data into a graph
V(g)$type <- V(g)$name %in% d$Item        # Set graph as bipartite
plot(g, layout = layout.bipartite,
     vertex.color = c("green", "cyan")[V(g)$type + 1])
Incidence matrix
This graph can be represented by a matrix...

        Rock  Pop  Electro  R&B
John :    1    0     0       0
Mike :    0    1     1       0
Dan  :    1    1     0       1
Lea  :    0    1     0       0

A = get.incidence(g, sparse = TRUE)
Incidence matrices
... or by two matrices:
Atrain =
        Rock  Pop  Electro  R&B
John :    1    0     0       0
Mike :    0    1     1       0
Dan  :    1    1     0       1

Atest =
        Rock  Pop  Electro  R&B
Lea  :    0    1     0       0

A_train = A[which(rownames(A) != "Lea"), ]
A_test  = A[which(rownames(A) == "Lea"), ]
The different approaches
Model Based
Memory Based
Collaborative filtering using graph libraries
Context
A simple example
Cosine similarity
R code
More formally
Notations
Similarity function
Cosine similarity
Conclusion
If similarity is the number of items in common (1/2)
The similarity vector is given by:
SimMatrix = Atrain · t(Atest)

i.e.

             ( 1 0 0 0 )   ( 0 )   ( 0 )  John
SimMatrix =  ( 0 1 1 0 ) · ( 1 ) = ( 1 )  Mike
             ( 1 1 0 1 )   ( 0 )   ( 1 )  Dan
                           ( 0 )

Indeed Lea does not have any item in common with John, but has
1 item in common with Mike and with Dan (Pop).

sim_matrix = A_train %*% A_test
If similarity is the number of items in common (2/2)
Then the recommendation scores are given by
scoreMatrix = t(SimMatrix) · Atrain

i.e.

                          ( 1 0 0 0 )
scoreMatrix = ( 0 1 1 ) · ( 0 1 1 0 )
                          ( 1 1 0 1 )

So

              Rock  Pop  Electro  R&B
scoreMatrix = ( 1    2     1      1 )

score_matrix = t(as.matrix(sim_matrix)) %*% A_train
Comments
If similarity is the number of items in common...
→ not optimal, since users with a lot of items will be very similar
to (almost) every user.
→ hard to use, because it leads to a lot of items with the same
recommendation score.
Better similarity metric: cosine similarity
→ Idea: normalize the similarity using the number of items
associated to each user.
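The effect of this normalization can be checked numerically with the toy data from the previous slides; a small Python sketch (the deck itself uses R):

```python
import math

# Items: Rock, Pop, Electro, R&B
rows = {"John": [1, 0, 0, 0], "Mike": [0, 1, 1, 0], "Dan": [1, 1, 0, 1]}
lea = [0, 1, 0, 0]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Raw similarity: number of items in common -- Mike and Dan tie.
raw = {u: dot(v, lea) for u, v in rows.items()}

# Cosine-style similarity: divide by the square root of each user's item
# count (Lea's own norm is a constant factor and does not change the order).
cos = {u: dot(v, lea) / math.sqrt(sum(v)) for u, v in rows.items()}
```

The raw counts cannot separate Mike from Dan, while the normalized version prefers Mike, who has fewer items overall.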
Cosine similarity (1/3)
Using the same data:
Atrain =
        Rock  Pop  Electro  R&B
John :    1    0     0       0
Mike :    0    1     1       0
Dan  :    1    1     0       1

The normalization vector is:

         (  1   )
Ntrain = ( 1/√2 )
         ( 1/√3 )
John is associated to 1 item (Rock)
Mike is associated to 2 items (Pop, Electro)
Dan is associated to 3 items
Cosine similarity (2/3)
Then the normalized similarity matrix is given by:

                ( 1   .    .  )   ( 0 )   (  0   )   (  0   )  John
SimMatrixnorm = ( .  1/√2  .  ) · ( 1 ) = ( 1/√2 ) ≈ ( 0.71 )  Mike
                ( .   .  1/√3 )   ( 1 )   ( 1/√3 )   ( 0.58 )  Dan

→ Lea is more similar to Mike than to Dan: she has one item in
common with each, but Mike is associated to fewer items than
Dan, so the link with Mike is stronger.

N_train = 1 / sqrt(A_train %*% rep(1, ncol(A)))
M_norm = diag(as.vector(N_train))
Cosine similarity (3/3)
The matrix of scores is given by:
scoreMatrix = t(SimMatrixnorm) · Atrain

i.e.

                                  ( 1 0 0 0 )
scoreMatrix = ( 0  0.71  0.58 ) · ( 0 1 1 0 )
                                  ( 1 1 0 1 )

So

              Rock  Pop  Electro  R&B
scoreMatrix = ( 0.58  1.29  0.71  0.58 )

score_matrix = t(as.matrix(sim_matrix_norm)) %*% A_train
Comments
For
Atest =
        Rock  Pop  Electro  R&B
Lea  :    0    1     0       0

Recommendation scores are:

              Rock  Pop  Electro  R&B
scoreMatrix = ( 0.58  1.29  0.71  0.58 )
1. Pop is the best recommendation for Lea, but she is already
associated to Pop.
2. If the objective is to recommend new items, Electro is the
best recommendation for Lea.
3. Rock and R&B have the same score and can be ordered by
frequencies or randomly.
R code
Collaborative filtering in about 10 lines of code.

library(igraph)                                  # Load graph library
d = read.csv("data.csv")                         # Read data
g = graph.data.frame(d)                          # Convert data into a graph
V(g)$type <- V(g)$name %in% d$Item               # Set graph as bipartite
A = get.incidence(g, sparse = TRUE)              # Compute incidence matrix
A_train = A[which(rownames(A) != "Lea"), ]
A_test  = A[which(rownames(A) == "Lea"), ]
N_train = 1 / sqrt(A_train %*% rep(1, ncol(A)))
M_norm = diag(as.vector(N_train))
sim_matrix_norm = M_norm %*% (A_train %*% A_test)
score_matrix = t(as.matrix(sim_matrix_norm)) %*% A_train
In practice
Can be precomputed and do not need to be updated in real time:
Atrain and Mnorm
Must be computed in real time:
Atest and scoreMatrix (matrix calculation)
Optimal number of users:
too small → bad performance,
too big → too slow (unless the computation is parallelized).
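The offline/online split can be sketched as follows; this is a plain-Python stand-in for the R pipeline, with an illustrative score function:

```python
import math

# Offline: the training matrix and its row normalization are precomputed.
A_train = [[1, 0, 0, 0],   # John
           [0, 1, 1, 0],   # Mike
           [1, 1, 0, 1]]   # Dan
N_train = [1 / math.sqrt(sum(row)) for row in A_train]

def score(a_test):
    """Online: similarity to each training user, then weighted item scores."""
    sim = [n * sum(x * y for x, y in zip(row, a_test))
           for row, n in zip(A_train, N_train)]
    return [sum(s * row[j] for s, row in zip(sim, A_train))
            for j in range(len(a_test))]

scores = score([0, 1, 0, 0])  # Lea; items are Rock, Pop, Electro, R&B
```

Only the two matrix products inside `score` run at request time; everything above it can be refreshed in batch.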
Collaborative Filtering
Notations
Let Iu(t) be the vector of items associated to a user u at time t:
Iu(t) = (0, 0, . . . , 1, . . . , 0)
where the k-th coefficient is equal to 1 if item k is associated to
user u at time t, and 0 otherwise.
Example: in music recommendation an "item" could be an artist (or a song), and
coefficient k is equal to 1 if the user u likes the artist (or song) k.
Collaborative Filtering
Then, for t' > t, the collaborative filtering algorithm estimates
Iu(t') (the future vector of items associated to the user u) by:

Iu(t') = Σ_{v ≠ u} sim(v, u) · Iv(t)

where sim(v, u) represents the similarity between users u and v.
→ The most relevant items for the user u are the ones with the
highest score.
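The estimate above translates almost directly into code; a Python sketch of the deck's formalism with a pluggable similarity function (the names are illustrative, not a library API):

```python
def estimate(u, I, sim):
    """Score items for user u as: sum over v != u of sim(v, u) * I_v."""
    n = len(I[u])
    scores = [0.0] * n
    for v, vec in I.items():
        if v == u:
            continue
        s = sim(I[u], vec)
        for k in range(n):
            scores[k] += s * vec[k]
    return scores

# Items: Rock, Pop, Electro, R&B
I = {"Lea":  [0, 1, 0, 0], "John": [1, 0, 0, 0],
     "Mike": [0, 1, 1, 0], "Dan":  [1, 1, 0, 1]}

# With sim = number of items in common, Lea's scores match the earlier slide.
counts = estimate("Lea", I, lambda a, b: sum(x * y for x, y in zip(a, b)))
```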
Similarity function
The similarity between two users can be defined as the number of
items in common. Then
sim(u, v) = ⟨Iu | Iv⟩

where ⟨· | ·⟩ is the classical scalar product.
→ not optimal since users with a lot of items will be very similar to
every user.
Cosine similarity
One can normalize the similarity by the number of items
associated to users u and v:

sim(u, v) = ⟨Iu | Iv⟩ / (‖Iu‖ · ‖Iv‖)

However, as

Iu(t') = Σ_{v ≠ u} ⟨Iu | Iv⟩ / (‖Iu‖ · ‖Iv‖) · Iv
       = (1 / ‖Iu‖) · Σ_{v ≠ u} ⟨Iu | Iv⟩ / ‖Iv‖ · Iv

the order of recommendations for the user u is the same as the
one obtained with sim(u, v) = ⟨Iu | Iv⟩ / ‖Iv‖.

→ In practice we can use sim(u, v) = ⟨Iu | Iv⟩ / ‖Iv‖.
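The ranking-invariance argument can be checked numerically; a small Python verification on the toy data (illustrative, not from the deck):

```python
import math

# Items: Rock, Pop, Electro, R&B
I = {"Lea":  [0, 1, 0, 0], "John": [1, 0, 0, 0],
     "Mike": [0, 1, 1, 0], "Dan":  [1, 1, 0, 1]}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scores(u, sim):
    out = [0.0] * len(I[u])
    for v, vec in I.items():
        if v != u:
            s = sim(I[u], vec)
            out = [o + s * x for o, x in zip(out, vec)]
    return out

def rank(s):
    return sorted(range(len(s)), key=lambda k: -s[k])

# Full cosine vs the simplified sim(u, v) = <Iu|Iv> / ||Iv||, scored for Dan
# (||I_Dan|| != 1, so the raw scores differ, but only by a constant factor).
full = scores("Dan", lambda a, b: dot(a, b)
              / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))))
simple = scores("Dan", lambda a, b: dot(a, b) / math.sqrt(dot(b, b)))
```

Both score vectors produce the same item ranking, since they differ only by the constant 1/‖I_Dan‖.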
Conclusion
Two different approaches:
Model Based
Memory Based
→ Choose the number of users in Atrain to fit your practical
constraints.
→ The definition of similarity between users can be modified to
take users and context into account.
Conclusion
However
This algorithm is based on past behavior, so it never suggests
new content.
→ It is necessary to refresh the training set often.
Thanks!