Personalized Top-N Sequential Recommendation
via Convolutional Sequence Embedding (WSDM’18)
Jihoo Kim
datartist@hanyang.ac.kr
Dept. of Computer and Software, Hanyang University
Jiaxi Tang, Ke Wang
Simon Fraser University
Author
Jiaxi Tang
PhD Student
School of Computing Science
Simon Fraser University
Intern at Google AI
Research & Machine Intelligence Team
Ke Wang
Professor
School of Computing Science
Simon Fraser University
PhD, Georgia Institute of Technology
MS, Georgia Institute of Technology
Recent papers
Towards Neural Mixture Recommender for Long Range Dependent User Sequences (WWW’19)
Jiaxi Tang*, Francois Belletti*, Sagar Jain, Minmin Chen, Alex Beutel, Can Xu and Ed H. Chi
Ranking Distillation: Learning Compact Ranking Models With High Performance for Recommender System (KDD’18)
Jiaxi Tang, Ke Wang
Google AI Research Intern
Minimum qualifications:
• Currently enrolled in a Master’s or PhD degree in Computer Science or a related technical field.
• Experience (classroom/work) in Natural Language Understanding, Neural Networks, Computer Vision, Machine
Learning, Deep Learning, Algorithmic Foundations of Optimization, Data Science, Data Mining and/or Machine
Intelligence/Artificial Intelligence.
• Experience with one or more general purpose programming languages: Java, C++ or Python.
• Experience with research communities and/or efforts, including having published papers (being listed as author)
at conferences (e.g. NIPS, ICML, ACL, CVPR, etc).
About the job
Research and Machine Intelligence is a high impact team that’s building the next generation of intelligence and
language understanding for all Google products. To achieve this, we’re working on projects that utilize the latest
techniques in Artificial Intelligence, Machine Learning (including Deep Learning approaches like Google AI) and
Natural Language Understanding. We impact products across Google including Search, Maps and Google Now.
https://careers.google.com/jobs/results/136271419680924358-research-intern-2020/
Contents
1. Introduction
1.1 Top-N Sequential Recommendation
1.2 Limitations of Previous Work
1.3 Contributions
2. Related Work
3. Proposed Methodology
3.1 Embedding Look-up
3.2 Convolutional Layers
3.3 Fully-connected Layers
3.4 Network Training
3.5 Recommendation
4. Experiments
4.1 Experimental Setup
4.2 Performance Comparison
4.3 Network Visualization
1. Introduction

<Motivation>
A user's behavior has two components. General preferences reflect long-term, static behavior: "I love Apple's products" holds all the time. Sequential patterns reflect short-term, dynamic behavior: after buying an iPhone, a user tends to buy phone accessories next. A recommender should capture both, letting the most recent items inform the next ones.
1.1 Top-N Sequential Recommendation

<Notations>
Users $u \in \mathcal{U}$, items $i \in \mathcal{I}$, and each user's action sequence $S^u = (S^u_1, S^u_2, \dots)$, ordered by time.

<Top-N Sequential Recommendation>
Input: all users' sequences. Output: a list of $N$ items for user $u$ that reflects both the user's general preferences and the sequential patterns in $S^u$.
1.2 Limitations of Previous Work

<Markov chain based models>
1) FPMC (Factorized Personalized Markov Chains), WWW'10
2) Fossil (Factorized Sequential Prediction with Item Similarity Models), ICDM'16

<Two major limitations> (Figure 1)
1) They fail to model union-level sequential patterns: several previous items jointly influence the next one. Buying milk and flour together signals buying butter next, even though neither item alone may.
2) They fail to allow skip behaviors: a past item can influence a later one with other steps in between. A traveler who checks in at an airport and a hotel will likely visit an attraction afterwards; the restaurant and bar visits in between are not necessary for that prediction.
<Sequential Association Rules> (Figure 2)
To provide evidence of union-level influences and skip behaviors, the authors mine sequential association rules $X \rightarrow Y$, where the items of $X$ occur earlier in a user's sequence than $Y$, using a minimum support count of 5 and a minimum confidence of 50%.
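The paper only reports such rules on its datasets; as an illustration, here is a minimal Python sketch of how sequential rules with support and confidence thresholds could be counted. The function name and the `max_lhs` cap are my own, not the authors' code, and support is counted once per user sequence.

```python
from collections import defaultdict
from itertools import combinations

def mine_sequential_rules(sequences, min_support=5, min_confidence=0.5, max_lhs=3):
    """Count rules X -> y where every item of X occurs before an occurrence of y.
    Support and confidence are counted once per user sequence (toy-scale sketch)."""
    support = defaultdict(int)   # sequences containing the item set X
    joint = defaultdict(int)     # sequences where X is later followed by y
    for seq in sequences:
        first, last = {}, {}
        for pos, item in enumerate(seq):
            first.setdefault(item, pos)
            last[item] = pos
        items = sorted(first)
        for size in range(1, max_lhs + 1):
            for X in combinations(items, size):
                support[X] += 1
                x_done = max(first[i] for i in X)  # point where all of X have appeared
                for y in items:
                    if last[y] > x_done:           # some occurrence of y comes after X
                        joint[(X, y)] += 1
    return [(X, y, c, c / support[X])
            for (X, y), c in joint.items()
            if c >= min_support and c / support[X] >= min_confidence]
```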
1.3 Contributions

Caser (ConvolutionAl Sequence Embedding Recommendation Model)
• Caser uses horizontal and vertical convolutional filters to capture sequential patterns at point level and union level, as well as skip behaviors.
• Caser models both users' general preferences and sequential patterns, and generalizes several existing state-of-the-art methods in a single unified framework.
• Caser outperforms state-of-the-art methods for top-N sequential recommendation on real-life datasets.
2. Related Work

• Sequential pattern mining depends on the explicit representation of patterns and can therefore miss patterns in unobserved states (i.e., implicit patterns).
• CNNs have been used to extract users' preferences from their reviews, but none of these works targets sequential recommendation.
• RNNs have been used for session-based recommendation. They may not work well for sequential recommendation, because not all adjacent actions have dependency relationships.
• Temporal recommendation (e.g., recommending coffee in the morning instead of the evening) is a related but different problem, as is session-based recommendation.
3. Proposed Methodology

<Network Architecture of Caser> (Figure 3)
3.1 Embedding Look-up

From the user $u$'s sequence, every $L$ successive items are taken as input and their next $T$ items as the targets, i.e., a sliding window of size $L + T$. The embedding of item $i$ is $Q_i \in \mathbb{R}^d$, where $d$ is the number of latent dimensions; stacking the embeddings of the $L$ input items gives the matrix $E^{(u,t)} \in \mathbb{R}^{L \times d}$.

For example, with $L = 2$ on the sequence $S^u_1, \dots, S^u_5$:

$$E^{(u,3)} = \begin{bmatrix} Q_{S^u_1} \\ Q_{S^u_2} \end{bmatrix}, \quad E^{(u,4)} = \begin{bmatrix} Q_{S^u_2} \\ Q_{S^u_3} \end{bmatrix}, \quad E^{(u,5)} = \begin{bmatrix} Q_{S^u_3} \\ Q_{S^u_4} \end{bmatrix}.$$
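The paper gives no code for this step; a minimal NumPy sketch of the window extraction and look-up, with made-up item ids, could look as follows (real implementations also pad users whose sequences are shorter than $L$):

```python
import numpy as np

def make_instances(seq, L, T):
    """Slide a window of size L + T over one user's item sequence:
    the first L items are the input, the next T items are the targets."""
    return [(seq[t - L:t], seq[t:t + T]) for t in range(L, len(seq) - T + 1)]

num_items, d = 1000, 3
Q = np.random.default_rng(0).normal(size=(num_items, d))  # item embedding table

inputs, targets = make_instances([10, 42, 7, 99, 5, 23], L=4, T=2)[0]
E = Q[np.array(inputs)]   # E^{(u,t)} with shape (L, d): the "image" fed to the CNN
```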
3.2 Convolutional Layers

The $L \times d$ matrix $E$ is treated as an "image" and sequential patterns as its local features (Figure 4). Unlike image recognition, the "image" $E$ is not given and must be learnt jointly with the filters.
<Horizontal Filters>
The $k$-th horizontal filter $F^k \in \mathbb{R}^{h \times d}$ (one of $n$ filters, with height $h \in \{1, \dots, L\}$) slides from top to bottom over $E$. Its $i$-th convolution value is the inner product of the filter with the sub-matrix $E_{i:i+h-1}$, passed through an activation function $\phi_c$:

$$c^k_i = \phi_c\big(E_{i:i+h-1} \odot F^k\big), \quad i = 1, \dots, L - h + 1.$$

For example, with $L = 4$, $h = 2$, $d = 3$: $F^k \in \mathbb{R}^{2 \times 3}$ slides over $E_{1:2}$, $E_{2:3}$, $E_{3:4}$, producing $L - h + 1 = 4 - 2 + 1 = 3$ values.

<Max Pooling>
Max pooling keeps only the maximum of the $L - h + 1$ convolution values of each horizontal filter, so every filter contributes one significant feature regardless of its height.

<Vertical Filters>
The $k$-th vertical filter $\tilde{F}^k \in \mathbb{R}^{L \times 1}$ slides over the $d$ columns of $E$; its output is a weighted sum over the $L$ item embeddings. No max pooling is applied here, because every latent dimension must be kept.
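As an illustration, here is a minimal PyTorch sketch of both filter types. The class and argument names are mine, I assume ReLU for $\phi_c$ (the paper treats the activation as a design choice), and I allocate `n_h` filters per height $h = 1, \dots, L$, giving $n_h \cdot L$ horizontal filters in total.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CaserConv(nn.Module):
    """Horizontal + vertical convolutions over E with shape (batch, L, d)."""
    def __init__(self, L, d, n_h=16, n_v=4):
        super().__init__()
        # n_h horizontal filters for each height h = 1..L, each of size (h, d)
        self.h_convs = nn.ModuleList(
            [nn.Conv2d(1, n_h, kernel_size=(h, d)) for h in range(1, L + 1)])
        # n_v vertical filters of size (L, 1): learned weighted sums over the L rows
        self.v_conv = nn.Conv2d(1, n_v, kernel_size=(L, 1))

    def forward(self, E):
        x = E.unsqueeze(1)                                  # (batch, 1, L, d)
        h_feats = []
        for conv in self.h_convs:
            c = torch.relu(conv(x)).squeeze(3)              # (batch, n_h, L-h+1)
            h_feats.append(F.max_pool1d(c, c.size(2)).squeeze(2))  # max over positions
        o = torch.cat(h_feats, dim=1)                       # (batch, n_h * L)
        o_tilde = self.v_conv(x).view(x.size(0), -1)        # (batch, n_v * d), no pooling
        return torch.cat([o, o_tilde], dim=1)
```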
3.3 Fully-connected Layers

The outputs of the two convolutional layers are concatenated and fed into a fully-connected layer with activation function $\phi_a$, producing the convolutional sequence embedding $z$. Concatenating $z$ with the user embedding $P_u$ and applying a second fully-connected layer yields the output $y^{(u,t)}$, whose $i$-th value captures the probability of how likely user $u$ will interact with item $i$ at time step $t$.
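Continuing the sketch, the full network can be wired as below. Representing the paper's per-item output weights $W'_i$ and biases $b'_i$ as a single `nn.Linear` over all items is an equivalent formulation, and all names remain mine.

```python
class Caser(nn.Module):
    """Sketch of the full network: embeddings + conv layers + two FC layers."""
    def __init__(self, num_users, num_items, L, d, n_h=16, n_v=4):
        super().__init__()
        self.Q = nn.Embedding(num_items, d)          # item embeddings (rows of the "image")
        self.P = nn.Embedding(num_users, d)          # user embeddings: general preferences
        self.conv = CaserConv(L, d, n_h, n_v)
        self.fc1 = nn.Linear(n_h * L + n_v * d, d)   # -> convolutional sequence embedding z
        self.fc2 = nn.Linear(2 * d, num_items)       # [z; P_u] -> one score per item

    def forward(self, user, seq):                    # user: (batch,), seq: (batch, L) ids
        z = torch.relu(self.fc1(self.conv(self.Q(seq))))
        return self.fc2(torch.cat([z, self.P(user)], dim=1))   # y^{(u,t)} raw scores
```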
In this design, the horizontal filters capture union-level sequential patterns, the vertical filters capture point-level sequential patterns, the convolutional sequence embedding captures short-term sequential patterns, and the user embedding $P_u$ captures long-term general preferences.
3.4 Network Training

To train the network, the values of the output layer are transformed into probabilities with the sigmoid function $\sigma(x) = 1/(1 + e^{-x})$:

$$p\big(S^u_t \mid S^u_{t-1}, \dots, S^u_{t-L}\big) = \sigma\big(y^{(u,t)}_{S^u_t}\big).$$

With $C^u$ denoting the collection of the time steps for which we would like to make predictions for user $u$, these probabilities combine into the likelihood of all sequences in the dataset:

$$p(\mathcal{S} \mid \Theta) = \prod_u \prod_{t \in C^u} \sigma\big(y^{(u,t)}_{S^u_t}\big) \prod_{j \neq S^u_t} \Big(1 - \sigma\big(y^{(u,t)}_j\big)\Big).$$
To further capture skip behaviors, the next $T$ target items, denoted $\mathcal{D}^u_t$, are predicted jointly. Taking the negative logarithm of the likelihood gives the objective, a binary cross-entropy loss:

$$\ell = \sum_u \sum_{t \in C^u} \Big[ \sum_{i \in \mathcal{D}^u_t} -\log\sigma\big(y^{(u,t)}_i\big) + \sum_{j \neq i} -\log\big(1 - \sigma(y^{(u,t)}_j)\big) \Big]. \qquad (13)$$

The model parameters are learned by minimizing the loss function (13); the hyper-parameters (e.g., $d$, $L$, $T$) are tuned on the validation set via grid search.
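A sketch of one optimization step under this loss: instead of summing over all non-target items $j$, it samples a few negatives per target (3 here), a common approximation, and ignores the rare collision where a sampled negative equals a target.

```python
def train_step(model, optimizer, user, seq, targets, num_items, neg_per_pos=3):
    """One binary cross-entropy step on a batch; targets: (batch, T) item ids."""
    y = model(user, seq)                          # (batch, num_items) raw scores
    pos = y.gather(1, targets)                    # scores of the T target items
    neg_ids = torch.randint(0, num_items,
                            (targets.size(0), targets.size(1) * neg_per_pos),
                            device=y.device)
    neg = y.gather(1, neg_ids)                    # scores of sampled negatives
    # -log sigma(pos) - log(1 - sigma(neg)); log(1 - sigma(x)) == logsigmoid(-x)
    loss = -(torch.nn.functional.logsigmoid(pos).sum()
             + torch.nn.functional.logsigmoid(-neg).sum()) / y.size(0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```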
3.5 Recommendation

After obtaining the trained network, to make recommendations for a user $u$ at time step $t$: the input is $u$'s latent embedding $P_u$ together with the embedding $E^{(u,t)}$ of $u$'s last $L$ items, and the output is the $N$ items with the highest values in the output layer $y^{(u,t)}$.
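Recommendation is then a single forward pass plus a top-$N$ selection, sketched below:

```python
@torch.no_grad()
def recommend(model, user_id, last_L_items, N=10):
    """Score every item from the user's last L actions; return the N best ids."""
    user = torch.tensor([user_id])
    seq = torch.tensor([last_L_items])            # shape (1, L)
    scores = model(user, seq).squeeze(0)          # (num_items,)
    return torch.topk(scores, N).indices.tolist()
```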
4. Experiments

4.1 Experimental Setup

<Datasets>
Amazon data was not used because of its low sequential intensity (SI): 0.0026 for 'Office Products' and 0.0019 for both 'Clothing, Shoes and Jewelry' and 'Video Games'. Each user's sequence is split in time order: the first 70% for training, the next 10% for validation, and the last 20% for testing.
<Evaluation Metrics>
Precision@N and Recall@N compare the top $N$ predicted items for a user, $R$, against the last 20% of actions in the user's sequence (the test set), $T$:

$$\text{Prec@N} = \frac{|R \cap T|}{N}, \qquad \text{Recall@N} = \frac{|R \cap T|}{|T|}.$$

MAP (Mean Average Precision) is the average of AP over all users.
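These are the standard definitions; a short per-user sketch (helper names are mine):

```python
def precision_recall_at_n(ranked, test_items, N):
    """ranked: item ids sorted by predicted score; test_items: held-out actions."""
    hits = len(set(ranked[:N]) & set(test_items))
    return hits / N, hits / len(test_items)

def average_precision(ranked, test_items):
    """AP over the full ranking; MAP is its mean over all users."""
    test, hits, total = set(test_items), 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in test:
            hits += 1
            total += hits / rank
    return total / len(test)
```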
4.2 Performance Comparison

[Results table from the paper: Precision@N, Recall@N, and MAP of Caser against the baselines on each dataset.]
<Influence of hyper-parameters $d$, $L$, $T$>

[Plots from the paper showing sensitivity to each hyper-parameter.]
<Analysis of Caser Components>
$h$ denotes the horizontal convolutional layer, $v$ the vertical convolutional layer, and $p$ personalization. Any missing component is simulated by setting its corresponding output ($o$, $\tilde{o}$, or $P_u$) to zero.
4.3 Network Visualization

<Vertical convolutional filters>
The learned vertical filters put more emphasis on recent actions, demonstrating a major difference from conventional top-N recommendation.
<Horizontal convolutional filters>

<Previous Sequence>
$S_1$: The 13th Warrior (history)
$S_2$: American Beauty (romance)
$S_3$: Star Trek (action & sci-fi)
$S_4$: Star Trek III
$S_5$: Star Trek IV

<Predictions>
$R_1$: Mad Max
$R_2$: Star Wars
$R_3$: Star Trek (the ground truth)

The horizontal filters respond to the action and sci-fi subsequence while largely ignoring the unrelated romance film.
Thank you!
Q & A