1. CASER: Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding
Ju-Hee SHIM
Network Science Lab
Dept. of AI
The Catholic University of Korea
E-mail: [email protected]
Jiaxi Tang, Ke Wang
WSDM 2018
3.
INTRODUCTION
Motivation
Problems of Existing Top-N Recommendation Models
• The user's general preferences are used as the basis for recommendations.
• General preferences reflect only the user's static behavioral information (e.g., a person who likes Samsung products is only recommended Samsung products, and a person who likes Apple products is only recommended Apple products).
• However, these models have a limitation: they simply recommend items related to the iPhone, losing the opportunity to recommend phone accessories that would naturally follow the purchase.
4.
INTRODUCTION
Motivation
Limitations of Traditional Markov Chain-Based Models
a) Point-Level: The probability of purchasing a specific item often increases when multiple past items are combined, but point-level models fail to capture this effect (e.g., a user who buys milk and butter is likely to purchase flour, but this is not reflected).
b) Skip Behaviors: Unable to account for skipped behaviors. Traditional models assume continuous influence, but in real-world data, "skips" frequently occur.
5.
Architecture
CASER
Transforming User Sequences into a Matrix "IMAGE":
• Applying CNN:
• Convert the traditional 1D item sequence into an L × d matrix.
• L: the L most recent items
• d: embedding dimension
• Horizontal Filters: learn union-level sequential patterns, capturing patterns where combinations of multiple items influence behavior.
• Vertical Filters: learn point-level sequential patterns, similar to traditional Markov chain approaches.
• Adding User Embedding:
• Incorporate a user embedding to model long-term user preferences effectively (see the shape sketch below).
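A minimal shape walk-through of this "image" view, assuming PyTorch; the sizes L=5, d=50, and h=3 are illustrative values, not taken from the slides:

```python
import torch

L, d = 5, 50                       # sequence length, embedding dimension
E = torch.randn(L, d)              # the L x d "image" built from item embeddings

# A horizontal filter spans the full width d and covers h consecutive items,
# so one filter reads several items at once -> union-level patterns.
h = 3
horizontal_filter = torch.randn(h, d)

# A vertical filter is L x 1: it computes a weighted sum over the L past
# items, separately for every latent dimension -> point-level patterns.
vertical_filter = torch.randn(L, 1)

print(E.shape, horizontal_filter.shape, vertical_filter.shape)
```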
6.
Architecture
Method
Transformer Layer:
• Consists of L bidirectional Transformer layers.
• Each layer refines the user behavior sequence received from the previous layer to enhance representation power.
• In each layer, all item representations influence and update each other.
• Unlike RNN-based models, which pass information only from past to future, Self-Attention enables global interaction across all items in the sequence (see the sketch below).
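A minimal sketch of one bidirectional self-attention update over an item sequence, assuming PyTorch; the sizes (d_model=64, n_heads=4, seq_len=10) are illustrative assumptions:

```python
import torch
import torch.nn as nn

d_model, n_heads, seq_len = 64, 4, 10
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

items = torch.randn(1, seq_len, d_model)   # one user's item representations

# No causal mask is applied: every position attends to every other position,
# so all item representations influence and update each other in one layer.
updated, weights = attn(items, items, items)
print(updated.shape)   # (1, 10, 64) - same shape, globally mixed information
```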
7.
Architecture
Method
Embedding Look-up:
• Retrieving past item embeddings:
• Look up the L most recent item embeddings of user u in the latent space.
• Stack these embeddings to construct the final embedding matrix E for training.
• Create an embedding table using d-dimensional latent factors: Q (items), P (users). (A minimal sketch follows.)
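A minimal sketch of this look-up step, assuming PyTorch; the vocabulary sizes, dimensions, and example IDs are illustrative, not from the paper:

```python
import torch
import torch.nn as nn

n_items, n_users, d, L = 1000, 100, 50, 5
Q = nn.Embedding(n_items, d)   # item embedding table Q
P = nn.Embedding(n_users, d)   # user embedding table P

user_id = torch.tensor([7])
last_L_items = torch.tensor([[3, 14, 159, 26, 535]])  # user's L most recent items

E = Q(last_L_items)            # stack L item embeddings -> matrix E of shape (1, L, d)
p_u = P(user_id)               # long-term preference vector for user u
print(E.shape, p_u.shape)      # torch.Size([1, 5, 50]) torch.Size([1, 50])
```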
8.
Architecture
Method
Convolutional Layers:
• Treat the embedding matrix E as an "image" and apply convolutional layers to capture sequential patterns in user behavior.
• Consider sequential patterns as local features within the image.
• Utilize two types of convolutional filters:
• 1) Vertical Convolutional Layer:
• Captures point-level sequential patterns.
• Computes a weighted sum over the latent representations of the past L items (see the sketch below).
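A minimal sketch of the vertical convolution, assuming PyTorch; the number of vertical filters (n_v=4) is an illustrative assumption. Each L × 1 filter learns one weight per time step and computes a weighted sum over the L past items:

```python
import torch
import torch.nn as nn

L, d, n_v = 5, 50, 4
E = torch.randn(1, 1, L, d)      # embedding matrix as a 1-channel image

vertical = nn.Conv2d(in_channels=1, out_channels=n_v, kernel_size=(L, 1))
out_v = vertical(E)              # (1, n_v, 1, d): one weighted row sum per filter
o_v = out_v.view(1, -1)          # flatten to (1, n_v * d) point-level features
print(o_v.shape)                 # torch.Size([1, 200])
```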
9.
Architecture
Method
Convolutional Layers:
• 2) Horizontal Convolutional Layer:
• Captures union-level patterns.
• Varies the filter height h to extract diverse sequential features.
• Uses max-pooling to extract the most significant feature from each filter (see the sketch below).
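A minimal sketch of the horizontal convolutions, assuming PyTorch; the filter count per height (n_h=16) is an illustrative assumption. A filter of height h spans h consecutive items across the full width d, and max-pooling keeps each filter's most significant response:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

L, d, n_h = 5, 50, 16
E = torch.randn(1, 1, L, d)                  # embedding matrix as a 1-channel image

outputs = []
for h in range(1, L + 1):                    # vary the filter height h = 1..L
    conv = nn.Conv2d(1, n_h, kernel_size=(h, d))
    c = F.relu(conv(E)).squeeze(3)           # (1, n_h, L - h + 1) sliding responses
    pooled = F.max_pool1d(c, c.size(2))      # max over positions -> (1, n_h, 1)
    outputs.append(pooled.squeeze(2))        # (1, n_h) per height

o_h = torch.cat(outputs, dim=1)              # (1, n_h * L) union-level features
print(o_h.shape)                             # torch.Size([1, 80])
```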
10.
Architecture
Method
Fully-Connected Layers:
• Concatenate the outputs from the horizontal and vertical filters.
• Feed the concatenated features into a fully-connected layer to extract high-level abstract features.
• Concatenate the user embedding with the extracted features to capture general user preferences, then pass the final representation to the output layer for prediction (see the sketch below).
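A minimal sketch of this fully-connected stage, assuming PyTorch and reusing the illustrative shapes from the sketches above (o_v of size n_v·d, o_h of size n_h·L, user embedding p_u of size d):

```python
import torch
import torch.nn as nn

n_v, n_h, L, d, n_items = 4, 16, 5, 50, 1000
o_v = torch.randn(1, n_v * d)            # vertical-filter features
o_h = torch.randn(1, n_h * L)            # horizontal-filter features
p_u = torch.randn(1, d)                  # user embedding (general preferences)

fc1 = nn.Linear(n_v * d + n_h * L, d)    # extracts high-level abstract features z
out = nn.Linear(2 * d, n_items)          # output layer over all candidate items

z = torch.relu(fc1(torch.cat([o_v, o_h], dim=1)))
y = out(torch.cat([z, p_u], dim=1))      # one score per candidate item
print(y.shape)                           # torch.Size([1, 1000])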
11.
Architecture
Method
Network Training & Recommendation:
• Apply the sigmoid activation function to the output layer to transform each output value y into a probability.
• Compute the likelihood across all sequences in the dataset for training.
• Use the user's last L item embeddings to compute y-values for all items.
• Select the top-N items with the highest y-values for recommendation (see the sketch below).
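A minimal sketch of the training objective and top-N selection, assuming PyTorch; the target and negative item indices are illustrative, and the binary cross-entropy style likelihood with sampled negatives is an assumption about the objective's form:

```python
import torch

n_items, N = 1000, 10
y = torch.randn(1, n_items)                  # scores from the output layer
prob = torch.sigmoid(y)                      # probability per item

# Likelihood term for one observed target item against a few sampled
# negative items, as in a binary cross-entropy objective.
target = 42
negatives = torch.tensor([7, 99, 500])
loss = -torch.log(prob[0, target]) - torch.log(1 - prob[0, negatives]).sum()

# Recommendation: rank all items by their y-values and keep the top-N.
top_scores, top_items = torch.topk(y, N, dim=1)
print(loss.item(), top_items)
```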
16.
Evaluation
Results
Ablation Study Results
• The Caser model outperforms Fossil and GRU4Rec in terms of MAP, with the best performance observed at T = 2, 3.
• As the Markov order L increases, performance improves and then plateaus; in sparse datasets, an excessively large L can lead to performance degradation.
• The Markov targets T contribute to performance improvement: predicting multiple future items simultaneously is more effective than predicting just one.
17.
Evaluation
Results
Ablation Study Results
• Performance results based on the usage of each component:
• p: personalization (user embedding), h: horizontal convolutional layer, v: vertical convolutional layer
• The best performance is achieved when all three components are used together.
18.
Conclusion
The authors propose CASER, a novel approach to top-N sequential recommendation. CASER captures point-level and union-level sequential patterns, skip behaviors, and long-term user preferences.
A unique aspect of CASER is its attempt to interpret a user's 1D item sequence as a 2D "image" representation. This approach could be particularly meaningful in industries where the sequential dependency of user behavior is weak.