Maja Kabiljo & Aleksandar Ilic, Facebook
Large scale Collaborative Filtering
using Apache Giraph
01 What is Apache Giraph?
02 Collaborative Filtering problem
03 Neighborhood-based models
04 Matrix factorization
05 Conclusion
What is Apache Giraph?
Iterative and graph processing on massive datasets
Billion vertices, trillion edges
Data mapped to a graph
•Vertex ids and values
•Edges and edge values
“Think like a vertex”
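To make "think like a vertex" concrete, here is a minimal single-process sketch of the vertex-centric, bulk-synchronous model: in each superstep every vertex that received messages runs the same compute step, and messages sent in one superstep are delivered only in the next. The function name and the max-propagation task are illustrative assumptions, not Giraph's Java API.

```python
from typing import Dict, List


def propagate_max(values: Dict[int, int], edges: Dict[int, List[int]]) -> Dict[int, int]:
    """Vertex-centric max propagation (hypothetical helper, not part of Giraph).
    Each vertex ends with the largest value among itself and every vertex
    that can reach it along the directed edges."""
    values = dict(values)
    # Superstep 0: every vertex sends its own value along its out-edges.
    inbox: Dict[int, List[int]] = {v: [] for v in values}
    for v, neighbours in edges.items():
        for n in neighbours:
            inbox[n].append(values[v])
    # Later supersteps: only vertices with incoming messages do any work;
    # the job ends when no messages are in flight (all vertices halted).
    while any(inbox.values()):
        outbox: Dict[int, List[int]] = {v: [] for v in values}
        for v, messages in inbox.items():
            if not messages:
                continue
            best = max(messages)
            if best > values[v]:
                values[v] = best
                # Only vertices whose value changed notify their neighbours.
                for n in edges.get(v, []):
                    outbox[n].append(best)
        inbox = outbox  # barrier: delivered at the start of the next superstep
    return values
```

For example, with values `{1: 10, 2: 5, 3: 1, 4: 3}` and undirected edges 1-2 and 2-3 (listed in both directions), vertices 1 to 3 converge to 10 while the isolated vertex 4 keeps 3.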
What is Apache Giraph?
Runs on top of Hadoop
Map only jobs
Keeps data in memory
Mappers communicate through network
Giraph workflow
(Figure: Worker 1, Worker 2, Worker 3)
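Each worker (a mapper) owns a subset of the vertices and exchanges messages with the other workers over the network. The sketch below assumes a simple modulo partitioning of integer vertex ids and shows how outgoing messages can be grouped per destination worker; messages between vertices on the same worker never leave memory. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Message = Tuple[int, int, float]  # (source vertex, target vertex, payload)


def worker_of(vertex_id: int, num_workers: int) -> int:
    # Assumed partitioning: integer vertex ids assigned to workers by modulo.
    return vertex_id % num_workers


def route_messages(messages: List[Message],
                   num_workers: int) -> Dict[Tuple[int, int], List[Tuple[int, float]]]:
    """Group messages into batches keyed by (sending worker, receiving worker)."""
    batches: Dict[Tuple[int, int], List[Tuple[int, float]]] = defaultdict(list)
    for src, dst, payload in messages:
        key = (worker_of(src, num_workers), worker_of(dst, num_workers))
        batches[key].append((dst, payload))
    return dict(batches)


def network_messages(batches: Dict[Tuple[int, int], List[Tuple[int, float]]]) -> int:
    """Only batches whose sender and receiver differ have to cross the network."""
    return sum(len(batch) for (sender, receiver), batch in batches.items()
               if sender != receiver)
```

Batching by destination worker lets a single network request carry many small vertex messages.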
Collaborative Filtering Problem
Predict a user’s interests based on the interests of many other users
(Figure: example interests: Disney, Roller coasters, Disneyland, Six Flags)
Collaborative Filtering
Main challenge: Facebook data
•Billion users, 100 billion ratings
•Skewed item degrees
•No explicit ratings
Common approaches:
•Neighborhood-based models
•Matrix factorization
Neighborhood-Based Models
Neighborhood-based CF
Start from user-item ratings
Calculate item similarities
For each item pair, count:
•Users who rated the first item
•Users who rated the second item
•Users who rated both items
(Figure: users u connected to items I1 and I2)
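These three counts are all an item similarity needs. A minimal in-memory sketch, assuming the ratings arrive as a dict from user to the set of items that user rated, and using the deck's own example formula `intersection / Math.sqrt(degree1 * degree2)`:

```python
from collections import Counter
from itertools import combinations
from math import sqrt
from typing import Dict, Set, Tuple


def item_pair_stats(user_items: Dict[str, Set[str]]) -> Tuple[Counter, Counter]:
    degree: Counter = Counter()  # users who rated each item
    both: Counter = Counter()    # users who rated both items of a pair
    for items in user_items.values():
        degree.update(items)
        for pair in combinations(sorted(items), 2):
            both[pair] += 1
    return degree, both


def cosine_similarities(user_items: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    degree, both = item_pair_stats(user_items)
    # intersection / sqrt(degree1 * degree2), the example formula from the slides
    return {(a, b): n / sqrt(degree[a] * degree[b]) for (a, b), n in both.items()}
```

A user who rated d items contributes d(d-1)/2 pairs, so cost grows quadratically with user degree; this is one reason the deck filters by degree and splits the work into stripes.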
Neighborhood-based CF
Calculate user recommendations
For every user:
•Items rated by the user
•Most similar items to these items
(Figure: user u who rated I1, I2, I3; candidate items I4 to I7)
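One simple way to turn item similarities into recommendations is to add up, for every candidate item, its similarity to each item the user already rated, and keep the top k. The deck makes this scoring formula configurable, so the plain sum below is only an illustrative choice; `neighbours` (each item's most similar items with scores) is an assumed input.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def recommend(rated: Set[str], neighbours: Dict[str, List[Tuple[str, float]]],
              k: int) -> List[Tuple[str, float]]:
    """Score unseen items by summed similarity to the user's rated items."""
    scores: Dict[str, float] = defaultdict(float)
    for item in rated:
        for other, similarity in neighbours.get(item, []):
            if other not in rated:  # do not recommend what the user already has
                scores[other] += similarity
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
```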
Configurable formulas
Accommodating different use cases
Each calculation step is configurable:
•User’s contribution to item similarities
•Item similarities based on all users’ contributions
•User-to-item recommendation score
Passing a piece of Java code through configuration, e.g.:
intersection / Math.sqrt(degree1 * degree2)
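The deck passes a Java snippet through configuration. A rough Python analogue, sketched under the assumption that formulas are selected by name, keeps the pipeline fixed and swaps only the function; the `similarity_formula` key and the registry are hypothetical, and the Jaccard and count variants are added for comparison rather than taken from the deck.

```python
from math import sqrt
from typing import Callable, Dict

# A similarity formula maps (intersection, degree1, degree2) to a score.
SimilarityFormula = Callable[[int, int, int], float]

FORMULAS: Dict[str, SimilarityFormula] = {
    # Python version of the slide's Java example.
    "cosine": lambda intersection, degree1, degree2: intersection / sqrt(degree1 * degree2),
    # Two other common choices, shown only for comparison.
    "jaccard": lambda intersection, degree1, degree2: intersection / (degree1 + degree2 - intersection),
    "count": lambda intersection, degree1, degree2: float(intersection),
}


def item_similarity(config: Dict[str, str], intersection: int,
                    degree1: int, degree2: int) -> float:
    """Look up the configured formula (hypothetical key) and apply it."""
    formula = FORMULAS[config.get("similarity_formula", "cosine")]
    return formula(intersection, degree1, degree2)
```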
Our solution
Edges from users to items
Preprocessing:
•Filter out low-degree ones
•Calculate global item stats
Users send item lists to items
•Items need other items’ global stats to calculate similarities
(Figure: user vertices u and item vertices i spread across Worker 1, Worker 2 and Worker 3)
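A single-machine sketch of this flow, under stated assumptions: the slide does not say whether "low-degree ones" means users, items or both, so the hypothetical `preprocess` below filters both with made-up thresholds; the recomputed item degrees stand in for the global item stats, and `send_item_lists` mimics users messaging their item lists to items.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def preprocess(user_items: Dict[str, Set[str]], min_item_degree: int,
               min_user_degree: int) -> Tuple[Dict[str, Set[str]], Counter]:
    """Drop rare items, then users left with too few items; return the
    filtered ratings and per-item degrees (the "global item stats")."""
    degree = Counter(i for items in user_items.values() for i in items)
    kept = {i for i, d in degree.items() if d >= min_item_degree}
    filtered = {}
    for user, items in user_items.items():
        items = items & kept
        if len(items) >= min_user_degree:
            filtered[user] = items
    # In Giraph, global values like these would be aggregated and broadcast
    # so that every worker can read them.
    stats = Counter(i for items in filtered.values() for i in items)
    return filtered, stats


def send_item_lists(filtered: Dict[str, Set[str]]) -> Dict[str, List[List[str]]]:
    """Each user sends its sorted item list to every item it rated."""
    inbox: Dict[str, List[List[str]]] = defaultdict(list)
    for items in filtered.values():
        item_list = sorted(items)
        for item in item_list:
            inbox[item].append(item_list)
    return dict(inbox)


def intersections(item: str, received: List[List[str]]) -> Counter:
    """Item side: number of users who rated both `item` and each other item."""
    return Counter(other for item_list in received for other in item_list if other != item)
```

With the intersections and both items' degrees from `stats`, each item can apply the configured similarity formula locally.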
Optimizations
Make item info globally available
•Using the reduce/broadcast API
Striping technique
•Split computation across multiple supersteps
•In each stripe process one subset of items
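Striping trades extra supersteps for less memory: in each stripe, only pairs whose first item belongs to that stripe are counted, so at most a fraction of the pair counters exists at any time. A minimal sketch, assuming a toy deterministic stripe assignment (the helper names are hypothetical):

```python
from collections import Counter
from typing import Dict, Set, Tuple


def stripe_of(item: str, num_stripes: int) -> int:
    # Deterministic toy assignment; Python's built-in hash() of str is salted per process.
    return sum(map(ord, item)) % num_stripes


def pair_counts_striped(user_items: Dict[str, Set[str]], num_stripes: int) -> Tuple[Counter, int]:
    """Count co-ratings for ordered item pairs (a, b), one stripe of `a` items at a time.
    Returns the total counts and the largest number of pair counters held at once."""
    total: Counter = Counter()
    peak = 0
    for stripe in range(num_stripes):  # e.g. one stripe per superstep
        partial: Counter = Counter()
        for items in user_items.values():
            for a in items:
                if stripe_of(a, num_stripes) != stripe:
                    continue
                for b in items:
                    if b != a:
                        partial[(a, b)] += 1
        peak = max(peak, len(partial))
        total.update(partial)  # stand-in for writing this stripe's results out
    return total, peak
```

The totals do not depend on the number of stripes; only the peak number of live counters shrinks.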
Applications
Direct user recommendations
Context aware recommendations
User explore
Comparison with Hive
Item similarities implemented using a Hive join
•Remapping all items to 1..N first
Dataset                                  Hive CPU hours (after int remapping)   Giraph CPU hours
150M users, 15M items, 4B ratings        10                                     3
1.3B users, 35M items, 15B ratings       227                                    16
2.4B users, 8M items, 220B ratings       963                                    87
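The gap is easier to read as CPU-hour ratios. A small arithmetic sketch using only the numbers reported in the comparison above (these are CPU-hour ratios, not wall-clock speedups):

```python
# Numbers copied from the comparison above: (dataset, Hive CPU hours, Giraph CPU hours).
RESULTS = [
    ("150M users, 15M items, 4B ratings", 10, 3),
    ("1.3B users, 35M items, 15B ratings", 227, 16),
    ("2.4B users, 8M items, 220B ratings", 963, 87),
]


def cpu_hour_ratios(results):
    """Hive CPU hours divided by Giraph CPU hours for each dataset."""
    return {name: hive / giraph for name, hive, giraph in results}


for name, ratio in cpu_hour_ratios(RESULTS).items():
    print(f"{name}: Hive used {ratio:.1f}x the CPU hours of Giraph")
```

This prints ratios of about 3.3x, 14.2x and 11.1x, so Giraph used roughly 3 to 14 times fewer CPU hours, with the largest gap on the 1.3B-user dataset.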
Ease of use
ratings = i2iRatings(table = 'user_item_ratings')
similarities = i2iSimilarities(table = 'item_similarities')
recommendations = i2iRecommendations(table = 'user_recommendations')
i2iCalculateSimilarities(ratings,
                         similarities,
                         similarity_formula = '...',
                         num_workers = 10)
i2iCalculateRecommendations(ratings,
                            similarities,
                            recommendations,
                            scoring_formula = '...',
                            num_workers = 50)
Matrix Factorization
Matrix factorization CF
(Figure: sparse rating matrix, users U1 to U5 by items I1 to I5 and beyond; known ratings from 1 to 5, with "?" marking the unknown ratings to predict)
Basic form
Objective function (with regularization)
Two iterative approaches:
•Stochastic Gradient Descent
•Alternating Least Squares
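The objective itself did not survive extraction. Assuming the slide shows the usual regularized squared-error form, with $\mathcal{K}$ the set of known ratings, $p_u$ and $q_i$ the user and item feature vectors, $\lambda$ the regularization weight and $\gamma$ the learning rate, the objective and the two update schemes read:

```latex
\min_{P,Q} \sum_{(u,i)\in\mathcal{K}}
  \Big[ \big(r_{ui} - p_u^\top q_i\big)^2
        + \lambda \big(\lVert p_u\rVert^2 + \lVert q_i\rVert^2\big) \Big]

% SGD: for one known rating (u, i), with error e_{ui} = r_{ui} - p_u^\top q_i
p_u \leftarrow p_u + \gamma \big(e_{ui}\, q_i - \lambda\, p_u\big), \qquad
q_i \leftarrow q_i + \gamma \big(e_{ui}\, p_u - \lambda\, q_i\big)

% ALS: with all q_i fixed, each p_u has a closed form
% (Q_u stacks the q_i rated by u, r_u holds those ratings, n_u = number of ratings of u)
p_u = \big(Q_u^\top Q_u + \lambda\, n_u I\big)^{-1} Q_u^\top r_u
```

The factor 2 from the gradient is folded into $\gamma$. The $\lambda n_u$ term appears because the regularizer of $p_u$ is counted once per known rating of $u$; the item update of ALS is symmetric.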
Standard approach
A bipartite graph:
•Users and items are vertices
•Known ratings are edges
•Feature vectors sent through edges
Problems:
•Data sent per iteration: #knownRatings * #features
•Memory
•Large-degree items
•SGD updates differ from those in the sequential solution
(Figure: item vertices I1 to I4 spread across Worker 1, Worker 2 and Worker 3)
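For reference, this is the sequential SGD that the last problem compares against: every update immediately sees all earlier updates, whereas in the bipartite-graph approach a vertex updates from neighbour vectors received in the previous superstep. A minimal pure-Python sketch with hypothetical helper names and toy hyperparameters:

```python
import random
from typing import List, Tuple

Rating = Tuple[str, str, float]  # (user, item, rating)


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def init_vectors(ratings: List[Rating], num_features: int, seed: int):
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})  # sorted for reproducible init
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.gauss(0.0, 0.1) for _ in range(num_features)] for u in users}
    Q = {i: [rng.gauss(0.0, 0.1) for _ in range(num_features)] for i in items}
    return P, Q


def sgd_epoch(ratings, P, Q, lr, reg, rng):
    """One sequential SGD pass: every update sees all earlier updates."""
    order = list(ratings)
    rng.shuffle(order)
    for u, i, r in order:
        pu, qi = P[u], Q[i]
        err = r - dot(pu, qi)
        for k in range(len(pu)):
            pk, qk = pu[k], qi[k]  # pre-update values feed both updates
            pu[k] += lr * (err * qk - reg * pk)
            qi[k] += lr * (err * pk - reg * qk)


def rmse(ratings, P, Q) -> float:
    return (sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5


def train(ratings, num_features=4, epochs=200, lr=0.02, reg=0.01, seed=0):
    P, Q = init_vectors(ratings, num_features, seed)
    rng = random.Random(seed + 1)
    for _ in range(epochs):
        sgd_epoch(ratings, P, Q, lr, reg, rng)
    return P, Q
```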
Our solution
Extending Giraph
•Worker data
•Worker-to-worker messages
Users are vertices, items are worker data
Our solution - rotational approach
(Figure: Worker 1, Worker 2 and Worker 3, each holding one of item sets 1, 2 and 3)
•Network traffic?
•Memory?
•Skewed item degrees?
•SGD calculation?
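One reading of the rotational approach, stated here as an assumption: users and their ratings stay on a fixed worker, items are split into one set per worker, and in each step every worker updates its local users against the item set it currently holds before passing that set on. After as many steps as there are workers, every rating has been visited once, and no item vector is touched by two workers in the same step, so each worker can run plain sequential SGD locally. The helpers and the traffic estimate below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rotation_schedule(num_workers: int) -> List[List[int]]:
    """schedule[step][w] is the item set held by worker w during that step."""
    return [[(w + step) % num_workers for w in range(num_workers)]
            for step in range(num_workers)]


def rotational_order(ratings: List[Tuple[str, str, float]], user_worker: Dict[str, int],
                     item_set: Dict[str, int], num_workers: int) -> List[Tuple[int, int, str, str]]:
    """(step, worker, user, item) in the order one iteration would visit the ratings."""
    buckets = defaultdict(list)
    for user, item, _ in ratings:
        buckets[(user_worker[user], item_set[item])].append((user, item))
    order = []
    for step, holding in enumerate(rotation_schedule(num_workers)):
        # Workers run in parallel within a step; each applies sequential SGD locally.
        for worker, s in enumerate(holding):
            for user, item in buckets.get((worker, s), []):
                order.append((step, worker, user, item))
    return order


def floats_sent_per_iteration(num_ratings: int, num_items: int, num_features: int,
                              num_workers: int) -> Tuple[int, int]:
    standard = num_ratings * num_features                # one vector per known rating
    rotational = num_items * num_features * num_workers  # about: each item vector visits every worker
    return standard, rotational
```

As an illustration only: with the 220B-rating, 8M-item dataset from the Hive comparison and a hypothetical 100 features on 200 workers, the estimate is 2.2e13 floats per iteration for the standard approach versus 1.6e11 for rotation. Because load follows users rather than items, a very popular item's ratings are also spread over all workers.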
Recommendations
Finding top inner products
Scoring every (user, item) pair is infeasible
Creating a Ball Tree from item vectors
•Greedy tree traversal
•Pruning subtrees
•100-1000x faster than exhaustive scoring
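A ball tree supports top-k inner product search because every ball gives an upper bound: for a ball with center c and radius r, no item inside can score more than q·c + |q|·r (Cauchy-Schwarz). The sketch below, with a simple median split and hypothetical names, descends greedily into the more promising child first and prunes any ball whose bound cannot beat the current k-th best score.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class Ball:
    center: List[float]
    radius: float
    ids: List[int]                   # item ids, only filled in leaves
    left: Optional["Ball"] = None
    right: Optional["Ball"] = None


def build(items: List[Vec], ids: List[int], leaf_size: int = 8) -> Ball:
    dim = len(items[0])
    center = [sum(items[i][d] for i in ids) / len(ids) for d in range(dim)]
    radius = max(math.dist(items[i], center) for i in ids)
    if len(ids) <= leaf_size:
        return Ball(center, radius, list(ids))
    # Split at the median of the dimension with the largest spread.
    spread = [max(items[i][d] for i in ids) - min(items[i][d] for i in ids) for d in range(dim)]
    axis = spread.index(max(spread))
    ordered = sorted(ids, key=lambda i: items[i][axis])
    mid = len(ordered) // 2
    return Ball(center, radius, [], build(items, ordered[:mid], leaf_size),
                build(items, ordered[mid:], leaf_size))


def upper_bound(ball: Ball, q: Vec, q_norm: float) -> float:
    # q.x <= q.c + |q| * |x - c| <= q.c + |q| * radius
    return dot(q, ball.center) + q_norm * ball.radius


def top_k(root: Ball, items: List[Vec], q: Vec, k: int) -> List[Tuple[float, int]]:
    q_norm = math.sqrt(dot(q, q))
    best: List[Tuple[float, int]] = []  # min-heap holding the current top k

    def visit(ball: Ball) -> None:
        if len(best) == k and upper_bound(ball, q, q_norm) <= best[0][0]:
            return  # prune: nothing in this ball can enter the top k
        if ball.left is None:
            for i in ball.ids:
                score = dot(q, items[i])
                if len(best) < k:
                    heapq.heappush(best, (score, i))
                elif score > best[0][0]:
                    heapq.heapreplace(best, (score, i))
            return
        # Greedy: explore the child with the higher bound first.
        for child in sorted((ball.left, ball.right),
                            key=lambda b: upper_bound(b, q, q_norm), reverse=True):
            visit(child)

    visit(root)
    return sorted(best, reverse=True)
```

Usage: `root = build(item_vectors, list(range(len(item_vectors))))`, then `top_k(root, item_vectors, user_vector, 10)` for each user.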
Additional features
Tracking RMSE, average rank and precision/recall
Combining SGD & ALS
Using other objective functions
•CF for implicit feedback
•Biases
•Degree-based regularization
•Optimizing ranks
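The tracked metrics can be computed from predictions and per-user rankings as follows. This is a minimal sketch; in particular, "average rank" has several definitions, and the mean 1-based position of held-out items used here is an assumption.

```python
import math
from typing import List, Sequence, Set, Tuple


def rmse(predicted_and_actual: Sequence[Tuple[float, float]]) -> float:
    """Root mean squared error over (prediction, true rating) pairs."""
    n = len(predicted_and_actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in predicted_and_actual) / n)


def precision_recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Share of the top k that is relevant, and share of relevant items found in the top k."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    recall = hits / len(relevant) if relevant else 0.0
    return hits / k, recall


def average_rank(ranked: List[str], relevant: Set[str]) -> float:
    """Mean 1-based position of held-out relevant items in the ranking (lower is better)."""
    positions = [ranked.index(item) + 1 for item in relevant if item in ranked]
    return sum(positions) / len(positions) if positions else math.inf
```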
Applications
Add user and item feature vectors in ranking
Get user to item score in realtime
Direct user recommendations
Training / testing metrics example
(Figure: RMSE (0 to 0.8) vs. iterations (0 to 44) on training and test data, for f=8 and f=128 features)
Comparison with Spark MLlib
Performance of Spark MLlib ALS CF as published in July 2014
On scaled copies of the Amazon reviews dataset
(Figure: CPU minutes (0 to 600) vs. millions of examples (0 to 1200), Standard (in Spark) vs. Rotational (in Giraph))
Ease of use
ratings = CFRatings(table = 'cf_ratings')
feature_vectors = CFFeatureVectors(table = 'cf_feature_vectors')
CFTrain(ratings,
        feature_vectors,
        CFSettings(features_size = 10, iterations = 20),
        num_workers = 5)
CFRecommend(ratings,
            feature_vectors,
            CFRecommendations(top_items_table = 'cf_top_items'),
            num_workers = 50)
Conclusion
Scalable implementation of Collaborative Filtering
On top of Apache Giraph
Highly performant (>100 billion ratings)
Neighborhood-based models
Matrix factorization
Group and Page recommendations at Facebook
Thank you!
tinyurl.com/fb-mf-cf
Questions?

