Kedar Sadekar, Netflix
Nitin Sharma, Netflix
Fact Store - Netflix Recommendations
#DevSAIS11
Agenda
Recommendations at Netflix
● Personalized Homepage for each member
○ Goal: Quickly help members find content they’d like to watch
○ Risk: Member may lose interest and abandon the service
○ Challenge: Recommendations at Scale
Scale @ Netflix
Experimentation Cycle @ Netflix
Design a new experiment to test out different ideas
● Offline Experiment
○ Design Experiment
○ Collect Label Data
○ Offline Feature Generation
○ Model Training
○ Model Validation Metrics
○ Model Testing
● Online System
○ Online A/B Testing
ML Feature Engineering - Architectural View
● Online System
○ Microservices
○ Online Feature Generation
○ Online Scoring
● Offline Experiment
○ Offline Feature Generation
○ Model Training
○ Deploy Models
● Shared Feature Encoders
● Data: Facts, Features
What is a Fact?
● Fact
○ Input data for feature encoders; used to construct a feature
○ Examples: a member's viewing history, a member's My List
● Historical version of a fact
○ Rewindable: the state of the world at that time
● Temporal
○ Facts are temporal, i.e. they change with time
○ Each online scoring service uses the latest value of a fact
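The "rewindable" property above can be illustrated with a small in-memory sketch. It is not the Fact Store implementation; the class `FactHistory`, its method names, and the integer timestamps are all hypothetical, chosen only to show what reading a fact "as of" a past time means.

```python
import bisect
from collections import defaultdict


class FactHistory:
    """Keeps every logged version of a fact so it can be read 'as of' a time."""

    def __init__(self):
        # (member_id, fact_name) -> parallel lists kept sorted by log time
        self._times = defaultdict(list)
        self._values = defaultdict(list)

    def log(self, member_id, fact_name, log_time, value):
        key = (member_id, fact_name)
        # Insert in timestamp order so late-arriving records still sort correctly.
        idx = bisect.bisect_right(self._times[key], log_time)
        self._times[key].insert(idx, log_time)
        self._values[key].insert(idx, value)

    def as_of(self, member_id, fact_name, at_time):
        """Return the latest value logged at or before at_time, else None."""
        key = (member_id, fact_name)
        idx = bisect.bisect_right(self._times.get(key, []), at_time)
        return self._values[key][idx - 1] if idx > 0 else None


history = FactHistory()
history.log(122312, "my_list", 100, ["title_a"])
history.log(122312, "my_list", 200, ["title_a", "title_b"])

print(history.as_of(122312, "my_list", 150))  # ['title_a']
print(history.as_of(122312, "my_list", 50))   # None
```

Online scoring would read only the newest entry, while offline feature generation would call something like `as_of` with the time of the event being reconstructed.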
Feature Logging vs. Fact Logging
[Diagram, two panels with the same components: Fact Microservices, Facts, Online Scoring Predictor, Features, Recommendations. A "Log these" marker shows what each approach records.]
● Feature Logging: log the Features
● Fact Logging: log the Facts
Fact Logging - Pull Architecture
● Daily snapshots of key facts
● Storage
○ S3 & Parquet
● API to access the data
○ RDD & DataFrames
● Cons
○ Lacks temporal accuracy
○ Load on microservices
○ Missing experiment-specific facts
[Diagram labels: Stratified Member Sets, Fact Microservices, Capture Snapshots, Snapshots]
Fact Logging - Push Architecture
● Compute engines themselves control what to log
● Stratification
● Temporal accuracy
[Diagram labels: Compute Services, Fact Store, Fact Transformer, Fact Fetcher, Fact Logger, ML Workflows, Feature Generation, Model Training]
Fact Logger
● Library
● Facts
○ User Related
○ Video Related
○ Computation Specific
● Serialization
● Stratification Service
● Fact Stream
● Storage
[Diagram labels: Precompute, Live Compute, Fact Logger, Stratification, Base Fact Tables]
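A minimal sketch of how a logging library of this shape might combine stratification, serialization, and a push to a stream. Everything here is an assumption made for illustration: the slides do not show the library's API, the hash-based sampler `in_stratum` merely stands in for the Stratification Service, JSON stands in for the unspecified serialization format, and a plain list stands in for the fact stream.

```python
import hashlib
import json
import time


def in_stratum(member_id, sample_rate):
    """Deterministically keep roughly `sample_rate` of members (hash-based)."""
    digest = hashlib.sha256(str(member_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < sample_rate * 10_000


class FactLogger:
    """Hypothetical logger embedded in a compute service."""

    def __init__(self, sink, sample_rate=0.1, clock=time.time):
        self._sink = sink              # stand-in for the fact stream
        self._sample_rate = sample_rate
        self._clock = clock

    def log(self, member_id, facts):
        """Serialize and push `facts` if the member is in the logged stratum."""
        if not in_stratum(member_id, self._sample_rate):
            return False
        record = {
            "member_id": member_id,
            "log_time": self._clock(),  # captured at compute time
            "facts": facts,
        }
        self._sink.append(json.dumps(record, sort_keys=True).encode("utf-8"))
        return True


stream = []
logger = FactLogger(stream, sample_rate=1.0)
logger.log(122312, {"my_list": ["title_a"], "thumbs": {"title_a": "up"}})
print(len(stream))  # 1
```

Because the compute service calls the logger at scoring time, the logged facts carry the timestamp of the computation itself rather than that of a later snapshot.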
Fact Logging - Scalability
● 5-10x increase in data through Kafka
● SLA impact; cost increase
● Compression: 70% decrease
[Diagram labels: Precompute, Live Compute, Fact Logger]
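The effect of compressing serialized facts before they enter the stream can be sketched as follows. The slides do not name the codec or the wire format, so `zlib` and JSON are assumptions, as are the helper names `pack` and `unpack`; the ratio printed depends entirely on the sample data and is not meant to reproduce the figure on the slide.

```python
import json
import zlib


def pack(facts):
    """Serialize facts compactly, then compress the bytes."""
    raw = json.dumps(facts, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, level=6)


def unpack(blob):
    """Inverse of pack: decompress, then deserialize."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical viewing history with the repetitive structure typical of facts.
view_history = [
    {"video_id": 70000000 + i, "device": "tv", "seconds_watched": 1800}
    for i in range(200)
]

raw_size = len(json.dumps(view_history, separators=(",", ":")).encode("utf-8"))
packed = pack(view_history)

print(f"raw={raw_size} bytes, compressed={len(packed)} bytes")
assert unpack(packed) == view_history
```

Compression trades CPU time in the logger and reader for lower stream and storage volume, which is the lever the slide points to for the SLA and cost impact.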
Deduplication and Conditional Push
● Pipeline load
○ Repeated facts
● Aggressive or not
○ Loss threshold
● Spark Job
○ Fact pointers
○ SLA
[Diagram labels: Storage & Access, Fact Store, Fact Transformer, Deduplication, Conditional push, Precompute, Live Compute, Fact Logger]
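One way to read "conditional push" and "fact pointers" is that an unchanged fact is not re-sent in full; a small pointer to the previously stored value is emitted instead. The sketch below illustrates that reading only. The class `ConditionalPusher`, the record layout, and the use of a content hash to detect repeats are assumptions, not details taken from the slides.

```python
import hashlib
import json


class ConditionalPusher:
    """Pushes a full value only when it differs from the last one pushed."""

    def __init__(self):
        # (member_id, fact_name) -> (content hash, log_time of the full value)
        self._last = {}

    def push(self, member_id, fact_name, log_time, value):
        payload = json.dumps(value, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(payload).hexdigest()
        key = (member_id, fact_name)
        previous = self._last.get(key)
        if previous is not None and previous[0] == digest:
            # Unchanged: emit a small pointer to where the value was last stored.
            return {"kind": "pointer", "log_time": log_time, "points_to": previous[1]}
        self._last[key] = (digest, log_time)
        return {"kind": "value", "log_time": log_time, "value": value}


pusher = ConditionalPusher()
first = pusher.push(254637, "my_list", 100, ["title_a"])
second = pusher.push(254637, "my_list", 110, ["title_a"])
third = pusher.push(254637, "my_list", 120, ["title_a", "title_b"])

print(first["kind"], second["kind"], third["kind"])  # value pointer value
```

How aggressively to skip repeats is a design choice: skipping more reduces pipeline load but raises the cost of a lost record, since later pointers then refer to a value that never arrived.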
API Lookback

Member ID   My List           View History           Thumbs
122312      My List Value     View History Pointer   Thumbs Value
254637      My List Pointer   View History Pointer   Thumbs Pointer
Member n    My List Pointer   View History Value     Thumbs Pointer

[Diagram: My List, View History, and Thumbs each have their own table of values, split into Partition 1 ... Partition m; the tables are labeled log_time - x, log_time - y, and log_time - z]
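The read side of the table above can be sketched as a function that returns a cell's value directly or follows its pointer back to an earlier log time. This is an illustration under assumptions: the cell layout, the dictionary standing in for a per-fact value table, and the `lookback_limit` check are hypothetical and not taken from the Read API.

```python
def resolve(cell, fact_table, member_id, log_time, lookback_limit):
    """Return the fact value for a row cell that holds a value or a pointer."""
    if cell["kind"] == "value":
        return cell["value"]
    target_time = cell["points_to"]
    if log_time - target_time > lookback_limit:
        # Refuse to chase pointers further back than the store keeps readily available.
        raise LookupError("pointer exceeds lookback limit")
    return fact_table[(member_id, target_time)]


# Hypothetical per-fact value table keyed by (member_id, log_time).
my_list_values = {(254637, 90): ["title_a", "title_b"]}

row = {
    "member_id": 254637,
    "log_time": 100,
    "my_list": {"kind": "pointer", "points_to": 90},
}

value = resolve(row["my_list"], my_list_values, row["member_id"], row["log_time"], lookback_limit=30)
print(value)  # ['title_a', 'title_b']
```

Each fact column is resolved independently, so one row can mix inline values with pointers that reach back different distances.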
Read API
● Query performance
○ Slow moving facts
● Point query
○ Connector
● Query time reduction
○ Hours to minutes
[Diagram labels: Storage & Access, Fact Store, Fact Transformer, Read API, Precompute, Live Compute, Fact Logger, Deduplication, Conditional push, Write, Read]
Performance: Storage
• Partitioning scheme
– Noisy neighbor
• Storage format
– Exploratory vs production
• Fast & Slow lane
– Lookback limit
Performance: Spark reads
• Bloom Filters
– Reduce scan
• Cache Access
– EVCache, Spectator
• MapPartitions vs UDF
– Eager vs Lazy
– SPARK-11438, SPARK-11469, SPARK-20586
[Diagram labels: Application, ML Library, Read API]
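The "Bloom Filters – Reduce scan" point can be illustrated with a small, self-contained filter used to skip partitions that cannot contain a member. This is a generic sketch of the technique, not the code used with Spark in the talk; the class, the sizing parameters, and the partition names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self._num_bits = num_bits
        self._num_hashes = num_hashes
        self._bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions by salting one hash function with a seed.
        for seed in range(self._num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self._num_bits

    def add(self, key):
        for pos in self._positions(key):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self._bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def partitions_to_scan(partition_filters, member_id):
    """Keep only partitions whose filter says the member may be present."""
    return [name for name, bf in partition_filters.items() if bf.might_contain(member_id)]


filters = {"partition_1": BloomFilter(), "partition_m": BloomFilter()}
filters["partition_1"].add(122312)
filters["partition_m"].add(254637)

print(partitions_to_scan(filters, 122312))
```

A partition whose filter answers "no" can be skipped safely; a "yes" may be a false positive, so the partition still has to be read to confirm.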
Future Work
• Structured with schema evolution
– Best of both (POJO & Spark SQL), Iceberg
• Streaming vs Batch
– Multiple lanes, accountability, independent scale
• Duplication
– Storage vs Runtime cost
Questions?

Fact Store at Scale for Netflix Recommendations
