Storage Infrastructure @ LinkedIn
Deepshikha
SRE
Agenda
● Why we need derived data stores
● Nearline derived data
● How we used Kafka to build a derived data store
Derived Data Store
● What is derived data?
● Why do we need it?
● Why do we need specific stores for derived data?
Legacy derived data store
Voldemort
Voldemort Architecture
[Diagram: batch processing on Hadoop, with Voldemort clusters in Colo 1 and Colo 2.]
Challenges with Voldemort
● Inefficient batch processing model
● No support for incremental changes
● Not fault tolerant
Near real-time streaming data
● Real-time insights drive engagement and fuel a number of applications
Why incremental pushes?
● To serve near-real-time computation
● Running a full offline push cycle to update only some records is expensive in time and resources
● An atomic swap might result in changed data not being served
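The trade-off in the bullets above can be sketched with a toy in-memory store. This is only an illustration under stated assumptions: the class names, method names, and keys are hypothetical and are not taken from Voldemort or Venice. It shows one reading of the atomic-swap point, namely that a snapshot built offline can already be stale by the time it is swapped in, while an incremental push writes only the changed record.

```python
class VersionedStore:
    """Toy store that serves one complete version at a time (hypothetical)."""

    def __init__(self, initial):
        self.current = dict(initial)

    def full_push(self, snapshot):
        # Build the whole new version off to the side, then swap it in.
        new_version = dict(snapshot)
        self.current = new_version  # the swap: readers now see only the new version
        return len(new_version)     # number of records written

    def get(self, key):
        return self.current.get(key)


class IncrementalStore(VersionedStore):
    """Same toy store, with support for pushing only the changed records."""

    def incremental_push(self, changes):
        self.current.update(changes)
        return len(changes)         # number of records written


source = {"member:1": "v1", "member:2": "v1", "member:3": "v1"}
store = IncrementalStore(source)

snapshot = dict(source)        # the offline job reads its input
source["member:2"] = "v2"      # a record changes while the job is running

full_written = store.full_push(snapshot)
stale = store.get("member:2")  # still "v1": the change missed the snapshot

incremental_written = store.incremental_push({"member:2": source["member:2"]})
fresh = store.get("member:2")  # "v2" after writing a single record

print(full_written, stale, incremental_written, fresh)
```

In this sketch the full push rewrites all three records and still serves the old value, while the incremental push writes one record and serves the new one.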
What do we need?
● Scalable, cost-effective replication of batch processed data
● Pluggable support for stream processing
● Incremental pushes
Guess what? Kafka has it all!
● Scalable, cost-effective replication of batch processed data
● Pluggable support for stream processing
● Incremental pushes
It also has
● Useful metrics
● Cross-colo replication abstracted away with Kafka Mirror Maker (KMM)
● Reduced overhead compared with setting up from scratch!
Brand new derived data store!
Venice
Venice architecture
[Diagram: batch processing (Hadoop) and stream processing (Samza), with Kafka and Venice.]
Replication via KMM
[Diagram. Corp: Venice controller, Kafka broker, Hadoop. Colo 1 to Colo 4: each has a Kafka Mirror Maker, a Kafka broker, and a Venice subsystem.]
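As a rough mental model of the replication slide, the fan-out from one corp cluster to four colos can be sketched as follows. This is a toy simulation, not Kafka Mirror Maker itself: `Topic`, `Mirror`, and `poll` are hypothetical names, and a Python list stands in for a topic. The assumption is only that each colo has its own mirror that tracks how far it has copied, so a colo that falls behind can catch up later without affecting the others.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy append-only log standing in for one Kafka topic (hypothetical)."""
    records: list = field(default_factory=list)

    def append(self, record):
        self.records.append(record)


@dataclass
class Mirror:
    """Toy mirror: copies records it has not yet copied from source to target."""
    source: Topic
    target: Topic
    next_offset: int = 0

    def poll(self):
        pending = self.source.records[self.next_offset:]
        for record in pending:
            self.target.append(record)
        self.next_offset += len(pending)
        return len(pending)


corp = Topic()
colos = {name: Topic() for name in ("colo1", "colo2", "colo3", "colo4")}
mirrors = {name: Mirror(corp, topic) for name, topic in colos.items()}

# Data is produced once, in corp, and every mirror copies it to its colo.
corp.append(("k1", "a"))
corp.append(("k2", "b"))
for mirror in mirrors.values():
    mirror.poll()

# Suppose the colo3 mirror is paused while more data arrives.
corp.append(("k3", "c"))
for name, mirror in mirrors.items():
    if name != "colo3":
        mirror.poll()
lagging = len(colos["colo3"].records)  # 2: colo3 is one record behind

# When it resumes, it continues from its own offset and catches up.
caught_up = mirrors["colo3"].poll()    # 1 record copied
print(lagging, caught_up)
```

The point of the sketch is that the producer writes once and never needs to know how many colos exist, which is the sense in which cross-colo replication is abstracted away.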
Hybrid Mode
[Diagram. Components: Hadoop job, Samza job, Kafka topics, Venice. Steps shown in order: store current batched copy, store streamed copy, store new batched copy.]
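One way to picture the three steps on the hybrid mode slide is the toy model below. It is a sketch under assumptions, not Venice's implementation: the class, its methods, and the timestamp-based replay rule are all hypothetical. The assumed rule is that when a new batched copy arrives, streamed writes newer than that batch's snapshot time are replayed on top of it, so recent streamed updates survive the version change.

```python
class HybridStore:
    """Toy store combining batched versions with streamed writes (hypothetical)."""

    def __init__(self):
        self.versions = {}    # version number -> {key: value}
        self.current = None   # version number currently served
        self.stream_log = []  # (timestamp, key, value) kept for replay

    def batch_push(self, version, snapshot, snapshot_time):
        new_copy = dict(snapshot)
        # Assumed rule: replay streamed writes that are newer than the snapshot.
        for ts, key, value in self.stream_log:
            if ts > snapshot_time:
                new_copy[key] = value
        self.versions[version] = new_copy
        self.current = version

    def stream_write(self, ts, key, value):
        self.stream_log.append((ts, key, value))
        if self.current is not None:
            self.versions[self.current][key] = value

    def get(self, key):
        return self.versions[self.current].get(key)


store = HybridStore()

# Step 1: store the current batched copy.
store.batch_push(1, {"a": "batch1", "b": "batch1"}, snapshot_time=10)

# Step 2: streamed writes land on top of the batched copy.
store.stream_write(15, "a", "stream15")
a_after_stream = store.get("a")  # "stream15"
store.stream_write(25, "b", "stream25")

# Step 3: a new batched copy arrives, built from a snapshot taken at time 20.
store.batch_push(2, {"a": "batch2", "b": "batch2"}, snapshot_time=20)
a_after_batch = store.get("a")   # "batch2": the write at 15 predates the snapshot
b_after_batch = store.get("b")   # "stream25": the write at 25 is replayed

print(a_after_stream, a_after_batch, b_after_batch)
```

A real system would need to bound how much of the stream it retains for replay; the sketch keeps everything for simplicity.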
Evolution
     Voldemort              Venice
1.   Batch pushes           Batch & incremental pushes
2.   Hadoop dependency      Data derived from Kafka
3.   Limited scalability    No scaling issues
4.   Tricky expansion       Easy to expand
5.   Not fault tolerant     Fault tolerant
Questions?

Building a derived data store using Kafka