Kafka - LinkedIn’s messaging backbone
Who are we?
▪ Kafka SRE at LinkedIn
▪ Site Reliability Engineering
– Administrators
– Architects
– Developers
▪ Keep the site running, always
Presenters
▪ Clark Haskins
– Manager for Data Infra Streaming SRE (Mountain View, CA)
▪ Ayyappadas Ravindran
– Staff Site Reliability Engineer, Data Infra Streaming (Bengaluru)
▪ Akash Vacher
– Site Reliability Engineer, Data Infra Streaming (Bengaluru)
Agenda
▪ What the heck is Kafka?
– Brief intro
– Motivation to build Kafka
▪ Okay, why should I bother?
– Kafka facts, scale & performance
▪ You have my attention, tell me more!
– Core concepts
– Operating Kafka
– Kafka @ LinkedIn
▪ Nice, where do you use Kafka?
– Tale of two applications
▪ Have questions?
What the heck is Kafka?
▪ A high-throughput distributed messaging system
▪ Developed at LinkedIn and open sourced in early 2011
▪ Implemented in Scala and Java
▪ LinkedIn’s messaging backbone
▪ Kafka powers around 1000 companies, including LinkedIn, Yahoo!, Netflix, Uber, Twitter and many more
“If data is the lifeblood of high technology, Apache Kafka is the circulatory system in use at LinkedIn.”
– Todd Palino (Staff SRE, LinkedIn)
Motivation to create Kafka?
▪ Needed a unified platform to handle all real-time data feeds and stream processing
▪ Wanted a messaging system with high throughput to support high-volume event feeds
▪ Needed data persistence for offline systems and for service recovery
▪ Low latency
▪ Fault tolerant
▪ Linearly scalable
Okay! What was the motivation to create Kafka?
[Figure: data pipeline architecture, before and after Kafka]
How is Kafka used at LinkedIn?
▪ Application and System Monitoring (inGraphs)
▪ User tracking on LinkedIn websites
▪ Email, push & SMS notifications
▪ Live search updates
▪ Samza Jobs (standardization, call graph and more)
▪ Database Replication
Okay, why should I bother?
▪ Over 1,300,000,000,000 (1.3 trillion) messages are transported via Kafka every day at LinkedIn
▪ 300 terabytes of inbound and 900 terabytes of outbound traffic
▪ 4.5 million messages per second on a single cluster
▪ Kafka runs on around 1300 servers at LinkedIn
Hmm... How good is Kafka?
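The headline figures are easier to compare as per-second rates. The arithmetic below assumes, as the slide states, that 1.3 trillion is a daily total, and additionally assumes (the slide does not say so) that the 300 TB / 900 TB traffic figures cover the same period.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures from the slide; "over 1.3 trillion" makes this a lower bound.
messages_per_day = 1_300_000_000_000
inbound_tb, outbound_tb = 300, 900   # assumed to cover the same period

avg_msgs_per_sec = messages_per_day / SECONDS_PER_DAY
read_fanout = outbound_tb / inbound_tb   # bytes served per byte written

print(f"average rate: {avg_msgs_per_sec:,.0f} messages/s across all clusters")
print(f"read fan-out: each inbound byte is read about {read_fanout:.0f} times")
```

That is roughly 15 million messages per second on average across the whole deployment, compared with the 4.5 million per second quoted for a single cluster. The 3:1 outbound-to-inbound ratio suggests that the same data is typically read by several consumers.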
You have my attention, tell me more!
▪ Building blocks
– Message
– Producers
– Consumers
– Topics
– Partitions
– Segments
– Brokers
– Replicas
Awesome!! I'm in. Tell me more!
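To show how several of these building blocks fit together, here is a minimal in-memory Python model: a producer picks a partition from the message key, each partition is an append-only log split into segments named by their base offset, and a consumer reads forward from an offset. It is a sketch under stated assumptions, not Kafka's implementation: the class names are hypothetical, segments roll after a fixed message count instead of by size or age, and `zlib.crc32` stands in for the murmur2 hash used by Kafka's default partitioner. Brokers and replicas are not modeled here.

```python
import bisect
import zlib
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Segment:
    base_offset: int  # offset of the first message in this segment
    records: List[Tuple[Optional[bytes], bytes]] = field(default_factory=list)


@dataclass
class Partition:
    # Hypothetical roll threshold counted in messages; real brokers roll
    # segments by size or age.
    max_segment_records: int = 3
    segments: List[Segment] = field(default_factory=lambda: [Segment(0)])

    def next_offset(self) -> int:
        last = self.segments[-1]
        return last.base_offset + len(last.records)

    def append(self, key: Optional[bytes], value: bytes) -> int:
        """Append to the active (last) segment and return the new offset."""
        if len(self.segments[-1].records) >= self.max_segment_records:
            # Roll: start a new segment whose base offset is the next offset.
            self.segments.append(Segment(self.next_offset()))
        self.segments[-1].records.append((key, value))
        return self.next_offset() - 1

    def read(self, offset: int, max_records: int = 10) -> List[Tuple[int, bytes]]:
        """Return up to max_records (offset, value) pairs starting at offset."""
        # Binary-search the segment base offsets to find where to start.
        bases = [s.base_offset for s in self.segments]
        start = max(bisect.bisect_right(bases, offset) - 1, 0)
        out: List[Tuple[int, bytes]] = []
        for seg in self.segments[start:]:
            for pos, (_key, value) in enumerate(seg.records):
                off = seg.base_offset + pos
                if off >= offset and len(out) < max_records:
                    out.append((off, value))
        return out


def choose_partition(key: bytes, num_partitions: int) -> int:
    # Same key -> same partition, so per-key ordering is preserved.
    # crc32 stands in for the murmur2 hash of Kafka's default partitioner.
    return zlib.crc32(key) % num_partitions


@dataclass
class Topic:
    name: str
    partitions: List[Partition]

    def produce(self, key: bytes, value: bytes) -> Tuple[int, int]:
        """Producer side: pick a partition from the key, append, return (partition, offset)."""
        p = choose_partition(key, len(self.partitions))
        return p, self.partitions[p].append(key, value)


# Usage: five messages with the same key all land in one partition.
topic = Topic("page-views", [Partition() for _ in range(2)])
for i in range(5):
    topic.produce(b"member-42", f"view-{i}".encode())

log = topic.partitions[choose_partition(b"member-42", 2)]
print([s.base_offset for s in log.segments])  # [0, 3]
print(log.read(2))  # [(2, b'view-2'), (3, b'view-3'), (4, b'view-4')]
```

Because a consumer only needs an offset to resume, the broker does not have to track which messages each consumer has seen, and locating an offset is a binary search over segment base offsets.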
Bird’s eye view
The data continues...
[Figure: "What Is Kafka?" diagram with a Producer, Brokers holding partitions P0 and P1 of topic A, a Consumer, and ZooKeeper]
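The bird's eye view places copies of each partition on several brokers. A simplified Python sketch of that layout, under stated assumptions: replicas are placed round-robin across brokers, and the leader is the first in-sync replica in the assignment list (the "preferred leader" being the first entry). The function names are hypothetical. Kafka's real assignment also randomizes the starting broker and can be rack-aware, and in this ZooKeeper-based architecture leader election is carried out by the controller broker.

```python
from typing import Dict, List, Optional, Set


def assign_replicas(num_partitions: int, brokers: List[int],
                    replication_factor: int) -> Dict[int, List[int]]:
    """Round-robin placement: partition p gets replication_factor consecutive brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    n = len(brokers)
    return {
        p: [brokers[(p + r) % n] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


def elect_leader(replicas: List[int], in_sync: Set[int]) -> Optional[int]:
    """Pick the first replica, in assignment order, that is still in sync.

    The first replica is the preferred leader; if it drops out, leadership
    moves to the next in-sync replica. None means no clean leader exists.
    """
    for broker in replicas:
        if broker in in_sync:
            return broker
    return None


assignment = assign_replicas(num_partitions=3, brokers=[1, 2, 3], replication_factor=2)
print(assignment)                                   # {0: [1, 2], 1: [2, 3], 2: [3, 1]}
print(elect_leader(assignment[0], in_sync={1, 2}))  # 1 (preferred leader)
print(elect_leader(assignment[0], in_sync={2}))     # 2 (broker 1 dropped out)
```

The `None` case is where unclean leader election comes in: promoting a replica that is not in sync keeps the partition available but can lose messages, which is why it appears on the monitoring list.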
Performance recipes
▪ OS page cache
▪ Linear I/O: never fear the file system!
▪ sendfile() system call (zero-copy transfer)
▪ Message batching
Dude, tell me the
performance secret!!
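Message batching trades a little latency for far fewer requests. The toy batcher below sends a batch when enough bytes are buffered or when the oldest buffered record has waited long enough, loosely mirroring the Kafka producer's `batch.size` and `linger.ms` settings. The `RecordBatcher` class is hypothetical, not part of any Kafka client, and the clock is injected so the example is deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RecordBatcher:
    """Toy producer-side batcher (hypothetical; not a Kafka client class)."""
    batch_size: int             # flush once this many payload bytes are buffered
    linger_s: float             # ...or once the oldest buffered record is this old
    clock: Callable[[], float]  # injected so the example is deterministic
    sent_batches: List[List[bytes]] = field(default_factory=list)
    _buf: List[bytes] = field(default_factory=list, init=False, repr=False)
    _buf_bytes: int = field(default=0, init=False, repr=False)
    _first_ts: float = field(default=0.0, init=False, repr=False)

    def append(self, record: bytes) -> None:
        if not self._buf:
            self._first_ts = self.clock()
        self._buf.append(record)
        self._buf_bytes += len(record)
        if self._buf_bytes >= self.batch_size:
            self.flush()

    def poll(self) -> None:
        """Call periodically: ship a partial batch whose linger time has expired."""
        if self._buf and self.clock() - self._first_ts >= self.linger_s:
            self.flush()

    def flush(self) -> None:
        if self._buf:
            # One request carries the whole batch instead of one request per record.
            self.sent_batches.append(self._buf)
            self._buf = []
            self._buf_bytes = 0


now = [0.0]
batcher = RecordBatcher(batch_size=10, linger_s=0.005, clock=lambda: now[0])
for msg in (b"abcd", b"efgh", b"ijkl"):  # 12 bytes >= 10: sent as one batch
    batcher.append(msg)
batcher.append(b"xy")                     # small leftover waits for the linger time
now[0] = 0.010
batcher.poll()
print(batcher.sent_batches)  # [[b'abcd', b'efgh', b'ijkl'], [b'xy']]
```

Larger batches also compress better and turn into larger sequential writes on the broker, which is where batching connects to the linear I/O and page cache points above.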
Operating Kafka
▪ Broker hardware
– Cisco C240, Intel Xeon, 64 GB RAM, 14-disk RAID 10
▪ ZooKeeper hardware
– 5 + 1 ensemble, 64 GB RAM, 500 GB SSD
▪ Monitoring
– Lag monitoring
– Under-replicated partitions
– Unclean leader elections
– Burrow
▪ Cluster rebalance
– Size-wise rebalance
– Partition-wise rebalance
Tell me how you manage this beast!
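Two of the monitoring signals above reduce to simple comparisons. A hedged Python sketch: consumer lag is the partition's log-end offset minus the consumer group's committed offset, and a partition is under-replicated when its in-sync replica set is smaller than its assigned replica set. The function names are hypothetical, and treating a partition with no commit as starting at offset 0 is an assumption. Burrow goes further than a single snapshot like this: it evaluates a group's committed offsets and lag over a sliding window instead of comparing against a fixed threshold.

```python
from typing import Dict, List, Tuple

TopicPartition = Tuple[str, int]  # (topic, partition id)


def consumer_lag(log_end: Dict[TopicPartition, int],
                 committed: Dict[TopicPartition, int]) -> Dict[TopicPartition, int]:
    """Lag = log-end offset minus the group's committed offset, per partition.

    Assumption: a partition with no commit is treated as starting at offset 0.
    """
    return {tp: end - committed.get(tp, 0) for tp, end in log_end.items()}


def under_replicated(replicas: Dict[TopicPartition, List[int]],
                     isr: Dict[TopicPartition, List[int]]) -> List[TopicPartition]:
    """Partitions whose in-sync replica set is smaller than the assigned replica set."""
    return sorted(tp for tp, assigned in replicas.items()
                  if len(isr.get(tp, [])) < len(assigned))


log_end = {("A", 0): 1500, ("A", 1): 900}
committed = {("A", 0): 1450, ("A", 1): 900}
print(consumer_lag(log_end, committed))  # {('A', 0): 50, ('A', 1): 0}

replicas = {("A", 0): [1, 2, 3], ("A", 1): [2, 3, 1]}
isr = {("A", 0): [1, 2, 3], ("A", 1): [2, 3]}
print(under_replicated(replicas, isr))   # [('A', 1)]
```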
Mirror Maker and Audit
Kafka Audit (event count)
Kafka Audit (data transport time)
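The audit slides compare event counts across pipeline tiers, for example between what producers sent and what arrived after Mirror Maker copied it to another cluster. A hedged Python sketch of the counting side, assuming each event carries its own timestamp: bucket events into fixed time windows at each tier and flag windows whose counts disagree. The ten-minute window and the function names are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

TEN_MINUTES_MS = 10 * 60 * 1000  # hypothetical audit window


def bucket_counts(timestamps_ms: Iterable[int], bucket_ms: int) -> Counter:
    """Count events per time bucket, keyed by the event's own timestamp."""
    return Counter(ts // bucket_ms for ts in timestamps_ms)


def audit(upstream: Iterable[int], downstream: Iterable[int],
          bucket_ms: int = TEN_MINUTES_MS) -> Dict[int, Tuple[int, int]]:
    """Return {bucket: (upstream count, downstream count)} for buckets that disagree."""
    up = bucket_counts(upstream, bucket_ms)
    down = bucket_counts(downstream, bucket_ms)
    return {b: (up[b], down[b]) for b in sorted(up.keys() | down.keys())
            if up[b] != down[b]}


produced = [1_000, 2_000, 650_000, 700_000]  # event timestamps seen by producers
mirrored = [1_000, 2_000, 650_000]           # the same events seen after mirroring
print(audit(produced, mirrored))             # {1: (2, 1)}
```

Bucketing by event time rather than arrival time is what makes the tiers comparable. A mismatch in the most recent window may only mean data is still in flight, so an alerting rule would reasonably wait before flagging it.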
Kafka @ LinkedIn
▪ Cluster Types
– Tracking
– Metrics
– Queuing
▪ Kafka REST
▪ Schema Registry
Kafka @ LinkedIn - Schema Registry
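The Schema Registry slide is a diagram only. A common pattern, sketched here under stated assumptions: producers register a schema once and tag every message with a compact schema ID, and consumers resolve the ID back to the schema before decoding. The registry class, the MD5-based 16-byte ID, the one-byte marker and the JSON encoding are all hypothetical choices made to stay within the Python standard library; production systems typically use Avro schemas.

```python
import hashlib
import json
from typing import Any, Dict, Tuple

MAGIC = b"\x00"  # hypothetical one-byte format marker


class SchemaRegistry:
    """In-memory stand-in for a schema registry service (hypothetical API)."""

    def __init__(self) -> None:
        self._by_id: Dict[bytes, dict] = {}

    def register(self, schema: dict) -> bytes:
        # Canonicalize so equivalent schemas always get the same ID.
        canonical = json.dumps(schema, sort_keys=True, separators=(",", ":")).encode()
        schema_id = hashlib.md5(canonical).digest()  # 16-byte ID
        self._by_id[schema_id] = schema
        return schema_id

    def lookup(self, schema_id: bytes) -> dict:
        return self._by_id[schema_id]


def encode(schema_id: bytes, record: Dict[str, Any]) -> bytes:
    """Producer side: prefix the payload with the schema ID, not the full schema."""
    return MAGIC + schema_id + json.dumps(record).encode()


def decode(registry: SchemaRegistry, message: bytes) -> Tuple[dict, Dict[str, Any]]:
    """Consumer side: fetch the schema by ID, then check the record against it."""
    if message[:1] != MAGIC:
        raise ValueError("unknown message format")
    schema = registry.lookup(message[1:17])
    record = json.loads(message[17:])
    missing = [f for f in schema["fields"] if f not in record]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    return schema, record


registry = SchemaRegistry()
schema_id = registry.register({"name": "PageViewEvent", "fields": ["memberId", "pageKey"]})
message = encode(schema_id, {"memberId": 42, "pageKey": "feed"})
schema, record = decode(registry, message)
print(schema["name"], record)  # PageViewEvent {'memberId': 42, 'pageKey': 'feed'}
```

Sending a fixed-size ID instead of the schema keeps each message small, while still letting any consumer decode data written long ago.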
Autometrics
▪ Building Blocks
– Sensors
– EventBus
– Kafka REST
– Kafka cluster
– Kafka consumer
– RRD
– Front end
▪ Facts & Figures
– 320,000,000 metrics collected per minute
– 530 TB of disk space
– Over 210,000 metrics collected per service
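320,000,000 metrics per minute is a sustained stream of roughly 5.3 million data points per second (320,000,000 / 60). The RRD building block is what keeps storage bounded at that rate: a round-robin database keeps a fixed number of slots per metric and overwrites the oldest. Below is a toy Python version in the spirit of RRDtool; the class and method names are hypothetical (not RRDtool's API), and consolidation is simply the average of the values that fall into each step.

```python
from typing import List, Optional, Tuple


class RoundRobinArchive:
    """Toy fixed-size time-series store in the spirit of RRDtool (not its API).

    Each slot covers one step_s interval; slots are reused cyclically, so
    storage stays constant and the oldest interval is overwritten.
    """

    def __init__(self, step_s: int, slots: int) -> None:
        self.step_s = step_s
        self.slots = slots
        self._start: List[Optional[int]] = [None] * slots  # interval start per slot
        self._sum = [0.0] * slots
        self._count = [0] * slots

    def update(self, ts: int, value: float) -> None:
        start = ts - ts % self.step_s
        i = (start // self.step_s) % self.slots
        current = self._start[i]
        if current is not None and start < current:
            return  # older than what the slot now holds: already aged out
        if current != start:
            # Slot is empty or holds an older interval: reuse it.
            self._start[i], self._sum[i], self._count[i] = start, 0.0, 0
        self._sum[i] += value
        self._count[i] += 1

    def fetch(self) -> List[Tuple[int, float]]:
        """(interval start, average value) for every populated slot, oldest first."""
        return sorted((s, self._sum[i] / self._count[i])
                      for i, s in enumerate(self._start) if s is not None)


rra = RoundRobinArchive(step_s=60, slots=3)  # keep three one-minute averages
for ts, value in [(0, 1.0), (30, 3.0), (60, 5.0), (120, 7.0), (180, 9.0)]:
    rra.update(ts, value)
print(rra.fetch())  # [(60, 5.0), (120, 7.0), (180, 9.0)]; minute 0 was overwritten
```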
inGraphs
Kafka for database replication - Master-slave
Kafka for database replication - Multi-master
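The two replication slides are diagrams only. The master-slave case can be sketched in Python under clear assumptions: the master publishes every committed change to a Kafka topic (modeled here as an in-memory log standing in for one partition), and each slave replays that log in offset order from its own checkpoint. All class names are hypothetical. The multi-master variant would additionally need a conflict-resolution rule, which this sketch does not attempt.

```python
from typing import Dict, List, Optional, Tuple

Change = Tuple[str, Optional[str]]  # (key, new value); None means delete


def apply_change(state: Dict[str, str], change: Change) -> None:
    key, value = change
    if value is None:
        state.pop(key, None)
    else:
        state[key] = value


class ChangeLog:
    """Stands in for a single Kafka partition: append-only and offset-addressed."""

    def __init__(self) -> None:
        self._entries: List[Change] = []

    def append(self, change: Change) -> int:
        self._entries.append(change)
        return len(self._entries) - 1

    def read_from(self, offset: int) -> List[Tuple[int, Change]]:
        return list(enumerate(self._entries[offset:], start=offset))


class Master:
    def __init__(self, log: ChangeLog) -> None:
        self.state: Dict[str, str] = {}
        self.log = log

    def write(self, key: str, value: Optional[str]) -> None:
        # Commit locally, then publish the change for replicas to consume.
        apply_change(self.state, (key, value))
        self.log.append((key, value))


class Replica:
    def __init__(self) -> None:
        self.state: Dict[str, str] = {}
        self.next_offset = 0  # checkpoint: first offset not yet applied

    def catch_up(self, log: ChangeLog) -> None:
        # Applying changes in log order makes the replica converge on the master.
        for offset, change in log.read_from(self.next_offset):
            apply_change(self.state, change)
            self.next_offset = offset + 1


log = ChangeLog()
master, replica = Master(log), Replica()
master.write("member:1", "alice")
master.write("member:2", "bob")
replica.catch_up(log)
master.write("member:1", None)  # delete
replica.catch_up(log)           # resumes from its checkpoint (offset 2)
print(replica.state, replica.next_offset)  # {'member:2': 'bob'} 3
```

A real system must also keep the local commit and the publish consistent (for example by deriving the log from the database's own commit log); the sketch simply performs them back to back.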
Have questions?
