Ideas go software
www.agilelab.it
Machine Learning
Big Data
Artificial Intelligence
Deep Learning
Training
Agile is our philosophy for transforming business ideas into real use cases.
We quickly understand the key elements of each customer’s challenge, and no challenge scares us.
We are an efficient organization entirely focused on leveraging innovative technologies to meet our customers’ objectives and time-to-market.
Our organizational handbook is public:
https://publicagilefactory.gitlab.io/handbook/
The secret of getting ahead is getting started - Mark Twain
More than 50 specialists with a deep understanding of real production environments.
We constantly invest time and effort in building our skills in new technologies and innovative analysis methodologies.
International conferences, R&D projects, smart working, welfare, work for equity…
We believe in our team and we invest in it.
Our Team
Think about the future, think about what you could do, don’t fear anything - Rita Levi Montalcini
In-memory and machine learning
Hadoop frameworks and storage formats
Change Data Capture
NoSQL, data ingestion and highly scalable applications
Deployment and resource management
Cloud PaaS deployment
Deep learning development and data visualization
Certified
CHECK OUT OUR FRAMEWORKS ON GITHUB
Check out our frameworks released for “native” data quality in Big Data environments, or our Spark Search library.
SHARING KNOWLEDGE IN MEETUPS
Don’t miss our meetups in both Milano and Torino.
Darwin: schema registry
WASP is our framework for streaming analytics at scale. It has been open source since the beginning and will keep evolving with new developments.
Kafka
• General-purpose message/event bus
• Real-time stream processing in collaboration with other tools
• Collecting user activity data
• Collecting operational metrics from applications, servers or devices
• Log aggregation
• Change Data Capture
• Commit log for distributed systems
• As distributed systems and services increasingly become part of modern architectures, the point-to-point integrations between them make for a fragile system
• As integration complexity increases, evolvability decreases
[Diagram: a Master System, a Slave System, a Reporting System and Services 1, 2 and 3 wired together by point-to-point ETL jobs]
Decoupled integration patterns increase evolvability!!!
[Diagram: the Slave System, the Reporting System and Services 1, 2 and 3, all integrated through Kafka]
Kafka is not a «master» system, but a «central» system for service and data integration: real-time and event-based.
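A back-of-the-envelope count makes the evolvability argument concrete. This is an illustrative sketch, not something from the original deck: it assumes the worst case in which every pair of systems ends up with its own point-to-point integration, compared with one connection per system to a central hub such as Kafka.

```python
def point_to_point_links(n_systems: int) -> int:
    """Worst case: every pair of systems has its own dedicated integration (ETL, API call, ...)."""
    return n_systems * (n_systems - 1) // 2


def hub_links(n_systems: int) -> int:
    """With a central hub, each system needs only one connection: to the hub itself."""
    return n_systems


if __name__ == "__main__":
    for n in (3, 5, 10, 20):
        print(f"{n:>2} systems: point-to-point={point_to_point_links(n):>3}  hub={hub_links(n):>2}")
```

The growth is quadratic versus linear: with 20 systems the worst case is 190 integrations to maintain, against 20 connections to the hub.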
Focus on:
• Simplicity
• Throughput
• Extreme scalability
Not for:
• Large batch exports
• Low latency
• Message ordering is guaranteed only within a single partition of a topic
• Scaling is achieved through partitions
• At-least-once delivery
• For a topic with replication factor N, Kafka can tolerate up to N-1 server failures without “losing” any messages committed to the log
• Messages sent by a producer to a particular topic partition are appended in the order they are sent
• A consumer instance sees messages in the order they are stored in the log
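The replication guarantee can be checked on a toy model. This is a deliberate simplification, assumed for illustration only: a message counts as committed once all N replicas hold it (every replica in sync), while leader election, the in-sync replica set and producer acknowledgement settings are ignored.

```python
from __future__ import annotations

from itertools import combinations


def replicate(message: str, replication_factor: int) -> list[set[str]]:
    """Toy model: a committed message is present in the log of all N replicas."""
    return [{message} for _ in range(replication_factor)]


def survives(replicas: list[set[str]], failed: set[int], message: str) -> bool:
    """The message is still readable if at least one surviving replica holds it."""
    return any(message in log for i, log in enumerate(replicas) if i not in failed)


if __name__ == "__main__":
    n = 3
    replicas = replicate("order-42", n)
    # Every possible combination of N-1 failed brokers still leaves one copy.
    for failed in combinations(range(n), n - 1):
        print(f"failed={failed} readable={survives(replicas, set(failed), 'order-42')}")
    # In this model, only losing all N replicas loses the message.
    print("all failed, readable =", survives(replicas, set(range(n)), "order-42"))
```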
• Topics are broken up into ordered commit logs called partitions.
• Each message in a partition is assigned a sequential id, called the offset
• Data is retained for a configurable period of time
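A partition can be modelled as an append-only list in which each new record receives the next sequential offset. The sketch below is a toy in-memory model (the class and method names are hypothetical, not Kafka’s API) that also applies time-based retention, with one simplification: real Kafka deletes whole segment files once they expire, not individual records.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class Record:
    offset: int
    timestamp: float
    value: bytes


@dataclass
class Partition:
    """Toy partition: an ordered, append-only log with sequential offsets."""
    retention_seconds: float
    records: list[Record] = field(default_factory=list)
    next_offset: int = 0

    def append(self, value: bytes, now: float | None = None) -> int:
        """Append at the end of the log and return the offset assigned to the record."""
        ts = time.time() if now is None else now
        offset = self.next_offset
        self.records.append(Record(offset, ts, value))
        self.next_offset += 1
        return offset

    def read(self, from_offset: int) -> list[Record]:
        """Return the records with offset >= from_offset, in log order."""
        return [r for r in self.records if r.offset >= from_offset]

    def enforce_retention(self, now: float) -> None:
        """Drop records older than the retention period; offsets are never reused."""
        self.records = [r for r in self.records if now - r.timestamp < self.retention_seconds]


if __name__ == "__main__":
    p = Partition(retention_seconds=60)
    for t, value in [(0, b"a"), (30, b"b"), (90, b"c")]:
        p.append(value, now=t)
    p.enforce_retention(now=100)  # "a" (age 100 s) and "b" (age 70 s) have expired
    print([(r.offset, r.value) for r in p.read(0)])  # [(2, b'c')]
    print(p.append(b"d", now=100))  # 3: offsets keep growing after retention
```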
Producer
• Producers publish to a topic of their choosing (push)
• Load can be distributed across partitions:
  • typically by “round-robin”
  • or by “semantic partitioning” based on a key in the message
• Brokers load balance by partition
• Asynchronous (less durable) sending is supported
• All nodes can answer metadata requests about:
  • which servers are alive
  • where the leaders are for the partitions of a topic
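The two distribution strategies can be sketched as two small partitioner classes. Both are hypothetical illustrations: the key-based one uses `zlib.crc32` as a stand-in hash (Kafka’s Java client hashes keys with murmur2), and recent client versions may handle keyless messages differently from the plain round-robin described on the slide.

```python
from __future__ import annotations

import itertools
import zlib


class RoundRobinPartitioner:
    """Spreads keyless messages evenly: 0, 1, ..., n-1, 0, 1, ..."""

    def __init__(self, num_partitions: int) -> None:
        self._next = itertools.cycle(range(num_partitions))

    def partition(self, key: bytes | None = None) -> int:
        return next(self._next)


class KeyHashPartitioner:
    """'Semantic' partitioning: equal keys always land on the same partition."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions

    def partition(self, key: bytes) -> int:
        # Stand-in hash for this sketch; Kafka's default partitioner uses murmur2.
        return zlib.crc32(key) % self.num_partitions


if __name__ == "__main__":
    rr = RoundRobinPartitioner(3)
    print([rr.partition() for _ in range(7)])  # [0, 1, 2, 0, 1, 2, 0]

    by_key = KeyHashPartitioner(6)
    events = [(b"user-1", "login"), (b"user-2", "login"), (b"user-1", "click"), (b"user-1", "logout")]
    partitions = {}
    for key, action in events:
        partitions.setdefault(by_key.partition(key), []).append(f"{key.decode()}:{action}")
    # All user-1 events share one partition, so their relative order is preserved.
    print(partitions)
```

This is also why ordering is guaranteed only within a partition: keying by an entity (here a user) keeps that entity’s events in order, while events for different keys may be spread over partitions that are consumed independently.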
Consumer
• Multiple consumers can read from the same topic
• Each consumer is responsible for managing its own offset
• Messages stay in Kafka: they are not removed after they are consumed
• Consumers can go away
• …and then come back
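Consumer-managed offsets, and the at-least-once delivery listed among the guarantees, can be illustrated with a toy consumer that reads a shared log and commits its position to an offset store. This is a sketch under simplifying assumptions (a single partition, a plain dict as a hypothetical offset store, invented names); it is not the Kafka consumer API.

```python
from __future__ import annotations


class ToyConsumer:
    """Reads a shared log from its own position; reading never deletes anything."""

    def __init__(self, log: list[str], offset_store: dict[str, int], group: str) -> None:
        self.log = log
        self.offset_store = offset_store  # hypothetical store that survives restarts
        self.group = group
        # Resume from the last committed offset, or from the beginning of the log.
        self.position = offset_store.get(group, 0)

    def poll(self, max_records: int) -> list[str]:
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def commit(self) -> None:
        """Persist the current position; records read after this are still 'in flight'."""
        self.offset_store[self.group] = self.position


if __name__ == "__main__":
    log = [f"m{i}" for i in range(5)]
    offsets = {}

    billing = ToyConsumer(log, offsets, "billing")
    print(billing.poll(2))  # ['m0', 'm1']: processed, then committed
    billing.commit()
    print(billing.poll(2))  # ['m2', 'm3']: the consumer goes away before committing

    restarted = ToyConsumer(log, offsets, "billing")  # ...and then comes back
    print(restarted.poll(2))  # ['m2', 'm3'] again: at-least-once, not exactly-once

    analytics = ToyConsumer(log, offsets, "analytics")  # an independent reader
    print(analytics.poll(10))  # all five messages: consuming did not remove them
```

Committing only after processing is what makes delivery at-least-once: a crash between processing and committing causes the same records to be delivered again on restart.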
Kafka: simplicity and complexity
Why is Kafka so scalable?
@LinkedIn:
100+ clusters
4K brokers
100K topics
7M partitions
81M msg/sec?!
On-prem scenario:
6 servers, each with:
• Xeon 2.5 GHz – 6 cores
• 6 x HDD 7200 rpm
• 32 GB RAM
• 1 Gb Ethernet
1 topic – 6 partitions – 3x replication – 1 producer:
• 800K msg/sec
• 133K msg/sec per node
1 consumer from a 3x replicated topic:
• 950K msg/sec
End-to-end latency:
3 ms (99th percentile)
14 ms (99.9th percentile)
Reference:
KV DB with persistence → 10K writes/sec per node
KV DB in memory → 50K writes/sec per node
Kafka is persistent... so what??
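The per-node figure follows from the cluster total, and the same arithmetic puts it next to the key-value reference numbers. The inputs below are the slide’s numbers; the last line is an extra estimate that assumes leaders and followers are spread evenly across the brokers, so each node also writes its share of the replica copies.

```python
PRODUCER_MSG_PER_SEC = 800_000   # 1 topic, 6 partitions, 3x replication, 1 producer
NODES = 6
REPLICATION_FACTOR = 3
KV_PERSISTENT_PER_NODE = 10_000  # writes/sec, KV DB with persistence
KV_IN_MEMORY_PER_NODE = 50_000   # writes/sec, KV DB in memory

per_node = PRODUCER_MSG_PER_SEC / NODES
print(f"{per_node:,.0f} msg/sec per node")                                # 133,333
print(f"{per_node / KV_PERSISTENT_PER_NODE:.1f}x the persistent KV DB")  # 13.3x
print(f"{per_node / KV_IN_MEMORY_PER_NODE:.1f}x the in-memory KV DB")    # 2.7x

# Assumption: an even spread of leaders and followers across the 6 brokers.
writes_incl_replicas = PRODUCER_MSG_PER_SEC * REPLICATION_FACTOR / NODES
print(f"{writes_incl_replicas:,.0f} msg/sec written per node including replicas")  # 400,000
```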
Where is the trick?
An incredibly complex architecture from the ’90s?
(Do you know this system?)
Why is it so fast?
It is simple!!!
• Focus on sequential I/O → no random I/O
• How is this possible with multiple writers and readers?
• Page cache
• No complex indexes (just a sparse offset index per log segment)
• A partition is just a file in append mode (in practice, a sequence of append-only segment files)
• Zero-copy: data is copied directly from the page cache to the NIC buffer through the OS sendfile system call, skipping the user-space and socket buffer copies
Writes are sequential by definition
Page cache – no disk access
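Both ideas on this slide, append-only sequential writes and zero-copy reads, can be tried with Python’s standard library. In this hedged sketch the file and helper names are made up and the newline-delimited record format is a simplification of Kafka’s real on-disk format; `socket.socket.sendfile` uses the `os.sendfile` system call where the platform supports it and otherwise falls back to an ordinary read/send loop.

```python
from __future__ import annotations

import os
import socket
import tempfile


def append_records(path: str, records: list[bytes]) -> None:
    """Append-only writes: the write position only ever moves forward (sequential I/O)."""
    with open(path, "ab") as segment:
        for record in records:
            # Real Kafka records carry offsets, lengths, CRCs...; a newline is enough here.
            segment.write(record + b"\n")


def serve_segment(path: str, conn: socket.socket, offset: int = 0) -> int:
    """Send the file to the socket; with os.sendfile the bytes never enter user space."""
    with open(path, "rb") as segment:
        return conn.sendfile(segment, offset=offset)


def recv_exactly(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes (or until EOF) from a stream socket."""
    chunks = []
    while n > 0:
        chunk = conn.recv(n)
        if not chunk:
            break
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def demo() -> list[str]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "00000000000000000000.log")  # named like a segment file
        append_records(path, [b"event-1", b"event-2"])
        append_records(path, [b"event-3"])  # later writes simply go at the end
        broker_side, consumer_side = socket.socketpair()
        with broker_side, consumer_side:
            sent = serve_segment(path, broker_side)
            received = recv_exactly(consumer_side, sent)
    return received.decode().splitlines()


if __name__ == "__main__":
    print(demo())  # ['event-1', 'event-2', 'event-3']
```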
And hide complexity
• Everything in Kafka is batched and compressed!!!
• Communication is done via a high-performance binary protocol over TCP
Fake!!!!
• Batching of data reduces network calls and also converts a lot of random writes into sequential ones.
• Batches (not individual messages) are compressed using the LZ4, Snappy or GZIP codecs.
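The benefit of compressing whole batches rather than individual messages is easy to measure. The sketch below uses the standard library’s `gzip` module (GZIP is one of the codecs listed; LZ4 and Snappy are not in the Python standard library) on hypothetical, similar-looking JSON events. Exact sizes depend on the data, so only the comparison matters.

```python
from __future__ import annotations

import gzip
import json


def make_messages(n: int) -> list[bytes]:
    """Hypothetical small, similar JSON events, like those a topic typically contains."""
    return [
        json.dumps({"user": f"user-{i % 10}", "action": "click", "page": "/home"}).encode()
        for i in range(n)
    ]


def individually_compressed_size(messages: list[bytes]) -> int:
    """Each message compressed on its own: a header per message and no shared context."""
    return sum(len(gzip.compress(m)) for m in messages)


def batch_compressed_size(messages: list[bytes]) -> int:
    """The whole batch compressed at once: repetition across messages is exploited."""
    return len(gzip.compress(b"".join(messages)))


if __name__ == "__main__":
    batch = make_messages(500)
    print("raw bytes:              ", sum(len(m) for m in batch))
    print("compressed one by one:  ", individually_compressed_size(batch))
    print("compressed as one batch:", batch_compressed_size(batch))
```

On the producer side, the related Kafka client settings are `batch.size`, `linger.ms` and `compression.type`; defaults vary by client version, so check the documentation of the client in use.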
Questions?
We are hiring
www.agilelab.it info@agilelab.it www.linkedin.com/company/agile-lab
Thank you!
