INTRODUCING APACHE KAFKA – SCALABLE, RELIABLE EVENT BUS & MESSAGE QUEUE
Maarten Smeets & Lucas Jellema
09 February 2017, Nieuwegein
AGENDA
• Introduction & overview
• Demo
• Hands-on part 1: producing and consuming messages (pub/sub)
• Dinner
• Kafka: some history, a peek under the hood, role in architecture and use cases
• Kafka and Oracle
• Hands-on part 2: more complex scenarios and some background & admin
[Diagram: Producers and Consumers]
SENDING MESSAGES TO CONSUMERS
• Dependency on producer at design time and at run time
• Deal with multiple consumers?
• Synchronous (blocking) waits
• (how to) Cross technology realms
• (how to) Cross host, location, clouds
• Availability of consumers
• Message delivery guarantees
• Scaling, high (peak) volumes
MESSAGING – TO DECOUPLE PUB AND SUB
MESSAGING AS WE KNOW IT
• JMS, Oracle Advanced Queuing, IBM MQ, MS MQ, RabbitMQ,
MQTT, XMPP, WebSockets, …
• Challenges
• Costs
• Scalability (size and speed)
• (lack of) Distribution (and therefore availability)
• Complexity of infrastructure
• Message delivery guarantees
• Lack of technology openness
• Deal with temporarily offline consumers
• Retain history
[Diagram: Producers and Consumers connected over TCP to a Topic]
KAFKA TERMINOLOGY
• Topic
• Message
• == ByteArray
• Broker
• Producer
• Consumer
[Diagram: Producers, a Topic on a Broker, Consumers; each Message has a Key, a Value and a Time]
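To make the "Message == ByteArray" point concrete, here is a minimal sketch in plain Python. It does not use a Kafka client; the `Message` class and the `to_message` helper are hypothetical names introduced only for illustration. It shows that the producer turns key and value into bytes and that a record carries a key, a value and a time.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    """Hypothetical stand-in for a Kafka record: key, value and a timestamp."""
    key: Optional[bytes]   # may be absent
    value: bytes           # the payload, opaque bytes as far as the broker is concerned
    timestamp_ms: int      # the "Time" part of the record


def to_message(key: str, value: str) -> Message:
    # Serialization happens on the producer side; only byte arrays travel on.
    return Message(key.encode("utf-8"), value.encode("utf-8"), int(time.time() * 1000))


msg = to_message("NL", '{"name": "Netherlands"}')
print(msg.key, msg.value.decode("utf-8"))
```

A consumer would apply the reverse step (here `decode`) to get back to a string or JSON document.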
CONSUMING
• Messages are available to consumers only when they have been
committed
• Kafka does not push
• Unlike JMS
• Read does not destroy
• Unlike JMS Topic
• (some) History available
• Offline consumers can catch up
• Consumers can re-consume from the past
• Delivery Guarantees
• Ordering maintained
• At-least-once (per consumer) by default; at-most-once and exactly-once can be
implemented
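The pull model and the non-destructive read can be illustrated with a toy in-memory log. This is a sketch under simplifying assumptions (one partition, no broker, no network); `TopicLog` and its methods are made-up names, not the Kafka API.

```python
class TopicLog:
    """Toy, single-partition, in-memory stand-in for a topic (not the Kafka API)."""

    def __init__(self):
        self._records = []

    def append(self, value):
        self._records.append(value)
        return len(self._records) - 1  # offset of the new record

    def poll(self, offset, max_records=10):
        # Reading never removes anything; the caller decides where to start.
        return self._records[offset:offset + max_records]


log = TopicLog()
for v in ["a", "b", "c"]:
    log.append(v)

first_read = log.poll(0)      # a consumer pulls from the start
late_consumer = log.poll(0)   # a consumer that was offline catches up later
replay = log.poll(1)          # re-consume from an earlier offset
```

Because the consumer supplies the offset, catching up and replaying are the same operation as a normal read.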
WHAT’S SO SPECIAL?
• Durable
• Scalable
• High volume
• High speed
• Available
• Distributed
• Open
• Quick start
• Free (no license costs)
HISTORY
• Up to 2010 – creation at LinkedIn
• It was designed to provide a high-performance, scalable messaging system which could handle multiple
consumers, many types of data [at high volumes and peaks], and provide for the availability & persistence
of clean, structured data […] in real time.
• 2011 – open-sourced under the Apache Incubator
• October 2012 – top-level project under the Apache Software Foundation
• 2014 – several original Kafka engineers founded Confluent
• 2016
• Introduction of Kafka Connect (0.9)
• Introduction of Kafka Streams (0.10)
• October – most recent stable release 0.10.1
• Kafka is used by many large corporations:
• Walmart, Cisco, Netflix, PayPal, LinkedIn, eBay, Spotify, Uber, Sift Science
• And embraced by many software vendors & cloud providers
USE CASES
• Messaging & Queuing
• Handle fast data (IoT, social media, web clicks, infra metrics, …)
• Receive and save – low latency, high volume
• Log aggregation
• Event Sourcing and Commit Log
• Stream processing
• Single enterprise event backbone
• Connect business processes, applications, microservices
PLAYS NICE WITH & ARCHITECTURE
SOME NUMBERS
KAFKA INCARNATIONS
• Kafka Docker Images
• Confluent (Spotify, Wurstmeister)
• Cloud:
• CloudKarafka
• IBM BlueMix Message Hub
• AWS supports Kafka (but tries to propose Amazon Kinesis Streams)
• Google runs Kafka (though tries to push Google Pub/Sub)
• Bitnami VMs for many cloud providers such as Azure, GCP, AWS, OPC
• Kafka Connectors in many platforms
• Azure IoT Hub, Google Pub/Sub, Mule AnyPoint Connector, …
• Oracle ….
KAFKA ECO SYSTEM
• Confluent
• Open Source: Native Clients, Camus (link to Hadoop), REST Proxy, Schema Registry
• Enterprise: Kafka Ops Dashboard/Control Center, Auto Data Balancing, Multi-Datacenter Replication
• Community
• Connectors
• Client libraries
• …
KAFKA CONNECT
• Kafka Connect is a framework for connectors (aka adapters) that
provide bridges for
• Producing from specific technologies
to Kafka
• Consuming from Kafka to specific
technologies
• For example:
• JDBC
• Hadoop
KAFKA CONNECT – CONNECTORS
KAFKA STREAMS
• Real Time Event [Stream] Processing integrated into Kafka
• Aggregations & Top-N
• Time Windows
• Continuous Queries
• Latest State (event sourcing)
• Turn Stream (of changes) into Table
(of most recent or current state)
• Part of the state can be quite old
• A Kafka Streams client will have state
in memory
• Always to be recreated from topic partition
log files
• Note: Kafka Streams is relatively new
• Only support for Java clients
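The "turn stream into table" idea can be sketched without Kafka Streams itself, which at the time offered a Java API only. The plain-Python function below is an illustration, not that API; treating a `None` value as a delete is an assumption made for this example.

```python
def to_table(changes):
    """Fold a stream of (key, value) changes into the latest value per key."""
    table = {}
    for key, value in changes:
        if value is None:
            table.pop(key, None)   # assumed convention: None removes the key
        else:
            table[key] = value
    return table


# Illustrative change stream; keys and numbers are made up.
changelog = [("NL", 16), ("BE", 11), ("NL", 17), ("BE", None)]
state = to_table(changelog)
rebuilt = to_table(changelog)   # replaying the same log yields the same state
```

Replaying the full change stream always leads to the same table, which is why in-memory state can be rebuilt from the log.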
KAFKA STREAMS
[Diagram: processing pipeline with the stages Topic, Filter, Aggregate, Join (with a second Topic), Map (Xform), Publish, Topic]
EXAMPLE OF KAFKA STREAMS
• Source: countries2.csv, sent by a Producer to a Topic on the Broker
• CountryMessage: Continent, Name, Population, Size
• SelectKey: set Continent as key
• AggregateByKey: update Top 3 biggest countries
• Join with Topic: total area for each continent
• Map (Xform): size in square miles, % of entire continent
• Publish as JSON to Topic: Top3CountrySizePerContinent
• Simpler variant: SelectKey, AggregateByKey, Publish to Topic, Print
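The SelectKey and AggregateByKey steps of this example can be imitated in a few lines of plain Python. This is a sketch of the logic only, not Kafka Streams code; the field names and the sample rows are invented for illustration and do not come from countries2.csv.

```python
import json
from collections import defaultdict


def top3_per_continent(countries):
    """Yield (continent, json) after every input record, like a running aggregate."""
    top = defaultdict(list)
    for country in countries:
        continent = country["continent"]                    # SelectKey
        ranked = top[continent] + [country]
        ranked.sort(key=lambda c: c["size"], reverse=True)
        top[continent] = ranked[:3]                         # AggregateByKey
        yield continent, json.dumps([c["name"] for c in top[continent]])


# Invented sample rows: hypothetical countries A..E with arbitrary sizes.
sample = [
    {"continent": "Europe", "name": "A", "population": 1, "size": 10},
    {"continent": "Europe", "name": "B", "population": 1, "size": 30},
    {"continent": "Asia", "name": "E", "population": 1, "size": 50},
    {"continent": "Europe", "name": "C", "population": 1, "size": 20},
    {"continent": "Europe", "name": "D", "population": 1, "size": 25},
]

updates = list(top3_per_continent(sample))
latest = dict(updates)   # the last update per continent wins
```

Every incoming record produces an updated top 3 for its continent; keeping only the last update per key gives the current state.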
PARTITIONS
• Topics are configured with a number of partitions
• Storage, serialization, replication, availability, order guarantee are all at
partition level
• Each partition is an ordered, immutable sequence of records that is
continually appended to
• Producer can specify the destination
partition to write to
• Alternatively the partition is determined from
the message key or simply by load balancing
• Multiple partitions can be written to at
the same time
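The three ways of picking a partition can be sketched as follows. This is an illustration in plain Python: `choose_partition` is a hypothetical helper, and `zlib.crc32` stands in for whatever hash a real client uses.

```python
import itertools
import zlib

NUM_PARTITIONS = 3
_round_robin = itertools.count()


def choose_partition(key=None, partition=None, num_partitions=NUM_PARTITIONS):
    if partition is not None:      # 1. explicit choice by the producer
        return partition
    if key is not None:            # 2. derived from the message key
        return zlib.crc32(key) % num_partitions
    return next(_round_robin) % num_partitions   # 3. simple load balancing


same_key = {choose_partition(key=b"Europe") for _ in range(5)}
spread = [choose_partition() for _ in range(6)]
```

Hashing the key sends all messages with the same key to the same partition, which is what makes per-key ordering possible.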
PRODUCING MESSAGES
• The producer sets the partition for each message
• Note: it should talk to the broker that is the leader for that partition
• Messages can be produced one-by-one or in batches
• Batches balance latency vs throughput
• A single request can carry batches for different topics & partitions
• Messages can be compressed
• Producers can configure the required acknowledgement level (from the broker)
• None (no waiting for the leader to complete)
• Wait for the leader to commit [to file log]
• Wait for all in-sync replicas to complete
• Note: messages are serialized to byte array
as the wire format
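As a rough illustration, the options above map onto a handful of producer settings. The sketch below only builds a configuration dictionary and does not connect to anything. The property names (`acks`, `batch.size`, `linger.ms`, `compression.type`, `bootstrap.servers`) are standard Kafka producer settings; the broker address, the default values and the `producer_config` helper are assumptions for this example.

```python
# Acknowledgement levels from the slide mapped to the producer's `acks` setting.
ACK_LEVELS = {
    "none": "0",     # do not wait for the leader
    "leader": "1",   # wait until the leader has written to its log
    "all": "all",    # wait for all in-sync replicas
}


def producer_config(ack_level, batch_bytes=16384, linger_ms=5, compression="gzip"):
    return {
        "bootstrap.servers": "localhost:9092",  # assumed local broker address
        "acks": ACK_LEVELS[ack_level],
        "batch.size": batch_bytes,       # larger batches favour throughput
        "linger.ms": linger_ms,          # extra wait to fill a batch; adds latency
        "compression.type": compression,
    }


config = producer_config("all")
```

Raising `linger.ms` and `batch.size` trades latency for throughput; `acks` trades latency for durability.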
CONSUMING
• A consumer pulls from a Topic
• Consuming can be done in parallel to producing
• And many consumers can consume at the same time
• Each consumer has a Message Offset per partition
• That can be different across consumers
• That can be adjusted at any time
• Delivery Guarantees
• At least once (per consumer) by default; adjust offset when all messages have been processed
• At-most-once and exactly-once can be implemented (for example: maintain offset in the same
transaction that processes the messages)
• Message Retention
• Time Based (at least for … time)
• Size Based (log files can be no larger than … MB/GB/TB)
• Key based aka Log Compaction (retain at least the latest
message for each primary key value)
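The difference between at-least-once and at-most-once comes down to whether the offset is committed after or before processing. The toy simulation below is a sketch in plain Python with no Kafka client; `run` is a hypothetical function that crashes once, between the two steps, at a chosen offset.

```python
def run(records, commit_first, crash_at):
    """Toy consumer that crashes once between its two steps, then restarts."""
    committed = 0     # offset to resume from after a restart
    processed = []    # side effects that actually happened
    crashed = False
    while committed < len(records):
        offset = committed
        steps = ["commit", "process"] if commit_first else ["process", "commit"]
        for i, step in enumerate(steps):
            if step == "commit":
                committed = offset + 1
            else:
                processed.append(records[offset])
            if i == 0 and offset == crash_at and not crashed:
                crashed = True
                break   # crash: restart from the last committed offset
    return processed


at_least_once = run(["a", "b", "c"], commit_first=False, crash_at=1)
at_most_once = run(["a", "b", "c"], commit_first=True, crash_at=1)
```

Committing after processing repeats "b" after the crash (a duplicate); committing before processing skips it (a loss). Exactly-once needs the offset and the processing result to be stored together, as the slide suggests.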
CONSUMER GROUPS FOR PARALLEL
MESSAGE PROCESSING
• Multiple consumers can be in the same Consumer Group
• They collaborate on processing messages from a Topic (horizontal
scalability)
• Each Consumer in the Group receives
messages from a different partition
• Messages are delivered to
only one consumer in the group
• Consumers outside the Consumer Group can
pull from the same Topic & Partition
• And process the same messages
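How partitions are divided within a group, and why a second group sees everything again, can be sketched like this. The round-robin `assign` function is an illustrative simplification and not the assignment strategy of any particular client.

```python
def assign(partitions, consumers):
    """Spread partitions over the consumers of one group (round-robin sketch)."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


group_a = assign([0, 1, 2, 3], ["a1", "a2"])
group_b = assign([0, 1, 2, 3], ["b1"])         # an independent group
crowded = assign([0, 1], ["c1", "c2", "c3"])  # more consumers than partitions
```

Within a group every partition has one owner, so a message is processed once per group. In this sketch a consumer beyond the number of partitions gets nothing to do, which suggests the partition count caps the useful parallelism of a group.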
CLUSTER – RELIABLE, SCALABLE
• A cluster consists of multiple brokers,
possibly on multiple server nodes
• Each node runs
• Apache ZooKeeper to keep track of brokers, topics and partitions
• One or more Kafka Brokers
• Each with their own set of storage logs
• Each partition lives on one or more
brokers (and sets of logs)
• Defined through topic replication factor
• One is the leader, the others are follower
replicas
• Clients communicate about a partition with the broker
that contains the leader replica for that partition
• Changes are committed by the leader, then
replicated across the followers
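The relation between replication factor, leader and followers can be sketched with a toy placement function. The placement rule used here (consecutive brokers, first one acts as leader) is an assumption for illustration and not Kafka's actual assignment algorithm.

```python
def place_replicas(num_partitions, replication_factor, brokers):
    """Toy placement: the first broker listed per partition acts as leader."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    layout = {}
    for p in range(num_partitions):
        replicas = [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        layout[p] = {"leader": replicas[0], "followers": replicas[1:]}
    return layout


layout = place_replicas(num_partitions=4, replication_factor=3, brokers=[1, 2, 3, 4])
```

With four brokers and a replication factor of three, every partition lives on three brokers, and leadership is spread so that each broker leads one partition.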
[Diagram: four Brokers, each holding Partitions of a Topic]
CLUSTER – RELIABLE, SCALABLE (2)
• ZooKeeper has list of all brokers
and a list of all topics and partitions
(with leader and ISR)
• Leader has list of all alive followers (in-sync replicas or ISR)
• Follower replicas consume messages from the leader to synchronize
• Similar to normal message consumers
• Note: message producers requesting full acknowledgement will get an ack once all in-sync follower replicas have consumed the message
• With N in-sync replicas, N-1 can fail without loss of committed messages
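The full-acknowledgement rule can be illustrated with a toy leader that tracks how far each in-sync follower has fetched. This is a simplified sketch, not Kafka's replication protocol; the `Partition` class and the broker names are hypothetical.

```python
class Partition:
    """Toy leader with in-sync followers; not Kafka's replication protocol."""

    def __init__(self, followers):
        self.log = []                              # the leader's log
        self.fetched = {f: 0 for f in followers}   # record count each follower holds

    def append(self, value):
        self.log.append(value)
        return len(self.log) - 1

    def follower_fetch(self, follower):
        # Followers pull from the leader, much like ordinary consumers.
        self.fetched[follower] = len(self.log)

    def fully_acknowledged(self, offset):
        # Full acknowledgement: every in-sync follower holds the record.
        return all(count > offset for count in self.fetched.values())


p = Partition(["broker2", "broker3"])
offset = p.append("m0")
before = p.fully_acknowledged(offset)
p.follower_fetch("broker2")
partial = p.fully_acknowledged(offset)
p.follower_fetch("broker3")
after = p.fully_acknowledged(offset)
```

Only after both followers have fetched the record does the producer get its full acknowledgement; from then on the record exists on three brokers.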
ORACLE AND KAFKA
• On premises
• Service Bus Kafka transport (demo!)
• Stream Analytics Kafka Adapter (demo!)
• GoldenGate for Big Data handler for Kafka
• Data Integrator (coming soon)
• Cloud
• Elastic Big Data & Streaming platform
• Event Hub (coming soon)
GOLDENGATE FOR BIG DATA
DATA INTEGRATOR
ELASTIC BIG DATA & STREAMING PLATFORM
EVENT HUB
HANDS ON PART 2
• Continue part 1
• Java and/or Node consuming/producing
• Some Admin & advanced stuff
• Partitions
• Multiple producers, multiple consumers
• New consumer, go back in time
• Expiration of messages
• Multi-broker, Cluster configuration, ZooKeeper
• Resources: https://github.com/MaartenSmeets/kafka-workshop
• Blog: technology.amis.nl
On Oracle, Cloud, SQL, PL/SQL, Java, JavaScript, Continuous Delivery, SOA, BPM & more
• Email: maarten.smeets@amis.nl , lucas.jellema@amis.nl
• Twitter: @MaartenSmeetsNL, @lucasjellema
• smeetsm, lucas-jellema
• www.amis.nl, info@amis.nl, +31 306016000
• Edisonbaan 15, Nieuwegein


AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message Queue


Editor's Notes

  • #22: http://stackoverflow.com/questions/35861501/kafka-in-docker-not-working Docker images from Confluent: https://hub.docker.com/r/confluent/kafka/
  • #23: http://docs.confluent.io/2.0.0/platform.html
  • #24: http://docs.confluent.io/2.0.0/platform.html https://www.confluent.io/blog/apache-kafka-getting-started/