Real-time stream processing with Apache Kafka
Juhi
Praveen Singh Bor
Business is driven by streams of events
Old World of data
Evolution of processing data
New World
DB/DWH + distributed systems
Monolithic Application -> Microservices
Batch -> Real-time
How Organizations Handle Data Flow
Real-time Distributed Streaming Platform
Publish + Subscribe
Store
Process
Terminologies
● Message (Event) / Batches
● Topic and Partition
● Kafka Brokers & Cluster
● Producer and Consumer
● Zookeeper
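To make these terms concrete, here is a minimal in-memory sketch of a topic with partitions, per-partition offsets, and a consumer that remembers its position. It does not use a Kafka client. The class names and the CRC32-based partitioner are assumptions made for illustration (Kafka's default partitioner uses a different hash function).

```python
import zlib
from collections import defaultdict


class Topic:
    """In-memory stand-in for a topic: a fixed number of append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Hypothetical partitioner: CRC32 of the key bytes modulo the partition count.
        # The same key always lands in the same partition.
        return zlib.crc32(key) % len(self.partitions)

    def append(self, key, value):
        """Append a message and return its (partition, offset)."""
        p = self.partition_for(key)
        log = self.partitions[p]
        log.append((key, value))
        return p, len(log) - 1  # the offset is the position within this partition

    def read(self, partition, offset):
        """Return all messages in `partition` starting at `offset`."""
        return self.partitions[partition][offset:]


class Consumer:
    """Tracks the next offset to read for each partition."""

    def __init__(self, topic):
        self.topic = topic
        self.positions = defaultdict(int)

    def poll(self, partition):
        batch = self.topic.read(partition, self.positions[partition])
        self.positions[partition] += len(batch)  # remember where we stopped
        return batch
```

Because the consumer stores only an offset per partition, it can stop and later resume from the same place, and ordering holds within a partition but not across partitions.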
Problem statement
When a customer uses a credit card to do a transaction, the vendor needs a fast
response to the question, “Is it a fraudulent payment?”
Real time stream processing
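Fraud Detection (Is the payment fraudulent? YES/NO)
[Diagram: payment events (Payment 1, Payment 2, Payment 1001, Payment 10002) pass through a Kafka cluster (Broker 1, Broker 2), with Kafka Connect linking an external system; each payment receives a YES or NO verdict.]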
Is a payment fraudulent?
● Analysis and forensics on historical data to build the machine learning models.
● Use machine learning models to predict fraud on live streams.
○ Card velocity
○ Average spending in the last 60 mins > 10 * historical average spending per 60 mins
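The two rules above could be sketched as a small per-card state object. This is one possible reading of the slide, not the presenters' implementation: the velocity limit of 5 transactions per window, the class name, and the way the historical hourly average is computed are all assumptions.

```python
from collections import deque

WINDOW_SECONDS = 60 * 60   # "last 60 mins" from the slide
SPEND_FACTOR = 10          # "> 10 *" from the slide
MAX_TXNS_PER_WINDOW = 5    # hypothetical card-velocity limit, not from the slides


class CardState:
    """Per-card state needed to evaluate both rules."""

    def __init__(self, first_seen):
        self.first_seen = first_seen  # timestamp (seconds) of the card's first use
        self.total_spent = 0.0        # lifetime spending before the current transaction
        self.recent = deque()         # (timestamp, amount) pairs inside the window

    def check(self, ts, amount):
        """Return True if the transaction at time `ts` (seconds) looks suspicious."""
        # Drop transactions that fell out of the 60-minute window.
        while self.recent and self.recent[0][0] <= ts - WINDOW_SECONDS:
            self.recent.popleft()
        self.recent.append((ts, amount))
        recent_spend = sum(a for _, a in self.recent)

        # Assumed definition: lifetime spend divided by hours since first use.
        hours_seen = max((ts - self.first_seen) / WINDOW_SECONDS, 1.0)
        hourly_average = self.total_spent / hours_seen

        too_fast = len(self.recent) > MAX_TXNS_PER_WINDOW
        too_much = hourly_average > 0 and recent_spend > SPEND_FACTOR * hourly_average

        self.total_spent += amount
        return too_fast or too_much
```

A card with no spending history never trips the spending rule in this sketch, which is a deliberate simplification.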
Problem statement : Fraud detection
● POS Transaction Data (Live Stream)
● User Information
● User Transaction History
● Fraud Location Estimator
Let’s build a real-time fraud detection system for credit cards
[Diagram: Step 1, Step 2, and Step 3 of the pipeline, with the Customer Profile and the Fraud Detector.]
Kafka Core API
1. Producer API
2. Consumer API
3. Connect API
4. Streams API
Step 1: Produce messages using Producer API
POS_TRANSACTION_TOPIC
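As a rough illustration of what a producer hands to Kafka in this step, the sketch below turns a transaction into a topic name plus key and value byte arrays. It deliberately stops short of calling any client library. The field names (`card_id`, `amount`, `merchant`) and the choice of JSON are assumptions, not part of the slides.

```python
import json

TOPIC = "POS_TRANSACTION_TOPIC"  # topic name from the slide


def to_record(txn):
    """Turn a transaction dict into the (topic, key, value) triple a producer would send."""
    # Keying by card id keeps all transactions of one card in the same partition.
    key = txn["card_id"].encode("utf-8")
    # Kafka treats the value as opaque bytes; JSON is just one possible encoding.
    value = json.dumps(txn, sort_keys=True).encode("utf-8")
    return TOPIC, key, value
```

Usage: `to_record({"card_id": "card-42", "amount": 19.99, "merchant": "M1"})` yields the topic name, the key bytes `b"card-42"`, and a JSON-encoded value.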
Step 2: Capture data from an external data source
[Diagram: Kafka Connect captures the Customer Profile source into CUSTOMER_RECORD_TOPIC, alongside POS_TRANSACTION_TOPIC.]
Streaming Platform Overview
[Diagram: a Kafka cluster (Broker 1, Broker 2) with two Kafka Connect instances, each linked to an external system.]
Key concepts
Duality of Stream and Table
Stream (each record is an update):
Key  Value
X    20
X    25
Z    9
Y    5
Y    50
Table (latest value per key):
Key  Value
X    25
Y    50
Z    9
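The duality can be sketched in a few lines: replaying a stream of updates and keeping the last value per key yields the table, and emitting each table row as a record yields a stream again. The function names are illustrative only, and the data is the X/Y/Z example from the slide.

```python
def to_table(stream):
    """Replay a stream of (key, value) updates; the last value per key wins."""
    table = {}
    for key, value in stream:
        table[key] = value
    return table


def to_changelog(table):
    """View a table as a stream: one record per current row."""
    return list(table.items())


# Records from the slide, in stream order.
stream = [("X", 20), ("X", 25), ("Z", 9), ("Y", 5), ("Y", 50)]
table = to_table(stream)
```

Replaying the changelog of `table` through `to_table` gives back the same table, which is the sense in which the two views are interchangeable.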
Key concepts
Processor Topology
Stream Processor
Stream
How to use Kafka Streams API?
Just three steps
1. Create one or more streams from Kafka topic(s).
2. Compose transformations on these streams.
3. Write transformed streams back to Kafka.
Creating source streams from Kafka
● Input topics to KStream
○ Each app instance gets a subset of the partitions of the input streams.
○ Specify the serializer and deserializer.
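How partitions might be spread across app instances can be sketched with a simple round-robin split. This is an assumption for illustration. The actual assignment in Kafka is decided by its own assignor and may differ.

```python
def assign_partitions(num_partitions, num_instances):
    """Round-robin split of partition ids across app instances (illustrative only)."""
    assignment = {i: [] for i in range(num_instances)}
    for p in range(num_partitions):
        assignment[p % num_instances].append(p)
    return assignment
```

With more instances than partitions, some instances receive no partitions in this sketch, so the partition count caps the useful parallelism.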
Transform a Stream
● Stateless transformation
○ Don’t require state for processing.
○ Don’t require state store with stream processor.
○ E.g. Branch, Filter, Inverse Filter, FlatMap, Peek, Map etc
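A minimal sketch of a few of these stateless operations on plain Python iterables follows. The function names are hypothetical stand-ins, not the Kafka Streams API. The point is that each record is handled on its own, with nothing remembered between records.

```python
def stream_filter(records, predicate):
    """Keep only the records that satisfy the predicate."""
    return (r for r in records if predicate(r))


def stream_map(records, fn):
    """Transform each record into exactly one new record."""
    return (fn(r) for r in records)


def stream_flat_map(records, fn):
    """Transform each record into zero or more new records."""
    return (out for r in records for out in fn(r))


def branch(records, *predicates):
    """Route each record to the first predicate it matches; unmatched records are dropped."""
    outputs = [[] for _ in predicates]
    for record in records:
        for i, predicate in enumerate(predicates):
            if predicate(record):
                outputs[i].append(record)
                break
    return outputs


# Hypothetical (card_id, amount) records.
txns = [("card-1", 20.0), ("card-2", 950.0), ("card-1", 5.0)]
small, large = branch(txns, lambda r: r[1] < 100, lambda r: True)
```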
Transform a Stream
● Stateful transformation
○ Depends on state for processing inputs and producing outputs.
○ Require a state store with stream processor.
○ State stores are fault tolerant.
■ Aggregating
■ Joining
■ Windowing
■ Applying custom processors and transformers
Aggregating
○ Group the records with either groupByKey or groupBy.
○ KGroupedStream or KGroupedTable can be aggregated via operations like reduce.
○ Aggregation can be performed on windowed or non-windowed data.
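The grouping and reducing described above can be sketched in memory, once without windows and once with tumbling windows. The dictionaries play the role of the state store. Function names and record layout are assumptions, and this is not the Kafka Streams API.

```python
from collections import defaultdict


def group_and_reduce(records, reducer):
    """Non-windowed: fold all values that share a key into one result per key."""
    state = {}
    for key, value in records:
        state[key] = value if key not in state else reducer(state[key], value)
    return state


def windowed_sum(records, window_seconds):
    """Tumbling windows over (key, timestamp, amount) records, timestamps in seconds."""
    state = defaultdict(float)
    for key, ts, amount in records:
        window_start = ts - ts % window_seconds  # start of the window containing ts
        state[(key, window_start)] += amount
    return dict(state)
```

In the windowed version the state is keyed by both the record key and the window start, so the same card produces one total per hour rather than one running total.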
Aggregating
Joins
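For the fraud example, a join would typically enrich each transaction with the cardholder's profile. The sketch below models a stream-table inner join in memory, under the assumption that the profile table was built from CUSTOMER_RECORD_TOPIC. The field names are hypothetical.

```python
def enrich(transactions, profiles):
    """Inner join: transactions without a matching profile are dropped."""
    for card_id, txn in transactions:
        profile = profiles.get(card_id)  # table lookup by the record key
        if profile is not None:
            yield card_id, {**txn, **profile}


# Hypothetical data: one known card, one unknown card.
profiles = {"card-1": {"home_country": "IN"}}
txns = [
    ("card-1", {"amount": 10.0, "country": "IN"}),
    ("card-9", {"amount": 5.0, "country": "US"}),
]
enriched = list(enrich(txns, profiles))
```

Both sides are looked up by the same key, which is why joined inputs need to be keyed (and partitioned) consistently.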
Stream Partitions and Tasks
State Store

Editor's Notes

  • #3: In a way, every business generates streams of events. Retail has streams of orders and shipments, finance has streams of stock tickers, Bitcoin exchanges have streams of exchange rates, and websites have streams of impressions and clicks. Business is becoming more digital, and unbounded, unordered, large-scale data sets are increasingly common in day-to-day business. Every application generates data in the form of user clicks, logs, or transactions. Every byte of data has a story to tell: a single click on Amazon, behind the scenes, determines which item you would like to see next. So the data that applications generate can be thought of as streams of events.
  • #4: Business is becoming more digital. Every application generates data in the form of user clicks, logs, or transactions. Every byte has a story to tell: a single click on Amazon, behind the scenes, determines which item you would like to see next. So the data that applications generate can be thought of as streams of events.
  • #6: Request/response, batch processing, real-time processing.
  • #8: To catch up with the need to process data as it arrives, companies have implemented data pipelines like this. It is very messy. There are applications that talk to each other using some kind of messaging queue, and custom ETL scripts written to move data between sources and destinations. This ad hoc fashion of connecting sources and destinations to build real-time processing applications is pretty chaotic.
  • #9: In this talk we will see how Apache Kafka cleans up the mess by providing a distributed streaming platform. The idea is to have Kafka as the central nerve of your system, with the ability to collect data from a variety of sources and make it available in real time and at large scale to any number of destinations as it arrives.
  • #10: Here is how you go about building a streaming platform.
  • #11: Kafka as a messaging system: it acts as a publish-subscribe system where publishers publish messages and consumers read messages from the server.
  • #12: It is not limited to being a pub-sub system. It is also a storage system that stores the streams of data, with persistence and strict ordering. Data written to Kafka is written to disk and replicated for fault tolerance. You can think of Kafka as a kind of special-purpose distributed filesystem dedicated to high-performance, low-latency commit log storage, replication, and propagation. Distributed by design: replication, fault tolerance, partitioning, elastic scaling, and the scalability of file systems.
  • #13: It isn't enough to just read, write, and store streams of data; the purpose is to enable real-time processing of streams. In Kafka, a stream processor is anything that takes continuous streams of data from input topics, performs some processing on this input, and produces continuous streams of data to output topics.
  • #14: The unit of data is a message, which has a key and data and is like a record in a database. To Kafka it is just a byte array with no inherent meaning. A message can carry an optional piece of metadata, referred to as the key. For efficiency, messages are written in batches; a batch is just a collection of messages. Batch size is a trade-off between latency and throughput.
  • #16: Messages are categorized into topics; the closest analogy is a database table or a folder. Topics are broken down into partitions, which provide redundancy and scalability. Each partition can be hosted on a different server, so a single topic can be scaled horizontally across servers for performance. Ordering of messages is not guaranteed across multiple partitions, but it is maintained within a single partition. Offset: another piece of metadata, an integer that continuously increases. Kafka adds the offset to each message as it is produced, and it is unique within a single partition.
  • #18: Broker: A single Kafka server is called a broker. The broker receives messages from producers, assigns offsets to them, and commits the messages to storage on disk. It also services consumers, responding to fetch requests for partitions and responding with the messages that have been committed to disk. Cluster: Kafka brokers are designed to operate as part of a cluster.
  • #19: Producer: creates new messages and publishes them to a Kafka topic. Consumer: subscribes to one or more topics and reads messages in the order in which they were produced. It keeps track of which messages have already been consumed by storing the offset of the last consumed message, so a consumer can stop and restart without losing its place.
  • #20: Apache Kafka uses ZooKeeper to store metadata about the Kafka cluster, as well as consumer client details.
  • #25: I think we have covered enough theory, so let’s build our own simple credit card fraud detection system. This is a very basic example. The idea is that whenever a cardholder uses a card, a transaction event is generated.
  • #26: (1) Ingest the transaction stream into Kafka from the web application using the Kafka Producer API. (2) Capture cardholder information from an external data source using Kafka Connect. (3) Process the stream for fraud detection using the Kafka Streams API.