© Cloudera, Inc. All rights reserved.
Real-time analytics with Kafka and Spark Streaming
Ashish Singh | Software Engineer, Cloudera
It’s Real-Time time
Why now? Complex Event Processing (CEP) is not a new concept.
It’s Real-Time time
Emergence of Real-Time Stream Processing
Exponential growth in continuous data streams
Open Source tools for reliable high-throughput low latency event queuing and processing
Tools run on “Commodity” Hardware
Why now? Complex Event Processing (CEP) is not a new concept.
It’s happening! …Across Industries
Credit Card & Monetary Transactions
Identify fraudulent transactions as soon as they occur.
Transportation & Logistics
• Real-time traffic conditions
• Tracking fleet and cargo locations and dynamic re-routing to meet SLAs
Retail
• Real-time in-store Offers and Recommendations.
• Email and marketing campaigns based on real-time social trends
Consumer Internet, Mobile & E-Commerce
Optimize user engagement based on user’s current behavior. Deliver recommendations relevant “in the moment”
Healthcare
Continuously monitor patient vital stats and proactively identify at-risk patients.
Manufacturing
• Identify equipment failures and react instantly
• Perform proactive maintenance.
• Identify product quality defects immediately to prevent resource wastage.
Security & Surveillance
Identify threats and intrusions, both digital and physical, in real-time.
Digital Advertising & Marketing
Optimize and personalize digital ads based on real-time information.
Canonical Stream Processing Architecture
[Diagram, built up over slides 5 to 7: Data Sources, then Kafka and Flume, then a processing stage with the operations listed below]
Filter
Enrich
Transform
Stats on Sliding Windows
Stream Joins
Feature Engineering
Predictive Analytics
Active Model Training
...
And combinations of the above
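The first four operations on this slide (Filter, Enrich, Transform, Stats on Sliding Windows) can be illustrated with a minimal plain-Python sketch. It is a stand-in for the idea only and does not use the Spark Streaming API. The `Event` fields, the `USER_COUNTRY` lookup table, the positive-amount filter, and the conversion to cents are all hypothetical choices made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from statistics import mean
from typing import Deque, Dict, Iterable, Iterator, Tuple


@dataclass
class Event:
    # All three fields are hypothetical, chosen only for this sketch.
    timestamp: float  # seconds
    user_id: str
    amount: float


# Hypothetical reference data used by the "enrich" step.
USER_COUNTRY: Dict[str, str] = {"u1": "US", "u2": "IN"}


def process(events: Iterable[Event], window_seconds: float) -> Iterator[dict]:
    """Filter, enrich, and transform each event, then attach the mean
    over a sliding time window that ends at the current event."""
    window: Deque[Tuple[float, int]] = deque()  # (timestamp, amount in cents)
    for ev in events:
        # Filter: drop events with a non-positive amount.
        if ev.amount <= 0:
            continue
        # Enrich: attach metadata from the lookup table.
        country = USER_COUNTRY.get(ev.user_id, "unknown")
        # Transform: convert the amount to integer cents.
        cents = round(ev.amount * 100)
        # Sliding window: add the new entry, evict entries that are
        # window_seconds old or older relative to the current event.
        window.append((ev.timestamp, cents))
        while window and window[0][0] <= ev.timestamp - window_seconds:
            window.popleft()
        yield {
            "user_id": ev.user_id,
            "country": country,
            "amount_cents": cents,
            "window_count": len(window),
            "window_mean_cents": mean([c for _, c in window]),
        }
```

The sketch assumes events arrive in timestamp order. A real stream processor also has to deal with late and out-of-order data, which is left out here.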
Canonical Stream Processing Architecture
[Diagram, built up over slides 8 to 14 on top of Data Sources, Kafka and Flume: HDFS is added on slide 8, NoSql on slide 9, a second Kafka stage on slide 13, followed by "..." on slide 14. Slides 10 to 12 add no new text labels.]
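The second Kafka stage on slides 13 and 14 suggests that processed events go back onto a queue, where several downstream consumers can read them independently. The following is a minimal sketch of that idea in plain Python, not the Kafka client API. `Topic`, `run_stage`, `flag_large`, the consumer names, and the threshold of 100 are all hypothetical.

```python
from typing import Callable, Dict, List, Optional


class Topic:
    """In-memory stand-in for a queue topic: an append-only list plus
    one read offset per named consumer."""

    def __init__(self) -> None:
        self._log: List[dict] = []
        self._offsets: Dict[str, int] = {}

    def produce(self, message: dict) -> None:
        self._log.append(message)

    def poll(self, consumer: str) -> List[dict]:
        """Return the messages this consumer has not yet seen and
        advance its offset to the end of the log."""
        start = self._offsets.get(consumer, 0)
        batch = self._log[start:]
        self._offsets[consumer] = len(self._log)
        return batch


def run_stage(source: Topic, sink: Topic, consumer: str,
              fn: Callable[[dict], Optional[dict]]) -> None:
    """One processing stage: read new messages from source, apply fn,
    and write every non-None result to sink."""
    for message in source.poll(consumer):
        result = fn(message)
        if result is not None:
            sink.produce(result)


def flag_large(message: dict) -> Optional[dict]:
    # Hypothetical rule: readings above 100 become alert events.
    if message["value"] > 100:
        return {"alert": "high_value", "source": message}
    return None
```

Because each consumer keeps its own offset, an alerting application and a dashboard can both read the same alert topic without interfering with each other, and a further stage can treat that topic as its input.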
Too much?
Example application to demonstrate how real time analytics can be done using Kafka and Spark Streaming
Pankh
https://github.com/SinghAsDev/pankh
Pankh – Building Pieces
[Diagram, built up over slides 17 to 21: Data Sources, then Kafka, then NoSql]
Demo Time
Kappa Architecture
Demo Time
Thank you
Ashish Singh
asingh@cloudera.com
@singhasdev

Editor's Notes

  • #2: Canonical architecture of real-time stream processing
  • #3: It is easy to get carried away, because real-time sounds cool. But it is totally fair if you are skeptical about the business value of real-time stream processing. After all, this is not a new concept. Traditional enterprise software vendors have long provided tools for real-time data processing, formerly known as “Complex Event Processing” (CEP) systems. In fact, even now the traditional vendors refer to their real-time stream processing tools as CEP systems. If CEP systems have been around, but never really took off, are we just seeing the
  • #6: Let's look at an end-to-end architecture for putting together open source tools to do real-time stream processing. Let's start with the sources of data.
  • #7: You want to write this data to a reliable, high-throughput, low-latency messaging system. Kafka and Flume are popular choices, but there are many options out there, like ActiveMQ, RabbitMQ, etc. Kafka is the system that is gaining the most popularity right now.
  • #8: A stream processing system like Spark Streaming can then read your data streams from the messaging system. There you can filter the data, enrich or embellish it with relevant metadata, transform it, compute statistics over moving windows of time, do feature engineering and predictive analytics, and much more.
  • #9: Almost always, you want to take your full-fidelity raw data and put it in HDFS, or in an object store if you are running in the cloud. The raw data can then be used in batch jobs where you may want to do deep, complex processing that cannot be done in a streaming fashion. Or you may have a team of data scientists who want to explore the data and uncover new insights. Why the dotted line? How you dump your data to HDFS depends on your messaging system. Almost all messaging systems provide a way to transfer your data to HDFS.
  • #10: All this real-time processing is great, but not very useful if you cannot serve the processed data to your application in real time. You need a system that can handle a lot of fast reads and writes. That is where NoSQL stores come in. There are many choices here. HBase, Cassandra and MongoDB are popular choices. All those end applications. Also, for most stream processing engine and NoSQL store pairs, there are libraries available that make it easy to read from or write to your NoSQL store from the stream processing engine: for example, the SparkOnHbase library makes it easy to write to HBase from Spark Streaming jobs.
  • #11: Another common scenario is indexing your data, in real time, into a search system. This is great if the data you are dealing with is textual. There are libraries that enable real-time indexing of your data in your stream processing engine, and writing it to a search engine.
  • #12: Now the data is ready to be queried by your application. This is a very common and popular architecture, and I am guessing this is in keeping with what most of you would have expected.
  • #13: Again, write your processed output to HDFS. Again, why the dotted arrow? Whether or not you need to dump data to HDFS depends on your serving system of choice. If you write it to HBase, you may not need to duplicate it in HDFS. But if you are indexing the data in search or writing to a system like Redis, you may want to also write the processed output to HDFS. Why? If nothing else, for auditing purposes. Errors will happen, and you may need to go back and audit what was done in your stream processing engine. Hence, put the data in HDFS and keep it there for some amount of time.
  • #14: With this architecture, the real-time processed data only gets leveraged when the next application query comes in. But often you want to take some action based on the real-time analysis of your data. For proactive actions, write relevant events out to Kafka. Again, depending on your stream processing engine, you will find libraries that make this easy. You can have an application that is continuously listening on your event queue and can issue alerts, emails, etc.
  • #15: By writing it to a message queue, you enable multiple downstream applications to consume the data as it is produced, including further processing of your data with a stream processing engine. Such multi-stage architectures, where you consume from say Kafka, process the data, produce a new stream in Kafka, and process
  • #26: For moments like these, streaming systems provide the capability to rewind, or at least they should.
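
The rewind capability mentioned in the last note can be sketched in plain Python as a consumer that seeks back to an earlier offset in an append-only log and reprocesses from there with corrected logic. This is an illustration under stated assumptions, not the Kafka or Spark Streaming API: `ReplayableLog`, `Consumer`, and their methods are hypothetical names, and the log lives in memory.

```python
from typing import Callable, List


class ReplayableLog:
    """Append-only in-memory log addressed by integer offset."""

    def __init__(self) -> None:
        self._records: List[int] = []

    def append(self, record: int) -> int:
        """Store a record and return the offset it was written at."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int) -> List[int]:
        return self._records[offset:]


class Consumer:
    """Applies fn to each record in order, producing one result per record."""

    def __init__(self, log: ReplayableLog, fn: Callable[[int], int]) -> None:
        self.log = log
        self.fn = fn
        self.offset = 0
        self.results: List[int] = []

    def poll(self) -> None:
        """Process every record from the current offset to the end of the log."""
        for record in self.log.read_from(self.offset):
            self.results.append(self.fn(record))
            self.offset += 1

    def rewind(self, offset: int, fn: Callable[[int], int]) -> None:
        """Seek back to offset, discard results derived from later records,
        and swap in new processing logic for the replay."""
        self.results = self.results[:offset]
        self.offset = offset
        self.fn = fn
```

The design relies on the log keeping old records around. If the queue has already discarded the records before the requested offset, there is nothing to replay, so retention has to cover the period you may need to rewind over.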