© 2020 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International
Cooperative (“KPMG International”), a Swiss entity. All rights reserved.
August 13, 2022 at DataConLA
Real Time Data Streaming with Kafka
Speaker:
Jie Chen
Manager Advisory
Engineering Architect
Agenda
Kafka at a Glance (5 min)
Kafka Use Cases (20 min)
  Distributed Data with CQRS
  Intelligent Forecast System
  Kafka in Banking
Key Takeaways (5 min)
Q&A (10 min)
Kafka at a Glance
Kafka in the Market
CORE CAPABILITIES
Scalable
Scale production clusters up to a thousand brokers, trillions of messages per day, petabytes of data, and hundreds of thousands of partitions. Elastically expand and contract storage and processing.
High Throughput
Deliver messages at network-limited throughput using a cluster of machines, with latencies as low as 2 ms
Permanent Storage
Store streams of data safely in a
distributed, durable, fault tolerant cluster
High Availability
Stretch clusters efficiently over availability
zones or connect separate clusters across
geographic regions
Source: kafka.apache.org
Kafka Platform Overview
Event Streaming Platform
Distributed streaming platform that enables real-time, event-driven applications using a topic-based pub-sub model
Performance at Scale
Kafka operates as a highly available and fault-tolerant cluster that spans servers and even data centers, with a partitioning system that supports data volumes of practically any size
Source: https://docs.confluent.io/
What is Event Driven Streaming with Kafka
[Diagram: Producers (ETL, raw message queue, change data capture, mainframe, custom sources) publish events to a Kafka cluster of brokers (servers), where each topic is split into partitions. Consumers (web, mobile, data warehouse, monitoring tool, partners) subscribe to topics and drain the data.]
An event is a type of data that describes an entity's observable state updates over time (definition by IBM).
For example: a first-time user registration, a payment, or a social media post.
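The producer, topic, partition, and consumer roles on this slide can be sketched with a small in-memory model. This is an illustration only, not Kafka client code: the `Topic` and `Consumer` classes, the topic name, and the CRC32-based partition choice are assumptions made for the example (Kafka's own default partitioner uses a different hash).

```python
import zlib
from collections import defaultdict


class Topic:
    """In-memory stand-in for a Kafka topic: a fixed set of append-only partitions."""

    def __init__(self, name, num_partitions=3):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Events with the same key always land in the same partition,
        # which preserves ordering per key.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append((key, value))
        return index


class Consumer:
    """Each consumer tracks its own read position (offset) per partition."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)

    def poll(self):
        # Return every event this consumer has not seen yet.
        # Reading does not remove anything from the topic.
        batch = []
        for index, partition in enumerate(self.topic.partitions):
            start = self.offsets[index]
            batch.extend(partition[start:])
            self.offsets[index] = len(partition)
        return batch


topic = Topic("user-events")  # hypothetical topic name
topic.publish("user-1", "registered")
topic.publish("user-2", "registered")
topic.publish("user-1", "payment")

web = Consumer(topic)
warehouse = Consumer(topic)
web_events = web.poll()
warehouse_events = warehouse.poll()
```

Both consumers receive all three events because each keeps its own offsets, which is what lets independent systems subscribe to the same topic.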
Distributed Data with
CQRS and Kafka
Distributed Data with CQRS and Kafka - CQRS at a Glance
Overview
Command Query
Responsibility Segregation
Read and write workloads are
separated, decoupled, and
scaled independently.
Event Sourcing
CQRS is often linked with event
sourcing – Effectively viewing
data state as a series of discrete
events.
Event sourcing is an approach to handling operations on data that is driven by a sequence of events, each of which is recorded in an append-only store (definition by Microsoft). For example, placing an online order and later returning it under the same user account are recorded as two separate events.
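A minimal sketch of the event sourcing idea, using the order example from this slide. The `OrderEvent` and `EventStore` names and the "placed"/"returned" event kinds are hypothetical; the point is only that state is derived by replaying an append-only sequence rather than by updating a record in place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderEvent:
    order_id: str
    kind: str  # "placed" or "returned" (hypothetical event kinds)


class EventStore:
    """Append-only store: events are added, never updated or deleted."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def events(self):
        # Return an immutable snapshot so callers cannot rewrite history.
        return tuple(self._events)


def current_status(events, order_id):
    """Derive the current state of one order by replaying its events in order."""
    status = None
    for event in events:
        if event.order_id == order_id:
            status = event.kind
    return status


store = EventStore()
store.append(OrderEvent("order-42", "placed"))
store.append(OrderEvent("order-43", "placed"))
store.append(OrderEvent("order-42", "returned"))
```

Calling `current_status(store.events(), "order-42")` replays the two events for that order and yields the latest one, while the full history remains available for auditing.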
Distributed Data with CQRS and Kafka - Traditional Design
Difficult to Scale
SOR must be able to support the load
of all clients and systems. Read
replicas can improve scalability.
Single Point of Failure
If SOR or API layer is unavailable, all
consumers may be affected
Rigid
All access to SOR data flows through
centralized APIs. Consumers receive
data in the schemas set up by access
layer.
Difficult to Manipulate Data
Data access to SOR directly is
restricted. Transforms, joins, and
analytical operations may be difficult
and rely on lagging ETL operations
Client: external-facing UI, third-party APIs
System: internal-facing ETL, mainframe
SOR: System of Record (the authoritative data source)
Distributed Data with CQRS and Kafka - CQRS Design
Data Changes as Events
Current state of SOR is captured
through an event format
Consumers Subscribe to Changes
Consumers listen to data event
changes and consume the information
according to their own use case
Other Systems Act on Data
Systems act on data updates as
defined by use case. Systems may
replicate the data, enrich the data, or
simply process events in real-time
Read / Write Separation
Data read is segregated from data
write. Read only consumers introduce
no additional load to SOR.
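The read/write separation described on this slide might look like the following sketch. It assumes a plain Python list as a stand-in for the Kafka topic, and the `SystemOfRecord` and `BalanceView` classes and account names are invented for illustration.

```python
class SystemOfRecord:
    """Write side: owns the authoritative data and emits one change event per write."""

    def __init__(self, change_log):
        self.balances = {}
        self.change_log = change_log

    def deposit(self, account, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balances[account] = self.balances.get(account, 0) + amount
        # Publish the new state as an event instead of letting readers query the SOR.
        self.change_log.append({"account": account, "balance": self.balances[account]})


class BalanceView:
    """Read side: builds its own copy from change events and never queries the SOR."""

    def __init__(self):
        self.balances = {}
        self.offset = 0

    def catch_up(self, change_log):
        # Apply only the events that arrived since the last catch-up.
        for event in change_log[self.offset:]:
            self.balances[event["account"]] = event["balance"]
        self.offset = len(change_log)

    def get(self, account):
        return self.balances.get(account)


change_log = []  # stands in for a Kafka topic
sor = SystemOfRecord(change_log)
view = BalanceView()

sor.deposit("acct-1", 100)
sor.deposit("acct-1", 50)
before = view.get("acct-1")  # the view has not consumed any events yet
view.catch_up(change_log)
after = view.get("acct-1")
```

Here `before` is empty and `after` reflects both deposits, which shows the trade-off in miniature: reads add no load to the SOR, but they only become current once the events have been consumed.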
Distributed Data with CQRS and Kafka - Advantages and Challenges
Advantages
Independent Scaling
Read and write workloads may be scaled independently based on load and access patterns
Separation of Concerns
Segregated models allow for tightly controlled write logic while permitting flexibility in read models and stream processing
System Isolation
Access to the SOR database is restricted to a controlled write API. Consumers may safely read from a replica
Flexible Consumption
Kafka’s scalable architecture allows for consumers to process events differently across systems at different velocities
Challenges
Eventual Consistency
Reads will be eventually consistent and may have some delay until writes have propagated through the system
Complexity
Implementation of the pattern increases complexity of the overall solution
Different Data Velocity
Consumers may process events at different velocities, resulting in inconsistencies across systems
Distributed Data with CQRS and Kafka - Common Scenarios
Complex Data Operations Across
Systems
Different systems need to
process and transform data in
complex and evolving use cases.
Real-time Data Processing
Across Systems
Traditional ETL and batch operations are too slow and rigid to meet evolving business requirements. The organization seeks to process data in real time as it becomes available across different systems.
Resource Bottlenecks with
Growing Demand
Traditional data system
resources are strained and
unable to support growing
demands of business.
Scenarios to Consider
Data Security Concerns Across
Systems
Data must be shared securely
across systems without
introducing new security risks.
Increased Demand for Data
Sharing Across Enterprise
The enterprise seeks to break down data silos and share data effectively across the organization to increase synergy between systems.
Intelligent Forecast
with Kafka
Intelligent Forecast with Native Kafka Solution
[Diagram: Producers (ETL, raw message queue, change data capture, mainframe, custom sources) publish to the Kafka cluster. Through the Kafka Connector API, the ELK Stack subscribes as a consumer and drains the data into Elasticsearch for storage and indexing.]
Kafka's role in this solution is to publish data from the different channels as categorized topics. Through Kafka connector APIs (connector replicators), the ELK Stack subscribes to the specified topics. This pub/sub model is also called event streaming. The customized data can then be rendered through a Kibana dashboard.
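As one possible illustration of the connector step, the sketch below builds a Kafka Connect payload for an Elasticsearch sink. The slides do not say which connector was used, so the choice of the Confluent Elasticsearch sink connector is an assumption, as are the connector name, topic names, and Elasticsearch URL.

```python
import json


def elasticsearch_sink_config(name, topics, elasticsearch_url):
    """Build a Kafka Connect connector definition for an Elasticsearch sink.

    Assumes the Confluent Elasticsearch sink connector is installed on the
    Connect workers; the slides do not name a specific connector.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
            "tasks.max": "1",
            # Topics the sink subscribes to, as a comma-separated list.
            "topics": ",".join(topics),
            "connection.url": elasticsearch_url,
            # Use topic+partition+offset as the document ID instead of the record key.
            "key.ignore": "true",
            # Let Elasticsearch infer mappings instead of using record schemas.
            "schema.ignore": "true",
        },
    }


payload = elasticsearch_sink_config(
    name="forecast-es-sink",                        # hypothetical connector name
    topics=["atm-transactions", "mobile-orders"],   # hypothetical topics
    elasticsearch_url="http://localhost:9200",      # hypothetical endpoint
)
body = json.dumps(payload, indent=2)
```

A payload like `body` would typically be submitted to the Kafka Connect REST API's `/connectors` endpoint; the exact settings needed depend on the connector version and the cluster's security setup.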
Challenges
Kafka Connector
Similar open source solutions such as MirrorMaker, uReplicator (Uber), and Mirus (Salesforce) can be alternatives that tackle the scalability bottleneck while reducing licensing cost.
PII encryption
Whether you choose a Kafka security library or an in-house solution, it is important to establish PII governance early among producers, the Kafka cluster, and consumers. In other words, decide who is responsible for masking sensitive data throughout the real-time data streaming pipeline.
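One way the masking responsibility could sit with the producer is sketched below: sensitive fields are replaced with a keyed hash before the event is published. The field names, the key, and the `mask_pii` helper are hypothetical. Note that this is one-way pseudonymization rather than reversible encryption, so it only fits cases where consumers never need the original values.

```python
import hashlib
import hmac

# Hypothetical list of fields that the governance policy classifies as PII.
PII_FIELDS = {"name", "card_number"}


def mask_pii(event, secret_key):
    """Return a copy of the event with PII fields replaced by an HMAC-SHA256 digest."""
    masked = {}
    for field, value in event.items():
        if field in PII_FIELDS:
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            masked[field] = digest.hexdigest()
        else:
            masked[field] = value
    return masked


# Demo key only; a real key would come from a secrets manager.
key = b"demo-key-do-not-use-in-production"
event = {"name": "Jane Doe", "card_number": "4111111111111111", "amount": 25.0}
masked = mask_pii(event, key)
```

Because the digest is deterministic for a given key, downstream consumers can still join or count by the masked value without ever seeing the raw data.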
Intelligent Forecast with Native Kafka Solution
Key Design
Pub/Sub, decoupled
and asynchronous
messaging service
for scalability
Equivalent Solutions
Azure Event Hubs
Google Pub/Sub
AWS Kinesis
Proactive Analytics
Detect and forecast abnormal trends that fall outside a threshold, for example transaction fraud at ATMs and restaurant mobile orders.
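The threshold idea can be sketched as a simple rolling-window check over a stream of values. This is a stand-in technique chosen for illustration, not the forecasting model from the talk; the window size, the three-standard-deviation threshold, and the sample amounts are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


def detect_outliers(values, window=5, num_stdevs=3.0):
    """Flag values farther than num_stdevs standard deviations from the
    mean of the preceding `window` values."""
    recent = deque(maxlen=window)
    flagged = []
    for index, value in enumerate(values):
        # Only start checking once a full window of history is available.
        if len(recent) == window:
            center = mean(recent)
            spread = pstdev(recent)
            if abs(value - center) > num_stdevs * spread:
                flagged.append((index, value))
        recent.append(value)
    return flagged


# Hypothetical transaction amounts arriving from a topic.
amounts = [20, 22, 19, 21, 20, 21, 500, 20, 22]
flagged = detect_outliers(amounts)
```

In this sample only the 500 transaction is flagged. A production detector would likely exclude flagged values from the window and account for seasonality, which this sketch does not attempt.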
Kafka in Banking
What a Banking Institute Needs to Modernize Its Legacy System
A banking institute has been looking to migrate its legacy system to modern technologies that can keep up with fast-growing big data demand, by building a modern data streaming platform as part of its Business Operation Brain (BOB).
Reuse the existing data centers, storage, infrastructure, and security procedures
Scalable and reliable (million transactions/events per second) with the existing infrastructure
Data must be logged for transaction tracing and auditing (for example, Change Data Capture)
What Options We Have: Kafka and Its Comparables in the Marketplace
Not an all-inclusive list of options
AWS Kinesis
Open Source, On Prem Managed Cloud Computing
Proprietary
Open Source, On Prem or Managed
Cloud Computing
Identify KPIs When Evaluating the Options
KPIs: Operation Cost, Messaging, Storage, Scalability, Security
Apache Kafka
Messaging: Immutable, ordered, replayable; user-defined retention policy
Storage: Persistent storage offers durability and reliability; append log
Scalability: Horizontal scale (scale out), adding more machines to increase disk I/O
Security: Customized, manual configuration
RabbitMQ
Messaging: Queue/message index attached with TTL; messages are removed once consumed
Storage: Messages are removed once consumed; in-memory is preferred
Scalability: Vertical scale (scale up), adding more CPU and RAM to the existing machine/hardware
Security: Customized, manual configuration
AWS Kinesis
Operation Cost: Pay as you go; elastic and durable
Retention: Up to 365 days
Scalability: Autoscaling
Security: Native cloud solution
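The messaging difference in this comparison, retained and replayable events versus messages removed once consumed, can be sketched with two small in-memory classes. Both classes are illustrative stand-ins and do not use either product's client API.

```python
from collections import deque


class RetainedLog:
    """Log-style messaging: reads do not remove events, so a reader can replay them."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read_from(self, offset):
        return list(self._events[offset:])


class ConsumeOnceQueue:
    """Queue-style messaging: a message is gone once it has been consumed."""

    def __init__(self):
        self._messages = deque()

    def put(self, message):
        self._messages.append(message)

    def get(self):
        return self._messages.popleft() if self._messages else None


log = RetainedLog()
queue = ConsumeOnceQueue()
for payment in ["p1", "p2", "p3"]:
    log.append(payment)
    queue.put(payment)

first_pass = log.read_from(0)
replay = log.read_from(0)      # same events again, for example for an audit
drained = [queue.get() for _ in range(3)]
after_drain = queue.get()      # nothing is left to re-read
```

The replay returns the same three events as the first pass, while the queue is empty after one pass. That is why a retained log suits the transaction tracing and auditing requirement listed for the banking use case.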
Key Takeaways
Key Takeaways
CQRS Pattern with Kafka
Use the scale, speed, and reliability of Kafka as the backbone for an eventually consistent distributed data solution that allows flexible consumption models and independent scaling.
Kafka in Banking
Objectively select the metrics for the
business use case. Design the data
streaming solution that is ready to
scale.
Intelligent Forecast with Kafka
To reap the scalability benefit, design the Kafka connector solution for future business growth. PII must be encrypted throughout the Kafka pipeline, and that encryption should be automated.
Q&A




Data Con LA 2022 - Data Streaming with Kafka

  • 1. 1 © 2020 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved. August 13, 2022 at DataConLA Real Time Data Streaming with Kafka Speaker: Jie Chen Manager Advisory Engineering Architect LinkedIn
  • 2. 2 © 2020 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved. Agenda Kafka at a Glance Kafka Use Cases Key Takeaways Q&A Intelligent Forecast System Kafka in Banking Distributed Data with CQRS 5 min 20 min 5 min 10 min
  • 3. 3 © 2020 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved. Kafka at a Glance
  • 4. 4 © 2020 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved. Kafka in the Market CORE CAPABILITIES Scalable Scale production clusters up to a thousand brokers, trillion of messages per day, petabytes of data, hundreds of thousands of partitions. Elastically expand and contract storage and processing. High Throughput Deliver messages at network limited throughput using a cluster of machines with latencies as low as 2ms Permanent Storage Store streams of data safely in a distributed, durable, fault tolerant cluster High Availability Stretch clusters efficiently over availability zones or connect separate clusters across geographic regions Source: kafka.apache.org
  • 5. 5 © 2020 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved. Kafka Platform Overview Event Streaming Platform Distributed streaming platform that enables real-time, event- driven applications using a topic- based pub-sub model Performance at Scale Kafka operates as a highly- available and fault-tolerant cluster that spans servers and even data centers with a partitioning system that supports data volumes of practically any size https://docs.confluent.io/
  • 6. 6 © 2020 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved. What is Event Driven Streaming with Kafka ETL Raw Message Queue Change Data Capture Mainframe Customed Topic Partition Partition Partition Brokers (Servers) Web Mobile Data Warehouse Monitor Tool Partners Subscribing Publishing Data Draining Producers Consumers Kafka Cluster An event is a type of data that describes the entity’s observable state updates over time (Definition by IBM) For example, first time user registration, payment, social media post etc.
  • 7. 7 © 2020 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved. Distributed Data with CQRS and Kafka
  • 8. 8 © 2020 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved. Distributed Data with CQRS and Kafka - CQRS at a Glance Overview Command Query Responsibility Segregation Read and write workloads are separated, decoupled, and scaled independently. Event Sourcing CQRS is often linked with event sourcing – Effectively viewing data state as a series of discrete events. Event Sourcing is an approach to handling operations on data that's driven by a sequence of events, each of which is recorded in an append-only store (Defined by Microsoft). For example, placing an online order, returning the order under the same user account.
  • 9. 9 © 2020 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved. Distributed Data with CQRS and Kafka - Traditional Design Difficult to Scale SOR must be able to support the load of all clients and systems. Read replicas can improve scalability. Single Point of Failure If SOR or API layer is unavailable, all consumers may be affected Rigid All access to SOR data flows through centralized APIs. Consumers receive data in the schemas set up by access layer. Difficult to Manipulate Data Data access to SOR directly is restricted. Transforms, joins, and analytical operations may be difficult and rely on lagging ETL operations Client: external facing UI, third party apis System: internal facing ETL, mainframe SOR: System of Record (the authoritative data source)
  • 10. 10 © 2020 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved. Distributed Data with CQRS and Kafka - CQRS Design Data Changes as Events Current state of SOR is captured through an event format Consumer Subscribe to Changes Consumers listen to data event changes and consume the information according to their own use case Other Systems Act on Data Systems act on data updates as defined by use case. Systems may replicate the data, enrich the data, or simply process events in real-time Read / Write Separation Data read is segregated from data write. Read only consumers introduce no additional load to SOR.
  • 11. 11 © 2020 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved. Distributed Data with CQRS and Kafka Advantages and Challenges Independent Scaling Read and write workloads may be scaled independently based on load and access patterns Separation of Concerns Segregated models allow for tightly controlled write logic while permitting flexibility in read models and stream processing System Isolation Access to the SOR database is restricted to a controlled write API. Consumers may safely read from a replica Flexible Consumption Kafka’s scalable architecture allows for consumers to process events differently across systems at different velocities Eventual Consistency Reads will be eventually consistent and may have some delay until writes have propagated through the system Complexity Implementation of the pattern increases complexity of the overall solution Different Data Velocity Consumers may process events at different velocities, resulting in inconsistencies across systems Advantages Challenges
  • 12. 12 © 2020 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved. Distributed Data with CQRS and Kafka - Common Scenarios Complex Data Operations Across Systems Different systems need to process and transform data in complex and evolving use cases. Real-time Data Processing Across Systems Traditional ETL and batch operations are too slow and rigid to meet evolving business requirements. Organization seeks to process data in real- time as it becomes available across different systems. Resource Bottlenecks with Growing Demand Traditional data system resources are strained and unable to support growing demands of business. Scenarios to Consider Data Security Concerns Across Systems Data must be shared securely across systems without introducing new security risks. Increased Demand for Data Sharing Across Enterprise Enterprise seeks to break down data silos and share data effectively across the organization increase synergy between systems.
  • 13. 13 © 2020 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved. Intelligent Forecast with Kafka
  • 14. 14 © 2020 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved. Intelligent Forecast with Native Kafka Solution ELK Stack Elasticsearch Storage Kafka Connector API Indexing ETL Raw Message Queue Change Data Capture Mainframe Customed Producers Consumers Kafka Cluster Publishing Subscribing Data Draining Kafka's role in this solution is to publish the data from the different channels as the categorized topics; Through Kafka connector APIs (connector replicators), the ELK Stack subscribe to the specified topics. This Pub/Sub is also called event streaming. The customized data can then be rendered through Kibana dashboard.
  • 15. Intelligent Forecast with Native Kafka Solution. Key Design: Pub/Sub, a decoupled and asynchronous messaging service for scalability. Equivalent Solutions: Azure Event Hubs, Google Pub/Sub, AWS Kinesis. Challenges: Kafka Connector: similar open source solutions such as MirrorMaker, uReplicator by Uber, and Mirus by Salesforce can be alternatives that tackle the scalability bottleneck while reducing licensing cost. PII encryption: when weighing a Kafka security library against an in-house solution, it is important to establish PII governance early among producers, the Kafka cluster, and consumers; in other words, to decide who is responsible for masking the sensitive data throughout the real-time data streaming pipeline. Proactive Analytics: use cases such as detecting and forecasting abnormal trends outside a threshold, for example transaction fraud at ATMs and restaurant mobile orders.
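One simple way to detect values outside a threshold, as in the proactive analytics use case, is a rolling check against recent history. The sketch below is an assumption for illustration, not the method used in the talk: `detect_outliers`, the window size, the multiplier `k`, and the sample data are all hypothetical. It flags a value when it lies more than `k` sample standard deviations from the mean of the preceding window.

```python
from collections import deque
from statistics import mean, stdev


def detect_outliers(values, window=5, k=3.0):
    """Return indices of values far from the mean of the preceding window."""
    recent = deque(maxlen=window)   # sliding window of the most recent values
    flagged = []
    for index, value in enumerate(values):
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            # Skip the check when the window is constant (sigma == 0).
            if sigma > 0 and abs(value - mu) > k * sigma:
                flagged.append(index)
        recent.append(value)
    return flagged


# Hypothetical stream of ATM withdrawals per minute with one spike.
withdrawals_per_minute = [10, 12, 11, 9, 10, 11, 60, 10, 12]
flagged = detect_outliers(withdrawals_per_minute)
```

In a streaming setup the same check would run inside a consumer, one event at a time. Note that the spike itself enters the window afterwards and widens the threshold for the next few values, which a production detector would need to handle.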
  • 16. Kafka in Banking
  • 17. What a Banking Institution Needs to Modernize Its Legacy System. A banking institution has been looking to migrate its legacy system to modern technologies that keep pace with fast-growing demand in big data, by building a modern data streaming platform as part of its Business Operation Brain (BOB). Requirements: reuse the existing data centers, storage, infrastructure, and security procedures; be scalable and reliable (a million transactions/events per second) on the existing infrastructure; log data for transaction tracing and auditing (for example, Change Data Capture).
  • 18. What options we have: Kafka and its comparables in the marketplace (not an exhaustive list of options).
  • 19. Identify KPIs When Evaluating the Options. Options compared: Apache Kafka, RabbitMQ, AWS Kinesis. Operation Cost: Apache Kafka and RabbitMQ are open source, run on-prem or as managed cloud computing; AWS Kinesis is proprietary managed cloud computing, pay as you go, elastic and durable. Messaging: Kafka messages are immutable, ordered, and replayable, with a user-defined retention policy; RabbitMQ attaches a TTL to the queue/message index, and messages are removed once consumed. Storage: Kafka's persistent append log offers durability and reliability; in RabbitMQ messages are removed once consumed and in-memory storage is preferred; AWS Kinesis retains data for up to 365 days. Scalability: Kafka scales horizontally (scale out) by adding more machines to increase disk I/O; RabbitMQ scales vertically (scale up) by adding more CPU and RAM to the existing machine. Autoscaling and Security: customized, manual configuration for Kafka and RabbitMQ; native cloud solution for AWS Kinesis.
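The messaging and storage rows contrast two delivery models: a retained log that can be replayed until a retention policy prunes it, and a queue whose messages disappear once consumed. The sketch below illustrates only that difference under simplifying assumptions; `RetainedLog` is a hypothetical class, timestamps are plain integers in seconds, and neither part uses the Kafka or RabbitMQ client libraries.

```python
from collections import deque


class RetainedLog:
    """Append-only log: reads never delete; only retention prunes old records."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.records = []   # list of (timestamp, value) in arrival order

    def append(self, timestamp, value):
        self.records.append((timestamp, value))

    def replay(self):
        return [value for _, value in self.records]

    def enforce_retention(self, now):
        cutoff = now - self.retention_seconds
        self.records = [(ts, v) for ts, v in self.records if ts >= cutoff]


log = RetainedLog(retention_seconds=60)
queue = deque()   # queue model: consuming removes the message

for timestamp, value in [(0, "a"), (30, "b"), (90, "c")]:
    log.append(timestamp, value)
    queue.append(value)

first_read = log.replay()
second_read = log.replay()            # same records again: the log is replayable
consumed = [queue.popleft() for _ in range(len(queue))]   # queue is now empty

log.enforce_retention(now=100)        # drops records older than 60 seconds
after_retention = log.replay()
```

The replayable log is what makes audit and tracing requirements easier to meet, at the cost of storing data for the whole retention period.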
  • 20. Key Takeaways
  • 21. Key Takeaways. CQRS Pattern with Kafka: use the scale, speed, and reliability of Kafka as the backbone for eventually consistent distributed data solutions that allow flexible consumption models and independent scaling. Kafka in Banking: objectively select the metrics for the business use case, and design a data streaming solution that is ready to scale. Intelligent Forecast with Kafka: to reap the scalability benefit, design the Kafka connector solution for future business growth; PII protection must be applied throughout the Kafka pipeline and automated.
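One way to automate PII protection is to mask sensitive fields at the producer, before a record ever reaches a topic. The sketch below is an assumption for illustration, not the approach from the talk: it uses keyed hashing (HMAC-SHA256) from the Python standard library, which is pseudonymization rather than reversible encryption, and `SENSITIVE_FIELDS`, `mask_record`, and the demo key are hypothetical.

```python
import hashlib
import hmac

# Hypothetical list of fields that the governance policy treats as PII.
SENSITIVE_FIELDS = {"card_number", "ssn"}


def mask_record(record, key):
    """Return a copy of the record with sensitive fields replaced by keyed hashes."""
    masked = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS:
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            masked[field] = digest.hexdigest()
        else:
            masked[field] = value
    return masked


# Demo key only; a real pipeline would load the key from a secrets manager.
key = b"demo-key-not-for-production"
event = {"card_number": "4111111111111111", "amount": 25.0}
masked_event = mask_record(event, key)
```

Because the hash is deterministic for a given key, downstream consumers can still join or count by the masked value without seeing the original. If consumers need the original value back, field-level encryption would be required instead.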
  • 22. Q&A