Using Hazelcast as the Serving Layer in the
Kappa Architecture
Presented by Oliver Buckley-Salmon
June 1st 2017
Twitter: @SalmonOliver
GitHub: https://github.com/oliversalmon
LinkedIn: Oliver Buckley-Salmon
Introduction
• Many industries have a combined need to view and process big and fast data
• Previously, tools such as Hadoop could process large data sets but only at high latency, while stream processing systems processed small amounts of data very fast
• More recently, new architectures have been proposed that combine the two to provide a single solution for big and fast data. Two of the best known are described below
• Lambda Architecture
  • Nathan Marz coined the term Lambda Architecture (LA) for a generic, scalable and fault-tolerant data processing architecture, based on his experience working on distributed data processing systems at BackType and Twitter
  • The LA aims to provide a robust system that is fault-tolerant against both hardware failures and human mistakes, that can serve a wide range of workloads and use cases, and that supports low-latency reads and updates
  • The resulting system should be linearly scalable, and it should scale out rather than up
• Kappa Architecture
  • Kappa Architecture is a simplification of Lambda Architecture
  • A Kappa Architecture system is like a Lambda Architecture system with the batch processing system removed
  • To replace batch processing, data is simply fed through the streaming system quickly
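The Kappa idea of replacing batch processing with a replay through the stream processor can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the in-memory LOG list stands in for a Kafka topic, and build_view, balance_v1 and deposits_only_v2 are hypothetical names chosen for the example.

```python
from collections import defaultdict

# Hypothetical append-only event log standing in for a Kafka topic.
LOG = [
    {"offset": 0, "account": "A", "amount": 100},
    {"offset": 1, "account": "B", "amount": 50},
    {"offset": 2, "account": "A", "amount": -30},
]


def build_view(log, update, from_offset=0):
    """Fold every event at or after from_offset into a fresh serving view."""
    view = defaultdict(int)
    for event in log:
        if event["offset"] >= from_offset:
            update(view, event)
    return dict(view)


def balance_v1(view, event):
    # Original stream logic: running balance per account.
    view[event["account"]] += event["amount"]


def deposits_only_v2(view, event):
    # Changed stream logic: only deposits are counted.
    if event["amount"] > 0:
        view[event["account"]] += event["amount"]


# Normal operation: the view is the result of streaming the log once.
live_view = build_view(LOG, balance_v1)

# "Batch" reprocessing in Kappa: replay the same log through the new logic.
reprocessed_view = build_view(LOG, deposits_only_v2)
```

There is only one code path: the same fold that maintains the live view also rebuilds it, so no separate batch job has to be kept in step with the stream job.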
Lambda Architecture Overview
Benefits & Challenges with Lambda Architecture
Benefits
• Real-time view & analysis of latest data
• Support for historical data queries & analytics
• Horizontally scalable speed layer
• Horizontally scalable batch layer
• Allows use of the Hadoop ecosystem for batch processing & analytics
• Fault tolerant
Challenges
• Synchronisation between the Speed & Batch layers
• Analytics only, not operational/transactional
• Two separate subsystems for microservices to read from, depending on life cycle
• Heavy focus on HDFS / storage / format optimization
• Unpredictable latency of the batch layer
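The synchronisation challenge between the speed and batch layers can be illustrated with a small sketch. It assumes a page-view counting use case; BATCH_VIEW, BATCH_HIGH_WATERMARK, speed_view and query are hypothetical names, and the data is made up for the example.

```python
# Hypothetical batch view: counts produced by the last batch run,
# which covered every event up to and including offset 2.
BATCH_VIEW = {"home": 2, "pricing": 1}
BATCH_HIGH_WATERMARK = 2

# Events seen by the speed layer; the first one overlaps with the batch run.
RECENT_EVENTS = [
    {"offset": 2, "page": "pricing"},
    {"offset": 3, "page": "home"},
    {"offset": 4, "page": "docs"},
]


def speed_view(events, watermark):
    """Count only events the batch layer has not processed yet."""
    counts = {}
    for event in events:
        if event["offset"] > watermark:
            counts[event["page"]] = counts.get(event["page"], 0) + 1
    return counts


def query(page):
    """A Lambda-style read: merge the batch view with the speed view."""
    realtime = speed_view(RECENT_EVENTS, BATCH_HIGH_WATERMARK)
    return BATCH_VIEW.get(page, 0) + realtime.get(page, 0)
```

Every read has to consult two stores and agree on where the batch layer stopped. If the watermark is wrong, events are either double-counted or lost, which is the kind of coordination Kappa avoids.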
Kappa Architecture Overview
Benefits & Challenges with Kappa Architecture
Benefits
• Real-time view & analysis of latest data
• Single view of data (serving layer only)
• Horizontally scalable serving layer
• Horizontally scalable distributed log layer (Kafka)
• Horizontally scalable stream processing layer
• Fault tolerant
• Support for historical data queries & analytics through Kafka replays into the serving layer
• Allows the stream computation layer to do the heavy lifting
• Fewer moving parts than Lambda Architecture
• Simpler programming model: everything is a stream
Challenges
• Allowing the serving layer to replay from Kafka on demand to support historical queries
• Latency and reprocessing required for historical queries
• Analytics only, not operational/transactional
• Hard sell to convince management that the Kafka log is the database!
• Kafka log sizes
• Doesn't leverage the Hadoop ecosystem for large-scale analytics
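A historical query served by replaying the log into the serving layer might look like the sketch below. It is an illustration only: the LOG list stands in for a Kafka topic ordered by timestamp, and replay_until is a hypothetical helper that also reports how many events it had to read.

```python
# Hypothetical log of sensor readings, assumed to be ordered by timestamp.
LOG = [
    {"ts": 1, "sensor": "s1", "value": 20},
    {"ts": 2, "sensor": "s2", "value": 7},
    {"ts": 3, "sensor": "s1", "value": 25},
    {"ts": 4, "sensor": "s2", "value": 9},
]


def replay_until(log, as_of_ts):
    """Rebuild a latest-value-per-sensor view as it stood at as_of_ts.

    Returns the view and the number of events read, since the cost of a
    historical query grows with the length of the log being replayed.
    """
    view = {}
    events_read = 0
    for event in log:
        if event["ts"] > as_of_ts:
            break  # relies on the log being ordered by timestamp
        view[event["sensor"]] = event["value"]
        events_read += 1
    return view, events_read


current_view, _ = replay_until(LOG, as_of_ts=4)
historical_view, cost = replay_until(LOG, as_of_ts=2)
```

The same replay that gives Kappa its historical queries is also the source of the latency and log-size challenges listed above: answering "as of" a past time means reading the log again from the start.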
Hazelcast in the Kappa Architecture
Introduction to Mu Architecture
• Many industries have a combined need to view and process big and fast data. The Lambda & Kappa architectures solve the big and fast data problem, but only for analytics
• Traditionally there would be two separate architectures, one for OLTP and one for OLAP
• Modern software allows us to combine the two into a single platform
  • No need for complex ETL or ELT
  • No delay before transactional data is available for analytics
  • Real-time reactive microservices and transaction processing
  • Massively horizontally scalable
  • Cloud ready
• By combining big data technology with in-memory technology, the Mu architecture offers all of the above in an architecture that fits on one slide
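The point about transactional data being available for analytics without ETL can be sketched with a single store that serves both kinds of access. This is a single-process stand-in under stated assumptions, not Hazelcast code and not the Mu demo: InMemoryStore, put_trade, get_trade and notional_by_symbol are hypothetical names, and the trade data is invented.

```python
import threading


class InMemoryStore:
    """Hypothetical single-node stand-in for a distributed in-memory data grid."""

    def __init__(self):
        self._trades = {}
        self._lock = threading.Lock()

    def put_trade(self, trade_id, symbol, quantity, price):
        # OLTP-style write: one record, keyed access.
        with self._lock:
            self._trades[trade_id] = {
                "symbol": symbol,
                "quantity": quantity,
                "price": price,
            }

    def get_trade(self, trade_id):
        # OLTP-style read: return a copy of a single record.
        with self._lock:
            return dict(self._trades[trade_id])

    def notional_by_symbol(self):
        # OLAP-style read: aggregate over all records in the same store.
        totals = {}
        with self._lock:
            for trade in self._trades.values():
                notional = trade["quantity"] * trade["price"]
                totals[trade["symbol"]] = totals.get(trade["symbol"], 0) + notional
        return totals


store = InMemoryStore()
store.put_trade("t1", "ABC", 10, 5)
store.put_trade("t2", "XYZ", 2, 100)
store.put_trade("t3", "ABC", 4, 5)
```

Because the aggregate reads the same records the transactions wrote, a trade is visible to analytics as soon as put_trade returns; there is no copy step between an operational store and an analytical one.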
Mu Architecture Overview
Mu Architecture Demo – Work In Progress
Follow progress or join it at https://github.com/oliversalmon/imcs-demo
Summary
• Many industries have a combined need to view and process big and fast data
• Previously, tools such as Hadoop could process large data sets but only at high latency, while stream processing systems processed small amounts of data very fast
• The Lambda & Kappa architectures allow real-time analytics
• In-memory computing technology such as Hazelcast IMDG & Hazelcast Jet, combined with big data technologies, allows us to process vast volumes of unbounded data fast
• The Mu architecture takes the best of both the Kappa & Lambda architectures to produce a combined real-time OLTP & OLAP solution
