Using Hazelcast in the Kappa architecture

Using Hazelcast as the Serving Layer in the
Kappa Architecture
Presented by Oliver Buckley-Salmon
June 1st 2017
Twitter: @SalmonOliver
Github: https://github.com/oliversalmon
LinkedIn: Oliver Buckley-Salmon

Introduction
• Many Industries have a combined need to view and process big and fast data
• Previously tools such as Hadoop allowed the processing of large data sets but at high latency and stream processing systems processed small amounts
of data very fast
• Recently new architectures have been suggested to combine both of these to provide a single solution for big and fast data, with a couple of the most
well known below
• Lambda Architecture
• Nathan Marz came up with the term Lambda Architecture (LA) for a generic, scalable and fault-tolerant data processing architecture, based on his experience working on
distributed data processing systems at Backtype and Twitter
• The LA aims to satisfy the needs for a robust system that is fault-tolerant, both against hardware failures and human mistakes, being able to serve a wide range of
workloads and use cases, and in which low-latency reads and updates are required
• The resulting system should be linearly scalable, and it should scale out rather than up
• Kappa Architecture
• Kappa Architecture is a simplification of Lambda Architecture
• A Kappa Architecture system is like a Lambda Architecture system with the batch processing system removed
• To replace batch processing, data is simply fed through the streaming system quickly

Benefits & Challenges with Lambda Architecture
Benefit Challenge
Real-time view & analysis of latest data Synchronisation between Speed & Batch layers
Support historical data queries & analytics Analytics only, not operational/transactional
Horizontally scalable speed layer 2 separate sub systems for microservices to read from depending on life cycle
Horizontally scalable batch layer Heavy focus on HDFS / Storage / format optimization
Allows use of Hadoop ecosystem for batch processing & analytics Unpredictable latency of batch layer
Fault tolerant

Benefits & Challenges with Kappa Architecture
Benefit Challenge
Real-time view & analysis of latest data Allowing serving layer to replay from Kafka on demand to support historical
queries on demand
Single view of data (serving layer only) Latency and reprocessing required for historical queries
Horizontally scalable serving layer Analytics only, not operational/transactional
Horizontally scalable distributed log layer (Kafka) Hard sell to convince management that Kafka log is the database!
Horizontally Stream Processing layer Kafka log sizes
Fault tolerant Doesn’t leverage Hadoop ecosystem for large scale analytics
Support historical data queries & analytics through Kafka replays into Serving
layer
Allows the Stream Computation layer to do the heavy lifting
Fewer moving parts than Lambda Architecture
Simpler programming model – everything is a stream

Hazelcast in the Kappa Architecture

Introduction to Mu Architecture
• Many Industries have a combined need to view and process big and fast data. The Lambda & Kappa architectures solve the Big & Fast data problem but only for
analytics
• Traditionally there would be two separate architectures, one for OLTP one for OLAP
• Modern software allows us to combine the two into a single platform
• No need for complex ETL or ELT
• No delay for transactional data to be available for analytics
• Real-time reactive microservices and transaction processing
• Massively horizontally scalable
• Cloud ready
• By combining big data technology with in-memory technology the Mu architecture offers all of the above in an architecture that fits on one slide

Mu Architecture Demo – Work In Progress
Follow progress or join it at https://github.com/oliversalmon/imcs-demo

Summary
• Many Industries have a combined need to view and process big and fast data
• Previously tools such as Hadoop allowed the processing of large data sets but at high latency and stream processing systems processed small amounts of data
very fast
• The Lambda & Kappa architectures allow real-time analytics
• In memory computing technology such as Hazelcast IMDG & Hazelcast Jet, combined with big data technologies, allows us to process vast volumes of
unbounded data fast
• The Mu architecture takes the best of both of the Kappa & Lambda architectures to produce a combined real-time OLTP & OLAP solution

Using Hazelcast in the Kappa architecture

More Related Content

What's hot (20)

Similar to Using Hazelcast in the Kappa architecture (20)

Recently uploaded (20)

Using Hazelcast in the Kappa architecture