December 2010
Kappa Architecture
Our Experience
Who am I
CDO at ASPgems
Former President of Hispalinux (Spanish LUG)
Author of “La Pastilla Roja”, the first Spanish book about Free Software.
Menu
A little context about Kappa Architecture
What’s Kappa Architecture
What is not Kappa Architecture
How we implement it
Real use cases with KA
A little context
On July 2, 2014, Jay Kreps coined the term Kappa Architecture in an article for O'Reilly Radar.
Who is Jay Kreps
Jay has been involved in lots of projects:
Author of the essay:
The Log: What every software engineer should know about real-time data's unifying abstraction (12/16/2013)
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
Jay Kreps
Author of the book: I ♥ Logs
Jay Kreps
Involved in projects such as:
Apache Kafka
Apache Samza
Voldemort
Azkaban
Ex-LinkedIn
Now co-founder and CEO of Confluent
Lambda Architecture
It looks something like this:
https://www.mapr.com/developercentral/lambda-architecture
Lambda Architecture
The batch layer provides the following functionality:
Managing the master dataset, an immutable, append-only set of raw data.
Pre-computing arbitrary query functions, called batch views.
https://www.mapr.com/developercentral/lambda-architecture
Lambda Architecture
Serving layer
This layer indexes the batch views so that they can be queried ad hoc with low latency.
Speed layer
This layer accommodates all requests that are subject to low-latency requirements. Using fast, incremental algorithms, the speed layer deals with recent data only.
Lambda Architecture
Batch layer datasets can be stored in a distributed filesystem, while MapReduce can be used to create batch views that are fed to the serving layer.
The serving layer can be implemented using NoSQL technologies such as HBase, Apache Druid, etc.
Querying can be implemented with technologies such as Apache Drill or Impala.
The speed layer can be realized with data streaming technologies such as Apache Storm or Spark Streaming.
https://www.mapr.com/developercentral/lambda-architecture
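To make the three layers concrete, here is a minimal sketch in plain Python, with no Hadoop or Storm involved. The event data and the function names (`build_batch_view`, `update_realtime_view`, `query`) are hypothetical and only illustrate the idea: the batch layer recomputes a view from the whole master dataset, the speed layer updates a view incrementally, and a query merges both.

```python
from collections import Counter

# Hypothetical master dataset: immutable, append-only raw events (page, timestamp).
master_dataset = [("home", 1), ("home", 2), ("pricing", 3)]
# Events that arrived after the last batch run; only the speed layer has seen them.
recent_events = [("home", 4), ("pricing", 5), ("pricing", 6)]


def build_batch_view(events):
    """Batch layer: recompute the view from scratch over the whole dataset."""
    return Counter(page for page, _ in events)


def update_realtime_view(view, event):
    """Speed layer: incremental update, one event at a time."""
    page, _ = event
    view[page] += 1
    return view


def query(page, batch_view, realtime_view):
    """Serving side: merge the batch view and the real-time view."""
    return batch_view[page] + realtime_view[page]


batch_view = build_batch_view(master_dataset)
realtime_view = Counter()
for event in recent_events:
    update_realtime_view(realtime_view, event)

print(query("home", batch_view, realtime_view))     # 3
print(query("pricing", batch_view, realtime_view))  # 3
```

Note that the same counting logic is written twice, once per layer. In a real deployment those two versions would live in two different frameworks.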
Pros of Lambda
Architecture
Retains the input data unchanged.
Encourages modeling data transformations as a series of data states derived from the original input.
Lambda Architecture takes into account the problem of reprocessing data.
This happens all the time: the code will change, and you will need to reprocess all the information. There are lots of reasons for it, and you will need to live with this.
Cons of Lambda
Architecture
Maintaining code that needs to produce the same result on two complex distributed systems is painful.
The code for MapReduce is very different from the code for Storm / Apache Spark.
It is not only about different code; it is also about debugging and interaction with other products (Hive, Oozie, Cascading, etc.).
In the end, it is a problem of different and diverging programming paradigms.
So what is Kappa
Architecture
Jay Kreps's proposal is simple:
Use Kafka (or another system) that lets you retain the full log of the data you need to reprocess.
When you want to reprocess, start a second instance of your stream processing job that starts processing from the beginning of the retained data, but direct its output to a new output table.
So what is Kappa
Architecture
part II
When the second job has caught up, switch the
application to read from the new table.
Stop the old version of the job, and delete the
old output table.
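The reprocessing procedure can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the list `log` stands in for a Kafka topic with full retention, dictionaries stand in for output tables, and `job_v1`, `job_v2`, `run_job` and the table names are hypothetical.

```python
# Retained log: stands in for a Kafka topic that can be replayed from the start.
log = [
    {"user": "ana", "amount": 10},
    {"user": "luis", "amount": 5},
    {"user": "ana", "amount": 7},
]


def job_v1(table, event):
    """Old logic: count events per user."""
    table[event["user"]] = table.get(event["user"], 0) + 1


def job_v2(table, event):
    """New logic: sum amounts per user."""
    table[event["user"]] = table.get(event["user"], 0) + event["amount"]


def run_job(job, log, start_offset=0):
    """Process the log from start_offset into a fresh output table."""
    table = {}
    for event in log[start_offset:]:
        job(table, event)
    return table


tables = {"output_v1": run_job(job_v1, log)}
serving = "output_v1"  # the application reads from this table

# 1. Start a second instance of the job from offset 0, writing to a new table.
tables["output_v2"] = run_job(job_v2, log, start_offset=0)
# 2. Once it has caught up, switch the application to the new table.
serving = "output_v2"
# 3. Stop the old job and delete the old output table.
del tables["output_v1"]

print(tables[serving])  # {'ana': 17, 'luis': 5}
```

Keeping `output_v1` around until the new table has been checked is what makes a rollback possible: switching `serving` back is enough.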
So what is Kappa
Architecture
This architecture looks something like this:
So what is Kappa
Architecture
The first benefit is that you only need to reprocess when you change the code.
You can check whether the new version is working properly and, if not, revert to the old output table.
You can mirror a Kafka topic to HDFS so you are not limited by the Kafka retention configuration.
You have only one codebase to maintain, on a single framework.
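As a rough illustration of the mirroring point, a replay can read the archived copy first and then continue from what the log still retains. The sketch below uses in-memory lists of `(offset, value)` pairs; the `archive` and `retained_log` data and the `replay` helper are hypothetical and do not use any Kafka or HDFS API.

```python
from itertools import chain

# Hypothetical archive (e.g. files mirrored to HDFS): offsets 0-2,
# assumed to be already expired from the log.
archive = [(0, "a"), (1, "b"), (2, "c")]
# What the log still retains: offsets 2-4 (overlaps the archive by one record).
retained_log = [(2, "c"), (3, "d"), (4, "e")]


def replay(archive, retained_log):
    """Yield every record once, in offset order: archive first, then the log."""
    last_archived = archive[-1][0] if archive else -1
    # Skip log records that the archive already covers.
    live = (record for record in retained_log if record[0] > last_archived)
    return chain(archive, live)


print([value for _, value in replay(archive, retained_log)])
# ['a', 'b', 'c', 'd', 'e']
```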
So what is Kappa
Architecture
The real advantage is not about efficiency at all (you will need extra temporary storage while reprocessing, for example); it is about allowing your team to develop, test, debug and operate their systems on top of a single processing framework.
What is not Kappa
Architecture
It is not a silver bullet that solves every Big Data problem.
It is not a prescribed list of technologies. You can implement it with your favorite frameworks.
It is not a rigid set of rules, but it helps keep complex projects simple.
How we use Kappa
Architecture
We start working on projects with a complex structure, like LinkedIn's at an early stage.
That's very common.
How we use Kappa
Architecture
We try to refactor the data flows to fit a Kappa Architecture.
How we use Kappa
Architecture
We use Kafka as our stream data platform.
Instead of Samza, we feel more comfortable with Spark Streaming.
At ASPgems we chose Apache Spark as our analytics engine, and not only for Spark Streaming.
How we use Kappa
Architecture
In the end, Kappa Architecture is a design pattern for us.
We use/clone this pattern in almost all our projects.
We have projects of every size, data volume and speed requirement, and they all fit the Kappa Architecture.
Use Cases
Telefónica - MSS
We use KA to calculate near-real-time KPIs and SLAs related to the managed security system.
We simplify the data flow of the input data.
Kafka is the streaming data platform.
As MPP we use Cassandra.
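A near-real-time KPI of this kind usually boils down to a windowed aggregation over the event stream. The sketch below is not the project's code: the event shape, the 60-second window and the "high severity events per window" KPI are all assumptions chosen for illustration, written in plain Python rather than Spark Streaming.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed tumbling-window size

# Hypothetical security events: (epoch_seconds, severity)
events = [(0, "high"), (30, "low"), (59, "high"), (60, "high"), (125, "low")]


def high_severity_per_window(events, window=WINDOW_SECONDS):
    """Count high-severity events per tumbling window, keyed by window start."""
    counts = defaultdict(int)
    for timestamp, severity in events:
        if severity == "high":
            window_start = timestamp // window * window
            counts[window_start] += 1
    return dict(counts)


print(high_severity_per_window(events))  # {0: 2, 60: 1}
```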
IOT - OBD II
One of our clients installs on-board devices in its customers' cars.
We implemented an API to get all the information in real time and inject it into Kafka.
The business rules are implemented in a CEP running on Apache Spark Streaming.
As MPP we use Elasticsearch.
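To give an idea of what a CEP-style business rule looks like, here is a minimal sketch in plain Python. The rule itself (alert when a vehicle reports a speed above a limit three times in a row), the thresholds and the reading format are hypothetical; the real rules and the Spark Streaming job are not shown in the slides.

```python
from collections import defaultdict, deque

SPEED_LIMIT_KMH = 120  # assumed threshold
CONSECUTIVE = 3        # assumed pattern length


def detect_speeding(readings):
    """Yield (vehicle_id, timestamp) when a vehicle exceeds the limit
    CONSECUTIVE times in a row."""
    recent = defaultdict(lambda: deque(maxlen=CONSECUTIVE))
    for vehicle_id, timestamp, speed in readings:
        window = recent[vehicle_id]
        window.append(speed > SPEED_LIMIT_KMH)
        if len(window) == CONSECUTIVE and all(window):
            yield vehicle_id, timestamp
            window.clear()  # avoid re-alerting on the same run


# Hypothetical OBD readings: (vehicle_id, timestamp, speed_kmh)
readings = [
    ("car-1", 1, 125), ("car-2", 1, 90),
    ("car-1", 2, 130), ("car-1", 3, 128),
    ("car-2", 2, 140), ("car-1", 4, 131),
]
print(list(detect_speeding(readings)))  # [('car-1', 3)]
```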
Insurance Company
We implement Kappa Architecture to process the clickstream in real time and cluster users.
We show the content and offers that best fit each user.
Energy Facility
We implement Kappa Architecture to process and predict energy consumption.
Our customer's facility includes energy storage systems, and we get all the information about the energy storage (ultracapacitors and batteries).
We process this information to calculate the effective lifetime of the components and their degradation.
Questions
Thank you
Juantomás García
juantomas@aspgems.com
@juantomas
