Lambda architecture use case
6/27/2014
Introduction - Lambda Architecture
• Lambda Architecture (introduced by Nathan Marz) is a
generic, scalable and fault-tolerant data processing
architecture. It aims to satisfy the needs of a robust system
that is
– Fault-tolerant against both hardware failures and human
mistakes; mistakes are corrected via recomputation.
– Able to serve a wide range of workloads and use
cases, including those that require low-latency reads and
updates.
– Built on immutable, append-only data storage that keeps the
full history of raw data.
– Linearly scalable, scaling out rather than up.
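The link between immutability and "mistakes are corrected via recomputation" can be illustrated with a minimal Python sketch. The record shape `(user, amount)`, the function name `recompute_view` and the sample data are assumptions made for this illustration; they do not come from the slides.

```python
from typing import Callable, Dict, Tuple

Event = Tuple[str, int]  # (user, amount): a hypothetical raw record shape


def recompute_view(master: Tuple[Event, ...],
                   fold: Callable[[int, int], int]) -> Dict[str, int]:
    """Rebuild a view from scratch by folding over every raw record."""
    view: Dict[str, int] = {}
    for user, amount in master:
        view[user] = fold(view.get(user, 0), amount)
    return view


# The master dataset is append-only: records are never updated in place.
master: Tuple[Event, ...] = (("alice", 5), ("bob", 3), ("alice", 2))

# A buggy deployment overwrote totals instead of accumulating them.
buggy = recompute_view(master, lambda total, amount: amount)

# The raw data was never mutated, so deploying the fix and recomputing
# yields a correct view without any manual data repair.
fixed = recompute_view(master, lambda total, amount: total + amount)
```

Here the human mistake lives only in the derived view, so discarding the view and recomputing it from the untouched raw records is enough to recover.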
LA High-level perspective
LA High-level perspective (continued)
• All data entering the system is dispatched to both the
batch layer and the speed layer for processing.
• The batch layer has two functions: (i) managing the
master dataset (an immutable, append-only set of raw
data), and (ii) pre-computing the batch views.
• The serving layer indexes the batch views so that they
can be queried in a low-latency, ad-hoc way.
• The speed layer compensates for the high latency of
updates to the serving layer and deals with recent data
only.
• Any incoming query can be answered by merging
results from batch views and real-time views.
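The last point, answering a query by merging both views, can be sketched as follows. This assumes the views hold additive counts and that the real-time view covers only data not yet absorbed by the batch view. The page names and the `query` helper are hypothetical.

```python
from typing import Dict


def query(key: str,
          batch_view: Dict[str, int],
          realtime_view: Dict[str, int]) -> int:
    """Answer a count query by combining the batch and real-time views.

    Assumes counts are additive and the two views cover disjoint
    time ranges (batch: older data, real-time: recent data only).
    """
    return batch_view.get(key, 0) + realtime_view.get(key, 0)


# Hypothetical page-view counts.
batch_view = {"/home": 100, "/about": 40}    # pre-computed by the batch layer
realtime_view = {"/home": 3, "/pricing": 1}  # maintained by the speed layer

home_total = query("/home", batch_view, realtime_view)
pricing_total = query("/pricing", batch_view, realtime_view)
missing_total = query("/missing", batch_view, realtime_view)
```

For aggregates that are not simply additive (for example distinct counts), the merge step would need a different combining function.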
Lambda use case
• Data Injection: queue and pub/sub models are a
natural fit; RabbitMQ is used.
• Use Apache Spark in the Batch Layer, with Jenkins as the
scheduler.
• Use Apache Spark Streaming in the Speed Layer, and
Cassandra to store the real-time results.
• Adopt Apache Shark in the Serving Layer.
• In the Presentation layer, use Tomcat and D3.
• (Refer to the next slide for the diagram.)
[Architecture diagram. Layers: Data Injection, Speed Layer, Batch Layer, Serving Layer. Components: RabbitMQ, RabbitMQ listener on Tomcat, HDFS Loader, HDFS, Apache Spark, Jenkins, StreamingAdaptor, real-time processors (Spark Streaming), Cassandra, Apache Shark, Hive, Tomcat, D3.]
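The idea that every incoming message reaches both the batch layer and the speed layer can be sketched with a small in-memory fan-out. `FanOutExchange` is a hypothetical stand-in and not the RabbitMQ client API. Mapping the two subscribers to the "HDFS Loader" and "StreamingAdaptor" boxes is an assumption based on the component names in the diagram.

```python
from collections import deque
from typing import Callable, Deque, List


class FanOutExchange:
    """In-memory stand-in for a pub/sub exchange (not the RabbitMQ API)."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, message: str) -> None:
        # Every subscriber receives its own copy of each message.
        for handler in self._subscribers:
            handler(message)


batch_log: List[str] = []          # stands in for raw data appended to HDFS
speed_queue: Deque[str] = deque()  # stands in for the streaming input

exchange = FanOutExchange()
exchange.subscribe(batch_log.append)    # assumed "HDFS Loader" role
exchange.subscribe(speed_queue.append)  # assumed "StreamingAdaptor" role

for msg in ("event-1", "event-2"):
    exchange.publish(msg)
```

After the loop, both consumers hold the same two events in the same order, which is the property the batch and speed layers rely on.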
Apache Spark
• Hadoop integration
• Spark interactive shell
• The Spark analytics suite includes
– Interactive query analysis (Shark)
– Large-scale graph processing and analysis (Bagel)
– Real-time analysis (Spark Streaming)
– Machine learning library (MLlib)
• Resilient Distributed Datasets (RDDs)
– Distributed objects that can be cached in memory across a cluster of compute
nodes
– Fault tolerance is built in: RDDs are automatically rebuilt from their lineage
if a partition is lost
• Distributed Operators
• Spark is already used in production
• The Spark codebase is small and extensible
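The way an RDD can be rebuilt after a failure can be illustrated with a toy lineage-tracking dataset. `LineageDataset` is a hypothetical class written for this sketch and is not the Spark API. It only shows the principle of recording transformations so that lost cached data can be recomputed from the source.

```python
from typing import Callable, List, Optional


class LineageDataset:
    """Toy dataset that remembers how it was derived (not the Spark API)."""

    def __init__(self, source: List[int],
                 transforms: Optional[List[Callable[[int], int]]] = None) -> None:
        self._source = source
        self._transforms = list(transforms or [])
        self._cache: Optional[List[int]] = None

    def map(self, fn: Callable[[int], int]) -> "LineageDataset":
        # Record the transformation instead of applying it immediately.
        return LineageDataset(self._source, self._transforms + [fn])

    def collect(self) -> List[int]:
        # Replay the recorded lineage only when no cached result exists.
        if self._cache is None:
            rows = list(self._source)
            for fn in self._transforms:
                rows = [fn(x) for x in rows]
            self._cache = rows
        return list(self._cache)

    def lose_cache(self) -> None:
        """Simulate losing the in-memory copy, e.g. after a node failure."""
        self._cache = None


ds = LineageDataset([1, 2, 3]).map(lambda x: x * 10).map(lambda x: x + 1)
first = ds.collect()
ds.lose_cache()
rebuilt = ds.collect()  # recomputed from the source via the recorded lineage
```

Because the dataset stores its derivation rather than only its contents, losing the cached result costs recomputation time but no data.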
Apache Shark
Shark is a component of Spark, an open source, distributed and
fault-tolerant in-memory analytics system that can be installed on the
same cluster as Hadoop.
In particular, Shark is fully compatible with Hive and
supports HiveQL, Hive data formats, and user-defined functions. In
addition, Shark can be used to query data in HDFS, HBase, and
Amazon S3.
• Interactive SQL system for Hadoop
• In-memory column store and column compression
• Control over data partitioning => Fast, distributed JOINS
• Fault-tolerance
• SQL “optimizer”
• Machine-learning support
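Why control over data partitioning enables fast distributed joins can be sketched in plain Python: when both tables are partitioned by the join key with the same function, each pair of partitions can be joined locally with no data exchanged between them. The function names, the modulo partitioner and the sample rows are assumptions for illustration and are not Shark's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (join key, value)


def partition_by_key(rows: List[Row], num_partitions: int) -> List[List[Row]]:
    """Place each row in the partition chosen by its join key."""
    parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    for key, value in rows:
        parts[key % num_partitions].append((key, value))
    return parts


def join_partition(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Hash join of two partitions that share the same key range."""
    index: Dict[int, List[str]] = defaultdict(list)
    for key, value in left:
        index[key].append(value)
    return [(key, lv, rv) for key, rv in right for lv in index.get(key, [])]


def copartitioned_join(left_rows: List[Row], right_rows: List[Row],
                       num_partitions: int = 2) -> List[Tuple[int, str, str]]:
    left_parts = partition_by_key(left_rows, num_partitions)
    right_parts = partition_by_key(right_rows, num_partitions)
    result: List[Tuple[int, str, str]] = []
    # Matching keys always land in the same partition index, so each
    # pair can be joined independently (in a cluster, in parallel).
    for lpart, rpart in zip(left_parts, right_parts):
        result.extend(join_partition(lpart, rpart))
    return result


users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(1, "book"), (3, "pen"), (1, "lamp"), (4, "ink")]
joined = sorted(copartitioned_join(users, orders))
```

In a real cluster the saving comes from avoiding a shuffle: if the tables are already co-partitioned, only the local join step runs.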
Lambda usecase
