Lambda use case

The Lambda architecture uses a batch layer to process all incoming data and generate batch views, which are complete but updated with high latency; a speed layer to process recent data and compensate for that latency with low-latency real-time views; and a serving layer to merge batch and real-time views to answer queries. This document provides an example use case where RabbitMQ is used for data injection, Apache Spark is used for batch processing, Apache Spark Streaming is used for the speed layer, Apache Shark is used in the serving layer, and results are stored in Cassandra and presented using Tomcat and D3.

Introduction - Lambda Architecture
• Lambda Architecture (introduced by Nathan Marz) is a
generic, scalable and fault-tolerant data processing
architecture. It aims at a robust system that
– Is fault-tolerant against both hardware failures and human
mistakes. Mistakes are corrected via recomputation.
– Serves a wide range of workloads and use cases, including
those in which low-latency reads and updates are
required.
– Stores data in a history-optimized way; immutability changes
everything.
– Scales linearly, and scales out rather than up.
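
The point about correcting human mistakes via recomputation can be illustrated with a minimal sketch. It assumes a hypothetical in-memory `master_dataset` of page-view events; the event fields and function names are made up for illustration and do not come from the slides.

```python
from collections import Counter

# Hypothetical master dataset: an append-only list of raw page-view events.
master_dataset = []

def append_event(user, page):
    """Record a raw fact. Existing records are never updated or deleted."""
    master_dataset.append({"user": user, "page": page})

def buggy_batch_view(events):
    """A faulty view function: it groups by user instead of by page."""
    return Counter(e["user"] for e in events)

def fixed_batch_view(events):
    """The corrected view function: page-view counts per page."""
    return Counter(e["page"] for e in events)

append_event("alice", "/home")
append_event("bob", "/home")
append_event("alice", "/pricing")

wrong = buggy_batch_view(master_dataset)
# The raw data was never modified, so fixing the code and recomputing
# over the full dataset is enough to replace the faulty view.
right = fixed_batch_view(master_dataset)
print(right)
```

Because the bug only ever touched a derived view, nothing has to be repaired in the stored data itself.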

LA High-level perspective (continued)
• All data entering the system is dispatched to both the
batch layer and the speed layer for processing.
• The batch layer has two functions: (i) managing the
master dataset (an immutable, append-only set of raw
data), and (ii) pre-computing the batch views.
• The serving layer indexes the batch views so that they
can be queried in a low-latency, ad hoc way.
• The speed layer compensates for the high latency of
updates to the serving layer and deals with recent data
only.
• Any incoming query can be answered by merging
results from batch views and real-time views.
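
As a rough sketch of the last bullet, a query can be answered by combining a precomputed batch view with a real-time view that covers only the data received since the last batch run. The view contents and the `query_page_views` helper are hypothetical, chosen only to show the merge step.

```python
from collections import Counter

# Hypothetical views keyed by page URL.
# batch_view: precomputed over the master dataset, possibly hours old.
batch_view = Counter({"/home": 1000, "/pricing": 250})
# realtime_view: increments for events that arrived after the last batch run.
realtime_view = Counter({"/home": 12, "/signup": 3})

def query_page_views(page):
    """Merge both views; a Counter returns 0 for a page it has not seen."""
    return batch_view[page] + realtime_view[page]

print(query_page_views("/home"))
print(query_page_views("/signup"))
```

In this sketch the merge is a simple sum because counts are additive; other kinds of views would need a merge function suited to their aggregation.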

Lambda use case
• Data injection: queue and pub/sub models are a
natural fit. RabbitMQ is used.
• Use Apache Spark in the batch layer, with Jenkins as the
scheduler.
• Use Apache Spark Streaming in the speed layer. Use
Cassandra to store the real-time results.
• Adopt Apache Shark in the serving layer.
• In the presentation layer, use Tomcat and D3.
• (Refer to the next slide for the diagram.)
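
To show why a pub/sub model fits data injection, here is a toy, in-process stand-in for a fan-out exchange. It does not use RabbitMQ or any RabbitMQ client; the `FanoutExchange` class and the queue names are assumptions made for illustration. Each published message is copied to every bound queue, so the batch and speed layers both receive it.

```python
from queue import Queue

class FanoutExchange:
    """Toy stand-in for a pub/sub exchange: every bound queue gets a copy."""

    def __init__(self):
        self.queues = []

    def bind(self):
        """Create a new consumer queue and attach it to the exchange."""
        q = Queue()
        self.queues.append(q)
        return q

    def publish(self, message):
        """Deliver the message to all bound queues."""
        for q in self.queues:
            q.put(message)

exchange = FanoutExchange()
batch_queue = exchange.bind()   # consumed on the batch-layer side
speed_queue = exchange.bind()   # consumed on the speed-layer side

exchange.publish({"user": "alice", "page": "/home"})
print(batch_queue.qsize(), speed_queue.qsize())
```

With a real broker the two consumers would run as separate processes, but the delivery pattern is the same: one publish, one copy per layer.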

[Architecture diagram. Layer labels: Data Injection, Batch Layer, Speed Layer, Serving Layer. Components shown: RabbitMQ, RabbitMQ listener on Tomcat, HDFS Loader, Streaming Adaptor, HDFS, Apache Spark, Jenkins, Hive, Real-time processors (Spark Streaming), Cassandra, Apache Shark, Tomcat, D3]

Apache Spark
• Hadoop integration
• Spark interactive Shell
• The Spark Analytic Suite includes
– Interactive query analysis (Shark)
– Large-scale graph processing and analysis (Bagel)
– Real-time analysis (Spark Streaming)
– Machine learning library
• Resilient Distributed Datasets (RDDs)
– Distributed objects that can be cached in memory across a cluster of compute
nodes
– Fault tolerance is built in: RDDs are automatically rebuilt if something goes
wrong
• Distributed Operators
• Spark is already used in production
• The Spark codebase is small and extensible
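
The idea that RDDs are rebuilt automatically can be sketched with a toy model of lineage: each dataset remembers its parent and the function that produced it, so a lost cached result can be recomputed. The `ToyRDD` class below is purely illustrative and is not Spark's API.

```python
class ToyRDD:
    """Toy model of lineage-based recovery; not Spark's actual API."""

    def __init__(self, source=None, parent=None, fn=None):
        self.source = source    # base data, set only on a root dataset
        self.parent = parent    # dataset this one was derived from
        self.fn = fn            # function applied to each parent element
        self.cache = None       # in-memory result, may be lost

    def map(self, fn):
        """Derive a new dataset; only the lineage is recorded here."""
        return ToyRDD(parent=self, fn=fn)

    def collect(self):
        """Return the data, recomputing from lineage if the cache is gone."""
        if self.cache is None:
            if self.parent is None:
                self.cache = list(self.source)
            else:
                self.cache = [self.fn(x) for x in self.parent.collect()]
        return self.cache

    def lose_cache(self):
        """Simulate losing the in-memory copy, e.g. after a node failure."""
        self.cache = None

numbers = ToyRDD(source=[1, 2, 3])
squares = numbers.map(lambda x: x * x)
first = squares.collect()
squares.lose_cache()
rebuilt = squares.collect()   # recomputed from the recorded lineage
print(first == rebuilt)
```

The sketch keeps everything on one machine; the point is only that recovery comes from replaying the recorded transformations rather than from replicated copies of the results.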

Apache Shark
Shark is a component of Spark, an open source, distributed,
fault-tolerant, in-memory analytics system that can be installed on the
same cluster as Hadoop.
In particular, Shark is fully compatible with Hive and
supports HiveQL, Hive data formats, and user-defined functions. In
addition, Shark can be used to query data in HDFS, HBase, and
Amazon S3.
• Interactive SQL systems for Hadoop
• In-memory column store and column compression
• Control over data partitioning => Fast, distributed JOINS
• Fault-tolerance
• SQL “optimizer”
• Machine-learning support
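
One way to read the bullet on data partitioning and fast joins: if two tables are partitioned by the join key with the same function, matching rows land in the same partition, and each partition pair can be joined locally without moving data. The sketch below illustrates that idea in plain Python; the helper names and sample rows are assumptions, and this is not how Shark itself is invoked.

```python
from collections import defaultdict

def partition_by_key(rows, num_partitions):
    """Place each (key, value) row in a partition chosen by its key."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in rows:
        parts[hash(key) % num_partitions].append((key, value))
    return parts

def join_partition(left, right):
    """Hash join of two partitions that were partitioned the same way."""
    index = defaultdict(list)
    for key, value in left:
        index[key].append(value)
    return [(key, lv, rv) for key, rv in right for lv in index.get(key, [])]

# Hypothetical tables keyed by user id.
users = [(1, "alice"), (2, "bob"), (3, "carol")]
visits = [(1, "/home"), (3, "/pricing"), (1, "/signup")]

user_parts = partition_by_key(users, 2)
visit_parts = partition_by_key(visits, 2)

# Each partition pair is joined independently, so the work could run
# on separate nodes with no data exchanged between them.
joined = sorted(
    row
    for left, right in zip(user_parts, visit_parts)
    for row in join_partition(left, right)
)
print(joined)
```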
