Big Data Pipeline
Lambda Architecture - Batch Layer
with AngularJS, Java RESTful Web Services, Apache Hadoop, Apache Spark, and Apache Cassandra
on the Amazon Web Services Cloud Platform
Big Data Pipeline
Stages: Ingest, Store, Process, Visualize
Big Data Batch Layer Pipeline
[Diagram] Stages: Ingest, Store, Process, Store, Visualize
Components shown: AngularJS web app, REST web services, Apache web logs, log/data files, S3, HDFS, Spark engine and Spark SQL (Spark cluster), Apache Cassandra, and an AngularJS web app for interactive queries and charts.
Big Data Real-Time Layer Pipeline
[Diagram] Stages: Ingest, Stream, Process, Store, Visualize
Components shown: AngularJS web app, clickstream data, Apache web logs, log/data files, TCP sockets, Apache Kafka, Spark Streaming and Spark SQL (Spark cluster), S3, HDFS, Apache Cassandra, and an AngularJS web app for interactive queries and charts.
Install Web Server
EC2 instance for Web Server
Commands to set up Apache Tomcat 8.0:
cat /etc/*-release
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
java -version
mkdir webserver
cd webserver
wget http://www-eu.apache.org/dist/tomcat/tomcat-8/v8.0.36/bin/apache-tomcat-8.0.36.tar.gz
tar xvzf apache-tomcat-8.0.36.tar.gz
cd apache-tomcat-8.0.36/bin
./startup.sh
Apache Tomcat 8.0 running on EC2 Instance
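A quick way to confirm that Tomcat is answering after startup.sh is to request its root page. The following is a minimal sketch, not part of the original deck: the function names tomcat_url and tomcat_is_up are hypothetical, and it assumes Tomcat listens on its default HTTP port 8080 and that the EC2 security group allows inbound traffic on that port.

```python
import urllib.error
import urllib.request


def tomcat_url(host, port=8080):
    """Build the root URL of a Tomcat server (8080 is Tomcat's default HTTP port)."""
    return f"http://{host}:{port}/"


def tomcat_is_up(host, port=8080, timeout=5.0):
    """Return True if an HTTP server answers at host:port, False otherwise.

    Any HTTP response, including an error status such as 404, counts as "up",
    because it proves that a server is listening on the port.
    """
    try:
        with urllib.request.urlopen(tomcat_url(host, port), timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False
```

Calling tomcat_is_up with the public IP or DNS name of the web server instance should return True once Tomcat has finished starting.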
Install Apache Cassandra - 3 Node Cluster on AWS
3 EC2 instances for the Cassandra cluster
Commands to set up Apache Cassandra 3.0.7 (repeat on all 3 EC2 instances):
cat /etc/*-release
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
java -version
mkdir db
cd db
wget http://www-eu.apache.org/dist/cassandra/3.0.7/apache-cassandra-3.0.7-bin.tar.gz
tar xvzf apache-cassandra-3.0.7-bin.tar.gz
cd apache-cassandra-3.0.7
bin/cassandra -f
bin/cqlsh
(Run bin/cqlsh from a second terminal, because -f keeps Cassandra in the foreground.)
Cluster nodes:
cassandra1 -> 52.87.183.121
cassandra2 -> 52.207.239.229
cassandra3 -> 54.174.185.29
Change the following in conf/cassandra.yaml (values shown for cassandra3):
cluster_name: 'Test Cluster'
listen_address:
broadcast_address: 54.174.185.29
seeds: "52.87.183.121,52.207.239.229"
rpc_address:
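The slide shows the cassandra.yaml values for a single node. As a sketch of how the same settings could be derived for each of the three nodes, the snippet below assumes that every node advertises its own public IP as broadcast_address and that all nodes share cassandra1 and cassandra2 as seeds. The helper names are hypothetical, and the output mirrors the flat lines on the slide rather than a complete cassandra.yaml (in the real file, seeds sits under seed_provider).

```python
# Node names and public IPs as listed on the slide.
NODES = {
    "cassandra1": "52.87.183.121",
    "cassandra2": "52.207.239.229",
    "cassandra3": "54.174.185.29",
}
# Assumption: the two seed IPs on the slide are cassandra1 and cassandra2.
SEED_NODES = ("cassandra1", "cassandra2")


def cassandra_overrides(node_name, nodes=NODES, seed_nodes=SEED_NODES,
                        cluster_name="Test Cluster"):
    """Return the settings to change for one node, in slide order."""
    seeds = ",".join(nodes[name] for name in seed_nodes)
    return {
        "cluster_name": f"'{cluster_name}'",
        "listen_address": "",          # left blank, as on the slide
        "broadcast_address": nodes[node_name],
        "seeds": f'"{seeds}"',
        "rpc_address": "",             # left blank, as on the slide
    }


def render(overrides):
    """Format the settings as 'key: value' lines without trailing spaces."""
    return "\n".join(f"{key}: {value}".rstrip() for key, value in overrides.items())
```

Calling render(cassandra_overrides("cassandra3")) reproduces the five lines shown on the slide; only broadcast_address differs between nodes.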
3 Node Cassandra Server running on AWS EC2 Instances
3 Node Cassandra Server running
CREATE KEYSPACE users
WITH replication = {'class':'SimpleStrategy', 'replication_facto [statement truncated on the slide]
CREATE TABLE user(
id int PRIMARY KEY,
name text
);
select * from user;
AngularJS - Java RESTful Web Services Deployed on AWS Cloud
AngularJS - Java RESTful Web Services
Tomcat web server log that we will process with Apache Hadoop/Spark
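The deck does not show the log format, so the following is only a sketch under one assumption: that Tomcat writes its access log with the "common" pattern (%h %l %u %t "%r" %s %b), which matches the Common Log Format. The names LOG_PATTERN and parse_line, and the sample line in the usage note, are hypothetical.

```python
import re
from datetime import datetime

# One Common Log Format line: host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Parse one access-log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means that no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    # The request field normally looks like: METHOD PATH PROTOCOL
    parts = record["request"].split()
    if len(parts) >= 2:
        record["method"], record["path"] = parts[0], parts[1]
    else:
        record["method"], record["path"] = None, None
    return record
```

For a line such as 203.0.113.7 - - [12/Jul/2016:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 1043, parse_line returns a record with status 200, size 1043, method GET, and path /index.html. Lines that do not match return None, so a batch job can skip them.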
Web log and Python application uploaded to an AWS S3 bucket
Spark job executed on AWS EMR - Spark Cluster
Results stored in Cassandra Database
Results stored in AWS S3 Bucket
Python application BatchLogAnalyzer.py executed on the AWS Spark cluster
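The slides do not include the source of BatchLogAnalyzer.py, so the snippet below is not that script. It is a plain-Python sketch of the kind of aggregation a batch log job typically performs: counting responses per HTTP status code. It assumes Common Log Format lines, and the function names are hypothetical. On a Spark cluster the same idea would be expressed as a map over lines followed by a per-key reduction.

```python
from collections import Counter


def status_of(line):
    """Return the HTTP status code of a Common Log Format line, or None.

    The status code is the first field after the closing quote of the
    request string, e.g. ... "GET / HTTP/1.1" 200 1043
    """
    try:
        after_request = line.rsplit('"', 1)[1].split()
        return int(after_request[0])
    except (IndexError, ValueError):
        return None


def count_by_status(lines):
    """Count log lines per status code, skipping lines that cannot be parsed."""
    counts = Counter()
    for line in lines:
        status = status_of(line)
        if status is not None:
            counts[status] += 1
    return dict(counts)
```

Feeding the function an open log file, for example count_by_status(open(path)) with a local copy of the Tomcat access log, returns a dictionary that maps each status code to its number of occurrences.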
Results compared in console and Cassandra Database
Thank You
hkbhadraa@gmail.com


Big data Lambda Architecture - Batch Layer Hands On
