www.edureka.co/hadoop-admin
Setting High Availability in Hadoop Cluster
What will you learn today?
• Hadoop: A synonym for Big Data
• Hadoop High Availability
• Hands-On: Achieving NameNode and YARN high availability
• Hands-On: Securing HDFS through ACL
• Hadoop as a Data Warehouse
What is Hadoop?
Apache Hadoop is an open-source, scalable, and reliable framework that stores large data sets and allows
their distributed processing across clusters of computers using simple programming models
A closer look at Apache Hadoop
Apache Hadoop includes the following modules:
• Hadoop Distributed File System (HDFS): A distributed file system
• Hadoop Common: The common utilities that support the other Hadoop modules
• Hadoop YARN: A framework for job scheduling and cluster resource management
• Hadoop MapReduce: A YARN-based system for parallel processing of large data sets
High Availability
Maintaining High Availability
In distributed computing, failure is the norm, so Hadoop services such as the NameNode and YARN must offer an acceptable level of availability
[Diagram: a single NameNode holds the namespace (NS) and block management. The client gets block locations from the NameNode and reads data from the DataNodes. Limitations noted: NameNode, no horizontal scale; NameNode, no high availability.]
NameNode: Single Point of Failure
• Secondary NameNode:
  ◦ "Not a hot standby" for the NameNode
  ◦ Connects to the NameNode every hour (by default)
  ◦ Housekeeping and backup of NameNode metadata
  ◦ Saved metadata can be used to rebuild a failed NameNode
[Diagram: the NameNode, marked as the single point of failure, sends its metadata to the Secondary NameNode. Caption: "You give me metadata every hour, I will make it secure."]
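The housekeeping role described above can be illustrated with a toy model. This is a minimal sketch, not Hadoop code: the `checkpoint` and `apply_edits` functions and the tuple-based edit format are hypothetical, chosen only to show how periodically merging an edit log into a saved image yields metadata from which a namespace can be rebuilt.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified edit record: (operation, path).
Edit = Tuple[str, str]


def apply_edits(image: Dict[str, bool], edits: List[Edit]) -> Dict[str, bool]:
    """Return a new namespace image with the edits replayed in order."""
    merged = dict(image)
    for op, path in edits:
        if op == "create":
            merged[path] = True
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged


def checkpoint(image: Dict[str, bool], edits: List[Edit]):
    """Merge the edit log into the image and start a fresh, empty log."""
    return apply_edits(image, edits), []


image: Dict[str, bool] = {}
edits: List[Edit] = [("create", "/data/a"), ("create", "/data/b"), ("delete", "/data/a")]

# Periodic checkpoint: the saved image now reflects all edits so far.
image, edits = checkpoint(image, edits)

# An edit logged after the checkpoint lives only in the edit log.
edits.append(("create", "/data/c"))

# Rebuilding a namespace: saved image plus whatever edit log survived.
recovered = apply_edits(image, edits)
```

Edits made after the last checkpoint can only be replayed if the edit log itself survives, which is one reason a checkpointing node is not a hot standby.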
Hadoop 2.0 Cluster Architecture: High Availability
[Diagram: Hadoop 2.0 cluster with two layers, HDFS (NameNode High Availability) and YARN (Next Generation MapReduce).
HDFS: a Client, an Active NameNode, a Standby NameNode, a Secondary NameNode, and shared edit logs. All namespace edits are logged to shared NFS storage with a single writer (fencing); the Standby NameNode reads the edit logs and applies them to its own namespace.
YARN: a Resource Manager and worker nodes, each shown with a Node Manager, a DataNode, a Container, and an App Master.]
http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html
HDFS HIGH AVAILABILITY
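The shared edit log with a single writer, and the standby that replays it, can be sketched as a toy model. Everything here is an assumption made for illustration: `SharedEditLog`, `StandbyNameNode`, the node names `nn1` and `nn2`, and the string edit format are hypothetical and are not Hadoop APIs. Real fencing relies on configured fencing methods rather than a simple ownership check.

```python
from typing import List, Set


class SharedEditLog:
    """Toy stand-in for the shared edits storage (not a Hadoop API)."""

    def __init__(self, writer: str) -> None:
        self.writer = writer            # the only node currently allowed to write
        self.edits: List[str] = []

    def append(self, node: str, edit: str) -> None:
        if node != self.writer:
            raise PermissionError(f"{node} is fenced off from the edit log")
        self.edits.append(edit)

    def fence(self, new_writer: str) -> None:
        """Hand the single-writer role to another node."""
        self.writer = new_writer


class StandbyNameNode:
    def __init__(self) -> None:
        self.namespace: Set[str] = set()
        self.applied = 0                # number of shared edits already replayed

    def tail(self, log: SharedEditLog) -> None:
        """Read edits not yet seen and apply them to the local namespace."""
        for edit in log.edits[self.applied:]:
            op, path = edit.split(" ", 1)
            if op == "mkdir":
                self.namespace.add(path)
            elif op == "rm":
                self.namespace.discard(path)
        self.applied = len(log.edits)


log = SharedEditLog(writer="nn1")
standby = StandbyNameNode()
log.append("nn1", "mkdir /logs")
log.append("nn1", "mkdir /tmp")
log.append("nn1", "rm /tmp")
standby.tail(log)                       # standby catches up with the active

log.fence("nn2")                        # failover: nn2 becomes the single writer
try:
    log.append("nn1", "mkdir /stale")   # the old active can no longer write
    old_active_blocked = False
except PermissionError:
    old_active_blocked = True
```

Because the standby has already replayed the shared edits, it can take over without first rebuilding the namespace from a checkpoint.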
[Diagram: HDFS HA with automatic failover. An active NN and a standby NN share NN state through shared storage with a single writer (fencing). Each NN has a Failover Controller (active or standby) that monitors the NN's health and sends heartbeats to a three-node ZooKeeper (ZK) ensemble. DataNodes (DN 1, DN 2, ... DN n) send block reports to both the active and the standby NN, but take update commands (HDFS cmds) from only one.]
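The election among failover controllers can be sketched as follows. This is a simplified model under stated assumptions: `LockService` stands in for the ZooKeeper ensemble and `failover_round` for one monitoring pass of both controllers; neither is a real ZooKeeper or Hadoop API, and the names `nn1` and `nn2` are placeholders.

```python
from typing import Dict, Optional


class LockService:
    """Toy stand-in for the ZooKeeper ensemble: holds one 'active' lock."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None

    def try_acquire(self, node: str) -> bool:
        if self.holder is None:
            self.holder = node
        return self.holder == node

    def release(self, node: str) -> None:
        if self.holder == node:
            self.holder = None


def failover_round(lock: LockService, health: Dict[str, bool]) -> Optional[str]:
    """One monitoring pass by every failover controller; returns the active NN."""
    # Controllers of unhealthy NameNodes give up the lock first...
    for node, healthy in health.items():
        if not healthy:
            lock.release(node)
    # ...then controllers of healthy NameNodes try to take it.
    for node, healthy in health.items():
        if healthy:
            lock.try_acquire(node)
    return lock.holder


lock = LockService()
before = failover_round(lock, {"nn1": True, "nn2": True})   # nn1 wins the lock
after = failover_round(lock, {"nn1": False, "nn2": True})   # nn1 fails, nn2 takes over
```

The model leaves out fencing of the previous active, which a real failover would perform before the new NameNode starts serving writes.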
[Diagram: YARN Resource Manager HA. An active and a passive Resource Manager, each with a ZKFC, share RM state (RMState) through ZooKeeper.]
1. The active node stores all state in the ZKStore
2. Failure
3. The ZKFC detects the failure
4. Failover: the standby node becomes active
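The four steps above can be traced in a small simulation. It is a sketch under assumptions: `StateStore` and `ResourceManager` are hypothetical classes, the application ID is made up, and failure detection is reduced to flipping a flag rather than anything ZooKeeper actually does.

```python
from typing import Dict


class StateStore:
    """Toy stand-in for the ZooKeeper-backed RM state store."""

    def __init__(self) -> None:
        self.apps: Dict[str, str] = {}


class ResourceManager:
    def __init__(self, name: str, store: StateStore) -> None:
        self.name = name
        self.store = store
        self.active = False
        self.apps: Dict[str, str] = {}

    def submit(self, app_id: str, state: str) -> None:
        if not self.active:
            raise RuntimeError(f"{self.name} is in standby")
        self.apps[app_id] = state
        self.store.apps[app_id] = state      # step 1: persist state in the store

    def become_active(self) -> None:
        self.apps = dict(self.store.apps)    # reload what the old active saved
        self.active = True


store = StateStore()
rm1 = ResourceManager("rm1", store)
rm2 = ResourceManager("rm2", store)

rm1.become_active()
rm1.submit("app_0001", "RUNNING")
rm1.active = False        # step 2: rm1 fails
rm2.become_active()       # steps 3-4: failure detected, standby takes over
```

In a real cluster this behaviour is switched on in yarn-site.xml, through properties such as `yarn.resourcemanager.ha.enabled` and `yarn.resourcemanager.recovery.enabled`.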
[Diagram: HDFS HA with Journal Nodes and ZooKeeper. Three Journal Nodes hold the shared edits: the active NameNode writes to them and the standby NameNode reads from them. Each NameNode has a ZookeeperFC that monitors its liveness and health. Through the three-node Zookeeper service, the active side's ZookeeperFC monitors and maintains the active lock, while the standby side's ZookeeperFC monitors and tries to take it. DataNodes sit below both NameNodes.]
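Why three Journal Nodes are shown can be made concrete with a quorum-write sketch. The slide itself does not spell out the rule; the model below assumes the majority-acknowledgement behaviour of the Quorum Journal Manager, and `JournalNode` and `quorum_write` are hypothetical names, not Hadoop classes.

```python
from typing import List


class JournalNode:
    """Toy JournalNode: stores edits unless it is down."""

    def __init__(self, up: bool = True) -> None:
        self.up = up
        self.edits: List[str] = []

    def write(self, edit: str) -> bool:
        if self.up:
            self.edits.append(edit)
        return self.up


def quorum_write(journals: List[JournalNode], edit: str) -> bool:
    """Succeed only if a strict majority of JournalNodes stored the edit."""
    acks = sum(1 for jn in journals if jn.write(edit))
    return acks > len(journals) // 2


journals = [JournalNode(), JournalNode(), JournalNode()]
ok_all_up = quorum_write(journals, "mkdir /a")      # 3 of 3 acknowledge

journals[0].up = False
ok_one_down = quorum_write(journals, "mkdir /b")    # 2 of 3 acknowledge

journals[1].up = False
ok_two_down = quorum_write(journals, "mkdir /c")    # only 1 of 3 acknowledges
```

With three Journal Nodes the shared edits survive the loss of one node but not two, which is why an odd number of nodes is the usual choice.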
Hands-On
Achieving HDFS and YARN High Availability
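As a starting point for the HDFS half of this exercise, the sketch below assembles an `hdfs-site.xml` fragment in Python. The property names are taken from the Apache Hadoop HDFS HA documentation; every value (the nameservice ID `mycluster`, the NameNode IDs `nn1` and `nn2`, all hostnames and ports) is a placeholder, and the helper `to_configuration_xml` is hypothetical. It is not the configuration used in the session.

```python
import xml.etree.ElementTree as ET

# Property names follow the Apache Hadoop HDFS HA documentation.
# All values are placeholders to be replaced with real cluster details.
ha_properties = {
    "dfs.nameservices": "mycluster",
    "dfs.ha.namenodes.mycluster": "nn1,nn2",
    "dfs.namenode.rpc-address.mycluster.nn1": "master1.example.com:8020",
    "dfs.namenode.rpc-address.mycluster.nn2": "master2.example.com:8020",
    "dfs.namenode.shared.edits.dir":
        "qjournal://jn1.example.com:8485;jn2.example.com:8485;"
        "jn3.example.com:8485/mycluster",
    "dfs.client.failover.proxy.provider.mycluster":
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    "dfs.ha.fencing.methods": "sshfence",
    "dfs.ha.automatic-failover.enabled": "true",
}


def to_configuration_xml(properties):
    """Render a dict as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_configuration_xml(ha_properties)
print(xml_text)
```

Automatic failover additionally needs `ha.zookeeper.quorum` in core-site.xml. Once both NameNodes are running, `hdfs haadmin -getServiceState nn1` reports whether a given NameNode is active or standby.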
Hands-On
Securing HDFS through ACL
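A minimal sketch of the commands this exercise revolves around: the helpers below only build `hdfs dfs -setfacl` and `hdfs dfs -getfacl` command lines, they do not run them. The path `/data/sales`, the user `alice`, and the group `analysts` are made-up examples, and the helper names are hypothetical.

```python
import shlex
from typing import List


def setfacl_modify(path: str, entries: List[str], recursive: bool = False) -> List[str]:
    """Build an `hdfs dfs -setfacl -m` command line without running it."""
    cmd = ["hdfs", "dfs", "-setfacl"]
    if recursive:
        cmd.append("-R")
    cmd += ["-m", ",".join(entries), path]
    return cmd


def getfacl(path: str) -> List[str]:
    """Build an `hdfs dfs -getfacl` command line without running it."""
    return ["hdfs", "dfs", "-getfacl", path]


# Grant read/execute to one user and read-only access to one group.
grant = setfacl_modify("/data/sales", ["user:alice:r-x", "group:analysts:r--"])
show = getfacl("/data/sales")

print(shlex.join(grant))
print(shlex.join(show))
```

ACL support has to be enabled on the NameNode through `dfs.namenode.acls.enabled` in hdfs-site.xml before these commands are accepted.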
What to do with Big Data?
Hadoop: The Perfect Data Warehouse
[Diagram: Hadoop as a data warehouse stack, bottom to top.
HDFS Files: Logs, Transaction, Sensors, Free Text, Images/Videos
Metadata: HCatalog
Query Engines: Hive SQL, Impala SQL, Others …
BI Tools: Tableau, Cognos, QlikView, Pentaho]
What is a Data Warehouse good at?
Among others, a data warehouse is the foundation for a successful business intelligence program
The Data Warehouse Institute
www.tdwi.org
Thank You …
Questions/Queries/Feedback
Recording and presentation will be made available to you within 24 hours
