View Hadoop Administration course details at www.edureka.co/hadoop-admin
Top 5 Hadoop admin tasks
Objectives of this Session
At the end of this module, you will be able to:
» Understand cluster planning
» Understand Hadoop fully distributed cluster setup
» Add further nodes to a running cluster
» Upgrade an existing Hadoop cluster
» Understand NameNode High Availability
Why Hadoop Administration
With the rise of Hadoop adoption and usage across various industries, the role of the Hadoop Administrator has become very important and is in demand.
Hadoop Administration Responsibilities
» HDFS support and maintenance
» Monitoring the Hadoop cluster
» Providing security
» Integrating different frameworks
» Hadoop infrastructure maintenance
Top 5 Hadoop Admin Tasks
Task-1: Cluster Planning
Task-2: Hadoop Cluster Setup
Task-3: Hadoop Version Upgrade
Task-4: Adding or Removing Nodes from the Cluster
Task-5: Providing High Availability to the Cluster
Task-1: Cluster Planning
Hadoop Cluster: A Typical Use Case
Active NameNode
» RAM: 64 GB
» Hard disk: 1 TB
» Processor: Xeon with 8 cores
» Ethernet: 3 x 10 Gb/s
» OS: 64-bit CentOS
» Power: redundant power supply
Secondary NameNode
» RAM: 32 GB
» Hard disk: 1 TB
» Processor: Xeon with 4 cores
» Ethernet: 3 x 10 Gb/s
» OS: 64-bit CentOS
» Power: redundant power supply
Standby NameNode (optional)
» RAM: 64 GB
» Hard disk: 1 TB
» Processor: Xeon with 8 cores
» Ethernet: 3 x 10 Gb/s
» OS: 64-bit CentOS
» Power: redundant power supply
DataNodes (each)
» RAM: 16 GB
» Hard disk: 6 x 2 TB
» Processor: Xeon with 2 cores
» Ethernet: 3 x 10 Gb/s
» OS: 64-bit CentOS
Cluster Growth Based on Storage Capacity
Projecting cluster growth from storage capacity is often a good planning method.
» Data grows by approximately 5 TB per week
» HDFS is set up to replicate each block three times
» Thus, 15 TB of extra storage space is required per week
» Assuming machines with 5 x 3 TB hard drives, this equates to one new machine required each week
» Assume overheads to be 30%
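The growth arithmetic above can be sketched in a few lines of Python. This is a minimal sketch, not a sizing tool: the function names are hypothetical, and it assumes the 30% overhead is added on top of the replicated data (the slide does not say how the overhead is applied).

```python
import math

def weekly_raw_storage_tb(growth_tb, replication=3, overhead=0.30):
    """Raw disk needed per week: replicated data plus an overhead margin.

    Assumption: overhead is applied as an extra fraction on top of the
    replicated volume.
    """
    return growth_tb * replication * (1 + overhead)

def nodes_needed(weeks, growth_tb, node_capacity_tb, replication=3, overhead=0.30):
    """Whole nodes required to absorb `weeks` of data growth."""
    total_tb = weeks * weekly_raw_storage_tb(growth_tb, replication, overhead)
    return math.ceil(total_tb / node_capacity_tb)

node_capacity_tb = 5 * 3  # 5 drives x 3 TB per machine, as on the slide

print(weekly_raw_storage_tb(5, overhead=0.0))  # replicated data only, before overhead
print(weekly_raw_storage_tb(5))                # with the assumed 30% overhead
print(nodes_needed(52, 5, node_capacity_tb))   # machines to add over one year
```

Without overhead the result matches the slide's figure of 15 TB, or one 15 TB machine, per week; with the overhead included the weekly requirement is somewhat more than one machine.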
Slave Nodes: Recommended Configuration
Higher-performance vs. lower-performance components: save the money, buy more nodes!
“A cluster with more nodes performs better than one with fewer, slightly faster nodes”
General Configuration
• General (depends on requirement) ‘base’ configuration for a slave node:
» 4 x 1 TB or 2 TB hard drives, in a JBOD (Just a Bunch Of Disks) configuration
» Do not use RAID!
» 2 x quad-core CPUs
» 24-32 GB RAM
» Gigabit Ethernet
Special Configuration
• Multiples of (1 hard drive + 2 cores + 6-8 GB RAM) generally work well for many types of applications
Slave Nodes: More Details (RAM)
» Generally, each Map or Reduce task will take 1 GB to 2 GB of RAM
» Slave nodes should not be using virtual memory
» Rule of thumb: total number of tasks = 1.5 x number of processor cores
» Ensure enough RAM is present to run all tasks, plus the DataNode and TaskTracker daemons, plus the operating system
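As a rough illustration of the rule of thumb, the following Python sketch estimates the RAM range for a slave node. The function names are hypothetical, and the 2 GB set aside for the daemons and the 2 GB for the operating system are illustrative assumptions, not figures from the slides.

```python
def max_tasks(cores):
    """Rule of thumb from the slide: 1.5 tasks per processor core."""
    return int(1.5 * cores)

def ram_range_gb(cores, daemons_gb=2, os_gb=2):
    """Return (low, high) RAM estimates in GB for a slave node.

    Each task is taken to use 1-2 GB (from the slide). daemons_gb
    (DataNode + TaskTracker) and os_gb are assumed values.
    """
    tasks = max_tasks(cores)
    fixed = daemons_gb + os_gb
    return tasks * 1 + fixed, tasks * 2 + fixed

print(max_tasks(8))     # 2 x quad-core CPUs -> 12 tasks
print(ram_range_gb(8))  # (16, 28)
```

Under these assumptions, a node with 2 x quad-core CPUs lands in roughly the same range as the 24-32 GB RAM suggested for the base slave configuration.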
Master Node Hardware Recommendations
The master node requires:
» Carrier-class hardware (not commodity hardware)
» Dual power supplies
» Dual Ethernet cards (bonded to provide failover)
» RAIDed hard drives
» At least 32 GB of RAM
Task-2: Hadoop Cluster Setup
Hadoop Cluster Modes
Hadoop can run in any of the following three modes:
Standalone (or Local) Mode
» No daemons; everything runs in a single JVM
» Suitable for running MapReduce programs during development
» Has no DFS
Pseudo-Distributed Mode
» Hadoop daemons run on the local machine
Fully-Distributed Mode
» Hadoop daemons run on a cluster of machines
Hadoop 2.x Configuration Files – Apache Hadoop
» Core: core-site.xml
» HDFS: hdfs-site.xml
» YARN: yarn-site.xml
» MapReduce: mapred-site.xml
Configuration Files
Configuration filename: Description
» hadoop-env.sh, yarn-env.sh: Settings for the Hadoop daemons' process environment.
» core-site.xml: Configuration settings for Hadoop Core, such as I/O settings that are common to both HDFS and YARN.
» hdfs-site.xml: Configuration settings for the HDFS daemons: the NameNode and the DataNodes.
» yarn-site.xml: Configuration settings for the ResourceManager and NodeManager.
» mapred-site.xml: Configuration settings for MapReduce applications.
» slaves: A list of machines (one per line) that each run a DataNode and a NodeManager.
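The *-site.xml files listed above share one layout: a configuration element containing property entries, each with a name and a value. The Python sketch below renders that layout from a dictionary. The helper name render_site_xml and the host name are hypothetical; fs.defaultFS is the Hadoop 2.x key for the default file system.

```python
import xml.etree.ElementTree as ET

def render_site_xml(properties):
    """Render a dict of property names/values as Hadoop *-site.xml text."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical NameNode address, for illustration only.
core_site = render_site_xml({"fs.defaultFS": "hdfs://namenode.example.com:8020"})
print(core_site)
```

Generating the files from one source of truth is only one option; many clusters simply edit the XML by hand or use a configuration management tool.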
Hadoop Daemons
NameNode daemon
» Runs on the master node of the Hadoop Distributed File System (HDFS)
» Directs DataNodes to perform their low-level I/O tasks
DataNode daemon
» Runs on each slave machine in HDFS
» Does the low-level I/O work
ResourceManager
» Runs on the master node of the data processing system (MapReduce)
» Global resource scheduler
NodeManager
» Runs on each slave node of the data processing system
» Platform for the data processing tasks
JobHistoryServer
» Responsible for servicing all job-history-related requests from clients
Hadoop 1.x and Hadoop 2.x Ecosystem
[Figure: ecosystem stacks for Hadoop 1.x and Hadoop 2.x, handling structured and unstructured/semi-structured data]
» Hadoop 1.x: Pig Latin (data analysis), Hive (DW system), HBase, MapReduce Framework, Apache Oozie (workflow), HDFS (Hadoop Distributed File System)
» Hadoop 2.x: Pig Latin (data analysis), Hive (DW system), other YARN frameworks (MPI, Giraph), HBase, MapReduce Framework, YARN (cluster resource management), Apache Oozie (workflow), HDFS (Hadoop Distributed File System)
Demo On Hadoop Cluster Set Up
Task-3: Hadoop Version Upgrade
Hadoop Version Upgrade
1) Stop the MapReduce cluster and all client applications running on the DFS cluster
2) Take a backup of the file system namespace
3) Install the new version of the Hadoop software
4) Update all the configuration files in the new Hadoop installation
5) Start the NameNode with the upgrade command
6) Compare the new HDFS file system with the previous version's file system namespace
7) Finalize the upgrade
Hadoop Version Upgrade
1) Run reports
• FSCK
• LSR
• DFSADMIN
2) Take backups
• Configuration
• Applications
• Data and metadata
3) Install the new version of Hadoop
4) Upgrade
• hadoop-daemon.sh start namenode -upgrade
5) Run new reports
• FSCK
• LSR
• DFSADMIN
• Compare the old and new reports
• Test the new cluster
6) Finalize the upgrade
• hadoop dfsadmin -finalizeUpgrade
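The "compare old and new reports" step can be sketched in Python for the LSR report. This is a minimal sketch under stated assumptions: the two listings are assumed to have been saved as text before and after the upgrade, each line is assumed to have eight whitespace-separated columns (permissions, replication, owner, group, size, date, time, path), and the function names are hypothetical.

```python
def parse_lsr(listing):
    """Map each path in a recursive listing to its size field."""
    sizes = {}
    for line in listing.splitlines():
        # Assumed columns: perms, replication, owner, group, size, date, time, path.
        fields = line.split(None, 7)
        if len(fields) == 8:  # skip blank or summary lines
            sizes[fields[7].rstrip()] = fields[4]
    return sizes

def compare_listings(before, after):
    """Report paths that disappeared, appeared, or changed size."""
    old, new = parse_lsr(before), parse_lsr(after)
    return {
        "missing": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
        "size_changed": sorted(p for p in set(old) & set(new) if old[p] != new[p]),
    }

# Made-up sample listing, for illustration only.
before = (
    "drwxr-xr-x   - hdfs supergroup      0 2014-01-01 10:00 /data\n"
    "-rw-r--r--   3 hdfs supergroup   1024 2014-01-01 10:00 /data/a.txt\n"
)
print(compare_listings(before, before))  # all three lists are empty
```

An upgrade that preserved the namespace should produce three empty lists; anything else is worth investigating before finalizing, since finalizing discards the pre-upgrade state.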
Task-4: Adding or Removing Nodes from the Cluster
Commissioning and Decommissioning of DataNodes
[Figure: a master node and its DataNodes, with DataNodes being commissioned into and decommissioned from the cluster]
Add (Commission) DataNodes
1) Update the network addresses in the ‘include’ files
• dfs.include
• mapred.include
2) Update the NameNode
• hadoop dfsadmin -refreshNodes
3) Update the JobTracker
• hadoop mradmin -refreshNodes
4) Update the ‘slaves’ file
5) Start the DataNode and TaskTracker
• hadoop-daemon.sh start tasktracker
• hadoop-daemon.sh start datanode
6) Cross-check the web UI to confirm the successful addition
7) Run the Balancer to move HDFS blocks to the DataNodes
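Step 1 of the procedure above (updating the include files) is easy to script. The Python sketch below adds a host to an include file only if it is not already listed. It assumes the include file is a plain list of host names, one per line; the helper name add_host and the host name are hypothetical.

```python
import os
import tempfile

def add_host(include_path, hostname):
    """Append hostname to an include file unless it is already listed.

    Returns True if the file was changed, False otherwise.
    """
    hosts = []
    if os.path.exists(include_path):
        with open(include_path) as f:
            hosts = [line.strip() for line in f if line.strip()]
    if hostname in hosts:
        return False
    hosts.append(hostname)
    with open(include_path, "w") as f:
        f.write("\n".join(hosts) + "\n")
    return True

# Demonstration against a temporary file rather than a real cluster config.
with tempfile.TemporaryDirectory() as tmp:
    include = os.path.join(tmp, "dfs.include")
    print(add_host(include, "datanode5.example.com"))  # True: host added
    print(add_host(include, "datanode5.example.com"))  # False: already listed
```

Making the edit idempotent means the script can be rerun safely; the refresh commands in steps 2 and 3 still have to be run afterwards for the change to take effect.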
Demo On Commissioning Data Node
Task-5: Providing High Availability to the Cluster
High Availability (HA)
Prior to Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine.
High Availability can be achieved in two different ways:
» HA using the Quorum Journal Manager (QJM) to share edit logs between the Active and Standby NameNodes
» HA using NFS for shared storage instead of the QJM
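For the QJM option, the core hdfs-site.xml properties can be sketched as a Python dictionary. The property names follow the Hadoop 2.x HDFS HA documentation; the function name, nameservice ID, host names, and the ports 8020 and 8485 are assumptions for illustration, and a working setup needs further properties (fencing, failover proxy provider, ZooKeeper quorum) that are omitted here.

```python
def qjm_ha_properties(nameservice, namenodes, journalnodes,
                      rpc_port=8020, journal_port=8485):
    """Build a subset of hdfs-site.xml properties for QJM-based HA.

    namenodes: dict mapping NameNode ID -> host name.
    journalnodes: list of JournalNode host names.
    """
    journal_hosts = ";".join(f"{host}:{journal_port}" for host in journalnodes)
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        # Active NameNode writes edits here; Standby NameNode reads them.
        "dfs.namenode.shared.edits.dir": f"qjournal://{journal_hosts}/{nameservice}",
    }
    for nn_id, host in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:{rpc_port}"
    return props

# Hypothetical cluster layout.
props = qjm_ha_properties(
    "mycluster",
    {"nn1": "master1.example.com", "nn2": "master2.example.com"},
    ["jn1.example.com", "jn2.example.com", "jn3.example.com"],
)
for name, value in props.items():
    print(name, "=", value)
```

An odd number of JournalNodes (three here) is the usual choice, because edits must be acknowledged by a majority of them.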
HA Architecture
[Figure: HA architecture]
» Active Node and Standby Node: share edits through the Journal Nodes (the Active Node writes, the Standby Node reads)
» Slave Nodes: send block reports and heartbeats to the NameNodes
» Failover Controllers (one Active, one Standby): monitor status and health, and manage HA state
» ZooKeeper Service: used by the Failover Controllers
Demo On NameNode High Availability
Hadoop Admin Job Trends
Questions
Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions