View Hadoop Administration Course at www.edureka.co/hadoop-admin
Power the Hadoop Cluster with AWS Cloud
www.edureka.co/hadoop-adminSlide 2 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions
Objectives
At the end of this module, you will be familiar with:
Hadoop cluster introduction
Recommended configuration for a cluster
Hadoop cluster running modes
Hadoop configuration files
Hadoop admin responsibilities
Demo: Hadoop cluster setup on AWS
Slide 3
Hadoop Core Components
Hadoop 2.x Core Components
HDFS (storage): NameNode (master), Secondary NameNode, DataNode (slave)
YARN (processing): Resource Manager (master), Node Manager (slave)
Slide 4
Hadoop Cluster: A Typical Use Case
Active NameNode
RAM: 64 GB
Hard disk: 1 TB
Processor: Xeon with 8 cores
Ethernet: 3 x 10 Gb/s
OS: 64-bit CentOS
Power: Redundant power supply
StandBy NameNode (optional)
RAM: 64 GB
Hard disk: 1 TB
Processor: Xeon with 8 cores
Ethernet: 3 x 10 Gb/s
OS: 64-bit CentOS
Power: Redundant power supply
Secondary NameNode
RAM: 32 GB
Hard disk: 1 TB
Processor: Xeon with 4 cores
Ethernet: 3 x 10 Gb/s
OS: 64-bit CentOS
Power: Redundant power supply
DataNodes (each)
RAM: 16 GB
Hard disk: 6 x 2 TB
Processor: Xeon with 2 cores
Ethernet: 3 x 10 Gb/s
OS: 64-bit CentOS
Slide 5
Cluster Growth Based On Storage Capacity
Planning cluster growth around storage capacity is often a good method to use!
Data grows by approximately 5 TB per week
HDFS is set up to replicate each block three times
Thus, 15 TB of extra storage space is required per week
Assuming machines with 5 x 3 TB hard drives, this equates to a new machine required each week
Assume overheads of 30%
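The arithmetic on this slide can be sketched in a few lines of Python. The function name is hypothetical, and the way the 30% overhead is applied (as extra space on top of the replicated data) is an assumption: the slide does not say how the overhead should be counted.

```python
def weekly_nodes_needed(growth_tb, replication, disks_per_node, disk_tb, overhead=0.0):
    """Return (raw TB needed per week, nodes to add per week).

    overhead is a fraction added on top of the replicated data
    (an assumed interpretation, not stated on the slide).
    """
    replicated_tb = growth_tb * replication          # e.g. 5 TB x 3 replicas
    required_tb = replicated_tb * (1 + overhead)     # add non-HDFS overhead
    node_capacity_tb = disks_per_node * disk_tb      # e.g. 5 x 3 TB drives
    return required_tb, required_tb / node_capacity_tb


# Figures from the slide: 5 TB/week, 3 replicas, 5 x 3 TB drives per machine.
required, nodes = weekly_nodes_needed(5, 3, 5, 3)
print(f"Without overhead: {required:.1f} TB/week, {nodes:.2f} nodes/week")

required_oh, nodes_oh = weekly_nodes_needed(5, 3, 5, 3, overhead=0.30)
print(f"With 30% overhead: {required_oh:.1f} TB/week, {nodes_oh:.2f} nodes/week")
```

Under this reading, the overhead pushes the requirement somewhat above one machine per week, so the "one new machine each week" figure is best treated as a lower bound.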
Slide 6
Slave Nodes: Recommended Configuration
Higher-performance vs. lower-performance components: save the money, buy more nodes!
“A cluster with more nodes performs better than one with fewer, slightly faster nodes”
General Configuration
• ‘Base’ configuration for a slave node (depends on requirements):
» 4 x 1 TB or 2 TB hard drives, in a JBOD (just a bunch of disks) configuration
» Do not use RAID!
» 2 x quad-core CPUs
» 24-32 GB RAM
» Gigabit Ethernet
Special Configuration
• Multiples of (1 hard drive + 2 cores + 6-8 GB RAM) generally work well for many types of applications
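The "multiples of (1 hard drive + 2 cores + 6-8 GB RAM)" rule can be written as a small helper. This is only an illustrative sketch; the function and key names are hypothetical.

```python
def node_spec_from_disks(disks, cores_per_disk=2, ram_gb_per_disk=(6, 8)):
    """Scale a slave-node spec from the number of hard drives.

    Applies the slide's rule of thumb: each hard drive is paired with
    2 cores and 6-8 GB of RAM.
    """
    ram_low, ram_high = ram_gb_per_disk
    return {
        "hard_drives": disks,
        "cores": disks * cores_per_disk,
        "ram_gb_min": disks * ram_low,
        "ram_gb_max": disks * ram_high,
    }


print(node_spec_from_disks(4))
```

With 4 drives the rule gives 8 cores and 24-32 GB of RAM, which agrees with the base configuration listed on the slide (2 x quad-core CPUs, 24-32 GB RAM).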
Slide 7
Slave Nodes: More Details (RAM)
Generally, each Map or Reduce task will take 1 GB to 2 GB of RAM
Slave nodes should not be using virtual memory
RULE OF THUMB!
Total number of tasks = 1.5 x number of processor cores
Ensure enough RAM is present to run all tasks, plus the DataNode and Node Manager (TaskTracker in Hadoop 1.x) daemons, plus the operating system
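As a rough sketch of this rule of thumb, the following Python estimates the RAM a slave node needs. The amounts reserved for the daemons and the operating system are placeholder assumptions, not figures from the slides; adjust them for your own cluster.

```python
import math


def slave_ram_estimate_gb(cores, ram_per_task_gb=2.0, daemons_gb=2.0, os_gb=2.0):
    """Return (task count, estimated RAM in GB) for one slave node.

    tasks = 1.5 x cores (rule of thumb from the slide), rounded down.
    daemons_gb and os_gb are assumed reserves for the DataNode and
    Node Manager daemons and for the operating system.
    """
    tasks = math.floor(1.5 * cores)
    total_gb = tasks * ram_per_task_gb + daemons_gb + os_gb
    return tasks, total_gb


tasks, ram_gb = slave_ram_estimate_gb(8)
print(f"{tasks} tasks, about {ram_gb:.0f} GB RAM")
```

For an 8-core node at 2 GB per task this lands inside the 24-32 GB range recommended for a base slave node, which leaves headroom so the node does not fall back on virtual memory.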
Slide 8
Master Node Hardware Recommendations
The master node requires:
Carrier-class hardware (not commodity hardware)
Dual power supplies
Dual Ethernet cards (bonded to provide failover)
RAID-configured hard drives
At least 32 GB of RAM
Slide 9
Hadoop Cluster Modes
Hadoop can run in any of the following three modes:
Standalone (or Local) Mode
• No daemons, everything runs in a single JVM
• Suitable for running MapReduce programs during development
• Has no DFS
Pseudo-Distributed Mode
• Hadoop daemons run on the local machine
Fully-Distributed Mode
• Hadoop daemons run on a cluster of machines
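To make pseudo-distributed mode more concrete, here is a minimal Python sketch that renders the two settings usually changed for a single-machine setup: `fs.defaultFS` in core-site.xml and `dfs.replication` in hdfs-site.xml. The helper name `render_site_xml` is hypothetical, and the `hdfs://localhost:9000` address and replication factor of 1 are assumed example values, not taken from the slides.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Render a dict of property names/values as a Hadoop *-site.xml string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Assumed values for a single-machine (pseudo-distributed) setup.
core_site = render_site_xml({"fs.defaultFS": "hdfs://localhost:9000"})
hdfs_site = render_site_xml({"dfs.replication": 1})

print(core_site)
print(hdfs_site)
```

In standalone mode neither setting is needed, because there is no DFS and no daemons; in fully-distributed mode the address would point at the NameNode host instead of localhost.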
Slide 10
Configuration Files
Configuration filename: Description
hadoop-env.sh, yarn-env.sh: Settings for the Hadoop daemons' process environment.
core-site.xml: Configuration settings for Hadoop Core, such as I/O settings that are common to both HDFS and YARN.
hdfs-site.xml: Configuration settings for the HDFS daemons: the NameNode and the DataNodes.
yarn-site.xml: Configuration settings for the Resource Manager and Node Manager.
mapred-site.xml: Configuration settings for MapReduce applications.
slaves: A list of machines (one per line) that each run a DataNode and a Node Manager.
Slide 11
Configuration Files (Contd.)
The core functionality and usage of these configuration files are the same in Hadoop 2.0 and 1.0, but many new properties have been added and many have been deprecated.
For example:
• ‘fs.default.name’ has been deprecated and replaced with ‘fs.defaultFS’ in core-site.xml in Hadoop 2.x
• ‘dfs.nameservices’ has been added to enable NameNode High Availability in hdfs-site.xml
Deprecated property name: New property name
dfs.data.dir: dfs.datanode.data.dir
dfs.http.address: dfs.namenode.http-address
fs.default.name: fs.defaultFS
• In the Hadoop 2.2.0 release, you can use either the old or the new properties
• The old property names are now deprecated, but still work!
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
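One way to move an old configuration to the new property names is a simple lookup table. The sketch below covers only the three renames listed on this slide; the function name, host name, and paths are hypothetical examples.

```python
# The three renames shown on the slide (deprecated name -> new name).
DEPRECATED_TO_NEW = {
    "dfs.data.dir": "dfs.datanode.data.dir",
    "dfs.http.address": "dfs.namenode.http-address",
    "fs.default.name": "fs.defaultFS",
}


def migrate_properties(properties):
    """Return (properties with new names, list of (old, new) renames applied)."""
    migrated, renamed = {}, []
    for name, value in properties.items():
        new_name = DEPRECATED_TO_NEW.get(name)
        if new_name is None:
            # Not deprecated: an explicitly set new-style name always wins.
            migrated[name] = value
        else:
            renamed.append((name, new_name))
            # Keep an already-present new-style value if there is one.
            migrated.setdefault(new_name, value)
    return migrated, renamed


# Hypothetical Hadoop 1.x style settings.
old_config = {
    "fs.default.name": "hdfs://namenode.example.com:8020",
    "dfs.data.dir": "/data/1/dfs/dn",
    "dfs.replication": "3",
}
new_config, renames = migrate_properties(old_config)
for old, new in renames:
    print(f"{old} -> {new}")
print(new_config)
```

Because the old names still work in Hadoop 2.2.0, a migration like this is a cleanup step rather than a requirement.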
Slide 12
Hadoop 2.x Configuration Files – Apache Hadoop
Core: core-site.xml
HDFS: hdfs-site.xml
YARN: yarn-site.xml
MapReduce: mapred-site.xml
Slide 13
Hadoop Daemons
NameNode daemon
» Runs on master node of the Hadoop Distributed File System (HDFS)
» Directs Data Nodes to perform their low-level I/O tasks
DataNode daemon
» Runs on each slave machine in the HDFS
» Does the low-level I/O work
Resource Manager
» Runs on the master node of the data processing system (YARN)
» Global resource scheduler
Node Manager
» Runs on each slave node of the data processing system
» Platform for the data processing tasks
JobHistoryServer
» Responsible for servicing all job-history-related requests from clients
Slide 14
Why Cloud?
Challenges in current trend:
Arranging a large common storage area
Providing secure access to the shared data
Slide 15
Amazon EC2
A cloud web host that allows you to dynamically add and remove server resources as you need them, so you pay only for the capacity you use.
Good for Hadoop cluster setup: we can bring up an enormous cluster within minutes and then spin it down when we have finished, to reduce costs.
Slide 16
Hadoop on AWS
ANALYZING…
Slide 17
DEMO
Slide 18
Hadoop Admin Responsibilities
Responsible for implementation and administration of Hadoop infrastructure.
Testing HDFS, Hive, Pig and MapReduce access for Applications.
Cluster maintenance tasks like Backup, Recovery, Upgrade, Patching.
Performance tuning and Capacity planning for Clusters.
Monitor Hadoop cluster and deploy security.
Slide 19
How it Works?
LIVE Online Class
Class Recording in LMS
24/7 Post Class Support
Module Wise Quiz
Project Work
Verifiable Certificate
Slide 20
Questions
Slide 21
Course Topics
• Module 1
» Hadoop Cluster Administration
• Module 2
» Hadoop Architecture and Cluster Setup
• Module 3
» Hadoop Cluster: Planning and Managing
• Module 4
» Backup, Recovery and Maintenance
• Module 5
» Hadoop 2.0 and High Availability
• Module 6
» Advanced Topics: QJM, HDFS Federation and Security
• Module 7
» Oozie, HCatalog/Hive and HBase Administration
• Module 8
» Project: Hadoop Implementation