Hadoop Architecture | Features and Objectives
What is Hadoop?
Hadoop is an open-source Apache framework, written in Java, that allows distributed
processing of large datasets across clusters of computers using simple programming
models. A Hadoop application works in an environment that provides distributed storage
and computation across clusters of computers. Building on Google's published designs,
Doug Cutting and his team developed the open-source project named Hadoop.
Using the MapReduce algorithm, Hadoop runs applications in which the data is processed in
parallel. In short, Hadoop is used to develop applications that can perform
complete statistical analysis of huge amounts of data.
Architecture of Hadoop
Hadoop has two major layers, namely:
• Processing/Computation layer (MapReduce), and
• Storage layer (Hadoop Distributed File System).
MapReduce
MapReduce is a parallel programming model, devised at Google, for writing distributed
applications that efficiently process large amounts of data on large clusters of
commodity hardware in a reliable, fault-tolerant manner. MapReduce programs run on the
Hadoop framework.
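The map and reduce phases described above can be sketched in plain Python. This is a simplified, single-machine simulation of a word count (the classic MapReduce example); real Hadoop jobs are written against the Java MapReduce API and run distributed across the cluster:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: sum the counts for each key.

    The sort here stands in for Hadoop's shuffle/sort step, which groups
    all values for a key before they reach a reducer.
    """
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (key, sum(count for _, count in group))

lines = ["big data big clusters", "big data"]
print(dict(reduce_phase(map_phase(lines))))
# → {'big': 3, 'clusters': 1, 'data': 2}
```

In Hadoop itself the map tasks run in parallel on the nodes holding the data, and the framework performs the shuffle/sort between the map and reduce stages.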
Hadoop Distributed File System
The Hadoop Distributed File System (HDFS) is based on the Google File System. It provides a
distributed file system that is designed to run on commodity hardware. HDFS has many
similarities with existing distributed file systems. It is designed to be deployed on low-cost
hardware and to be highly fault-tolerant, and it provides high-throughput access to application data.
The following two modules are also included in the Hadoop framework:
a. Hadoop Common
These are the Java libraries and utilities required by other Hadoop modules.
b. Hadoop YARN
A framework for job scheduling and cluster resource management.
How Does Hadoop Work?
Building bigger servers with heavy configurations to handle large-scale processing is quite
expensive. A cluster of low-cost commodity machines is cheaper than one high-end server,
and this is the major factor behind using Hadoop: it runs across clusters of low-cost
machines.
Hadoop performs the following core tasks:
1. Data is first divided into directories and files. Files are further divided into
uniform-sized blocks of 128 MB or 64 MB (preferably 128 MB).
2. These blocks are then distributed across various cluster nodes for further
processing.
3. HDFS, sitting on top of the local file system, supervises the processing.
4. Blocks are replicated to handle hardware failure.
5. It checks that the code executed successfully.
6. It performs the sort that takes place between the map and reduce stages.
7. It sends the sorted data to a certain computer.
8. It writes debugging logs for each job.
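Step 1 above, dividing a file into uniform-sized blocks, is simple arithmetic. A minimal sketch in Python (illustrative only, not Hadoop code):

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the preferred HDFS block size

def split_into_blocks(file_size):
    """Return the sizes of the blocks a file of file_size bytes occupies.

    Every block is full-sized except possibly the last, which holds
    whatever remains of the file.
    """
    full, remainder = divmod(file_size, BLOCK_SIZE)
    return [BLOCK_SIZE] * full + ([remainder] if remainder else [])

# A 300 MB file becomes two 128 MB blocks plus one 44 MB block.
sizes = split_into_blocks(300 * 1024 * 1024)
print(len(sizes))  # → 3
```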
The Hadoop file system was developed using a distributed file system design and runs on
commodity hardware. Compared to other distributed systems, HDFS is highly fault-tolerant
and designed for low-cost hardware.
HDFS holds very large amounts of data while keeping access easy. Files are stored
across multiple machines to accommodate such huge data, and HDFS also makes applications
available for parallel processing.
Features of HDFS
1. Hadoop provides a command interface to interact with HDFS.
2. Users can easily check the status of the cluster with the help of the name node and data nodes.
3. Streaming access to file system data is available.
4. HDFS provides file permissions and authentication.
HDFS Architecture
HDFS mainly follows a master-slave architecture.
Name node
The name node is commodity hardware running the GNU/Linux operating system and the name
node software. It performs the following tasks:
a. It manages the file system namespace.
b. It regulates the clients' access to files.
c. It executes file system operations such as renaming, closing, and opening files and
directories.
Data node
A data node is commodity hardware running the GNU/Linux operating system and the data node
software. There is a data node for every node in a cluster, and these nodes manage the
data storage of their system.
a. Data nodes perform read-write operations on the file system as per client requests.
b. They also perform operations such as block creation, deletion, and replication.
Block
A file in the file system is divided into one or more segments, and these segments are called
blocks. In simple words, the minimum amount of data that HDFS can read or
write is called a block. Generally, the default block size is 64 MB (128 MB in newer Hadoop
versions), but it can be increased as needed by changing the HDFS configuration.
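As a rough illustration of what block size and replication mean for storage, here is a small Python sketch (the function name is ours, and it assumes the HDFS default replication factor of 3):

```python
import math

def storage_needed(file_size_mb, block_size_mb=64, replication=3):
    """Return (blocks occupied, total block replicas stored cluster-wide).

    HDFS stores `replication` copies of every block on different data
    nodes, so a file's raw footprint is its block count times the
    replication factor.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    return blocks, blocks * replication

blocks, replicas = storage_needed(200)  # 200 MB file, default 64 MB blocks
print(blocks, replicas)  # → 4 12
```

Note that, unlike a traditional file system, the last block does not waste the unused remainder: a 200 MB file's fourth block stores only 8 MB on disk.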
Objectives of HDFS
1. Fault detection and recovery
Since HDFS includes a large number of commodity hardware components, component failures
are likely. HDFS should therefore have mechanisms for quick, automatic fault detection
and recovery.
2. Huge datasets
To manage applications with huge datasets, HDFS should scale to hundreds of nodes per
cluster.
3. Hardware at data
A requested task can be done efficiently when the computation takes place near the data.
This reduces network traffic and increases throughput.
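The "hardware at data" objective can be sketched as a toy scheduler that prefers a node already holding a replica of the needed block, so the data does not have to cross the network (node and block names here are hypothetical; real placement is handled by the name node and the YARN scheduler):

```python
def pick_node(block_locations, block_id, free_nodes):
    """Prefer a free node that already holds a replica of the block.

    Falls back to any free node (a remote read) only when no replica
    is local to a free node.
    """
    local = [n for n in block_locations[block_id] if n in free_nodes]
    return local[0] if local else next(iter(free_nodes))

locations = {"blk_1": ["node1", "node3"], "blk_2": ["node2"]}
print(pick_node(locations, "blk_1", {"node3", "node4"}))  # → node3
```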
Advantages of Hadoop:
1. Varied data sources
2. Availability
3. Scalable
4. Cost-effective
5. Low network traffic
6. Ease of use
7. Performance
8. High throughput
9. Compatibility
10. Fault-tolerant
11. Open source
12. Multi-language support
Limitations of Hadoop:
1. Issues with small files
2. Slow processing speed
3. Latency
4. Security
5. No real-time data processing
6. Uncertainty
7. Lengthy code base
8. Not easy to use
9. No caching
10. Supports only batch processing
Summary:
This brings us to the end of this article on Hadoop. In this article, you have learned what
Hadoop is, the architecture of Hadoop, its features, and the HDFS architecture. We have also
put together a curriculum that covers exactly what you need to become an expert in Hadoop
development; you can have a look at the Hadoop course details.