Introduction to Hadoop
By L. Karthik (22Q91A6734)

Hadoop Architecture
01 HDFS: Handles data storage across nodes, providing a distributed file system for storing and accessing massive datasets.
02 MapReduce: A programming model for parallel data processing, breaking down tasks into Map and Reduce phases for efficient execution on distributed systems.
03 YARN: Manages job scheduling and resource allocation across the Hadoop cluster, ensuring efficient utilization of resources for parallel processing.

Hadoop Distributed File System (HDFS)
NameNode: Tracks metadata of files, including file locations, permissions, and other details, essential for managing and accessing data blocks.
DataNode: Stores actual data blocks across the cluster, replicating them for fault tolerance and ensuring high availability of data.
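
The NameNode and DataNode roles described for HDFS can be illustrated with a small simulation. This is a minimal sketch, not HDFS code: the function names `plan_blocks` and `readable_after_failure` and the node names are hypothetical, and replicas are placed round-robin for simplicity, whereas real HDFS uses a rack-aware placement policy. The 128 MB block size and replication factor of 3 are assumed here because they are the usual HDFS defaults.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the usual HDFS default block size
REPLICATION = 3                 # the usual HDFS default replication factor


def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a NameNode-style mapping: block index -> DataNodes holding a replica.

    Placement is simple round-robin for illustration only; real HDFS uses a
    rack-aware placement policy.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    num_blocks = math.ceil(file_size / block_size)
    block_map = {}
    for block in range(num_blocks):
        block_map[block] = [
            datanodes[(block + r) % len(datanodes)] for r in range(replication)
        ]
    return block_map


def readable_after_failure(block_map, failed_node):
    """A file stays readable if every block still has a replica on a live node."""
    return all(
        any(node != failed_node for node in nodes) for nodes in block_map.values()
    )


nodes = ["dn1", "dn2", "dn3", "dn4"]
block_map = plan_blocks(300 * 1024 * 1024, nodes)  # a 300 MB file needs 3 blocks
print(block_map)
print(readable_after_failure(block_map, "dn2"))
```

The dictionary plays the part of the NameNode's metadata, while the node names stand in for DataNodes. Because each block has several replicas, losing any single node still leaves every block readable.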

MapReduce
Map Phase: Input data is split and processed in parallel, with each map task emitting intermediate key-value pairs. Each key identifies a group (for example, a word) and the corresponding value carries the associated data (for example, a count of 1).
Reduce Phase: Aggregates outputs from the Map phase, combining values that share the same key to compute totals or summaries, resulting in a smaller, reduced output.
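
The Map and Reduce phases are easiest to follow with the classic word-count example. The sketch below runs in a single Python process and only mimics the programming model; it does not use the Hadoop API, and the function names `map_phase`, `shuffle`, and `reduce_phase` are chosen here for illustration. On a real cluster the framework runs many map and reduce tasks in parallel and performs the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


lines = ["Hadoop stores data", "Hadoop processes data"]
intermediate = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(word_counts)
```

Each input line can be mapped independently, and each key can be reduced independently, which is what lets the model spread work across many machines.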

YARN and Additional Components
01 YARN: Coordinates resources and schedules jobs across the Hadoop cluster, ensuring optimal resource utilization and efficient processing of distributed tasks.
02 Pig, Hive, Flume: Additional tools that complement Hadoop, providing data processing, ingestion, and query functionalities for different data formats and tasks.
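
The idea of YARN matching job requests to free cluster resources can be sketched as a toy allocator. Everything here is an assumption made for illustration: the `allocate` function, the job names, and the first-fit strategy are hypothetical, and YARN's actual schedulers use more elaborate policies with queues and priorities.

```python
def allocate(requests, nodes):
    """First-fit allocation of container requests onto nodes.

    requests: list of (app_name, memory_mb) tuples
    nodes: dict of node name -> free memory in MB
    Returns (placements, pending): placements maps each app to the chosen
    node; pending lists apps that did not fit anywhere.
    """
    free = dict(nodes)  # copy so the caller's dict is left untouched
    placements, pending = {}, []
    for app, memory_mb in requests:
        for node, available in free.items():
            if available >= memory_mb:
                free[node] = available - memory_mb
                placements[app] = node
                break
        else:
            # No node had enough free memory for this request.
            pending.append(app)
    return placements, pending


cluster = {"node1": 4096, "node2": 2048}
jobs = [("etl", 3072), ("report", 2048), ("ml-train", 2048)]
placements, pending = allocate(jobs, cluster)
print(placements, pending)
```

A request that does not fit stays pending until resources are freed, which is the basic trade-off any cluster scheduler has to manage.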

Advantages of Using Hadoop
Handles petabytes of data on clusters, scaling to handle massive datasets and providing a distributed platform for processing large volumes of data.
Utilizes commodity hardware, making it cost-effective for storing and processing large amounts of data, compared to expensive specialized systems.
Supports a wide range of data formats, providing flexibility in handling different data types and structures, facilitating efficient data analysis across various domains.

Real-World Applications of Hadoop
Data Science: Analyzing customer behavior for personalized marketing, uncovering patterns and insights for targeted campaigns.
IoT Data Processing: Processing data from millions of sensors, extracting valuable information for real-time monitoring and analysis in various industries.
Machine Learning: Building models from large datasets for predictive analytics, enabling predictions and insights based on historical data and trends.

Challenges with Hadoop
Complexity: Requires expert setup and maintenance, demanding technical expertise for installation, configuration, and ongoing management.
Cloud Adoption: Experiences a shift towards newer cloud-native platforms, as these platforms offer greater scalability, flexibility, and managed services.
Speed issues: Batch-oriented MapReduce writes intermediate results to disk, which limits its fit for real-time, low-latency processing.

Future Trends in Hadoop
01 Integration with cloud services, enabling compatibility with cloud storage and processing for enhanced scalability and flexibility.
02 Emergence of alternatives, such as Databricks Lakehouse and Spark, offering newer technologies and features for data processing and analysis.
03 Innovations in YARN and HDFS, focusing on improving efficiency, scalability, and performance for handling massive datasets.

Conclusion and Hadoop’s Legacy
Revolutionized distributed computing, providing a powerful framework for handling large datasets and enabling efficient analysis of big data.
Faces limitations in meeting the evolving need for real-time processing and low latency, demanding faster processing capabilities for certain applications.
Embraces future developments, focusing on enhancing scalability, efficiency, and integration with cloud services to meet the evolving needs of big data analytics.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple
