Presentation on Hadoop Technology
By
Deepali Dhandar
deepali.dhandar@gmail.com
Index
• Introduction and History
• Use and Advantages
• Issues and Need of Hadoop
• Users of Hadoop
• Framework and Architecture
• HDFS Basic Concept
• MapReduce
• Summary
Introduction and History
• Apache Software Foundation Project
• Open Source - Reliable, Scalable, Distributed
Computing and Data Storage
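• Concept: moving computation to the data is cheaper than moving large data
History:
• Google File System paper – Oct 2003
• Google MapReduce paper (data processing on large clusters) – 2004
• Created by Doug Cutting and Mike Cafarella in 2005
• Doug Cutting joined Yahoo in early 2006; Hadoop was split out as its own project in Feb 2006
• Name comes from the toy elephant of Doug Cutting’s son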
Use & Advantages
• Data-intensive text processing
• Assembly of large genomes
• Graph mining
• Machine learning and data mining
• Large scale social network analysis
Advantages:
• Massive Scalability
• Flexible Schema
• Quicker/Cheaper to set up
• Consistency with High Performance
Limitations:
• Gaps in Analytic Functionality
• Multiple copies of already big data
• Inefficient execution & Challenging framework
Issues and Need Of Hadoop
500 TB per day
Over 170 PB
Over 6 PB
Getting the data to the processors
becomes the bottleneck
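A rough back-of-the-envelope sketch of why reading the data becomes the bottleneck, using the 500 TB per day figure from this slide. The 100 MB/s per-disk read rate, the decimal units, and the disk counts are illustrative assumptions, not numbers from the presentation.

```python
# Back-of-the-envelope: how long does it take to read one day's data once?
# ASSUMPTIONS (not from the slides): each disk streams 100 MB/s sequentially,
# decimal units are used (1 TB = 1,000,000 MB), and data is spread evenly.

DAILY_DATA_TB = 500          # figure quoted on the slide
DISK_READ_MB_PER_S = 100     # assumed sequential read rate of one disk
MB_PER_TB = 1_000_000


def scan_hours(data_tb: float, disks: int) -> float:
    """Hours needed to read data_tb once from `disks` disks working in parallel."""
    total_mb = data_tb * MB_PER_TB
    seconds = total_mb / (DISK_READ_MB_PER_S * disks)
    return seconds / 3600


if __name__ == "__main__":
    for disks in (1, 100, 1000):
        hours = scan_hours(DAILY_DATA_TB, disks)
        print(f"{disks:>5} disk(s): {hours:,.1f} hours")
```

Under these assumptions a single disk would need well over a month to read one day of data, while spreading the same data over many disks and reading them in parallel brings the scan down to hours. That is the motivation for running the computation where the data already sits.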
Users Of Hadoop
And Many More…
Hadoop Framework Tool
Architecture of Hadoop
Master node (single node)
Many slave nodes
HDFS Basic Concept
• HDFS works best with a smaller number of large files
o Millions as opposed to billions of files
o Typically 100MB or more per file
• Files in HDFS are write once
• Optimized for streaming reads of large files and not
random reads
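One way to see why fewer, larger files suit HDFS is to count how many file and block entries the master node would have to track for the same amount of data. This is a minimal sketch: the 128 MB block size is an assumption (a common default, not stated in the slides), and the function names are hypothetical.

```python
import math

# ASSUMPTION: 128 MB blocks; the slides do not state a block size.
BLOCK_SIZE_MB = 128


def blocks_for_file(file_size_mb: int) -> int:
    """Number of blocks one file occupies (at least one per file)."""
    return max(1, math.ceil(file_size_mb / BLOCK_SIZE_MB))


def namespace_entries(total_data_mb: int, file_size_mb: int) -> tuple[int, int]:
    """Return (file_count, block_count) when the data is stored as equal-sized files."""
    file_count = math.ceil(total_data_mb / file_size_mb)
    return file_count, file_count * blocks_for_file(file_size_mb)


if __name__ == "__main__":
    one_tb_mb = 1024 * 1024
    for size_mb in (1, 100, 1024):
        files, blocks = namespace_entries(one_tb_mb, size_mb)
        print(f"{size_mb:>5} MB files -> {files:>9,} files, {blocks:>9,} blocks")
```

With 1 MB files, one terabyte turns into more than a million files and as many blocks; with 1 GB files the same data needs about a thousand files and a few thousand blocks.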
MapReduce Component
• JobTracker & TaskTracker
• JobTracker splits the job into smaller tasks (“Map”)
and sends them to the TaskTracker process on each
node
• TaskTracker reports job progress back to the
JobTracker node, sends data (“Reduce”) or requests
new tasks
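The Map and Reduce steps described above can be sketched as a word count in plain Python. This is a single-process illustration, not the Hadoop API: the function names are hypothetical, and each chunk merely stands in for the work that would be handed to one TaskTracker.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def split_input(lines: list[str], num_tasks: int) -> list[list[str]]:
    """Split the input into one chunk per map task (the JobTracker's role)."""
    return [lines[i::num_tasks] for i in range(num_tasks)]


def map_task(chunk: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in the chunk."""
    for line in chunk:
        for word in line.lower().split():
            yield word, 1


def reduce_task(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce: combine all values emitted for one key."""
    return word, sum(counts)


def run_job(lines: list[str], num_tasks: int = 2) -> dict[str, int]:
    """Run all map tasks, group their output by key, then reduce each key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for chunk in split_input(lines, num_tasks):
        for word, one in map_task(chunk):
            grouped[word].append(one)
    return dict(reduce_task(w, c) for w, c in grouped.items())


if __name__ == "__main__":
    text = ["Hadoop stores data", "Hadoop processes data"]
    print(run_job(text))
```

In a real cluster the chunks are processed on different nodes at the same time; the sketch runs them one after another only to keep the data flow visible.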
Partition and Shuffling
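A minimal sketch of the idea behind partitioning and shuffling: each intermediate key is assigned to one reducer by hashing, so every value for a given key ends up in the same place. The use of CRC32 as the hash and the function names are assumptions made for illustration; this is not Hadoop's own partitioner.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer for a key: stable hash modulo the number of reducers."""
    # ASSUMPTION: CRC32 stands in for whatever hash a real framework uses.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Route every (key, value) pair to its reducer and group values by key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:
        for key, value in pairs:
            buckets[partition(key, num_reducers)][key].append(value)
    # Keys are sorted inside each bucket before they are handed to a reducer.
    return [dict(sorted(bucket.items())) for bucket in buckets]


if __name__ == "__main__":
    # Output of two hypothetical map tasks.
    map_outputs = [
        [("hadoop", 1), ("data", 1)],
        [("hadoop", 1), ("cluster", 1)],
    ]
    for reducer_id, bucket in enumerate(shuffle(map_outputs, 2)):
        print(reducer_id, bucket)
```

Because the partition depends only on the key, the two "hadoop" pairs produced by different map tasks are always delivered to the same reducer, whichever one that turns out to be.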
Summary
• Open Source Data Management with Scale out Storage
• High Performance while handling large and Complex data
• Optimized for Streaming & Distributed Processing
Thank You !!
