HADOOP FOUNDATION FOR ANALYTICS
BY
B.MONICA
II M.SC COMPUTER SCIENCE
BON SECOURS COLLEGE FOR WOMEN
HADOOP
• An open-source software framework
• Licensed under the Apache License, Version 2.0
• It includes:
– MapReduce: offline (batch) computing engine
– HDFS: Hadoop Distributed File System
HADOOP GOALS
• Scalable: it can reliably store and process petabytes of data.
• Economical: it distributes the data and the processing across clusters of commodity machines.
• Efficient: because the data is distributed, it can be processed in parallel on the nodes where it is located.
• Reliable: it automatically maintains multiple copies of data.
USES FOR HADOOP
• Data-intensive text processing
• Assembly of large genomes
• Graph mining
• Machine learning and data mining
• Large-scale social network analysis
HADOOP: ASSUMPTIONS
• Hardware will fail.
• Applications need a write-once-read-many access model.
• Example: Facebook
- Stores copies of internal log and dimension data sources
- Uses it as a source for reporting/analytics and machine learning
- 320-machine cluster with 2,560 cores and about 1.3 PB of raw storage
HADOOP CONFIGURATION
conf/hdfs-site.xml:
<configuration>
  <property>
    <!-- number of replicas kept for each HDFS block -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
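A minimal sketch of how a script might read the replication setting from such a file. The XML is held in a string here for illustration; the names `HDFS_SITE`, `read_properties`, and `replication_factor` are hypothetical, and the fallback of 3 is assumed to match the HDFS default.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory copy of conf/hdfs-site.xml; a real script would read the file.
HDFS_SITE = """
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return a dict mapping each property <name> to its <value>."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name:
            props[name] = value
    return props


def replication_factor(xml_text, default=3):
    """Return dfs.replication as an int, or the assumed default if it is unset."""
    return int(read_properties(xml_text).get("dfs.replication", default))
```

With a replication factor of 1, each block is stored only once, which suits a single-node test setup but gives no protection against disk failure.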
HISTORY OF HADOOP
• Hadoop was started by Doug Cutting to support two of his other well-known projects, Lucene and Nutch.
• Hadoop was inspired by the Google File System (GFS), which was described in a paper released by Google in 2003.
• Hadoop, originally called the Nutch Distributed File System (NDFS), split from Nutch in 2006 to become a sub-project of Lucene. At this point it was renamed Hadoop.
• Example: Google search engine
• 2013: Hadoop 1.1.2 and Hadoop 2.0.3-alpha
- Ambari, Cassandra, and Mahout have been added
• Hadoop is in use at many organizations that handle big data:
o Yahoo!
o Facebook
o Amazon
o Netflix
APACHE MAPREDUCE
• A software framework for distributed processing of large data sets
• The framework takes care of scheduling tasks, monitoring them, and re-executing any failed tasks.
• It splits the input data set into independent chunks, which the map tasks process in parallel.
• The framework sorts the outputs of the maps, which then become the input to the reduce tasks.
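The split, map, sort, and reduce steps above can be sketched in plain Python with a word count. This is a single-process illustration, not Hadoop's API; all function names and the `chunk_size` parameter are hypothetical.

```python
from collections import defaultdict


def split_input(lines, chunk_size):
    """Split the input into independent chunks of at most chunk_size lines."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in the chunk."""
    pairs = []
    for line in chunk:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle_and_sort(mapped):
    """Group the map outputs by key and sort the groups by key."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return sorted(groups.items())


def reduce_group(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines, chunk_size=2):
    chunks = split_input(lines, chunk_size)
    mapped = [map_chunk(chunk) for chunk in chunks]   # would run in parallel on a cluster
    grouped = shuffle_and_sort(mapped)
    return [reduce_group(key, values) for key, values in grouped]
```

Calling `word_count(["a b a", "b c", "a"])` counts each word across both chunks. On a real cluster each chunk would be mapped on the node that stores it.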
MAPREDUCE DATAFLOW
• An input reader
• A map function
• A partition function
• A compare function
• A reduce function
• An output writer
• Example: JobTracker, TaskTracker
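A rough sketch of the partition and compare steps from the dataflow list. It assumes a hash-based partitioner (CRC32 is used here only because it is stable between runs) and a compare step that orders pairs by key; `partition` and `route` are hypothetical names, not Hadoop classes.

```python
import zlib


def partition(key, num_reducers):
    """Partition function: pick the reducer index for a key from a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def route(pairs, num_reducers):
    """Send each (key, value) pair to its reducer, then sort each bucket by key."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    # Compare step: ordering by key places equal keys next to each other,
    # so a reducer can consume them as one group.
    return [sorted(bucket, key=lambda kv: kv[0]) for bucket in buckets]
```

Because the partition depends only on the key, every value for a given key reaches the same reducer.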
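MAPREDUCE: FAULT TOLERANCE
• Worker failure: the master pings every worker periodically. If a worker does not respond within a set time, the master marks it as failed and reschedules its tasks.
• Master failure: the master can write periodic checkpoints of its data structures, so that a new copy can be started from the last checkpoint.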
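Worker-failure detection can be sketched as follows, assuming that a worker which stays silent for longer than a timeout counts as failed and that its tasks are handed back for re-execution. The `Master` class and its method names are hypothetical, and times are passed in explicitly to keep the example deterministic.

```python
class Master:
    """Tracks the last heartbeat of each worker and the tasks assigned to it."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence before a worker counts as failed
        self.last_seen = {}      # worker -> time of its last heartbeat
        self.tasks = {}          # worker -> list of assigned tasks

    def heartbeat(self, worker, now):
        """Record that the worker answered a ping at time `now`."""
        self.last_seen[worker] = now
        self.tasks.setdefault(worker, [])

    def assign(self, worker, task):
        self.tasks.setdefault(worker, []).append(task)

    def reap_failed(self, now):
        """Drop workers that missed the timeout and return their tasks for re-execution."""
        failed = [w for w, seen in self.last_seen.items() if now - seen > self.timeout]
        orphaned = []
        for worker in failed:
            orphaned.extend(self.tasks.pop(worker, []))
            del self.last_seen[worker]
        return orphaned
```

The tasks returned by `reap_failed` would then be assigned to workers that are still responding.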
JOBTRACKER
• Tracks MapReduce jobs in Hadoop
• The JobTracker performs the following actions:
- Accepts MapReduce jobs from client applications
- Talks to the NameNode to determine the location of the data
- Locates available TaskTracker nodes
- Submits the work to the chosen TaskTracker node
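The choice of a TaskTracker can be sketched as below, assuming the scheduler prefers a node that already holds the input block and otherwise falls back to the node with the most free slots. The function name and both parameters are hypothetical, and the sketch ignores rack locality.

```python
def choose_tracker(block_hosts, free_slots):
    """Pick a TaskTracker host for one task.

    block_hosts: hosts that hold a replica of the input block
                 (the information the NameNode would provide).
    free_slots:  dict mapping each TaskTracker host to its free task slots.
    """
    # Prefer a host that stores the data, so the task reads it locally.
    for host in block_hosts:
        if free_slots.get(host, 0) > 0:
            return host
    # Otherwise take the host with the most free slots (ties broken by name).
    candidates = sorted(h for h, n in free_slots.items() if n > 0)
    if not candidates:
        return None  # no capacity anywhere; the task has to wait
    return max(candidates, key=lambda h: free_slots[h])
```

Running the task where the data lives avoids moving the block over the network, which is the efficiency goal listed earlier in the deck.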
OTHER TOOLS
• Hive: Hadoop processing with SQL
• Pig: Hadoop processing with scripting
• Cascading: pipe-and-filter processing model
• HBase: database model built on top of Hadoop
• Flume: designed for large-scale data movement
THANK YOU