BY – Arvind
BEST IT TRAINING INSTITUTE IN BANGALORE
What is Hadoop?
• The Apache Hadoop software library is a
framework that allows for the distributed
processing of large data sets across clusters
of computers using simple programming
models.
• It is developed by the Apache Software Foundation; the project
started in 2006, and version 1.0 was released in December 2011.
• Written in Java.
Hadoop is open source software.
Framework
Massive Storage
Processing Power
Big Data
• Big data is a term for the very large amounts of unstructured and
semi-structured data that a company creates.
• The term is used when talking about petabytes and exabytes of data.
• Loading that much data into a relational database for analysis would take
too much time and money.
• Facebook has almost 10 billion photos, taking up about 1 petabyte of storage.
So what is the problem?
1. Processing that much data in a relational database is very difficult.
2. Processing it would take too much time and money.
We can solve this problem with distributed
computing.
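A rough calculation shows why spreading the work across machines helps. The sketch below assumes a sustained read speed of 100 MB/s per disk; that figure is an illustrative assumption, not a number from the slides. Under it, reading 1 petabyte from a single disk takes about 116 days, while 1000 disks read in parallel need about 2.8 hours.

```python
# Back-of-the-envelope scan time for 1 petabyte of data.
# ASSUMPTION: each disk sustains 100 MB/s of sequential reads (illustrative only).
PETABYTE = 10**15               # bytes (decimal petabyte)
DISK_THROUGHPUT = 100 * 10**6   # bytes per second per disk, assumed


def scan_seconds(total_bytes: int, disks: int) -> float:
    """Seconds needed to read total_bytes spread evenly over `disks` disks in parallel."""
    return total_bytes / (DISK_THROUGHPUT * disks)


single = scan_seconds(PETABYTE, 1)
cluster = scan_seconds(PETABYTE, 1000)

print(f"1 disk:     {single / 86400:.1f} days")
print(f"1000 disks: {cluster / 3600:.1f} hours")
```

The calculation ignores network transfer and coordination overhead, so real clusters will be slower than this ideal figure.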
But distributed computing has problems of its own:
Hardware failure
With many machines, there is always a chance that some hardware will fail.
Combining the data after analysis
The partial results from all the disks have to be combined, which gets messy.
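The hardware-failure point can be made concrete with a little probability. The sketch below assumes each machine fails independently with the same probability on a given day; the 0.1% figure is a made-up value for illustration, not a measured failure rate.

```python
# Chance that at least one machine in a cluster fails on a given day.
# ASSUMPTIONS: failures are independent, and every node has the same
# daily failure probability (0.1% here, chosen only for illustration).
P_NODE_FAILURE = 0.001


def p_any_failure(nodes: int, p_node: float = P_NODE_FAILURE) -> float:
    """Probability that one or more of `nodes` machines fail."""
    return 1.0 - (1.0 - p_node) ** nodes


for n in (1, 100, 1000):
    print(f"{n:>5} nodes: {p_any_failure(n):.1%} chance of at least one failure")
```

Even when a single machine is very reliable, a failure somewhere in a large cluster becomes likely, so the system has to expect and tolerate it.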
To solve all these problems, Hadoop came along.
It has two main parts:
1. Hadoop Distributed File System (HDFS)
2. Data processing framework: MapReduce
1. Hadoop Distributed File System
It ties many small, reasonably priced machines together into a single cost-effective computer
cluster.
Data and application processing are protected against hardware failure.
If a node goes down, jobs are automatically redirected to other nodes so that the distributed
computation does not fail.
It automatically stores multiple copies of all data.
It provides a simplified programming model that lets users quickly read from and write to the
distributed system.
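The idea of storing multiple copies can be sketched with a toy model. The code below splits a file into blocks and puts each block on several nodes, then checks whether every block is still readable after some nodes fail. The 128 MB block size and 3 copies match HDFS defaults in recent releases, but the round-robin placement, the node names, and the function names are hypothetical simplifications; real HDFS uses a rack-aware placement policy.

```python
# Toy model of block replication. This is NOT the real HDFS placement policy.
BLOCK_SIZE = 128 * 1024 * 1024   # bytes per block (HDFS default in recent releases)
REPLICATION = 3                  # copies of each block (HDFS default)


def place_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Map each block index to the nodes holding a copy (simple round-robin).

    Assumes replication <= len(nodes) so the copies land on distinct nodes.
    """
    num_blocks = -(-file_size // block_size)   # ceiling division
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement, failed):
    """True if every block still has at least one copy on a surviving node."""
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )


nodes = ["node1", "node2", "node3", "node4", "node5"]   # hypothetical node names
placement = place_blocks(1024**3, nodes)                # a 1 GiB file

print(len(placement), "blocks")
print("one node down:", readable_after_failure(placement, {"node2"}))
print("three nodes down:", readable_after_failure(placement, {"node1", "node2", "node3"}))
```

With three copies, losing any one or two nodes leaves every block readable; data is lost only if all nodes holding a given block fail together.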
2. MapReduce
MapReduce is a programming model for processing and generating large data sets with a
parallel, distributed algorithm on a cluster.
The name also refers to the associated implementation of that model.
The MAP function processes a key/value pair to generate a set of intermediate key/value pairs.
The REDUCE function merges all intermediate values associated with the same intermediate
key.
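The map and reduce steps described above can be illustrated with a word count. The sketch below runs in a single Python process and uses no Hadoop API; the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical helpers for illustration. On a real cluster the framework would run many map and reduce tasks in parallel on different nodes.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """MAP: take one (key, value) pair and emit intermediate (word, 1) pairs."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """REDUCE: merge all intermediate values that share the same key."""
    return word, sum(counts)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)        # the "shuffle" step: group by key
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))


docs = [
    ("d1", "big data needs big clusters"),
    ("d2", "hadoop stores big data"),
]
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts)
```

The user writes only `map_fn` and `reduce_fn`; grouping the intermediate pairs by key is the framework's job, which is what keeps the programming model simple.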
Pros of Hadoop
1. Computing power
2. Flexibility
3. Fault Tolerance
4. Low Cost
5. Scalability
Cons of Hadoop
1. Integration with existing systems
Hadoop is not optimised for ease of use. Installing it and integrating it with existing
databases can be difficult, especially since no software support is
provided.
2. Administration and ease of use
Hadoop requires knowledge of MapReduce, while most data practitioners use SQL. This
means significant training may be required to administer Hadoop clusters.
3. Security
Hadoop lacks the level of security functionality needed for safe enterprise deployment,
especially if it concerns sensitive data.
For more queries
+91 9513332301/02
Website: http://www.traininginbangalore.com/best-hadoop-training-institutes-in-bangalore/

Hadoop tutorial for Freshers,