Bigdata ppt
HADOOP ECOSYSTEM
In the previous blog on Hadoop Tutorial, we discussed Hadoop, its features and core components. The next step forward is to understand the Hadoop Ecosystem, an essential topic before you start working with Hadoop. This Hadoop Ecosystem blog will familiarize you with the industry-wide used Big Data frameworks required for Hadoop Certification.
 HDFS -> Hadoop Distributed File System
 YARN -> Yet Another Resource Negotiator
 MapReduce -> Data processing using programming
 Spark -> In-memory data processing
 Pig, Hive -> Data processing services using SQL-like query languages
 HBase -> NoSQL database
 Mahout, Spark MLlib -> Machine learning
 Apache Drill -> SQL on Hadoop
 ZooKeeper -> Cluster management
HDFS
 The Hadoop Distributed File System is the core component, or you could say the backbone, of the Hadoop Ecosystem.
 HDFS is what makes it possible to store different types of large data sets (i.e. structured, unstructured and semi-structured data).
 HDFS creates a level of abstraction over the resources, so that we can see the whole of HDFS as a single unit.
 It helps us store our data across various nodes while maintaining metadata about the stored data.
 HDFS has two core components: the NameNode and the DataNode.
◦ The NameNode is the master node and it doesn't store the actual data. It holds the metadata, much like a log file or a table of contents. It therefore requires less storage but high computational resources.
◦ All of your data, on the other hand, is stored on the DataNodes, which therefore require more storage. DataNodes are commodity hardware (like your laptops and desktops) in the distributed environment, which is why Hadoop solutions are so cost-effective.
◦ You always contact the NameNode when writing data. It then tells the client which DataNodes the data should be stored and replicated on.
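To make the client/NameNode interaction concrete, here is a minimal sketch (not taken from the slides) that writes a file through the standard Hadoop FileSystem API; the NameNode URI and file path are hypothetical placeholders.

```java
// Minimal HDFS write, assuming a reachable cluster; the client asks the
// NameNode where to put the blocks, then streams data to the DataNodes.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode-host:9000"); // hypothetical NameNode address

        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/user/demo/hello.txt"))) {
            out.writeUTF("Hello, HDFS!"); // blocks are replicated across DataNodes
        }
    }
}
```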
YARN
 Consider YARN the brain of your Hadoop Ecosystem. It performs all of your processing activities by allocating resources and scheduling tasks.
 It has two major components: the ResourceManager and the NodeManager.
◦ The ResourceManager is again the master node on the processing side.
◦ It receives processing requests and then passes parts of each request on to the corresponding NodeManagers, where the actual processing takes place.
◦ A NodeManager is installed on every DataNode and is responsible for executing tasks on that DataNode.
◦ The ResourceManager itself has two components, Schedulers and the ApplicationsManager:
◦ Schedulers: Based on your application's resource requirements, Schedulers run scheduling algorithms and allocate the resources.
◦ ApplicationsManager: The ApplicationsManager accepts job submissions, negotiates containers (i.e. the DataNode environments where processes execute) for running each application-specific ApplicationMaster, and monitors progress. ApplicationMasters are the daemons which reside on DataNodes and communicate with containers to execute tasks on each DataNode.
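As a small illustration of the ResourceManager's bookkeeping, here is a hedged sketch using Hadoop's YarnClient API to list the applications the ResourceManager is tracking; it assumes a reachable cluster configured via yarn-site.xml.

```java
// List YARN applications known to the ResourceManager (assumed cluster config).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnListApps {
    public static void main(String[] args) throws Exception {
        Configuration conf = new YarnConfiguration(); // picks up yarn-site.xml
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // Each report describes one application tracked by the ResourceManager.
        for (ApplicationReport report : yarnClient.getApplications()) {
            System.out.println(report.getApplicationId() + " : "
                    + report.getYarnApplicationState());
        }
        yarnClient.stop();
    }
}
```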
MAPREDUCE
 MapReduce is the core processing component of the Hadoop Ecosystem, as it provides the logic of processing. In other words, MapReduce is a software framework which helps in writing applications that process large data sets using distributed and parallel algorithms inside the Hadoop environment.
 A MapReduce program is built around two functions, Map() and Reduce().
◦ The Map function performs actions like filtering, grouping and sorting.
◦ The Reduce function aggregates and summarizes the results produced by the Map function.
◦ The Map function emits key-value pairs (K, V), which act as the input to the Reduce function.
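The classic illustration of the two functions is word counting: the Map function tokenizes lines and emits (word, 1) pairs, and the Reduce function sums them per word. A condensed sketch of the mapper and reducer (the job driver is omitted):

```java
// WordCount, condensed: Map emits (word, 1); Reduce sums the counts per word.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                context.write(word, ONE); // emit the (K, V) pair
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum)); // aggregated count per word
        }
    }
}
```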
APACHE PIG
 Pig has two parts: Pig Latin, the language, and the Pig runtime, the execution environment. You can think of them as analogous to Java and the JVM.
 Pig Latin has a SQL-like command structure.
 Not everyone comes from a programming background, and Apache Pig relieves them of writing low-level code. You might be curious to know how?
 Well, here is an interesting fact:
 10 lines of Pig Latin ≈ 200 lines of MapReduce Java code
 At the back end of every Pig job, a MapReduce job executes.
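One way to see the Pig Latin / Pig runtime split is to drive Pig from Java through its PigServer API, passing Pig Latin statements as strings. A hedged sketch, run in local mode; the input file students.txt and its schema are hypothetical:

```java
// Embedding Pig Latin in Java via PigServer (local mode, assumed input file).
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigExample {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.LOCAL);

        // Each registerQuery line is one Pig Latin statement; together they
        // compile down to MapReduce jobs at execution time.
        pig.registerQuery("raw = LOAD 'students.txt' AS (name:chararray, score:int);");
        pig.registerQuery("passed = FILTER raw BY score >= 50;");
        pig.registerQuery("grouped = GROUP passed BY name;");
        pig.store("grouped", "output"); // triggers the underlying MapReduce job
    }
}
```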
HIVE
 Facebook created Hive for people who are fluent in SQL, so Hive makes them feel at home while working in a Hadoop Ecosystem.
 Basically, Hive is a data warehousing component which performs reading, writing and managing of large data sets in a distributed environment using a SQL-like interface.
 HIVE + SQL = HQL
 The query language of Hive is called Hive Query Language (HQL), which is very similar to SQL.
 It has 2 basic components: the Hive command line and the JDBC/ODBC driver.
 The Hive command line interface is used to execute HQL commands.
 Java Database Connectivity (JDBC) and Open Database Connectivity (ODBC) drivers are used to establish connections to the data store.
 Secondly, Hive is highly scalable: it serves both purposes, i.e. large data set processing (batch query processing) and real-time processing (interactive query processing).
 It supports all primitive SQL data types.
 You can use predefined functions, or write tailored user-defined functions (UDFs), to meet your specific needs.
 As an alternative, you may go through this comprehensive video tutorial, in which each tool in the Hadoop Ecosystem is discussed:
 Hadoop Ecosystem | Edureka
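Since the slides mention the JDBC driver, here is a minimal sketch of running an HQL query against HiveServer2 over JDBC; the host, credentials and the employees table are hypothetical assumptions:

```java
// Querying HiveServer2 over JDBC; requires the Hive JDBC driver
// (org.apache.hive.jdbc.HiveDriver) on the classpath.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hive-host:10000/default", "user", ""); // hypothetical host
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT dept, COUNT(*) FROM employees GROUP BY dept")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
            }
        } finally {
            conn.close();
        }
    }
}
```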
APACHE MAHOUT
Now, let us talk about Mahout, which is renowned for machine learning. Mahout provides an environment for creating scalable machine learning applications.
 Collaborative filtering: Mahout mines user behaviours, patterns and characteristics, and based on these it predicts and makes recommendations to users. The typical use case is an e-commerce website.
 Clustering: It organizes similar groups of data together, e.g. grouping articles such as blog posts, news and research papers.
 Classification: It means classifying and categorizing data into various sub-categories, e.g. articles can be categorized into blog posts, news, essays, research papers and other categories.
 Frequent itemset mining: Here Mahout checks which objects are likely to appear together and makes suggestions if one of them is missing. For example, cell phones and covers are generally bought together, so if you search for a cell phone it will also recommend covers and cases.
 Mahout provides a command line to invoke various algorithms, along with a predefined set of libraries that already contain built-in algorithms for different use cases (a collaborative-filtering sketch follows below).
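A hedged sketch of the collaborative-filtering use case, using Mahout's Taste recommender API; the ratings file ratings.csv (userID,itemID,preference triples) and user ID 42 are hypothetical:

```java
// User-based collaborative filtering with Mahout's Taste API (assumed data file).
import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class MahoutRecommenderExample {
    public static void main(String[] args) throws Exception {
        DataModel model = new FileDataModel(new File("ratings.csv")); // hypothetical input
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

        // Top 3 item recommendations for user 42, mined from user behaviour.
        List<RecommendedItem> items = recommender.recommend(42L, 3);
        for (RecommendedItem item : items) {
            System.out.println(item.getItemID() + " scored " + item.getValue());
        }
    }
}
```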
APACHE SPARK
 Apache Spark is a framework for real-time data analytics in a distributed computing environment.
 Spark is written in Scala and was originally developed at the University of California, Berkeley.
 It executes in-memory computations to increase the speed of data processing over MapReduce.
 It can be up to 100x faster than Hadoop for large-scale data processing, by exploiting in-memory computations and other optimizations.
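A minimal word-count sketch using Spark's Java API, for comparison with the MapReduce version above; the local[*] master and the file paths are placeholders:

```java
// Spark word count in Java; intermediate RDDs stay in memory, which is
// where Spark's speed advantage over disk-based MapReduce comes from.
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile("input.txt"); // placeholder path
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum);
            counts.saveAsTextFile("output");
        }
    }
}
```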
APACHE HBASE
 HBase is an open-source, non-relational, distributed database. In other words, it is a NoSQL database.
 It supports all types of data, which is why it's capable of handling anything and everything inside a Hadoop ecosystem.
 It is modelled after Google's BigTable, a distributed storage system designed to cope with large data sets.
 HBase was designed to run on top of HDFS and provides BigTable-like capabilities.
 It gives us a fault-tolerant way of storing sparse data, which is common in many Big Data use cases.
 HBase itself is written in Java, whereas HBase applications can access it through REST, Avro and Thrift APIs.
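A hedged sketch of the HBase Java client writing and reading one cell; the users table and the info column family are hypothetical and must already exist:

```java
// Write and read one HBase cell (assumed table 'users' with family 'info').
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("users"))) {

            // Sparse storage: only the cells you actually write exist on disk.
            Put put = new Put(Bytes.toBytes("user1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Renuka"));
            table.put(put);

            Result result = table.get(new Get(Bytes.toBytes("user1")));
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        }
    }
}
```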
APACHE DRILL
 As the name suggests, Apache Drill is used to drill into any kind of data. It's an open-source application which works with distributed environments to analyze large data sets.
 It is an open-source counterpart of Google's Dremel.
 It supports different kinds of NoSQL databases and file systems, which is a powerful feature of Drill. For example: Azure Blob Storage, Google Cloud Storage, HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Swift, NAS and local files.
 So basically, the main aim of Apache Drill is to provide scalability, so that we can process petabytes and exabytes of data efficiently (in minutes, you could say).
 The main power of Apache Drill lies in combining a variety of data stores with a single query.
 Apache Drill basically follows ANSI SQL.
 It has a powerful scalability factor, supporting millions of users and serving their query requests over large-scale data.
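Since Drill speaks ANSI SQL over JDBC, here is a hedged sketch of querying a raw JSON file directly through Drill's built-in dfs storage plugin; the drillbit host, file path and field names are placeholders:

```java
// Querying a JSON file through Apache Drill's JDBC driver (assumed drillbit).
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DrillJdbcExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:drill:drillbit=localhost");
             Statement stmt = conn.createStatement();
             // `dfs` is Drill's file-system storage plugin; no schema is predefined.
             ResultSet rs = stmt.executeQuery(
                     "SELECT name, age FROM dfs.`/data/users.json` LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString("name") + " is " + rs.getInt("age"));
            }
        }
    }
}
```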