HBase
What is HBase?
1. HBase is an open-source, sorted-map data store built on top of Hadoop.
2. It is column-oriented and horizontally scalable.
3. It is based on Google's Bigtable. It has a set of tables which keep data in key-value format.
4. HBase is well suited for sparse data sets, which are very common in big data use cases.
5. HBase provides APIs enabling development in practically any programming language.
6. It is a part of the Hadoop ecosystem that provides random, real-time read/write access to data in the Hadoop File System.
Why HBase?
• An RDBMS becomes exponentially slower as the data grows large
• It expects data to be highly structured, i.e. able to fit into a well-defined schema
• Any change in schema might require downtime
• For sparse datasets, maintaining NULL values adds too much overhead
Features of HBase
• Horizontally scalable: capacity grows by adding nodes to the cluster, and you can add any number of columns to a table at any time.
• Automatic failover: automatic failover allows data handling to switch automatically to a standby system in the event of a failure.
• Integration with the MapReduce framework: commands and Java code can internally use MapReduce to do their work, and HBase is built over the Hadoop Distributed File System.
• HBase is a sparse, distributed, persistent, multidimensional sorted map, indexed by row key, column key, and timestamp.
• It is often referred to as a key-value store, a column-family-oriented database, or a store of versioned maps of maps (sketched below).
• Fundamentally, it is a platform for storing and retrieving data with random access.
• It doesn't care about datatypes (you can store an integer in one row and a string in another for the same column).
• It doesn't enforce relationships within your data.
• It is designed to run on a cluster of computers built from commodity hardware.
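The "versioned maps of maps" description can be made concrete with a small sketch. The nesting below mirrors the shape of the logical model (row key → column family → column qualifier → timestamp → value), which is also the shape the Java client exposes through Result.getMap(); the table contents and names here are hypothetical.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// A minimal sketch of HBase's logical model: a sorted map of maps.
// Row key -> column family -> column qualifier -> timestamp -> value.
public class LogicalModelSketch {
    public static void main(String[] args) {
        NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>>
                table = new TreeMap<>();

        // Insert one cell: row "row1", family "personal", qualifier "name",
        // version (timestamp) 1L, value "alice". All names are hypothetical.
        table.computeIfAbsent("row1", r -> new TreeMap<>())
             .computeIfAbsent("personal", f -> new TreeMap<>())
             .computeIfAbsent("name", q -> new TreeMap<>())
             .put(1L, "alice");

        // A later version of the same cell; HBase keeps both, sorted by timestamp.
        table.get("row1").get("personal").get("name").put(2L, "alice b.");

        // Sparse by construction: rows simply omit columns they don't have.
        System.out.println(table);
    }
}
```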
HBase Architecture: Use Cases, Components & Data Model
• HBase architecture consists mainly of five components:
• HMaster
• HRegionServer
• HRegions
• ZooKeeper
• HDFS
HMaster
• HMaster is the implementation of the Master server in the HBase architecture.
• It acts as a monitoring agent for all Region Server instances present in the cluster and as an interface for all metadata changes.
• In a distributed cluster environment, the Master runs on the NameNode, and it runs several background threads.
• It plays a vital role in terms of performance and in maintaining the nodes in the cluster.
• HMaster provides administrative functions and distributes services to the different region servers.
• HMaster assigns regions to region servers.
• HMaster controls load balancing and failover to handle the load over the nodes present in the cluster.
• When a client wants to change a schema or any metadata, HMaster takes responsibility for these operations.
• Some of the methods exposed by the HMaster interface are primarily metadata-oriented methods:
• Table (createTable, removeTable, enable, disable)
• ColumnFamily (addColumn, modifyColumn)
• Region (move, assign)
• The client communicates bi-directionally with both HMaster and ZooKeeper. For read and write operations, it contacts the HRegion servers directly. HMaster assigns regions to region servers and, in turn, checks the health status of the region servers.
• The architecture contains multiple region servers. The HLog present in each region server stores all the log files.
HBase Region Servers
• When an HBase Region Server receives a write or read request from the client, it assigns the request to the specific region where the actual column family resides. The client can contact HRegion servers directly; HMaster's permission is not required for the client to communicate with them. The client requires HMaster's help only when operations related to metadata and schema changes are needed.
• HRegionServer is the Region Server implementation. It is responsible for serving and managing the regions, or data, present in the distributed cluster. The region servers run on the Data Nodes present in the Hadoop cluster.
• HMaster communicates with multiple HRegion servers, each of which performs the following functions:
• Hosting and managing regions
• Splitting regions automatically
• Handling read and write requests
• Communicating with the client directly
HBase Regions
• HRegions are the basic building elements of an HBase cluster; they hold the distributed portions of tables and are comprised of column families.
• A region contains multiple stores, one for each column family. Each store consists of mainly two components: the Memstore and the HFile.
ZooKeeper
• HBase ZooKeeper is a centralized monitoring server which maintains configuration information and provides distributed synchronization. Distributed synchronization gives the distributed applications running across the cluster access to coordination services between nodes.
• If a client wants to communicate with regions, the client has to approach ZooKeeper first.
• ZooKeeper is an open-source project, and it provides many important services.
Services provided by ZooKeeper
• Maintains configuration information
• Provides distributed synchronization
• Establishes client communication with region servers
• Provides ephemeral nodes, which represent the different region servers
• Lets master servers use these ephemeral nodes to discover available servers in the cluster
• Tracks server failures and network partitions
• The Master and the HBase slave nodes (region servers) register themselves with ZooKeeper. The client needs access to the ZooKeeper (ZK) quorum configuration to connect with the master and region servers (see the sketch below).
• During a failure of nodes present in the HBase cluster, the ZK quorum will trigger error messages and start to repair the failed nodes.
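As a hedged illustration of that quorum configuration, the snippet below sets the ZooKeeper quorum programmatically before opening a client connection (HBase 1.x-style API). In practice these properties usually come from hbase-site.xml; the host names and the table name are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class ZkQuorumExample {
    public static void main(String[] args) throws Exception {
        // Start from the defaults (and hbase-site.xml, if on the classpath).
        Configuration conf = HBaseConfiguration.create();

        // Tell the client where the ZooKeeper quorum lives; the client asks
        // ZooKeeper first, then talks to the master and region servers.
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("mytable"))) {
            System.out.println("Connected to: " + table.getName());
        }
    }
}
```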
HDFS
• HDFS, the Hadoop Distributed File System, as the name implies provides a distributed environment for storage, and it is a file system designed to run on commodity hardware.
• It stores each file in multiple blocks and, to maintain fault tolerance, the blocks are replicated across the Hadoop cluster.
HBase Data Model
• The HBase Data Model is a set of components consisting of tables, rows, column families, cells, columns, and versions. HBase tables contain column families and rows, with one element defined as the primary key. A column in an HBase table represents an attribute of the stored objects.
• The HBase Data Model consists of the following elements:
• A set of tables
• Each table with column families and rows
• Each table must have an element defined as the primary key
• The row key acts as the primary key in HBase
• Any access to HBase tables uses this primary key
• Each column present in HBase denotes an attribute corresponding to an object
Storage Mechanism in HBase
• HBase is a column-oriented database, and data is stored in tables. The tables are sorted by row key; each row key maps to the collection of column families present in the table.
• The column families present in the schema hold key-value pairs. Looking closely, each column family can have multiple columns. The column values are stored on disk, and each cell of the table has its own metadata, such as a timestamp and other information.
• In HBase, the following key terms describe the table schema:
• Table: a collection of rows.
• Row: a collection of column families.
• Column Family: a collection of columns.
• Column: a collection of key-value pairs.
• Namespace: a logical grouping of tables.
• Cell: a {row, column, version} tuple that exactly specifies a cell in HBase.
HBase Read and Write Operation
• Step 1) A client that wants to write data first communicates with the region server, and then with the region.
• Step 2) The region contacts the Memstore associated with the column family for storage.
• Step 3) Data is first stored in the Memstore, where it is kept sorted by row key; after that, it flushes into an HFile. The Memstore is held in the region server's main memory, while HFiles are written into HDFS.
• Step 4) A client that wants to read data contacts the regions.
• Step 5) The client can access the Memstore directly and request data from it.
• Step 6) The client approaches the HFiles to get the data; the data are fetched and retrieved by the client. (A client-side sketch of this write/read flow follows.)
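A minimal client-side sketch of this flow, assuming an existing table named 'employee' with a column family 'personal' (both hypothetical) and the HBase 1.x-style client API: the Put travels to the owning region server's Memstore, and the Get is answered from the Memstore and/or HFiles without the client needing to know which.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("employee"))) {

            // Write: the Put goes to the region server owning this row;
            // internally it lands in the Memstore, later flushing to HFiles.
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("alice"));
            table.put(put);

            // Read: the region server answers from the Memstore and/or HFiles.
            Get get = new Get(Bytes.toBytes("row1"));
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"));
            System.out.println("name = " + Bytes.toString(value));
        }
    }
}
```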
• The Memstore holds in-memory modifications to the store. The hierarchy of objects in HBase regions is shown, from top to bottom, in the table below.
Table: the HBase table present in the HBase cluster
Region: the HRegions for the presented tables
Store: one store per column family, for each region of the table
Memstore:
• one Memstore for each store, for each region of the table
• it sorts data before flushing into HFiles
• write and read performance increase because of this sorting
StoreFile: the StoreFiles for each store, for each region of the table
Block: the blocks present inside the StoreFiles
HBase Clients
• The HBase shell
• Kundera – the object mapper
• The REST client
• The Thrift client
• The Hadoop ecosystem client
HBase Shell
• HBase contains a shell with which you can communicate with HBase.
• HBase uses the Hadoop File System to store its data. It has a master server and region servers.
• Data is stored in the form of regions (partitions of tables). These regions are split up and stored in region servers.
• The master server manages these region servers, and all these tasks take place on HDFS.
• Given below are some of the commands supported by the HBase shell.
Data Definition Language
• These are the commands that operate on the tables in HBase.
• create - Creates a table.
• list - Lists all the tables in HBase.
• disable - Disables a table.
• is_disabled - Verifies whether a table is disabled.
• enable - Enables a table.
• is_enabled - Verifies whether a table is enabled.
• describe - Provides the description of a table.
• alter - Alters a table.
• exists - Verifies whether a table exists.
• drop - Drops a table from HBase.
• drop_all - Drops the tables matching the regex given in the command.
• Java Admin API - In addition to the shell commands above, Java provides an Admin API to achieve these DDL functionalities through programming. HBaseAdmin (in the org.apache.hadoop.hbase.client package) and HTableDescriptor (in org.apache.hadoop.hbase) are the two important classes that provide DDL functionality (a sketch follows).
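A minimal sketch of the Admin API route, using the classic HBaseAdmin and HTableDescriptor classes named above (deprecated in newer releases in favor of an Admin obtained from a Connection); the table and column-family names are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class DdlExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Classic admin entry point (newer code uses Connection.getAdmin()).
        HBaseAdmin admin = new HBaseAdmin(conf);

        // create: equivalent of the shell's  create 'employee', 'personal'
        HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("employee"));
        desc.addFamily(new HColumnDescriptor("personal"));
        admin.createTable(desc);

        // exists: equivalent of the shell's  exists 'employee'
        System.out.println("exists? " + admin.tableExists("employee"));

        // disable + drop: a table must be disabled before it can be dropped.
        admin.disableTable("employee");
        admin.deleteTable("employee");
        admin.close();
    }
}
```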
Data Manipulation Language
• put - Puts a cell value at a specified column in a specified row in a particular table.
• get - Fetches the contents of a row or a cell.
• delete - Deletes a cell value in a table.
• deleteall - Deletes all the cells in a given row.
• scan - Scans and returns the table data.
• count - Counts and returns the number of rows in a table.
• truncate - Disables, drops, and recreates a specified table.
• Java client API - In addition to the shell commands above, Java provides a client API to achieve these DML functionalities, CRUD (Create, Retrieve, Update, Delete) operations, and more through programming, under the org.apache.hadoop.hbase.client package. HTable, Put, and Get are the important classes in this package (a sketch follows).
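A minimal sketch of the shell's scan and deleteall counterparts in the Java client API, under the same assumptions as before (a hypothetical 'employee' table with a 'personal' column family); HTable is the classic entry point, while newer code obtains a Table from a Connection.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class DmlExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Classic API; newer code uses ConnectionFactory/Table instead.
        HTable table = new HTable(conf, "employee");

        // scan: equivalent of the shell's  scan 'employee'
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"));
        try (ResultScanner scanner = table.getScanner(scan)) {
            for (Result row : scanner) {
                System.out.println(Bytes.toString(row.getRow()) + " -> "
                        + Bytes.toString(row.getValue(Bytes.toBytes("personal"),
                                                      Bytes.toBytes("name"))));
            }
        }

        // delete: equivalent of the shell's  deleteall 'employee', 'row1'
        table.delete(new Delete(Bytes.toBytes("row1")));
        table.close();
    }
}
```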
Kundera - Object Mapper
• To start using HBase in your Java application with minimal learning, you can use a popular open-source API named Kundera.
• Kundera is a polyglot object mapper for NoSQL as well as RDBMS data stores. It is a single high-level Java API that supports NoSQL data stores.
• The idea behind Kundera is to make working with NoSQL databases drop-dead simple and fun. Kundera provides the following qualities (a usage sketch follows the list of supported stores):
• A robust querying system
• Easy object/relation mapping
• Support for second-level caching and event-based data handling
• Optimized data store persistence
• Connection pooling and Lucene-based indexing
• Kundera currently supports the following data stores:
• Cassandra
• MongoDB
• HBase
• Redis
• OracleNoSQL
• Neo4j
• CouchDB
• RethinkDB
• Kudu
• Relational databases
• Apache Spark
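Since Kundera implements the JPA interfaces, a mapped entity plus an EntityManager is the typical usage pattern. The sketch below is an assumption-laden illustration: the entity, table name, and persistence unit "hbase_pu" (which would be configured for HBase in META-INF/persistence.xml) are all hypothetical.

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Id;
import javax.persistence.Persistence;
import javax.persistence.Table;

// A JPA entity; Kundera maps it to an HBase table.
@Entity
@Table(name = "users")   // hypothetical table name
class User {
    @Id
    private String userId;          // becomes the HBase row key
    @Column(name = "name")
    private String name;

    public User() {}
    public User(String userId, String name) { this.userId = userId; this.name = name; }
}

public class KunderaSketch {
    public static void main(String[] args) {
        // "hbase_pu" is a hypothetical persistence unit configured for HBase
        // in META-INF/persistence.xml.
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("hbase_pu");
        EntityManager em = emf.createEntityManager();

        em.persist(new User("u1", "alice"));   // write a row
        User u = em.find(User.class, "u1");    // read it back by key
        System.out.println(u != null ? "found u1" : "missing");

        em.close();
        emf.close();
    }
}
```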
REST Client
• The Java API provides the most functionality, but many people want to use HBase without Java.
• There are two main approaches for doing that. One is the Thrift interface, which is the faster and more lightweight of the two options. The other way to access HBase is the REST interface, which uses HTTP verbs to perform an action, giving developers a wide choice of languages and programs to use.
• HBase REST Basics
• For either Thrift or REST to work, another HBase daemon needs to be running to handle these requests. These daemons can be installed from the hbase-thrift and hbase-rest packages. In the cluster, the Thrift and REST daemons are usually placed on hosts that don't run other services such as DataNodes or RegionServers, to keep the load down and responsiveness high for REST interactions.
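As a hedged sketch of the REST interface in action: assuming the REST daemon is running on localhost at its common default port 8080, and a hypothetical table 'users' with a row 'u1', an HTTP GET on /<table>/<row> returns the row's cells (with keys and values base64-encoded in the JSON representation).

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RestClientSketch {
    public static void main(String[] args) throws Exception {
        // GET /<table>/<row> against the HBase REST daemon; host, port,
        // table, and row here are assumptions for illustration.
        URL url = new URL("http://localhost:8080/users/u1");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");

        System.out.println("HTTP " + conn.getResponseCode());
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON with base64-encoded keys/values
            }
        }
        conn.disconnect();
    }
}
```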
Thrift
• Apache Thrift is a software framework for scalable, cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi, and other languages.
HBASE EXAMPLES
• HBase is used whenever there is a need for write-heavy applications.
• HBase is used whenever we need to provide fast random access to available data.
• Companies such as Facebook, Twitter, Yahoo, and Adobe use HBase internally.
• Example domains:
• Medical
• Sports
• Web
• E-commerce
• Banking industry
• Telecom industry
• Oil and petroleum industry
• Example deployments:
• HBase at Facebook
• Pinterest
• HBase at Longtail Video
• AADHAAR
• Twitter
• Meetup