1
HBase: Just the Basics
Jesse Anderson – Curriculum Developer and Instructor
v2
2
What Is HBase?
©2014 Cloudera, Inc. All rights reserved.
• NoSQL datastore built on top of HDFS (Hadoop)
• An Apache Top Level Project
• Handles the various manifestations of Big Data
• Based on Google’s Bigtable paper
3
Why Use HBase?
• Storing large amounts of data (TB/PB)
• High throughput for a large number of requests
• Storing unstructured or variable column data
• Big Data with random reads and writes
4
When to Consider Not Using HBase?
• The problem is not a Big Data problem (only use HBase with Big Data)
• Data is read straight through files
• Data is written all at once or appended as new files
• There are no random reads or writes
• Access patterns of the data are ill-defined
5
HBase Architecture
How it works
6
Meet the Daemons
• HBase Master
• RegionServer
• ZooKeeper
• HDFS
  • NameNode/Standby NameNode
  • DataNode
7
Daemon Locations
[Diagram: cluster layout]
Master Nodes: NameNode, Standby NameNode, three HBase Masters, three ZooKeepers
Slave Nodes: six DataNode/RegionServer pairs
8
Tables and Column Families
Tables are broken into groupings called Column Families.
• Column Family “contactinfo”: group data frequently accessed together and compress it
• Column Family “profilephoto”: group photos with different settings
9
Rows and Columns
Row key    Column Family “contactinfo”      Column Family “profilephoto”
adupont    fname: Andre   lname: Dupont
jsmith     fname: John    lname: Smith      image: <smith.jpg>
mrossi     fname: Mario   lname: Rossi      image: <mario.jpg>
Row keys identify a row
No storage penalty for unused columns
Each Column Family can have many columns
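
One way to picture the three points above is a nested map. The sketch below is a conceptual model in plain Java, not the HBase API or its on-disk format; the class name `SparseTableSketch` and its methods are hypothetical. A column that was never written simply has no entry, which is why unused columns cost nothing.

```java
import java.util.Map;
import java.util.TreeMap;

/** Conceptual model only: row key -> column family -> column -> value. */
final class SparseTableSketch {
    private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    /** Stores one cell; nothing is kept for columns that are never written. */
    void put(String rowKey, String family, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(column, value);
    }

    /** Returns the cell value, or null when the row, family, or column is absent. */
    String get(String rowKey, String family, String column) {
        Map<String, Map<String, String>> families = rows.get(rowKey);
        if (families == null) return null;
        Map<String, String> columns = families.get(family);
        return columns == null ? null : columns.get(column);
    }

    /** Counts the cells actually stored for a row. */
    int cellCount(String rowKey) {
        Map<String, Map<String, String>> families = rows.get(rowKey);
        if (families == null) return 0;
        int count = 0;
        for (Map<String, String> columns : families.values()) count += columns.size();
        return count;
    }

    public static void main(String[] args) {
        SparseTableSketch table = new SparseTableSketch();
        table.put("adupont", "contactinfo", "fname", "Andre");
        table.put("adupont", "contactinfo", "lname", "Dupont");
        table.put("jsmith", "contactinfo", "fname", "John");
        table.put("jsmith", "contactinfo", "lname", "Smith");
        table.put("jsmith", "profilephoto", "image", "<smith.jpg>");
        // adupont has no profilephoto cell, so only two cells are stored for that row
        System.out.println(table.cellCount("adupont"));
        System.out.println(table.get("adupont", "profilephoto", "image"));
    }
}
```

The row for adupont holds two cells and the row for jsmith holds three; no placeholder is stored for the missing image.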
10
Regions
A table is broken into regions. Regions are served by RegionServers.

First region:
Row key    Column Family “contactinfo”
adupont    fname: Andre   lname: Dupont
jsmith     fname: John    lname: Smith

Second region:
Row key    Column Family “contactinfo”
mrossi     fname: Mario   lname: Rossi
zstevens   fname: Zack    lname: Stevens

[Diagram: the cluster layout from the Daemon Locations slide, with each region assigned to a RegionServer]
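
The split above can be sketched as a lookup from row key to RegionServer. This is a minimal model in plain Java, assuming each region covers a contiguous, sorted range of row keys identified by its start key. The class `RegionLookupSketch`, the split point "m", and the server names are all hypothetical; real HBase compares row keys as bytes and tracks region locations itself.

```java
import java.util.Map;
import java.util.TreeMap;

/** Conceptual model only: routes a row key to the RegionServer serving its region. */
final class RegionLookupSketch {
    // Start row key of each region -> name of the RegionServer serving it.
    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

    /** Registers a region that begins at startKey (the first region starts at ""). */
    void assign(String startKey, String regionServer) {
        serverByStartKey.put(startKey, regionServer);
    }

    /** Finds the region whose start key is the greatest one not above rowKey. */
    String serverFor(String rowKey) {
        Map.Entry<String, String> region = serverByStartKey.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalStateException("no region covers row key " + rowKey);
        }
        return region.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.assign("", "regionserver-1");   // hypothetical: rows before "m"
        table.assign("m", "regionserver-2");  // hypothetical: rows from "m" onward
        for (String rowKey : new String[] {"adupont", "jsmith", "mrossi", "zstevens"}) {
            System.out.println(rowKey + " -> " + table.serverFor(rowKey));
        }
    }
}
```

With these assumed boundaries, adupont and jsmith route to the first server and mrossi and zstevens to the second, matching the two regions shown on the slide.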
11
Write Path
[Diagram: a Client connecting to the cluster layout from the Daemon Locations slide]
1. Which RegionServer is serving the Region?
2. Write to RegionServer
12
Read Path
[Diagram: a Client connecting to the cluster layout from the Daemon Locations slide]
1. Which RegionServer is serving the Region?
2. Read from RegionServer
13
HBase API
How to access the data
14
No SQL Means No SQL
• Data is not accessed with SQL
• You must:
  • Create your own connections
  • Keep track of the type of data in a column
  • Give each row a key
  • Access a row by its key
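
Why the type of a column has to be tracked by hand: values travel as raw bytes, and the bytes carry no type. The sketch below uses only the Java standard library, not the HBase `Bytes` utility; the class `CellTypeSketch` and its helper names are hypothetical. It shows the same four bytes decoding cleanly as either an int or a string.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Conceptual model only: a cell value is just bytes, so the reader must know the type. */
final class CellTypeSketch {
    /** Encodes an int as 4 big-endian bytes. */
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    /** Decodes 4 big-endian bytes back into an int. */
    static int decodeInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    /** Encodes a string as UTF-8 bytes. */
    static byte[] encodeString(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes UTF-8 bytes back into a string. */
    static String decodeString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] stored = encodeInt(0x41424344);
        // Reading with the type it was written as gives the original value back.
        System.out.println(decodeInt(stored) == 0x41424344);
        // Reading the same bytes as a string also "works", but yields unrelated text.
        System.out.println(decodeString(stored));
    }
}
```

Nothing fails when the wrong decoder is used, so the application has to record which type each column holds.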
15
Types of Access
• Gets
  • Gets a row’s data based on the row key
• Puts
  • Upserts a row with data based on the row key
• Scans
  • Finds all matching rows based on the row key
  • Scan logic can be extended by using filters
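
A scan can be pictured as a walk over rows kept in sorted row key order, from a start key (inclusive) to a stop key (exclusive), with an optional filter applied to each row. The sketch below is a conceptual model in plain Java, not the HBase `Scan` or filter API; the class `ScanSketch` and its method names are hypothetical, and each row is reduced to a single string value to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Conceptual model only: a scan over rows held in sorted row key order. */
final class ScanSketch {
    private final TreeMap<String, String> rows = new TreeMap<>();

    /** Inserts or overwrites (upserts) the value stored under rowKey. */
    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    /** Returns row keys in [startKey, stopKey) whose value passes the filter. */
    List<String> scan(String startKey, String stopKey, Predicate<String> valueFilter) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> row : rows.subMap(startKey, stopKey).entrySet()) {
            if (valueFilter.test(row.getValue())) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        ScanSketch table = new ScanSketch();
        table.put("adupont", "Dupont");
        table.put("jsmith", "Smith");
        table.put("mrossi", "Rossi");
        table.put("zstevens", "Stevens");
        // Row key range only: every row from "a" up to, but not including, "n".
        System.out.println(table.scan("a", "n", value -> true));
        // Same range, narrowed by a filter on the stored value.
        System.out.println(table.scan("a", "n", value -> value.startsWith("S")));
    }
}
```

The row key range does the coarse selection cheaply because the keys are sorted; the filter only refines what the range already returned.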
16
Gets
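import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
// Build a Get for a single row, identified by its row key
Get g = new Get(ROW_KEY_BYTES);
// Fetch the row from the table
Result r = table.get(g);
// Read one cell by column family and column name (null if the cell is absent)
byte[] byteArray = r.getValue(COLFAM_BYTES, COLDESC_BYTES);
// Values come back as bytes; convert to the type the column is known to hold
String columnValue = Bytes.toString(byteArray);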
17
Puts
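import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
// Build a Put for the row key; a Put inserts or updates (upserts) the row
Put p = new Put(ROW_KEY_BYTES);
// Set one cell; HBase 1.0+ names this addColumn (the original slide used p.add)
p.addColumn(COLFAM_BYTES, COLDESC_BYTES, Bytes.toBytes("value"));
// Send the Put to the table
table.put(p);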
18
HBase Schema Design
How to design
19
No SQL Means No SQL
• Designing schemas for HBase requires in-depth knowledge
• Schema design is ‘data-centric’, not ‘relationship-centric’
• You design around how data is accessed
• Row keys are engineered
20
Treating HBase like a traditional RDBMS will lead to abject failure!
Captain Picard
21
Row Keys
• A row key is more than the glue between two tables
• Engineering time is spent just on constructing a row key
• Contents of a row key vary by access pattern
• Often made up of several pieces of data:
<group_id><email>
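
A composite key like the one above can be sketched as follows. This is one possible encoding in plain Java, not a prescribed HBase format: the class `RowKeySketch`, the 8-digit zero-padded group id, and the sample emails are assumptions for illustration. Fixing the width of the group id keeps every row of a group adjacent in sorted order, so a group can be read with a single range walk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/** Conceptual model only: builds <group_id><email> keys that sort by group. */
final class RowKeySketch {
    /** Zero-pads the group id to a fixed width so keys sort by group, then email. */
    static String groupPrefix(int groupId) {
        if (groupId < 0 || groupId > 99_999_999) {
            throw new IllegalArgumentException("group id out of range: " + groupId);
        }
        return String.format(Locale.ROOT, "%08d", groupId);
    }

    /** Concatenates the fixed-width group id and the email into one row key. */
    static String rowKey(int groupId, String email) {
        return groupPrefix(groupId) + email;
    }

    /** Walks the sorted keys from the group's prefix until the prefix stops matching. */
    static List<String> keysForGroup(TreeSet<String> sortedKeys, int groupId) {
        String prefix = groupPrefix(groupId);
        List<String> matches = new ArrayList<>();
        for (String key : sortedKeys.tailSet(prefix)) {
            if (!key.startsWith(prefix)) break;
            matches.add(key);
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>();
        keys.add(rowKey(70, "zack@example.com"));
        keys.add(rowKey(7, "mario@example.com"));
        keys.add(rowKey(7, "andre@example.com"));
        // Only the two rows of group 7 are returned; group 70 does not leak in.
        System.out.println(keysForGroup(keys, 7));
    }
}
```

Without the padding, the key for group 7 would be a prefix of the key for group 70 and the two groups would interleave, which is the kind of detail that makes row keys an engineering task.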
22
Schema Design
• Schema design does not start in an ERD
• Access patterns must be ascertained before the schema is designed
• Denormalize to improve performance
• Fewer, bigger tables
23
Jesse Anderson
@jessetanderson

HBaseCon 2014-Just the Basics
