HBASE – THE SCALABLE
DATA STORE
An Introduction to HBase
JAX UK, October 2012

Lars George
Director EMEA Services
About Me

•  Director EMEA Services @ Cloudera
    •  Consulting on Hadoop projects (everywhere)
•  Apache Committer
    •  HBase and Whirr
•  O’Reilly Author
    •  HBase – The Definitive Guide
      •  Now in Japanese!

•  Contact
    •  lars@cloudera.com   (The Japanese edition is also out!)
    •  @larsgeorge
Agenda

•  Introduction to HBase
•  HBase Architecture
•  MapReduce with HBase
•  Advanced Techniques
•  Current Project Status
INTRODUCTION TO HBASE
Why Hadoop/HBase?

•  Datasets are constantly growing and intake rates are soaring
    •  Yahoo! has 140PB+ and 42k+ machines
    •  Facebook adds 500TB+ per day, 100PB+ raw data, on
       tens of thousands of machines
    •  Are you “throwing” data away today?
•  Traditional databases are expensive to scale and
   inherently difficult to distribute
•  Commodity hardware is cheap and powerful
   •  $1000 buys you 4-8 cores/4GB/1TB
   •  600GB 15k RPM SAS nearly $500
•  Need for random access and batch processing
    •  Hadoop only supports batch/streaming
History of Hadoop/HBase

•  Google solved its scalability problems
    •  “The Google File System” published October 2003
      •  Hadoop DFS
   •  “MapReduce: Simplified Data Processing on Large
     Clusters” published December 2004
      •  Hadoop MapReduce
   •  “BigTable: A Distributed Storage System for
     Structured Data” published November 2006
      •  HBase
Hadoop Introduction

•  Two main components
    •  Hadoop Distributed File System (HDFS)
       •  A scalable, fault-tolerant, high performance distributed file
         system capable of running on commodity hardware
   •  Hadoop MapReduce
       •  Software framework for distributed computation

•  Significant adoption
    •  Used in production in hundreds of organizations
    •  Primary contributors: Yahoo!, Facebook, Cloudera
HDFS: Hadoop Distributed File System

•  Reliably store petabytes of replicated data across
 thousands of nodes
   •  Data divided into 64MB blocks, each block replicated
     three times
•  Master/Slave architecture
    •  Master NameNode contains block locations
    •  Slave DataNode manages block on local file system
•  Built on commodity hardware
    •  No 15k RPM disks or RAID required (nor wanted!)
MapReduce

•  Distributed programming model to reliably
 process petabytes of data by exploiting data locality
   •  Built-in bindings for Java and C++ (via Hadoop Pipes)
   •  Can be used with any language via Hadoop
     Streaming
•  Inspired by map and reduce functions in
 functional programming

 Input → Map() → Copy/Sort → Reduce() → Output
Hadoop…

•  … is designed to store and stream extremely large
   datasets in batch
•  … is not intended for realtime querying
•  … does not support random access
•  … does not handle billions of small files well
   •  Files smaller than the default block size of 64MB
   •  Keeps “inodes” in memory on master
•  … does not favor structured data over
 unstructured or complex data

              That is why we have HBase!
Why HBase and not …?

•  Question: Why HBase and not <put-your-favorite-
   nosql-solution-here>?
•  What else is there?
   •    Key/value stores
   •    Document-oriented stores
   •    Column-oriented stores
   •    Graph-oriented stores
•  Features to ask for
    •  In memory or persistent?
    •  Strict or eventual consistency?
    •  Distributed or single machine (or afterthought)?
    •  Designed for read and/or write speeds?
    •  How does it scale? (if that is what you need)
What is HBase?

•  Distributed
•  Column-Oriented
•  Multi-Dimensional
•  High-Availability (CAP anyone?)
•  High-Performance
•  Storage System

                       Project Goals
   Billions of Rows * Millions of Columns * Thousands of
                            Versions
    Petabytes across thousands of commodity servers
HBase is not…

•  An SQL Database
    •  No joins, no query engine, no types, no SQL
    •  Transactions and secondary indexes exist only as
       immature add-ons
•  A drop-in replacement for your RDBMS
•  You must be OK with RDBMS anti-schema
    •  Denormalized data
    •  Wide and sparsely populated tables
    •  Just say “no” to your inner DBA


               Keyword: Impedance Match
HBase Tables

•  Tables are sorted by the Row Key in
   lexicographical order
•  Table schema only defines its Column Families
  •  Each family consists of any number of Columns
  •  Each column consists of any number of Versions
  •  Columns only exist when inserted, NULLs are free
  •  Columns within a family are sorted and stored
     together
  •  Everything except table names is byte[]


(Table, Row, Family:Column, Timestamp) → Value
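
As a quick sketch of this coordinate model (the table handle, family,
and column names are illustrative, and "ts" is a placeholder
timestamp), a client addresses one cell version like so:

   HTable table = new HTable(HBaseConfiguration.create(), "test");
   Get get = new Get(Bytes.toBytes("row1"));
   get.addColumn(Bytes.toBytes("family"), Bytes.toBytes("column"));
   get.setTimeStamp(ts);    // pick one version; omit for the latest
   Result result = table.get(get);
   byte[] value = result.getValue(Bytes.toBytes("family"),
       Bytes.toBytes("column"));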
Column Family vs. Column

•  Use only a few column families
    •  Each family adds store files that must stay open per
       region, plus class overhead per family
•  Best used when logical separation between data
   and meta columns
•  Sorting per family can be used to convey
   application logic or access pattern
HBase Architecture

•  Table is made up of any number of regions
•  Region is specified by its startKey and endKey
    •  Empty table: (Table, NULL, NULL)
    •  Two-region table: (Table, NULL, “com.cloudera.www”)
       and (Table, “com.cloudera.www”, NULL)
•  Each region may live on a different node and is
 made up of several HDFS files and blocks, each
 of which is replicated by Hadoop
HBase Architecture (cont.)

•  Two types of HBase nodes:
        Master and RegionServer
•  Special tables -ROOT- and .META. store schema
   information and region locations
•  Master server responsible for RegionServer
   monitoring as well as assignment and load
   balancing of regions
•  Uses ZooKeeper as its distributed coordination
   service
  •  Manages Master election and server availability
Web Crawl Example

•  Canonical use-case for BigTable
•  Store web crawl data
    •  Table webtable with family content and meta
    •  Row key is the reversed URL, with Columns
      •  content:data stores the raw crawled data
      •  meta:language stores the HTTP language header
      •  meta:type stores the HTTP content-type header
   •  While processing raw data for hyperlinks and images,
     add families links and images
      •  links:<rurl> column for each hyperlink
      •  images:<rurl> column for each image
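
As an illustrative sketch (the webtable handle and the page bytes
rawPage are placeholders), storing one crawled page could look like:

   byte[] row = Bytes.toBytes("com.cloudera.www");   // reversed URL
   Put put = new Put(row);
   put.add(Bytes.toBytes("content"), Bytes.toBytes("data"), rawPage);
   put.add(Bytes.toBytes("meta"), Bytes.toBytes("language"),
       Bytes.toBytes("en"));
   put.add(Bytes.toBytes("meta"), Bytes.toBytes("type"),
       Bytes.toBytes("text/html"));
   webtable.put(put);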
HBase Clients

•  Native Java Client/API
•  Non-Java Clients
    •  REST server
    •  Avro server
    •  Thrift server
    •  Jython, Scala, Groovy DSL
•  TableInputFormat/TableOutputFormat for
 MapReduce
   •  HBase as MapReduce source and/or target
•  HBase Shell
    •  JRuby shell adding get, put, scan and admin calls
Java API

•  CRUD
    •  get: retrieve an entire or partial row (R)
    •  put: create and update a row (CU)
    •  delete: delete a cell, column, columns, or row (D)


      Result get(Get get) throws IOException;

      void put(Put put) throws IOException;

      void delete(Delete delete) throws IOException;
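
A minimal round-trip through these calls might look as follows; the
table name "test" and family "cf" are assumptions for the sketch
(classes come from org.apache.hadoop.hbase.client and
org.apache.hadoop.hbase.util.Bytes):

   Configuration conf = HBaseConfiguration.create();
   HTable table = new HTable(conf, "test");

   Put put = new Put(Bytes.toBytes("row1"));                    // C/U
   put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"),
       Bytes.toBytes("value1"));
   table.put(put);

   Result result = table.get(new Get(Bytes.toBytes("row1")));   // R
   byte[] value = result.getValue(Bytes.toBytes("cf"),
       Bytes.toBytes("col1"));

   table.delete(new Delete(Bytes.toBytes("row1")));             // D
   table.close();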
Java API (cont.)

•  CRUD+SI
    •  scan:      Scan any number of rows (S)
    •  increment: Increment a column value (I)




ResultScanner getScanner(Scan scan) throws IOException;

Result increment(Increment increment) throws IOException;
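
A sketch of both calls, reusing the table handle and family from the
CRUD example (row keys and the counter column are illustrative):

   Scan scan = new Scan(Bytes.toBytes("row100"), Bytes.toBytes("row200"));
   scan.addFamily(Bytes.toBytes("cf"));
   ResultScanner scanner = table.getScanner(scan);
   try {
     for (Result res : scanner) {
       // rows arrive sorted by row key
     }
   } finally {
     scanner.close();
   }

   Increment increment = new Increment(Bytes.toBytes("row1"));
   increment.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("hits"), 1L);
   table.increment(increment);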
Java API (cont.)

•  CRUD+SI+CAS
    •  Atomic compare-and-swap (CAS)


•  Combined get, check, and put operation
•  Helps to overcome lack of full transactions
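
On HTable this surfaces as checkAndPut() and checkAndDelete(); a
sketch, with cell names assumed for illustration:

   byte[] row = Bytes.toBytes("row1");
   byte[] cf  = Bytes.toBytes("cf");
   byte[] col = Bytes.toBytes("status");

   Put put = new Put(row);
   put.add(cf, col, Bytes.toBytes("processed"));

   // the Put is applied only if the cell still holds "pending"
   boolean applied = table.checkAndPut(row, cf, col,
       Bytes.toBytes("pending"), put);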
Batch Operations

•  Support Get, Put, and Delete
•  Reduce network round-trips
•  If possible, batch operations to the server to gain
 better overall throughput

    void batch(List<Row> actions, Object[] results)
      throws IOException, InterruptedException;

    Object[] batch(List<Row> actions)
      throws IOException, InterruptedException;
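
A sketch of a mixed batch (row keys illustrative); each slot of the
results array receives the Result or exception for its action:

   List<Row> actions = new ArrayList<Row>();
   Put put = new Put(Bytes.toBytes("row2"));
   put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"),
       Bytes.toBytes("v"));
   actions.add(put);
   actions.add(new Get(Bytes.toBytes("row1")));
   actions.add(new Delete(Bytes.toBytes("row3")));

   Object[] results = new Object[actions.size()];
   table.batch(actions, results);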
Filters

•  Can be used with Get and Scan operations
•  Server side hinting
•  Reduce data transferred to client
•  Filters are no guarantee of fast scans
    •  Still full table scan in worst-case scenario
    •  Might have to implement your own
•  Filters can hint next row key
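
For example, a PrefixFilter lets the servers skip all rows whose key
does not start with a given prefix (the prefix here is illustrative):

   Scan scan = new Scan();
   scan.setFilter(new PrefixFilter(Bytes.toBytes("com.cloudera.")));
   ResultScanner scanner = table.getScanner(scan);
   for (Result res : scanner) {
     // only rows with keys starting in "com.cloudera." come back
   }
   scanner.close();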
HBase Extensions

•  Hive, Pig, Cascading
    •  Hadoop-targeted MapReduce tools with HBase
       integration
•  Sqoop
    •  Read and write to HBase for further processing in
       Hadoop
•  HBase Explorer, Nutch, Heritrix
•  SpringData
•  Toad
History of HBase
•  November 2006
     •  Google releases paper on BigTable
•  February 2007
     •  Initial HBase prototype created as Hadoop contrib
•  October 2007
     •  First “useable” HBase (Hadoop 0.15.0)
•  January 2008
     •  Hadoop becomes TLP, HBase becomes subproject
•  October 2008
     •  HBase 0.18.1 released
•  January 2009
     •  HBase 0.19.0
•  September 2009
     •  HBase 0.20.0 released (Performance Release)
•  May 2010
     •  HBase becomes TLP
•  June 2010
     •  HBase 0.89.20100621, first developer release
•  May 2011
     •  HBase 0.90.3 release
HBase Users

•  Adobe
•  eBay
•  Facebook
•  Mozilla (Socorro)
•  Trend Micro (Advanced Threat Research)
•  Twitter
•  Yahoo!
•  …
HBASE ARCHITECTURE
HBase Architecture

•  Based on Log-Structured Merge-Trees (LSM-Trees)
•  Inserts are done in write-ahead log first
•  Data is stored in memory and flushed to disk on
   regular intervals or based on size
•  Small flushes are merged in the background to keep
   number of files small
•  Reads check the in-memory stores first and the disk-based
   files second
•  Deletes are handled with “tombstone” markers
•  Atomicity on row level no matter how many columns
   •  Keeps the locking model simple
Write Ahead Log
MAPREDUCE WITH HBASE
MapReduce with HBase

•  Framework to use HBase as source and/or sink for
   MapReduce jobs
•  Thin layer over native Java API
•  Provides a helper class to make job setup easier

   TableMapReduceUtil.initTableMapperJob(
      "test", scan, MyMapper.class,
      ImmutableBytesWritable.class,
      Result.class, job);


   TableMapReduceUtil.initTableReducerJob(
      "table", MyReducer.class, job);
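
A matching mapper is a TableMapper; the sketch below simply forwards
each row, leaving any transformation to the reducer (the class body
is illustrative, not from the slides):

   static class MyMapper
       extends TableMapper<ImmutableBytesWritable, Result> {
     @Override
     protected void map(ImmutableBytesWritable rowKey, Result columns,
         Context context) throws IOException, InterruptedException {
       // pass the row through unchanged
       context.write(rowKey, columns);
     }
   }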
MapReduce with HBase (cont.)

•  Special use case with regard to Hadoop
•  Tables are sorted and have unique keys
    •  Often we do not need a Reducer phase
    •  Combiner not needed
•  Need to make sure load is distributed properly by
   randomizing keys (or use bulk import)
•  Partial or full table scans possible
•  Scans are very efficient as they make use of block
   caches
   •  But make sure you do not create too much cache churn;
     better yet, switch caching off for full table scans
•  Can use filters to limit rows being processed
TableInputFormat

•  Transforms an HBase table into a source for
   MapReduce jobs
•  Internally uses a TableRecordReader which
   wraps a Scan instance
   •  Supports restarts to handle temporary issues
•  Splits table by region boundaries and stores
 current region locality
TableOutputFormat

•  Allows using an HBase table as the output target
•  Put and Delete support from mapper or reducer
   class
•  Uses TableOutputCommitter to write data
•  Disables auto-commit on table to make use of
   client side write buffer
•  Handles final flush in close()
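
A reducer writing into the output table extends TableReducer; the
sketch below copies one column of the grouped rows (the types, family,
and column names are assumptions):

   static class MyReducer extends TableReducer<ImmutableBytesWritable,
       Result, ImmutableBytesWritable> {
     @Override
     protected void reduce(ImmutableBytesWritable rowKey,
         Iterable<Result> rows, Context context)
         throws IOException, InterruptedException {
       Put put = new Put(rowKey.get());
       for (Result row : rows) {
         byte[] v = row.getValue(Bytes.toBytes("cf"),
             Bytes.toBytes("col1"));
         if (v != null) {
           put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"), v);
         }
       }
       context.write(rowKey, put);
     }
   }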
HFileOutputFormat

•  Used to bulk load data into HBase
•  Bypasses normal API and generates low-level
   store files
•  Prepares files for final bulk insert
•  Needs special handling of sort order and
   partitioning
•  Only supports one column family (for now)
•  Can load bulk updates into existing tables
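
Most of the wiring is done by one helper call; a sketch with an
assumed table name:

   Job job = new Job(conf, "prepare-bulk-load");
   HTable table = new HTable(conf, "test");
   // sets a total-order partitioner and reducer so the produced
   // HFiles line up with the table's current region boundaries
   HFileOutputFormat.configureIncrementalLoad(job, table);
   // once the job completes, move the HFiles into the table with
   // the LoadIncrementalHFiles ("completebulkload") tool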
MapReduce Helper

•  TableMapReduceUtil
•  IdentityTableMapper
     •  Passes on key and value, where value is a Result
        instance and key is set to value.getRow()
•  IdentityTableReducer
     •  Stores values into HBase, must be Put or Delete
        instances
•  HRegionPartitioner
    •  Not set by default; use it to control partitioning at
       the Hadoop level
Custom MapReduce over Tables

•  No requirement to use provided framework
•  Can read from or write to one or many tables in
   mapper and reducer
•  Can split on arbitrary boundaries, not just regions
•  Make sure to use write buffer in OutputFormat to
   get best performance (do not forget to call
   flushCommits() at the end!)
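
With the HTable-based client this amounts to the following sketch
(the buffer size is an example value):

   table.setAutoFlush(false);       // turn on the client write buffer
   table.setWriteBufferSize(12 * 1024 * 1024);   // e.g. 12MB
   // ... many table.put(...) calls in map()/reduce() ...
   table.flushCommits();   // in cleanup()/close(), send what is left
   table.close();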
ADVANCED TECHNIQUES
Advanced Techniques

•  Key/Table Design
•  DDI
•  Salting
•  Hashing vs. Sequential Keys
•  ColumnFamily vs. Column
•  Using BloomFilter
•  Data Locality
•  checkAndPut() and checkAndDelete()
•  Coprocessors
Coprocessors

•  New addition to feature set
•  Based on talk by Jeff Dean at LADIS 2009
    •  Run arbitrary code on each region in RegionServer
    •  High level call interface for clients
       •  Calls are addressed to rows or ranges of rows while
          Coprocessors client library resolves locations
       •  Calls spanning multiple rows are automatically split
   •  Provides model for distributed services
       •  Automatic scaling, load balancing, request routing
Coprocessors in HBase

•  Use for efficient computational parallelism
•  Secondary indexing (HBASE-2038)
•  Column Aggregates (HBASE-1512)
    •  SQL-like sum(), avg(), max(), min(), etc.
•  Access control (HBASE-3025, HBASE-3045)
    •  Provide basic access control
•  Table Metacolumns
•  New filtering
    •  predicate pushdown
•  Table/Region access statistics
•  HLog extensions (HBASE-3257)
Coprocessor and RegionObserver

•  The Coprocessor interface defines these hooks
    •  preOpen, postOpen: Called before and after the
       region is reported as online to the master
    •  preFlush, postFlush: Called before and after the
       memstore is flushed into a new store file
    •  preCompact, postCompact: Called before and after
       compaction
    •  preSplit, postSplit: Called before and after the region is split
    •  preClose, postClose: Called before and after the
       region is reported as closed to the master
Coprocessor and RegionObserver

•  The RegionObserver interface defines these hooks:
    •  preGet, postGet: Called before and after a client makes a Get
       request
    •  preExists, postExists: Called before and after the client tests for
       existence using a Get
    •  prePut, postPut: Called before and after the client stores a value
    •  preDelete, postDelete: Called before and after the client deletes a
       value
    •  preScannerOpen, postScannerOpen: Called before and after the
       client opens a new scanner
    •  preScannerNext, postScannerNext: Called before and after the
       client asks for the next row on a scanner
    •  preScannerClose, postScannerClose: Called before and after the
       client closes a scanner
    •  preCheckAndPut, postCheckAndPut: Called before and after the
       client calls checkAndPut()
    •  preCheckAndDelete, postCheckAndDelete: Called before and after
       the client calls checkAndDelete()
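
A minimal observer extends BaseRegionObserver and overrides only the
hooks it needs; the class below is an illustrative sketch (using the
0.92-era prePut() signature), not from the slides:

   public class AuditObserver extends BaseRegionObserver {
     @Override
     public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx,
         Put put, WALEdit edit, boolean writeToWAL) throws IOException {
       // inspect, augment, or reject the Put before it is applied
     }
   }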
PROJECT STATUS
Current Project Status

•  HBase 0.90.x “Advanced Concepts”
    •  Master Rewrite – More ZooKeeper
    •  Intra Row Scanning
    •  Further optimizations on algorithms and data
       structures
           CDH3
•  HBase 0.92.x “Coprocessors”
    •  Multi-DC Replication
    •  Discretionary Access Control
    •  Coprocessors
           CDH4
Current Project Status (cont.)

•  HBase 0.94.x “Performance Release”
    •  Read CRC Improvements
    •  Seek Optimizations
    •  WAL Compression
    •  Prefix Compression (aka Block Encoding)
    •  Atomic Append
    •  Atomic put+delete
    •  Multi Increment and Multi Append
    •  Per-region (i.e. local) Multi-Row Transactions
    •  WALPlayer

         CDH4.x    (soon)
Current Project Status (cont.)

•  HBase 0.96.x “The Singularity”
    •  Protobuf RPC
      •  Rolling Upgrades
      •  Multiversion Access
    •  Metrics V2
    •  Preview Technologies
      •  Snapshots
      •  PrefixTrie Block Encoding



        CDH5 ?
Questions?