Apache HBase 1.0 Release
Nick Dimiduk, Hortonworks
@xefyr
n10k.com
February 20, 2015
  
Release 1.0

“The theme of (eventual) 1.0 release is to become a stable base for future 1.x series of releases. 1.0 release will aim to achieve at least the same level of stability of 0.98 releases without introducing too many new features.”

Enis Söztutar
HBase 1.0 Release Manager
  
Agenda
•  A Brief History of HBase
•  What is HBase
•  Major Changes for 1.0
•  Upgrade Path
  
A BRIEF HISTORY OF HBASE
How we got here
  
The Early Years
•  2006: BigTable paper published by Google
•  2006: HBase development starts
•  2007: HBase added to Hadoop contrib
•  2007: Release Hadoop 0.15.0
•  2008: Hadoop graduates Incubator
•  2008: HBase becomes Hadoop sub-project
•  2008: Release HBase 0.18.1
•  2009: Release HBase 0.19.0
•  2009: Release HBase 0.20.0
  
Into Production
•  2010: HBase becomes Apache top-level project
•  2011: Release HBase 0.90.0
•  2011: Release HBase 0.92.0
•  2011: HBase: The Definitive Guide published
•  2012: Release HBase 0.94.0
•  2012: First HBaseCon
•  2012: HBase Administration Cookbook published
•  2012: HBase In Action published
  
Modern HBase
•  2013: HBaseCon 2013
•  2013: Release HBase 0.96.0
•  2013: Apache Phoenix enters Incubator
•  2014: Release HBase 0.98.0
•  2014: HBaseCon 2014
•  2014: Apache Phoenix graduates Incubator
•  2015: Release HBase 1.0
…
•  2016: Release HBase 2.0?
  
WHAT IS HBASE
HBase architecture in 5 minutes or less
  
Data Model

Table A (organized into Rows and Column Families; each cell is addressed by rowkey, column family, column qualifier, and timestamp):

rowkey  column family  column qualifier  timestamp   value
a       cf1            "bar"             1368394583  7
a       cf1            "bar"             1368394261  "hello"
a       cf1            "foo"             1368394583  22
a       cf1            "foo"             1368394925  13.6
a       cf1            "foo"             1368393847  "world"
a       cf2            1.0001            1368387684  "almost the loneliest number"
a       cf2            "2011-07-04"      1368396302  "fourth of July"
b       cf2            "thumb"           1368387247  [3.6 kb png data]
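
The addressing scheme in the Data Model figure (rowkey, column family, column qualifier, timestamp, value) can be pictured as nested maps. The following is a minimal sketch only: `ToyTable`, `put`, and `get` are hypothetical names for illustration and are not the HBase client API, and nothing here reflects how HBase stores data on disk. The sample cells are taken from the Table A figure.

```python
class ToyTable:
    """Illustrative nested-map model: row -> family -> qualifier -> {timestamp: value}."""

    def __init__(self):
        self._rows = {}

    def put(self, row, family, qualifier, timestamp, value):
        # Each (row, family, qualifier) coordinate keeps several timestamped versions.
        versions = (
            self._rows.setdefault(row, {})
            .setdefault(family, {})
            .setdefault(qualifier, {})
        )
        versions[timestamp] = value

    def get(self, row, family, qualifier, max_versions=1):
        # Return up to max_versions (timestamp, value) pairs, newest first.
        versions = self._rows.get(row, {}).get(family, {}).get(qualifier, {})
        newest_first = sorted(versions.items(), key=lambda kv: kv[0], reverse=True)
        return newest_first[:max_versions]


table_a = ToyTable()
table_a.put("a", "cf1", "bar", 1368394583, 7)
table_a.put("a", "cf1", "bar", 1368394261, "hello")
table_a.put("a", "cf1", "foo", 1368394583, 22)
table_a.put("a", "cf1", "foo", 1368394925, 13.6)
table_a.put("a", "cf1", "foo", 1368393847, "world")

latest_bar = table_a.get("a", "cf1", "bar")
two_newest_foo = table_a.get("a", "cf1", "foo", max_versions=2)
```

A read without a version limit returns only the newest cell, which is why `"bar"` resolves to `7` rather than `"hello"` in this toy model.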

Logical Architecture

[Figure: the sorted rows of Table A (a through p) are split into four contiguous regions, Region 1 through Region 4, and regions from many tables are spread across RegionServers.]

•  Region Server 7: Table A, Region 1; Table A, Region 2; Table G, Region 1070; Table L, Region 25
•  Region Server 86: Table A, Region 3; Table C, Region 30; Table F, Region 160; Table F, Region 776
•  Region Server 367: Table A, Region 4; Table C, Region 17; Table E, Region 52; Table P, Region 1116
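
Because rows are sorted and each region covers a contiguous key range, finding the region for a rowkey is a search over region start keys. The sketch below assumes the four regions of Table A split the keys a through p into equal quarters (the figure does not state the exact split points), and `locate` is a hypothetical helper, not an HBase call.

```python
import bisect

# Assumed split points: "" marks the start of the table; "e", "i", "m" are guesses
# based on sixteen rows (a..p) divided evenly across four regions.
REGION_START_KEYS = ["", "e", "i", "m"]
REGION_NAMES = ["Region 1", "Region 2", "Region 3", "Region 4"]

# Assignment of Table A's regions to servers, as listed in the figure.
REGION_TO_SERVER = {
    "Region 1": "Region Server 7",
    "Region 2": "Region Server 7",
    "Region 3": "Region Server 86",
    "Region 4": "Region Server 367",
}


def locate(rowkey):
    """Return (region, server) for the region whose key range contains rowkey."""
    # The owning region is the last one whose start key is <= rowkey.
    index = bisect.bisect_right(REGION_START_KEYS, rowkey) - 1
    region = REGION_NAMES[index]
    return region, REGION_TO_SERVER[region]
```

For example, `locate("k")` lands in Region 3 under these assumed split points, so a client would send that request to Region Server 86.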

Physical Architecture

[Figure: HBase Client; an HBase layer with Master, ZooKeeper, and RegionServers; an HDFS layer with NameNode and DataNodes. Each RegionServer is drawn alongside a DataNode.]

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host

Given that the underlying data is stored in HDFS, which is available to all clients as a single namespace, all RegionServers have access to the same persisted files in the file system and can therefore host any region (figure 3.8). By physically collocating DataNodes and RegionServers, you can use the data locality property; that is, RegionServers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In some HBase deployments, the MapReduce framework isn’t deployed at all if the workload is primarily random reads and writes. In other deployments, where the MapReduce processing is also a part of the workloads, TaskTrackers, DataNodes, and HBase RegionServers can run together.

MAJOR CHANGES FOR 1.0
What’s all the excitement about?
  
Stability: Co-Locate Meta with Master
•  Simplify, Improve region assignment reliability
    –  Fewer components involved in updating “truth”
•  Master embeds a RegionServer
    –  Will host only system tables
    –  Baby step towards combining RS/Master into a single hbase daemon
•  Backup masters unchanged
    –  Can be configured to host user tables while in standby
•  Plumbing is all there, off by default

http://issues.apache.org/jira/browse/HBASE-10569
  
Availability: Region Replicas
•  Multiple RegionServers host a Region
    –  One is “primary”, others are “replicas”
    –  Only primary accepts writes
•  Client reads against primary only or any
    –  Results marked as appropriate
•  Baby step toward quorum reads, writes

http://issues.apache.org/jira/browse/HBASE-10070
http://www.slideshare.net/HBaseCon/features-session-1
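
The read/write rules on the Region Replicas slide can be illustrated with a toy model: writes go only to the primary, reads may target the primary or a replica, and replica results are flagged so the caller knows they may lag. This is a sketch under stated assumptions. `ToyRegion`, `sync_replicas`, and the `stale` flag are hypothetical names for illustration, and the explicit sync step stands in for whatever mechanism actually propagates data to replicas.

```python
class ToyRegion:
    """One primary copy plus read-only replica copies of the same key range."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]

    def write(self, key, value):
        # Only the primary accepts writes.
        self.primary[key] = value

    def sync_replicas(self):
        # Stand-in for asynchronous propagation from primary to replicas.
        for replica in self.replicas:
            replica.clear()
            replica.update(self.primary)

    def read(self, key, replica_id=None):
        """Return (value, stale). replica_id=None reads from the primary."""
        if replica_id is None:
            return self.primary.get(key), False
        # Replica reads are marked as possibly stale, whether or not they lag.
        return self.replicas[replica_id].get(key), True


region = ToyRegion(replica_count=2)
region.write("row1", "v1")
region.sync_replicas()
region.write("row1", "v2")  # replicas have not caught up yet

primary_result = region.read("row1")
replica_result = region.read("row1", replica_id=0)
```

The trade-off shown is availability against freshness: a replica can answer while the primary is unavailable, but the caller has to accept a result that is marked as possibly out of date.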
  
Usability: Client API Cleanup
•  Improved self-consistency
•  Simpler semantics
•  Easier to maintain
•  Obvious @InterfaceAudience annotations

http://issues.apache.org/jira/browse/HBASE-10602
http://s.apache.org/hbase-1.0-api
https://github.com/ndimiduk/hbase-1.0-api-examples
  
New and Noteworthy
•  Greatly expanded hbase.apache.org/book.html
•  Truncate table shell command
•  Automatic tuning of global MemStore and BlockCache sizes
•  Basic backpressure mechanism
•  BucketCache easier to configure
•  Compressed BlockCache
•  Pluggable replication endpoint
•  A Dockerfile to easily run HBase from source
  
Under the Covers
•  ZooKeeper abstractions
•  Meta table used for assignment
•  Cell-based read/write path
•  Combining mvcc/seqid
•  Sundry security, tags, labels improvements
  
Groundwork for 2.0
•  More, Smaller Regions
    –  Millions, 1G or less
    –  Less write amplification
    –  Splitting hbase:meta
•  Performance
    –  More off-heap
    –  Less resource contention
    –  Faster region failover/recovery
    –  Multiple WALs
    –  QoS/Quotas/Multi-tenancy
•  Rigging
    –  Faster, more intelligent assignment
    –  Procedure bus
    –  Resumable, query-able operations
•  Other possibilities
    –  Quorum/consensus reads, writes?
    –  Hydrabase, multi-DC consensus?
    –  Streaming RPCs?
    –  High level coprocessor API
  
Semantic Versioning
•  Major/Minor/Patch version numbers
    –  Only major/minor pre-1.0
•  Dimensions
    –  Client/Server wire compatibility
    –  Server/Server wire and feature compatibility
    –  API compatibility
    –  ABI compatibility
•  Proposal up for a vote

http://s.apache.org/hbase-semver
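
As a small illustration of the Major/Minor/Patch scheme, the sketch below parses a three-part version string and reports which component changed between two releases. It deliberately does not encode any compatibility promises, since the slide notes the proposal was still up for a vote. `Version`, `parse`, and `bump_kind` are hypothetical helper names.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse(text):
    """Parse a 'major.minor.patch' string such as '1.0.0'."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def bump_kind(old, new):
    """Name the most significant component that differs between two versions."""
    if new.major != old.major:
        return "major"
    if new.minor != old.minor:
        return "minor"
    if new.patch != old.patch:
        return "patch"
    return "none"
```

Because `Version` is a tuple, ordinary comparison works as expected: `parse("1.0.1") < parse("1.1.0")` holds without extra code.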
  
UPGRADE PATH
Tell it to me straight, how bad is it?
  
Online/Wire Compatibility
•  Direct migration from 0.94 supported
    –  Looks a lot like upgrade from 0.94 to 0.96: requires downtime
    –  Not tested yet, will be before release
•  RPC is backward-compatible to 0.96
    –  Enables mixing clients and servers across versions
    –  So long as no new features are enabled
•  Rolling upgrade "out of the box" from 0.98
•  Rolling upgrade "with some massaging" from 0.96
    –  i.e., 0.96 cannot read HFileV3, the new default
    –  Not tested yet, will be before release
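
The upgrade options on this slide amount to a lookup keyed by the release series you are currently running. A minimal sketch, with the wording of each path paraphrased from the slide and `upgrade_path_to_1_0` as a hypothetical helper name:

```python
# Paths to 1.0 as described on the slide, keyed by the current release series.
UPGRADE_PATHS = {
    "0.94": "direct migration, requires downtime",
    "0.96": "rolling upgrade with some massaging",
    "0.98": "rolling upgrade out of the box",
}


def upgrade_path_to_1_0(current_version):
    """Map a version such as '0.98.10' to the upgrade path named on the slide."""
    series = ".".join(current_version.split(".")[:2])
    # Series the slide does not mention are reported as such rather than guessed.
    return UPGRADE_PATHS.get(series, "not covered by the slide")
```

Note that the slide marks the 0.94 and 0.96 paths as not yet tested at the time of the talk, so the table records intent rather than verified behavior.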
  
Client Application Compatibility
•  API is backward compatible to 0.96
    –  No code change required
    –  You’ll start getting new deprecation warnings
    –  We recommend you start using new APIs
•  ABI is NOT backward compatible
    –  Cannot drop current application jars onto new runtime
    –  Recompile your application vs. 1.0 jars
    –  Just like 0.96 to 0.98 upgrade
  
Hadoop Versions
•  Hadoop 1.x is NOT supported
    –  Bite the bullet; you’ll enjoy the performance benefits
•  Hadoop 2.x only
    –  Most thoroughly tested on 2.4.x, 2.5.x
    –  Probably works on 2.2.x, 2.3.x, but less thoroughly tested

https://hbase.apache.org/book/configuration.html#hadoop
  
Java Versions
•  JDK 6 is NOT supported!
•  JDK 7 is the target runtime
•  JDK 8 support is experimental

https://hbase.apache.org/book/configuration.html#hadoop
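
The Hadoop and Java support statements from the two slides above can be turned into a pre-upgrade check. This is only a sketch of the slide content at the time of the talk; the function names are hypothetical, and the HBase reference guide remains the authority on supported versions.

```python
# Hadoop 2.x series named on the Hadoop Versions slide.
HADOOP_2_STATUS = {
    "2.4": "most thoroughly tested",
    "2.5": "most thoroughly tested",
    "2.2": "probably works, less thoroughly tested",
    "2.3": "probably works, less thoroughly tested",
}

# JDK major versions named on the Java Versions slide.
JDK_STATUS = {
    6: "not supported",
    7: "target runtime",
    8: "experimental",
}


def hadoop_status(version):
    """Classify a Hadoop version string such as '2.4.1' for use with HBase 1.0."""
    parts = version.split(".")
    if parts[0] == "1":
        return "not supported"
    return HADOOP_2_STATUS.get(".".join(parts[:2]), "not mentioned on the slide")


def jdk_status(major_version):
    """Classify a JDK major version (6, 7, 8) for use with HBase 1.0."""
    return JDK_STATUS.get(major_version, "not mentioned on the slide")
```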
  
1.0.0 RCs Available Now!
•  Release Candidate voting has commenced
•  Last chance to catch show-stopping bugs

RELEASE CANDIDATES NOT FOR PRODUCTION USE

•  Try out the new features
•  Help us test your upgrade path
•  Be a part of history in the making!
•  1.0.0rc5 available 2015-02-19

http://search-hadoop.com/m/DHED40Ih5n
  
Thanks!

HBase in Action (Manning), by Nick Dimiduk and Amandeep Khurana, foreword by Michael Stack
hbaseinaction.com

Nick Dimiduk
github.com/ndimiduk
@xefyr
n10k.com

http://www.apache.org/dyn/closer.cgi/hbase/
  



Editor's Notes

  • #2: Now with 1000% more Orca!
  • #3: Stable Reliable Performant
  • #14: Improving the distributed system “rigging” Consider enabling in highly volatile environments (like EC2)
  • #18: “paving the way for new features and 2.0”
  • #21: How to get from here to there
  • #27: hbaseugcf (43% off HBase in Action, all formats, valid through Nov 20)