HBase vs. Hive

     Philip Wickline
Chief Technology Officer
         Hadapt
Goals


Brief introduction to the differences between
   transactional/operational and analytical systems



Understand when to use Hive and when to use HBase and why




Databases




Datastores




Differences of Purpose: “Transaction Processing”
Operational systems
• Optimized for small, short, random-access reads and writes
• E.g. record that an employee invested $100 in an S&P 500 index
  fund in his 401(k), *or* record that a user posted something on
  another user's “wall”

Traditional DB examples
• Oracle
• MySQL
NoSQL Examples
• HBase
• MongoDB
• Cassandra
Differences of Purpose: Analytics
Analytics
• Optimized for read-only computations over large amounts of
  data
• E.g. compute the average amount invested in bond funds and
  stock funds for all employees at all employers over the last 5
  years

DB Examples
• Netezza
• Vertica
NoSQL Examples
• Hive
• Pig

[Figure: sample analytics charts; legends include Plan/Actual by month (Oct-Mar) and companies Acme, GM, Newco, Oldco, Bigcorp]
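The two access patterns contrasted above can be sketched in a few lines of Python. This is only an illustration: the record layout, the helper names (`record_contribution`, `average_by_fund_type`), and all the figures are invented here and do not come from the talk. The operational call touches one record; the analytical call reads every record and writes nothing.

```python
from collections import defaultdict

# Hypothetical in-memory "table": one record per 401(k) contribution.
contributions = [
    {"employee": "e1", "employer": "Acme", "fund_type": "stock", "amount": 100.0},
    {"employee": "e2", "employer": "Acme", "fund_type": "bond", "amount": 50.0},
    {"employee": "e3", "employer": "Newco", "fund_type": "stock", "amount": 300.0},
]


def record_contribution(table, employee, employer, fund_type, amount):
    """Operational pattern: one small write that touches a single record."""
    table.append(
        {
            "employee": employee,
            "employer": employer,
            "fund_type": fund_type,
            "amount": amount,
        }
    )


def average_by_fund_type(table):
    """Analytical pattern: a read-only pass over every record."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in table:
        totals[row["fund_type"]] += row["amount"]
        counts[row["fund_type"]] += 1
    return {fund: totals[fund] / counts[fund] for fund in totals}


record_contribution(contributions, "e4", "GM", "bond", 150.0)
averages = average_by_fund_type(contributions)
print(averages)
```

The cost of the first function is independent of table size, while the second grows with it, which is why the two workloads tend to end up on differently designed systems.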
HBase Data Model : Conceptual


From the Bigtable paper:
“a sparse, distributed, persistent multi-dimensional sorted map”

(row : bytestring, column family : bytestring, column : bytestring,
time : int64) -> bytestring
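As a rough sketch of that signature, the map can be modelled in Python as a dictionary keyed by a `(row, column family, column, timestamp)` tuple. The names `CellKey`, `sort_key`, and `scan` are made up for this illustration and are not HBase API calls. The sketch assumes versions of the same cell are ordered newest first, and it re-sorts on every scan, which a real store would avoid by keeping the data sorted on disk.

```python
from typing import Dict, Iterator, Tuple

# (row, column family, column, timestamp) -> value, all but the timestamp as bytes.
CellKey = Tuple[bytes, bytes, bytes, int]

# "Sparse": only cells that actually exist take up an entry.
cells: Dict[CellKey, bytes] = {
    (b"key_1", b"columnfamily_b", b"column_other", 6): b"w",
    (b"key_1", b"columnfamily_a", b"column_i", 4): b"m",
    (b"key_1", b"columnfamily_a", b"column_i", 15): b"y",
}


def sort_key(key: CellKey):
    """Order by row, family, column, then newest timestamp first (assumed)."""
    row, family, column, timestamp = key
    return (row, family, column, -timestamp)


def scan(table: Dict[CellKey, bytes], start_row: bytes, stop_row: bytes) -> Iterator[Tuple[CellKey, bytes]]:
    """Yield cells with start_row <= row < stop_row in sorted order."""
    for key in sorted(table, key=sort_key):
        if start_row <= key[0] < stop_row:
            yield key, table[key]


ordered = list(scan(cells, b"key_1", b"key_2"))
for key, value in ordered:
    print(key, value)
```

Because the keys are sorted, all cells of one row sit next to each other, so a row lookup or a short range scan reads one contiguous run.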




HBase Map
{ "key_1" : {
    "columnfamily_a" : {
      "column_i" : {
        15 : "y",
        4 : "m"
      },
      "column_ii" : {
        15 : "d"
      }
    },
    "columnfamily_b" : {
      "column_other" : {
        6 : "w",
        3 : "o",
        1 : "w"
      }
    }
  }
}
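Reading from a nested map like the one above might look as follows in Python. The `get` helper and its `max_ts` parameter are hypothetical names chosen for this sketch, not the HBase client API; the point is only that a lookup walks row, then family, then column, then picks a version by timestamp.

```python
# The map from the slide, written as a Python dict of dicts.
hbase_map = {
    "key_1": {
        "columnfamily_a": {
            "column_i": {15: "y", 4: "m"},
            "column_ii": {15: "d"},
        },
        "columnfamily_b": {
            "column_other": {6: "w", 3: "o", 1: "w"},
        },
    },
}


def get(table, row, family, column, max_ts=None):
    """Return the newest value at or before max_ts, or None if the cell is absent."""
    versions = table.get(row, {}).get(family, {}).get(column, {})
    candidates = [ts for ts in versions if max_ts is None or ts <= max_ts]
    if not candidates:
        return None
    return versions[max(candidates)]


latest = get(hbase_map, "key_1", "columnfamily_a", "column_i")
older = get(hbase_map, "key_1", "columnfamily_b", "column_other", max_ts=4)
missing = get(hbase_map, "key_2", "columnfamily_a", "column_i")
print(latest, older, missing)
```

Absent rows or columns cost nothing to store and simply return `None`, which is what "sparse" means in practice.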
Hive Data Model : Conceptual
Traditional Relational Tables

CUSTKEY   NAME      ADDRESS          NATIONKEY   PHONE          ACCTBAL      COMMENT
451234    NEWCORP   196 Broadway …   1           111-555-1212   $1,231,285   NULL
887765    ACME      1 Main st. …     2           222-555-1212   $46,945      “Top customer”




HBase Data Model : Physical

Every cell is stored with its row key, column family, column, and timestamp
This allows fast lookups with low copy overhead
BUT
It is space inefficient (optional compression is available) and inefficient
   to scan

      ROW       FAMILY    COLUMN    TIME      VALUE
      “key_1”   “cf_a”    “c_i”     15        “foo”
      “key_1”   “cf_a”    “c_ii”    15        “bar”
      “key_2”   “cf_a”    “c_ii”    4         “baz”
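The space cost of repeating the full key in every cell can be estimated with a small Python sketch. The `to_cells` helper is a name invented here, and the byte counts are a simplification: they count only the characters of the row, family, and column names plus an 8-byte timestamp, and ignore any extra per-cell framing a real file format would add.

```python
def to_cells(table):
    """Flatten {row: {family: {column: {ts: value}}}} into sorted cell tuples."""
    cells = []
    for row, families in table.items():
        for family, columns in families.items():
            for column, versions in columns.items():
                for ts, value in versions.items():
                    cells.append((row, family, column, ts, value))
    # Sort by row, family, column, then newest timestamp first (assumed ordering).
    cells.sort(key=lambda cell: (cell[0], cell[1], cell[2], -cell[3]))
    return cells


table = {
    "key_1": {"cf_a": {"c_i": {15: "foo"}, "c_ii": {15: "bar"}}},
    "key_2": {"cf_a": {"c_ii": {4: "baz"}}},
}
cells = to_cells(table)

TIMESTAMP_BYTES = 8  # int64 timestamp
key_bytes = sum(len(row) + len(family) + len(column) + TIMESTAMP_BYTES
                for row, family, column, _ts, _value in cells)
value_bytes = sum(len(value) for *_key, value in cells)

for cell in cells:
    print(cell)
print("key bytes:", key_bytes, "value bytes:", value_bytes)
```

In this toy data the keys take several times more space than the values, and a scan that only wants the values still has to read past every key, which illustrates both drawbacks listed on the slide.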




Hive Data Model : Physical
Depends on the underlying storage files
Can use flat text files, RCFiles, or even HBase for storage



Standard Row Storage

    C_1        C_2        C_3        C_4
    11         12         13         14
    21         22         23         24
    31         32         33         34
    41         42         43         44
    51         52         53         54



Hive Data Model : RCFile
Break the table into row groups, then store each group column by column

                         Row Group 1
       C_1          11           21           31
       C_2          12           22           32
       C_3          13           23           33
       C_4          14           24           34


                   Row Group 2
       C_1          41           51
       C_2          42           52
       C_3          43           53
       C_4          44           54
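The row-group layout shown above can be reproduced with a short Python sketch. `to_row_groups` and the group size of 3 are choices made for this illustration; it shows only the reordering of values, not the actual RCFile on-disk format or its compression.

```python
def to_row_groups(rows, group_size):
    """Split rows into groups, then transpose each group into columns."""
    groups = []
    for start in range(0, len(rows), group_size):
        group = rows[start:start + group_size]
        # zip(*group) turns a list of rows into a list of columns.
        groups.append([list(column) for column in zip(*group)])
    return groups


# The five rows from the "Standard Row Storage" slide (columns C_1..C_4).
rows = [
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
    [41, 42, 43, 44],
    [51, 52, 53, 54],
]

groups = to_row_groups(rows, group_size=3)

# A query that needs only C_2 reads one column from each row group.
c2_values = [value for group in groups for value in group[1]]

for number, group in enumerate(groups, start=1):
    print("Row Group", number, group)
print("C_2:", c2_values)
```

Keeping a column's values together inside each group lets an analytic query skip the columns it does not need, while every row still lives entirely within one group.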



Informal Performance Comparison


                            Hive                                     HBase
  Insert speed              Batch                                    Fast!
  Update speed              N/A                                      Fast!
  Lookup speed              MapReduce lower bound (10s of seconds)   Fast!
  Data warehouse queries    15x faster on one test                   Uh oh




THANK YOU

H base vs hive srp vs analytics 2-14-2012

Editor's Notes

  • #2: Not about Hadapt. Be inclusive of beginners. Be brief.
  • #3: Not a religious presentation – different systems have different properties that work for different needs
  • #14: 10 GB TPC-H data. CDH3B3 Hive and HBase. Single-node desktop workstation: 4 cores, 8 GB, a few drives.