H base vs hive srp vs analytics 2-14-2012

HBase vs. Hive

Philip Wickline
Chief Technology Officer
Hadapt

Goals

Brief introduction to the differences between
transactional/operational and analytical systems

Understand when to use Hive and when to use HBase and why

Differences of Purpose: “Transaction Processing”
Operational systems
• Optimized for small, short, random-access reads and writes
• E.g. record that an employee invested $100 in an S&P 500 index
fund in his 401(k) *or* record that a user posted something on
another user's “wall”

Traditional DB examples
• Oracle
• MySQL
NoSQL Examples
• HBase
• MongoDB
• Cassandra

Differences of Purpose: Analytics
Analytics
• Optimized for read-only computations about large amounts of
data
• E.g. compute the average amount invested in bond funds and
stock funds for all employees at all employers over the last 5
years

[Charts: sample report graphics. Legible labels: Plan vs. Actual for Oct through Mar; "Option 1"; companies Acme, GM, Newco, Oldco, Bigcorp]

DB Examples
• Netezza
• Vertica
NoSQL Examples
• Hive
• Pig
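
The two purposes can be contrasted with a small Python sketch, using the 401(k) example from the slides. It is only an illustration under stated assumptions: the record layout and the function names (record_investment, investments_for, average_by_fund_type) are hypothetical, and the in-memory lists stand in for a real store.

```python
from collections import defaultdict

# Hypothetical records: (employee_id, fund_type, amount_invested)
investments = []
by_employee = defaultdict(list)  # index that makes keyed reads cheap


def record_investment(employee_id, fund_type, amount):
    """Operational-style write: touches one small record."""
    record = (employee_id, fund_type, amount)
    investments.append(record)
    by_employee[employee_id].append(record)


def investments_for(employee_id):
    """Operational-style read: a single keyed lookup."""
    return by_employee.get(employee_id, [])


def average_by_fund_type(records):
    """Analytical-style read: scans every record and aggregates."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _employee_id, fund_type, amount in records:
        totals[fund_type] += amount
        counts[fund_type] += 1
    return {fund: totals[fund] / counts[fund] for fund in totals}
```

The first two functions each touch a handful of records no matter how large the data set grows; the third must read all of it, which is the workload analytical systems are built around.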

HBase Data Model : Conceptual

From the Bigtable paper:
“a sparse, distributed, persistent multi-dimensional sorted map”

(row : bytestring, column family : bytestring, column : bytestring,
time : int64) -> bytestring
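
A minimal Python sketch of this map may make the signature concrete. It is a toy in-memory model, not HBase's API: the class SparseSortedMap and its put/get/scan methods are hypothetical names chosen for illustration, and persistence and distribution are left out entirely.

```python
from collections import defaultdict


class SparseSortedMap:
    """Toy model of (row, family, column, time) -> value."""

    def __init__(self):
        # row -> family -> column -> {timestamp: value}; absent cells cost nothing
        self._rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row, family, column, timestamp, value):
        self._rows[row][family][column][timestamp] = value

    def get(self, row, family, column, timestamp=None):
        """Newest value at or before `timestamp` (newest overall if None)."""
        versions = self._rows.get(row, {}).get(family, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self, start_row=b"", stop_row=None):
        """Yield cells in sorted key order, newest version first."""
        for row in sorted(self._rows):
            if row < start_row:
                continue
            if stop_row is not None and row >= stop_row:
                break
            for family in sorted(self._rows[row]):
                for column in sorted(self._rows[row][family]):
                    versions = self._rows[row][family][column]
                    for timestamp in sorted(versions, reverse=True):
                        yield row, family, column, timestamp, versions[timestamp]


table = SparseSortedMap()
table.put(b"key_1", b"columnfamily_a", b"column_i", 15, b"y")
table.put(b"key_1", b"columnfamily_a", b"column_i", 4, b"m")
latest = table.get(b"key_1", b"columnfamily_a", b"column_i")
```

"Sparse" shows up as the absence of any entry for unset cells, "sorted" as the ordered iteration in scan, and "multi-dimensional" as the four-part key.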

HBase Map
{ "key_1" : {
    "columnfamily_a" : {
      "column_i" : {
        15 : "y",
        4 : "m"
      },
      "column_ii" : {
        15 : "d"
      }},
    "columnfamily_b" : {
      "column_other" : {
        6 : "w",
        3 : "o",
        1 : "w"
      }}}}

Hive Data Model : Conceptual
Traditional Relational Tables

CUSTKEY  NAME     ADDRESS         NATIONKEY  PHONE         ACCTBAL     COMMENT
451234   NEWCORP  196 Broadway …  1          111-555-1212  $1,231,285  NULL
887765   ACME     1 Main st. …    2          222-555-1212  $46,945     “Top customer”

HBase Data Model : Physical

Every cell stored with row, family, column and timestamp
Allows fast lookup with low copy overhead
BUT
Space inefficient (optional compression available) and inefficient
to scan

“key_1” “cf_a” “c_i” 15 “foo”
“key_1” “cf_a” “c_ii” 15 “bar”
“key_2” “cf_a” “c_ii” 4 “baz”
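
The trade-off above can be sketched in Python using the three cells from the slide. This is a simplified illustration, not HBase's on-disk format: the flat list, the lookup helper, and the key_overhead measure are all hypothetical, and strings stand in for byte arrays.

```python
import bisect

# Each cell carries its full coordinates: (row, family, column, timestamp, value).
cells = [
    ("key_1", "cf_a", "c_i", 15, "foo"),
    ("key_1", "cf_a", "c_ii", 15, "bar"),
    ("key_2", "cf_a", "c_ii", 4, "baz"),
]


def sort_key(cell):
    row, family, column, timestamp, _value = cell
    # Negate the timestamp so newer versions sort first within a column.
    return (row, family, column, -timestamp)


cells.sort(key=sort_key)
keys = [sort_key(cell) for cell in cells]


def lookup(row, family, column):
    """Binary-search for the newest version of one cell; None if absent."""
    # -inf sorts before every real version of the requested column.
    i = bisect.bisect_left(keys, (row, family, column, float("-inf")))
    if i < len(keys) and keys[i][:3] == (row, family, column):
        return cells[i][4]
    return None


def key_overhead(stored_cells):
    """Fraction of stored characters spent on repeated keys, not values."""
    key_chars = sum(len(r) + len(f) + len(c) for r, f, c, _t, _v in stored_cells)
    value_chars = sum(len(v) for *_key, v in stored_cells)
    return key_chars / (key_chars + value_chars)
```

Because every cell is self-describing and the list is sorted, a lookup is a binary search. The cost is that the row, family and column names are repeated for every cell, which key_overhead makes visible, and reading one column for all rows still means walking past every other cell.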

Hive Data Model : Physical
Depends on the underlying storage files
Can use flat text files, RCFiles, or even HBase for storage

Standard Row Storage

C_1 C_2 C_3 C_4
11 12 13 14
21 22 23 24
31 32 33 34
41 42 43 44
51 52 53 54

Hive Data Model : RCFile
Break the table into row groups, then store each group column by column

Row Group 1
C_1 11 21 31
C_2 12 22 32
C_3 13 23 33
C_4 14 24 34

Row Group 2
C_1 41 51
C_2 42 52
C_3 43 53
C_4 44 54
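
The layout above can be reproduced with a short Python sketch. It only illustrates the idea of transposing each row group; it is not the RCFile format itself. The function names and the rows_per_group parameter are hypothetical, with the group size of 3 picked to match the slide.

```python
def to_row_groups(rows, rows_per_group):
    """Split rows into groups, then transpose each group into per-column lists."""
    groups = []
    for start in range(0, len(rows), rows_per_group):
        group = rows[start:start + rows_per_group]
        # zip(*group) turns a list of rows into a sequence of columns.
        groups.append([list(column) for column in zip(*group)])
    return groups


def read_column(groups, column_index):
    """Read one column across all groups without touching the other columns."""
    return [value for group in groups for value in group[column_index]]


rows = [
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
    [41, 42, 43, 44],
    [51, 52, 53, 54],
]
groups = to_row_groups(rows, rows_per_group=3)
c_3 = read_column(groups, 2)
```

A query that needs only C_3 reads one list per row group and skips the rest, which is why this layout suits analytical scans better than plain row storage.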

Informal Performance Comparison

                        Hive                             HBase
Insert speed            Batch                            Fast!
Update speed            NA                               Fast!
Lookup speed            MR lower bound (10s of seconds)  Fast!
Data warehouse queries  15x faster on one test           Uh oh
