HBase at Flurry

 History
 Stats
 How We Store Data
 Challenges
 Mistakes We Made
 Tips / Patterns
 Future
 Moral of the Story

 2008 – Flurry Analytics for Mobile Apps
 Sharded MySQL, or
 HBase!
 Launched on 0.18.1 with a 3 node cluster
 Great community
 Now running 0.94.5 (+ patches)
 2 data centers with 2 clusters each
 Bidirectional replication

 1000 slave nodes per cluster
 32 GB RAM, 4 drives (1 or 2 TB), 1 GigE, dual quad-core * 2 HT = 16 procs
 DataNode, TaskTracker, RegionServer (11 GB), 5 Mappers, 2 Reducers
 ~30 tables, 250k regions, 430TB (after LZO)
 2 big tables are about 90% of that
▪ 1 wide table: 3 CF, 4 billion rows, up to 1MM cells per row
▪ 1 tall table: 1 CF, 1 trillion rows, most 1 cell per row

 12 physical nodes
 5 region servers with 20GB heaps on each
 1 table - 8 billion small rows - 500GB (LZO)
 All in block cache (after 20 minute warmup)
 100k-1MM QPS - 99.9% Reads
 2ms mean, 99% <10ms
 25 ms GC pause every 40 seconds
 slow after compaction

 DAO for Java apps
 Requires:
▪ writeRowIndex / readRowIndex
▪ readKeyValue / writeRowContents
 Provides:
▪ save / delete
▪ streamEntities / pagination
▪ MR input formats on entities (rather than Result)
 Uses HTable or asynchbase
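A minimal sketch of what such a DAO contract might look like. Only the hook names (writeRowIndex, readRowIndex, readKeyValue, writeRowContents) and the provided operations (save, delete, streamEntities) come from the slide; every signature, the AppEvent entity, and the in-memory TreeMap standing in for an HBase table are assumptions made for illustration, not Flurry's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical base class: signatures are guesses, the "table" is in memory.
abstract class EntityDao<E> {
    // Stand-in for an HBase table: sorted row key -> (qualifier -> value).
    private final TreeMap<String, Map<String, String>> table = new TreeMap<>();

    // Hooks each entity type supplies (the "Requires" list).
    protected abstract String writeRowIndex(E entity);                  // entity -> row key
    protected abstract void writeRowContents(E entity, Map<String, String> cells); // entity -> cells
    protected abstract E readRowIndex(String rowKey);                   // row key -> entity with key fields set
    protected abstract void readKeyValue(E entity, String qualifier, String value); // one cell -> one field

    // Operations the DAO offers in return (the "Provides" list).
    public void save(E entity) {
        Map<String, String> cells = new TreeMap<>();
        writeRowContents(entity, cells);
        table.put(writeRowIndex(entity), cells);
    }

    public void delete(E entity) {
        table.remove(writeRowIndex(entity));
    }

    // Pagination: up to `limit` entities whose row key sorts strictly after
    // `afterKey` (null means start from the first row).
    public List<E> streamEntities(String afterKey, int limit) {
        Map<String, Map<String, String>> rows =
            (afterKey == null) ? table : table.tailMap(afterKey, false);
        List<E> page = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            if (page.size() >= limit) break;
            E entity = readRowIndex(row.getKey());
            for (Map.Entry<String, String> cell : row.getValue().entrySet()) {
                readKeyValue(entity, cell.getKey(), cell.getValue());
            }
            page.add(entity);
        }
        return page;
    }
}

// Hypothetical entity and its DAO, for illustration only.
class AppEvent {
    String appId;
    String day;
    long sessions;
}

class AppEventDao extends EntityDao<AppEvent> {
    protected String writeRowIndex(AppEvent e) { return e.appId + "|" + e.day; }

    protected void writeRowContents(AppEvent e, Map<String, String> cells) {
        cells.put("sessions", Long.toString(e.sessions));
    }

    protected AppEvent readRowIndex(String rowKey) {
        String[] parts = rowKey.split("\\|");
        AppEvent e = new AppEvent();
        e.appId = parts[0];
        e.day = parts[1];
        return e;
    }

    protected void readKeyValue(AppEvent e, String qualifier, String value) {
        if (qualifier.equals("sessions")) e.sessions = Long.parseLong(value);
    }
}
```

Keeping key encoding (row index) separate from cell encoding (row contents) is what would let one DAO support more than one row key format.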

 Change row key format
 DAO supports both formats
1. Create new table
2. Writes to both
3. Migrate existing
4. Validate
5. Reads to new table
6. Write to (only) new table
7. Drop old table
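The seven steps above can be sketched as a small phase switch. This is an illustrative model only: the KeyMigration class, its Phase enum, and the in-memory maps standing in for the two tables are all hypothetical, and the sketch ignores deletes and assumes the key conversion never maps two old keys to the same new key.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical model of a row key format migration with dual writes.
class KeyMigration {
    enum Phase { OLD_ONLY, DUAL_WRITE, READ_NEW, NEW_ONLY }

    final Map<String, String> oldTable = new TreeMap<>();
    final Map<String, String> newTable = new TreeMap<>(); // step 1: create new table
    final Function<String, String> toNewKey;              // old key format -> new key format
    Phase phase = Phase.OLD_ONLY;

    KeyMigration(Function<String, String> toNewKey) {
        this.toNewKey = toNewKey;
    }

    // Steps 2 and 6: which tables receive a write depends on the phase.
    void write(String oldKey, String value) {
        if (phase != Phase.NEW_ONLY) oldTable.put(oldKey, value);
        if (phase != Phase.OLD_ONLY) newTable.put(toNewKey.apply(oldKey), value);
    }

    // Step 5: reads move to the new table once it has been validated.
    String read(String oldKey) {
        boolean useNew = phase == Phase.READ_NEW || phase == Phase.NEW_ONLY;
        return useNew ? newTable.get(toNewKey.apply(oldKey)) : oldTable.get(oldKey);
    }

    // Step 3: backfill rows written before dual writes began. putIfAbsent
    // keeps any row that a dual write has already placed in the new table.
    void migrateExisting() {
        for (Map.Entry<String, String> row : oldTable.entrySet()) {
            newTable.putIfAbsent(toNewKey.apply(row.getKey()), row.getValue());
        }
    }

    // Step 4: every old row must exist under its new key with the same value.
    boolean validate() {
        if (oldTable.size() != newTable.size()) return false;
        for (Map.Entry<String, String> row : oldTable.entrySet()) {
            String migrated = newTable.get(toNewKey.apply(row.getKey()));
            if (!row.getValue().equals(migrated)) return false;
        }
        return true;
    }

    // Step 7: drop the old table once nothing reads or writes it.
    void dropOld() {
        oldTable.clear();
    }
}
```

Because writes go to both tables before the backfill starts, the new table cannot fall behind while existing rows are copied, and reads only switch after validation passes.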

 Bottlenecks (not horizontally scalable)
 HMaster (e.g. HLog cleaning falls behind creation [HBASE-9208])
 NameNode
▪ Disable table / shutdown => many HDFS files at once
▪ Scan table directory => slow region assignments
 ZooKeeper (HBase replication)
 JobTracker (heap)
 META region

 Too many regions (250k)
 Max size 256M -> 1 GB -> 5 GB
 Slow reassignments on failure
 Slow hbck recovery
 Lots of META queries / big client cache
▪ Soft refs can exacerbate
 Slow rolling restarts
 More failures (Common and otherwise)
 Zombie RS

 Latency long tail
 HTable Flush write buffer
 GC pauses
 RegionServer failure
 (See The Tail at Scale – Jeff Dean, Luiz André Barroso)

 Shared cluster for MapReduce and live queries
 IO bound requests hog handler threads
 Even cached reads get slow
 RegionServer falls behind, stays behind
 If the cluster goes down, it takes a while to come back

 HDFS-5042 Completed files lost after power failure
 ZOOKEEPER-1277 servers stop serving when lower 32 bits of zxid roll over
 ZOOKEEPER-1731 Unsynchronized access to ServerCnxnFactory.connectionBeans results in deadlock

 Small region size -> many regions
 Nagle's algorithm
 Trying to solve a crisis you don't understand (hbck fixSplitParents)
 Setting up replication
 Custom backup / restore
 CopyTable OOM
 Verification

 Compact data matters (even with compression)
 Block cache, network not compressed
 Avoid random reads on non cached tables (duh!)
 Write cell fragments, combine at read time to avoid doing random reads
 compact later - coprocessor?
 can lead to large rows
▪ probabilistic counter
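A rough sketch of the fragment pattern, assuming the value being maintained is a counter. The FragmentedCounter class and its methods are hypothetical, an in-memory map stands in for the table, and the compaction step is done inline here rather than in a coprocessor. Each increment is a blind write of a new cell, so the write path never has to read the row first.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: one row holds many small delta cells ("fragments").
class FragmentedCounter {
    // row key -> (fragment id -> delta). The fragment id plays the role of
    // a unique column qualifier or timestamp.
    private final Map<String, TreeMap<Long, Long>> rows = new TreeMap<>();
    private long nextFragmentId = 0;

    // Write path: append a fragment without reading the current value.
    void increment(String rowKey, long delta) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(nextFragmentId++, delta);
    }

    // Read path: combine all fragments of the row.
    long read(String rowKey) {
        long total = 0;
        TreeMap<Long, Long> fragments = rows.get(rowKey);
        if (fragments != null) {
            for (long delta : fragments.values()) total += delta;
        }
        return total;
    }

    // Number of cells currently stored for the row (grows with each write).
    int fragmentCount(String rowKey) {
        TreeMap<Long, Long> fragments = rows.get(rowKey);
        return fragments == null ? 0 : fragments.size();
    }

    // Later compaction: replace the fragments with a single combined cell
    // so the row does not grow without bound.
    void compact(String rowKey) {
        TreeMap<Long, Long> fragments = rows.get(rowKey);
        if (fragments == null || fragments.size() < 2) return;
        long total = read(rowKey);
        fragments.clear();
        fragments.put(nextFragmentId++, total);
    }
}
```

The trade-off is visible in fragmentCount: writes stay cheap, but rows keep growing until something compacts them.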

 HDFS HA
 Snapshots (see how it works with 100k regions on 1000 servers)
 2000 node clusters
 test those bottlenecks
 larger regions, larger HDFS blocks, larger HLogs
 More (independent) clusters
 Load aware balancing?
 Separate RPC priorities for workloads
 0.96

 Scaled 1000x and more on the same DB
 If you're on the edge you need to understand your system
 Monitor
 Open Source
 Load test
 Know your load
 Disk or Cache (or SSDs?)
