Optimizing HBase for Cloud Storage in Microsoft Azure HDInsight
Nitin Verma, Pravin Mittal, Maxim Lukiyanov
May 24th 2016, HBaseCon 2016
About Us
Nitin Verma
Senior Software Development Engineer – Microsoft, Big Data Platform
Contact: nitinver@microsoft
Pravin Mittal
Principal Software Engineering Manager – Microsoft, Big Data
Contact: pravinm@microsoft
Maxim Lukiyanov
Senior Program Manager – Microsoft, Big Data Platform
Contact: maxluk@microsoft
Outline
 Overview of HBase Service in HDInsight
 Customer Case Study
 Performance Debugging
 Key Takeaways
What is HDInsight HBase Service
 On-demand clusters with a few clicks
 Out of the box performance
 Supports both Linux & Windows
 Enterprise SLA of 99.9% availability
 Active Health Monitoring via
Telemetry
 24/7 Customer Support
Unique Features
 Storage is decoupled from compute
 Flexibility to scale-out and scale-in
 Write/read an unlimited amount of data, irrespective of cluster size
 Data is preserved and accessible even when the cluster is down or deleted
Azure Data Lake Storage: Built For Cloud
Maxim Lukiyanov, Ashit Gosalia
Secure: Must be highly secure to prevent unauthorized access (especially as all data is in one place).
Native format: Must permit data to be stored in its ‘native format’ to track lineage and for data provenance.
Low latency: Must have low latency for high-frequency operations.
Multiple analytic frameworks: Must support multiple analytic frameworks (batch, real-time, streaming, machine learning, etc.); no one analytic framework can work for all data and all types of analysis.
Details: Must be able to store data with all details; aggregation may lead to loss of details.
Throughput: Must have high throughput for massively parallel processing via frameworks such as Hadoop and Spark.
Reliable: Must be highly available and reliable (no permanent loss of data).
Scalable: Must be highly scalable. When storing all data indefinitely, data volumes can quickly add up.
All sources: Must be able to ingest data from a variety of sources: LOB/ERP, logs, devices, social networks, etc.
Customer Case Study and Performance
Optimization
Microsoft’s Real Time Analytics Platform
 Modern self-service telemetry
platform
 Near real-time analytics
 Product health and user engagement
monitoring with custom dashboards
 Performs large-scale indexing on HDInsight HBase
4.01 million
EVENTS PER SECOND AT PEAK
12.8 petabytes
INGESTION PER MONTH
>500 million
WEEKLY UNIQUE DEVICES AND MACHINES
450 + 2600
PRODUCTION + INT/SANDBOX
SELF-SERVE TENANTS
__________________________________________
1,600
STORAGE ACCOUNTS
500,000
AZURE STORAGE TRANSACTIONS / SEC
[Chart: Azure Storage traffic in TB ingress/hr, Feb-21 through Feb-23]
[Chart: Azure Storage transactions in millions/hr by service (Table, Blob, Queue), Feb-21 through Feb-23]
Results of Early HBase Evaluation
 The customer had very high throughput requirements for a key-value store
 Performance was ~10X lower than their requirement
 Bigger concern: Throughput didn’t scale from 15 -> 30 nodes
Developing a Strategy
Iterative process: understand the architecture -> run the workload -> collect metrics & profile -> identify performance bottlenecks -> reproduce at lower scale -> isolate/divide the problem (unit test) -> profile relevant components -> make performance fixes -> fixed? If not, repeat.
 Automation can save time
Pipeline of data ingestion (30-node configuration):
 Data Ingestion Client App [PaaS] on 300 VMs, writing to multiple storage accounts and queues
 HDI Gateway & Load Balancer (2 VMs)
 HBase cluster: REST servers and region servers on 30 x large worker nodes (1000+ cores)
 Cloud Storage: medium latency, high bandwidth
 REST requests: batch size = 1000, row size = 1400 bytes
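For context, a batched insert through the HBase REST gateway bundles many rows into a single request body. The sketch below is only a rough illustration of that shape, not the customer's actual payload: the table name, column family, and the base64-encoded keys/values are made up, and the row key in the URI is a dummy placeholder that the REST server conventionally ignores when the body carries multiple rows.

curl -X PUT "http://<gateway>/mytable/false-row-key" \
  -H "Content-Type: application/json" \
  -d '{"Row":[
        {"key":"cm93MQ==","Cell":[{"column":"Y2Y6Y29s","$":"dmFsdWUx"}]},
        {"key":"cm93Mg==","Cell":[{"column":"Y2Y6Y29s","$":"dmFsdWUy"}]}
      ]}'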
Initial Iterations
1. High CPU utilization with REST being top consumer
2. GZ compression was turned ON in REST
3. Heavy GC activity on REST and REGION processes
4. Starvation of REGION process by REST [busy wait for network IO]
 Throughput improved by 10-30% after each iteration
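The deck doesn't list the exact GC settings that were tuned. As an illustration only, JVM options for the REST and region server processes are typically set in hbase-env.sh along the following lines; the heap sizes and flags below are placeholders, not the values used in this engagement.

# hbase-env.sh (illustrative values only)
export HBASE_REST_OPTS="$HBASE_REST_OPTS -Xms4g -Xmx4g \
  -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
  -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps"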
Initial Iterations (contd.)
5. REST server threads waiting on network IO
Collected tcpdump traces on all the nodes of the cluster
Insight from tcpdump (diagram: one batch fanned out from the REST server to region servers 1 through 30, with the slowest region server gating the request):
 REST server was fanning out each batch request to all the region servers
 The slowest region server governed the overall throughput
 Used a SALT_BUCKET scheme to improve locality
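The deck doesn't spell out the salting scheme, so the Java sketch below is purely illustrative: it prefixes each row key with a one-byte salt so that, depending on what the salt is derived from (here a hypothetical partition/batch id), the rows of one batch can be steered to a small number of regions instead of all of them.

// Illustrative only: salt prefix derived from a hypothetical partition/batch id.
public final class SaltedKeys {
    static final int SALT_BUCKETS = 30; // assumption: roughly one bucket per region server

    static byte[] saltedKey(int partitionId, byte[] rowKey) {
        byte salt = (byte) ((partitionId & 0x7fffffff) % SALT_BUCKETS);
        byte[] salted = new byte[rowKey.length + 1];
        salted[0] = salt;                                  // bucket prefix keeps related rows together
        System.arraycopy(rowKey, 0, salted, 1, rowKey.length);
        return salted;
    }
}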
Improvement
 Throughput improved by 2.75X
 Measurement window = ~72 hours
 Avg. Cluster CPU utilization = ~60%
 But no scaling from the 30-node to the 60-node cluster
 Time to get back to the architecture
Pipeline of data ingestion (scaled out to 60 nodes):
 Data Ingestion Client App [PaaS] on 300 VMs, writing to multiple storage accounts and queues
 HDI Gateway & Load Balancer (2 VMs)
 HBase cluster: REST servers and region servers on 60 x large worker nodes (1000+ cores), backed by WASB
 Cloud storage: medium latency, high bandwidth
 REST requests: batch size = 1000, row size = 1400 bytes
Could the gateway be a bottleneck at such a high ingestion rate?
We Had a Gateway Bottleneck
And the guess was right!!
 Collected perfmon data on the gateway nodes
 Core #0 was 100% busy
 RSS (receive-side scaling) helps balance DPCs across cores
 Performance improved, but not significantly
 Both CPU and networking were bottlenecks
 Time to scale up the gateway VM size
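For reference, on a Windows gateway node RSS can be inspected and enabled with the built-in NetAdapter cmdlets. This is a generic illustration, not the exact steps used here; the adapter name is a placeholder.

Get-NetAdapterRss                        # inspect current RSS settings per adapter
Enable-NetAdapterRss -Name "Ethernet"    # "Ethernet" is a placeholder adapter name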
Configuring a Private Gateway
 We provisioned a custom gateway on large VMs using NGINX
 We confirmed that the gateway issue was indeed fixed
 The throughput problem was still not solved and continued to give us new puzzles
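The deck doesn't include the NGINX configuration; a minimal sketch of fronting the REST servers with NGINX as a load balancer could look like the fragment below. Host names and ports are placeholders (HBase REST listens on 8080 by default).

# nginx.conf fragment (illustrative only)
upstream hbase_rest {
    server rest-node-1:8080;
    server rest-node-2:8080;
    # ... one entry per REST server
}
server {
    listen 80;
    location / {
        proxy_pass http://hbase_rest;
    }
}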
Pipeline of data ingestion (custom gateway in place):
 Data Ingestion Client App [PaaS] on 300 VMs, writing to multiple storage accounts and queues
 NGINX programmed as the gateway and load balancer (2 VMs)
 HBase cluster: REST servers and region servers on 60 x D14 worker nodes (1040 cores), backed by WASB
Could the customer app be a bottleneck?
Pipeline of data ingestion (experiment: HBase disconnected at the gateway):
 Same topology as above, but the NGINX gateway was configured to return 200 directly, without forwarding requests to the HBase cluster
 This isolated the client-to-gateway path from the REST -> Region -> WASB path
New Strategy
 We divided the data pipeline into two parts and debugged them in isolation:
1) Client -> Gateway [solved]
2) REST -> Region -> WASB [unsolved]
 For fast turnaround, we decided to use YCSB for debugging #2
 We configured YCSB with the characteristics of the customer's workload
 We ran YCSB locally inside the HBase cluster
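The exact YCSB parameters aren't given in the deck. A representative load command shaped like the customer's workload (~1400-byte rows) might look like the following; the table name, binding version, thread count, and record count are all placeholders, not the values used here.

# illustrative YCSB run (values are placeholders); hbase-site.xml must be on the YCSB classpath
bin/ycsb load hbase10 -P workloads/workloada \
  -p table=usertable -p columnfamily=cf \
  -p recordcount=100000000 \
  -p fieldcount=10 -p fieldlength=140 \
  -threads 100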
YCSB Experiments
 We suspected one of the following two:
1) REST
2) Azure Storage
 We isolated the problem by replacing Azure Storage with local SSDs
 We then compared the performance of REST v/s RPC
 Results:
 REST was clearly a bottleneck!
YCSB Experiments (contd.)
 Root cause of bottleneck in REST:
• Profiling the REST Servers uncovered multiple threads that were blocked on
INFO/DEBUG logging.
• Limiting the logging to WARNING/ERROR level dramatically improved the REST server performance and brought it very close to RPC (see the illustrative log4j snippet after the sample stack below).
Sample Stack:
Thread 11540: (state = BLOCKED)
- org.apache.log4j.Category.callAppenders(org.apache.log4j.spi.LoggingEvent) @bci=12, line=204 (Compiled frame)
- org.apache.log4j.Category.forcedLog(java.lang.String, org.apache.log4j.Priority, java.lang.Object, java.lang.Throwable) @bci=14, line=391 (Compiled frame)
- org.apache.log4j.Category.log(java.lang.String, org.apache.log4j.Priority, java.lang.Object, java.lang.Throwable) @bci=34, line=856 (Compiled frame)
- org.apache.commons.logging.impl.Log4JLogger.debug(java.lang.Object) @bci=12, line=155 (Compiled frame)
- org.apache.hadoop.hbase.rest.RowResource.update(org.apache.hadoop.hbase.rest.model.CellSetModel, boolean) @bci=580, line=225 (Compiled frame)
- org.apache.hadoop.hbase.rest.RowResource.put(org.apache.hadoop.hbase.rest.model.CellSetModel, javax.ws.rs.core.UriInfo) @bci=60, line=318 (Interpreted frame)
- sun.reflect.GeneratedMethodAccessor27.invoke(java.lang.Object, java.lang.Object[]) @bci=48 (Interpreted frame)
- sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=43 (Compiled frame)
- java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=57, line=606 (Compiled frame)
- com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(java.lang.reflect.Method, java.lang.Object, java.lang.Object[]) @bci=3, line=60 (Interpreted frame)
- com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(java.lang.Object, com.sun.jersey.api.core.HttpContext)
@bci=16, line=205 (Interpreted frame)
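As an illustration, the logging level is controlled through log4j.properties on the REST servers; a minimal sketch of capping it at WARN (exact logger names and appenders will vary by deployment) is:

# log4j.properties (illustrative)
log4j.rootLogger=WARN,console
log4j.logger.org.apache.hadoop.hbase=WARN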
YCSB Experiments (contd.)
 RPC vs. REST after fixing INFO message logging
 We could saturate the SSD performance at 160K requests/sec
throughput
 This confirmed that the bottleneck in REST server was solved
Back to Customer Workload
 After limiting the logging level to WARN, throughput improved further by ~5.5X
 This was a ~15X gain from the point where we started
 The customer is happy and uses the HDInsight HBase service in production
 They are able to meet their throughput goals with enough margin to scale further
Tools Utilized
Category | Tools on Windows | Tools on Linux (Java processes)
System counters (CPU, memory, IO, process, etc.) | Perfmon | mpstat, iostat, vmstat, sar, nload, glances
Networking | tcpdump | tcpdump
CPU profiling | kernrate, F1 sample, xperf | YourKit, jvmtop, jprof
CPU blocking issues | xperf, Concurrency Visualizer, PPA | jstack
Debugging large clusters | PowerShell, Python expect | bash, awk, Python, screen, expect
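For example, the blocked-thread stacks shown earlier are the kind of output jstack produces; a typical (illustrative) way to grab one from the REST server JVM is shown below, where the process-name pattern is an assumption about how the REST server was launched.

# capture a thread dump of the HBase REST server
jstack -l $(pgrep -f org.apache.hadoop.hbase.rest.RESTServer) > rest-threads.txt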
New Performance Features in HBase
Overcoming Storage Latency
 HBase now has MultiWAL and BucketCache features
 Both are designed to minimize the impact of high storage latency
 Parallelism and batching are the keys to hiding write latency (MultiWAL)
 MultiWAL gives higher throughput with a smaller number of region nodes (illustrative configuration below)
 We achieve 500K inserts/sec with just 8 small region nodes for an IoT customer
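The deck doesn't show the settings used; enabling MultiWAL is done in hbase-site.xml roughly as follows, where the number of WAL groups is a placeholder to tune per workload.

<!-- hbase-site.xml (illustrative) -->
<property>
  <name>hbase.wal.provider</name>
  <value>multiwal</value>
</property>
<property>
  <name>hbase.wal.regiongrouping.numgroups</name>
  <value>4</value> <!-- placeholder; tune per region server -->
</property>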
Overcoming Storage Latency (contd.)
 What about read latency?
 Caching and read-ahead are the keys to overcoming read latency
 Cache-on-write helps applications that are temporal in nature
 HDInsight VMs are backed by local SSDs
 The BucketCache feature can use the SSD as an L2 cache (illustrative configuration below)
 BucketCache gives our customers a ~20X-30X gain in read performance
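Again as an illustration only, an SSD-backed BucketCache is configured in hbase-site.xml along these lines; the file path and cache size are placeholders.

<!-- hbase-site.xml (illustrative) -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>file:/mnt/ssd/bucketcache.data</value> <!-- placeholder SSD path -->
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>8192</value> <!-- placeholder size in MB -->
</property>
<property>
  <name>hbase.rs.cacheblocksonwrite</name>
  <value>true</value> <!-- cache-on-write, as mentioned above -->
</property>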
Conclusion
 The performance issue was quite complex, with bottlenecks hiding at several layers and components of the pipeline
 Deeper engagement with customers helped in optimizing the HDInsight HBase service
 The HDI team has been actively productizing performance fixes
 ADLS, MultiWAL and BucketCache help in minimizing the latency impact
Thank You!
Editor's Notes
  • #13: Understand the architecture and overall pipeline of data movement. Monitor the resource utilization of each layer in the pipeline. Profile the components with high resource utilization and identify hotspots. When resource utilization is low, identify blocking issues (if any). Divide and conquer: develop a strategy to isolate the components that could be the culprit; isolation makes debugging easier. Iterative process!!
  • #15: Reproduced the customer scenario with 30 worker nodes. Collected system metrics (CPU, memory, IO, etc.) on all the worker nodes. Started our analysis with HBase: CPU consumption was very high on nearly all REST servers. We then profiled the REST servers and observed the following: compression was ON by default (GZ filter) and was consuming ~70% CPU; heavy GC activity on the REST and region servers, so we had to tune certain GC-related parameters; and the REST server was busy-waiting for network IO, which bumping the region server priority solved. Tools like YourKit and JVMTop helped in uncovering efficiency issues.
  • #16: We noticed multiple threads in the REST server waiting on network IO. We performed a deep networking analysis using tcpdump and uncovered a locality issue with the row key: the REST server was fanning out each batch request to almost all the region servers, and overall throughput seemed to be governed by the slowest region server. We used a SALT_BUCKET scheme to improve the locality of batch requests.
  • #19: At this high ingestion rate, we suspected the HDI gateway was a bottleneck and confirmed it by collecting perfmon data on both gateways. Core #0 was ~100% busy on both gateway nodes. Fixing RSS helped, but we started hitting network throttling: the network utilization on the gateway nodes (A2 instances) surpassed the Azure throttling limit.
  • #21: The custom gateway gave us the ability to debug ingestion bottlenecks coming from the customer app. Using custom gateway rules, we could directly return success without sending data to the HBase cluster. We identified a few occasions where the client app wasn't sending enough load to HBase. After fixing scalability issues in the client application, it was able to send ~2 Gbps of data to the gateway nodes, but we couldn't push 2 Gbps of data into the HBase cluster; the next bottleneck was clearly in HBase.
  • #22: The custom GW gave us ability to debug the ingestion bottlenecks from customer app From custom GW rules, we could directly return success without sending data to HBase cluster We identified scalability bottlenecks in the client app and fixed them with customer’s help The client application, was now able to send ~15X data to GW nodes But we couldn’t push that much into HBase cluster The next bottleneck was clearly in HBase