THE COMMUNITY EVENT FOR
APACHE HBASE™
BDS: A data synchronization
platform for HBase
熊嘉男 (侧田)
Head of data pipelines, Ali-HBase
Requirements
• Can HBase migrate across versions without downtime?
• Can HBase back up data to OSS or other storage?
• Can HBase replicate incremental data to MQ, ES, or Solr?
• Can incremental data be replicated from RDS to HBase?
• Can HBase data be archived to a Spark cluster for offline analysis?
• HBase High Availability

…….
Challenges
HBase cluster migration
• Migration steps
  • Table structure transformation
  • Real-time data replication
    • Client double write
    • HBase Replication
  • Full data migration
    • DataX
    • CopyTable
    • Create Snapshot & Export Snapshot
  • Data consistency verification
• Defects
  • Cross-version migration compatibility issues
  • Impact on business
  • Lack of integrated solutions
Heterogeneous Data Transmission
• Heterogeneous full data migration
  • DataX
  • Sqoop
• Heterogeneous real-time data replication
  • HBase real-time data export
    • Custom Replication Endpoint
    • Custom Replication Sink
BDS
High-Level Architecture
• Master & Slave
• Stateless Slave
• Plug-in mode
• Higher scalability and better performance
Technical Detail
HBase full data migration
• Avoids impact on business
  • Only accesses HDFS
  • Dynamic migration rate
  • Decoupled from HBase
• One-click migration
  • Creates tables automatically
  • Detects region changes
  • Detects HFile compactions
• Efficient
  • 100 MB/s (single node)
  • Higher scalability
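The "dynamic migration rate" item can be illustrated with a token bucket whose refill rate is changed while a copy is in progress. This is only a sketch under assumptions: the slides do not say how BDS throttles, and the class and method names are hypothetical. Time is passed in explicitly so the behaviour is easy to follow.

```python
class DynamicRateLimiter:
    """Token bucket whose refill rate can be changed while a copy is running.

    Hypothetical illustration; not the BDS implementation.
    """

    def __init__(self, bytes_per_sec, now=0.0):
        self.rate = float(bytes_per_sec)
        self.tokens = 0.0          # bytes that may be copied right now
        self.last = now            # time of the last refill

    def _refill(self, now):
        # Earn tokens for the elapsed time, capped at one second of burst.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def set_rate(self, bytes_per_sec, now):
        self._refill(now)          # settle tokens earned at the old rate first
        self.rate = float(bytes_per_sec)
        self.tokens = min(self.tokens, self.rate)

    def try_acquire(self, nbytes, now):
        """Return True if nbytes may be copied at time `now`."""
        self._refill(now)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


limiter = DynamicRateLimiter(bytes_per_sec=100, now=0.0)
limiter.try_acquire(50, now=0.5)     # allowed: 50 bytes earned in 0.5 s
limiter.set_rate(10, now=0.5)        # operator lowers the rate mid-migration
limiter.try_acquire(50, now=1.5)     # refused: only 10 bytes earned since
```

Lowering the rate takes effect on the next refill, so a running copy slows down without being restarted.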
Data localization rate
[Figure: RegionServer, Region, DataNode1, DataNode2, two HFiles; one local read, one remote read]
• Data migration takes the data localization (locality) rate into account
• Avoids a low localization rate after data migration
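As a rough illustration of the metric, the localization rate of a region can be computed as the share of its HFile bytes that have a replica on the RegionServer's own host. The sketch assumes that definition and a simplified block model; the function name and host names are hypothetical.

```python
def locality_rate(hfile_blocks, region_server_host):
    """Fraction of HFile bytes that have a replica on the RegionServer's host.

    hfile_blocks: list of (size_in_bytes, set_of_hosts_holding_a_replica).
    Simplified, hypothetical model of HDFS block placement.
    """
    total = sum(size for size, _hosts in hfile_blocks)
    if total == 0:
        return 0.0
    local = sum(size for size, hosts in hfile_blocks
                if region_server_host in hosts)
    return local / total


blocks = [
    (128, {"dn1", "dn2"}),   # replica on dn1: served by a local read
    (128, {"dn2", "dn3"}),   # no replica on dn1: served by a remote read
]
rate = locality_rate(blocks, "dn1")   # half of the bytes are local
```

A migration that writes HFiles without regard to where the target regions are hosted would push this value down, which is the situation the slide says BDS avoids.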
File split
[Figure: HFile1 to HFile4 are split into HFile1, HFile2-1, HFile2-2, HFile3, HFile4, then loaded into Region1 and Region2]
• Migration splits HFiles according to the partitions of the original and target tables
• Speeds up bulk load
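The split step can be sketched as follows, assuming an HFile is modelled as a sorted list of (row key, value) pairs and the target table is described by its sorted region start keys. Real HFiles are binary files, so this only shows the partitioning logic; the function name is hypothetical.

```python
from bisect import bisect_right


def split_by_regions(sorted_cells, region_start_keys):
    """Group sorted (row_key, value) cells by the target region that owns them.

    region_start_keys: sorted start keys of the target regions. The first
    region starts at the empty key b"", and a start key is inclusive.
    """
    pieces = {}
    for row_key, value in sorted_cells:
        # Index of the last region whose start key is <= row_key.
        idx = bisect_right(region_start_keys, row_key) - 1
        pieces.setdefault(region_start_keys[idx], []).append((row_key, value))
    return pieces


# A file that straddles the boundary between two target regions.
hfile2 = [(b"a", 1), (b"m", 2), (b"n", 3), (b"z", 4)]
pieces = split_by_regions(hfile2, [b"", b"n"])
# pieces[b""]  holds rows a and m (the "HFile2-1" part)
# pieces[b"n"] holds rows n and z (the "HFile2-2" part)
```

Because each piece then falls entirely inside one target region, the load step does not have to split files itself.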
HBase Real-time Replication
Data pipeline
• Uses a RingBuffer as the queue
• An AckQueue maintains the offset
• Write throughput can be configured dynamically
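One way an ack queue can maintain a replication offset is to advance the committed position only across a contiguous run of acknowledged entries, so that a restart never skips an entry that was still in flight. The slides do not describe the AckQueue's internals, so the sketch below is an assumption, the names are hypothetical, and the RingBuffer itself is not modelled.

```python
from collections import deque


class AckQueue:
    """Tracks in-flight log offsets; hypothetical illustration only."""

    def __init__(self):
        self._pending = deque()   # offsets in the order they were read
        self._acked = set()       # offsets acknowledged out of order
        self.committed = None     # last offset that is safe to resume after

    def track(self, offset):
        """Record an offset that has been handed to a writer."""
        self._pending.append(offset)

    def ack(self, offset):
        """Mark an offset as written; return the committed offset."""
        self._acked.add(offset)
        # Advance only while the oldest in-flight offset has been acked.
        while self._pending and self._pending[0] in self._acked:
            self.committed = self._pending.popleft()
            self._acked.discard(self.committed)
        return self.committed


q = AckQueue()
for off in (10, 20, 30):
    q.track(off)
q.ack(20)   # 10 is still in flight, so nothing is committed yet
q.ack(10)   # 10 and 20 are now contiguous; committed moves to 20
```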
Impact on business
• HBase Replication
  • Read and write load affects data replication
• BDS
  • Decoupled from HBase
  • Only accesses HDFS
  • Data replication is not affected by an HBase crash
Hotspot
• HBase Replication
  • Hotspot
• BDS
  • Round robin scheduling
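Round robin scheduling can be sketched as dealing pending log files to workers in turn, so that logs produced by one busy RegionServer are spread across the cluster instead of piling up on a single node. The unit of scheduling (a log file) and all names here are assumptions made for illustration.

```python
from itertools import cycle


def assign_round_robin(log_files, workers):
    """Deal log files to workers in turn; returns {worker: [log_file, ...]}."""
    assignment = {worker: [] for worker in workers}
    for log_file, worker in zip(log_files, cycle(workers)):
        assignment[worker].append(log_file)
    return assignment


logs = ["wal-1", "wal-2", "wal-3", "wal-4", "wal-5"]
plan = assign_round_robin(logs, ["worker-a", "worker-b"])
# worker-a gets wal-1, wal-3, wal-5; worker-b gets wal-2, wal-4
```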
Replication backlog
• BDS
  • Add slave nodes
  • Slave throughput can be configured dynamically
[Figure labels: add Worker nodes; increase the number of logs a Worker node processes concurrently; increase AsyncWriter concurrency]
Operation and maintenance
• BDS
  • Easy to expand
  • Easy to upgrade
  • Monitoring
  • Alerting mechanism
• HBase Replication
  • Bug fixes
  • No alerting
  • Configuration changes and system upgrades require a RegionServer (RS) restart
BDS in Ali-Cloud
Clusters Migration
High Availability
Data Backup
Archive data to Spark
RDS
About me
Thanks!

hbaseconasia2019 BDS: A data synchronization platform for HBase
