hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
Phoenix Practice in China Life Insurance Co., Ltd.
Leo Yuan
yuanliou2015@e-chinalife.com
Agenda
1 Scenarios
2 Designs
3 Optimizations
4 Problems
5 Future Work
Scenarios - Overview
Main Processing: Cluster A
Read-only: Cluster B
Application Processing: Cluster C
Real Time Query: Cluster E
Data flows between clusters: sync basic data, sync result
Scenarios - Overview
Clusters: 4 clusters, 200+ nodes (30+ Phoenix nodes)
Data: 1300TB+ data, 30TB+ biggest table
Queries: ten-million level per day
Processing: hundreds of MR/Hive/Spark jobs per day, 50TB+ incremental data for update & insert
Scenarios - Overview
This Is What We Do: Real Cust Rights View System
Step 1: get some money from the company counter
Step 2: query the operation detail from the app
Scenarios - Overview
30,000,000+ incremental data/day
10,000+ records/sec
700,000+ users/day
8,000+ SQLs/sec
Designs - Data Architecture
✓ Initialize Data
✓ Build Phoenix Index
✓ Sync Real Time Data
✓ Provide Data Service
Components (from the diagram): Business System, SharePlex, Kafka, Spark Streaming, Cust View System (Phoenix, HBase), China Life Insurance APP
Designs - Development Architecture
[Layered architecture diagram]
Layers: Data Source, Data Exchange, Data Integration, Data View, Data Service
Data Source: Business System Database
Data Exchange: Initial Data Sync Program, Incremental Data Sync Program, Real Time Data Link
Data Service: Ultimate Real Time Query Service
Other components: Cross-province Integration, Query Distribution Engine
Tables (4+3): V_constract, V_pay, V_cust service, V_claim, …, dim
Scheduling and monitoring: Job Schedule Monitor, Resource Schedule Monitor, Schedule Monitor
Access control: Resource Access Control, User Access Control, Privilege Control
Designs - Physical Architecture
Components (from the diagram): APP Server, F5, Weblogic Server (x3), Phoenix Gateway, Phoenix Cluster, Configuration DB
Designs - Real Time Data Sync
Kafka message:
cid  c_no  type  amount  branch  syssource  updtime               incr_flag  commit_time           ……
001  001   M     100     000000  V6         2019-06-11 16:37.322  1          2019-06-11 16:37.322  ……
02   002   R     100     000001  V7         2019-06-11 16:38.689  2          2019-06-11 16:38.689  ……
Stream jobs: SparkStreaming, SparkStreaming_compt
Contract(BeiJing):
cid  c_no  type  amount
01   001   M     100
Contract(ShangHai):
cid  c_no  type  amount
02   002   R     100
1. Partition by global primary key
2. Shield the downstream from upstream system table structure adjustments
3. Data supplement has no effect on the normal stream process
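A minimal sketch of the first two points on this slide, under stated assumptions: records with the same global primary key always map to the same partition, and each message is projected onto a fixed set of target columns so that extra upstream fields are ignored. The class name ContractProjector, both method names, and the use of String.hashCode are hypothetical; the slides do not show how the sync job actually does this.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ContractProjector {
    // Target columns of the Contract tables, as listed on the slide.
    static final List<String> TARGET_COLUMNS = List.of("cid", "c_no", "type", "amount");

    // Keep only the target columns; any other upstream field is dropped,
    // so a new upstream column does not change the downstream row shape.
    static Map<String, String> project(Map<String, String> message) {
        Map<String, String> row = new LinkedHashMap<>();
        for (String column : TARGET_COLUMNS) {
            row.put(column, message.get(column)); // null if upstream omits the field
        }
        return row;
    }

    // Illustrative partitioner (an assumption): the same key always maps to
    // the same partition index in the range [0, partitions).
    static int partitionFor(String globalKey, int partitions) {
        return Math.floorMod(globalKey.hashCode(), partitions);
    }
}
```

With a message that also carries branch, syssource and updtime, project(...) returns only cid, c_no, type and amount.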
Designs - Ultimate Real Time Query Service
label.properties:
label name     label type  isHolder  logic
paidBonus      sum         holder    sql1,2,3
paidMoenyList  list        holder    sql4,5
paidExpire     sum         Insured   sql6
…              …           …         …
Request flow (Chinalife Insurance APP to Ultimate Real Time Query Service):
1. Analyze Parameters
2. Package SQLs
3. Execute SQLs
4. Collect Results
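The label-driven flow above can be sketched as follows, assuming a label of type "sum" maps to several SQL ids whose results are added together. The class LabelQueryService, the method sumLabel, the pool size, and the runSql callback (a stand-in for a real Phoenix query) are all hypothetical. Only the label names and SQL ids come from the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

final class LabelQueryService {
    // label name -> SQL ids, mirroring the "logic" column of label.properties.
    static final Map<String, List<String>> LABEL_SQLS = Map.of(
            "paidBonus", List.of("sql1", "sql2", "sql3"),
            "paidMoenyList", List.of("sql4", "sql5"),
            "paidExpire", List.of("sql6"));

    // Run every SQL of a label concurrently and add up the partial results.
    // runSql is a placeholder for executing one SQL against Phoenix.
    static long sumLabel(String label, Function<String, Long> runSql) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<Long>> futures = new ArrayList<>();
            for (String sqlId : LABEL_SQLS.getOrDefault(label, List.of())) {
                futures.add(CompletableFuture.supplyAsync(() -> runSql.apply(sqlId), pool));
            }
            long total = 0;
            for (CompletableFuture<Long> future : futures) {
                total += future.join(); // collect results
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

A label of type "list" would collect rows instead of summing them; that variant is omitted here.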
Optimizations - SQL Execution Process
JDBC calls:
DriverManager.getConnection("phoenixUrl")
con.prepareStatement(sql)
pstat.executeQuery()
rs.next()
System tables involved:
1. SYSTEM.CATALOG
2. SYSTEM.STATS
3. SYSTEM.LOG
4. SYSTEM.SEQUENCE
Query planning:
1. Query the metadata of the table/index from the Phoenix server (SYSTEM.CATALOG)
2. Determine which table/index the SQL needs to scan
3. Query the statistics of the table/index from the Phoenix server (SYSTEM.STATS)
4. Generate scans based on the statistics, the metadata, and the SQL
Execution:
1. Parallelism is decided by phoenix.query.threadPoolSize
Optimizations - Phoenix System Table
SYSTEM.CATALOG
Describes table/index metadata, such as
• TABLE NAME
• COLUMN NAME
• SALT_BUCKETS
• UPDATE_CACHE_FREQUENCY
• GUIDE_POSTS_WIDTH
SYSTEM.STATS
Describes table/index statistics, such as
• GUIDE_POST_KEY
• GUIDE_POSTS_WIDTH
PS: UPDATE STATISTICS TABLE_NAME
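Optimizations - RS Group
• Metadata Table Isolation to Decrease Impact on Business Table Query
[Diagram, before: under the HMaster, region servers rs1, rs2, rs3, rs4 … host the system tables (SYSTEM:CATALOG, SYSTEM:MUTEX, SYSTEM:STATS) together with the business tables (CONTRACT, INCOME, PERSON, ……).]
[Diagram, after: the region servers are split into RS Group 1 and RS Group 2, with the system tables (SYSTEM:CATALOG, SYSTEM:STATS) isolated from the business tables (CONTRACT, INCOME, PERSON, ……).]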
Optimizations - UPDATE_CACHE_FREQUENCY
• Adjust this parameter to decrease the hotspot in SYSTEM.CATALOG
  • Decides how often SYSTEM.CATALOG is queried
  • Default value is "ALWAYS", which causes read/write pressure on SYSTEM.CATALOG
  • Can be set per cluster or per table
1. Per cluster: "phoenix.default.update.cache.frequency": 86412345
2. Per table: CREATE TABLE test.test (a VARCHAR NOT NULL PRIMARY KEY, b VARCHAR) SALT_BUCKETS = 10, UPDATE_CACHE_FREQUENCY = 86400000;
Optimizations - Salt & Pre-Split
• Data table uses salt_buckets, index table uses pre-split
Data table (Phoenix Client write):
CREATE TABLE … (
) SALT_BUCKETS = 60
1. Use a salted table to distribute data evenly
Index table (Phoenix Client query):
CREATE INDEX …
ON … (
…
) INCLUDE (
…
) ASYNC SALT_BUCKETS = 0
SPLIT ON (
…
)
1. Create the index so that every query is answered entirely from the index table
2. Use pre-split to reduce the number of chunks when executing a query
3. Use an async index to avoid OOM when building the index table
Optimizations - Open Offheap
• Minimal read cost to improve query efficiency
BucketCache Configuration Properties
• hbase.bucketcache.combinedcache.enabled
• hbase.bucketcache.ioengine
• hfile.block.cache.size
• hbase.bucketcache.size
• hbase.bucketcache.bucket.sizes
• -XX:MaxDirectMemorySize
Optimizations - Other Configurations
• Region Balance By Table
• G1GC
• Manual MajorCompaction
Problems - Current time function causes query performance degradation
• This function leads to frequent client-cluster interaction
Before:
SELECT …
FROM …
WHERE date <= CURRENT_TIME()
After (replace CURRENT_TIME() in Java code with NOW_DATE):
SELECT …
FROM …
WHERE date <= NOW_DATE
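One way to read the fix above is to compute the current time once on the client and bind it as a JDBC parameter in place of the CURRENT_TIME() call. The sketch below assumes that reading. The class NowDateQuery and its methods are hypothetical, and the slides do not say whether NOW_DATE was bound as a parameter or written into the SQL text.

```java
import java.sql.Timestamp;
import java.time.Instant;

final class NowDateQuery {
    // Replace the server-side function call with a JDBC placeholder.
    static String withPlaceholder(String sql) {
        return sql.replace("CURRENT_TIME()", "?");
    }

    // Value to bind to the placeholder, computed once on the client.
    static Timestamp nowDate(Instant now) {
        return Timestamp.from(now);
    }
}
```

A caller would then prepare withPlaceholder(sql) and bind the value with PreparedStatement.setTimestamp(1, NowDateQuery.nowDate(Instant.now())) before executing the query.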
Problems - HBase cluster balance abnormal
Process:
1. Turn on RSGroup
2. Set two RSGroups: default and my_group
3. Move region servers to default and my_group
4. Restart one region server in default
Problems:
1. balancer abnormal
2. balance_rsgroup 'default' abnormal
Resolve:
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer
Problems - ACL abnormal when combined with RSGroup
Process:
1. Turn on RSGroup
2. Turn on ACL
<property>
<name>hbase.coprocessor.master.classes</name>
<value>org.apache.hadoop.hbase.security.access.AccessController,org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint</value>
</property>
Problems:
1. A non-hbase user can't create a table
Resolve:
1. Use the hbase user to create the table
2. Grant 'RWX' on this table to the non-hbase user
Future Work - RPC Read/Write Isolation
total queue = hbase.ipc.server.callqueue.handler.factor * handler
read queue = total queue * hbase.ipc.server.callqueue.read.ratio
write queue = total queue * (1 - hbase.ipc.server.callqueue.read.ratio)
scan queue = total queue * hbase.ipc.server.callqueue.read.ratio * hbase.ipc.server.callqueue.scan.ratio
[Diagram: the total queue is split into a write queue and a read queue; the scan queue is part of the read queue]
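The queue formulas above can be worked through with a small calculation. This is a sketch only: the class RpcQueueSplit is hypothetical, the input values are made up for illustration, and the rounding used here is an assumption that may differ from what HBase does internally.

```java
final class RpcQueueSplit {
    final int total;
    final int read;
    final int write;
    final int scan;

    RpcQueueSplit(int handlers, double handlerFactor, double readRatio, double scanRatio) {
        // total queue = handler.factor * handler (at least one queue)
        this.total = Math.max(1, (int) Math.round(handlers * handlerFactor));
        // read queue = total queue * read.ratio
        this.read = (int) Math.round(total * readRatio);
        // write queue = total queue * (1 - read.ratio), i.e. the remainder
        this.write = total - read;
        // scan queue = total queue * read.ratio * scan.ratio, a share of the read queues
        this.scan = (int) Math.round(read * scanRatio);
    }
}
```

For example, 100 handlers with a handler factor of 0.1, a read ratio of 0.6 and a scan ratio of 0.5 give 10 queues in total: 6 read queues (3 of them for scans) and 4 write queues.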
Future Work - Compaction Control
Open Compaction Controller
• key: hbase.regionserver.throughput.controller
  value: org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController
Set Peak Time Throughput
• hbase.hstore.compaction.throughput.higher.bound
• hbase.hstore.compaction.throughput.lower.bound
Set Offpeak Time
• hbase.offpeak.start.hour
• hbase.offpeak.end.hour
• hbase.hstore.compaction.throughput.offpeak
Future Work - Join Optimization
SELECT …
FROM ( a1
JOIN (SELECT...
FROM b1
WHERE …) a2
ON a1…. = a2…)
WHERE …
EXPLAIN
Thanks!
yuanliou2015@e-chinalife.com
