POLARDB: A database architecture for the cloud
ØYSTEIN GRØVLEN
Sr. Staff Engineer @ Alibaba Cloud
Bio:
Before joining Alibaba, Øystein worked for 10 years in the MySQL optimizer team at Sun/Oracle. At Sun Microsystems, he was also a contributor to the Apache Derby project and Sun's Architectural Lead on Java DB. Prior to that, he worked for 10 years on the development of Clustra, a highly available DBMS.
POLARDB: a Cloud Native Database
Emerging Hardware
• NVM
• RDMA
• FPGA
Serverless
• Auto Scaling
• Paid by Usage
• Zero Downtime
Security
• Encryption
• Audit
• Access Control
Intelligence
• Self-configuration
• Self-optimization
• Self-diagnosis
• Self-healing
CLOUD NATIVE
User Oriented
Database Architecture Revolution: Separation of Storage and Computation
[Figure: the transaction layer (database) is separated from the storage engine, with computation offloaded to storage]
• Compatibility
• Security
• HTAP
• Multi-Model
• Usability
• Self-Driving
• Manageability
Cloud Native Architecture
• Scale compute and storage independently
• Shared storage
• Cross-AZ failover without data loss
• Optimize the division of functionality between storage and compute
• Tight integration with other cloud components like metering, monitoring, and the control plane
• Optimize for the hardware in the data centers
• Compatible with MySQL, PostgreSQL, etc.
• Security
• PolarProxy: intelligent proxy
• POLARDB: 100% compatible
• PolarStore: storage optimized for the database
• PolarFS
PolarStore: Architecture overview
- Designed for emerging hardware
- Low-latency oriented
- Active R/W, active RO
- High availability
[Figure: POLARDB instances on Host1 and Host2 access volumes, each split into chunks, through libpfs and PolarSwitch. The chunks are stored on ChunkServers and replicated with ParallelRaft; PolarCtrl manages metadata. The data route and the control route are drawn separately.]
Key components: 1. libpfs 2. PolarSwitch 3. ChunkServer 4. PolarCtrl
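To make the volume/chunk split concrete, here is a minimal Python sketch of how a request against a volume offset could be translated into a chunk and routed to the ChunkServers that hold it. The 10 GiB chunk size, the `chunk_map` table, and the names `route_io` and `IoTarget` are assumptions for illustration only; the slide says PolarSwitch routes data and PolarCtrl owns metadata, but their real interfaces are not shown. Requests that straddle a chunk boundary are not handled.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Assumption: fixed-size chunks; 10 GiB is used here only for illustration.
CHUNK_SIZE = 10 * 1024**3


@dataclass(frozen=True)
class IoTarget:
    chunk_index: int                 # which chunk of the volume
    offset_in_chunk: int             # byte offset inside that chunk
    chunkservers: Tuple[str, ...]    # replica group storing the chunk


def route_io(volume_offset: int, chunk_map: Dict[int, Tuple[str, ...]]) -> IoTarget:
    """Translate a volume byte offset into a chunk and the ChunkServers that hold it.

    chunk_map is a hypothetical stand-in for the chunk-location metadata
    (chunk index -> ChunkServers) that the routing layer would cache.
    """
    if volume_offset < 0:
        raise ValueError("offset must be non-negative")
    index, offset = divmod(volume_offset, CHUNK_SIZE)
    if index not in chunk_map:
        raise LookupError(f"chunk {index} is not allocated")
    return IoTarget(index, offset, chunk_map[index])


chunk_map = {0: ("cs1", "cs2", "cs3"), 1: ("cs2", "cs3", "cs4")}
print(route_io(CHUNK_SIZE + 4096, chunk_map))
```

Because the mapping is a pure function of the offset plus a small cached table, the lookup itself needs no round trip to the metadata service.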
PolarStore: Design for Emerging Hardware
- No context switches
- OS bypass and zero copy, with networking over RDMA
- Parallel random I/O absorbed by Optane
- Excellent performance with fewer long-tail latency issues
- No need for over-provisioning
- WAL (write-ahead log) kept on 3D XPoint Optane
[Figure: on the database host, POLARDB (through libpfs) writes to shared memory (shm), and I/O travels over the RDMA network to Chunkservers 1-3, each equipped with an RDMA NIC, memory, Optane, and NVMe SSDs]
PolarFS: a POSIX distributed file system closely integrated with the DB
Pure user space for extra-low latency
- No system calls
- No context switches
- Zero data copy
POSIX semantics
- Easy porting
Low-latency oriented
[Figure: POLARDB on Nodes 1-3, each using libpfs, shares a journal file (with head, pending tail, and tail pointers) and a Paxos file. Each node keeps a file system metadata cache with a directory tree, a file mapping table (FileBlk to VolBlk), and a block mapping table (VolBlk to FileID and FileBlk) over the chunks of the database volume.]
PolarFS: An Ultra-low Latency and Failure Resilient Distributed File System for Shared Storage Cloud Database (VLDB 2018)
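The two mapping tables in the metadata cache can be sketched as a pair of dictionaries, one per direction. This is a toy Python model under stated assumptions: the 4 MiB block size, the `MetadataCache` class, its methods, and the free-list allocator are all hypothetical, and the journal and Paxos files that keep the caches consistent across nodes are not modeled.

```python
import errno
from typing import Dict, List, Tuple

BLOCK_SIZE = 4 * 1024**2  # assumption: 4 MiB blocks, for illustration only


class MetadataCache:
    """Toy model of the file mapping and block mapping tables."""

    def __init__(self, num_volume_blocks: int) -> None:
        self.file_map: Dict[Tuple[int, int], int] = {}   # (FileID, FileBlk) -> VolBlk
        self.block_map: Dict[int, Tuple[int, int]] = {}  # VolBlk -> (FileID, FileBlk)
        # Free volume blocks kept as a stack so that pop() hands out the lowest number first.
        self.free: List[int] = list(range(num_volume_blocks - 1, -1, -1))

    def allocate(self, file_id: int, file_blk: int) -> int:
        """Map a file block to a free volume block (idempotent)."""
        key = (file_id, file_blk)
        if key in self.file_map:
            return self.file_map[key]
        if not self.free:
            raise OSError(errno.ENOSPC, "no free volume blocks")
        vol_blk = self.free.pop()
        self.file_map[key] = vol_blk
        self.block_map[vol_blk] = key
        return vol_blk

    def resolve(self, file_id: int, file_offset: int) -> Tuple[int, int]:
        """Translate a byte offset in a file into (VolBlk, offset within that block)."""
        file_blk, within = divmod(file_offset, BLOCK_SIZE)
        return self.file_map[(file_id, file_blk)], within


cache = MetadataCache(num_volume_blocks=8)
cache.allocate(file_id=7, file_blk=0)
cache.allocate(file_id=9, file_blk=0)
cache.allocate(file_id=7, file_blk=1)
print(cache.resolve(7, BLOCK_SIZE + 10))   # (2, 10)
print(cache.block_map[1])                  # (9, 0)
```

Keeping both directions lets a node answer "where is this file block?" for reads and "who owns this volume block?" when reclaiming space, without scanning.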
Dynamic Scaling
Fast scaling: in MySQL, the master and every replica each have their own local storage; in POLARDB, the master and all replicas use one shared storage.
- Upgrade from 2 vCPUs to 32 vCPUs in only 5 minutes
- Add more replicas in only 5 minutes
Lower cost: 30%~50% off
Total cost of a 4 vCPU, 32 GB memory, 500 GB storage configuration with different numbers of replicas (currency not stated):
Replicas   RDS MySQL   POLARDB
1          6,940       4,949
2          10,230      6,549
3          13,521      8,149
4          16,811      9,749
5          20,102      11,349
10         39,844      20,949
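The cost figures can be checked against the "30%~50% off" claim with a few lines of Python. The numbers are copied from the chart; which series belongs to which product is inferred from the slide's claim that POLARDB is cheaper, so treat this as a reading of the slide rather than official pricing.

```python
# (replicas, RDS MySQL total cost, POLARDB total cost), as read from the chart
COSTS = [
    (1, 6_940, 4_949),
    (2, 10_230, 6_549),
    (3, 13_521, 8_149),
    (4, 16_811, 9_749),
    (5, 20_102, 11_349),
    (10, 39_844, 20_949),
]


def savings(rds_cost: int, polardb_cost: int) -> float:
    """Fraction of the RDS MySQL cost saved by choosing POLARDB."""
    return 1 - polardb_cost / rds_cost


for replicas, rds, polar in COSTS:
    print(f"{replicas:>2} replica(s): {savings(rds, polar):.1%} cheaper")
```

Running it reports savings rising from about 28.7% with 1 replica to about 47.4% with 10 replicas: the gap widens as replicas are added, since in the shared-storage design a new replica does not bring its own copy of the data.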
Shared Nothing Logical Replication vs. Shared Storage Physical Replication
[Figure: in MySQL (shared nothing), the master and the slave each keep their own data, binlog, and redo log on local storage, and the slave replicates from the master's binlog. In POLARDB, the master and the slave share the data and redo log on POLARSTORE.]
Physical replication is much more reliable than logical replication.
Shared Nothing Logical Replication vs. Shared Storage Physical Replication (continued)
Non-blocking, low-latency DDL synchronization
- MySQL: an ADD COLUMN that runs for 1 hour on the master runs for another hour on the slave, and applying the DDL blocks all following events.
- POLARDB: the master updates the data files on shared storage; the slave only updates metadata and does not need to modify the data files.
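Why a long DDL hurts logical replication can be shown with a toy model of a replica that applies events one at a time. In this Python sketch, all names and timings are hypothetical: the ADD COLUMN takes an hour on the replica in the shared-nothing case, while in the shared-storage case it is modeled as a near-instant metadata update.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    name: str
    committed_at: float   # seconds: when the master finished the event
    apply_cost: float     # seconds the replica needs to apply it


def replica_lag(events: List[Event]) -> Dict[str, float]:
    """Serial apply: an event starts once it has arrived and the previous one is done.

    Returns, per event, how long after the master's commit the replica finishes it.
    """
    clock = 0.0
    lag: Dict[str, float] = {}
    for ev in sorted(events, key=lambda e: e.committed_at):
        start = max(clock, ev.committed_at)
        clock = start + ev.apply_cost
        lag[ev.name] = clock - ev.committed_at
    return lag


# ADD COLUMN ran for an hour on the master (t = 0..3600); a small UPDATE follows.
logical = replica_lag([Event("add_column", 3600, 3600), Event("update", 3601, 0.01)])
shared = replica_lag([Event("add_column", 3600, 0.01), Event("update", 3601, 0.01)])
print(f"update lag, logical replication: {logical['update']:.0f} s")
print(f"update lag, metadata-only DDL:   {shared['update']:.2f} s")
```

In the first case the small UPDATE waits behind the hour-long DDL and lands almost an hour late; in the second it is visible on the replica almost immediately.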
Physical Replication by Redo Log
[Figure: on the primary, transactions T1-T5 commit by writing the redo log to shared storage, while data files are flushed asynchronously. The RO node parses the redo log into a hash table and applies it to the pages in its buffer pool; queries read a consistent snapshot (for example, the snapshot as of T4).]
- Continuous recovery on the RO node
- Consistent snapshot read
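Continuous recovery on the RO node can be sketched as follows. This is a simplified Python model, not POLARDB code: it assumes the parsed redo is bucketed in a hash table keyed by page ID, that each page carries the LSN of its last change, and that a page read from shared storage is brought forward by replaying only the records newer than that LSN. `RedoRecord`, `Page`, `RONode`, and their fields are hypothetical, and per-query snapshot versions are not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, Dict, List


@dataclass(frozen=True)
class RedoRecord:
    lsn: int
    page_id: int
    key: str
    value: str


@dataclass
class Page:
    page_lsn: int                                   # LSN of the newest change in this copy
    rows: Dict[str, str] = field(default_factory=dict)


class RONode:
    def __init__(self, storage: Dict[int, Page]) -> None:
        self.storage = storage                       # shared data files (read-only here)
        self.redo_hash: DefaultDict[int, List[RedoRecord]] = defaultdict(list)
        self.parsed_lsn = 0                          # newest LSN parsed so far
        self.buffer_pool: Dict[int, Page] = {}

    def parse(self, records: List[RedoRecord]) -> None:
        """Log parse: bucket records by page ID (records must arrive in LSN order)."""
        for rec in records:
            self.redo_hash[rec.page_id].append(rec)
            self.parsed_lsn = rec.lsn

    def read_page(self, page_id: int) -> Page:
        """Return the page with all parsed redo for it applied."""
        page = self.buffer_pool.get(page_id)
        if page is None:
            src = self.storage[page_id]
            page = Page(src.page_lsn, dict(src.rows))   # private copy in the buffer pool
            self.buffer_pool[page_id] = page
        for rec in self.redo_hash.get(page_id, []):
            if rec.lsn > page.page_lsn:                 # skip changes already on the page
                page.rows[rec.key] = rec.value
                page.page_lsn = rec.lsn
        return page


storage = {1: Page(10, {"id=1": "Tom"})}
ro = RONode(storage)
ro.parse([RedoRecord(8, 1, "id=1", "Tim"), RedoRecord(12, 1, "id=1", "Jimmy")])
page = ro.read_page(1)
print(page.page_lsn, page.rows)
```

The record at LSN 8 is skipped because the stored page (LSN 10) already contains it; only the LSN 12 change is replayed, and the shared copy on storage is left untouched.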
Physical Replication - Page from Past
Goal: avoid a data gap.
- The oldest read view controls purge, so data still needed by RO-node snapshots is not purged.
- The checkpoint LSN (T1 in the figure) splits the redo log into a purgeable part (before the checkpoint) and an unpurgeable part (after it).
[Figure: primary and RO node on shared storage; the RO node replays redo for T1-T4 to serve a snapshot of T4.]
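The two controls on this slide can be expressed as two small predicates. In this hedged Python sketch, read views are represented as plain integers (for example, the oldest transaction a view may still need to see), and redo below the checkpoint LSN is treated as purgeable because the data files already reflect it; both function names are hypothetical.

```python
from typing import Iterable


def purge_limit(primary_oldest_view: int, ro_oldest_views: Iterable[int]) -> int:
    """Undo that only views older than this limit could need may be purged.

    Including the RO nodes' oldest views keeps undo that an RO snapshot
    still needs from being purged on the primary.
    """
    return min([primary_oldest_view, *ro_oldest_views])


def redo_purgeable(record_lsn: int, checkpoint_lsn: int) -> bool:
    """Redo below the checkpoint LSN is already reflected in the data files, so a
    page read from shared storage never needs it; newer redo must be kept."""
    return record_lsn < checkpoint_lsn


limit = purge_limit(primary_oldest_view=120, ro_oldest_views=[95, 110])
print(limit)                                                                     # 95
print([lsn for lsn in (40, 80, 100, 130) if redo_purgeable(lsn, checkpoint_lsn=100)])  # [40, 80]
```

If the primary considered only its own read views, it could purge undo that a slower RO node's snapshot still depends on.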
Physical Replication - Page from Future
Goal: avoid data overstep.
- The primary controls when pages are flushed to the data file, based on the LSN of the latest redo log applied on the RO nodes (their snapshot version, T4 in the figure).
- Page versions up to that snapshot (T1-T4) are flushable; newer versions (T5) are unflushable.
[Figure: primary and RO node on shared storage, each with a buffer pool; the RO node keeps parsed redo in a hash table.]
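The flush rule can be sketched as a check the primary runs before writing a dirty page. In this Python sketch, each dirty page is summarized by the newest LSN it contains and each RO node reports the LSN of the latest redo it has applied; the function names and the "hold the page back" policy are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def flushable(page_newest_lsn: int, ro_applied_lsns: List[int]) -> bool:
    """A page version may go to the data file only if no RO node is behind it.

    Otherwise an RO node that re-reads the page from shared storage would see
    changes newer than its own snapshot (a "page from the future").
    """
    return all(page_newest_lsn <= applied for applied in ro_applied_lsns)


def split_dirty_pages(dirty: Dict[int, int], ro_applied_lsns: List[int]) -> Tuple[List[int], List[int]]:
    """Partition dirty pages (page ID -> newest LSN) into flushable and held-back lists."""
    flush: List[int] = []
    hold: List[int] = []
    for page_id, lsn in sorted(dirty.items()):
        (flush if flushable(lsn, ro_applied_lsns) else hold).append(page_id)
    return flush, hold


print(split_dirty_pages({1: 40, 2: 55, 3: 50}, ro_applied_lsns=[50, 52]))  # ([1, 3], [2])
```

Page 2 is held back because one RO node has only applied redo up to LSN 50; once that node catches up to LSN 55, the page becomes flushable.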
Single Master
- Single endpoint
- Transparent failover
- Attack protection
- Causal consistency read
[Figure: the application connects to a proxy cluster that sits in front of a master and several replicas on shared storage and provides read/write split, high availability, load balancing, and security]
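Read and Write Separation - Session Consistent
Problem: with read/write splitting, a SELECT sent to a replica may not see data that the same session just wrote. Solved:
  -- all statements run on the same connection (session)
  UPDATE user SET name = 'Jimmy' WHERE id = 1;
  COMMIT;
  SELECT name FROM user WHERE id = 1;  -- returns 'Jimmy'
Within a session, a SELECT always gets the latest data.
How the POLARDB cluster does it (LSN = Log Sequence Number):
1. The application's UPDATE is routed through the smart proxy to the master (M), whose log reaches LSN 35.
2. The master returns LSN=35 to the proxy.
3. The proxy sends the SELECT with the requirement LSN >= 35, so it is served by a replica that has applied the log up to LSN 35 rather than one still at LSN 30.
Smart proxy modules: read & write separation, load balancing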
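The three routing steps can be modeled in a few lines. This Python sketch assumes the proxy learns each write's LSN from the master's reply, tracks it per session, and picks the least-loaded replica whose applied LSN has caught up; falling back to the master when no replica qualifies is an assumed policy (waiting for a replica would be another option). `SmartProxy`, `Session`, `Node`, and the replica LSNs in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    applied_lsn: int      # newest LSN this node has applied (the master's current LSN)
    load: int = 0         # outstanding requests, used for load balancing


@dataclass
class Session:
    last_write_lsn: int = 0   # LSN of this session's latest acknowledged write


class SmartProxy:
    def __init__(self, master: Node, replicas: List[Node]) -> None:
        self.master = master
        self.replicas = replicas

    def on_write_ack(self, session: Session, commit_lsn: int) -> None:
        """Step 2: remember the LSN the master returned for the session's write."""
        session.last_write_lsn = max(session.last_write_lsn, commit_lsn)

    def route(self, session: Session, is_write: bool) -> Node:
        """Writes go to the master; reads go to a replica that has caught up."""
        if is_write:
            return self.master
        fresh = [r for r in self.replicas if r.applied_lsn >= session.last_write_lsn]
        if not fresh:
            return self.master            # assumed fallback: no replica is fresh enough
        return min(fresh, key=lambda r: r.load)


proxy = SmartProxy(Node("M", 35), [Node("R1", 30), Node("R2", 35)])
s = Session()
proxy.route(s, is_write=True)          # 1. UPDATE goes to M
proxy.on_write_ack(s, commit_lsn=35)   # 2. M returns LSN=35
print(proxy.route(s, is_write=False).name)  # 3. SELECT needs LSN >= 35, so R2
```

Sessions that have not written anything can still be spread over all replicas, so the LSN check costs read scalability only right after a write.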
Multi-master
[Figure: a database cluster (compute) of DB servers spread over AZ1, AZ2, and AZ3 sends redo and page reads to a storage service. The storage cluster's segment servers (CS1-CS6) form a 3-AZ, append-only persistent core in which segment 1 is replicated in each AZ. Links are labeled TCP and RDMA; storage monitoring is also shown.]
Thank You
