SlideShare a Scribd company logo
Overview of postgres cluster
solutions
Stas Kelvich, @kelvich
Let’s define it just as database that utilizes several servers.
Design space is huge. Let’s also put some constraints:
Do we want to improve perfomance?
Do we want to improve availability?
Is load will be more like OLAP or OLTP? Or both?
What consistency guarantees do we need?
Contradictory topics!
What do we mean by distributed DB
or cluster?
2
Master-slave replication
Postgres-XC/XC2/XL
Greenplum
CitusDB / pg_shard
BDR / pg_logical
...
Postgres clustering
3
Async.
Muster just streams journal (WAL) to slaves. Slave can lag.
Sync.
Master waits for WAL sync on replica. But still we can read
old data from replica – time gap between WAL fsync and files
delivery.
Sync + remote_apply (new in 9.6).
Wait for data to be delivered to files on replica. Perserves
causality.
Master-slave replication
4
Pros:
Improves availability
Read-only queries can be sent to replica – improves
performance in that sence.
Cons:
Automatic failover is hard to achive and external tools are
required. (pacemaker, patroni, etc)
Read-only queries on replica are limited: no temp tables, no
catalog access.
Master-slave replication
5
Postgres-XC distributed db grew out of postgres sources. Originally
for write scalability. Now development continues in Huawei
(Postgres-XC2) and 2ndQuadrant (Postgres-XL)
Let’s concentrate on Postgres-XL:
postgres fork built for write scalability and OLAP performance.
distributed transactions
3 types of nodes in system: coordinator, node, gtm (+
gtm-proxy)
Postgres-XC/XC2/XL
6
Postgres-XL
7
Pros:
now merged with 9.5
good for OLAP
good for write-heavy load
Pros:
fork – isn’t maintained by postgres core team
isn’t good for high TPS
GTM is a hot spot
Postgres-XL
8
MPP database by Pivotal.
Opensourced in 2015.
Built on top of postgres 8.3.
Same niche as a Postgres-XL, but more mature.
Greenplum
9
Also sharding + analytics (distributed joins)
Not a fork! Extension to an ordinary postgres!
Isn’t good for high TPS (master instances is a hot spot)
2 type of nodes: master-node and data-node
No distributed transactions (and hence no proper isolation)
CitusDB
10
In-core extension to access remote node as a local table.
In core
Some distributed planning (aggregate pushdown)
Can be used for sharding (along with standand table
inheritance mechanism)
Neither atomical nor isolated
Postgres_fdw
11
Bi-Directional Replication. Fork but most of the features are
merged into postgres core.
Multimaster replication, but without conflict detection. (No
distributed transactions, no distributed queries).
Very useful if application can determine node to access itself.
Good for georeplication.
BDR
12
Was part of BDR, now shipped as an extension.
Can decode journal (WAL) into sequence of logical operations in
custom format.
Libpq binary and json formats support included.
Server-side events (filter and send to WebSocket connections)
Replication between different versions
Replication to other databases!
Part of BDR
pg_logical + decoding plugin
13
Started about a year ago.
Konstantin Knizhnik, Constantin Pan, Stas Kelvich
Cluster group in PgPro
14
Started to playing with Postgres-XC. 2ndQuadrant also had project
(finished now) to port XC to 9.5.
Fork is painful;
How can we bring functionality of XC in core?
Cluster group in PgPro
15
Distributed transactions - nothing in-core;
Distributed planner - fdw, pg_shard, greenplum planner (?);
HA/Autofailover - can be built on top of logical decoding.
Distributed postgres
16
acc1
{5, 0, 1, 500}
{5, 42, 1, 500}
{42, 0, 1, 600}
T42
r1(500)
w1(600)
r2(600)
Comm
T43
r2(500)
w1(400)
Comm
acc2
{7, 0, 2, 500}
{7, 43, 1, 500}
{43, 0, 1, 400}
MVCC / snapshot isolation
17
SQL-92
Уровни изоляции
18
(Peter Bailis, VLDB, 2013)
Уровни изоляции
19
create extension pg_dtm;
select begin_glob...();
begin
update ...;
commit;
create extension pg_dtm;
select join_glob...(xid);
begin;
update ...;
commit;
src server dst server
Client side of view
20
Achieve proper isolation between tx for multi-node transactions.
Now in postgres on write tx start:
Aquire XID;
Get list of running tx’s;
Use that info in visibility checks.
Distributed transactions
21
transam/clog.c:
GetTransactionStatus
SetTransactionStatus
transam/varsup.c:
GetNewTransactionId
ipc/procarray.c:
TransactionIdIsInProgress
GetOldestXmin
GetSnapshotData
time/tqual.c:
XidInMVCCSnapshot
XTM API:
vanilla
22
transam/clog.c:
GetTransactionStatus
SetTransactionStatus
transam/varsup.c:
GetNewTransactionId
ipc/procarray.c:
TransactionIdIsInProgress
GetOldestXmin
GetSnapshotData
time/tqual.c:
XidInMVCCSnapshot
Transaction
Manager
XTM API:
after patch
23
transam/clog.c:
GetTransactionStatus
SetTransactionStatus
transam/varsup.c:
GetNewTransactionId
ipc/procarray.c:
TransactionIdIsInProgress
GetOldestXmin
GetSnapshotData
time/tqual.c:
XidInMVCCSnapshot
Transaction
Manager
pg_dtm.so
XTM API:
after tm load
24
Aquire XID centrally (DTMd, arbiter);
No local tx possible;
DTMd is a bottleneck.
XTM implementations
GTM or snapshot sharing
25
Paper from SAP HANA team;
Central daemon is needed, but only for multi-node tx;
Snapshots -> Commit Sequence Number;
DTMd is still a bottleneck.
XTM implementations
Incremental SI
26
XID/CSN are gathered from all nodes that participates in tx;
No central service;
local tx;
possible to reduce communication by using time (Spanner,
CockroachDB).
XTM implementations
Clock-SI or tsDTM
27
XTM implementations
tsDTM scalability
28
More nodes, higher probability of failure in system.
Possible problems with nodes:
Node stopped (and will not be back);
Node was down small amount of time (and we should bring it
back to operation);
Network partitions (avoid split-brain).
If we want to survive network partitions than we can have not more
than [N/2] - 1 failures.
HA/autofailover
29
Possible usage of such system:
Multimaster replication;
Tables with metainformation in sharded databases;
Sharding with redundancy.
HA/autofailover
30
By Multimaster we mean strongly coupled one, that acts as a single
database. With proper isolation and no merge conflicts.
Ways to build:
Global order to XLOG (Postgres-R, MySQL Galera);
Wrap each tx as distributed – allows parallelism while applying
tx.
Multimaster
31
Our implementation:
Built on top of pg_logical;
Make use of tsDTM;
Pool of workers for tx replay;
Raft-based storage for dealing with failures and distributed
deadlock detection.
Multimaster
32
Our implementation:
Approximately half of a speed of standalone postgres;
Same speed for reads;
Deals with nodes autorecovery;
Deals with network partitions (debugging right now).
Can work as an extension (if community accept XTM API in
core).
Multimaster
33

More Related Content

What's hot (20)

PDF
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
Equnix Business Solutions
 
PDF
Postgres-XC Write Scalable PostgreSQL Cluster
Mason Sharp
 
PDF
TeraCache: Efficient Caching Over Fast Storage Devices
Databricks
 
PDF
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
Equnix Business Solutions
 
PDF
What's New In PostgreSQL 9.4
Pavan Deolasee
 
PDF
PostgreSQL Replication High Availability Methods
Mydbops
 
PDF
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
Equnix Business Solutions
 
PDF
Postgres-XC as a Key Value Store Compared To MongoDB
Mason Sharp
 
PDF
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon
 
PPTX
HBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon
 
PDF
HBaseCon 2013: Scalable Network Designs for Apache HBase
Cloudera, Inc.
 
PDF
Hbase Nosql
elliando dias
 
PDF
On The Building Of A PostgreSQL Cluster
Srihari Sriraman
 
PDF
Replicate from Oracle to data warehouses and analytics
Continuent
 
PDF
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
Altinity Ltd
 
PDF
PostgreSQL HA
haroonm
 
PDF
Kafka on ZFS: Better Living Through Filesystems
confluent
 
PDF
Distribute Key Value Store
Santal Li
 
PDF
PGConf.ASIA 2019 Bali - Setup a High-Availability and Load Balancing PostgreS...
Equnix Business Solutions
 
PDF
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
Wei Shan Ang
 
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
Equnix Business Solutions
 
Postgres-XC Write Scalable PostgreSQL Cluster
Mason Sharp
 
TeraCache: Efficient Caching Over Fast Storage Devices
Databricks
 
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
Equnix Business Solutions
 
What's New In PostgreSQL 9.4
Pavan Deolasee
 
PostgreSQL Replication High Availability Methods
Mydbops
 
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
Equnix Business Solutions
 
Postgres-XC as a Key Value Store Compared To MongoDB
Mason Sharp
 
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon
 
HBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon
 
HBaseCon 2013: Scalable Network Designs for Apache HBase
Cloudera, Inc.
 
Hbase Nosql
elliando dias
 
On The Building Of A PostgreSQL Cluster
Srihari Sriraman
 
Replicate from Oracle to data warehouses and analytics
Continuent
 
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
Altinity Ltd
 
PostgreSQL HA
haroonm
 
Kafka on ZFS: Better Living Through Filesystems
confluent
 
Distribute Key Value Store
Santal Li
 
PGConf.ASIA 2019 Bali - Setup a High-Availability and Load Balancing PostgreS...
Equnix Business Solutions
 
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
Wei Shan Ang
 

Viewers also liked (6)

PDF
Multimaster
Stas Kelvich
 
PDF
Pgxc scalability pg_open2012
Ashutosh Bapat
 
PDF
Flexible Indexing with Postgres
EDB
 
PDF
How the Postgres Query Optimizer Works
EDB
 
PDF
PostgresOpen 2013 A Comparison of PostgreSQL Encryption Options
Faisal Akber
 
Multimaster
Stas Kelvich
 
Pgxc scalability pg_open2012
Ashutosh Bapat
 
Flexible Indexing with Postgres
EDB
 
How the Postgres Query Optimizer Works
EDB
 
PostgresOpen 2013 A Comparison of PostgreSQL Encryption Options
Faisal Akber
 
Ad

Similar to Postgres clusters (20)

PDF
PostgreSQL Sharding and HA: Theory and Practice (PGConf.ASIA 2017)
Aleksander Alekseev
 
PDF
Postgres Vienna DB Meetup 2014
Michael Renner
 
PDF
Beyond Postgres: Interesting Projects, Tools and forks
Sameer Kumar
 
PPTX
PostgreSQL as an Alternative to MSSQL
Alexei Krasner
 
PDF
Trivadis TechEvent 2017 PostgreSQL für die (Orakel) DBA by Ludovico Caldara
Trivadis
 
PDF
What’s New In PostgreSQL 9.3
Pavan Deolasee
 
PDF
MySQL Replication vs Galera_ which is better for your workload_.pptx_.pdf
NishanthReddy97
 
PDF
SFScon14: Schrödinger’s elephant: why PostgreSQL can solve all your database ...
South Tyrol Free Software Conference
 
PDF
What's New in PostgreSQL 9.6
EDB
 
PPTX
SQL Server vs Postgres
chandra sekhar pathivada , PMP,ITIL,MCTS
 
PDF
The Challenges of Distributing Postgres: A Citus Story
Hanna Kelman
 
PDF
The Challenges of Distributing Postgres: A Citus Story | DataEngConf NYC 2017...
Citus Data
 
PDF
PostgreSQL 9.4 and Beyond @ FOSSASIA 2015 Singapore
Satoshi Nagayasu
 
PDF
PostgreSQL 9.4
Satoshi Nagayasu
 
PDF
9.6_Course Material-Postgresql_002.pdf
sreedb2
 
PDF
10 things, an Oracle DBA should care about when moving to PostgreSQL
PostgreSQL-Consulting
 
PDF
Understanding Presto - Presto meetup @ Tokyo #1
Sadayuki Furuhashi
 
PDF
PostgreSQL Extension APIs are Changing the Face of Relational Databases | PGC...
Teresa Giacomini
 
PPTX
PostgreSQL 10: What to Look For
Amit Langote
 
PPTX
Community vs. Commercial Open Source
Justin Reock
 
PostgreSQL Sharding and HA: Theory and Practice (PGConf.ASIA 2017)
Aleksander Alekseev
 
Postgres Vienna DB Meetup 2014
Michael Renner
 
Beyond Postgres: Interesting Projects, Tools and forks
Sameer Kumar
 
PostgreSQL as an Alternative to MSSQL
Alexei Krasner
 
Trivadis TechEvent 2017 PostgreSQL für die (Orakel) DBA by Ludovico Caldara
Trivadis
 
What’s New In PostgreSQL 9.3
Pavan Deolasee
 
MySQL Replication vs Galera_ which is better for your workload_.pptx_.pdf
NishanthReddy97
 
SFScon14: Schrödinger’s elephant: why PostgreSQL can solve all your database ...
South Tyrol Free Software Conference
 
What's New in PostgreSQL 9.6
EDB
 
The Challenges of Distributing Postgres: A Citus Story
Hanna Kelman
 
The Challenges of Distributing Postgres: A Citus Story | DataEngConf NYC 2017...
Citus Data
 
PostgreSQL 9.4 and Beyond @ FOSSASIA 2015 Singapore
Satoshi Nagayasu
 
PostgreSQL 9.4
Satoshi Nagayasu
 
9.6_Course Material-Postgresql_002.pdf
sreedb2
 
10 things, an Oracle DBA should care about when moving to PostgreSQL
PostgreSQL-Consulting
 
Understanding Presto - Presto meetup @ Tokyo #1
Sadayuki Furuhashi
 
PostgreSQL Extension APIs are Changing the Face of Relational Databases | PGC...
Teresa Giacomini
 
PostgreSQL 10: What to Look For
Amit Langote
 
Community vs. Commercial Open Source
Justin Reock
 
Ad

Recently uploaded (20)

PDF
HiHelloHR – Simplify HR Operations for Modern Workplaces
HiHelloHR
 
PDF
Unlock Efficiency with Insurance Policy Administration Systems
Insurance Tech Services
 
PPTX
Human Resources Information System (HRIS)
Amity University, Patna
 
PDF
Digger Solo: Semantic search and maps for your local files
seanpedersen96
 
PDF
Online Queue Management System for Public Service Offices in Nepal [Focused i...
Rishab Acharya
 
PDF
The 5 Reasons for IT Maintenance - Arna Softech
Arna Softech
 
PDF
Wondershare PDFelement Pro Crack for MacOS New Version Latest 2025
bashirkhan333g
 
PDF
iTop VPN With Crack Lifetime Activation Key-CODE
utfefguu
 
PDF
Linux Certificate of Completion - LabEx Certificate
VICTOR MAESTRE RAMIREZ
 
PPTX
Agentic Automation: Build & Deploy Your First UiPath Agent
klpathrudu
 
PDF
Thread In Android-Mastering Concurrency for Responsive Apps.pdf
Nabin Dhakal
 
PDF
Revenue streams of the Wazirx clone script.pdf
aaronjeffray
 
PPTX
Homogeneity of Variance Test Options IBM SPSS Statistics Version 31.pptx
Version 1 Analytics
 
PPTX
Tally software_Introduction_Presentation
AditiBansal54083
 
PDF
Alexander Marshalov - How to use AI Assistants with your Monitoring system Q2...
VictoriaMetrics
 
PDF
Top Agile Project Management Tools for Teams in 2025
Orangescrum
 
PPTX
In From the Cold: Open Source as Part of Mainstream Software Asset Management
Shane Coughlan
 
PDF
Odoo CRM vs Zoho CRM: Honest Comparison 2025
Odiware Technologies Private Limited
 
PDF
Driver Easy Pro 6.1.1 Crack Licensce key 2025 FREE
utfefguu
 
PDF
SciPy 2025 - Packaging a Scientific Python Project
Henry Schreiner
 
HiHelloHR – Simplify HR Operations for Modern Workplaces
HiHelloHR
 
Unlock Efficiency with Insurance Policy Administration Systems
Insurance Tech Services
 
Human Resources Information System (HRIS)
Amity University, Patna
 
Digger Solo: Semantic search and maps for your local files
seanpedersen96
 
Online Queue Management System for Public Service Offices in Nepal [Focused i...
Rishab Acharya
 
The 5 Reasons for IT Maintenance - Arna Softech
Arna Softech
 
Wondershare PDFelement Pro Crack for MacOS New Version Latest 2025
bashirkhan333g
 
iTop VPN With Crack Lifetime Activation Key-CODE
utfefguu
 
Linux Certificate of Completion - LabEx Certificate
VICTOR MAESTRE RAMIREZ
 
Agentic Automation: Build & Deploy Your First UiPath Agent
klpathrudu
 
Thread In Android-Mastering Concurrency for Responsive Apps.pdf
Nabin Dhakal
 
Revenue streams of the Wazirx clone script.pdf
aaronjeffray
 
Homogeneity of Variance Test Options IBM SPSS Statistics Version 31.pptx
Version 1 Analytics
 
Tally software_Introduction_Presentation
AditiBansal54083
 
Alexander Marshalov - How to use AI Assistants with your Monitoring system Q2...
VictoriaMetrics
 
Top Agile Project Management Tools for Teams in 2025
Orangescrum
 
In From the Cold: Open Source as Part of Mainstream Software Asset Management
Shane Coughlan
 
Odoo CRM vs Zoho CRM: Honest Comparison 2025
Odiware Technologies Private Limited
 
Driver Easy Pro 6.1.1 Crack Licensce key 2025 FREE
utfefguu
 
SciPy 2025 - Packaging a Scientific Python Project
Henry Schreiner
 

Postgres clusters