Clustering in PostgreSQL
Because one database server is never enough
(and neither is two)
postgres=# select * from umair;
-[ RECORD 1 ]-----------------------------
name | Umair Shahid
description | 20+ year PostgreSQL veteran
company | Stormatics
designation | Founder
location | Islamabad, Pakistan
family | Mom, Wife & 2 kids
kid1 | Son, 17 year old
kid2 | Daughter, 14 year old
Our mission is to help businesses scale PostgreSQL
reliably for their mission-critical data
On to the topic now!
What is High Availability?
● Remain operational even in the face of hardware or
software failure
● Minimize downtime
● Essential for mission-critical applications that require
24/7 availability
● Measured in ‘Nines of Availability’
Nines of Availability
Availability Downtime per year
90% (one nine) 36.53 days
99% (two nines) 3.65 days
99.9% (three nines) 8.77 hours
99.99% (four nines) 52.60 minutes
99.999% (five nines) 5.26 minutes
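The downtime figures above are simple arithmetic: multiply the minutes in a year by the allowed unavailability. A purely illustrative check in psql for four nines:
-- Minutes of allowed downtime per year at 99.99% availability (illustrative arithmetic only)
SELECT round(365.25 * 24 * 60 * (1 - 0.9999), 2) AS minutes_per_year;
-- => 52.60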
But my database resides
in the cloud, and the
cloud is always available
Right?
Wrong!
Amazon RDS Service Level Agreement
Multi-AZ configurations for MySQL, MariaDB, Oracle, and PostgreSQL are
covered by the Amazon RDS Service Level Agreement ("SLA"). The RDS SLA
affirms that AWS will use commercially reasonable efforts to make Multi-AZ
instances of Amazon RDS available with a Monthly Uptime Percentage of at
least 99.95% during any monthly billing cycle. In the event Amazon RDS does
not meet the Monthly Uptime Percentage commitment, affected customers
will be eligible to receive a service credit.*
99.95% = 4.38 hours of downtime per year!
22 minutes of downtime per month!
* https://aws.amazon.com/rds/ha/
So - what do I do if I want
better reliability for my
mission-critical data?
Clustering!
What is clustering?
[Diagram: the Application reads from and writes to the Primary, which replicates to Standby 1 and Standby 2]
● Multiple database servers work
together to provide redundancy
● Gives the appearance of a single
database server
● Application communicates with
the primary PostgreSQL instance
● Data is replicated to standby
instances
● Auto failover in case the primary
node goes down
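As a minimal sketch of what "data is replicated to standby instances" looks like in practice, the primary can expose a physical replication slot per standby and report each standby's progress (the slot name below is illustrative):
-- On the primary: create a physical replication slot for a standby to attach to
SELECT pg_create_physical_replication_slot('standby1_slot');
-- On the primary: see which standbys are streaming and how far they have replayed
SELECT application_name, client_addr, state, sent_lsn, replay_lsn
FROM pg_stat_replication;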
What is auto failover?
[Diagram: failover sequence shown in three stages]
● Primary node goes down
● Standby 1 gets promoted to Primary
● Standby 2 becomes a subscriber to Standby 1
● A new Standby is added to the cluster
● Application talks to the new Primary
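A hedged sketch of the promotion step itself: on PostgreSQL 12 and later, the standby chosen as the new primary can be promoted with a single call (a cluster manager normally does this for you):
-- Run on the standby that should become the new primary
SELECT pg_promote(wait => true);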
Clusters with load balancing
● Write to the primary PostgreSQL instance and read from standbys
● Data redundancy through replication to two standbys
● Auto failover in case the primary node goes down
[Diagram: the Application writes to the Primary and reads from the standbys; the Primary replicates to Standby 1 and Standby 2]
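One way a load balancer or the application can decide where to send reads versus writes is to ask each node whether it is a standby; a minimal sketch:
-- Returns true on a standby (read-only), false on the primary (read-write)
SELECT pg_is_in_recovery();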
Clusters with backups and disaster recovery
● Off-site backups
● RTO and RPO requirements
dictate configuration
● Point-in-time recovery
It is extremely important to periodically test your backups
[Diagram: the Application writes to the Primary and reads from the standbys; data is replicated to Standby 1 and Standby 2, with backups taken off-site]
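Point-in-time recovery depends on continuously archiving WAL alongside base backups. A minimal sketch of turning archiving on (the archive destination path is illustrative):
-- Archive every completed WAL segment to storage that is synced off-site
ALTER SYSTEM SET archive_mode = 'on';
ALTER SYSTEM SET archive_command = 'cp %p /backup/wal/%f';
-- archive_command is picked up on reload; archive_mode itself requires a server restart
SELECT pg_reload_conf();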
Multi-node clusters with Active-Active configuration*
● Shared-Everything architecture
● Load balancing for read as well as write operations
● Database redundancy to achieve high availability
● Asynchronous replication between nodes for better efficiency
* with conflict resolution at the application layer
[Diagram: the Application writes to and reads from Active 1, Active 2, and Active 3, which replicate between each other]
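Many active-active setups are built on logical replication. A minimal two-node sketch, assuming a table t exists on both nodes and leaving conflict handling to the application, as the slide notes (node names and the connection string are illustrative):
-- On node Active 1: publish changes to table t
CREATE PUBLICATION active1_pub FOR TABLE t;
-- On node Active 2: subscribe to Active 1
CREATE SUBSCRIPTION active2_sub
  CONNECTION 'host=active1 dbname=appdb user=replicator'
  PUBLICATION active1_pub;
A symmetric publication and subscription in the opposite direction would complete the pair; care is needed to avoid replicating the same change back and forth in a loop.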
Multi-node clusters with data sharding and horizontal scaling
[Diagram: the Application sends reads and writes to a Coordinator, which routes them to Node 1, Node 2, or Node 3]
● Shared-Nothing architecture
● Automatic data sharding based
on defined criteria
● Read and write operations are
auto directed to the relevant
node
● Each node can have its own
standbys for high availability
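Purely as an illustration of "automatic data sharding based on defined criteria", native declarative partitioning shows how rows can be routed by a shard key; spreading those partitions across separate nodes requires a coordinator layer, which the diagram's Coordinator represents (the schema below is hypothetical):
-- Rows are routed to a partition by hashing the shard key
CREATE TABLE orders (
    order_id bigint,
    region   text,
    total    numeric
) PARTITION BY HASH (order_id);
CREATE TABLE orders_p0 PARTITION OF orders FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE orders_p1 PARTITION OF orders FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE orders_p2 PARTITION OF orders FOR VALUES WITH (MODULUS 3, REMAINDER 2);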
Globally distributed clusters
● Spin up clusters on the cloud,
on-prem, bare metal, VMs, or
a hybrid of the above
● Geo fencing for regulatory
compliance and better local
performance
● High availability across data
centers and geographies
Replication - Synchronous vs Asynchronous
Synchronous
● Data is transferred immediately
● Transaction waits for confirmation from replica before it commits
● Ensures data consistency across all nodes
● Performance overhead caused by latency
● Used where data accuracy is critical, even at the expense of performance
Asynchronous
● Data may not be transferred immediately
● Transaction commits without waiting for confirmation from replica
● Data may be inconsistent across nodes
● Faster and more scalable
● Used where performance matters more than data accuracy
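A minimal sketch of switching to synchronous replication; the standby names are illustrative and must match each standby's application_name for commits to wait on them:
-- Wait for at least one of the named standbys to confirm each commit
ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (standby1, standby2)';
ALTER SYSTEM SET synchronous_commit = 'on';
SELECT pg_reload_conf();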
#AI
Challenges in
Clustering
● Split brain
● Network latency
● False alarms
● Data inconsistency
Split Brain
Defined
Nodes in a highly available cluster lose connectivity with each other but continue to function independently
Challenge
More than one node believes that it is the primary, leading to inconsistencies and possible data loss
Split Brain - Prevention
● Use a reliable cluster manager
○ Algorithms and heartbeat mechanisms to monitor node availability
○ Make decisions about failovers and promotions
● Quorum-based decision making
○ Majority of nodes must agree on the primary node’s status
○ Requires an odd number of nodes
● Witness server
○ Used to achieve a majority in an even-node cluster
○ Does not store data
● Network reliability and redundancy
○ Minimize the risk of partitions due to connectivity issues
○ Redundant network hardware and paths between nodes
○ Reliable cross-datacenter connectivity
● Miscellaneous
○ Monitoring and alerts
○ Regular testing
○ Clear and precise documentation
○ Training
Split Brain - Resolution
1. Identify the situation
○ Monitoring and alerting is crucial
2. Stop traffic
○ Application will need to pause
3. Determine the most up-to-date node
○ Compare transaction logs, timestamps, transaction IDs, etc. (see the query sketch after this list)
4. Isolate the nodes from each other
○ Prevent further replication so outdated data does not overwrite the latest
5. Restore data consistency
○ Apply missed transactions
○ Resolve data conflicts
6. Reconfigure replication
○ Make the most up-to-date node the primary
○ Reinstate the remaining nodes as replicas
7. Confirm integrity of the cluster
○ Monitor and double-check replication
8. Re-enable traffic
○ Allow read-only traffic, confirm reliability, then allow write operations
9. Run a retrospective
○ Thorough analysis of the incident to prevent future occurrences
○ Update docs and training to capture the cause of split brain
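For step 3, a minimal sketch of comparing how far each node's WAL has advanced, run on every node that was accepting traffic during the split:
-- On a node that was acting as primary (not in recovery)
SELECT pg_current_wal_lsn();
-- On a node still in recovery (standby)
SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();
-- The node with the highest LSN holds the most recent data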
Challenges in
Clustering
● Split brain
● Network latency
● False alarms
● Data inconsistency
Network Latency
Defined
Time delay when data is transmitted from one point to another
Challenge
Delayed replication can result in data loss. Delayed signals can trigger a false positive for failover.
Network Latency - Causes
● Network congestion
● Low quality network hardware
● Distance between nodes
● Virtualization overheads
● Bandwidth limitations
● Security devices and policies
● Transmission medium
Network Latency - Prevention of False Positives
● Adjust heartbeat & timeout settings
○ Fine-tune the frequency of heartbeats and timeouts to match typical network behavior
● High speed & low latency network
○ Investing in high quality networking pays dividends
● Quorum-based decision making
○ Majority of nodes must agree on the primary node’s status
○ Requires an odd number of nodes or a witness node as tie-breaker
● Employ redundancy
○ Network paths as well as health checks
● Best practices
○ Test and simulate various network conditions
○ Monitoring and alerting for early detection of problems
○ Documentation of the rationale behind values chosen
○ Periodic training
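On the PostgreSQL side, the replication-level timeouts can be tuned to match observed network behavior so brief hiccups do not look like dead nodes; a minimal sketch with illustrative values (cluster-manager heartbeats, e.g. in Patroni or repmgr, are configured separately):
-- How long the primary waits before deciding a standby connection is dead
ALTER SYSTEM SET wal_sender_timeout = '60s';
-- How long a standby waits before deciding the primary connection is dead
ALTER SYSTEM SET wal_receiver_timeout = '60s';
SELECT pg_reload_conf();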
Challenges in
Clustering
● Split brain
● Network latency
● False alarms
● Data inconsistency
False Alarms
Defined
A problem is reported, but in reality, there is no issue
Challenge
Can trigger a failover when one isn’t required, leading to unnecessary disruptions and impacting performance
False Alarms - Causes
● Network issues
○ Latency, congestion, misconfiguration
● Configuration errors
○ Thresholds set too low?
● Resource constraints
○ High CPU load, memory pressure, I/O bottleneck
● Human error
○ Misreading information, miscommunication of scheduled maintenance, …
● Database locks
○ Long running queries with exclusive locks
False Alarms - Prevention
● Optimized thresholds
○ Best practices, past experience, and some trial & error are required to ensure that the thresholds
are configured appropriately
● Regular upgrades and testing
○ Use the latest versions of software and firmware
○ Testing of various use cases can help identify possible misconfigurations
● Resource and performance optimization
○ Regularly monitor resource utilization and tune queries and database for performance
○ Maintenance tasks like vacuum, analyze, …
● Comprehensive monitoring and alerting
○ Monitoring can help with early detection of anomalies
○ Alerts can give early warnings as the database approaches defined thresholds
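Since long-running queries holding exclusive locks are a common source of false alarms, a minimal monitoring sketch (the 5-minute threshold is an illustrative value):
-- Sessions whose current transaction has been open for more than 5 minutes
SELECT pid, now() - xact_start AS xact_age, state, wait_event_type, query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND xact_start IS NOT NULL
  AND now() - xact_start > interval '5 minutes'
ORDER BY xact_age DESC;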
Challenges in
Clustering
● Split brain
● Network latency
● False alarms
● Data inconsistency
Data Inconsistency
Defined
Situations where data in different nodes of a cluster becomes out of sync, leading to inconsistent results and potential data corruption
Challenge
Inaccurate query results that vary based on which node is queried. Such issues are very hard to debug.
Data Inconsistency - Causes
● Replication lag
○ Network latency and high workloads can be big contributors
○ Data loss in case of failover
● Split brain
● Incorrect configuration
○ Log shipping configurations
○ Replication slots setup
○ Replication filters
Data Inconsistency - Prevention
● Closely manage asynchronous replication
○ Monitor pg_stat_replication for replication lag (see the query sketch after this list)
○ Place nodes in close proximity and use high quality network hardware
● Regularly check XID across the cluster
● Monitor replication conflicts and resolve promptly
● Regular maintenance and performance optimization
○ Vacuum, analyze, …
○ XID wraparound
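A minimal monitoring sketch for the first and last bullets, run on the primary:
-- Replication lag per standby, as bytes of WAL not yet replayed
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
-- Age of the oldest transaction ID per database, to keep an eye on XID wraparound
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;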
This all sounds really hard
Open source clustering tools for PostgreSQL
● Repmgr
○ https://repmgr.org/
○ GPL v3
○ Provides automatic failover
○ Manage and monitor replication
● pgpool-II
○ https://pgpool.net/
○ Similar to BSD & MIT
○ Middleware between PostgreSQL and client applications
○ Connection pooling, load balancing, caching, and automatic failover
● Patroni
○ https://patroni.readthedocs.io/en/latest/
○ MIT
○ Template for PostgreSQL high availability clusters
○ Automatic failover, configuration management, & cluster management
Questions?
pg_umair