Clustering in PostgreSQL
Because one database server is never enough
(and neither is two)
Chicago PostgreSQL User Group
15th May 2024
postgres=# select * from umair;
-[ RECORD 1 ]-----------------------------
name | Umair Shahid
description | 20+ year PostgreSQL veteran
company | Stormatics
designation | Founder
location | Islamabad, Pakistan
family | Mom, Wife & 2 kids
kid1 | Son, 17 year old
kid2 | Daughter, 14 year old
Our mission is to help businesses scale
PostgreSQL reliably for critical data
On to the topic now!
What is High Availability?
● Remain operational even in the face of hardware or
software failure
● Minimize downtime
● Essential for mission-critical applications that
require 24/7 availability
● Measured in ‘Nines of Availability’
Nines of Availability
Availability Downtime per year
90% (one nine) 36.53 days
99% (two nines) 3.65 days
99.9% (three nines) 8.77 hours
99.99% (four nines) 52.60 minutes
99.999% (five nines) 5.26 minutes
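The downtime figures follow directly from the availability percentage. As a quick sanity check, you can compute one in psql itself (a throwaway query, purely for illustration):
postgres=# SELECT round(365.25 * 24 * 60 * (1 - 0.99999), 2) AS minutes_per_year;
 minutes_per_year
------------------
             5.26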
But my database resides
in the cloud, and the
cloud is always available
Right?
Wrong!
Amazon RDS Service Level Agreement
Multi-AZ configurations for MySQL, MariaDB, Oracle, and PostgreSQL are
covered by the Amazon RDS Service Level Agreement ("SLA"). The RDS SLA
affirms that AWS will use commercially reasonable efforts to make Multi-AZ
instances of Amazon RDS available with a Monthly Uptime Percentage of at
least 99.95% during any monthly billing cycle. In the event Amazon RDS does
not meet the Monthly Uptime Percentage commitment, affected customers
will be eligible to receive a service credit.*
99.95% = 4.38 hours of downtime per year!
22 minutes of downtime per month!
* https://aws.amazon.com/rds/ha/
So - what do I do if I want
better reliability for my
mission-critical data?
Clustering!
What is clustering?
[Diagram: the application writes to the Primary and reads from the standbys; the Primary replicates to Standby 1 and Standby 2]
● Multiple database servers work
together to provide redundancy
● Gives the appearance of a single
database server
● Application communicates with
the primary PostgreSQL instance
● Data is replicated to standby
instances (see the sketch below)
● Auto failover in case the primary
node goes down
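The replication in this picture is typically PostgreSQL streaming replication. A minimal primary-side sketch, assuming pg_hba.conf and each standby's primary_conninfo are configured separately; the role name, password, and slot name are placeholders:
-- On the primary: a role for the standbys to connect as, and a slot per standby
postgres=# CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';
postgres=# SELECT pg_create_physical_replication_slot('standby1');
-- Each standby then streams WAL using its primary_conninfo and slot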
What is auto failover?
[Diagram, three stages:
1. The Primary node goes down
2. Standby 1 is promoted to Primary, and Standby 2 becomes a subscriber to Standby 1
3. A new Standby is added to the cluster, and the application talks to the new Primary]
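Promotion is the step a cluster manager automates. On PostgreSQL 12 and later it can also be triggered manually on a standby, shown here only to illustrate what the tooling does for you:
-- On the standby being promoted (PostgreSQL 12+)
postgres=# SELECT pg_promote(wait => true);
-- Returns true once the standby has finished promoting to primary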
Clusters with load balancing
● Write to the primary PostgreSQL
instance and read from standbys
● Data redundancy through
replication to two standbys
● Auto failover in case the
primary node goes down
[Diagram: the application writes to the Primary and reads from Standby 1 and Standby 2; the Primary replicates to both standbys]
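One simple way to get the read/write split is client-side routing: libpq (and therefore psql) accepts multiple hosts plus a target_session_attrs parameter. The host names below are placeholders, and prefer-standby needs PostgreSQL 14+ client libraries:
-- Writes: connect to whichever listed host accepts read-write sessions
psql "host=pg1,pg2,pg3 target_session_attrs=read-write"
-- Reads: favor a standby when one is available
psql "host=pg1,pg2,pg3 target_session_attrs=prefer-standby"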
Clusters with backups and disaster recovery
● Off-site backups
● RTO and RPO requirements
dictate configuration
● Point-in-time recovery
It is extremely important to periodically test your backups
[Diagram: the application writes to the Primary and reads from the standbys; the Primary replicates to Standby 1 and Standby 2, and backups flow to off-site storage]
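Point-in-time recovery relies on continuous WAL archiving; a named restore point gives you a human-readable recovery target. A minimal sketch, assuming archive_command is already set up (the restore point name is a placeholder):
-- Before a risky change, mark a recovery target in the WAL
postgres=# SELECT pg_create_restore_point('before_schema_migration');
-- A later restore can then set recovery_target_name = 'before_schema_migration'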
Multi-node clusters with Active-Active configuration*
● Shared-Everything architecture
● Load balancing for read as well
as write operations
● Database redundancy to achieve
high availability
● Asynchronous replication
between nodes for better
efficiency
* with conflict resolution at the application layer
[Diagram: the application reads from and writes to Active 1, Active 2, and Active 3, which replicate to one another]
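Native logical replication covers one direction of such a cluster. A minimal sketch of a single publisher/subscriber pair; a real active-active setup mirrors this in the other direction and still needs conflict handling. The table, database, and host names are placeholders:
-- On Active 1: publish changes to a table
postgres=# CREATE PUBLICATION active_pub FOR TABLE orders;
-- On Active 2 (the table must already exist here too)
postgres=# CREATE SUBSCRIPTION active_sub
               CONNECTION 'host=active1 dbname=appdb user=replicator'
               PUBLICATION active_pub;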
Multi-node clusters with data sharding and horizontal scaling
[Diagram: the application sends reads and writes to a Coordinator, which routes them to Node 1, Node 2, and Node 3]
● Shared-Nothing architecture
● Automatic data sharding based
on defined criteria (see the
sketch below)
● Read and write operations are
automatically directed to the
relevant node
● Each node can have its own
standbys for high availability
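PostgreSQL core does not shard across nodes on its own; the coordinator pattern above is what extensions such as Citus implement (named here purely as an illustration, not something the slides prescribe). A minimal sketch, assuming the extension is installed and worker nodes are registered, with a hypothetical orders table:
postgres=# CREATE EXTENSION citus;
-- Spread the orders table across worker nodes by customer_id
postgres=# SELECT create_distributed_table('orders', 'customer_id');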
Globally distributed clusters
● Spin up clusters on the cloud,
on-prem, bare metal, VMs, or
a hybrid of the above
● Geo fencing for regulatory
compliance and better local
performance
● High availability across data
centers and geographies
Replication - Synchronous vs Asynchronous
Synchronous
● Data is transferred immediately
● Transaction waits for confirmation from
the replica before it commits
● Ensures data consistency across all
nodes
● Performance overhead caused by latency
● Used where data accuracy is critical, even
at the expense of performance
Asynchronous
● Data may not be transferred immediately
● Transaction commits without waiting for
confirmation from the replica
● Data may be inconsistent across nodes
● Faster and more scalable
● Used where performance matters more
than data accuracy
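In PostgreSQL this trade-off is configuration, not a separate product. A minimal primary-side sketch; 'standby1' and 'standby2' are placeholders that must match each standby's application_name:
-- Require confirmation from one of the two listed standbys before commit
postgres=# ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (standby1, standby2)';
postgres=# ALTER SYSTEM SET synchronous_commit = 'on';
postgres=# SELECT pg_reload_conf();
-- Setting synchronous_standby_names = '' reverts to asynchronous replication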
Challenges in Clustering
● Split brain
● Network latency
● False alarms
● Data inconsistency
Split Brain
Defined
Nodes in a highly available cluster lose
connectivity with each other but continue to
function independently
Challenge
More than one node believes that it is the
primary, leading to inconsistencies and
possible data loss
Split Brain - Prevention
● Use a reliable cluster manager
○ Algorithms and heartbeat mechanisms to
monitor node availability
○ Make decisions about failovers and
promotions
● Quorum-based decision making
○ Majority of nodes must agree on the
primary node’s status
○ Requires an odd number of nodes
● Witness server
○ Used to achieve a majority in an even-node
cluster
○ Does not store data
● Network reliability and redundancy
○ Minimize the risk of partitions due to
connectivity issues
○ Redundant network hardware and paths
between nodes
○ Reliable cross-datacenter connectivity
● Miscellaneous
○ Monitoring and alerts
○ Regular testing
○ Clear and precise documentation
○ Training
Split Brain - Resolution
1. Identify the situation
○ Monitoring and alerting is crucial
2. Stop traffic
○ Application will need to pause
3. Determine the most up-to-date node
○ Compare transaction logs, timestamps,
transaction IDs, etc. (see the query
sketch below)
4. Isolate the nodes from each other
○ Prevent further replication so outdated
data does not overwrite the latest data
5. Restore data consistency
○ Apply missed transactions
○ Resolve data conflicts
6. Reconfigure replication
○ Make the most up-to-date node the
primary
○ Reinstate the remaining nodes as replicas
7. Confirm integrity of the cluster
○ Monitor and double-check replication
8. Re-enable traffic
○ Allow read-only traffic, confirm reliability,
then allow write operations
9. Run a retrospective
○ Thorough analysis of the incident to
prevent future occurrences
○ Update docs and training to capture the
cause of split brain
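For step 3, WAL positions (LSNs) are the usual yardstick. A minimal sketch, run on each node that was acting as a primary; the literal LSNs in the diff are illustrative only:
postgres=# SELECT pg_current_wal_lsn();      -- on a node acting as primary
postgres=# SELECT pg_last_wal_replay_lsn();  -- on a node still in recovery
postgres=# SELECT pg_wal_lsn_diff('0/3000060', '0/2000000');  -- difference in bytes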
Challenges in Clustering
Network Latency
Defined
Time delay when data is transmitted from
one point to another
Challenge
Delayed replication can result in data loss.
Delayed signals can trigger a false positive
for failover.
Network Latency - Causes
● Network congestion
● Low-quality network hardware
● Distance between nodes
● Virtualization overheads
● Bandwidth limitations
● Security devices and policies
● Transmission medium
Network Latency - Prevention of False Positives
● Employ redundancy
○ Network paths as well as health checks
● Best practices
○ Test and simulate various network
conditions
○ Monitoring and alerting for early detection
of problems
○ Documentation of the rationale behind the
values chosen
○ Periodic training
● Adjust heartbeat & timeout settings
○ Fine-tune the frequency of heartbeats and
timeouts to match typical network behavior
(see the sketch below)
● High-speed & low-latency network
○ Investing in high-quality networking pays
dividends
● Quorum-based decision making
○ Majority of nodes must agree on the
primary node’s status
○ Requires an odd number of nodes or a
witness node as a tie-breaker
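On the PostgreSQL side, the replication-level timeouts are ordinary settings; cluster managers layer their own heartbeat and TTL settings on top of these. The values below are the defaults, shown for illustration rather than as recommendations:
postgres=# ALTER SYSTEM SET wal_sender_timeout = '60s';    -- primary drops a silent standby
postgres=# ALTER SYSTEM SET wal_receiver_timeout = '60s';  -- standby drops a silent primary
postgres=# SELECT pg_reload_conf();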
Challenges in Clustering
False Alarms
Defined
A problem is reported, but in reality, there is
no issue
Challenge
Can trigger a failover when one isn’t
required, leading to unnecessary disruptions
and impacting performance
False Alarms - Causes
● Network issues
○ Latency, congestion, misconfiguration
● Configuration errors
○ Thresholds set too low?
● Resource constraints
○ High CPU load, memory pressure, I/O bottleneck
● Human error
○ Misreading information, miscommunication of scheduled maintenance, …
● Database locks
○ Long-running queries holding exclusive locks (see the query sketch below)
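A health check stuck behind a lock is a classic false alarm. A minimal sketch for spotting long-running statements; the 5-minute threshold is an arbitrary example:
postgres=# SELECT pid, now() - query_start AS runtime, state, query
           FROM pg_stat_activity
           WHERE state <> 'idle'
             AND now() - query_start > interval '5 minutes'
           ORDER BY runtime DESC;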
False Alarms - Prevention
● Optimized thresholds
○ Best practices, past experience, and some trial and error are required to ensure that the
thresholds are configured appropriately
● Regular upgrades and testing
○ Use the latest versions of software and firmware
○ Testing various use cases can help identify possible misconfigurations
● Resource and performance optimization
○ Regularly monitor resource utilization and tune queries and database for performance
○ Maintenance tasks like vacuum, analyze, …
● Comprehensive monitoring and alerting
○ Monitoring can help with early detection of anomalies
○ Alerts can give early warnings as the database approaches defined thresholds
Challenges in Clustering
Data Inconsistency
Defined
Situations where data in different nodes of a
cluster becomes out of sync, leading to
inconsistent results and potential data
corruption
Challenge
Inaccurate query results that vary based on
which node is queried. Such issues are very
hard to debug.
Data Inconsistency - Causes
● Replication lag
○ Network latency and high workloads can be big contributors
○ Data loss in case of failover
● Split brain
● Incorrect configuration
○ Log shipping configuration
○ Replication slot setup
○ Replication filters
Data Inconsistency - Prevention
● Closely manage asynchronous replication
○ Monitor pg_stat_replication for replication lag (see the query sketch below)
○ Place nodes in close proximity and use high-quality network hardware
● Regularly check transaction ID (XID) age across the cluster
● Monitor replication conflicts and resolve them promptly
● Regular maintenance and performance optimization
○ Vacuum, analyze, …
○ Watch for XID wraparound
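The two checks referenced above, as minimal sketches. The first, run on the primary, reports per-standby lag; the second reports XID age per database (age climbing toward ~2 billion means wraparound trouble):
postgres=# SELECT application_name, state, replay_lag,
                  pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
           FROM pg_stat_replication;
postgres=# SELECT datname, age(datfrozenxid) AS xid_age
           FROM pg_database ORDER BY xid_age DESC;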
This all sounds really hard
Open source clustering tools for PostgreSQL
● repmgr
○ https://repmgr.org/
○ GPL v3
○ Provides automatic failover
○ Manage and monitor replication
● pgpool-II
○ https://pgpool.net/
○ Similar to BSD & MIT
○ Middleware between PostgreSQL and client applications
○ Connection pooling, load balancing, caching, and automatic failover
● Patroni
○ https://patroni.readthedocs.io/en/latest/
○ MIT
○ Template for PostgreSQL high availability clusters
○ Automatic failover, configuration management, & cluster management
Questions?
pg_umair