A couple of weeks ago, I attended a PGDay event in Blumenau, a city not far from where I live in Brazil. Opening the day were former Percona colleagues Marcelo Altmann and Wagner Bianchi, showcasing ReadySet’s support for PostgreSQL. ReadySet is a source-available database cache service that differs from other solutions by not relying on a timer-based approach to expire/invalidate cached results: it uses PostgreSQL’s (or MySQL’s) own replication stream for that. In other words, ReadySet acts as a replication client and keeps its cached data in sync by applying the row changes it sees in the replication flow.

My understanding is that the risk of accessing stale data should thus be limited to how fast ReadySet can sync its cached data. In a way, that’s not very different from a replica keeping its data in sync while avoiding replication lag. The expectation is for ReadySet to perform this operation more efficiently (that’s where the caching should shine) while returning data to clients faster and, ideally, supporting more requests at the same time (higher concurrency).

As for ProxySQL, it wasn’t on my initial plan to give it a try. This tool was a game changer in the MySQL ecosystem in so many ways, and I have been looking for an opportunity to test it with PostgreSQL since they started supporting it (although that support is not GA yet), so I couldn’t resist. But I will only be using ProxySQL’s query rules feature for splitting traffic, not its own caching feature. Finally, why not test with HAproxy too, since it was already installed, right?

This post became too long, so I divided it into two parts: in this initial part, I outline my test environment and methodology and explain how to install ReadySet and configure it to work with PostgreSQL. I do the same for ProxySQL and HAproxy, although in less detail. In part two, I present the test results and discuss them.

A simple test suite

What I had in mind was a simple experiment: I wanted to take advantage of a test environment I had deployed for another project and give ReadySet a try. It ended up evolving into something a bit bigger, but it helped me better understand how the different technologies compare, and I trust it will help you too.

Environment

The elements of this particular test environment that are relevant here are the main Patroni cluster with two nodes (a primary and a standby server), an application server used to generate load, and an additional server added to host ReadySet, all running on small GCP cloud instances:

  • The Patroni nodes and the ReadySet server with 4 vCPUs (2 cores), 4 GB of memory, and dedicated data disks:
    • Primary: 10.128.0.84
    • Standby: 10.128.0.67
    • ReadySet: 10.128.0.113
  • The application server with 2 vCPUs (1 core) and 2 GB of memory:
    • App: 10.128.0.112

The essence of the PostgreSQL 17.5.1 configuration is shown below, in Patroni’s yaml format:
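My full configuration isn’t reproduced here; the sketch below only illustrates its shape, following Patroni’s bootstrap/DCS layout. The values are placeholders rather than the ones I used, with the exception of wal_level, which must be logical for ReadySet’s replication-based approach to work:

    bootstrap:
      dcs:
        postgresql:
          parameters:
            wal_level: logical            # required for logical replication (ReadySet)
            max_wal_senders: 10           # illustrative values from here on
            max_replication_slots: 10
            max_connections: 200
            shared_buffers: 1GB
            effective_cache_size: 3GB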

Methodology

On the application server, I compiled Sysbench from source with support for PostgreSQL:
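The exact commands aren’t reproduced here, but a typical from-source build with the PostgreSQL driver enabled looks like the sketch below (package names are for Ubuntu and may vary):

    # build dependencies, including the PostgreSQL client library
    sudo apt-get install -y git make automake libtool pkg-config libpq-dev

    git clone https://github.com/akopytov/sysbench.git
    cd sysbench
    ./autogen.sh
    ./configure --with-pgsql --without-mysql
    make -j"$(nproc)"
    sudo make install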

and created an OLTP test database with eight 10M-row tables:
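The prepare step follows the pattern below; the database name matches the one referenced later in the post, while the user and password are placeholders for my setup:

    sysbench oltp_read_write \
      --db-driver=pgsql \
      --pgsql-host=10.128.0.84 --pgsql-port=5432 \
      --pgsql-user=sysbench --pgsql-password=<password> \
      --pgsql-db=sysbench \
      --tables=8 --table-size=10000000 \
      prepare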

resulting in a dataset of approximately 20 GB:
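One way to confirm the size, for reference:

    SELECT pg_size_pretty(pg_database_size('sysbench'));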

I ran VACUUM ANALYZE on the sysbench database a single time after populating it:
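In practice, that is the equivalent of the following (the account used is an assumption):

    psql -h 10.128.0.84 -U postgres -d sysbench -c "VACUUM ANALYZE;"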

The actual test consisted of running the Sysbench OLTP Read-Write and OLTP Read-Only workloads for 20 minutes for the different scenarios I envisioned, which will be described in the follow-up post (because it’s easier to make sense of them once we understand how ReadySet works and how ProxySQL and HAproxy are configured):
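The run step mirrors the prepare command shown earlier; a sketch, with the 20-minute duration expressed in seconds and the thread count varying per scenario:

    sysbench oltp_read_write \
      --db-driver=pgsql \
      --pgsql-host=10.128.0.84 --pgsql-port=5432 \
      --pgsql-user=sysbench --pgsql-password=<password> \
      --pgsql-db=sysbench \
      --tables=8 --table-size=10000000 \
      --threads=<N> --time=1200 --report-interval=10 \
      run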

  • For the Read-Only test, I used the oltp_read_only.lua script instead.
  • For each test scenario, I adjusted the --pgsql-host and --pgsql-port to the target server/connection point each time.

The tests were run with an interval of five minutes between them.

Installing and configuring ReadySet

I installed ReadySet using the Ubuntu package:
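Roughly as follows; the package file name is illustrative, so download the .deb that matches your architecture from ReadySet’s releases page:

    # install the downloaded package (file name is an example)
    sudo apt-get install -y ./readyset_<version>_amd64.deb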

I created a separate directory to store the service logs:
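Nothing fancy; the path is simply my choice, and the ownership depends on the user the service runs as:

    sudo mkdir -p /var/log/readyset
    sudo chown readyset: /var/log/readyset   # adjust to the service user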

And configured the service as follows:
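I won’t paste my exact file, but it boils down to a handful of environment variables picked up by the service. The file location, the listen-address variable, account names, and values below are assumptions/placeholders; the other variable names are the ones discussed in this post:

    # /etc/readyset/readyset.conf (path is an assumption; adjust to your package's layout)
    # Regular traffic, routed to the primary; the user needs read/write access to the target database
    UPSTREAM_DB_URL=postgresql://sysbench:<password>@10.128.0.84:5432/sysbench
    # Replication (snapshot + CDC); a superuser connection to the same database
    CDC_DB_URL=postgresql://postgres:<password>@10.128.0.84:5432/sysbench
    # Address ReadySet listens on for client connections
    LISTEN_ADDRESS=0.0.0.0:5433
    # Memory cap; value shown is illustrative (leave headroom for the OS)
    READYSET_MEMORY_LIMIT=3221225472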

Note that I’ve used two database accounts above for two different connection strings:

  • UPSTREAM_DB_URL: Used to route write and non-cacheable statements to the Primary server; the database user needs to have read and write access to the target database.
  • CDC_DB_URL: Used to replicate data; my understanding is that we need to connect to the primary with a superuser account, as ReadySet needs to create a replication slot, copy the initial data (snapshot), and start a replication process. Note that I’m connecting to my target database (sysbench) here as well, and not to the main postgres one.

In my initial experiments, I used a smaller cloud instance to host ReadySet, of the same type and size as the one hosting the application server, described above. Limiting the amount of memory ReadySet can use (READYSET_MEMORY_LIMIT) was essential on such a small test server; without that, ReadySet may not leave much memory for the OS, and performance will suffer (watch out for messages of the type “readyset_util::time_scope: operation took X seconds” in the log file). For reasons I will explain later, I have since upgraded the server to a bigger cloud instance, but I continued to leave 1 GB of memory unallocated, indirectly reserving it for the OS.

You can then start the service:
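Assuming the systemd unit installed by the package is named readyset:

    sudo systemctl enable --now readyset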

Once ReadySet is running, if you point database connections to it, it will at this stage simply route all requests to the primary (UPSTREAM_DB_URL), and it will continue to do so until the following requirements are met:

  1. The initial data snapshot is completed.
  2. You define at least one query to be cached by ReadySet.

Initial snapshot and data replication

I thought ReadySet would only keep records of the data it is configured to cache, but I was wrong. Once the service is started, it will initiate a snapshot process of the target data (using PostgreSQL logical replication, hence the need for setting wal_level to logical), which you can follow in its log file:

Alternatively, you can query its current state with the command SHOW READYSET STATUS (the output below is from a previous snapshot run):

Once the initial snapshot process is completed, ReadySet starts consuming replication data:

If you query the pg_stat_replication view on the primary server, you will see its connection state change from startup to streaming:
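For example, with a query along these lines:

    SELECT application_name, client_addr, state
      FROM pg_stat_replication;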

and ReadySet status changes to Online:

From now on, ReadySet will start serving cached data – that is, if you have any defined.

Caching data

If we start running traffic through ReadySet, we can see all query requests with the command SHOW PROXIED QUERIES. ReadySet’s documentation explains its current limitations for query support. We can restrict the output of that command to queries marked as SUPPORTED by ReadySet, that is, those that can be cached:

Note that you may find in that list queries that, while supported by ReadySet, employ exact values (constants) for certain parameters:

Unless your workload tends to employ the same values repeatedly, these might not be good candidates for caching. After running the Sysbench OLTP read-write workload against the primary, I found that the majority of it is composed of the very simple queries at the top of the list above:


Thus, I decided to limit my test on ReadySet to those eight queries, each targeting a different test table, and created cache definitions for them using their query IDs:
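The statements take the general form below, with a made-up query ID standing in for the real ones reported by SHOW PROXIED QUERIES:

    CREATE CACHE FROM q_0000000000000001;  -- hypothetical query ID; repeated for each of the eight queries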

Providing the full query instead also works:
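For instance, using the query shape quoted later in this post (the parameter placeholder may appear as $1 in your listing):

    CREATE CACHE FROM SELECT c FROM sbtest1 WHERE id = ?;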

We can confirm the queries have been added to the “cacheable” list with the SHOW CACHES command:

They will also show up in the list of “materializations”, alongside our target tables:

ProxySQL

ProxySQL is another open source proxy but, different from HAproxy, it “speaks” the MySQL protocol (or, in their words, it is “database protocol aware”) and, since version 3.0, PostgreSQL’s as well. This opens up space to explore many different features, including query caching (using a more traditional TTL-based implementation). Ironically, I had a different feature in mind for this project: query rules.

With ReadySet, we are effectively splitting traffic between two servers: read-only queries of the type “SELECT c FROM sbtestX WHERE id=?” are processed by the ReadySet server, whereas everything else (other SELECT queries and all writes) is routed to the PostgreSQL primary. Query rules in ProxySQL allow us to take a similar approach, sending the target SELECT queries to a PostgreSQL standby server. It is not quite the same thing, because there is a chance the standby server has stale data due to replication lag, whereas ReadySet should protect against this. Still, I was curious to see what performance would result from splitting the traffic between two servers in this other way.

Installing ProxySQL

I just followed their documentation to configure their Ubuntu repository and install ProxySQL 3.0.1 on the application server itself:
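I won’t repeat the repository setup here (it is a couple of lines in their documentation); once the APT source is in place, the installation is just:

    sudo apt-get update
    sudo apt-get install -y proxysql
    sudo systemctl enable --now proxysql   # service name as shipped by the package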

Configuring ProxySQL

We configure ProxySQL by accessing the administration “door” (port 6132):
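With psql, and assuming the stock admin credentials (admin/admin) are still in place:

    psql -h 127.0.0.1 -p 6132 -U admin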

I don’t intend to provide a detailed explanation of how ProxySQL works in this post, so I will restrict myself to basic instructions; a consolidated sketch of the statements involved follows this list:

  • Provide the credentials used to access the database servers and check their health status (I’m using the superuser account below, but you don’t have to):

  • We configure two hostgroups: one to send writes (id 1), to which we add the primary server, and another one to distribute reads (id 2) according to the rules we will create later, to which we add the standby server:

Here’s a clearer view of the hostgroups and their members:

  • Clients need to authenticate to the ProxySQL server, so we add the database user used by our application:

  • Finally, we create the query rules in a similar way to how we did for ReadySet’s query cache, and indicate that those queries should be sent to hostgroup 2:
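Putting the pieces above together, this is roughly what those statements look like. The pgsql_* table and variable names mirror ProxySQL’s familiar mysql_* ones, as introduced in version 3.0, but treat the exact names, credentials, and matching expression as assumptions to validate against the documentation:

    -- Monitoring credentials used for health checks
    UPDATE global_variables SET variable_value = 'monitor'
     WHERE variable_name = 'pgsql-monitor_username';
    UPDATE global_variables SET variable_value = '<monitor password>'
     WHERE variable_name = 'pgsql-monitor_password';
    LOAD PGSQL VARIABLES TO RUNTIME; SAVE PGSQL VARIABLES TO DISK;

    -- Hostgroup 1 = writes (primary), hostgroup 2 = reads (standby)
    INSERT INTO pgsql_servers (hostgroup_id, hostname, port) VALUES (1, '10.128.0.84', 5432);
    INSERT INTO pgsql_servers (hostgroup_id, hostname, port) VALUES (2, '10.128.0.67', 5432);
    LOAD PGSQL SERVERS TO RUNTIME; SAVE PGSQL SERVERS TO DISK;

    -- Application user; anything not matched by a rule goes to its default hostgroup (writes)
    INSERT INTO pgsql_users (username, password, default_hostgroup)
    VALUES ('sysbench', '<password>', 1);
    LOAD PGSQL USERS TO RUNTIME; SAVE PGSQL USERS TO DISK;

    -- Route the target SELECTs to the reads hostgroup
    INSERT INTO pgsql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
    VALUES (1, 1, '^SELECT c FROM sbtest', 2, 1);
    LOAD PGSQL QUERY RULES TO RUNTIME; SAVE PGSQL QUERY RULES TO DISK;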

Clients should connect to ProxySQL through a different port, 6133:
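For example, with psql from the application server (where ProxySQL is running), and reusing the user/database names from the sysbench examples above:

    psql -h 127.0.0.1 -p 6133 -U sysbench sysbench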

HAproxy

For “completeness”, why not also run the read-only test balancing the load equally between the primary and the standby server? That’s easy to do with HAproxy. It was already installed on my setup alongside Patroni, but I opted to also install it on the application server, alongside ProxySQL, and run the tests from there too.

Installing HAproxy

Just:
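On Ubuntu, that amounts to:

    sudo apt-get update
    sudo apt-get install -y haproxy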

Configuring HAproxy

I could have created a single pool, but I followed the standard approach of using the Patroni REST API for health checks, creating one pool with the current primary (http://<dbnode>:8008/primary) listening on port 5433, and another one containing both the primary and the standby servers (http://<dbnode>:8008/read-only) listening on port 5434:
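A sketch of the relevant haproxy.cfg sections, modeled on the template provided in the Patroni documentation (global/defaults and timeout settings omitted; server names are illustrative):

    listen primary
        bind *:5433
        mode tcp
        option httpchk GET /primary
        http-check expect status 200
        default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
        server pg1 10.128.0.84:5432 check port 8008
        server pg2 10.128.0.67:5432 check port 8008

    listen reads
        bind *:5434
        mode tcp
        balance roundrobin
        option httpchk GET /read-only
        http-check expect status 200
        default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
        server pg1 10.128.0.84:5432 check port 8008
        server pg2 10.128.0.67:5432 check port 8008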

I only used the reads pool (port 5434) in my test. The API endpoint named read-only is a bit misleading: it matches both the primary and the standby, so both servers end up in the pool. The default HAproxy balancing algorithm is round robin, so it distributes connection requests between the two servers.

Remember to restart the service:
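Which is simply:

    sudo systemctl restart haproxy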

This concludes the initial post in this series, in which I covered the test environment and methodology and explained how to install ReadySet, ProxySQL, and HAproxy and configure them to work with PostgreSQL. Here is the second and final part presenting the test results.

 
