In the first post of this series (Testing ReadySet as a Query Cacher for PostgreSQL (Plus ProxySQL and HAproxy) Part 1: How-To), I presented my test environment and methodology and explained how to install ReadySet, ProxySQL, and HAproxy and configure them to work with PostgreSQL. In this final part, I present the different test scenarios I have used and the results obtained.

Quick recap

  • The test environment is composed of small cloud instances, as indicated in the table below:
Server       IP            vCPUs  Memory (GB)  Services
Primary      10.128.0.84   4      4            PostgreSQL
Standby      10.128.0.67   4      4            PostgreSQL
ReadySet     10.128.0.113  4      4            ReadySet
Application  10.128.0.112  2      2            Sysbench, ProxySQL, HAproxy


  • The database nodes and the ReadySet server are configured with dedicated (but regular) data disks.
  • The dataset is composed of eight 10M-row tables, for a total of about 20 GB.
  • Not all queries can be (or, in my understanding, are worth being) cached by ReadySet, so I limited the target SELECTs to the simplest ones from the Sysbench OLTP workload, which also happen to be the most frequently executed queries:

  • I configured ProxySQL to route those same queries to the reader hostgroup, which I limited to the Standby server. I’ve not used ProxySQL’s query cache functionality.
  • All other SELECTs, as well as all writes in the read-write workload, are routed to the Primary. 
  • The base Sysbench command used in the tests is shown below:

For the read-only test, I used the oltp_read_only.lua script instead.
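For reference, the base command looked along these lines; the thread count matches the 64 active connections discussed later in this post, and the table count, table size, report interval, and database name follow the setup described here, while the run duration and credentials are assumptions:

```shell
# Sketch of the base Sysbench command used in the tests.
# --threads, --time, and the credentials are assumptions; the other
# values follow the setup described in this post.
sysbench /usr/share/sysbench/oltp_read_write.lua \
  --db-driver=pgsql \
  --pgsql-host=10.128.0.84 \
  --pgsql-port=5432 \
  --pgsql-user=sbtest \
  --pgsql-password=... \
  --pgsql-db=sysbench \
  --tables=8 \
  --table-size=10000000 \
  --threads=64 \
  --time=1800 \
  --report-interval=1 \
  run
```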

For each test scenario, I adjusted --pgsql-host and --pgsql-port to the target server/connection point, as described in the following section.

The test environment also included a PMM instance that monitored the servers, but the throughput computed for each test (queries per second, or QPS) was taken from the perspective of the client (application server), using the metrics reported by Sysbench every second (--report-interval=1) as the data source.

Test scenarios

I ran seven tests in total, with three Sysbench OLTP Read-Write runs targeting:

  1. Just the Primary (--pgsql-host=10.128.0.84 --pgsql-port=5432).
  2. ReadySet (--pgsql-host=10.128.0.113 --pgsql-port=5435), which processed the target SELECT queries and routed everything else to the Primary.
  3. ProxySQL (--pgsql-host=127.0.0.1 --pgsql-port=6133), which routed the target SELECT queries to the Standby server and sent everything else to the Primary.

and four OLTP Read-Only targeting:

  1. Just the Primary (--pgsql-host=10.128.0.84 --pgsql-port=5432).
  2. ReadySet (--pgsql-host=10.128.0.113 --pgsql-port=5435), which processed the target SELECT queries while all other SELECTs were sent to the Primary.
  3. ProxySQL (--pgsql-host=127.0.0.1 --pgsql-port=6133), which routed the target SELECT queries to the Standby server and sent all other SELECTs to the Primary.
  4. HAproxy (--pgsql-host=127.0.0.1 --pgsql-port=5434), which load-balanced connections between the Primary and Standby servers.

The table below summarizes how the traffic was distributed in each test, along with the respective “codenames” used to identify the tests in the result graphs presented in the next section:

Test # Codename Writes Target SELECTs Other SELECTs
1 primary-rw Primary Primary Primary
2 readyset-rw Primary ReadySet Primary
3 proxysql-rw Primary Standby Primary
4 primary-ro N/A Primary Primary
5 readyset-ro N/A ReadySet Primary
6 proxysql-ro N/A Standby Primary
7 haproxy-ro N/A Primary / Standby Primary / Standby


Please note that I haven’t included HAproxy in the read-write test because doing so would require formally splitting read and write traffic and directing each to its respective HAproxy pool/port. HAproxy proxies traffic at layer 4 (the transport layer) and is very good at it, but all it sees are TCP packets. One of the advantages of ReadySet and ProxySQL is that they understand the database protocol and, through their respective features (query caching and query rules), can split read and write traffic in a way that is transparent to the application, a flexibility HAproxy cannot offer.

Results

Without further ado, here are the results for the read-write test:

read-write test
and read-only test:
read-only test
Looking at these graphs alone, we can draw some general conclusions for these workloads, dataset, and hardware:

  • Any scenario where the traffic is split between two servers performs better than running the workload on the Primary alone, which was the expected outcome. But how much more?
  • ProxySQL performs better than ReadySet, more so for the read-write traffic than for the read-only traffic, where the differences are much smaller.
  • HAproxy is one step ahead of the other two solutions.

I was surprised to notice that, when we order the tests by average throughput, the read-write tests mostly rank higher than the respective read-only tests for the same technology (ReadySet being the exception), including when we send all traffic to the primary:

Test QPS (average)
haproxy-ro 2139.74
proxysql-rw 1913.08
proxysql-ro 1565.14
readyset-ro 1425.70
readyset-rw 1358.73
primary-rw 1074.33
primary-ro 971.68
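These averages also answer the “how much more?” question from the previous section. Computed directly from the table above, the relative gains over running everything on the primary are:

```shell
# Relative throughput gains over the primary-only baselines,
# computed from the QPS averages in the table above.
awk 'BEGIN {
  printf "haproxy-ro  vs primary-ro: %+.1f%%\n", (2139.74/971.68  - 1) * 100
  printf "proxysql-ro vs primary-ro: %+.1f%%\n", (1565.14/971.68  - 1) * 100
  printf "readyset-ro vs primary-ro: %+.1f%%\n", (1425.70/971.68  - 1) * 100
  printf "proxysql-rw vs primary-rw: %+.1f%%\n", (1913.08/1074.33 - 1) * 100
  printf "readyset-rw vs primary-rw: %+.1f%%\n", (1358.73/1074.33 - 1) * 100
}'
```

In other words, splitting the traffic bought anywhere from roughly a quarter more throughput (readyset-rw) to more than double (haproxy-ro), depending on the scenario.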


I double-checked the numbers and found that this pattern appeared in all the preliminary runs I made with this test setup. But that’s just looking at the surface. What else do we see when we dig deeper into the performance metrics? And are the proxies really doing the job they are supposed to?

Query processing

Let’s start with the second question. To answer it, for each test I selected the time window covering the test in PMM and switched to the Query Analytics dashboard (QAN), which ranks queries according to the load they generated on the monitored servers during the selected period. I filtered the data to show only queries targeting the sysbench database. The following table shows how the queries were distributed between the primary and standby servers in each test:

Test Primary Standby
primary-rw 100%
readyset-rw
proxysql-rw 11.5% 88.5%
primary-ro 100%
readyset-ro
proxysql-ro 4.1% 95.9%
haproxy-ro 49.3% 50.7%


QAN tracked the queries processed on the primary during the ReadySet tests, but since it has no data on ReadySet itself, it cannot provide the query distribution for those tests. We will look at this in a different way later.

Even though the data above shows that the query traffic was indeed split between primary and standby, was it split the way we expected, according to the “rules” defined for each test?

To check this, I further filtered the queries per database server (primary and standby). The table below contains screenshots of the QAN view for the database servers, for each test:

Test Primary Standby
primary-rw

readyset-rw

proxysql-rw
primary-ro

readyset-ro

proxysql-ro
haproxy-ro

The Query Count numbers shown in QAN don’t exactly match what Sysbench reported, and we also see a query in the ninth position of the QAN ranking for the standby server that should not be reaching it in the ProxySQL tests, as it is not one of the target SELECTs. Otherwise, we can clearly see the pattern we expected:

  1. Except for the tests in which all traffic is directed to the primary (primary-rw and primary-ro) or load-balanced between primary and standby (haproxy-ro), the primary is not processing the target SELECTs.
  2. We see the target SELECTs hitting the standby in the ProxySQL tests (proxysql-rw and proxysql-ro).

Since the target SELECTs were not hitting the primary in the ReadySet tests, they should have been processed by ReadySet. Still, I wanted to validate that ReadySet was indeed processing those queries.

ReadySet metrics

ReadySet has an HTTP API that exposes its metrics on port 6034:
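Assuming metrics were enabled when starting the adapter, they can be pulled in Prometheus format with a plain HTTP request, for example:

```shell
# Fetch Prometheus-format metrics from the ReadySet adapter;
# host and port follow the test environment described above.
curl -s http://10.128.0.113:6034/metrics | head -n 20
```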

Marcelo suggested that I configure PMM to track these metrics using an external service, which I did:
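The exact command I used isn’t shown here; a sketch using pmm-admin’s external-serverless service type (service name and group are illustrative) would look like this:

```shell
# Register ReadySet's metrics endpoint as an external service in PMM.
# The --external-name and --group values are illustrative; the URL
# points at the ReadySet server from the environment table above.
pmm-admin add external-serverless \
  --url=http://10.128.0.113:6034/metrics \
  --external-name=readyset \
  --group=readyset
```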

I got a copy of the Adapter Metrics dashboard from ReadySet’s customized Grafana Docker image (public.ecr.aws/readyset/readyset-grafana), and after some tweaking, I was able to have some of the metrics displayed on PMM. The table below provides the essential information I was looking for:


  • The ReadySet QPS graph shows the requests processed by ReadySet in each of the two tests (readyset-rw on the left, readyset-ro on the right). Its legend is filled with the IDs of the queries we created “caches” for.
  • The Proxied QPS graph shows the requests that have been “proxied” to the primary server, which should be all the remaining traffic from the two workloads.
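The same split can also be inspected from ReadySet’s own SQL interface, connecting to its port as the application does. The commands below are ReadySet extensions; the user and database names are the ones assumed for this setup:

```shell
# Inspect ReadySet's cache state through its SQL interface
# (ReadySet-specific commands; port 5435 per the test setup,
# credentials assumed).
psql -h 10.128.0.113 -p 5435 -U sbtest -d sysbench <<'SQL'
SHOW CACHES;            -- queries ReadySet answers from its caches
SHOW PROXIED QUERIES;   -- queries it forwards to the primary
SQL
```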

The total number of requests processed during the read-write workload is considerably higher than during the read-only workload. I’m not sure how these requests compare to the throughput reported by Sysbench and the query load reported by QAN; the numbers don’t all match, but I suppose they are not counting things the same way.

Performance metrics

There’s so much we can explore from the PMM graphs to further understand what is behind these numbers. I comment on a few of these points below.

Disk performance

When it comes to pure performance, the most important thing to consider is how I/O-bound the workload is, given the size of the dataset and the fact that it simply doesn’t fit in memory. The virtual cores spent most of their time waiting for the disk to return page requests:

Primary
Standby
ReadySet

Since the regular disks I used on the data servers have their performance “capped” by the cloud provider for the respective instance type and size, the ones hosting PostgreSQL were hitting either an IOPS limit or a bandwidth limit:

Primary Standby ReadySet

Active connections and throughput

Another interesting point is related to the number of active connections and throughput in the primary on certain tests:


  • When the read-only workload is executed on the primary alone (primary-ro), we can see that dealing with those 64 active connections is a bit too much for it; the throughput curve is much less “flat” than in the HAproxy test (haproxy-ro), where the primary only has to deal with around half as many active connections (the other half goes to the standby server).
  • When the ProxySQL workloads are executed (proxysql-rw and proxysql-ro), only a small fraction of connections are active on the primary compared to the ReadySet tests (readyset-rw and readyset-ro), which keep open as many connections as initially requested by the client.

Replication lag

This is a very important detail that we should pay attention to, and I almost missed it. PMM does not have a replication graph for PostgreSQL yet, although one is coming soon. While I was still experimenting with the setup, I remembered to check the state of replication on the standby server, and for one particular test, it was lagging massively:
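For reference, a quick way to check this on the standby itself uses standard PostgreSQL functions (connection details per the test environment; the replay lag is NULL until the first transaction is replayed):

```shell
# Check WAL replay position and approximate replication lag on the
# standby (standard PostgreSQL 10+ functions; credentials assumed).
psql -h 10.128.0.67 -U postgres -x -c \
  "SELECT pg_last_wal_receive_lsn(),
          pg_last_wal_replay_lsn(),
          now() - pg_last_xact_replay_timestamp() AS replay_lag;"
```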

My colleague Neha, from the Managed Services team, gave me a hand and provided a copy of one of the graphs they use to monitor customer environments based on the pg_replication_lag_seconds metric (which is actively collected by the PostgreSQL exporter), while they wait for the official one. Here is a close look at the only test that triggered the issue on the standby server, proxysql-rw:

If we look at the CPU usage graphs presented earlier, that’s when the standby server is operating at disk capacity, both processing the replication changes generated by the primary and serving the target SELECTs routed to it by ProxySQL.

It is possible to configure ProxySQL to automatically stop routing traffic to a replica that is lagging beyond a given threshold. For this not to interrupt service completely, we also need to include the primary in the reader hostgroup, though we may give it a lower weight. This ensures that, under normal circumstances, the primary won’t have to do much of the “reader” work, but will be fully engaged if the replica starts lagging behind.
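As a sketch only: for MySQL backends this is done through the max_replication_lag column of the servers table in ProxySQL’s admin interface, and I am assuming here that the ProxySQL build with PostgreSQL support used in this series exposes an equivalent column; the admin port and credentials below are ProxySQL’s defaults:

```shell
# Sketch: cap acceptable replica lag and add the primary to the reader
# hostgroup with a low weight. max_replication_lag is the mechanism
# ProxySQL documents for MySQL backends; its availability for the
# PostgreSQL server table in this build is an assumption.
mysql -h127.0.0.1 -P6032 -uadmin -padmin <<'SQL'
UPDATE mysql_servers SET max_replication_lag = 10
 WHERE hostname = '10.128.0.67';          -- standby in the reader hostgroup
INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight)
VALUES (1, '10.128.0.84', 5432, 1);       -- primary as a low-weight reader
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
SQL
```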

I don’t know whether ReadySet has a similar mechanism in place to prevent serving stale data from its cache when it is lagging behind the primary. I will leave exploring this for a future blog post, alongside experimenting with different workloads; it is possible that the target SELECTs, whose WHERE clause is backed by the table’s primary key, are simply too simple to fully benefit from ReadySet’s caching feature.

