Benchmarking Postgres 17 vs 18
By Ben Dicken
Postgres 18 was released a few weeks ago, and there's plenty of hype around the improvements it brings. Most notably, Postgres 18 introduces the `io_method` configuration option, giving users more control over how disk I/O is handled.
Setting it to `sync` preserves the behavior of 17 and earlier versions, where all I/O happens via synchronous requests.
Postgres 18 introduces two alternatives: `worker` and `io_uring`. `worker` (the new default) causes Postgres to use dedicated background worker processes to handle I/O operations. `io_uring` is the change many are excited about for performance reasons: it uses the Linux `io_uring` interface to allow disk reads to happen asynchronously. The hope is that this leads to significantly better I/O performance.
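If you want to try this yourself, here's a minimal sketch of switching the setting and confirming it took effect. `io_method` can only be changed at server start, so a restart is needed after the `ALTER SYSTEM` call.

```sql
-- Minimal sketch (Postgres 18+): pick the I/O method.
-- io_method is a postmaster-level setting, so restart the server afterwards.
ALTER SYSTEM SET io_method = 'io_uring';  -- or 'worker' (the default) or 'sync'

-- After the restart, confirm which method is active:
SHOW io_method;
```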
We conducted detailed benchmarks to compare the performance of Postgres 17 and 18. Let's see if these improvements are all they're hyped up to be.
Benchmark configurations
sysbench was used for executing the benchmarks. The `io_uring` improvements only apply to reads, so the focus here is the `oltp_read_only` benchmark. It includes both point selects and queries that do range scans and aggregations, with the scan length controlled by the `--range_size` argument. Though it would be interesting to benchmark write and read/write performance as well, sticking to read-only keeps this discussion focused. We set the data size with `TABLES=100` and `SCALE=13000000`, which produces a ~300 GB database (100 tables with 13 million rows each).
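If you want to sanity-check the data volume after loading, a quick size query works; the `sbtest1` table name below is just sysbench's default naming:

```sql
-- Total on-disk size of the benchmark database
SELECT pg_size_pretty(pg_database_size(current_database()));

-- Size of one of the 100 sysbench tables, including its indexes
SELECT pg_size_pretty(pg_total_relation_size('sbtest1'));
```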
The benchmarks were conducted on four different EC2 instance configurations:
Instance | vCPUs | RAM | Disk | Disk type | IOPS | Throughput |
---|---|---|---|---|---|---|
r7i.2xlarge | 8 | 64 GB | 700 GB | gp3 | 3,000 | 125 MB/s |
r7i.2xlarge | 8 | 64 GB | 700 GB | gp3 | 10,000 | 500 MB/s |
r7i.2xlarge | 8 | 64 GB | 700 GB | io2 | 16,000 | - |
i7i.2xlarge | 8 | 64 GB | 1,875 GB | NVMe | 300,000 | - |
All these instances run on the same (or extremely similar) Intel CPUs. We include the `i7i` instance to show what Postgres is capable of with fast, local NVMe drives. This is what we use for PlanetScale Metal, and we have seen amazing performance results for Postgres 17 on it. Many other cloud providers only offer a form of network-attached storage, whereas we provide both options.
Each server is warmed with 10 minutes of query load prior to benchmarking.
On each one of these configurations, we ran the sysbench `oltp_read_only` benchmark with the following configurations, for 5 minutes each:
- Single connection and `--range_size=100`
- 10 connections and `--range_size=100`
- 50 connections and `--range_size=100`
- Single connection and `--range_size=10000`
- 10 connections and `--range_size=10000`
- 50 connections and `--range_size=10000`
This leads to a total of 24 unique 5-minute benchmark runs. Each of those 24 configurations was then run four times: once on Postgres 17, and once each on Postgres 18 with `io_method=worker`, `io_method=io_uring`, and `io_method=sync`. That makes 96 benchmark runs in total. Go ahead, tell me I'm crazy!
This is an extremely I/O-intensive workload. The data size (300 GB) far exceeds RAM size (64 GB), so there will be significant disk accesses for the queries being executed here.
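For context, these are roughly the query shapes `oltp_read_only` issues against its `sbtest*` tables; the literal values are placeholders, and the width of the `BETWEEN` span is what `--range_size` controls:

```sql
-- Point select on the primary key
SELECT c FROM sbtest1 WHERE id = 5001;

-- Range scan / aggregation queries; --range_size sets the BETWEEN width
SELECT c FROM sbtest1 WHERE id BETWEEN 5001 AND 5100;
SELECT SUM(k) FROM sbtest1 WHERE id BETWEEN 5001 AND 5100;
SELECT c FROM sbtest1 WHERE id BETWEEN 5001 AND 5100 ORDER BY c;
SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN 5001 AND 5100 ORDER BY c;
```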
Single connection
Though a single connection is an unrealistic production workload, it offers a baseline for how the different I/O settings affect straight-line performance. Let's assess the QPS we can achieve here.
Below is the average QPS for all single-connection runs where `--range_size` is set to the default of 100. This means the read workload is a mix of point-select queries and queries that scan / aggregate 100-row ranges.
A few things are clear:
- On network-attached storage (gp3, io2), Postgres 18 in `sync` and `worker` modes performs noticeably better than both Postgres 17 and Postgres 18 with `io_uring`. I'll admit, this surprised me! My expectation was that `io_uring` would perform as well as, if not better than, all the other options.
- The latency of gp3 and even io2 is clearly a factor in this difference. On an instance with a low-latency local NVMe drive, all options are much more evenly matched.
- The latency / IOPS of gp3 and even the very expensive io2 drive are a limiting factor. Local disks outperform in all configurations.
- For straight-line performance with short-lived queries, Postgres 18 is much faster. Welcome improvements!
Here is the same test, except with `--range_size=10000`. This means the workload has much larger scans / aggregates, which means more sequential I/O and lower QPS:
Local disks still clearly outperform, but the difference between the other three options is less stark. This is due to a combination of (a) more sequential I/O and, more importantly, (b) more CPU work (aggregating 10k rows is more CPU-intensive than aggregating 100 rows). Additionally, the delta between Postgres 17 and 18 is much smaller.
Below is an interactive visual comparing the Postgres 17 results for all instance types with the best performer on Postgres 18, `worker`. Click the thumbnails to add or remove lines from the graph and compare various combinations. This is based on a 10-second sample rate.
High concurrency
In real-world scenarios, we have many simultaneous connections and many reads happening at once. Let's look at how each of these servers handles the same benchmark but with much higher load across 50 connections.
Of course, `oltp_read_only` does not capture a realistic OLTP workload, especially since it does not include writes, but we use it as a proxy for workloads with high read demands. Below is the average QPS for all of the 50-connection `oltp_read_only` runs with `--range_size=100`.
Now with a high level of parallelism and increased I/O demand, several additional things are clear:
- IOPS and throughput are clear bottlenecks for each of the EBS-backed instances. The different versions / I/O settings don't make a huge difference in such cases.
- As we increase the EBS capabilities, the QPS grows in lockstep, and the local-NVMe instance outperforms them all.
- Postgres 18 with `sync` and `worker` has the best performance on all the EBS-backed instances, by a small margin.
Again, the same benchmark but with `--range_size=10000`.
The `gp3-10k` and `io2-16k` instances get much closer to local-disk performance. However, this is because the benchmark has become much more CPU-bound than I/O-bound, so the low latency of local disks gives less of an advantage (though they are still the best!). But importantly, we finally have a scenario where `io_uring` wins: on the NVMe instance, it slightly outperforms the other options.
Below we again compare these results for Postgres 17 and Postgres 18 with `worker`. Click the thumbnails to add or remove lines from the graph and compare various combinations.
Moderate concurrency
These same benchmarks were also executed with 10 concurrent connections. The results are pretty similar, so they won't all be shown, but I do want to point out this graph where `--range_size=100`:
Look carefully at the first bar group (for `gp3-3k`). The `io_uring` setting performed significantly worse than the rest. But if you look at that same part of the graph when there were 50 connections, `io_uring` performs only slightly worse than the rest. To me, this indicates that `io_uring` does well when there's lots of I/O concurrency, but in low-concurrency scenarios it isn't as beneficial.
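If you want to verify how much of this read traffic actually reaches storage during a run, the `pg_stat_io` view (added in Postgres 16) is handy; a small sketch, assuming `track_io_timing` is on as in the appendix config:

```sql
-- Reads issued by regular client backends against relations,
-- broken down by I/O context (normal buffered reads, bulk reads, etc.)
SELECT context, reads, read_time
FROM pg_stat_io
WHERE backend_type = 'client backend'
  AND object = 'relation'
ORDER BY reads DESC;
```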
Cost
Cost should always be a consideration when comparing infrastructure setups. Here are the on-demand costs of each server configuration in AWS:
- `r7i` with gp3, 3k IOPS and 125 MB/s: $442.32/mo
- `r7i` with gp3, 10k IOPS and 500 MB/s: $492.32/mo
- `r7i` with io2, 16k IOPS: $1,513.82/mo
- `i7i` with local NVMe (no EBS): $551.15/mo
And keep in mind, the first three only have 700 GB of storage, whereas the `i7i` has a 1.8 TB volume! The server with a local NVMe disk is the clear price-performance winner.
Why isn't io_uring the winner?
Given my excitement over the new `io_uring` capabilities of Postgres 18, I was expecting it to win in many more scenarios. So what's going on here?
For one, this is a very specific type of workload. It is read-only, and does a combination of point selects, range scans, and range aggregations. `io_uring` surely has other workloads where it would shine. It's also possible that with different `postgresql.conf` tunings, we'd see improvements from `io_uring`.
While writing this, I stumbled across Tomas Vondra's excellent blog post discussing the new `io_method` options, how to tune them, and the pros and cons of each. He makes several good points about why `worker` can outperform `io_uring`, and I recommend you read it. In short:
- Index scans don't (yet) use AIO.
- Though the I/O itself happens in the background with `io_uring`, the checksum verification and memcpy still fall on the backend and can become a bottleneck. `worker` mode allows better parallelism for that I/O work from the perspective of a single process.
So there are legitimate reasons why `io_uring` won't always perform better! I'd love to see further benchmarks from others, spanning other workload types and configurations. You can find many of the configs used for these tests in the appendix.
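If you want to experiment along these lines, these are the knobs I would start with; a hedged sketch, with values that are illustrative rather than recommendations:

```sql
-- Pick the async I/O implementation (requires a server restart)
ALTER SYSTEM SET io_method = 'io_uring';   -- or 'worker' / 'sync'

-- Only relevant for io_method = 'worker': number of background I/O workers
ALTER SYSTEM SET io_workers = 8;           -- default is 3

-- How much read-ahead / prefetching Postgres attempts
ALTER SYSTEM SET effective_io_concurrency = 200;

-- Reload picks up the reloadable settings; io_method still needs a restart
SELECT pg_reload_conf();
```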
Conclusions
Though narrow, this was a fun experiment to compare performance of Postgres versions and I/O settings. My key takeaways are:
- Postgres 18 brings nice I/O improvements and configuration flexibility. Great job to the maintainer team!
- Local disks are the clear winner. When you have low-latency I/O and immense IOPS, the rest matters less. This is why PlanetScale Metal makes for best-in-class database performance.
- Using `io_method=worker` was a good choice as the new default. It comes with a lot of the "asynchronous" benefits of `io_uring` without relying on that specific kernel interface, and can be tuned by setting `io_workers=X`.
- There's no one-size-fits-all best I/O configuration.
- Though it does help, the new `worker` I/O configuration doesn't benefit network-attached storage scenarios as much as one might hope.
What else do you want to see benchmarked? Reach out to let us know.
Appendix: configuration
Here is a selection of the critical custom-tuned Postgres settings used for this benchmark:

shared_buffers = 16GB              # 25% of RAM
effective_cache_size = 48GB        # 75% of RAM
work_mem = 64MB
maintenance_work_mem = 2GB
wal_level = replica
max_wal_size = 16GB
min_wal_size = 2GB
wal_buffers = 16MB
checkpoint_completion_target = 0.9
random_page_cost = 1.1
effective_io_concurrency = 200
default_statistics_target = 100
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
max_parallel_maintenance_workers = 4
bgwriter_delay = 200ms
bgwriter_lru_maxpages = 100
bgwriter_lru_multiplier = 2.0
autovacuum = on
autovacuum_max_workers = 4
autovacuum_naptime = 10s
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.025
logging_collector = on
...more log configs...
shared_preload_libraries = 'pg_stat_statements'
track_activity_query_size = 2048
track_io_timing = on
jit = on
# io_workers left at default = 3
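Since `pg_stat_statements` and `track_io_timing` are enabled above, you can also pull per-query buffer statistics after a run; a small sketch of the kind of query I use:

```sql
-- Top queries by blocks read from outside shared_buffers
SELECT query, calls, shared_blks_hit, shared_blks_read
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 10;
```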