Summary
- Use dedicated PgBouncers (not local) for production OLTP
- Connect via port 6432 with
username|bouncer-nameformat - Keep transactions under 3 seconds for clean failovers
- Add retry logic for read queries to handle transient failures
Overview
Production teams running OLTP workloads need connection strategies that maximize availability during failovers and planned maintenance. This guide covers how to configure PlanetScale Postgres for resilience and what to expect during failovers. For background on PlanetScale Postgres cluster architecture, see operations philosophy.Connection interruption
Postgres design dictates connection interruptions for several reasons:- Parameter changes — Changing certain parameters requires connection interruption.
- Cluster resizing — If you’re upgrading which cluster size you’re running on, or the amount of CPU/memory, connections will be interrupted as we move the cluster to new hardware.
- Enabling extensions — The design of the extension ecosystem means most new extensions need connection interruption.
- PlanetScale maintenance — Very occasionally, PlanetScale has to perform maintenance on your database (e.g. for operating system updates and other infrastructure changes). We notify your organization’s admin contacts at least 24 hours before scheduled maintenance (typically weeks beforehand).
Use Dedicated PgBouncer
PlanetScale Postgres offers one connection path that is significantly more resilient in the face of maintenance and other failovers, which is a dedicated PgBouncer. We strongly recommend production customers use a dedicated PgBouncer. A dedicated PgBouncer allows us to turn all planned failovers into a brief period of elevated latency with no connection drops (usingPAUSE).
It’s worth noting that PgBouncer has several important compatibility problems that can cause unexpected application issues. See transaction pooling limitations.
See the PgBouncer documentation for configuration options and the connections overview for a comparison of all connection methods.
If you have replica traffic that is production critical, we also recommend using dedicated PgBouncers for your replicas.
Without a dedicated PgBouncer, every connection interruption means every client connection gets terminated. Your application must detect the closed connection, reconnect, and retry the failed query. During a failover, this can cause a burst of errors and retries across your entire application fleet.
If you cannot use a dedicated PgBouncer due to compatibility issues, you will see errors during failovers.
We recommend tuning timeouts (see below), but also to expect brief error windows during these events.
Dedicated bouncer maintenance
Configuration changes to dedicated bouncers are very safe. When you modify a dedicated bouncer’s settings, we create a new bouncer with the updated configuration and drain connections from the old one over 24 hours. New connections go to the new bouncer immediately; existing connections continue on the old bouncer until they disconnect naturally or the drain period ends. This means:- Bouncer configuration changes cause no connection drops for well-behaved clients
- Clients that hold connections indefinitely (beyond 24 hours) will eventually be disconnected
- Long-lived connections will continue using old settings until they reconnect
Failover behavior
What happens during a failover
PlanetScale failovers typically complete in seconds. See operations philosophy for details on our failover approach. During a planned failover (e.g. maintenance, parameter changes, etc.), we can usePAUSE with a dedicated PgBouncer to replace connection interruptions with a latency increase on starting new transactions.
How PAUSE keeps connections alive
Dedicated PgBouncers use PgBouncer’sPAUSE command during failovers to maintain client connections:
- PAUSE stops assigning new transactions to backend Postgres connections
- New queries queue in PgBouncer rather than erroring
- Once the new primary/replica is ready, RESUME allows queued queries to execute
Long-running queries and failovers
Why long queries delay failovers
Fast failovers depend on clean primary shutdowns. When you change a configuration or PlanetScale needs to failover, PlanetScale asks Postgres to shut down gracefully. If queries are still running, Postgres waits for them to complete. Long-running queries can extend unavailability from a few seconds to 30 seconds (at which point we hard cancel queries).Keeping transactions short
For OLTP workloads, keep transactions as short as possible (at least under 3 seconds, but ideally a few milliseconds max per transaction for performance-critical workloads). Move network calls, external API requests, and heavy computation outside the transaction boundary. Do that work beforeBEGIN or after COMMIT.
Use statement_timeout to bound how long individual queries can run. Route analytical queries and batch jobs to replicas where they won’t block failovers on the primary.
Monitoring query latency
Use Query Insights to identify slow queries that could block failovers.Timeouts
Postgres timeouts
Use Postgres timeouts to prevent runaway queries from blocking failovers:statement_timeout— Aborts any statement that takes longer than the specified duration. Set this to bound individual query execution time.transaction_timeout— Aborts any transaction that exceeds the specified duration. This catches cases where an application opens a transaction but takes too long between statements (e.g. due to network calls or slow application logic within the transaction).
Application timeouts
Configure timeouts in your connection pool and database driver:- Connection timeout — How long to wait when establishing a new connection. Set this low enough to fail fast during outages rather than queueing requests indefinitely.
- Query timeout — Application query timeout should be set slightly higher than your Postgres
statement_timeoutto allow Postgres to cancel the query cleanly. - Pool checkout timeout — How long to wait for a connection from the pool. During failovers, this determines how long requests queue before failing.
test on borrow) to detect stale connections before using them.

