Cluster Metrics

The Metrics dashboard provides comprehensive monitoring and observability for your PostgreSQL database cluster. This centralized view helps you track performance, identify bottlenecks, and ensure optimal database health.

Metrics Dashboard

Dashboard overview

The Metrics dashboard displays real-time and historical data about your database cluster's performance across multiple dimensions. You can filter metrics by:

  • Server Filter: Monitor all servers or focus on specific instances
  • Branch: Select which database branch to monitor
  • Time Range: View data from preset windows, from as recent as the past 15 minutes, or define a custom time range
  • Live update: Toggle automatic data refresh (roughly every 30 seconds) on or off

Key metrics categories

Primary cluster utilization

The primary cluster utilization panel shows your primary database server's resource consumption:

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| CPU | Percent | Real-time CPU utilization | Monitor for consistent performance and identify when optimization may be needed |
| Memory | Percent | Current memory consumption | Track memory usage patterns and plan for scaling when approaching limits |

Replica monitoring

Each replica displays individual performance metrics in dedicated panels:

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| CPU | Percent | Individual CPU tracking per replica | Compare replica performance against the primary and identify load distribution |
| Memory | Percent | Individual memory tracking per replica | Monitor replica resource consumption and ensure balanced utilization |

Primary IOPS

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| IOPS | Operations/second | Database read/write operations per second | Monitor I/O patterns and identify peak usage periods for performance optimization |

Primary storage usage

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| Storage Usage | MB/GB | Current storage consumption | Track storage growth trends for capacity planning and ensure adequate free space |

PSBouncer connections

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| Total Connections | Count | Active database connections | Monitor connection patterns and trends to inform cluster capacity planning |

PSBouncer peer utilization

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| CPU | Percent | PSBouncer process CPU usage | Monitor connection pooler performance and resource consumption |
| Memory | Percent | PSBouncer process memory usage | Track memory usage of the connection pooling layer |

PSBouncer server pools

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| Active | Count | Active server connections | Monitor backend database connections from the pool |
| Active Cancel | Count | Connections being cancelled | Track connection cleanup and cancellation events |
| Being Cancelled | Count | Connections in the cancellation process | Monitor connection state transitions |
| Idle | Count | Idle server connections | Track connection pool efficiency and unused connections |
| Login | Count | Connections in the login state | Monitor authentication and connection establishment |
| Testing | Count | Connections being tested | Track connection health check activity |
| Tested | Count | Recently tested connections | Monitor connection validation processes |
| Used | Count | Total used connections | Overall connection utilization from the pool |

PSBouncer client pools

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| Active | Count | Active client connections | Monitor incoming client connection load |
| Active Cancel | Count | Client connections being cancelled | Track client-side connection cleanup |
| Waiting | Count | Client connections waiting for a server | Identify connection queue buildup and potential bottlenecks |

WAL archival rate

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| Success | Count | Successfully archived WAL files | Monitor backup and replication health |
| Failed | Count | Failed WAL archival attempts | Track archival failures that could impact recovery capabilities |

WAL archive age

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| Archive Age | Seconds | Age of the oldest unarchived WAL file | Monitor WAL archival latency and ensure timely backup operations |

WAL storage

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| Storage Usage | MB | Write-ahead log storage consumption | Track WAL disk usage for capacity planning and cleanup monitoring |

Replication lag

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| Lag | Seconds | Time delay between the primary and its replicas | Monitor replication health and ensure acceptable lag for read replica consistency |

Interpreting metrics

Normal operating ranges

  • CPU: 0-30% for typical workloads
  • Memory: 20-80% depending on dataset size
  • IOPS: Varies by workload type (OLTP vs. analytics)
  • Disk Usage: Keep below 80% for optimal performance
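
As a minimal sketch, these ranges could be encoded in an automated check that flags metrics outside typical bounds. The thresholds below mirror this section's guidance; the function name and structure are illustrative, not part of any PlanetScale API:

```python
# Illustrative health check based on the normal operating ranges above.
# Thresholds mirror this section's guidance; names are hypothetical.

def check_metrics(cpu_pct: float, memory_pct: float, disk_pct: float) -> list[str]:
    """Return warnings for metrics outside the typical ranges."""
    warnings = []
    if cpu_pct > 30:
        warnings.append(f"CPU at {cpu_pct}% exceeds the typical 0-30% range")
    if not 20 <= memory_pct <= 80:
        warnings.append(f"Memory at {memory_pct}% is outside the 20-80% range")
    if disk_pct > 80:
        warnings.append(f"Disk usage at {disk_pct}% is above the 80% guideline")
    return warnings
```

A healthy cluster returns an empty list; anything else is a candidate for the troubleshooting steps below.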

Performance indicators

  • Consistent Low CPU/Memory: Indicates healthy, optimized queries
  • Spiky IOPS: May indicate batch processing or analytical workloads
  • Low Connection Pool Utilization: Suggests efficient connection management

Troubleshooting with metrics

  • High CPU: Check for inefficient queries or missing indexes
  • High Memory: Check for large queries or buffer cache pressure driving up memory consumption
  • High IOPS: Analyze query patterns and consider query optimization
  • High Disk Usage: Plan for storage scaling or data archiving

WAL monitoring best practices

  • Archive age: Should typically be under 60 seconds for healthy systems
  • Archival success rate: Aim for 100% success rate with zero failures
  • WAL storage: Monitor for steady-state usage with periodic cleanup cycles
  • Replication lag: High lag may indicate WAL transmission issues
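
These guidelines can be combined into a single check. The sketch below computes the archival success rate from the Success/Failed counters and applies the 60-second archive age guideline; all names are illustrative:

```python
# Illustrative WAL archival health summary, using the Success/Failed
# counters and archive age from the dashboard. Names are hypothetical.

def wal_archival_health(success_count: int, failed_count: int,
                        archive_age_seconds: float) -> dict:
    """Summarize WAL archival health per this section's guidelines:
    aim for a 100% success rate and archive age under 60 seconds."""
    total = success_count + failed_count
    success_rate = success_count / total if total else 1.0
    return {
        "success_rate": success_rate,
        "healthy": failed_count == 0 and archive_age_seconds < 60,
    }
```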

Best practices

  1. Baseline Establishment: Understand your normal operating ranges
  2. Alert Thresholds: Set up monitoring alerts for critical thresholds
  3. Trend Analysis: Use historical data to predict scaling needs
  4. Performance Correlation: Cross-reference metrics with application performance

The Metrics dashboard serves as your primary tool for maintaining optimal database performance and ensuring reliable service delivery.

Need help?

Get help from the PlanetScale Support team, or join our GitHub discussion board to see how others are using PlanetScale.