Cluster Metrics

The Metrics dashboard provides comprehensive monitoring and observability for your PostgreSQL database cluster. This centralized view helps you track performance, identify bottlenecks, and ensure optimal database health.

Metrics Dashboard

Dashboard overview

The Metrics dashboard displays real-time and historical data about your database cluster's performance across multiple dimensions. You can filter metrics by:

  • Server Filter: Monitor all servers or focus on specific instances
  • Branch: Select which database branch to monitor
  • Time Range: View data from preset windows, from as recent as the past 15 minutes, or define a custom time range
  • Live update: Toggle automatic data refresh (roughly every 30 seconds) on or off

Key metrics categories

Primary cluster utilization

The primary cluster utilization panel shows your primary database server's resource consumption:

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| CPU | Percent | Real-time CPU utilization | Monitor for consistent performance and identify when optimization may be needed |
| Memory | Percent | Current memory consumption | Track memory usage patterns and plan for scaling when approaching limits |

Replica monitoring

Each replica displays individual performance metrics in dedicated panels:

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| CPU | Percent | Individual CPU tracking per replica | Compare replica performance against the primary and identify load distribution |
| Memory | Percent | Individual memory tracking per replica | Monitor replica resource consumption and ensure balanced utilization |

Primary IOPS

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| IOPS | Operations/second | Database read/write operations per second | Monitor I/O patterns and identify peak usage periods for performance optimization |

Primary storage usage

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| Storage Usage | MB/GB | Current storage consumption | Track storage growth trends for capacity planning and ensure adequate free space |

PSBouncer connections

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| Total Connections | Count | Active database connections | Monitor connection patterns and trends to inform cluster capacity planning |

PSBouncer peer utilization

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| CPU | Percent | PSBouncer process CPU usage | Monitor connection pooler performance and resource consumption |
| Memory | Percent | PSBouncer process memory usage | Track memory usage of the connection pooling layer |

PSBouncer server pools

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| Active | Count | Active server connections | Monitor backend database connections from the pool |
| Active Cancel | Count | Connections being cancelled | Track connection cleanup and cancellation events |
| Being Cancelled | Count | Connections in the cancellation process | Monitor connection state transitions |
| Idle | Count | Idle server connections | Track connection pool efficiency and unused connections |
| Login | Count | Connections in the login state | Monitor authentication and connection establishment |
| Testing | Count | Connections being tested | Track connection health check activity |
| Tested | Count | Recently tested connections | Monitor connection validation processes |
| Used | Count | Total used connections | Overall connection utilization from the pool |

PSBouncer client pools

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| Active | Count | Active client connections | Monitor incoming client connection load |
| Active Cancel | Count | Client connections being cancelled | Track client-side connection cleanup |
| Waiting | Count | Client connections waiting for a server | Identify connection queue buildup and potential bottlenecks |

WAL archival rate

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| Success | Count | Successfully archived WAL files | Monitor backup and replication health |
| Failed | Count | Failed WAL archival attempts | Track archival failures that could impact recovery capabilities |

WAL archive age

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| Archive Age | Seconds | Age of the oldest unarchived WAL file | Monitor WAL archival latency and ensure timely backup operations |

WAL storage

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| Storage Usage | MB | Write-ahead log storage consumption | Track WAL disk usage for capacity planning and cleanup monitoring |

Replication lag

| Metric | Unit | Purpose | Key Insights |
| --- | --- | --- | --- |
| Lag | Seconds | Time delay between the primary and its replicas | Monitor replication health and ensure acceptable lag for read replica consistency |

Interpreting metrics

Normal operating ranges

  • CPU: 0-30% for typical workloads
  • Memory: 20-80% depending on dataset size
  • IOPS: Varies by workload type (OLTP vs. analytics)
  • Disk Usage: Keep below 80% for optimal performance
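
As a minimal sketch, these ranges could be encoded in an automated check that flags metrics outside typical bounds. The thresholds below mirror this section's guidance; the function name and structure are illustrative, not part of any PlanetScale API:

```python
# Illustrative health check based on the normal operating ranges above.
# Thresholds mirror this section's guidance; names are hypothetical.

def check_metrics(cpu_pct: float, memory_pct: float, disk_pct: float) -> list[str]:
    """Return warnings for metrics outside the typical ranges."""
    warnings = []
    if cpu_pct > 30:
        warnings.append(f"CPU at {cpu_pct}% exceeds the typical 0-30% range")
    if not 20 <= memory_pct <= 80:
        warnings.append(f"Memory at {memory_pct}% is outside the 20-80% range")
    if disk_pct > 80:
        warnings.append(f"Disk usage at {disk_pct}% is above the 80% guideline")
    return warnings
```

A healthy cluster returns an empty list; anything else is a candidate for the troubleshooting steps below.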

Performance indicators

  • Consistent Low CPU/Memory: Indicates healthy, optimized queries
  • Spiky IOPS: May indicate batch processing or analytical workloads
  • Low Connection Pool Utilization: Suggests efficient connection management

Troubleshooting with metrics

  • High CPU: Check for inefficient queries or missing indexes
  • High Memory: Check for large queries or buffer cache pressure driving up memory consumption
  • High IOPS: Analyze query patterns and consider query optimization
  • High Disk Usage: Plan for storage scaling or data archiving

WAL monitoring best practices

  • Archive age: Should typically be under 60 seconds for healthy systems
  • Archival success rate: Aim for 100% success rate with zero failures
  • WAL storage: Monitor for steady-state usage with periodic cleanup cycles
  • Replication lag: High lag may indicate WAL transmission issues
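
These guidelines can be combined into a single check. The sketch below computes the archival success rate from the Success/Failed counters and applies the 60-second archive age guideline; all names are illustrative:

```python
# Illustrative WAL archival health summary, using the Success/Failed
# counters and archive age from the dashboard. Names are hypothetical.

def wal_archival_health(success_count: int, failed_count: int,
                        archive_age_seconds: float) -> dict:
    """Summarize WAL archival health per this section's guidelines:
    aim for a 100% success rate and archive age under 60 seconds."""
    total = success_count + failed_count
    success_rate = success_count / total if total else 1.0
    return {
        "success_rate": success_rate,
        "healthy": failed_count == 0 and archive_age_seconds < 60,
    }
```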

Best practices

  1. Baseline Establishment: Understand your normal operating ranges
  2. Alert Thresholds: Set up monitoring alerts for critical thresholds
  3. Trend Analysis: Use historical data to predict scaling needs
  4. Performance Correlation: Cross-reference metrics with application performance

The Metrics dashboard serves as your primary tool for maintaining optimal database performance and ensuring reliable service delivery.

Need help?

Get help from the PlanetScale Support team, or join our GitHub discussion board to see how others are using PlanetScale.