pgcopydb is a highly capable tool with extensive tunables and flags for handling a wide range of migration scenarios. The helper scripts in this guide use sensible defaults, but you can customize them for your specific environment. See the pgcopydb documentation for in-depth coverage of all available options. We recommend testing the migration on a staging environment or from a backup/snapshot before running it on production, and always ensure you have a backup of your source database before starting.
How pgcopydb works
Traditional pg_dump | pg_restore pipelines write all data to intermediate files on disk before restoring to the target. This doubles disk I/O, creates storage bottlenecks, and builds indexes one at a time. For large databases, this can take days.
pgcopydb takes a different approach:
- Direct streaming with no intermediate files. Table data flows directly from source to target using PostgreSQL’s native COPY protocol. No dump files are written to disk. Data moves over the network in a single pass.
- Parallel operations at every stage. pgcopydb runs multiple operations concurrently:
  - Multiple tables are copied in parallel
  - Large tables are automatically split into partitions and copied by multiple workers simultaneously
  - Indexes are built concurrently on the target, not sequentially like pg_restore
  - VACUUM, sequence resets, and constraint creation all run in parallel
- Live migration with Change Data Capture (CDC). With the --follow flag, pgcopydb captures changes from the source via logical decoding while the initial copy is still running. Changes are prefetched to disk during the copy phase and replayed once the copy finishes. The target stays in sync with the source, and cutover requires only a brief pause while the final changes apply.
| Phase | Description |
|---|---|
| Catalog source | Query source metadata, apply filters, estimate table sizes |
| Dump schema | Export schema using pg_dump with a consistent snapshot |
| Restore schema | Create tables and types on the target with pg_restore |
| Copy table data | Stream data in parallel using COPY protocol |
| Restore post-data | Create views, materialized views, and functions |
| Create indexes | Build all indexes concurrently |
| Create constraints | Apply foreign keys and check constraints |
| Vacuum and analyze | Update table statistics and reclaim space |
| Reset sequences | Synchronize sequence values from source |
| Finalize | Complete post-data restoration |
When --follow is enabled, CDC runs alongside these phases. Changes are prefetched during the copy phase, then replayed live once the copy completes. The migration is ready for cutover when CDC lag drops to near zero.
Overview
The migration follows this high-level flow:
1. Prepare your environment. Set up a migration instance, install pgcopydb and the helper scripts, and configure database credentials.
2. Prepare your source database. Create a migration user, enable logical replication, and configure replication settings.
3. Prepare your PlanetScale database. Create a database, configure disk size and parameters, and compare source parameters.
4. Run the migration. Execute pgcopydb to copy schema and table data in parallel, then build indexes and constraints.
5. Monitor progress. Track copy progress, index creation, and CDC replication lag using the helper scripts.
6. Prepare for cutover. Create application roles, confirm connectivity, configure observability, and prepare connection strings.
7. Cutover. Stop writes to the source, wait for CDC to catch up, and switch your application to PlanetScale.
If you do not need near-zero downtime and can tolerate a maintenance window, you can use pgcopydb without CDC (omit the --follow flag). This is simpler but requires stopping writes to the source for the duration of the copy.
1. Prepare your environment
The migration runs from a dedicated migration instance, a compute instance (EC2, GCE, or similar) with network access to both your source database and PlanetScale. This instance runs pgcopydb and the helper scripts. It does not store any persistent data beyond the migration working directory.

PlanetScale provides infrastructure-as-code templates that provision a fully configured migration instance with pgcopydb v0.18.0, PostgreSQL 17 client tools, and the helper scripts pre-installed. Choose the template that matches your cloud provider and provisioning tool:
- AWS CloudFormation
- AWS Terraform
- GCP Terraform

Upload the CloudFormation template to CloudFormation and fill in VPC ID, Subnet ID, instance type, and volume size, or deploy via the CLI. Key parameters:
- VpcId / SubnetId — The VPC and public subnet where the instance will launch. The subnet must have an internet gateway route for PlanetScale connectivity.
- InstanceType — c7i.xlarge for databases under 100 GB, c7i.2xlarge for 100–500 GB, c7i.4xlarge for 500 GB+.
- VolumeSize — EBS io2 Block Express volume size in GB. Choose based on your source database size (500, 1000, 3000, 6000, or 12000).
- KeyPairName — Optional. Leave empty to use EC2 Instance Connect for SSH access.

You can also use EC2 Instance Connect: open the instance in the EC2 console and click Connect for browser-based SSH.
If you prefer to set up your own instance manually, see the pgcopydb installation documentation and the helper scripts README for manual installation steps.
The remaining steps assume you are logged in as the ubuntu user on an instance created by the templates, with the helper scripts installed in /home/ubuntu/.
Configure connection credentials
The templates create a ~/.env file with placeholder connection strings. Update this file with your actual source and target credentials:
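The variable names below are the ones the helper scripts read; hosts, database names, and credentials are placeholders to replace with your own:

```shell
# ~/.env — connection strings used by pgcopydb and the helper scripts
export PGCOPYDB_SOURCE_PGURI="postgres://migrator:password@source-host:5432/mydb"
export PGCOPYDB_TARGET_PGURI="postgres://user:password@target-host:5432/mydb?sslmode=require"
```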
2. Prepare your source database
Create a migration user
The migration user needs read access to all schemas and tables being migrated. For CDC migrations using --follow, it also needs the REPLICATION attribute. If you are using a dedicated migration user rather than the database owner, create one and grant the required permissions:
Include GRANT USAGE, GRANT SELECT, and ALTER DEFAULT PRIVILEGES statements for each schema being migrated.
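A sketch of this SQL, assuming a single schema named public and a user named migrator (both placeholders):

```sql
-- On RDS/Aurora the REPLICATION attribute is not grantable directly;
-- use GRANT rds_replication TO migrator; instead
CREATE ROLE migrator WITH LOGIN REPLICATION PASSWORD 'change-me';

-- Repeat this block for each schema being migrated
GRANT USAGE ON SCHEMA public TO migrator;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO migrator;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO migrator;
```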
Once the user is created, update the PGCOPYDB_SOURCE_PGURI in your ~/.env file on the migration instance with the new credentials.
Enable logical replication (CDC only)
For CDC migrations using --follow, logical replication must be enabled on the source. How to enable it depends on your platform:
- Amazon RDS / Aurora: Set rds.logical_replication = 1 in the parameter group and reboot the instance.
- Google Cloud SQL: Set the cloudsql.logical_decoding database flag to on and restart the instance.
- Google AlloyDB: Set alloydb.logical_decoding = on and restart the instance.
- Supabase: Logical replication is enabled by default. Make sure you use a direct connection string (not pooled) from your Supabase dashboard under Project Settings > Database.
- Self-hosted PostgreSQL: Set wal_level = logical in postgresql.conf and restart PostgreSQL.
Also set wal_sender_timeout to 0 on the source to prevent the replication connection from being dropped during long COPY phases. The initial data copy can take hours on large databases, and the default timeout may cause PostgreSQL to terminate the idle replication connection before CDC streaming begins.
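On self-hosted PostgreSQL this can be changed without a restart, since wal_sender_timeout is reloadable; managed services set it via parameter groups or flags instead:

```sql
ALTER SYSTEM SET wal_sender_timeout = 0;
SELECT pg_reload_conf();
SHOW wal_sender_timeout;  -- should now report 0
```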
Fix replica identity (CDC migrations only)
For CDC migrations, tables without a primary key need REPLICA IDENTITY FULL set so that UPDATE and DELETE operations can be replicated. The fix-replica-identity script identifies affected tables and applies the change:
The script lists the affected tables, generates the corresponding ALTER TABLE statements, and prompts for confirmation before applying.
This script requires table ownership on the source, not just SELECT. If you are using a dedicated migration user, grant it membership in the role(s) that own the affected tables:
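The check the script performs can be approximated by hand. This query (a sketch against the standard catalogs) lists ordinary tables with no primary key; the ALTER statement uses a placeholder table name:

```sql
SELECT n.nspname AS schema, c.relname AS table
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND NOT EXISTS (
    SELECT 1 FROM pg_constraint p
    WHERE p.conrelid = c.oid AND p.contype = 'p'
  );

-- For each affected table ("public.events" is a placeholder):
ALTER TABLE public.events REPLICA IDENTITY FULL;
```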
3. Prepare your PlanetScale database
Create a new database in the PlanetScale dashboard or using the PlanetScale CLI. When configuring the database:
- Select the correct cloud region. You typically want the same region as your source database and application infrastructure.
- Choose Postgres as the database type.
- Choose the best storage option for your needs. For high-performance, low-latency I/O, use PlanetScale Metal. For more flexible storage options, choose “Elastic Block Storage” (AWS) or “Persistent Disk” (GCP).
Configure disk size
Set your disk size to at least 150% of your source database size. For example, if your source is 330 GB, provision at least 500 GB. The overhead accounts for data growth during import and table/index bloat that occurs during the copy process.
- Network-attached storage (EBS, Persistent Disk): Configure the disk in advance by navigating to Clusters > Storage and setting the Minimum disk size. AWS and GCP limit how frequently disks can be resized, so the disk may not be able to grow fast enough during a large import.
- Metal: Choose the disk size when creating the cluster. You can resize after import and cleanup.
Enable extensions
Review the extensions installed on your source database and compare them against the PlanetScale supported extensions list. Any extensions your application depends on should be enabled on PlanetScale before running the migration. Some extensions require enabling through the PlanetScale dashboard and may need a cluster restart. See the extensions documentation for details.
Use the Default role
Use the Default role on your PlanetScale database for the migration. This role has the necessary permissions to create schemas, tables, and indexes, and to manage replication. You can find the Default role credentials by navigating to your database in the PlanetScale dashboard and clicking Connect. Copy the connection credentials into the PGCOPYDB_TARGET_PGURI in your ~/.env file.
Compare and configure parameters
Use the compare-pg-params script to compare PostgreSQL parameters between your source and PlanetScale target and identify any differences that could affect application behavior. In particular, consider raising max_worker_processes on the target above its default value. This allows pgcopydb to run more parallel workers during the copy phase. You can decrease this value after the migration is complete.
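For example (the script path assumes the template layout described earlier; the SHOW query is a quick way to confirm the current target setting):

```shell
~/compare-pg-params.sh

# Confirm the current value on the target
psql "$PGCOPYDB_TARGET_PGURI" -c "SHOW max_worker_processes;"
```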
Run the preflight check
The preflight check script validates all migration prerequisites before you start.

On the source:
- Connectivity
- wal_level = logical
- Replication permissions (REPLICATION, SUPERUSER, or rds_replication)
- Available replication slots
- Available WAL senders
- No leftover pgcopydb replication slots from previous attempts
- wal_sender_timeout configuration
- No pending prepared transactions

On the target:
- Connectivity
- Replication permissions
- No leftover pgcopydb schema from previous attempts

On the migration instance:
- ~/filters.ini exists
- pgcopydb binary is on PATH
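If you want to verify a few of the source-side items by hand, they map to ordinary queries (connection string from ~/.env):

```shell
psql "$PGCOPYDB_SOURCE_PGURI" -c "SHOW wal_level;"            # expect: logical
psql "$PGCOPYDB_SOURCE_PGURI" -c "SHOW wal_sender_timeout;"
psql "$PGCOPYDB_SOURCE_PGURI" -c "SELECT slot_name FROM pg_replication_slots;"
psql "$PGCOPYDB_SOURCE_PGURI" -c "SELECT count(*) FROM pg_prepared_xacts;"
```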
4. Configure filtering
The ~/filters.ini file controls which schemas, tables, extensions, and event triggers are excluded from the migration. Edit this file to match your source environment:
Any extensions not supported by PlanetScale should be added to the [exclude-extension] section. If unsupported extensions are not excluded, the migration may fail when trying to create those objects on the target. Check the PlanetScale extensions documentation to verify which extensions are supported.
The following are starting-point filters for common source providers. Copy the one that matches your source and adjust as needed:
- AWS RDS/Aurora
- Supabase
- Google AlloyDB
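As an illustration, an RDS/Aurora-oriented filters.ini might look like the following. The section names follow pgcopydb's filtering documentation; the excluded schema and extension names are examples to adapt, not a definitive list:

```ini
[exclude-schema]
awsdms

[exclude-extension]
aws_commons
aws_s3
```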
5. Run the migration
Start the migration using the start-migration-screen script, which runs pgcopydb in a detached screen session so it survives SSH disconnects. Detach from the session at any time with Ctrl-A D.
What happens during migration
The run-migration script executes pgcopydb with the following settings:
- 16 parallel table COPY workers (TABLE_JOBS=16)
- 12 parallel index creation workers (INDEX_JOBS=12)
- Tables larger than 50 GB are automatically split into parts for parallel copying
- The wal2json logical decoding plugin is used for CDC
All output is logged to ~/migration_YYYYMMDD-HHMMSS/migration.log.
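Under the hood, these settings correspond to a pgcopydb invocation along these lines. This is a sketch, not the script's exact command; the flags are from pgcopydb's CLI, and the working directory name is illustrative:

```shell
pgcopydb clone --follow \
  --table-jobs 16 \
  --index-jobs 12 \
  --split-tables-larger-than '50 GB' \
  --plugin wal2json \
  --filters ~/filters.ini \
  --dir ~/migration_20250101-000000   # timestamped working directory
```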
Tuning parallelism
You can tune the parallel job counts by editing the variables at the top of ~/run-migration.sh before starting:
As a rule of thumb, keep TABLE_JOBS below the lesser of your source and target vCPU counts, and keep INDEX_JOBS below the target vCPU count.
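For example, with an 8-vCPU source and a 16-vCPU target, the variables at the top of the script might be lowered to (values illustrative):

```shell
# Top of ~/run-migration.sh: stay below the smaller vCPU count for COPY
# workers, and below the target vCPU count for index builds
TABLE_JOBS=8
INDEX_JOBS=12
```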
6. Monitor progress
During the copy phase
Use the check-migration-status script to see a real-time dashboard of migration progress, including:
- Phase status (1-10: pending, running, or done)
- Copy progress, including tables in progress, completed, and percentage
- Index and constraint creation progress
- Data transferred in GB
- Error counts with context
- Elapsed runtime
- Active database operations on the target (COPY, CREATE INDEX, VACUUM, ALTER TABLE with durations)
During CDC replication
Once the initial copy completes and pgcopydb enters the CDC phase, use the check-cdc-status script to monitor replication lag. It reports:
- Apply LSN and Streaming LSN
- Apply backlog gap in MB/GB
- Apply rate (GB/hr)
- Estimated time to catch up
- Source replication slot health
- Total end-to-end lag
7. Prepare for cutover
While CDC is running and caught up, prepare your PlanetScale database for production traffic before performing the cutover.

Create application roles. pgcopydb does not migrate roles from the source database. The Default role used for migration has elevated privileges and should not be used by your application. Create dedicated roles with appropriate permissions for your application’s needs using PlanetScale’s built-in role management. See Managing roles for details.

Confirm application connectivity. Verify that your application can connect to the PlanetScale database using the new role credentials. Test read queries to confirm connectivity, but do not write to the target while CDC is still replicating.

Configure observability. Set up monitoring for your PlanetScale database so you have visibility from the moment you cut over. See Prometheus integration and Metrics for available options.

Prepare connection strings. Have your new PlanetScale connection strings ready to deploy in your application configuration. If your application uses environment variables or a secrets manager, stage the new values so the switch can happen quickly.

Configure connection pooling if needed. If your application requires dedicated connection pooling, set up PgBouncer ahead of the cutover.

8. Cutover
The cutover process switches your application from the source database to PlanetScale. This is the only point where downtime occurs.

1. Stop writes to the source database. Put your application into maintenance mode, switch to read-only, or drain connections. The goal is to ensure no new writes reach the source.
2. Get the current WAL position. Query the source database for the current WAL LSN. This returns a value like 41EBA/7C7A1AD8.
3. Set the CDC endpoint. Tell pgcopydb to stop replicating after it reaches the LSN from the previous step. The stop-cdc script updates pgcopydb’s internal state so it stops streaming once it has applied all changes up to this position.
4. Wait for CDC to reach the endpoint. Monitor with ~/check-cdc-status.sh until the apply LSN matches the endpoint. pgcopydb will exit cleanly once it has applied all changes.
5. Verify data on the target. Run the verify-migration script to compare source and target databases. It compares schemas, estimated row counts, table sizes, indexes, constraints, and sequences between source and target, and performs spot-checks on your largest tables without full table scans. It typically completes in under 2 minutes, even on multi-terabyte databases. The output is color-coded: PASS, WARN, or FAIL.
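Steps 2 and 3 can be sketched as follows. The psql query is standard PostgreSQL; the exact argument form the stop-cdc script expects is an assumption, so check its usage output first:

```shell
# Step 2: capture the current WAL LSN on the source
psql "$PGCOPYDB_SOURCE_PGURI" -t -c "SELECT pg_current_wal_lsn();"

# Step 3: set that LSN as the CDC endpoint (argument form assumed)
~/stop-cdc.sh 41EBA/7C7A1AD8   # replace with the LSN returned above
```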
9. Clean up
After a successful cutover, clean up the replication artifacts left by the migration:
- The logical replication slot on the source (terminates the active consumer if needed)
- The replication origin on the target
- The pgcopydb sentinel schema on the target

Then:
- Remove the migration user on the source database. If you created a dedicated user for pgcopydb to connect to your source, drop it now that the migration is complete.
- Decrease max_worker_processes on the PlanetScale database back to its default value.
- Run VACUUM ANALYZE on the target to reclaim any bloat from the import. You can also use pg_squeeze for online table compaction.
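A sketch of the user removal and post-import maintenance, assuming the migration user was named migrator (a placeholder):

```sql
-- On the source: revoke the user's privileges, then drop the role
DROP OWNED BY migrator;
DROP ROLE migrator;

-- On the target: refresh statistics and reclaim dead space
VACUUM ANALYZE;
```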
Resuming after interruption
If the migration is interrupted by a crash, reboot, or connection failure, use the resume-migration script to continue from where it left off. The script passes --not-consistent to allow resuming from a mid-transaction state, and --split-tables-larger-than to match run-migration.sh, since pgcopydb requires catalog consistency between the original run and the resume.
Starting over
If you need to restart the migration from scratch, first clean the target database and replication state. The target-clean script drops all schemas on the target except pg_catalog, information_schema, and pscale_extensions. It previews what will be dropped and prompts for confirmation before proceeding.
Troubleshooting
Checking logs
All migration output is captured in ~/migration_YYYYMMDD-HHMMSS/migration.log. Common diagnostic commands:
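For instance (log path per the layout above; the grep/tail flags are standard):

```shell
# Follow the live migration log
tail -f ~/migration_*/migration.log

# Surface errors and warnings with line numbers and surrounding context
grep -inA2 -E 'error|fatal|warn' ~/migration_*/migration.log | tail -50
```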
Common issues
Preflight check shows wal_level is not logical:
Changing wal_level requires a database restart. On managed services like RDS, update the parameter group and reboot the instance. On self-managed PostgreSQL, update postgresql.conf and restart.
Extension filtering not working:
If extension-owned objects are still being copied despite [exclude-extension] entries in filters.ini, verify that the extension names match exactly and that the extensions are actually installed on the source database.
Migration fails with restore errors:
pgcopydb tolerates up to 10 pg_restore errors by default. Check the migration log for specific errors. Common causes include unsupported extensions or objects that depend on excluded schemas. Adjust your filters.ini and restart.
Replication slot already exists:
If a previous migration attempt left a replication slot, run ~/drop-replication-slots.sh to clean it up before starting a new migration.
Tables missing data after resume:
If the resume did not recover correctly, run ~/target-clean.sh and ~/drop-replication-slots.sh, then start the migration over with ~/start-migration-screen.sh.
Next steps
After your migration is complete:
- Review Query Insights to monitor query performance on PlanetScale.
- Set up Metrics and alerting for your new database.
- Review Schema recommendations for optimization suggestions.

